
πŸ” Diagnostic Guide: Timeout vs Memory

How to identify the problem

1️⃣ Run the diagnostic tool

In your HF Space, execute:

python hf-spaces/diagnostic_tool.py

This tool will tell you exactly whether the problem is:

  • ❌ MEMORY_ERROR: The system ran out of RAM
  • ⏰ TIMEOUT_ERROR: The operation took too long
  • ❓ OTHER_ERROR: Another type of problem
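
For reference, here is a minimal sketch of the kind of check this classification is based on. It is only an illustration using psutil (assumed to be installed); the real logic lives in hf-spaces/diagnostic_tool.py and may differ:

import time
import psutil  # assumed to be available; used here only for illustration
from transformers import AutoModel

def probe_model_load(model_name: str, timeout_s: float = 300.0) -> str:
    """Load a model once and report whether a problem looks like memory or time."""
    start = time.time()
    try:
        AutoModel.from_pretrained(model_name, low_cpu_mem_usage=True)
    except MemoryError:
        mem = psutil.virtual_memory()
        return f"MEMORY_ERROR: {mem.used / 1e9:.1f} GB used ({mem.percent}%)"
    except Exception as exc:
        return f"OTHER_ERROR after {time.time() - start:.1f}s: {exc}"
    elapsed = time.time() - start
    if elapsed > timeout_s:
        return f"TIMEOUT_ERROR: loading took {elapsed:.1f}s (> {timeout_s:.0f}s request timeout)"
    return f"OK: loaded in {elapsed:.1f}s"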

2️⃣ Interpret the results

If you see "MEMORY_ERROR":

❌ PROBLEM DETECTED: OUT OF MEMORY
Memory used at failure: 15.8 GB (98.5%)

Cause: The model is too large for the available memory in HF Spaces.

Solutions:

  1. Use smaller models (1B-1.7B parameters)
  2. Upgrade to HF Spaces PRO (more RAM available)
  3. Use int8 quantization (reduces memory usage ~50%)
  4. Load models with low_cpu_mem_usage=True

If you see "TIMEOUT_ERROR":

⏰ TIMEOUT ERROR after 298.5s
Memory used: 8.2 GB (51.2%)

Cause: The model takes too long to load even though memory is still available.

Solutions:

  1. Increase the timeout from 300s to 600s or 900s
  2. Pre-load and cache models at startup
  3. Use smaller, faster-loading models

πŸ› οΈ Implemented Solutions

Solution 1: Increase Timeout (Easy)

Edit hf-spaces/optipfair_frontend.py:

# Change from:
response = requests.post(url, json=payload, timeout=300)

# To:
response = requests.post(url, json=payload, timeout=600)  # 10 minutes
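
If you want to fail fast when the backend is unreachable while still allowing a long model load, requests also accepts a (connect, read) timeout tuple; a small variant of the same call:

# Fail fast on connection problems (10s), but wait up to 600s for the response
response = requests.post(url, json=payload, timeout=(10, 600))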

Solution 2: Use Quantization (For memory issues)

Edit the model-loading code in the backend:

from transformers import AutoModel, BitsAndBytesConfig

# Configure int8 quantization (reduces memory usage ~50%)
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
)

model = AutoModel.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto",
    low_cpu_mem_usage=True,
)
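
Note that load_in_8bit relies on the bitsandbytes package being installed. To check the effect after loading, you can print the model's reported footprint (exact numbers depend on the model and hardware):

# Report how much memory the (quantized) weights occupy
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")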

Solution 3: Model Cache (For timeout)

Pre-load models at startup in hf-spaces/app.py:

from transformers import AutoModel, AutoTokenizer
import threading
import logging

logger = logging.getLogger(__name__)

# Global model cache
MODEL_CACHE = {}

def preload_models():
    """Pre-load common models at startup"""
    common_models = [
        "meta-llama/Llama-3.2-1B",
        "oopere/pruned40-llama-3.2-1B",
    ]
    
    logger.info("πŸ”„ Pre-loading common models...")
    for model_name in common_models:
        try:
            logger.info(f"  Loading {model_name}...")
            MODEL_CACHE[model_name] = {
                "model": AutoModel.from_pretrained(model_name, low_cpu_mem_usage=True),
                "tokenizer": AutoTokenizer.from_pretrained(model_name)
            }
            logger.info(f"  βœ“ {model_name} loaded")
        except Exception as e:
            logger.warning(f"  βœ— Could not pre-load {model_name}: {e}")
    
    logger.info("βœ… Pre-loading complete")

def main():
    # Pre-load models before starting services
    preload_models()
    
    # Rest of the code...
    fastapi_thread = threading.Thread(target=run_fastapi, daemon=True)
    fastapi_thread.start()
    # ...
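
The cache only pays off if the request path reads from it instead of reloading. Below is a sketch of a helper the backend could use; get_model_and_tokenizer is a hypothetical name, not an existing function in this repo:

def get_model_and_tokenizer(model_name):
    """Return (model, tokenizer), loading and caching on first use."""
    if model_name not in MODEL_CACHE:
        MODEL_CACHE[model_name] = {
            "model": AutoModel.from_pretrained(model_name, low_cpu_mem_usage=True),
            "tokenizer": AutoTokenizer.from_pretrained(model_name),
        }
    entry = MODEL_CACHE[model_name]
    return entry["model"], entry["tokenizer"]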

Solution 4: Improved Error Messages

Improved error messages are already included in the frontend to help you identify the problem:

except requests.exceptions.Timeout:
    return (
        None,
        "❌ **Timeout Error:**\nThe model took too long to load (>5min). "
        "This is normal with large models. Options:\n"
        "1. Try with a smaller model\n"
        "2. Wait and try again (model may be caching)\n"
        "3. Contact admin to increase timeout",
        ""
    )

except MemoryError:
    return (
        None,
        "❌ **Memory Error:**\nNot enough RAM for this model. Options:\n"
        "1. Use a smaller model (1B parameters)\n"
        "2. Model requires more memory than available in HF Spaces",
        ""
    )
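
For context, both except clauses hang off the try block that sends the request in optipfair_frontend.py; simplified, the structure looks like this (url and payload come from the surrounding function):

try:
    response = requests.post(url, json=payload, timeout=600)
    response.raise_for_status()
except requests.exceptions.Timeout:
    ...  # return the timeout message shown above
except MemoryError:
    ...  # return the memory message shown above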

📊 Model Size Comparison

Model          Parameters   RAM Needed*   Load Time**
Llama-3.2-1B   1B           ~4 GB         ~30s
Llama-3.2-3B   3B           ~12 GB        ~90s
Llama-3-8B     8B           ~32 GB        ~240s
Llama-3-70B    70B          ~280 GB       ~600s+

*Without quantization (FP32). **On typical HF Spaces hardware.
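
The RAM column follows from roughly 4 bytes per FP32 parameter (activations and framework overhead come on top); as a quick sanity check:

# Rough FP32 estimate: parameters x 4 bytes per parameter
def fp32_ram_gb(params_in_billions: float) -> float:
    return params_in_billions * 4  # 1e9 params * 4 bytes = 4 GB

print(fp32_ram_gb(1))   # ~4 GB  (Llama-3.2-1B)
print(fp32_ram_gb(8))   # ~32 GB (Llama-3-8B)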

🎯 Recommended Action Plan

  1. Run the diagnostic:

    python hf-spaces/diagnostic_tool.py
    
  2. Read the results and follow the specific recommendations

  3. Apply the appropriate solution:

    • If timeout → Increase timeout or use cache
    • If memory → Use small models or quantization
  4. Test again with the adjusted configuration

πŸ“ Useful Logs in HF Spaces

Check the logs in HF Spaces for messages like:

πŸ” MODEL LOADING DIAGNOSTIC: meta-llama/Llama-3.2-1B
πŸ“Š INITIAL SYSTEM STATE:
  - Available memory: 12.50 GB
  - Used memory: 3.45 GB (21.6%)
⏳ Starting model loading (timeout: 300s)...
  [1/2] Loading tokenizer...
  βœ“ Tokenizer loaded in 2.31s
  - Memory used: 3.48 GB (21.8%)
  [2/2] Loading model...
  βœ“ Model loaded in 45.67s
βœ… LOADING SUCCESSFUL in 47.98s

This tells you exactly how much memory and time each step uses.
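
If you want similar step-by-step numbers in your own scripts, here is a minimal sketch using psutil (assumed to be installed); it is an illustration, not the backend's exact logging code:

import logging
import time

import psutil
from transformers import AutoModel, AutoTokenizer

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def log_memory(label: str) -> None:
    mem = psutil.virtual_memory()
    logger.info(f"  - {label}: {mem.used / 1e9:.2f} GB ({mem.percent}%)")

model_name = "meta-llama/Llama-3.2-1B"
start = time.time()
tokenizer = AutoTokenizer.from_pretrained(model_name)
log_memory("Memory used after tokenizer")
model = AutoModel.from_pretrained(model_name, low_cpu_mem_usage=True)
log_memory("Memory used after model")
logger.info(f"Loading finished in {time.time() - start:.2f}s")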