🔍 Diagnostic Guide: Timeout vs Memory
How to identify the problem?
1️⃣ Run the diagnostic tool
In your HF Space, execute:
```bash
python hf-spaces/diagnostic_tool.py
```
This tool will tell you exactly whether the problem is:
- ❌ MEMORY_ERROR: The system ran out of RAM
- ⏰ TIMEOUT_ERROR: The operation took too long
- ❓ OTHER_ERROR: Another type of problem
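To see how such a classification can work, here is a minimal sketch (hypothetical; not the actual contents of diagnostic_tool.py). It times the load, catches MemoryError, and uses psutil to report memory at the point of failure:

```python
import time
import psutil  # samples system memory

LOAD_TIMEOUT_S = 300  # assumed limit, matching the 300s default used below

def classify_failure(load_fn):
    """Run load_fn and label the failure MEMORY_ERROR, TIMEOUT_ERROR, or OTHER_ERROR."""
    start = time.time()
    try:
        load_fn()
    except MemoryError:
        mem = psutil.virtual_memory()
        return f"MEMORY_ERROR: {mem.used / 1e9:.1f} GB used ({mem.percent}%)"
    except Exception as e:
        # Assumes something (a signal, a watchdog thread) interrupts a stuck
        # load by raising; the elapsed time then distinguishes a timeout.
        elapsed = time.time() - start
        if elapsed >= LOAD_TIMEOUT_S:
            return f"TIMEOUT_ERROR after {elapsed:.1f}s"
        return f"OTHER_ERROR: {e}"
    return "OK"

# Example (model name illustrative):
# from transformers import AutoModel
# print(classify_failure(lambda: AutoModel.from_pretrained("meta-llama/Llama-3.2-1B")))
```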
2️⃣ Interpret the results
If you see "MEMORY_ERROR":
β PROBLEM DETECTED: OUT OF MEMORY
Memory used at failure: 15.8 GB (98.5%)
Cause: The model is too large for the available memory in HF Spaces.
Solutions:
- Use smaller models (1B-1.7B parameters)
- Upgrade to HF Spaces PRO (more RAM available)
- Use int8 quantization (reduces memory usage ~50%)
- Load models with `low_cpu_mem_usage=True`
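Before choosing a model, you can also check how much RAM the Space actually has. A quick sketch using psutil; the 4-bytes-per-FP32-parameter figure is the usual rule of thumb:

```python
import psutil

def fits_in_ram(n_params_billions: float, bytes_per_param: int = 4) -> bool:
    """Rough check: FP32 weights need ~4 bytes per parameter, int8 ~1 byte."""
    needed = n_params_billions * 1e9 * bytes_per_param
    available = psutil.virtual_memory().available
    return needed < available * 0.9  # leave ~10% headroom

print(fits_in_ram(1.0))     # Llama-3.2-1B in FP32: needs ~4 GB
print(fits_in_ram(3.0, 1))  # a 3B model in int8: needs ~3 GB
```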
If you see "TIMEOUT_ERROR":
β° TIMEOUT ERROR after 298.5s
Memory used: 8.2 GB (51.2%)
Cause: The model takes too long to load, but there is available memory.
Solutions:
- Increase the timeout from 300s to 600s or 900s
- Pre-load models into a cache at startup
- Use models that load faster
🛠️ Implemented Solutions
Solution 1: Increase Timeout (Easy)
Edit `hf-spaces/optipfair_frontend.py`:
```python
# Change from:
response = requests.post(url, json=payload, timeout=300)

# To:
response = requests.post(url, json=payload, timeout=600)  # 10 minutes
```
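As a refinement, requests also accepts a (connect, read) timeout tuple; this is standard requests behavior, not something specific to this Space. It lets you fail fast on an unreachable backend while still allowing a long model load:

```python
# 10s to establish the connection, up to 600s to wait for the response body.
response = requests.post(url, json=payload, timeout=(10, 600))
```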
Solution 2: Use Quantization (For memory issues)
Edit the model-loading code in the backend:
```python
from transformers import AutoModel, BitsAndBytesConfig

# Configure int8 quantization (reduces memory usage ~50%)
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
)

model = AutoModel.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto",
    low_cpu_mem_usage=True,
)
```
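One caveat: bitsandbytes int8 quantization requires a CUDA GPU, and free HF Spaces run on CPU. If your Space is CPU-only, half-precision weights are a more portable way to roughly halve memory; a sketch, assuming the model tolerates reduced precision:

```python
import torch
from transformers import AutoModel

# Half-precision weights: ~2 bytes per parameter instead of 4.
# bfloat16 is generally the safer half-precision choice on CPU.
model = AutoModel.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)
```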
Solution 3: Model Cache (For timeout issues)
Pre-load models at startup in `hf-spaces/app.py`:
```python
from transformers import AutoModel, AutoTokenizer
import logging
import threading

logger = logging.getLogger(__name__)

# Global model cache
MODEL_CACHE = {}

def preload_models():
    """Pre-load common models at startup."""
    common_models = [
        "meta-llama/Llama-3.2-1B",
        "oopere/pruned40-llama-3.2-1B",
    ]
    logger.info("🚀 Pre-loading common models...")
    for model_name in common_models:
        try:
            logger.info(f"  Loading {model_name}...")
            MODEL_CACHE[model_name] = {
                "model": AutoModel.from_pretrained(model_name, low_cpu_mem_usage=True),
                "tokenizer": AutoTokenizer.from_pretrained(model_name),
            }
            logger.info(f"  ✅ {model_name} loaded")
        except Exception as e:
            logger.warning(f"  ❌ Could not pre-load {model_name}: {e}")
    logger.info("✅ Pre-loading complete")

def main():
    # Pre-load models before starting services
    preload_models()
    # Rest of the code...
    fastapi_thread = threading.Thread(target=run_fastapi, daemon=True)
    fastapi_thread.start()
    # ...
```
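The cache only pays off if request handlers actually read from it. A minimal lookup helper (the get_model name is hypothetical) that falls back to a lazy load on a cache miss:

```python
def get_model(model_name: str):
    """Return (model, tokenizer) from MODEL_CACHE, loading lazily on a miss."""
    if model_name not in MODEL_CACHE:
        logger.info(f"Cache miss, loading {model_name}...")
        MODEL_CACHE[model_name] = {
            "model": AutoModel.from_pretrained(model_name, low_cpu_mem_usage=True),
            "tokenizer": AutoTokenizer.from_pretrained(model_name),
        }
    entry = MODEL_CACHE[model_name]
    return entry["model"], entry["tokenizer"]
```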
Solution 4: Improved Error Messages
Better error messages are already included to help you identify the problem:
```python
except requests.exceptions.Timeout:
    return (
        None,
        "❌ **Timeout Error:**\nThe model took too long to load (>5 min). "
        "This is normal with large models. Options:\n"
        "1. Try with a smaller model\n"
        "2. Wait and try again (the model may be caching)\n"
        "3. Contact the admin to increase the timeout",
        "",
    )
except MemoryError:
    return (
        None,
        "❌ **Memory Error:**\nNot enough RAM for this model. Options:\n"
        "1. Use a smaller model (1B parameters)\n"
        "2. The model requires more memory than is available in HF Spaces",
        "",
    )
```
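For context, these handlers wrap the frontend's request to the backend. A sketch of the enclosing try; the query_backend name and the three-element return shape are assumptions inferred from the fragments above:

```python
import requests

def query_backend(url, payload):  # hypothetical wrapper in the frontend
    try:
        response = requests.post(url, json=payload, timeout=600)
        response.raise_for_status()
        return response.json(), "", ""  # success: (result, error_message, details)
    except requests.exceptions.Timeout:
        return None, "❌ **Timeout Error:** ...", ""  # full text as shown above
    except MemoryError:
        return None, "❌ **Memory Error:** ...", ""   # full text as shown above
```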
📊 Model Size Comparison
| Model | Parameters | RAM Needed* | Load Time** |
|---|---|---|---|
| Llama-3.2-1B | 1B | ~4 GB | ~30s |
| Llama-3.2-3B | 3B | ~12 GB | ~90s |
| Llama-3-8B | 8B | ~32 GB | ~240s |
| Llama-3-70B | 70B | ~280 GB | ~600s+ |
*Without quantization (FP32)
**On typical HF Spaces hardware
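The RAM column is simple arithmetic: FP32 stores 4 bytes per parameter, so a 1B-parameter model needs about 1e9 × 4 bytes ≈ 4 GB for the weights alone, with activations and framework overhead on top. Int8 quantization cuts this to ~1 byte per parameter.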
🎯 Recommended Action Plan
1. Run the diagnostic:
```bash
python hf-spaces/diagnostic_tool.py
```
2. Read the results and follow the specific recommendations.
3. Apply the appropriate solution:
   - If timeout → increase the timeout or use the cache
   - If memory → use smaller models or quantization
4. Test again with the adjusted configuration.
📝 Useful Logs in HF Spaces
Check the logs in HF Spaces for messages like:
```text
🔍 MODEL LOADING DIAGNOSTIC: meta-llama/Llama-3.2-1B
📊 INITIAL SYSTEM STATE:
   - Available memory: 12.50 GB
   - Used memory: 3.45 GB (21.6%)
⏳ Starting model loading (timeout: 300s)...
   [1/2] Loading tokenizer...
   ✓ Tokenizer loaded in 2.31s
   - Memory used: 3.48 GB (21.8%)
   [2/2] Loading model...
   ✓ Model loaded in 45.67s
✅ LOADING SUCCESSFUL in 47.98s
```
This tells you exactly how much memory and time each step uses.