---
title: AI Building Blocks
emoji: π
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: wtfpl
short_description: A gallery of building blocks for building AI applications
---

# AI Building Blocks

A gallery of AI building blocks for building AI applications, featuring a Gradio web interface with multiple tabs for different AI tasks.

## Features

This application provides the following AI building blocks:

- **Text-to-image Generation**: Generate images from text prompts using the Hugging Face Inference API
- **Image-to-text (Image Captioning)**: Generate text descriptions of images using BLIP models
- **Image Classification**: Classify recyclable items using the Trash-Net model
- **Text-to-speech (TTS)**: Convert text to speech audio
- **Automatic Speech Recognition (ASR)**: Transcribe audio to text using Whisper models
- **Chatbot**: Have conversations with AI chatbots, supporting both modern chat models and seq2seq models (see the sketch after this list)

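The chatbot's support for both model families can be illustrated with a short sketch. This is not the application's actual code; it merely shows one way to dispatch between decoder-only chat models and encoder-decoder (seq2seq) models based on the model configuration. The model ID is taken from the example configuration later in this README.

```python
# Hypothetical sketch: pick a generation pipeline based on the model architecture.
from transformers import AutoConfig, pipeline

def build_chat_pipeline(model_id: str):
    config = AutoConfig.from_pretrained(model_id)
    if config.is_encoder_decoder:
        # Seq2seq models use the text2text-generation task.
        return pipeline("text2text-generation", model=model_id)
    # Decoder-only chat models use plain text-generation.
    return pipeline("text-generation", model=model_id)

chat = build_chat_pipeline("Qwen/Qwen2.5-1.5B-Instruct")
print(chat("Hello! What can you do?", max_new_tokens=64))
```
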
### Architecture: Local Models vs. Inference API

This application uses a hybrid approach:

- **Text-to-image Generation** and **Automatic Speech Recognition (ASR)** use the **Hugging Face Inference API** (via `InferenceClient`) instead of loading models locally. This is because:
  - Text-to-image models (like FLUX.1-dev) are extremely large and memory-intensive
  - ASR models (like Whisper-large-v3) are also large and can cause timeouts in constrained environments
  - Loading them locally can cause timeouts or out-of-memory errors, especially in constrained environments like Hugging Face Spaces with Zero GPU
  - Using the Inference API offloads model loading and inference to Hugging Face's infrastructure, ensuring reliable operation
- **All other tasks** (image classification, image-to-text, text-to-speech, chatbot) load models **locally** to take advantage of Hugging Face Zero GPU for cost-effective hosting. These models are smaller and can be loaded efficiently within memory constraints. The sketch below contrasts the two approaches.

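As a rough illustration of this hybrid approach, the following sketch contrasts a remote call through `InferenceClient` with loading a local `transformers` pipeline. It is a simplified example, not the application's actual code; the model IDs are the ones used in the example configuration below.

```python
import os

from huggingface_hub import InferenceClient
from transformers import pipeline

# Remote: text-to-image runs on Hugging Face's infrastructure via the Inference API.
client = InferenceClient(token=os.getenv("HF_TOKEN"))
image = client.text_to_image(
    "A watercolor fox in a forest",
    model="black-forest-labs/FLUX.1-dev",
)
image.save("fox.png")

# Local: smaller models such as the image captioner are loaded in-process,
# so they can run on Zero GPU hardware.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-large")
print(captioner("fox.png"))
```
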
## Prerequisites

- Python 3.8 or higher
- PyTorch with hardware acceleration (strongly recommended - see [PyTorch Installation](#pytorch-installation))
- CUDA-capable GPU (optional, but recommended for better performance)

## Installation

1. Clone this repository:

   ```bash
   git clone <repository-url>
   cd ai-building-blocks
   ```

2. Create a virtual environment:

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

3. Install system dependencies (required for text-to-speech):

   ```bash
   # On Ubuntu/Debian:
   sudo apt-get update && sudo apt-get install -y espeak-ng

   # On macOS:
   brew install espeak-ng

   # On Fedora/RHEL:
   sudo dnf install espeak-ng
   ```

4. Install PyTorch with CUDA support (see [PyTorch Installation](#pytorch-installation) below).

5. Install the remaining dependencies:

   ```bash
   pip install -r requirements.txt
   ```

## PyTorch Installation

PyTorch is not included in `requirements.txt` because installation varies based on your hardware and operating system. **It is strongly recommended to install PyTorch with hardware acceleration support** for optimal performance.

For official installation instructions with CUDA support, please visit:

- **Official PyTorch Installation Guide**: https://pytorch.org/get-started/locally/

Select your platform, package manager, Python version, and CUDA version to get the appropriate installation command. For example:

- **CUDA 12.1** (recommended for modern NVIDIA GPUs):

  ```bash
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  ```

- **CUDA 11.8**:

  ```bash
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  ```

- **CPU only** (not recommended for production):

  ```bash
  pip install torch torchvision torchaudio
  ```

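After installing, you can quickly check which build of PyTorch you have and whether your GPU is visible, for example:

```python
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```
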
## Configuration

Create a `.env` file in the project root directory with the following environment variables:

### Required Environment Variables

```env
# Hugging Face API Token (required for gated models and Inference API access)
# Get your token from: https://huggingface.co/settings/tokens
# Required fine-grained permissions:
#   1. "Make calls to Inference Providers"
#   2. "Read access to contents of all public gated repos you can access"
HF_TOKEN=your_huggingface_token_here

# Model IDs for each building block
TEXT_TO_IMAGE_MODEL=model_id_for_text_to_image
IMAGE_TO_TEXT_MODEL=model_id_for_image_captioning
IMAGE_CLASSIFICATION_MODEL=model_id_for_image_classification
TEXT_TO_SPEECH_MODEL=model_id_for_text_to_speech
AUDIO_TRANSCRIPTION_MODEL=model_id_for_speech_recognition
CHAT_MODEL=model_id_for_chatbot
```

### Optional Environment Variables

```env
# Request timeout in seconds (default: 45)
REQUEST_TIMEOUT=45

# Enable reduced memory usage by using lower precision (float16) for all models (default: False).
# Set to "True" to reduce GPU memory usage at the cost of slightly lower precision.
# Sometimes this is still not enough, in which case you must choose another model that will fit in memory.
REDUCED_MEMORY=False
```

### Example `.env` File

```env
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Example model IDs (adjust based on your needs)
TEXT_TO_IMAGE_MODEL=black-forest-labs/FLUX.1-dev
IMAGE_CLASSIFICATION_MODEL=prithivMLmods/Trash-Net
IMAGE_TO_TEXT_MODEL=Salesforce/blip-image-captioning-large
TEXT_TO_SPEECH_MODEL=kakao-enterprise/vits-ljs
AUDIO_TRANSCRIPTION_MODEL=openai/whisper-large-v3
CHAT_MODEL=Qwen/Qwen2.5-1.5B-Instruct
REQUEST_TIMEOUT=45
```

**Note**: `.env` should already be listed in `.gitignore`. Never force-add it (for example with `git add --force`), so that sensitive tokens are not committed.

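For reference, a minimal sketch of how these variables might be read at startup, assuming the `python-dotenv` package; the exact loading code in `app.py` may differ:

```python
import os

import torch
from dotenv import load_dotenv

# Load variables from the .env file into the process environment.
load_dotenv()

hf_token = os.environ["HF_TOKEN"]        # required
chat_model = os.environ["CHAT_MODEL"]    # required
request_timeout = float(os.getenv("REQUEST_TIMEOUT", "45"))

# Map REDUCED_MEMORY to a torch dtype for local model loading.
reduced_memory = os.getenv("REDUCED_MEMORY", "False").lower() == "true"
torch_dtype = torch.float16 if reduced_memory else torch.float32
```
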
## Running the Application

1. Activate your virtual environment (if not already activated):

   ```bash
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

2. Run the application:

   ```bash
   python app.py
   ```

3. Open your web browser and navigate to the URL shown in the terminal (typically `http://127.0.0.1:7860`).
4. The Gradio interface will display multiple tabs, each corresponding to a different AI building block (a simplified sketch of such a tabbed layout follows below).

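A stripped-down sketch of how a tabbed interface like this can be assembled with Gradio. It is illustrative only: the real `app.py` wires up all six building blocks, and the handler functions here are placeholders.

```python
import gradio as gr

def caption_image(image):
    # Placeholder for the real image-to-text model call.
    return "A caption for the uploaded image."

def transcribe_audio(audio_path):
    # Placeholder for the real speech-recognition call.
    return "Transcribed text."

captioning_tab = gr.Interface(
    fn=caption_image, inputs=gr.Image(type="pil"), outputs="text"
)
asr_tab = gr.Interface(
    fn=transcribe_audio, inputs=gr.Audio(type="filepath"), outputs="text"
)

demo = gr.TabbedInterface(
    [captioning_tab, asr_tab],
    tab_names=["Image-to-text", "Speech recognition"],
    title="AI Building Blocks",
)

if __name__ == "__main__":
    demo.launch()
```
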
## Project Structure

```
ai-building-blocks/
├── app.py                            # Main application entry point
├── text_to_image.py                  # Text-to-image generation module
├── image_to_text.py                  # Image captioning module
├── image_classification.py           # Image classification module
├── text_to_speech.py                 # Text-to-speech module
├── automatic_speech_recognition.py   # Speech recognition module
├── chatbot.py                        # Chatbot module
├── utils.py                          # Utility functions
├── requirements.txt                  # Python dependencies
├── packages.txt                      # System dependencies (for Hugging Face Spaces)
├── .env                              # Environment variables (create this)
└── README.md                         # This file
```

## Hardware Acceleration

This application is designed to leverage hardware acceleration when available:

- **NVIDIA CUDA**: Automatically detected and used if available
- **AMD ROCm**: Supported via CUDA compatibility
- **Intel XPU**: Automatically detected if available
- **Apple Silicon (MPS)**: Automatically detected and used on Apple devices
- **CPU**: Falls back to CPU if no GPU acceleration is available
The application automatically selects the best available device. For optimal performance, especially with local models (image-to-text, text-to-speech, chatbot), a CUDA-capable GPU is strongly recommended. This is _untested_ on other hardware.

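Device selection along these lines can be expressed in a few lines of PyTorch. This is a simplified sketch of the idea rather than the exact logic in `utils.py`:

```python
import torch

def pick_device() -> torch.device:
    """Pick the best available accelerator, falling back to CPU."""
    if torch.cuda.is_available():                            # NVIDIA CUDA (and ROCm builds)
        return torch.device("cuda")
    if hasattr(torch, "xpu") and torch.xpu.is_available():   # Intel XPU
        return torch.device("xpu")
    if torch.backends.mps.is_available():                    # Apple Silicon
        return torch.device("mps")
    return torch.device("cpu")

print(pick_device())
```
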
## Troubleshooting

### PyTorch Not Detecting GPU

If PyTorch is not detecting your GPU:

1. Verify CUDA is installed: `nvidia-smi`
2. Ensure PyTorch was installed with CUDA support (see [PyTorch Installation](#pytorch-installation)).
3. Check PyTorch CUDA availability: `python -c "import torch; print(torch.cuda.is_available())"`

### Missing Environment Variables

Ensure all required environment variables are set in your `.env` file. Missing variables will cause the application to fail when trying to use the corresponding feature.

### espeak Not Installed (Text-to-Speech)

If you encounter a `RuntimeError: espeak not installed on your system` error:

1. Install `espeak-ng` using your system package manager (see [Installation](#installation) step 3).
2. On Hugging Face Spaces, ensure `packages.txt` exists with `espeak-ng` listed (this file is automatically used by Spaces).
3. Verify the installation: `espeak --version` or `espeak-ng --version`

### Model Loading Errors

If you encounter errors loading models:

1. Verify your `HF_TOKEN` is valid and has the required permissions:
   - "Make calls to Inference Providers"
   - "Read access to contents of all public gated repos you can access"

   Some models (like `black-forest-labs/FLUX.1-dev`) are gated and require these permissions.
2. Ensure you have accepted the terms of use for gated models on their Hugging Face model pages.
3. Check that the model IDs in your `.env` file are correct.
4. Ensure you have sufficient disk space for model downloads.
5. For local models, ensure you have sufficient RAM or VRAM.

### CUDA Out of Memory Errors

If you encounter `torch.OutOfMemoryError: CUDA out of memory` errors:

1. **Enable reduced memory mode**: Set `REDUCED_MEMORY=True` in your `.env` file to use lower precision (float16) for all models, which can reduce memory usage by approximately 50% at the cost of slightly lower precision.
2. **Reduce model size**: Use smaller models or quantized versions when available.
3. **Clear GPU cache**: The application automatically clears GPU memory after each inference (see the sketch after this list), or you can clear it by restarting the application.
4. **Set an environment variable**: To reduce memory fragmentation, you can set `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`. Add this to your shell profile (e.g., `~/.bashrc` or `~/.zshrc`) or set it before running the application.
5. **Use CPU fallback**: If GPU memory is insufficient, the application will automatically fall back to CPU (though this will be slower).
6. **Close other GPU applications**: Ensure no other applications are using the GPU simultaneously.

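For reference, clearing GPU memory after an inference call (as mentioned in step 3) typically looks something like the following sketch; the application's own cleanup code may differ:

```python
import gc

import torch

def free_gpu_memory() -> None:
    """Release cached GPU memory after an inference call."""
    gc.collect()                      # Drop unreachable Python objects first.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()      # Return cached blocks to the driver.
```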