# AI Chat Application with Qwen3 Coder

This is a fully functional AI chat application built for HuggingFace Spaces, integrating the Qwen/Qwen3-Coder-30B-A3B-Instruct model with an OpenAI-compatible API.
## Features

- **Qwen3 Coder Integration**: Direct integration with the Qwen/Qwen3-Coder-30B-A3B-Instruct model
- **OpenAI API Compatibility**: Implements OpenAI-compatible endpoints for seamless integration
- **Streaming Responses**: Real-time response streaming for an interactive chat experience
- **Conversation History**: Persistent conversation history management
- **Modern UI**: Responsive design inspired by Perplexity AI, built with TailwindCSS
- **Dark/Light Mode**: Support for both dark and light themes
- **Copy Responses**: One-click copying of AI responses
- **Typing Indicators**: Visual indicators while the AI generates a response
- **GPU Acceleration**: Automatically uses CUDA when available for maximum performance
- **Error Handling**: Robust error handling with automatic connection recovery
- **Caching**: Redis-backed caching with an in-memory fallback for improved performance
## Project Structure

```
/
├── app.py                # Main application entry point
├── requirements.txt      # Python dependencies
├── README.md             # This file
├── public/               # Frontend static files
│   ├── index.html        # Main HTML file
│   ├── styles.css        # TailwindCSS styles
│   └── app.js            # JavaScript logic
└── utils/                # Utility modules
    ├── model_utils.py    # Model management utilities
    ├── conversation.py   # Conversation management
    └── api_compat.py     # OpenAI API compatibility
```
## Requirements

- Python 3.8+
- GPU with CUDA support (recommended)
- 32GB+ RAM (for optimal performance with Qwen3 Coder)
## Installation

1. Clone this repository:
```bash
git clone <repository-url>
cd <repository-name>
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Run the application:
```bash
python app.py
```
## Deployment to HuggingFace Spaces

1. Create a new Space on HuggingFace:
   - Go to https://huggingface.co/new-space
   - Choose "Gradio" as the Space SDK
   - Select GPU hardware (recommended for Qwen3 Coder)
2. Upload files to your Space repository:
   - Upload all files from this repository
   - Make sure to include the `requirements.txt` file
3. Configure the Space (see the example metadata block after this list):
   - The Space will automatically detect and install dependencies from `requirements.txt`
   - The application will start automatically on port 7860
4. Access your deployed application:
   - Once the build is complete, your application will be available at the provided URL
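HuggingFace Spaces reads its configuration from a YAML metadata block at the top of `README.md`. A minimal example for a Gradio Space is sketched below; the title, emoji, colors, and `sdk_version` are placeholders to adjust for your own Space.

```yaml
---
title: Qwen3 Coder Chat
emoji: 💬
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---
```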
## API Endpoints

### OpenAI-Compatible Endpoint

```
POST /v1/chat/completions
```

Request format:
```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
  "max_tokens": 1024,
  "temperature": 0.7
}
```
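As a minimal sketch of calling this endpoint with the `requests` library, assuming the app is running locally on port 7860 and returns the standard OpenAI response schema (adjust the base URL for a deployed Space):

```python
import requests

# Assumed local endpoint; replace with your Space URL once deployed.
BASE_URL = "http://localhost:7860"

payload = {
    "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ],
    "max_tokens": 1024,
    "temperature": 0.7,
}

response = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, timeout=120)
response.raise_for_status()

# If the endpoint follows the OpenAI response schema, the reply text
# is found at choices[0].message.content.
print(response.json()["choices"][0]["message"]["content"])
```

Given the Streaming Responses feature, the endpoint may also accept the OpenAI-style `"stream": true` flag; check `utils/api_compat.py` to confirm.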
### Frontend Chat Endpoint

```
POST /chat
```

Request format:
```json
{
  "message": "Hello!",
  "history": [
    {"role": "user", "content": "Previous message"},
    {"role": "assistant", "content": "Previous response"}
  ]
}
```
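A matching sketch for the frontend endpoint, again assuming a local server on port 7860 (the response schema depends on the app's implementation, so the result is printed raw):

```python
import requests

# Assumed local endpoint; replace with your Space URL once deployed.
BASE_URL = "http://localhost:7860"

payload = {
    "message": "Hello!",
    "history": [
        {"role": "user", "content": "Previous message"},
        {"role": "assistant", "content": "Previous response"},
    ],
}

response = requests.post(f"{BASE_URL}/chat", json=payload, timeout=120)
response.raise_for_status()
print(response.json())  # Inspect the raw payload to see the app's response schema.
```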
## Customization

### Model Configuration

You can customize the model behavior by modifying the parameters in `utils/model_utils.py`:

- `DEFAULT_MAX_TOKENS`: Maximum number of tokens to generate
- `DEFAULT_TEMPERATURE`: Sampling temperature
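The exact contents of `utils/model_utils.py` are not reproduced here; the following is an illustrative sketch of how such defaults are typically defined. The constant names match those referenced above, but everything else (dtype and device handling, the `load_model` helper) is an assumption, not the app's actual code:

```python
# utils/model_utils.py (illustrative sketch; the actual file may differ)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen3-Coder-30B-A3B-Instruct"
DEFAULT_MAX_TOKENS = 1024   # Maximum tokens to generate per response
DEFAULT_TEMPERATURE = 0.7   # Sampling temperature (higher = more random)

def load_model():
    """Load the tokenizer and model, placing weights on GPU when available."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch.bfloat16 if device == "cuda" else torch.float32,
    ).to(device)
    return tokenizer, model
```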
### UI Customization

The UI can be customized by modifying:

- `public/styles.css`: CSS styles with TailwindCSS
- `public/app.js`: JavaScript logic
- `public/index.html`: HTML structure
## Troubleshooting

### Common Issues

1. **Model Loading Errors**:
   - Ensure you have sufficient RAM and GPU memory
   - Check that the model name is correct in `utils/model_utils.py`
2. **CUDA Out of Memory**:
   - Reduce `DEFAULT_MAX_TOKENS` in `utils/model_utils.py`
   - Use a smaller model variant if available
3. **Dependency Installation Failures**:
   - Check the HuggingFace Space logs for specific error messages
   - Ensure all dependencies are listed in `requirements.txt`
### Performance Optimization

1. **GPU Usage**:
   - The application automatically detects and uses CUDA if available (as in the model-loading sketch under Model Configuration)
   - In CPU-only environments, performance will be significantly slower
2. **Caching**:
   - Redis is used for caching if available, with in-memory storage as a fallback (see the sketch after this list)
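A minimal sketch of the Redis-with-fallback caching pattern described above, assuming the `redis` Python package; the function names and TTL are illustrative, not the app's actual code:

```python
import json

try:
    import redis
    _client = redis.Redis(host="localhost", port=6379, decode_responses=True)
    _client.ping()  # Raises if no Redis server is reachable
    _memory_cache = None
except Exception:
    _client = None
    _memory_cache = {}  # Plain dict fallback when Redis is unavailable

def cache_get(key):
    """Return the cached value for key, or None on a miss."""
    if _client is not None:
        value = _client.get(key)
        return json.loads(value) if value is not None else None
    return _memory_cache.get(key)

def cache_set(key, value, ttl_seconds=3600):
    """Store a JSON-serializable value under key."""
    if _client is not None:
        _client.setex(key, ttl_seconds, json.dumps(value))
    else:
        _memory_cache[key] = value  # Note: no expiry in the in-memory fallback
```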
## Contributing

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments

- The Qwen team for the Qwen/Qwen3-Coder-30B-A3B-Instruct model
- HuggingFace for providing the platform
- The Gradio team for the web interface framework