# AI Chat Application with Qwen Coder
This is an AI chat application built for HuggingFace Spaces. It integrates the Qwen/Qwen3-Coder-30B-A3B-Instruct model and exposes OpenAI API-compatible endpoints.
## Features
- **Qwen3-Coder Integration**: Direct integration with the Qwen/Qwen3-Coder-30B-A3B-Instruct model
- **OpenAI API Compatibility**: Implements an OpenAI API-compatible chat completions endpoint
- **Streaming Responses**: Real-time response streaming for interactive chat experience
- **Conversation History**: Persistent conversation history management
- **Modern UI**: Responsive design inspired by Perplexity AI with TailwindCSS
- **Dark/Light Mode**: Support for both dark and light themes
- **Copy Responses**: One-click copying of AI responses
- **Typing Indicators**: Visual indicators for AI response generation
- **GPU Acceleration**: Automatically detects and uses CUDA when available
- **Error Handling**: Robust error handling with automatic connection recovery
- **Caching**: Redis caching when available, with in-memory storage as a fallback
## Project Structure
```
/
├── app.py                 # Main application entry point
├── requirements.txt       # Python dependencies
├── README.md              # This file
├── public/                # Frontend static files
│   ├── index.html         # Main HTML file
│   ├── styles.css         # TailwindCSS styles
│   └── app.js             # JavaScript logic
└── utils/                 # Utility modules
    ├── model_utils.py     # Model management utilities
    ├── conversation.py    # Conversation management
    └── api_compat.py      # OpenAI API compatibility
```
## Requirements
- Python 3.8+
- GPU with CUDA support (recommended)
- 32GB+ RAM (for optimal performance with Qwen3-Coder)
## Installation
1. Clone this repository:
```bash
git clone <repository-url>
cd <repository-name>
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Run the application:
```bash
python app.py
```
## Deployment to HuggingFace Spaces
1. Create a new Space on HuggingFace:
   - Go to https://huggingface.co/new-space
   - Choose "Gradio" as the Space SDK
   - Select GPU hardware (recommended for Qwen3-Coder)
2. Upload files to your Space repository:
   - Upload all files from this repository
   - Make sure to include the `requirements.txt` file
3. Configure the Space:
   - The Space will automatically detect and install dependencies from `requirements.txt`
   - The application will start automatically on port 7860
4. Access your deployed application:
   - Once the build is complete, your application will be available at the provided URL
## API Endpoints
### OpenAI API-Compatible Endpoint
```
POST /v1/chat/completions
```
Request format:
```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
  "max_tokens": 1024,
  "temperature": 0.7
}
```
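As a rough illustration of calling this endpoint from Python, the sketch below builds the documented request body and posts it with the standard library. The base URL `http://localhost:7860` and the helper names `build_payload` and `post_chat_completion` are assumptions made for this example; the response shape is not documented in this README, so the function simply returns the parsed JSON.

```python
import json
import urllib.request

# Assumed local address; replace with your Space URL when deployed.
BASE_URL = "http://localhost:7860"


def build_payload(user_message, system_prompt="You are a helpful assistant.",
                  max_tokens=1024, temperature=0.7):
    """Build a request body matching the documented request format."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def post_chat_completion(payload, base_url=BASE_URL, timeout=120):
    """POST the payload to /v1/chat/completions and return the parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `post_chat_completion(build_payload("Hello!"))` would send the request. If the response follows the usual OpenAI chat completions shape, the reply text would be found under `choices[0].message.content`, but that is an assumption to verify against the running application.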
### Frontend Chat Endpoint
```
POST /chat
```
Request format:
```json
{
  "message": "Hello!",
  "history": [
    {"role": "user", "content": "Previous message"},
    {"role": "assistant", "content": "Previous response"}
  ]
}
```
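A client that talks to `/chat` has to carry the conversation history itself. The following minimal sketch shows one way to do that; `build_chat_request` and `record_turn` are hypothetical helper names, and the response format of `/chat` is not documented here, so the reply text is passed in by the caller.

```python
def build_chat_request(message, history=None):
    """Return a /chat request body; the history list is copied, not mutated."""
    return {"message": message, "history": list(history or [])}


def record_turn(history, message, reply):
    """Return a new history with the latest user/assistant exchange appended."""
    return history + [
        {"role": "user", "content": message},
        {"role": "assistant", "content": reply},
    ]


# Rebuild the request body from the documented example.
history = record_turn([], "Previous message", "Previous response")
body = build_chat_request("Hello!", history)
```

Returning a new list from `record_turn` rather than mutating the argument keeps earlier request bodies unchanged if they are retried.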
## Customization
### Model Configuration
You can customize the model behavior by modifying the parameters in `utils/model_utils.py`:
- `DEFAULT_MAX_TOKENS`: Maximum tokens to generate
- `DEFAULT_TEMPERATURE`: Sampling temperature
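
One way such defaults might be applied to an incoming request is sketched below. The values 1024 and 0.7 are borrowed from the request example in this README rather than read from `utils/model_utils.py`, and `resolve_generation_params` is a hypothetical helper, not a function from the project.

```python
# Assumed values, taken from the request example in this README;
# the actual constants in utils/model_utils.py may differ.
DEFAULT_MAX_TOKENS = 1024
DEFAULT_TEMPERATURE = 0.7


def resolve_generation_params(request_body):
    """Fill in missing generation parameters with the defaults (hypothetical helper)."""
    max_tokens = request_body.get("max_tokens")
    temperature = request_body.get("temperature")
    return {
        "max_tokens": DEFAULT_MAX_TOKENS if max_tokens is None else int(max_tokens),
        "temperature": DEFAULT_TEMPERATURE if temperature is None else float(temperature),
    }
```

Checking for `None` explicitly, rather than using `or`, lets a caller request a temperature of 0 without it being replaced by the default.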
### UI Customization
The UI can be customized by modifying:
- `public/styles.css`: CSS styles with TailwindCSS
- `public/app.js`: JavaScript logic
- `public/index.html`: HTML structure
## Troubleshooting
### Common Issues
1. **Model Loading Errors**:
   - Ensure you have sufficient RAM and GPU memory
   - Check that the model name is correct in `utils/model_utils.py`
2. **CUDA Out of Memory**:
   - Reduce `DEFAULT_MAX_TOKENS` in `utils/model_utils.py`
   - Use a smaller model variant if available
3. **Dependency Installation Failures**:
   - Check the HuggingFace Space logs for specific error messages
   - Ensure all dependencies are listed in `requirements.txt`
### Performance Optimization
1. **GPU Usage**:
   - The application automatically detects and uses CUDA if available
   - For CPU-only environments, performance will be significantly slower
2. **Caching**:
   - Redis is used for caching if available
   - In-memory storage is used as fallback
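
The Redis-with-fallback behavior described above can be sketched roughly as follows. This is not the project's actual code: `InMemoryCache` and `make_cache` are hypothetical names, and the sketch assumes the optional `redis` (redis-py) package is the client in use.

```python
class InMemoryCache:
    """Dictionary-backed fallback used when Redis is not reachable."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def make_cache(redis_url=None):
    """Return a Redis client if one can be reached, else an InMemoryCache."""
    if redis_url:
        try:
            import redis  # optional dependency (redis-py)

            client = redis.Redis.from_url(
                redis_url, decode_responses=True, socket_connect_timeout=1
            )
            client.ping()  # raises if the server is unreachable
            return client
        except Exception:
            pass  # missing package or unreachable server: fall through
    return InMemoryCache()
```

Both return values expose `get(key)` and `set(key, value)`, so calling code does not need to know which backend is active. Note that the in-memory fallback is lost on restart and is not shared between processes.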
## Contributing
1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Qwen team for the Qwen/Qwen3-Coder-30B-A3B-Instruct model
- HuggingFace for providing the platform
- Gradio team for the web interface framework