# AI Chat Application with Qwen Coder

This is an AI chat application built for Hugging Face Spaces. It serves the Qwen/Qwen3-Coder-30B-A3B-Instruct model and exposes an OpenAI-compatible API alongside a browser chat interface.

## Features

- **Qwen3-Coder Integration**: Direct integration with the Qwen/Qwen3-Coder-30B-A3B-Instruct model
- **OpenAI API Compatibility**: Implements an OpenAI-compatible `POST /v1/chat/completions` endpoint so existing clients can connect with minimal changes
- **Streaming Responses**: Real-time response streaming for interactive chat experience
- **Conversation History**: Persistent conversation history management
- **Modern UI**: Responsive design inspired by Perplexity AI with TailwindCSS
- **Dark/Light Mode**: Support for both dark and light themes
- **Copy Responses**: One-click copying of AI responses
- **Typing Indicators**: Visual indicators for AI response generation
- **GPU Acceleration**: Detects and uses CUDA automatically when a GPU is available
- **Error Handling**: Robust error handling with automatic connection recovery
- **Caching**: Redis-backed caching when Redis is available, with in-memory storage as a fallback

## Project Structure

```
/
├── app.py                 # Main application entry point
├── requirements.txt       # Python dependencies
├── README.md              # This file
├── public/                # Frontend static files
│   ├── index.html         # Main HTML file
│   ├── styles.css         # TailwindCSS styles
│   └── app.js             # JavaScript logic
└── utils/                 # Utility modules
    ├── model_utils.py     # Model management utilities
    ├── conversation.py    # Conversation management
    └── api_compat.py      # OpenAI API compatibility
```

## Requirements

- Python 3.8+
- GPU with CUDA support (recommended)
- 32 GB+ RAM (for optimal performance with Qwen3-Coder)

## Installation

1. Clone this repository:
   ```bash
   git clone <repository-url>
   cd <repository-name>
   ```

2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

3. Run the application:
   ```bash
   python app.py
   ```

## Deployment to Hugging Face Spaces

1. Create a new Space on Hugging Face:
   - Go to https://huggingface.co/new-space
   - Choose "Gradio" as the Space SDK
   - Select GPU hardware (recommended for Qwen3-Coder)

2. Upload files to your Space repository:
   - Upload all files from this repository
   - Make sure to include the `requirements.txt` file

3. Configure the Space:
   - The Space will automatically detect and install dependencies from `requirements.txt`
   - The application will start automatically on port 7860

4. Access your deployed application:
   - Once the build is complete, your application will be available at the provided URL

## API Endpoints

### OpenAI-Compatible Endpoint
```
POST /v1/chat/completions
```

Request format:
```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
  "max_tokens": 1024,
  "temperature": 0.7
}
```
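The request above can be sent from any HTTP client. The following is a minimal sketch using only the Python standard library. The base URL `http://localhost:7860` assumes a local run on the default port, and the helper names `build_chat_completion_request` and `send_request` are hypothetical, not part of this repository.

```python
import json
import urllib.request

MODEL_NAME = "Qwen/Qwen3-Coder-30B-A3B-Instruct"


def build_chat_completion_request(base_url, messages, model=MODEL_NAME,
                                  max_tokens=1024, temperature=0.7):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "messages": messages,
        "model": model,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_request(request, timeout=120):
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Assumes the application is already running locally on port 7860.
    req = build_chat_completion_request(
        "http://localhost:7860",
        [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
        ],
    )
    print(send_request(req))
```

If the endpoint follows the OpenAI response convention, the reply text would normally be found under `choices[0].message.content`; verify this against `utils/api_compat.py` before relying on it.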

### Frontend Chat Endpoint
```
POST /chat
```

Request format:
```json
{
  "message": "Hello!",
  "history": [
    {"role": "user", "content": "Previous message"},
    {"role": "assistant", "content": "Previous response"}
  ]
}
```
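Because `/chat` receives the prior turns in `history`, the client is responsible for carrying the conversation forward between requests. A minimal sketch of that bookkeeping follows; the function names `build_chat_payload` and `record_turn` are hypothetical, and the response format of `/chat` is not documented here, so the assistant reply is passed in as a plain string.

```python
from typing import Dict, List

Message = Dict[str, str]


def build_chat_payload(message: str, history: List[Message]) -> dict:
    """Return the JSON body for POST /chat without mutating the caller's history."""
    return {"message": message, "history": list(history)}


def record_turn(history: List[Message], user_message: str,
                assistant_reply: str) -> List[Message]:
    """Return a new history list with the completed user/assistant turn appended."""
    return history + [
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": assistant_reply},
    ]


if __name__ == "__main__":
    history: List[Message] = []
    payload = build_chat_payload("Hello!", history)
    # After the server answers, store the turn so the next request includes it.
    history = record_turn(history, "Hello!", "Hi, how can I help?")
    print(payload)
    print(history)
```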

## Customization

### Model Configuration
You can customize the model behavior by modifying the parameters in `utils/model_utils.py`:
- `DEFAULT_MAX_TOKENS`: Maximum tokens to generate
- `DEFAULT_TEMPERATURE`: Sampling temperature
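
As an illustration of how such defaults typically interact with per-request values, here is a small sketch. The constant values below are assumptions borrowed from the request example in this README, and `resolve_generation_params` is a hypothetical helper; check `utils/model_utils.py` for the actual values and logic.

```python
from typing import Any, Dict, Mapping

# Assumed values for illustration only; the real defaults live in utils/model_utils.py.
DEFAULT_MAX_TOKENS = 1024
DEFAULT_TEMPERATURE = 0.7


def resolve_generation_params(request: Mapping[str, Any]) -> Dict[str, Any]:
    """Use per-request values when present, otherwise fall back to the defaults."""
    max_tokens = request.get("max_tokens")
    temperature = request.get("temperature")
    return {
        "max_tokens": DEFAULT_MAX_TOKENS if max_tokens is None else int(max_tokens),
        "temperature": DEFAULT_TEMPERATURE if temperature is None else float(temperature),
    }


if __name__ == "__main__":
    print(resolve_generation_params({}))
    print(resolve_generation_params({"max_tokens": 256, "temperature": 0}))
```

Checking for `None` rather than truthiness keeps an explicit `temperature` of `0` from being replaced by the default.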

### UI Customization
The UI can be customized by modifying:
- `public/styles.css`: CSS styles with TailwindCSS
- `public/app.js`: JavaScript logic
- `public/index.html`: HTML structure

## Troubleshooting

### Common Issues

1. **Model Loading Errors**:
   - Ensure you have sufficient RAM and GPU memory
   - Check that the model name is correct in `utils/model_utils.py`

2. **CUDA Out of Memory**:
   - Reduce `DEFAULT_MAX_TOKENS` in `utils/model_utils.py`
   - Use a smaller model variant if available

3. **Dependency Installation Failures**:
   - Check the Hugging Face Space logs for specific error messages
   - Ensure all dependencies are listed in `requirements.txt`

### Performance Optimization

1. **GPU Usage**:
   - The application automatically detects and uses CUDA if available
   - For CPU-only environments, performance will be significantly slower

2. **Caching**:
   - Redis is used for caching if available
   - In-memory storage is used as fallback
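
The Redis-with-fallback behavior described above can be sketched as follows. This is not the application's actual implementation: the class names, `make_cache`, and the default Redis URL are assumptions for illustration. It uses the `redis` package only if it is installed and a server answers a ping.

```python
import time
from typing import Optional


class InMemoryCache:
    """Dictionary-backed fallback cache with optional per-key expiry."""

    def __init__(self) -> None:
        self._store = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._store[key]  # drop expired entries lazily on read
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: Optional[int] = None) -> None:
        expires_at = None if ttl_seconds is None else time.monotonic() + ttl_seconds
        self._store[key] = (value, expires_at)


class RedisCache:
    """Thin wrapper exposing the same get/set interface on a redis-py client."""

    def __init__(self, client) -> None:
        self._client = client

    def get(self, key: str) -> Optional[str]:
        return self._client.get(key)

    def set(self, key: str, value: str, ttl_seconds: Optional[int] = None) -> None:
        self._client.set(key, value, ex=ttl_seconds)


def make_cache(redis_url: str = "redis://localhost:6379/0"):
    """Return a RedisCache if Redis is reachable, otherwise an InMemoryCache."""
    try:
        import redis  # optional dependency
    except ImportError:
        return InMemoryCache()
    try:
        client = redis.Redis.from_url(
            redis_url, decode_responses=True, socket_connect_timeout=1
        )
        client.ping()  # fails fast when no server is listening
    except (redis.RedisError, OSError, ValueError):
        return InMemoryCache()
    return RedisCache(client)


if __name__ == "__main__":
    cache = make_cache()
    cache.set("greeting", "hello", ttl_seconds=60)
    print(type(cache).__name__, cache.get("greeting"))
```

Note that the in-memory fallback is per-process and is lost on restart, so cached entries will not survive a Space rebuild.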

## Contributing

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Qwen team for the Qwen/Qwen3-Coder-30B-A3B-Instruct model
- Hugging Face for providing the platform
- Gradio team for the web interface framework