# AI Chat Application with Qwen3-Coder

This is an AI chat application built for HuggingFace Spaces. It serves the Qwen/Qwen3-Coder-30B-A3B-Instruct model and exposes an OpenAI-compatible API alongside a web chat interface.

## Features

- **Qwen3-Coder Integration**: Direct integration with the Qwen/Qwen3-Coder-30B-A3B-Instruct model
- **OpenAI API Compatibility**: Implements an OpenAI-compatible chat completions endpoint (`POST /v1/chat/completions`) so existing clients can connect with minimal changes
- **Streaming Responses**: Real-time response streaming for interactive chat experience
- **Conversation History**: Persistent conversation history management
- **Modern UI**: Responsive design inspired by Perplexity AI with TailwindCSS
- **Dark/Light Mode**: Support for both dark and light themes
- **Copy Responses**: One-click copying of AI responses
- **Typing Indicators**: Visual indicators for AI response generation
- **GPU Optimization**: Automatically detects and uses CUDA when a GPU is available
- **Error Handling**: Robust error handling with automatic connection recovery
- **Caching**: Redis-backed caching when Redis is available, with in-memory storage as a fallback

## Project Structure

```
/
├── app.py                 # Main application entry point
├── requirements.txt       # Python dependencies
├── README.md             # This file
├── public/               # Frontend static files
│   ├── index.html        # Main HTML file
│   ├── styles.css        # TailwindCSS styles
│   └── app.js            # JavaScript logic
└── utils/                # Utility modules
    ├── model_utils.py    # Model management utilities
    ├── conversation.py   # Conversation management
    └── api_compat.py     # OpenAI API compatibility
```

## Requirements

- Python 3.8+
- GPU with CUDA support (recommended)
- 32 GB+ RAM (for optimal performance with Qwen3-Coder)

## Installation

1. Clone this repository:
   ```bash
   git clone <repository-url>
   cd <repository-name>
   ```

2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

3. Run the application:
   ```bash
   python app.py
   ```

## Deployment to HuggingFace Spaces

1. Create a new Space on HuggingFace:
   - Go to https://huggingface.co/new-space
   - Choose "Gradio" as the Space SDK
   - Select GPU hardware (recommended for Qwen3-Coder)

2. Upload files to your Space repository:
   - Upload all files from this repository
   - Make sure to include the `requirements.txt` file

3. Configure the Space:
   - The Space will automatically detect and install dependencies from `requirements.txt`
   - The application will start automatically on port 7860

4. Access your deployed application:
   - Once the build is complete, your application will be available at the provided URL

## API Endpoints

### OpenAI API-Compatible Endpoint
```
POST /v1/chat/completions
```

Request format:
```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
  "max_tokens": 1024,
  "temperature": 0.7
}
```
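
As an illustration of calling this endpoint, here is a minimal client sketch that uses only the Python standard library. The base URL is an assumption (a local run on port 7860; substitute your Space URL), and the helper names `build_chat_request` and `send` are hypothetical. The response shape is not documented here, so the sketch returns the decoded JSON as-is.

```python
import json
import urllib.request

# Assumption: the app is running locally on port 7860.
# Replace with the URL of your deployed Space.
BASE_URL = "http://localhost:7860"


def build_chat_request(messages, model="Qwen/Qwen3-Coder-30B-A3B-Instruct",
                       max_tokens=1024, temperature=0.7, base_url=BASE_URL):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "messages": messages,
        "model": model,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a running server):
#   request = build_chat_request([{"role": "user", "content": "Hello!"}])
#   print(send(request))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.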

### Frontend Chat Endpoint
```
POST /chat
```

Request format:
```json
{
  "message": "Hello!",
  "history": [
    {"role": "user", "content": "Previous message"},
    {"role": "assistant", "content": "Previous response"}
  ]
}
```
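
Because the request carries the full `history`, a client presumably resends earlier turns with each new message. The sketch below shows one way to maintain that list. The helper names are hypothetical, and the assistant reply is a stand-in string rather than real model output.

```python
import json


def build_chat_payload(message, history=None):
    """Return the JSON body for POST /chat in the format shown above."""
    return {"message": message, "history": list(history or [])}


def extend_history(history, user_message, assistant_reply):
    """Return a new history list with the latest exchange appended."""
    return list(history) + [
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": assistant_reply},
    ]


history = []
first_payload = build_chat_payload("Hello!", history)

# Once the server has answered, record the turn (stand-in reply text).
history = extend_history(history, "Hello!", "Hi, how can I help?")
next_payload = build_chat_payload("Write a quicksort.", history)
print(json.dumps(next_payload, indent=2))
```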

## Customization

### Model Configuration
You can customize the model behavior by modifying the parameters in `utils/model_utils.py`:
- `DEFAULT_MAX_TOKENS`: Maximum tokens to generate
- `DEFAULT_TEMPERATURE`: Sampling temperature
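
To illustrate how such defaults typically interact with per-request parameters, here is a small sketch. The values `1024` and `0.7` are assumptions borrowed from the request example in this README, not the actual contents of `utils/model_utils.py`, and `resolve_generation_params` is a hypothetical helper.

```python
# Hypothetical values; the real defaults live in utils/model_utils.py.
DEFAULT_MAX_TOKENS = 1024
DEFAULT_TEMPERATURE = 0.7


def resolve_generation_params(request_body):
    """Merge per-request overrides with the module-level defaults."""
    max_tokens = request_body.get("max_tokens")
    temperature = request_body.get("temperature")
    return {
        # Compare against None so an explicit temperature of 0 is respected.
        "max_tokens": DEFAULT_MAX_TOKENS if max_tokens is None else int(max_tokens),
        "temperature": DEFAULT_TEMPERATURE if temperature is None else float(temperature),
    }


print(resolve_generation_params({"temperature": 0}))
```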

### UI Customization
The UI can be customized by modifying:
- `public/styles.css`: CSS styles with TailwindCSS
- `public/app.js`: JavaScript logic
- `public/index.html`: HTML structure

## Troubleshooting

### Common Issues

1. **Model Loading Errors**:
   - Ensure you have sufficient RAM and GPU memory
   - Check that the model name is correct in `utils/model_utils.py`

2. **CUDA Out of Memory**:
   - Reduce `DEFAULT_MAX_TOKENS` in `utils/model_utils.py`
   - Use a smaller model variant if available

3. **Dependency Installation Failures**:
   - Check the HuggingFace Space logs for specific error messages
   - Ensure all dependencies are listed in `requirements.txt`
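
For the memory-related issues above, a quick diagnostic can show which device is available and how much GPU memory is free before the model loads. This is a sketch that assumes PyTorch is the backend (the README does not list the dependencies); `describe_device` is a hypothetical helper.

```python
try:
    import torch  # optional here so the script also runs without PyTorch
except ImportError:
    torch = None


def describe_device():
    """Report the available device and free/total GPU memory in GiB."""
    if torch is None or not torch.cuda.is_available():
        return {"device": "cpu", "free_gib": None, "total_gib": None}
    # mem_get_info() returns (free_bytes, total_bytes) for the current device.
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    gib = 1024 ** 3
    return {
        "device": "cuda",
        "free_gib": round(free_bytes / gib, 2),
        "total_gib": round(total_bytes / gib, 2),
    }


print(describe_device())
```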

### Performance Optimization

1. **GPU Usage**:
   - The application automatically detects and uses CUDA if available
   - For CPU-only environments, performance will be significantly slower

2. **Caching**:
   - Redis is used for caching if available
   - In-memory storage is used as fallback
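
The Redis-with-fallback behavior described above could look roughly like the following. This is a sketch, not the application's actual code: the `FallbackCache` class, its TTL handling, and the use of the `redis` Python client are assumptions.

```python
import time

try:
    import redis  # optional dependency; assumption: the redis-py client
except ImportError:
    redis = None


class FallbackCache:
    """Use Redis when reachable, otherwise a process-local dict with TTLs."""

    def __init__(self, url=None):
        self._client = None
        self._memory = {}  # key -> (expires_at, value)
        if url and redis is not None:
            try:
                client = redis.Redis.from_url(url, decode_responses=True)
                client.ping()  # raises if the server is unreachable
                self._client = client
            except redis.RedisError:
                self._client = None  # fall back to in-memory storage

    def set(self, key, value, ttl=300):
        if self._client is not None:
            self._client.set(key, value, ex=ttl)
        else:
            self._memory[key] = (time.monotonic() + ttl, value)

    def get(self, key):
        if self._client is not None:
            return self._client.get(key)
        entry = self._memory.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._memory[key]  # drop expired entries lazily
            return None
        return value


cache = FallbackCache()  # no URL given, so this uses in-memory storage
cache.set("greeting", "Hello!", ttl=60)
print(cache.get("greeting"))
```

Passing a URL such as `redis://localhost:6379/0` would switch to Redis when the server responds to `ping()`. Note that the in-memory fallback is per-process and is lost on restart.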

## Contributing

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Qwen team for the Qwen/Qwen3-Coder-30B-A3B-Instruct model
- HuggingFace for providing the platform
- Gradio team for the web interface framework