qwen2.5-7b-instruct-onnx-qnn
qwen2.5-7b-instruct-onnx-qnn is an ONNX QNN int4 quantized version of Qwen2.5-7B-Instruct, providing a fast inference implementation optimized for AI PCs with Qualcomm NPUs.
It is part of the latest release series from Qwen.
Model Description
- Developed by: Qwen
- Quantized by: llmware
- Model type: qwen2.5
- Parameters: 7 billion
- Model Parent: Qwen/Qwen2.5-7B-Instruct
- Language(s) (NLP): English
- License: Apache 2.0
- Uses: Chat, general-purpose LLM
- Quantization: int4
- Backend: qairt 2.36 (Qualcomm AI Runtime), ort 1.22.2 (ONNX Runtime), ortg 0.9 (onnxruntime-genai)
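A model with this backend stack can typically be run locally with the onnxruntime-genai ("ortg") Python API. The sketch below is a minimal, hedged example, not an official recipe from this repo: the model directory path is an assumption (point it at the folder downloaded from the Hugging Face repository), and the chat template follows the standard Qwen2.5 format.

```python
# Minimal sketch of local inference with onnxruntime-genai
# (pip install onnxruntime-genai). Assumes the model files have been
# downloaded to a local folder; the path below is illustrative only.

def build_prompt(user_message: str) -> str:
    """Wrap a user message in the standard Qwen2.5 chat template."""
    return (
        "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

def generate(model_dir: str, user_message: str, max_length: int = 256) -> str:
    import onnxruntime_genai as og

    model = og.Model(model_dir)          # loads genai_config.json etc. from the folder
    tokenizer = og.Tokenizer(model)
    params = og.GeneratorParams(model)
    params.set_search_options(max_length=max_length)

    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode(build_prompt(user_message)))
    while not generator.is_done():       # token-by-token decode loop
        generator.generate_next_token()
    return tokenizer.decode(generator.get_sequence(0))

if __name__ == "__main__":
    # Hypothetical local path to the downloaded model folder.
    print(generate("./qwen2.5-7b-instruct-onnx-qnn", "What is an NPU?"))
```

On a supported AI PC, the QNN execution provider configured in the model's genai config dispatches the quantized layers to the Qualcomm NPU; no provider selection is needed in user code.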