Upload complete model
README.md CHANGED
@@ -9,7 +9,7 @@ pipeline_tag: text-generation
 ### CURRENTLY UPLOADING
 ### CURRENTLY UPLOADING
 
-**See DeepSeek-V3.2-Speciale 5.5bit MLX in action - [demonstration video
+**See DeepSeek-V3.2-Speciale 5.5bit MLX in action - [demonstration video](https://youtu.be/b6RgBIROK5o)**
 
 *q5.5bit quant typically achieves 1.141 perplexity in our testing*
 | Quantization | Perplexity |
@@ -24,12 +24,12 @@ pipeline_tag: text-generation
 ## Usage Notes
 
 * Tested remotely over the network via an M3 Ultra with 512GB RAM using [Inferencer app v1.7.3](https://inferencer.com)
-* Memory usage: ~
-* For a context window
+* Memory usage: ~450 GB
+* For a larger context window you can expand the VRAM limit:
 * sudo sysctl iogpu.wired_limit_mb=507000
 * Expect ~16.5 tokens/s @ 1000 tokens
 * Quantized with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.28
-* For more details see [demonstration video - coming soon](https://
+* For more details see the [demonstration video](https://youtu.be/b6RgBIROK5o) or visit [DeepSeek-V3.2-Speciale](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale).
 
 ## Disclaimer
 
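The usage notes raise the macOS GPU wired-memory limit so the ~450 GB of weights plus KV cache can stay resident in VRAM. A minimal sketch of that workflow, assuming an Apple Silicon Mac with enough unified memory; `iogpu.wired_limit_mb` is a stock macOS sysctl, the value is in megabytes, and the setting does not persist across reboots:

```bash
# Inspect the current GPU wired-memory limit.
# 0 means the macOS default, roughly two thirds to three quarters
# of unified memory depending on the machine.
sysctl iogpu.wired_limit_mb

# Raise the limit to ~507 GB so the ~450 GB model plus context fits,
# leaving some headroom below the 512 GB of physical RAM for the OS.
sudo sysctl iogpu.wired_limit_mb=507000

# The setting reverts on reboot; to restore the default immediately:
sudo sysctl iogpu.wired_limit_mb=0
```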
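The ~16.5 tokens/s figure was measured through the Inferencer app. As a rough cross-check, the stock [mlx-lm](https://github.com/ml-explore/mlx-lm) CLI also reports throughput after each run. This is only a sketch, assuming your mlx-lm version supports the architecture; the repo id below is a placeholder, not a confirmed upload path:

```bash
# Install Apple's MLX language-model tooling (Apple Silicon only).
pip install mlx-lm

# Generate from the quant; the model path is a placeholder -- substitute
# the actual Hugging Face repo id of this 5.5bit upload.
mlx_lm.generate \
  --model <user>/DeepSeek-V3.2-Speciale-5.5bit-MLX \
  --prompt "Summarize the benefits of 5.5-bit quantization." \
  --max-tokens 256
# On completion the CLI prints prompt and generation speed in tokens-per-sec.
```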
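The quant itself was produced with a modified MLX 0.28, so it cannot be reproduced with stock tooling. For orientation only, the vanilla `mlx_lm.convert` flow looks like the sketch below; it supports fixed integer bit-widths such as 4 or 6, not the mixed 5.5-bit recipe used here:

```bash
# Illustrative stock quantization flow. The author's 5.5bit mixed-precision
# quant required a modified MLX; these flags produce a uniform 6-bit quant.
mlx_lm.convert \
  --hf-path deepseek-ai/DeepSeek-V3.2-Speciale \
  --mlx-path ./DeepSeek-V3.2-Speciale-6bit-MLX \
  -q --q-bits 6 --q-group-size 64
```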