Update README.md
vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
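For example, a minimal sketch of OpenAI-compatible serving (the port and request payload are illustrative; see the vLLM docs for the full set of options):

```shell
# Start an OpenAI-compatible server (listens on port 8000 by default)
vllm serve Qwen/Qwen2.5-7B-Instruct-FP8-dynamic

# In another shell, query the /v1/chat/completions endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen2.5-7B-Instruct-FP8-dynamic",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```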

## Creation
<details>
<summary>Creation details</summary>

This model was created with [llm-compressor](https://github.com/vllm-project/llm-compressor) by running the code snippet below.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

# Load the original (unquantized) model
model_stub = "Qwen/Qwen2.5-7B-Instruct"
model_name = model_stub.split("/")[-1]

tokenizer = AutoTokenizer.from_pretrained(model_stub)

model = AutoModelForCausalLM.from_pretrained(
    model_stub,
    device_map="auto",
    torch_dtype="auto",
)

# Configure the quantization algorithm and scheme
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
)

# Apply quantization (FP8 dynamic quantization needs no calibration data)
oneshot(
    model=model,
    recipe=recipe,
)

# Save to disk in compressed-tensors format
save_path = model_name + "-FP8-dynamic"
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)
print(f"Model and tokenizer saved to: {save_path}")
```

</details>
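As a side note on the snippet's naming convention: `save_path` is derived from the final path component of the base-model stub plus the quantization-scheme suffix, which is why the stub must point at the unquantized model. A quick stdlib-only check:

```python
# Reproduce the snippet's save-path naming logic (no model download needed).
model_stub = "Qwen/Qwen2.5-7B-Instruct"   # base (unquantized) model stub
model_name = model_stub.split("/")[-1]    # last path component
save_path = model_name + "-FP8-dynamic"   # scheme suffix appended
print(save_path)                          # -> Qwen2.5-7B-Instruct-FP8-dynamic
```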

## Evaluation