Update README.md
README.md CHANGED
@@ -94,7 +94,7 @@ from llmcompressor import oneshot
 from llmcompressor.modeling import replace_modules_for_calibration
 from llmcompressor.modifiers.quantization import QuantizationModifier

-MODEL_ID = "
+MODEL_ID = "nm-testing/Llama-3.3-70B-Instruct-FP8-block"

 # Load model.
 model = LlamaForCausalLM.from_pretrained(MODEL_ID, dtype="auto")
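For context, these imports come from llmcompressor's data-free FP8 flow. A minimal sketch of how such a block-quantization recipe is typically assembled; the `FP8_BLOCK` scheme string, the `oneshot` call, and the save path are assumptions based on llmcompressor's published examples, not part of this diff:

```python
from transformers import AutoTokenizer, LlamaForCausalLM

from llmcompressor import oneshot
from llmcompressor.modeling import replace_modules_for_calibration
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "nm-testing/Llama-3.3-70B-Instruct-FP8-block"

# Load model and tokenizer.
model = LlamaForCausalLM.from_pretrained(MODEL_ID, dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Swap in calibration-friendly module definitions where needed
# (effectively a no-op for plain dense Llama layers).
model = replace_modules_for_calibration(model)

# Quantize every Linear layer to FP8 with block-wise scales,
# keeping the lm_head in higher precision (scheme name assumed).
recipe = QuantizationModifier(targets="Linear", scheme="FP8_BLOCK", ignore=["lm_head"])

# Block FP8 is data-free, so no calibration dataset is passed.
oneshot(model=model, recipe=recipe)

# Save the compressed checkpoint (output path hypothetical).
SAVE_DIR = "Llama-3.3-70B-Instruct-FP8-block"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```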
@@ -135,7 +135,7 @@ The model was evaluated on the OpenLLMv1 leaderboard task, using [lm-evaluation-
 ```
 lm_eval \
   --model vllm \
-  --model_args pretrained
+  --model_args pretrained="nm-testing/Llama-3.3-70B-Instruct-FP8-block",dtype=auto,add_bos_token=True,max_model_len=16384,tensor_parallel_size=4,gpu_memory_utilization=0.9,enable_chunked_prefill=True,trust_remote_code=True \
   --tasks openllm \
   --write_out \
   --batch_size auto \
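The same OpenLLMv1 run can also be launched from Python rather than the CLI; a rough equivalent using lm-evaluation-harness's `simple_evaluate`, with the argument string mirroring the command above (a sketch, not taken from the README):

```python
import lm_eval

# Programmatic counterpart of the `lm_eval` CLI invocation above.
results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=nm-testing/Llama-3.3-70B-Instruct-FP8-block,"
        "dtype=auto,add_bos_token=True,max_model_len=16384,"
        "tensor_parallel_size=4,gpu_memory_utilization=0.9,"
        "enable_chunked_prefill=True,trust_remote_code=True"
    ),
    tasks=["openllm"],
    batch_size="auto",
)
print(results["results"])
```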
@@ -148,7 +148,7 @@ The model was evaluated on the OpenLLMv1 leaderboard task, using [lm-evaluation-
 ```
 lm_eval \
   --model vllm \
-  --model_args pretrained
+  --model_args pretrained="nm-testing/Llama-3.3-70B-Instruct-FP8-block",dtype=auto,add_bos_token=False,max_model_len=16384,tensor_parallel_size=4,gpu_memory_utilization=0.7,disable_log_stats=True,enable_chunked_prefill=True,trust_remote_code=True \
   --tasks leaderboard \
   --apply_chat_template \
   --fewshot_as_multiturn \
@@ -162,12 +162,12 @@ The model was evaluated on the OpenLLMv1 leaderboard task, using [lm-evaluation-
 **Coding Benchmarks**

 ```
-evalplus.evaluate --model
+evalplus.evaluate --model "nm-testing/Llama-3.3-70B-Instruct-FP8-block" \
   --dataset "humaneval" \
   --backend vllm \
   --tp 4 \
   --greedy
-evalplus.evaluate --model
+evalplus.evaluate --model "nm-testing/Llama-3.3-70B-Instruct-FP8-block" \
   --dataset "mbpp" \
   --backend vllm \
   --tp 4 \
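Since the checkpoint presumably ships compressed-tensors quantization metadata, vLLM can serve it directly with no extra flags; a minimal sketch (the prompt and sampling parameters are illustrative):

```python
from vllm import LLM, SamplingParams

# vLLM reads the FP8-block quantization config from the model repo;
# tensor_parallel_size=4 matches the evaluation setup above.
llm = LLM(model="nm-testing/Llama-3.3-70B-Instruct-FP8-block", tensor_parallel_size=4)

sampling = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Give a one-sentence summary of FP8 block quantization."], sampling)
print(outputs[0].outputs[0].text)
```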