Update README.md
README.md
CHANGED
@@ -15,13 +15,13 @@ tags:
 - phi-4-mini
 ---
 
-##
+## Phi-4 Multimodal Instruct ONNX models
 
 ### Introduction
 
-ONNX version of
+This is an ONNX version of the Phi-4 multimodal model to accelerate inference with ONNX Runtime.
 
-This
+This model is quantized to int4 precision and runs on GPU devices.
 
 To run this model with ONNX Runtime:
 
@@ -40,13 +40,12 @@ curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/refs/heads/ma
 Run the script
 
 ```bash
-python phi4-mm.py -m Phi-4-multimodal-instruct-onnx/
+python phi4-mm.py -m Phi-4-multimodal-instruct-onnx/gpu/gpu-int4-rtn-block-32 -e cuda
 ```
 
-You will be prompted
+You will be prompted to provide any images, audios, and a prompt.
 
-
-The performance of the text component is similar the [phi4 mini ONNX models] (https://huggingface.co/microsoft/Phi-4-mini-instruct-onnx/blob/main/README.md)
+The performance of the text component is similar to the [Phi-4 mini ONNX models] (https://huggingface.co/microsoft/Phi-4-mini-instruct-onnx/blob/main/README.md)
 
 ### Model Description
 
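
Note on the new model path: the added README text says the model is quantized to int4, and the directory name `gpu-int4-rtn-block-32` appears to indicate round-to-nearest (RTN) quantization with a block size of 32. That reading of the directory name is an assumption; the commit does not spell it out. The sketch below illustrates the general idea with a simple symmetric scheme in plain Python. The function names are hypothetical, and this is not the code ONNX Runtime uses, which may store values differently (for example, with a zero point).

```python
def quantize_block_rtn(block):
    """Quantize one block of floats to signed 4-bit integers sharing one scale.

    Illustrative symmetric scheme (an assumption, not ONNX Runtime's format):
    the largest magnitude in the block maps to 7, and values are rounded to
    the nearest integer in [-8, 7].
    """
    max_abs = max(abs(x) for x in block)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    # Python's round() rounds exact ties to the nearest even integer.
    quantized = [max(-8, min(7, round(x / scale))) for x in block]
    return quantized, scale


def quantize_rtn(weights, block_size=32):
    """Split a flat list of weights into blocks and quantize each one."""
    return [
        quantize_block_rtn(weights[start:start + block_size])
        for start in range(0, len(weights), block_size)
    ]


def dequantize(blocks):
    """Rebuild approximate float weights from (quantized, scale) pairs."""
    restored = []
    for quantized, scale in blocks:
        restored.extend(q * scale for q in quantized)
    return restored


if __name__ == "__main__":
    weights = [(i % 17 - 8) * 0.125 for i in range(80)]
    blocks = quantize_rtn(weights, block_size=32)
    restored = dequantize(blocks)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{len(blocks)} blocks, worst absolute error {worst:.4f}")
```

Because each block carries its own scale, an outlier weight only degrades the precision of the 32 values in its block rather than the whole tensor, which is the usual motivation for block-wise schemes.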