danielhanchen committed · verified · Commit d015cc0 · 1 Parent(s): 6638dec

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +594 -0
README.md ADDED
@@ -0,0 +1,594 @@
1
+ ---
2
+ library_name: vllm
3
+ language:
4
+ - en
5
+ - fr
6
+ - es
7
+ - de
8
+ - it
9
+ - pt
10
+ - nl
11
+ - zh
12
+ - ja
13
+ - ko
14
+ - ar
15
+ license: apache-2.0
16
+ inference: false
17
+ base_model:
18
+ - mistralai/Ministral-3-3B-Reasoning-2512
19
+ extra_gated_description: >-
20
+ If you want to learn more about how we process your personal data, please read
21
+ our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
22
+ tags:
23
+ - mistral-common
24
+ - unsloth
25
+ ---
26
+ <div>
27
+ <p style="margin-top: 0;margin-bottom: 0;">
28
+ <em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
29
+ </p>
30
+ <div style="display: flex; gap: 5px; align-items: center; ">
31
+ <a href="https://github.com/unslothai/unsloth/">
32
+ <img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
33
+ </a>
34
+ <a href="https://discord.gg/unsloth">
35
+ <img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
36
+ </a>
37
+ <a href="https://docs.unsloth.ai/">
38
+ <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
39
+ </a>
40
+ </div>
41
+ </div>
42
+
43
+
44
+ # Ministral 3 3B Reasoning 2512
45
+ The smallest model in the Ministral 3 family, **Ministral 3 3B** is a powerful, efficient tiny language model with vision capabilities.
46
+
47
+ This model is the reasoning post-trained version, making it ideal for math, coding, and other STEM-related use cases.
48
+
49
+ The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 3B can even be deployed locally, fitting in 16GB of VRAM in BF16, and less than 8GB of RAM/VRAM when quantized.
50
+
51
+ ## Key Features
52
+ Ministral 3 3B consists of two main architectural components:
53
+ - **3.4B Language Model**
54
+ - **0.4B Vision Encoder**
55
+
56
+ The Ministral 3 3B Reasoning model offers the following capabilities:
57
+ - **Vision**: Enables the model to analyze images and provide insights based on visual content, in addition to text.
58
+ - **Multilingual**: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic.
59
+ - **System Prompt**: Maintains strong adherence and support for system prompts.
60
+ - **Agentic**: Offers best-in-class agentic capabilities with native function calling and JSON output.
61
+ - **Reasoning**: Excels at complex, multi-step reasoning and dynamic problem-solving.
62
+ - **Edge-Optimized**: Delivers best-in-class performance at a small scale, deployable anywhere.
63
+ - **Apache 2.0 License**: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
64
+ - **Large Context Window**: Supports a 256k context window.
65
+
66
+ ### Use Cases
67
+ Ideal for lightweight, real-time applications on edge or low-resource devices, such as:
68
+ - Image captioning
69
+ - Text classification
70
+ - Real-time efficient translation
71
+ - Data extraction
72
+ - Short content generation
73
+ - Fine-tuning and specialization
74
+ - And more...
75
+
76
+ It brings advanced AI capabilities to edge and distributed environments, including embedded systems.
77
+
78
+ ## Ministral 3 Family
79
+
80
+ | Model Name | Type | Precision | Link |
81
+ |--------------------------------|--------------------|-----------|------------------------------------------------------------------------------------------|
82
+ | Ministral 3 3B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512) |
83
+ | Ministral 3 3B Instruct 2512 | Instruct post-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512) |
84
+ | **Ministral 3 3B Reasoning 2512** | **Reasoning capable** | **BF16** | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512) |
85
+ | Ministral 3 8B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512) |
86
+ | Ministral 3 8B Instruct 2512 | Instruct post-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) |
87
+ | Ministral 3 8B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512) |
88
+ | Ministral 3 14B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Base-2512) |
89
+ | Ministral 3 14B Instruct 2512 | Instruct post-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512) |
90
+ | Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512) |
91
+
92
+ Other formats available [here](https://huggingface.co/collections/mistralai/ministral-3-quants).
93
+
94
+ ## Benchmark Results
95
+
96
+ We compare Ministral 3 to similarly sized models.
97
+
98
+ ### Reasoning
99
+
100
+ | Model | AIME25 | AIME24 | GPQA Diamond | LiveCodeBench |
101
+ |---------------------------|-------------|-------------|--------------|---------------|
102
+ | **Ministral 3 14B** | <u>0.850</u>| <u>0.898</u>| <u>0.712</u> | <u>0.646</u> |
103
+ | Qwen3-14B (Thinking) | 0.737 | 0.837 | 0.663 | 0.593 |
104
+ | | | | | |
105
+ | **Ministral 3 8B** | 0.787 | <u>0.860</u>| 0.668 | <u>0.616</u> |
106
+ | Qwen3-VL-8B-Thinking | <u>0.798</u>| <u>0.860</u>| <u>0.671</u> | 0.580 |
107
+ | | | | | |
108
+ | **Ministral 3 3B** | <u>0.721</u>| <u>0.775</u>| 0.534 | <u>0.548</u> |
109
+ | Qwen3-VL-4B-Thinking | 0.697 | 0.729 | <u>0.601</u> | 0.513 |
110
+
111
+ ### Instruct
112
+
113
+ | Model | Arena Hard | WildBench | MATH Maj@1 | MM MTBench |
114
+ |---------------------------|-------------|------------|-------------|------------------|
115
+ | **Ministral 3 14B** | <u>0.551</u>| <u>68.5</u>| <u>0.904</u>| <u>8.49</u> |
116
+ | Qwen3 14B (Non-Thinking) | 0.427 | 65.1 | 0.870 | NOT MULTIMODAL |
117
+ | Gemma3-12B-Instruct | 0.436 | 63.2 | 0.854 | 6.70 |
118
+ | | | | | |
119
+ | **Ministral 3 8B** | 0.509 | <u>66.8</u>| 0.876 | <u>8.08</u> |
120
+ | Qwen3-VL-8B-Instruct | <u>0.528</u>| 66.3 | <u>0.946</u>| 8.00 |
121
+ | | | | | |
122
+ | **Ministral 3 3B** | 0.305 | <u>56.8</u>| 0.830 | 7.83 |
123
+ | Qwen3-VL-4B-Instruct | <u>0.438</u>| <u>56.8</u>| <u>0.900</u>| <u>8.01</u> |
124
+ | Qwen3-VL-2B-Instruct | 0.163 | 42.2 | 0.786 | 6.36 |
125
+ | Gemma3-4B-Instruct | 0.318 | 49.1 | 0.759 | 5.23 |
126
+
127
+ ### Base
128
+
129
+ | Model | Multilingual MMLU | MATH CoT 2-Shot | AGIEval 5-shot | MMLU Redux 5-shot | MMLU 5-shot | TriviaQA 5-shot |
130
+ |---------------------|-------------------|-----------------|----------------|-------------------|-------------|-----------------|
131
+ | **Ministral 3 14B** | 0.742 | <u>0.676</u> | 0.648 | 0.820 | 0.794 | 0.749 |
132
+ | Qwen3 14B Base | <u>0.754</u> | 0.620 | <u>0.661</u> | <u>0.837</u> | <u>0.804</u>| 0.703 |
133
+ | Gemma 3 12B Base | 0.690 | 0.487 | 0.587 | 0.766 | 0.745 | <u>0.788</u> |
134
+ | | | | | | | |
135
+ | **Ministral 3 8B** | <u>0.706</u> | <u>0.626</u> | 0.591 | 0.793 | <u>0.761</u>| <u>0.681</u> |
136
+ | Qwen 3 8B Base | 0.700 | 0.576 | <u>0.596</u> | <u>0.794</u> | 0.760 | 0.639 |
137
+ | | | | | | | |
138
+ | **Ministral 3 3B** | 0.652 | <u>0.601</u> | 0.511 | 0.735 | 0.707 | 0.592 |
139
+ | Qwen 3 4B Base | <u>0.677</u> | 0.405 | <u>0.570</u> | <u>0.759</u> | <u>0.713</u>| 0.530 |
140
+ | Gemma 3 4B Base | 0.516 | 0.294 | 0.430 | 0.626 | 0.589 | <u>0.640</u> |
141
+
142
+ ## Usage
143
+
144
+ The model can be used with the following frameworks:
145
+ - [`vllm`](https://github.com/vllm-project/vllm): See [here](#vllm)
146
+ - [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)
147
+
148
+ ### vLLM
149
+
150
+ We recommend using this model with [vLLM](https://github.com/vllm-project/vllm).
151
+
152
+ #### Installation
153
+
154
+ Make sure to install [`vLLM >= 0.12.0`](https://github.com/vllm-project/vllm/releases/tag/v0.12.0):
155
+
156
+ ```
157
+ pip install vllm --upgrade
158
+ ```
159
+
160
+ Doing so should automatically install [`mistral_common >= 1.8.6`](https://github.com/mistralai/mistral-common/releases/tag/v1.8.6).
161
+
162
+ To check:
163
+ ```
164
+ python -c "import mistral_common; print(mistral_common.__version__)"
165
+ ```
166
+
167
+ You can also use a ready-to-go [Docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or pull one from [Docker Hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).
168
+
169
+ #### Serve
170
+
171
+ Due to their size, `Ministral-3-3B-Reasoning-2512` and `Ministral-3-8B-Reasoning-2512` can each run on a single H200 GPU.
172
+
173
+ A simple launch command is:
174
+
175
+ ```bash
176
177
+ vllm serve mistralai/Ministral-3-3B-Reasoning-2512 \
178
+ --enable-auto-tool-choice --tool-call-parser mistral \
179
+ --reasoning-parser mistral
180
+ ```
181
+
182
+ Key parameter notes:
183
+
184
+ * `--enable-auto-tool-choice`: Required when enabling tool usage.
185
+ * `--tool-call-parser mistral`: Required when enabling tool usage.
186
+ * `--reasoning-parser mistral`: Required when enabling reasoning.
187
+
188
+ Additional flags:
189
+
190
+ * You can set `--max-model-len` to save memory. By default it is set to `262144`, which is quite large and more than most scenarios require.
191
+ * You can set `--max-num-batched-tokens` to balance throughput and latency: a higher value increases throughput at the cost of higher latency. See the example below.
192
+
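+ For illustration, here is a launch command that keeps the tool and reasoning parsers while capping the context window and batched-token budget; the `32768` and `8192` values are placeholder choices, not official recommendations:
+
+ ```bash
+ vllm serve mistralai/Ministral-3-3B-Reasoning-2512 \
+     --enable-auto-tool-choice --tool-call-parser mistral \
+     --reasoning-parser mistral \
+     --max-model-len 32768 \
+     --max-num-batched-tokens 8192
+ ```
+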
193
+ #### Usage of the model
194
+
195
+ Here we assume that the model `mistralai/Ministral-3-3B-Reasoning-2512` is being served and reachable at `localhost` on port `8000`, the default for vLLM.
196
+
197
+ <details>
198
+ <summary>Vision Reasoning</summary>
199
+
200
+ Let's see if the Ministral 3 model knows when to pick a fight!
201
+
202
+ ```python
203
+ from typing import Any
204
+
205
+ from openai import OpenAI
206
+ from huggingface_hub import hf_hub_download
207
+
208
+ # Modify OpenAI's API key and API base to use vLLM's API server.
209
+ openai_api_key = "EMPTY"
210
+ openai_api_base = "http://localhost:8000/v1"
211
+
212
+ TEMP = 0.7
213
+ TOP_P = 0.95
214
+ MAX_TOK = 262144
215
+ client = OpenAI(
216
+ api_key=openai_api_key,
217
+ base_url=openai_api_base,
218
+ )
219
+
220
+ models = client.models.list()
221
+ model = models.data[0].id
222
+
223
+
224
+ def load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:
225
+ file_path = hf_hub_download(repo_id=repo_id, filename=filename)
226
+ with open(file_path, "r") as file:
227
+ system_prompt = file.read()
228
+
229
+ index_begin_think = system_prompt.find("[THINK]")
230
+ index_end_think = system_prompt.find("[/THINK]")
231
+
232
+ return {
233
+ "role": "system",
234
+ "content": [
235
+ {"type": "text", "text": system_prompt[:index_begin_think]},
236
+ {
237
+ "type": "thinking",
238
+ "thinking": system_prompt[
239
+ index_begin_think + len("[THINK]") : index_end_think
240
+ ],
241
+ "closed": True,
242
+ },
243
+ {
244
+ "type": "text",
245
+ "text": system_prompt[index_end_think + len("[/THINK]") :],
246
+ },
247
+ ],
248
+ }
249
+
250
+
251
+ SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
252
+
253
+ image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"
254
+
255
+ messages = [
256
+ SYSTEM_PROMPT,
257
+ {
258
+ "role": "user",
259
+ "content": [
260
+ {
261
+ "type": "text",
262
+ "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
263
+ },
264
+ {"type": "image_url", "image_url": {"url": image_url}},
265
+ ],
266
+ },
267
+ ]
268
+
269
+
270
+ stream = client.chat.completions.create(
271
+ model=model,
272
+ messages=messages,
273
+ stream=True,
274
+ temperature=TEMP,
275
+ top_p=TOP_P,
276
+ max_tokens=MAX_TOK,
277
+ )
278
+
279
+ print("client: Start streaming chat completions...:\n")
280
+ printed_reasoning_content = False
281
+ answer = []
282
+
283
+ for chunk in stream:
284
+ reasoning_content = None
285
+ content = None
286
+ # Check whether the delta carries reasoning_content or regular content
287
+ if hasattr(chunk.choices[0].delta, "reasoning_content"):
288
+ reasoning_content = chunk.choices[0].delta.reasoning_content
289
+ if hasattr(chunk.choices[0].delta, "content"):
290
+ content = chunk.choices[0].delta.content
291
+
292
+ if reasoning_content is not None:
293
+ if not printed_reasoning_content:
294
+ printed_reasoning_content = True
295
+ print("Start reasoning:\n", end="", flush=True)
296
+ print(reasoning_content, end="", flush=True)
297
+ elif content is not None:
298
+ # Extract and print the content
299
+ if not reasoning_content and printed_reasoning_content:
300
+ answer.extend(content)
301
+ print(content, end="", flush=True)
302
+
303
+ if answer:
304
+ print("\n\n=============\nAnswer\n=============\n")
305
+ print("".join(answer))
306
+ else:
307
+ print("\n\n=============\nNo Answer\n=============\n")
308
+ print(
309
+ "No answer was generated by the model, probably because the maximum number of tokens was reached."
310
+ )
311
+ ```
312
+
313
+ Now we'll make it do some maths!
314
+
315
+ ```python
316
+ from typing import Any
317
+
318
+ from openai import OpenAI
319
+ from huggingface_hub import hf_hub_download
320
+
321
+ # Modify OpenAI's API key and API base to use vLLM's API server.
322
+ openai_api_key = "EMPTY"
323
+ openai_api_base = "http://localhost:8000/v1"
324
+
325
+ TEMP = 0.7
326
+ TOP_P = 0.95
327
+ MAX_TOK = 262144
328
+ client = OpenAI(
329
+ api_key=openai_api_key,
330
+ base_url=openai_api_base,
331
+ )
332
+
333
+ models = client.models.list()
334
+ model = models.data[0].id
335
+
336
+
337
+ def load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:
338
+ file_path = hf_hub_download(repo_id=repo_id, filename=filename)
339
+ with open(file_path, "r") as file:
340
+ system_prompt = file.read()
341
+
342
+ index_begin_think = system_prompt.find("[THINK]")
343
+ index_end_think = system_prompt.find("[/THINK]")
344
+
345
+ return {
346
+ "role": "system",
347
+ "content": [
348
+ {"type": "text", "text": system_prompt[:index_begin_think]},
349
+ {
350
+ "type": "thinking",
351
+ "thinking": system_prompt[
352
+ index_begin_think + len("[THINK]") : index_end_think
353
+ ],
354
+ "closed": True,
355
+ },
356
+ {
357
+ "type": "text",
358
+ "text": system_prompt[index_end_think + len("[/THINK]") :],
359
+ },
360
+ ],
361
+ }
362
+
363
+
364
+ SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
365
+
366
+ image_url = "https://i.ytimg.com/vi/5Y3xLHeyKZU/hqdefault.jpg"
367
+
368
+ messages = [
369
+ SYSTEM_PROMPT,
370
+ {
371
+ "role": "user",
372
+ "content": [
373
+ {
374
+ "type": "text",
375
+ "text": "Solve the equations. If they contain only numbers, use your calculator, else only think. Answer in the language of the image.",
376
+ },
377
+ {"type": "image_url", "image_url": {"url": image_url}},
378
+ ],
379
+ },
380
+ ]
381
+
382
+ stream = client.chat.completions.create(
383
+ model=model,
384
+ messages=messages,
385
+ stream=True,
386
+ temperature=TEMP,
387
+ top_p=TOP_P,
388
+ max_tokens=MAX_TOK,
389
+ )
390
+
391
+ print("client: Start streaming chat completions...:\n")
392
+ printed_reasoning_content = False
393
+ answer = []
394
+
395
+ for chunk in stream:
396
+ reasoning_content = None
397
+ content = None
398
+ # Check whether the delta carries reasoning_content or regular content
399
+ if hasattr(chunk.choices[0].delta, "reasoning_content"):
400
+ reasoning_content = chunk.choices[0].delta.reasoning_content
401
+ if hasattr(chunk.choices[0].delta, "content"):
402
+ content = chunk.choices[0].delta.content
403
+
404
+ if reasoning_content is not None:
405
+ if not printed_reasoning_content:
406
+ printed_reasoning_content = True
407
+ print("Start reasoning:\n", end="", flush=True)
408
+ print(reasoning_content, end="", flush=True)
409
+ if content is not None:
410
+ # Extract and print the content
411
+ if not reasoning_content and printed_reasoning_content:
412
+ answer.extend(content)
413
+ print(content, end="", flush=True)
414
+
415
+ if answer:
416
+ print("\n\n=============\nAnswer\n=============\n")
417
+ print("".join(answer))
418
+ else:
419
+ print("\n\n=============\nNo Answer\n=============\n")
420
+ print(
421
+ "No answer was generated by the model, probably because the maximum number of tokens was reached."
422
+ )
423
+ ```
424
+
425
+ </details>
426
+
427
+ <details>
428
+ <summary>Text-Only Request</summary>
429
+
430
+ Let's do more maths and leave it up to the model to figure out how to achieve a result.
431
+
432
+ ```python
433
+ from typing import Any
434
+ from openai import OpenAI
435
+ from huggingface_hub import hf_hub_download
436
+
437
+ # Modify OpenAI's API key and API base to use vLLM's API server.
438
+ openai_api_key = "EMPTY"
439
+ openai_api_base = "http://localhost:8000/v1"
440
+
441
+ TEMP = 0.7
442
+ TOP_P = 0.95
443
+ MAX_TOK = 262144
444
+ client = OpenAI(
445
+ api_key=openai_api_key,
446
+ base_url=openai_api_base,
447
+ )
448
+
449
+ models = client.models.list()
450
+ model = models.data[0].id
451
+
452
+
453
+ def load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:
454
+ file_path = hf_hub_download(repo_id=repo_id, filename=filename)
455
+ with open(file_path, "r") as file:
456
+ system_prompt = file.read()
457
+
458
+ index_begin_think = system_prompt.find("[THINK]")
459
+ index_end_think = system_prompt.find("[/THINK]")
460
+
461
+ return {
462
+ "role": "system",
463
+ "content": [
464
+ {"type": "text", "text": system_prompt[:index_begin_think]},
465
+ {
466
+ "type": "thinking",
467
+ "thinking": system_prompt[
468
+ index_begin_think + len("[THINK]") : index_end_think
469
+ ],
470
+ "closed": True,
471
+ },
472
+ {
473
+ "type": "text",
474
+ "text": system_prompt[index_end_think + len("[/THINK]") :],
475
+ },
476
+ ],
477
+ }
478
+
479
+
480
+ SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
481
+
482
+ query = "Use each number in 2,5,6,3 exactly once, along with any combination of +, -, ×, ÷ (and parentheses for grouping), to make the number 24."
483
+
484
+ messages = [
485
+ SYSTEM_PROMPT,
486
+ {"role": "user", "content": query}
487
+ ]
488
+ stream = client.chat.completions.create(
489
+ model=model,
490
+ messages=messages,
491
+ stream=True,
492
+ temperature=TEMP,
493
+ top_p=TOP_P,
494
+ max_tokens=MAX_TOK,
495
+ )
496
+
497
+ print("client: Start streaming chat completions...:\n")
498
+ printed_reasoning_content = False
499
+ answer = []
500
+
501
+ for chunk in stream:
502
+ reasoning_content = None
503
+ content = None
504
+ # Check whether the delta carries reasoning_content or regular content
505
+ if hasattr(chunk.choices[0].delta, "reasoning_content"):
506
+ reasoning_content = chunk.choices[0].delta.reasoning_content
507
+ if hasattr(chunk.choices[0].delta, "content"):
508
+ content = chunk.choices[0].delta.content
509
+
510
+ if reasoning_content is not None:
511
+ if not printed_reasoning_content:
512
+ printed_reasoning_content = True
513
+ print("Start reasoning:\n", end="", flush=True)
514
+ print(reasoning_content, end="", flush=True)
515
+ if content is not None:
516
+ # Extract and print the content
517
+ if not reasoning_content and printed_reasoning_content:
518
+ answer.extend(content)
519
+ print(content, end="", flush=True)
520
+
521
+ if answer:
522
+ print("\n\n=============\nAnswer\n=============\n")
523
+ print("".join(answer))
524
+ else:
525
+ print("\n\n=============\nNo Answer\n=============\n")
526
+ print("No answer was generated by the model, probably because the maximum number of tokens was reached.")
527
+ ```
528
+
529
+ </details>
530
+
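+ <details>
+ <summary>Function Calling</summary>
+
+ The server flags above (`--enable-auto-tool-choice` and `--tool-call-parser mistral`) also enable native function calling. The following is a minimal, illustrative sketch that reuses the same client setup as the previous examples; `get_weather` is a hypothetical tool declared purely for demonstration, and the reasoning system prompt from the earlier examples is omitted for brevity.
+
+ ```python
+ from openai import OpenAI
+
+ # Same vLLM server as in the previous examples.
+ client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
+ model = client.models.list().data[0].id
+
+ # Hypothetical tool schema, for illustration only.
+ tools = [
+     {
+         "type": "function",
+         "function": {
+             "name": "get_weather",
+             "description": "Get the current weather in a given city.",
+             "parameters": {
+                 "type": "object",
+                 "properties": {"city": {"type": "string", "description": "Name of the city"}},
+                 "required": ["city"],
+             },
+         },
+     }
+ ]
+
+ response = client.chat.completions.create(
+     model=model,
+     messages=[{"role": "user", "content": "What is the weather like in Paris right now?"}],
+     tools=tools,
+     tool_choice="auto",
+     temperature=0.7,
+     top_p=0.95,
+ )
+
+ # If the model decides to call the tool, the parsed call(s) appear here.
+ print(response.choices[0].message.tool_calls)
+ ```
+
+ </details>
+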
531
+ ### Transformers
532
+
533
+ You can also use Ministral 3 3B Reasoning 2512 with `Transformers`!
534
+
535
+ To make the best use of our model with `Transformers`, make sure you have [installed](https://github.com/mistralai/mistral-common) `mistral-common >= 1.8.6` to use our tokenizer.
536
+
537
+ ```bash
538
+ pip install mistral-common --upgrade
539
+ ```
540
+
541
+ Then load our tokenizer along with the model and generate:
542
+
543
+ <details>
544
+ <summary>Python snippet</summary>
545
+
546
+ ```python
547
+ import torch
548
+ from transformers import Mistral3ForConditionalGeneration, MistralCommonBackend
549
+
550
+ model_id = "mistralai/Ministral-3-3B-Reasoning-2512"
551
+
552
+ tokenizer = MistralCommonBackend.from_pretrained(model_id)
553
+ model = Mistral3ForConditionalGeneration.from_pretrained(
554
+ model_id, torch_dtype=torch.bfloat16, device_map="auto"
555
+ )
556
+
557
+ image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"
558
+
559
+ messages = [
560
+ {
561
+ "role": "user",
562
+ "content": [
563
+ {
564
+ "type": "text",
565
+ "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
566
+ },
567
+ {"type": "image_url", "image_url": {"url": image_url}},
568
+ ],
569
+ },
570
+ ]
571
+
572
+ tokenized = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True)
573
+
574
+ tokenized["input_ids"] = tokenized["input_ids"].to(device="cuda")
575
+ tokenized["pixel_values"] = tokenized["pixel_values"].to(dtype=torch.bfloat16, device="cuda")
576
+ image_sizes = [tokenized["pixel_values"].shape[-2:]]
577
+
578
+ output = model.generate(
579
+ **tokenized,
580
+ image_sizes=image_sizes,
581
+ max_new_tokens=8092,
582
+ )[0]
583
+
584
+ decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
585
+ print(decoded_output)
586
+ ```
587
+
588
+ </details>
589
+
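+ <details>
+ <summary>Text-only snippet</summary>
+
+ For text-only prompts, the image-specific inputs can simply be dropped. This is a minimal sketch derived from the snippet above rather than an official recipe; the puzzle prompt is reused from the vLLM text-only example.
+
+ ```python
+ import torch
+ from transformers import Mistral3ForConditionalGeneration, MistralCommonBackend
+
+ model_id = "mistralai/Ministral-3-3B-Reasoning-2512"
+
+ tokenizer = MistralCommonBackend.from_pretrained(model_id)
+ model = Mistral3ForConditionalGeneration.from_pretrained(
+     model_id, torch_dtype=torch.bfloat16, device_map="auto"
+ )
+
+ messages = [
+     {
+         "role": "user",
+         "content": "Use each number in 2,5,6,3 exactly once, along with any combination of +, -, ×, ÷ (and parentheses for grouping), to make the number 24.",
+     },
+ ]
+
+ # Tokenize the conversation; no pixel_values are produced for a text-only prompt.
+ tokenized = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True)
+ tokenized["input_ids"] = tokenized["input_ids"].to(device="cuda")
+
+ output = model.generate(**tokenized, max_new_tokens=8092)[0]
+
+ # Decode only the newly generated tokens.
+ decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
+ print(decoded_output)
+ ```
+
+ </details>
+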
590
+ ## License
591
+
592
+ This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).
593
+
594
+ *You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.*