# m2v-qwen25-as-an-embedding-test-1024d

🧪 **First Model2Vec with Instruction-Awareness Across 7 Languages**

...but that's the only feature, and it's limited!
## ⚠️ Important Disclaimer
This model was an experiment that didn't go as planned.
We tried to distill Qwen2.5-1.5B-Instruct (an instruction-tuned LLM) into static embeddings using Model2Vec, hoping to preserve instruction-awareness. While we did achieve some instruction-awareness in controlled tests, the model performs poorly on standard benchmarks.
We're keeping this model public for transparency and to help others avoid the same pitfalls.
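For context, the distillation itself is a single call in the model2vec library. Below is a minimal sketch of the kind of invocation involved; the exact arguments we used are not recorded in this card, and `pca_dims=1024` is an assumption matching the 1024D output.

```python
# Minimal sketch of the distillation step using model2vec's distill API.
# Exact arguments are assumptions; pca_dims=1024 is inferred from this
# model's 1024D output dimensionality.
from model2vec.distill import distill

m2v_model = distill(
    model_name="Qwen/Qwen2.5-1.5B-Instruct",  # instruction-tuned LLM, not an embedding model
    pca_dims=1024,
)
m2v_model.save_pretrained("m2v-qwen25-1024d")  # hypothetical local path
```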
## 📊 MTEB Benchmark Results (The Hard Truth)
| Task Category | Score | Verdict |
|---|---|---|
| STS (Semantic Similarity) | 0.2157 | ❌ Very Poor |
| - STSBenchmark | 0.1705 | ❌ |
| - SICK-R | 0.2609 | ❌ |
| Classification (kNN) | 0.4188 | ⚠️ Mediocre |
| - Banking77 | 0.5765 | ⚠️ |
| - Emotion | 0.2611 | ❌ |
| Clustering | 0.0125 | ❌ Catastrophic |
| - TwentyNewsgroups | 0.0125 | ❌ |
| Overall MTEB | 0.2157 | ❌ Not Recommended |
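If you want to re-check these numbers, the STS rows can be re-run with the `mteb` package. A rough sketch (task names inferred from the table above; it assumes `mteb` accepts `StaticModel.encode()` directly, otherwise wrap the model in a thin adapter exposing `encode(sentences)`):

```python
# Rough sketch for re-running the STS tasks above with the mteb package.
# Assumption: mteb can call StaticModel.encode(sentences) directly.
import mteb
from model2vec import StaticModel

model = StaticModel.from_pretrained("tss-deposium/qwen25-deposium-1024d")
tasks = mteb.get_tasks(tasks=["STSBenchmark", "SICK-R"])
mteb.MTEB(tasks=tasks).run(model, output_folder="results/m2v-qwen25")
```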
### Comparison with Better Alternatives
| Model | STS | Classification | Overall | Size | Recommendation |
|---|---|---|---|---|---|
| M2V-BGE-M3-1024d | 0.5831 | 0.6564 | 0.4722 | 499 MB | ✅ Use this instead |
| M2V-Qwen3-0.6B-1024d | 0.4845 | 0.5949 | 0.4202 | 302 MB | ✅ Good alternative |
| POTION-base-8M | ~0.52 | ~0.55 | ~0.45 | 30 MB | ✅ Smallest option |
| This model | 0.2157 | 0.4188 | 0.2157 | 65 MB | ❌ Avoid |
## 🤔 What Went Wrong?
### The Hypothesis
We thought: "If we distill an instruction-tuned LLM instead of a base model, the static embeddings might preserve instruction-awareness!"
### The Reality
- Instruction-awareness partially works in controlled tests (96-99% on our custom benchmarks)
- But standard semantic understanding collapsed: the model doesn't discriminate well between similar and different concepts
- Embedding collapse: similar and dissimilar pairs get almost the same similarity scores
### The Lesson Learned
Model2Vec distillation works best with models designed for embeddings (like BGE-M3, Qwen3-Embedding), not instruction-tuned LLMs. The instruction-following capability doesn't translate well to static embeddings.
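As a concrete contrast, swapping the source checkpoint for an embedding-specific model is the entire fix; a sketch using the same assumed `distill` API as above:

```python
# Same call, embedding-specific source model: this is the kind of
# distillation that produced our better-scoring M2V-BGE-M3-1024d.
from model2vec.distill import distill

better = distill(model_name="BAAI/bge-m3", pca_dims=1024)
```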
## 🔍 What This Model CAN Do (Limited)
In our controlled monolingual tests, the model showed instruction-awareness:
```
# This works (instruction understood)
"Explain neural networks" ↔ "Neural networks tutorial guide"   # high similarity

# But general semantic understanding fails
"The cat sits on the mat" ↔ "A dog runs in the park"   # should be different, but scores are similar
```
Languages tested: EN, FR, ES, DE, ZH, AR, RU (monolingual only, no cross-lingual)
## 🎯 Should You Use This Model?
### ❌ NO, unless:
- You're doing research on Model2Vec limitations
- You want to understand what happens when distilling instruction-tuned LLMs
- You're specifically testing instruction-awareness (not general embeddings)
### ✅ Use these instead:
- tss-deposium/m2v-bge-m3-1024d - Our best Model2Vec (0.47 MTEB)
- tss-deposium/m2v-qwen3-embedding-0.6b-1024d - Smaller alternative (0.42 MTEB)
- BAAI/bge-m3 - Full transformer, best quality
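The Model2Vec alternatives load through the same API, so switching is a one-line change (repo id taken from the list above):

```python
# Drop-in replacement: same model2vec API, stronger model.
from model2vec import StaticModel

model = StaticModel.from_pretrained("tss-deposium/m2v-bge-m3-1024d")
embeddings = model.encode(["your text here"])
```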
## 🔧 Technical Details
- Base Model: Qwen/Qwen2.5-1.5B-Instruct (NOT an embedding model!)
- Distillation: Model2Vec static embeddings
- Dimensions: 1024D
- Size: 65 MB
- Problem: Base model wasn't designed for embeddings
## 📚 What We Learned
- Don't distill instruction-tuned LLMs for embeddings: use embedding-specific models
- Custom benchmarks can be misleading: always validate with standard benchmarks (MTEB)
- Instruction-awareness ≠ semantic understanding: they're different capabilities
- Transparency matters: publishing failures helps the community
## 🔗 Better Alternatives (Our Models)
| Model | MTEB Score | Size | Link |
|---|---|---|---|
| M2V-BGE-M3-1024d | 0.4722 | 499 MB | [HuggingFace](https://huggingface.co/tss-deposium/m2v-bge-m3-1024d) |
| M2V-Qwen3-0.6B-1024d | 0.4202 | 302 MB | [HuggingFace](https://huggingface.co/tss-deposium/m2v-qwen3-embedding-0.6b-1024d) |
## 📖 Citation
If you reference this model (as an example of what NOT to do):
```bibtex
@misc{m2v-qwen25-test,
  author = {Nicolas Geysse - TSS Deposium},
  title  = {m2v-qwen25-as-an-embedding-test: An Experiment in Distilling Instruction-Tuned LLMs},
  year   = {2025},
  note   = {Experimental model with poor MTEB performance (21.57%). Published for transparency.},
  url    = {https://huggingface.co/tss-deposium/qwen25-deposium-1024d}
}
```
Built by Nicolas Geysse - TSS Deposium
"We tried something new. It didn't work great. But we learned, and now you can too!" π