m2v-qwen25-as-an-embedding-test-1024d

πŸ§ͺ First Model2Vec with Instruction-Awareness Across 7 Languages

...but that's the only feature, and it's limited!


⚠️ Important Disclaimer

This model was an experiment that didn't go as planned.

We tried to distill Qwen2.5-1.5B-Instruct (an instruction-tuned LLM) into static embeddings using Model2Vec, hoping to preserve instruction-awareness. While we did achieve some instruction-awareness in controlled tests, the model performs poorly on standard benchmarks.

We're keeping this model public for transparency and to help others avoid the same pitfalls.


πŸ“‰ MTEB Benchmark Results (The Hard Truth)

| Task Category | Score | Verdict |
|---|---|---|
| STS (Semantic Similarity) | 0.2157 | ❌ Very Poor |
| - STSBenchmark | 0.1705 | ❌ |
| - SICK-R | 0.2609 | ❌ |
| Classification (kNN) | 0.4188 | ⚠️ Mediocre |
| - Banking77 | 0.5765 | ⚠️ |
| - Emotion | 0.2611 | ❌ |
| Clustering | 0.0125 | ❌ Catastrophic |
| - TwentyNewsgroups | 0.0125 | ❌ |
| Overall MTEB | 0.2157 | ❌ Not Recommended |

Comparison with Better Alternatives

| Model | STS | Classification | Overall | Size | Recommendation |
|---|---|---|---|---|---|
| M2V-BGE-M3-1024d | 0.5831 | 0.6564 | 0.4722 | 499 MB | βœ… Use this instead |
| M2V-Qwen3-0.6B-1024d | 0.4845 | 0.5949 | 0.4202 | 302 MB | βœ… Good alternative |
| POTION-base-8M | ~0.52 | ~0.55 | ~0.45 | 30 MB | βœ… Smallest option |
| This model | 0.2157 | 0.4188 | 0.2157 | 65 MB | ❌ Avoid |
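
If you want to reproduce these numbers yourself, a minimal sketch is below. It assumes recent versions of the mteb and sentence-transformers packages (the latter for its StaticEmbedding wrapper around Model2Vec weights); the task subset and output folder are illustrative choices, not the exact evaluation setup used for this card.

```python
# Minimal sketch: re-run a few of the MTEB tasks reported above.
# Assumes mteb and sentence-transformers (>= 3.3, for StaticEmbedding) are installed.
import mteb
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import StaticEmbedding

# Wrap the published Model2Vec weights as a static SentenceTransformer module.
static = StaticEmbedding.from_model2vec("tss-deposium/qwen25-deposium-1024d")
model = SentenceTransformer(modules=[static])

# A small subset of the tasks from the table above.
tasks = mteb.get_tasks(tasks=["STSBenchmark", "SICK-R", "Banking77Classification"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="mteb_results")  # per-task scores land here
```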

πŸ€” What Went Wrong?

The Hypothesis

We thought: "If we distill an instruction-tuned LLM instead of a base model, the static embeddings might preserve instruction-awareness!"

The Reality

  1. Instruction-awareness partially works in controlled tests (96-99% on our custom benchmarks)
  2. But standard semantic understanding collapsed - the model doesn't discriminate well between similar and different concepts
  3. Embedding collapse: Similar and dissimilar pairs get almost the same similarity scores

The Lesson Learned

Model2Vec distillation works best with models designed for embeddings (like BGE-M3, Qwen3-Embedding), not instruction-tuned LLMs. The instruction-following capability doesn't translate well to static embeddings.


πŸ“Š What This Model CAN Do (Limited)

In our controlled monolingual tests, the model showed instruction-awareness:

# This works (instruction understood)
"Explain neural networks" β†’ "Neural networks tutorial guide"  # High similarity

# But general semantic understanding fails
"The cat sits on the mat" vs "A dog runs in the park"  # Should be different, but scores are similar

Languages tested: EN, FR, ES, DE, ZH, AR, RU (monolingual only, no cross-lingual)
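
A runnable version of the check above, assuming the model2vec package and the published model ID (the cosine helper is just a local convenience, not part of the library):

```python
# Minimal sketch: score the instruction-style pair vs. the unrelated pair above.
import numpy as np
from model2vec import StaticModel

model = StaticModel.from_pretrained("tss-deposium/qwen25-deposium-1024d")

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Instruction-style pair: the part that partially works in controlled tests.
inst = model.encode(["Explain neural networks", "Neural networks tutorial guide"])
# Unrelated pair: should score clearly lower, but the gap is small (embedding collapse).
gen = model.encode(["The cat sits on the mat", "A dog runs in the park"])

print("instruction pair:", cosine(inst[0], inst[1]))
print("unrelated pair  :", cosine(gen[0], gen[1]))
```

The same check can be repeated per language for the seven languages listed above; only monolingual pairs were tested.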


🎯 Should You Use This Model?

❌ NO, unless:

  • You're doing research on Model2Vec limitations
  • You want to understand what happens when distilling instruction-tuned LLMs
  • You're specifically testing instruction-awareness (not general embeddings)

βœ… Use these instead:

  β€’ M2V-BGE-M3-1024d (overall MTEB 0.4722, 499 MB)
  β€’ M2V-Qwen3-0.6B-1024d (overall MTEB 0.4202, 302 MB)
  β€’ POTION-base-8M (overall MTEB ~0.45, 30 MB)

πŸ”§ Technical Details

  • Base Model: Qwen/Qwen2.5-1.5B-Instruct (NOT an embedding model!)
  • Distillation: Model2Vec static embeddings
  • Dimensions: 1024D
  • Size: 65MB
  • Problem: Base model wasn't designed for embeddings

πŸ“š What We Learned

  1. Don't distill instruction-tuned LLMs for embeddings - use embedding-specific models
  2. Custom benchmarks can be misleading - always validate with standard benchmarks (MTEB)
  3. Instruction-awareness β‰  semantic understanding - they're different capabilities
  4. Transparency matters - publishing failures helps the community

πŸ”— Better Alternatives (Our Models)

| Model | MTEB Score | Size | Link |
|---|---|---|---|
| M2V-BGE-M3-1024d | 0.4722 | 499 MB | HuggingFace |
| M2V-Qwen3-0.6B-1024d | 0.4202 | 302 MB | HuggingFace |

πŸ“– Citation

If you reference this model (as an example of what NOT to do):

@misc{m2v-qwen25-test,
  author = {Nicolas Geysse - TSS Deposium},
  title = {m2v-qwen25-as-an-embedding-test: An Experiment in Distilling Instruction-Tuned LLMs},
  year = {2025},
  note = {Experimental model with poor MTEB performance (21.57%). Published for transparency.},
  url = {https://huggingface.co/tss-deposium/qwen25-deposium-1024d}
}

Built by Nicolas Geysse - TSS Deposium

"We tried something new. It didn't work great. But we learned, and now you can too!" πŸš€

Our Better Models β€’ Report Issues
