M2V-BGE-M3-1024d

A high-performance Model2Vec-distilled embedding model based on BAAI/bge-m3.

Key Features:

  • Ultra-fast inference (<1ms for 4 sentences on CPU; see the timing sketch after this list)
  • 1024-dimensional embeddings
  • Multilingual support (100+ languages from BGE-M3)
  • 8192 token context window (inherited from BGE-M3)
  • ~500MB model size
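
The latency figure above can be sanity-checked with a simple timing loop. This is a minimal sketch (the sentences are arbitrary, and absolute numbers will vary with hardware):

import time
from model2vec import StaticModel

model = StaticModel.from_pretrained("tss-deposium/m2v-bge-m3-1024d")
sentences = ["Hello world", "Bonjour le monde", "Hallo Welt", "Hola mundo"]

# Warm up once, then average over repeated encodes
model.encode(sentences)
start = time.perf_counter()
for _ in range(100):
    model.encode(sentences)
elapsed_ms = (time.perf_counter() - start) / 100 * 1000
print(f"{elapsed_ms:.2f} ms per 4-sentence batch")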

MTEB Benchmark Results

Task Category                Score
STS (Semantic Similarity)    0.5831
  - STSBenchmark             0.5714
  - SICK-R                   0.5947
Classification (kNN)         0.6564
  - Banking77                0.8027
  - Emotion                  0.5101
Clustering                   0.1771
  - TwentyNewsgroups         0.1771
Overall MTEB                 0.4722
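
These scores can be reproduced with the mteb package. The snippet below is a minimal sketch, assuming the MTEB(tasks=[...]) constructor of the mteb library and task names from the MTEB registry; loading via SentenceTransformer gives an encoder interface mteb accepts:

from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Load the model through sentence-transformers for compatibility with mteb
model = SentenceTransformer("tss-deposium/m2v-bge-m3-1024d")

# Evaluate on a subset of the tasks reported above
evaluation = MTEB(tasks=["STSBenchmark", "Banking77Classification"])
results = evaluation.run(model, output_folder="results/m2v-bge-m3-1024d")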

Comparison with Other Models

Model              STS     Classification  Size    Latency
M2V-BGE-M3-1024d   0.5831  0.6564          499 MB  <1ms
M2V-Qwen3-0.6B     0.4845  0.5949          302 MB  ~1ms
POTION-base-8M     ~0.52   ~0.55           30 MB   <1ms

Installation

pip install model2vec
# or
pip install sentence-transformers

Usage

Using Model2Vec (Fastest)

from model2vec import StaticModel

# Load the model
model = StaticModel.from_pretrained("tss-deposium/m2v-bge-m3-1024d")

# Compute embeddings
embeddings = model.encode(["Hello world", "Bonjour le monde"])
print(embeddings.shape)  # (2, 1024)

Using Sentence Transformers

from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("tss-deposium/m2v-bge-m3-1024d")

# Compute embeddings
embeddings = model.encode(["Hello world", "Bonjour le monde"])

Semantic Similarity Example

from model2vec import StaticModel
import numpy as np

model = StaticModel.from_pretrained("tss-deposium/m2v-bge-m3-1024d")

# Similar sentences
sent1 = "I want to find financial documents"
sent2 = "Looking for finance-related files"

# Different sentence
sent3 = "The weather is nice today"

emb1, emb2, emb3 = model.encode([sent1, sent2, sent3])

def cosine_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(f"Similar: {cosine_sim(emb1, emb2):.3f}")    # ~0.85
print(f"Different: {cosine_sim(emb1, emb3):.3f}")  # ~0.55

Model Details

  • Base Model: BAAI/bge-m3
  • Distillation Method: Model2Vec with PCA (1024 dimensions)
  • Embedding Dimension: 1024
  • Max Sequence Length: 8192 tokens
  • Languages: 100+ (multilingual)
  • Model Size: ~499 MB

Use Cases

  • Semantic search and retrieval (see the sketch after this list)
  • Document similarity
  • Text classification (via kNN)
  • Clustering
  • RAG (Retrieval Augmented Generation) pipelines
  • Real-time applications requiring ultra-low latency
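
To illustrate the search and retrieval use case, here is a minimal sketch using only numpy; the corpus and query below are made up for the example:

from model2vec import StaticModel
import numpy as np

model = StaticModel.from_pretrained("tss-deposium/m2v-bge-m3-1024d")

# Hypothetical corpus for illustration
corpus = [
    "Quarterly revenue report for fiscal year 2023",
    "How to reset your account password",
    "Team offsite agenda and travel details",
]
query = "Where can I find last year's financial results?"

# Embed and L2-normalize so the dot product equals cosine similarity
corpus_emb = model.encode(corpus)
query_emb = model.encode([query])[0]
corpus_emb = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
query_emb = query_emb / np.linalg.norm(query_emb)

# Rank documents by similarity to the query
scores = corpus_emb @ query_emb
for i in np.argsort(-scores):
    print(f"{scores[i]:.3f}  {corpus[i]}")

The same pattern scales to RAG retrieval: precompute and normalize the corpus embeddings once, and each query then costs a single matrix-vector product.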

Limitations

  • Static embeddings are context-insensitive: each token maps to a fixed vector, so word-order and word-sense nuances are lost compared to transformer models
  • Lower quality than the full BGE-M3 model (~58% vs ~80% on STS benchmarks)
  • Best suited for applications where speed is critical and this quality trade-off is acceptable

How It Works

Model2Vec distills a Sentence Transformer by:

  1. Passing the model's vocabulary through the base model (BGE-M3)
  2. Reducing the dimensionality of the resulting token embeddings with PCA (to 1024)
  3. Applying SIF weighting to down-weight frequent tokens

At inference time, a sentence embedding is simply the mean of its token embeddings; no transformer forward pass is needed. The result is embeddings that are roughly 500x faster to compute, with only moderate quality loss.
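
For reference, a model like this can be produced with model2vec's distillation API. The following is a minimal sketch, assuming the distill() entry point and pca_dims argument of current model2vec releases; consult the Model2Vec documentation for the exact signature:

from model2vec.distill import distill

# Distill BGE-M3 into a static model with 1024 PCA dimensions
m2v_model = distill(model_name="BAAI/bge-m3", pca_dims=1024)

# Save locally (or push to the Hugging Face Hub)
m2v_model.save_pretrained("m2v-bge-m3-1024d")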

Citation

@article{minishlab2024model2vec,
  author = {Tulkens, Stephan and {van Dongen}, Thomas},
  title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
  year = {2024},
  url = {https://github.com/MinishLab/model2vec}
}

@misc{bge-m3,
  title = {BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation},
  author = {Chen, Jianlv and Xiao, Shitao and Zhang, Peitian and Luo, Kun and Lian, Defu and Liu, Zheng},
  year = {2024},
  eprint = {2402.03216},
  archivePrefix = {arXiv}
}

License

MIT License - Same as base BGE-M3 model.


Created by: The Seed Ship / Deposium Project
