M2V-BGE-M3-1024d

A high-performance Model2Vec-distilled embedding model based on BAAI/bge-m3.

Key Features:

  • Ultra-fast inference (<1ms for 4 sentences on CPU; see the timing sketch after this list)
  • 1024-dimensional embeddings
  • Multilingual support (100+ languages from BGE-M3)
  • 8192 token context window (inherited from BGE-M3)
  • ~500MB model size
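
The latency figure above can be sanity-checked with a simple timing loop. This is a minimal sketch (the sentences are arbitrary, and absolute numbers will vary with hardware):

import time
from model2vec import StaticModel

model = StaticModel.from_pretrained("tss-deposium/m2v-bge-m3-1024d")
sentences = ["Hello world", "Bonjour le monde", "Hallo Welt", "Hola mundo"]

# Warm up once, then average over repeated encodes
model.encode(sentences)
start = time.perf_counter()
for _ in range(100):
    model.encode(sentences)
elapsed_ms = (time.perf_counter() - start) / 100 * 1000
print(f"{elapsed_ms:.2f} ms per 4-sentence batch")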

MTEB Benchmark Results

Task Category                Score
STS (Semantic Similarity)    0.5831
  - STSBenchmark             0.5714
  - SICK-R                   0.5947
Classification (kNN)         0.6564
  - Banking77                0.8027
  - Emotion                  0.5101
Clustering                   0.1771
  - TwentyNewsgroups         0.1771
Overall MTEB                 0.4722
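
These scores can be reproduced with the mteb package. The snippet below is a minimal sketch, assuming the MTEB(tasks=[...]) constructor of the mteb library and task names from the MTEB registry; loading via SentenceTransformer gives an encoder interface mteb accepts:

from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Load the model through sentence-transformers for compatibility with mteb
model = SentenceTransformer("tss-deposium/m2v-bge-m3-1024d")

# Evaluate on a subset of the tasks reported above
evaluation = MTEB(tasks=["STSBenchmark", "Banking77Classification"])
results = evaluation.run(model, output_folder="results/m2v-bge-m3-1024d")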

Comparison with Other Models

Model              STS     Classification  Size    Latency
M2V-BGE-M3-1024d   0.5831  0.6564          499 MB  <1ms
M2V-Qwen3-0.6B     0.4845  0.5949          302 MB  ~1ms
POTION-base-8M     ~0.52   ~0.55           30 MB   <1ms

Installation

pip install model2vec
# or
pip install sentence-transformers

Usage

Using Model2Vec (Fastest)

from model2vec import StaticModel

# Load the model
model = StaticModel.from_pretrained("tss-deposium/m2v-bge-m3-1024d")

# Compute embeddings
embeddings = model.encode(["Hello world", "Bonjour le monde"])
print(embeddings.shape)  # (2, 1024)

Using Sentence Transformers

from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("tss-deposium/m2v-bge-m3-1024d")

# Compute embeddings
embeddings = model.encode(["Hello world", "Bonjour le monde"])

Semantic Similarity Example

from model2vec import StaticModel
import numpy as np

model = StaticModel.from_pretrained("tss-deposium/m2v-bge-m3-1024d")

# Similar sentences
sent1 = "I want to find financial documents"
sent2 = "Looking for finance-related files"

# Different sentence
sent3 = "The weather is nice today"

emb1, emb2, emb3 = model.encode([sent1, sent2, sent3])

def cosine_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(f"Similar: {cosine_sim(emb1, emb2):.3f}")    # ~0.85
print(f"Different: {cosine_sim(emb1, emb3):.3f}")  # ~0.55

Model Details

  • Base Model: BAAI/bge-m3
  • Distillation Method: Model2Vec with PCA (1024 dimensions)
  • Embedding Dimension: 1024
  • Max Sequence Length: 8192 tokens
  • Languages: 100+ (multilingual)
  • Model Size: ~499 MB

Use Cases

  • Semantic search and retrieval (see the sketch after this list)
  • Document similarity
  • Text classification (via kNN)
  • Clustering
  • RAG (Retrieval Augmented Generation) pipelines
  • Real-time applications requiring ultra-low latency
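
To illustrate the search and retrieval use case, here is a minimal sketch using only numpy; the corpus and query below are made up for the example:

from model2vec import StaticModel
import numpy as np

model = StaticModel.from_pretrained("tss-deposium/m2v-bge-m3-1024d")

# Hypothetical corpus for illustration
corpus = [
    "Quarterly revenue report for fiscal year 2023",
    "How to reset your account password",
    "Team offsite agenda and travel details",
]
query = "Where can I find last year's financial results?"

# Embed and L2-normalize so the dot product equals cosine similarity
corpus_emb = model.encode(corpus)
query_emb = model.encode([query])[0]
corpus_emb = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
query_emb = query_emb / np.linalg.norm(query_emb)

# Rank documents by similarity to the query
scores = corpus_emb @ query_emb
for i in np.argsort(-scores):
    print(f"{scores[i]:.3f}  {corpus[i]}")

The same pattern scales to RAG retrieval: precompute and normalize the corpus embeddings once, and each query then costs a single matrix-vector product.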

Limitations

  • Static embeddings are context-insensitive: each token maps to a fixed vector, so word-order and word-sense nuances are lost compared to transformer models
  • Lower quality than the full BGE-M3 model (~58% vs ~80% on STS benchmarks)
  • Best suited for applications where speed is critical and this quality trade-off is acceptable

How It Works

Model2Vec distills a Sentence Transformer by:

  1. Passing the model's vocabulary through the base model (BGE-M3)
  2. Reducing the dimensionality of the resulting token embeddings with PCA (to 1024)
  3. Applying SIF weighting to down-weight frequent tokens

At inference time, a sentence embedding is simply the mean of its token embeddings; no transformer forward pass is needed. The result is embeddings that are roughly 500x faster to compute, with only moderate quality loss.
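
For reference, a model like this can be produced with model2vec's distillation API. The following is a minimal sketch, assuming the distill() entry point and pca_dims argument of current model2vec releases; consult the Model2Vec documentation for the exact signature:

from model2vec.distill import distill

# Distill BGE-M3 into a static model with 1024 PCA dimensions
m2v_model = distill(model_name="BAAI/bge-m3", pca_dims=1024)

# Save locally (or push to the Hugging Face Hub)
m2v_model.save_pretrained("m2v-bge-m3-1024d")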

Citation

@article{minishlab2024model2vec,
  author = {Tulkens, Stephan and {van Dongen}, Thomas},
  title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
  year = {2024},
  url = {https://github.com/MinishLab/model2vec}
}

@misc{bge-m3,
  title = {BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation},
  author = {Chen, Jianlv and Xiao, Shitao and Zhang, Peitian and Luo, Kun and Lian, Defu and Liu, Zheng},
  year = {2024},
  eprint = {2402.03216},
  archivePrefix = {arXiv}
}

License

MIT License - Same as base BGE-M3 model.


Created by: The Seed Ship / Deposium Project
