bge-m3

Multi-format version of BAAI/bge-m3

Converted for deployment on Modal.com and other platforms.

Model Information

| Property | Value |
|---|---|
| Source Model | BAAI/bge-m3 |
| Formats | SafeTensors FP32 + ONNX FP32 + SafeTensors FP16 + ONNX INT8 |
| Task | feature-extraction |
| Trust Remote Code | False |

Available Versions

  • safetensors-fp32/: PyTorch FP32 (baseline, highest accuracy)
  • onnx-fp32/: ONNX FP32 (portable, cross-platform)
  • safetensors-fp16/: PyTorch FP16 (GPU inference, ~50% smaller)
  • onnx-int8/: ONNX INT8 quantized (CPU inference, ~75% smaller; see the download sketch after this list)
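
To fetch only one variant instead of the whole repository, huggingface_hub can filter by folder. A minimal sketch, assuming the folder layout listed above:

from huggingface_hub import snapshot_download

# Download only the INT8 ONNX variant (folder name from the list above)
local_dir = snapshot_download(
    repo_id="n24q02m/bge-m3",
    allow_patterns=["onnx-int8/*"],
)
print(local_dir)  # local cache path containing onnx-int8/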

Usage

PyTorch (GPU - Modal.com)

from transformers import AutoModel, AutoTokenizer
import torch

# GPU inference with FP16 (recommended for Modal.com)
# The variants live in subfolders of the repo, so pass subfolder=
model = AutoModel.from_pretrained(
    "n24q02m/bge-m3",
    subfolder="safetensors-fp16",
    torch_dtype=torch.float16
).cuda()
tokenizer = AutoTokenizer.from_pretrained(
    "n24q02m/bge-m3",
    subfolder="safetensors-fp16"
)

# Inference
inputs = tokenizer("Hello world", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state.mean(dim=1)  # Mean pooling
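
For sentence similarity, the pooled vectors are usually L2-normalized so their dot product gives the cosine similarity. A minimal sketch continuing from the model and tokenizer above (the sentences are placeholders):

import torch.nn.functional as F

sentences = ["What is BGE-M3?", "BGE-M3 is a multilingual embedding model."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model(**batch)
    # Mask-aware mean pooling so padding tokens do not dilute the embedding
    mask = batch["attention_mask"].unsqueeze(-1).to(out.last_hidden_state.dtype)
    emb = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
    emb = F.normalize(emb, p=2, dim=1)
similarity = (emb[0] @ emb[1]).item()  # cosine similarity
print(similarity)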

ONNX Runtime (CPU)

import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# CPU inference with ONNX
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
# InferenceSession needs a local file, so fetch the ONNX weights from the repo first
model_path = hf_hub_download("n24q02m/bge-m3", "onnx-int8/model_quantized.onnx")
session = ort.InferenceSession(
    model_path,
    providers=["CPUExecutionProvider"]
)

# Inference
inputs = tokenizer("Hello world", return_tensors="np")
input_names = [inp.name for inp in session.get_inputs()]
feed_dict = {}
for name in input_names:
    if name in inputs:
        feed_dict[name] = inputs[name].astype(np.int64)
outputs = session.run(None, feed_dict)
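
The raw outputs still need pooling. A hedged sketch, assuming outputs[0] is the last hidden state with shape (batch, seq_len, hidden):

# Assumes the first output is the last hidden state: (batch, seq_len, hidden)
last_hidden = outputs[0]
mask = inputs["attention_mask"][..., np.newaxis].astype(last_hidden.dtype)
# Mask-aware mean pooling, then L2 normalization for cosine similarity
embeddings = (last_hidden * mask).sum(axis=1) / mask.sum(axis=1)
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
print(embeddings.shape)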

Notes

  1. SafeTensors FP16 is the primary format for GPU inference on Modal.com.
  2. Load the tokenizer from the original model or from the same subfolder (see the sketch below).
  3. ONNX INT8 is the CPU fallback format with the smallest footprint.
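
For note 2, a small sketch of the two tokenizer options; the subfolder variant assumes the tokenizer files were copied alongside the weights:

from transformers import AutoTokenizer

# Option A: tokenizer from the original model
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")

# Option B: tokenizer from the same subfolder of this repo
# (assumes tokenizer files are present in safetensors-fp16/)
tokenizer = AutoTokenizer.from_pretrained("n24q02m/bge-m3", subfolder="safetensors-fp16")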

License

Apache 2.0 (following the original model license)

Credits

  • Original model: BAAI/bge-m3
  • Conversion: Optimum + PyTorch