bge-reranker-v2-m3

Multi-format version of BAAI/bge-reranker-v2-m3

Converted for deployment on Modal.com and other platforms.

Model Information

Property           Value
-----------------  ------------------------------------------------------------
Source Model       BAAI/bge-reranker-v2-m3
Formats            SafeTensors FP32 + ONNX FP32 + SafeTensors FP16 + ONNX INT8
Task               text-classification
Trust Remote Code  False

Available Versions

  • safetensors-fp32/: PyTorch FP32 (baseline, highest accuracy; see the loading sketch after this list)
  • onnx-fp32/: ONNX FP32 (portable, cross-platform)
  • safetensors-fp16/: PyTorch FP16 (GPU inference, ~50% smaller)
  • onnx-int8/: ONNX INT8 quantized (CPU inference, ~75% smaller)
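
Each version lives in its own subfolder of the single repo, and transformers selects one via the subfolder argument. A minimal loading sketch, assuming each PyTorch subfolder ships its own tokenizer files (per the notes below):

from transformers import AutoModelForSequenceClassification, AutoTokenizer

REPO_ID = "n24q02m/bge-reranker-v2-m3"
VERSION = "safetensors-fp32"  # or "safetensors-fp16" for GPU

# subfolder= picks the version inside the repo
model = AutoModelForSequenceClassification.from_pretrained(REPO_ID, subfolder=VERSION)
tokenizer = AutoTokenizer.from_pretrained(REPO_ID, subfolder=VERSION)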

Usage

PyTorch (GPU - Modal.com)

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# GPU inference with FP16; select the version via the subfolder argument
model = AutoModelForSequenceClassification.from_pretrained(
    "n24q02m/bge-reranker-v2-m3",
    subfolder="safetensors-fp16",
    torch_dtype=torch.float16,
).cuda()
model.eval()
tokenizer = AutoTokenizer.from_pretrained(
    "n24q02m/bge-reranker-v2-m3",
    subfolder="safetensors-fp16",
)

# Rerank inference: score query-passage pairs
pairs = [["what is panda?", "The giant panda is a bear species."]]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt").to("cuda")
    scores = model(**inputs).logits.view(-1).float()
print(scores)
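
A typical reranking call scores several candidate passages against one query and sorts by score. A short sketch reusing the model and tokenizer loaded above (the query and passages are illustrative):

query = "what is panda?"
passages = [
    "The giant panda is a bear species endemic to China.",
    "pandas is a Python library for data analysis.",
]
pairs = [[query, p] for p in passages]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt").to("cuda")
    logits = model(**inputs).logits.view(-1).float()
# torch.sigmoid(logits) maps raw scores to 0-1 relevance if needed
ranked = sorted(zip(passages, logits.tolist()), key=lambda x: x[1], reverse=True)
for passage, score in ranked:
    print(f"{score:+.3f}  {passage}")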

ONNX Runtime (CPU)

import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# CPU inference with ONNX INT8; ONNX Runtime needs a local file,
# so fetch the model from the repo first
model_path = hf_hub_download(
    repo_id="n24q02m/bge-reranker-v2-m3",
    filename="onnx-int8/model_quantized.onnx",
)
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-v2-m3")
session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

# Inference on a query-passage pair
inputs = tokenizer(
    [["what is panda?", "The giant panda is a bear species."]],
    padding=True, truncation=True, return_tensors="np",
)
# Feed only the inputs the exported graph actually declares
input_names = [inp.name for inp in session.get_inputs()]
feed_dict = {name: inputs[name].astype(np.int64) for name in input_names if name in inputs}
outputs = session.run(None, feed_dict)
scores = outputs[0].reshape(-1)
print(scores)
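
The raw logits are unbounded; as with BGE rerankers generally, a sigmoid maps them to 0-1 relevance scores. In NumPy:

relevance = 1.0 / (1.0 + np.exp(-scores))  # sigmoid: logit -> 0-1 relevance
print(relevance)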

Notes

  1. SafeTensors FP16 is the primary format for GPU inference on Modal.com
  2. Load the tokenizer from the original model or from the same version folder
  3. ONNX INT8 is the CPU fallback format with the smallest footprint (see the fallback sketch after this list)
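
Combining notes 1 and 3, a minimal fallback sketch that prefers FP16 on GPU and drops to ONNX INT8 on CPU (load_reranker is a hypothetical helper, not part of this repo):

import torch

def load_reranker():
    """Prefer FP16 on GPU (note 1), fall back to ONNX INT8 on CPU (note 3)."""
    if torch.cuda.is_available():
        from transformers import AutoModelForSequenceClassification, AutoTokenizer
        model = AutoModelForSequenceClassification.from_pretrained(
            "n24q02m/bge-reranker-v2-m3",
            subfolder="safetensors-fp16",
            torch_dtype=torch.float16,
        ).cuda().eval()
        tokenizer = AutoTokenizer.from_pretrained(
            "n24q02m/bge-reranker-v2-m3", subfolder="safetensors-fp16"
        )
        return "torch", model, tokenizer
    import onnxruntime as ort
    from huggingface_hub import hf_hub_download
    from transformers import AutoTokenizer
    model_path = hf_hub_download(
        repo_id="n24q02m/bge-reranker-v2-m3",
        filename="onnx-int8/model_quantized.onnx",
    )
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-v2-m3")
    return "onnx", session, tokenizer

backend, engine, tokenizer = load_reranker()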

License

Apache 2.0 (following the original model license)

Credits

Original model: BAAI/bge-reranker-v2-m3
