These have been working well in production at Monkt.com, so I figured I'd share them with the community.
Just straight conversions of the original models—might save you some time if you're building OCR pipelines.
monkt/paddleocr-onnx
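If it helps, here's a minimal sketch of running one of the converted models with onnxruntime. The repo id is the one above, but the file name and the dummy preprocessing are assumptions, so check the repo for the actual layout:

```python
# Minimal sketch: run one of the converted PaddleOCR ONNX models with onnxruntime.
# The repo id is real (monkt/paddleocr-onnx); the file name below is an assumption.
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download

# Download a detection model (file name assumed; check the repo for the real one)
model_path = hf_hub_download("monkt/paddleocr-onnx", "det/model.onnx")

session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# Dummy input: PaddleOCR detection models typically expect NCHW float32;
# the exact resize/normalization is model-specific, so this is illustrative only.
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print([o.shape for o in outputs])
```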
I got way better results now! Just needed to use the recommended version of transformers.
I'll edit the main post when I'm ready with the graphs.
Thanks once more.
Hey Tom,
First, appreciate your work! Thanks for everything you're doing.
I did pass the prompt dict for "intfloat/multilingual-e5-large" to SentenceTransformer, like: prompts = {"query": "query: ", "passage": "passage: "}.
For "google/embeddinggemma-300m", I kept the default: model = SentenceTransformer("google/embeddinggemma-300m") and then evaluated with the MTEB library, assuming that "MTEB will automatically detect and use these prompts if they are defined in your model's configuration," as written here: https://sbert.net/docs/sentence_transformer/usage/mteb_evaluation.html
So in short, I did not add prompts for EmbeddingGemma, but added them for multilingual-e5-large, as per their instructions (I didn't have time to check their model config, but I think they're not added by default).
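For reference, a minimal sketch of that setup, assuming the model ids above (the sample texts are just illustrative):

```python
from sentence_transformers import SentenceTransformer

# multilingual-e5-large: prompts passed explicitly, per the model card instructions
e5 = SentenceTransformer(
    "intfloat/multilingual-e5-large",
    prompts={"query": "query: ", "passage": "passage: "},
)

# EmbeddingGemma: defaults only, relying on prompts defined in the model's own config
gemma = SentenceTransformer("google/embeddinggemma-300m")

# Sanity check that the e5 prompts are applied at encode time
q = e5.encode("What is the capital of France?", prompt_name="query")
p = e5.encode("Paris is the capital of France.", prompt_name="passage")
print(q.shape, p.shape)
```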
BUT, I ran with transformers==4.55.4, so I may need to re-run...
sentence-transformers==5.1.0, which is fine I guess.
Thanks!
pip install git+https://github.com/huggingface/transformers@v4.56.0-Embedding-Gemma-preview
pip install sentence-transformers>=5.0.0
Try reducing gpu_memory_utilization to a lower value.
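In case the gpu_memory_utilization hint is unclear: that's the vLLM engine argument that caps how much GPU memory gets pre-allocated. A minimal sketch, assuming vLLM is what's running out of memory (the model id is just a placeholder):

```python
from vllm import LLM

# gpu_memory_utilization is the fraction of total GPU memory vLLM pre-allocates
# (default 0.9); lowering it is the usual first fix for OOM at engine start-up.
llm = LLM(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model id, not from this thread
    gpu_memory_utilization=0.6,
)
```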
Thank you.
I’m also a big fan of Qwen models. However, in this case, I don’t think they are appropriate because I’m not entirely confident in their capabilities regarding multilingual contexts. That’s why I chose Llama.
Overall, I agree that the Qwen series is excellent for most tasks.
Yeah, the issues are with the tables.
For office formats, it's mostly fine. Have you tried using PDFs or images?
I will work on improving this.