Susant Achary
Susant-Achary
AI & ML interests
Tiny to Small Language Models, Building from India. Quantization and MLX
Recent Activity
liked a model 21 days ago: mlx-community/medgemma-27b-it-8bit
upvoted an article 22 days ago: We’re open-sourcing our text-to-image model and the process behind it
liked a model 29 days ago: vandijklab/C2S-Scale-Gemma-2-27B
<7B Best of MoE 🧠
Collection of small-size, big-impact MoE models.
- LiquidAI/LFM2-8B-A1B: Text Generation • 8B • 15.8k • 261
- ibm-granite/granite-4.0-h-tiny: Text Generation • 7B • 28.9k • 169
- microsoft/Phi-4-multimodal-instruct: Automatic Speech Recognition • 6B • 403k • 1.54k
- google/gemma-3n-E4B-it: Image-Text-to-Text • 8B • 67.6k • 824
Audio Features
Feature Extraction with 🧠 Text Embeddings
Models for turning text, images, audio (and combinations) into useful vectors or feature maps. Ideal for search/RAG, clustering, recommendation, and retrieval.
🪶 Sept’25 <Text Generation Language Models >(Top Releases)
Coding models and pipelines released this month that boost repo-level reasoning, GUI automation, and tool use. Focused on practical editing.
🖼️ Text2Image, i2i September ’25 (Top Releases)
Cutting-edge image generation & VLM updates from September ’25. This collection spotlights models that improved text rendering, layout control & more.
📄➡️🔊 Text-to-Speech (TTS)
Speech synthesis models that turn text into natural audio. Includes multilingual TTS, low-latency real-time models, and voice-cloning variants.
📚➡️🎨Text-to-Image
State-of-the-art diffusion and generative models that turn text prompts into detailed images. Includes lightweight CPU-friendly and photorealistic models.
🎨➡️✍️ Image-to-Text
OCR, captioning, and visual QA models that turn pure images into descriptive or structured text.
- Salesforce/blip-image-captioning-base: Image-to-Text • 2.36M • 820
- Salesforce/blip-image-captioning-large: Image-to-Text • 0.5B • 1.05M • 1.44k
- nlpconnect/vit-gpt2-image-captioning: Image-to-Text • 1.32M • 920
- microsoft/trocr-base-handwritten: Image-to-Text • 0.3B • 107k • 466
🌀 Any-to-Any Multimodal Models
Models that can flexibly convert across modalities (text, image, audio, video). Ideal for researchers exploring unified multimodal AI.
👨💻Mathematical Reasoning 🧮
Datasets tackling AI's toughest challenges
🧩 Long-Context Models (≥128k) CODING
10 CODING models that support ≥128k context (native or via officially documented scaling)
- meta-llama/Llama-3.1-8B-Instruct: Text Generation • 8B • 5.08M • 5.08k
- google/gemma-3-4b-it: Image-Text-to-Text • 4B • 1.04M • 1.01k
- Qwen/Qwen3-Coder-30B-A3B-Instruct: Text Generation • 31B • 1.14M • 796
- deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct: Text Generation • 16B • 229k • 505
🧩 Long-Context Models (≥128k) under 8B
Qwen3
Best of Qwen3 Series of Models
- Qwen/Qwen3-30B-A3B-Instruct-2507: Text Generation • 31B • 598k • 682
- Qwen/Qwen3-Next-80B-A3B-Thinking: Text Generation • 81B • 180k • 452
- Qwen/Qwen3-Coder-30B-A3B-Instruct: Text Generation • 31B • 1.14M • 796
- Qwen/Qwen3-Omni-30B-A3B-Instruct: Any-to-Any • 35B • 278k • 743
🛩️Qwen3-VL
The most powerful vision-language model in the Qwen series to date. Available in dense and MoE architectures.
- Qwen/Qwen3-VL-30B-A3B-Thinking: Image-Text-to-Text • 31B • 54.3k • 164
- mlx-community/Qwen3-VL-30B-A3B-Instruct-4bit: Image-Text-to-Text • 205 • 5
- mlx-community/Qwen3-VL-30B-A3B-Instruct-8bit: Image-Text-to-Text • 99 • 2
- mlx-community/Qwen3-VL-8B-Instruct-4bit: Image-Text-to-Text • 296 • 3
🍎 MLX-Quantized Models (3/4/5/6-bit) Mac & iOS
Curated MLX-ready quantized LLMs that run fast on Apple Silicon (and some on iOS). Every card lists Bits · Group size · Peak UM (GB) · Stable context.
- mlx-community/Apriel-1.5-15b-Thinker-3bit-MLX: Image-Text-to-Text • 7
- mlx-community/Apriel-1.5-15b-Thinker-6bit-MLX: Image-Text-to-Text • 71 • 1
- mlx-community/granite-4.0-h-tiny-3bit-MLX: Text Generation • 0.9B • 42 • 2
- mlx-community/granite-4.0-tiny-preview-4bit: Text Generation • 1B • 8
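As a rough illustration of how the Bits and Group size fields relate to memory, the sketch below estimates weight-only storage for a group-quantized model. It assumes each group of weights stores one 16-bit scale and one 16-bit bias alongside the packed weights, and it uses a hypothetical 7B-parameter model that is not taken from the cards above. Peak unified memory in practice would be higher, since the KV cache and activations are not counted here.

```python
def effective_bits_per_weight(bits: int, group_size: int, overhead_bits: int = 32) -> float:
    """Bits stored per weight once per-group metadata is included.

    Assumption: every group of `group_size` weights carries one 16-bit
    scale and one 16-bit bias (32 bits of overhead per group).
    """
    return bits + overhead_bits / group_size


def weight_memory_gb(n_params: float, bits: int, group_size: int) -> float:
    """Approximate weight-only memory in GB (1 GB = 10**9 bytes)."""
    total_bits = n_params * effective_bits_per_weight(bits, group_size)
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    # Hypothetical 7B-parameter model, group size 64 (illustrative values only).
    for bits in (3, 4, 5, 6):
        gb = weight_memory_gb(7e9, bits, 64)
        print(f"{bits}-bit, group 64: ~{gb:.2f} GB of weights")
```

Under these assumptions, a smaller group size raises the effective bits per weight (more scales and biases to store), which is one reason the same bit width can show different memory figures.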
🖼️ Vision Backbones & Image Embeddings
- facebook/dinov2-base: Image Feature Extraction • 86.6M • 1.39M • 161
- openai/clip-vit-large-patch14-336: Zero-Shot Image Classification • 5.66M • 281
- google/siglip-so400m-patch14-384: Zero-Shot Image Classification • 0.9B • 2M • 624
- BAAI/EVA-CLIP-8B: Feature Extraction • 2.76k • 50
🧊Sept 25 <Image-to-3D> [Top Releases]
Models that turn a single image (or image + prompt) into 3D assets (meshes, Gaussians, or point clouds) suited for AR/VR, product turntables, and game props.
🎬 ✍️ Sept 25 <Video & Text2Video> (Top Releases)
Open T2V and animation models emphasizing temporal coherence, controllability, and real-time playback. A great starting point for creative tools and ads.
Top Apache 2.0 License
Free and open source, provided you don't redistribute the model and claim the rights to it.
- openai/whisper-large-v3: Automatic Speech Recognition • 2B • 4.95M • 5.17k
- facebook/wav2vec2-base-960h: Automatic Speech Recognition • 94.4M • 1.96M • 383
- openai/whisper-small: Automatic Speech Recognition • 0.2B • 4.4M • 491
- openai/whisper-tiny: Automatic Speech Recognition • 37.8M • 1.09M • 386
✍️➡️🎬 Text-to-Video
Models that create short videos from written prompts. Perfect for experimentation in generative video and creative storytelling.
🖌️ Image-to-Image
Image editing and transformation models, from style transfer to super-resolution, inpainting, and diffusion-based edits.
🖼️➡️📚 Image-Text-to-Text
Multimodal models that take image + text as input and produce natural language output. Use cases: chart QA, visual document reasoning, VQA.
- Qwen/Qwen2.5-VL-7B-Instruct: Image-Text-to-Text • 8B • 3.44M • 1.38k
- Qwen/Qwen2.5-VL-3B-Instruct: Image-Text-to-Text • 4B • 7.96M • 566
- google/gemma-3-4b-it: Image-Text-to-Text • 4B • 1.04M • 1.01k
- nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1: Image-Text-to-Text • 1.01M • 168
✍️ Text Generation
Collection of top open LLMs for writing, summarization, chat, reasoning, and document drafting. Includes small SLMs for devices and large models.
🧠General Purpose Dataset < 10M samples
Datasets for 🌐 chat, ⚡ code, and 🧮 reasoning
🍎 MLX-Ready LLMs
Models with MLX weights, proven for MLX inference
- mlx-community/gpt-oss-20b-MXFP4-Q8: Text Generation • 21B • 766k • 20
- lmstudio-community/Seed-OSS-36B-Instruct-MLX-4bit: Text Generation • 36B • 56.1k
- lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit: Text Generation • 0.6B • 98.8k • 9
- mlx-community/parakeet-tdt-0.6b-v2: Automatic Speech Recognition • 0.6B • 300k • 32
📱 On-Device-Ready SLMs (≤4B)
Tiny, fast models that run on iPhone/iPad or Mac with very low memory. Great for quick replies, offline note-assist, and routing.
- lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit: Text Generation • 1B • 96.6k • 7
- lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit: Text Generation • 1B • 234k • 6
- lmstudio-community/gemma-3n-E4B-it-MLX-4bit: Image-Text-to-Text • 138k • 1
- mlx-community/gemma-3-4b-it-qat-4bit: Image-Text-to-Text • 0.9B • 41.1k • 5
GPT2-JungleBook-from-Scratch-Models
The primary objective of this project is to explore and analyze the impact of model size on text generation quality, using the GPT-2 architecture trained from scratch.
Vision-LM