CernisOCR

A vision-language OCR model fine-tuned from Qwen2.5-VL-7B-Instruct that handles mathematical formulas, handwritten text, and structured documents in a single model.

Model Description

CernisOCR is a vision-language model optimized for diverse OCR tasks across multiple document domains. Unlike domain-specific OCR models, it unifies three traditionally separate OCR tasks in a single, efficient model:

  • Mathematical LaTeX conversion: Converts handwritten or printed mathematical formulas to LaTeX notation
  • Handwritten text transcription: Transcribes cursive and printed handwriting
  • Structured document extraction: Extracts structured data from invoices and receipts

Key Features:

  • Multi-domain capability in a single model
  • Handles varied image types, layouts, and text styles
  • Extracts both raw text and structured information
  • Robust to noise and variable image quality

Training Details

  • Base Model: Qwen2.5-VL-7B-Instruct
  • Training Data: ~10,000 samples from three domains:
    • LaTeX OCR: 3,978 samples (mathematical notation)
    • Invoices & Receipts: 2,043 samples (structured documents)
    • Handwritten Text: 3,978 samples (handwriting transcription)
  • Fine-tuning Method: LoRA (Low-Rank Adaptation); an illustrative setup is sketched after this list
  • Training Loss: Reduced from 4.802 to 0.116 (97.6% improvement)
  • Training Time: ~8.7 minutes on an RTX 5090
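
As a rough illustration of the LoRA setup, the sketch below shows how a Qwen2.5-VL-7B-Instruct base can be wrapped with LoRA adapters using Unsloth. The rank, alpha, and other hyperparameters are placeholders, not the exact values used to train CernisOCR:

from unsloth import FastVisionModel

# Load the base model in 4-bit and attach LoRA adapters.
# Hyperparameters below are illustrative placeholders, not CernisOCR's actual config.
model, tokenizer = FastVisionModel.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    load_in_4bit=True,
)
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,     # adapt the vision encoder
    finetune_language_layers=True,   # adapt the language model
    r=16,                            # LoRA rank (assumed)
    lora_alpha=16,                   # LoRA scaling (assumed)
    lora_dropout=0.0,
)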

Intended Use

This model is designed for:

  • Mathematical formula recognition and LaTeX conversion
  • Handwritten text transcription
  • Invoice and receipt data extraction
  • Multi-domain document processing workflows
  • Applications requiring unified OCR across different document types

How to Use

from unsloth import FastVisionModel
from PIL import Image

# Load model and tokenizer
model, tokenizer = FastVisionModel.from_pretrained(
    "coolAI/cernis-ocr",  # or "coolAI/cernis-vision-ocr" for merged model
    load_in_4bit=True,
)
FastVisionModel.for_inference(model)

# Example 1: LaTeX conversion
image = Image.open("formula.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Write the LaTeX representation for this image."}
    ]
}]

# Example 2: Handwritten transcription
image = Image.open("handwriting.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Transcribe the handwritten text in this image."}
    ]
}]

# Example 3: Invoice extraction
image = Image.open("invoice.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Extract and structure all text content from this invoice/receipt image."}
    ]
}]

# Generate (run after choosing one of the prompts above)
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=2048, temperature=0.7, use_cache=True)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
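
For convenience, the pattern above can be wrapped in a small helper. The run_ocr function below is illustrative (it is not part of the model or of Unsloth); it also decodes only the newly generated tokens so the prompt is not echoed in the output:

def run_ocr(image_path, prompt):
    image = Image.open(image_path)
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": prompt}
        ]
    }]
    input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    inputs = tokenizer(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=2048, temperature=0.7, use_cache=True)
    # Keep only the newly generated tokens, then decode
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

latex = run_ocr("formula.png", "Write the LaTeX representation for this image.")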

Citation

If you use this model, please cite:

@misc{cernis-ocr,
  title={CernisOCR: A Unified Multi-Domain OCR Model},
  author={Cernis AI},
  year={2025},
  howpublished={\url{https://huggingface.co/coolAI/cernis-ocr}}
}

Acknowledgments

Built using Unsloth for efficient fine-tuning. Training data sourced from publicly available OCR datasets on Hugging Face.
