# German-OCR

High-performance German document OCR using fine-tuned Qwen2-VL-2B and Qwen2.5-VL-3B vision-language models.

## Model Description

German-OCR is specifically trained to extract text from German documents including invoices, receipts, forms, and other business documents. It outputs structured text in Markdown format.

- **Base Model:** Qwen/Qwen2-VL-2B-Instruct
- **Fine-tuning:** QLoRA (4-bit quantization)
- **Training Data:** German invoices and business documents
- **Output Format:** Markdown-structured text

## Model Variants

| Model | Size | Base | HuggingFace |
|---|---|---|---|
| german-ocr | 4.4 GB | Qwen2-VL-2B | Keyven/german-ocr |
| german-ocr-3b | 7.5 GB | Qwen2.5-VL-3B | Keyven/german-ocr-3b |

## Usage

### Option 1: Python Package (Recommended)

```bash
pip install german-ocr
```

```python
from german_ocr import GermanOCR

# Using Ollama (fast, local)
ocr = GermanOCR(backend="ollama")
result = ocr.extract("document.png")
print(result)

# Using Transformers (more accurate)
ocr = GermanOCR(backend="transformers")
result = ocr.extract("document.png")
print(result)
```
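To process a whole folder of scans with the same API, the image files can be collected with the standard library first. `find_documents` below is an illustrative helper, not part of the `german-ocr` package; the accepted file formats are an assumption.

```python
from pathlib import Path

# Common raster formats for scanned documents (assumption: the package
# accepts any image file path a typical image loader can open).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}

def find_documents(folder: str) -> list[Path]:
    """Return image files in `folder`, sorted for reproducible runs."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )

# Usage with the package API shown above:
# ocr = GermanOCR(backend="ollama")
# for path in find_documents("scans/"):
#     print(path.name, ocr.extract(str(path)))
```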

### Option 2: Ollama

> [!WARNING]
> In development: vision-adapter compatibility is still being worked on. For stable use, the HuggingFace version is recommended.

```bash
ollama run Keyvan/german-ocr "Extrahiere den Text: image.png"
```

### Option 3: Transformers

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
from PIL import Image

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Keyven/german-ocr",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("Keyven/german-ocr")

image = Image.open("document.png")

# German prompt: "Extract the text from this document."
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Extrahiere den Text aus diesem Dokument."}
    ]
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt"
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
result = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:],
    skip_special_tokens=True
)[0]
print(result)
```

## Performance

| Metric | Value |
|---|---|
| Base Model | Qwen2-VL-2B-Instruct |
| Model Size | 4.4 GB |
| VRAM (4-bit) | 1.5 GB |
| Inference Time | ~15 s (GPU) |
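The ~1.5 GB VRAM figure is consistent with a back-of-envelope estimate: 2B parameters at 4 bits each is about 1 GB of weights, with the remainder going to runtime overhead. A rough sketch (the 1.5x overhead factor is an assumption, not a measured value):

```python
# Rough VRAM estimate for 4-bit inference (illustrative numbers only).
params = 2e9            # ~2B parameters (Qwen2-VL-2B)
bits_per_param = 4      # 4-bit quantization
weight_gb = params * bits_per_param / 8 / 1e9

print(f"weights alone: {weight_gb:.1f} GB")           # ~1.0 GB

# KV cache, activations, and dequantization buffers add overhead;
# a ~1.5x factor (assumption) lands near the reported 1.5 GB.
print(f"with overhead: {weight_gb * 1.5:.1f} GB")     # ~1.5 GB
```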

## Training

- **Method:** QLoRA (4-bit quantization)
- **Epochs:** 3
- **Learning Rate:** 2e-4
- **LoRA Rank:** 64
- **Target Modules:** All linear layers
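The rank-64 LoRA setup keeps the trainable parameter count small: a LoRA adapter on a linear layer of shape (d_out, d_in) adds r·(d_in + d_out) weights on top of the frozen base weights. A quick illustration with a hypothetical 2048×2048 layer (the dimension is an example, not the model's actual width):

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by a rank-r LoRA adapter
    (A: r x d_in plus B: d_out x r)."""
    return r * (d_in + d_out)

d = 2048          # hypothetical hidden size, for illustration
full = d * d      # parameters of the frozen base linear layer
lora = lora_params(d, d, r=64)

print(full, lora, f"{lora / full:.2%}")  # 4194304 262144 6.25%
```

So per layer, the rank-64 adapter trains roughly 6% of what full fine-tuning would, which is what makes QLoRA fit in modest VRAM alongside the 4-bit base weights.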

## Limitations

- Optimized for German documents
- Best results with clear, high-resolution images
- May struggle with handwritten text

## License

Apache 2.0

## Author

Keyvan Hardani
