# German-OCR

High-performance German document OCR using fine-tuned Qwen2-VL-2B and Qwen2.5-VL-3B vision-language models.

## Model Description

German-OCR is specifically trained to extract text from German documents including invoices, receipts, forms, and other business documents. It outputs structured text in Markdown format.

- **Base Model:** Qwen/Qwen2-VL-2B-Instruct
- **Fine-tuning:** QLoRA (4-bit quantization)
- **Training Data:** German invoices and business documents
- **Output Format:** Markdown-structured text

## Model Variants

| Model | Size | Base | HuggingFace |
|---|---|---|---|
| german-ocr | 4.4 GB | Qwen2-VL-2B | Keyven/german-ocr |
| german-ocr-3b | 7.5 GB | Qwen2.5-VL-3B | Keyven/german-ocr-3b |

## Usage

### Option 1: Python Package (Recommended)

```bash
pip install german-ocr
```

```python
from german_ocr import GermanOCR

# Using Ollama (fast, local)
ocr = GermanOCR(backend="ollama")
result = ocr.extract("document.png")
print(result)

# Using Transformers (more accurate)
ocr = GermanOCR(backend="transformers")
result = ocr.extract("document.png")
print(result)
```
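To process a whole folder of scans with the same API, the image files can be collected with the standard library first. `find_documents` below is an illustrative helper, not part of the `german-ocr` package; the accepted file formats are an assumption.

```python
from pathlib import Path

# Common raster formats for scanned documents (assumption: the package
# accepts any image file path a typical image loader can open).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}

def find_documents(folder: str) -> list[Path]:
    """Return image files in `folder`, sorted for reproducible runs."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )

# Usage with the package API shown above:
# ocr = GermanOCR(backend="ollama")
# for path in find_documents("scans/"):
#     print(path.name, ocr.extract(str(path)))
```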

### Option 2: Ollama

> [!WARNING]
> In development: vision-adapter compatibility is still being worked on. For stable use, the HuggingFace version is recommended.

```bash
ollama run Keyvan/german-ocr "Extrahiere den Text: image.png"
```

### Option 3: Transformers

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
from PIL import Image

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Keyven/german-ocr",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("Keyven/german-ocr")

image = Image.open("document.png")

# German prompt: "Extract the text from this document."
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Extrahiere den Text aus diesem Dokument."}
    ]
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt"
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
result = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:],
    skip_special_tokens=True
)[0]
print(result)
```

## Performance

| Metric | Value |
|---|---|
| Base Model | Qwen2-VL-2B-Instruct |
| Model Size | 4.4 GB |
| VRAM (4-bit) | 1.5 GB |
| Inference Time | ~15 s (GPU) |
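The ~1.5 GB VRAM figure is consistent with a back-of-envelope estimate: 2B parameters at 4 bits each is about 1 GB of weights, with the remainder going to runtime overhead. A rough sketch (the 1.5x overhead factor is an assumption, not a measured value):

```python
# Rough VRAM estimate for 4-bit inference (illustrative numbers only).
params = 2e9            # ~2B parameters (Qwen2-VL-2B)
bits_per_param = 4      # 4-bit quantization
weight_gb = params * bits_per_param / 8 / 1e9

print(f"weights alone: {weight_gb:.1f} GB")           # ~1.0 GB

# KV cache, activations, and dequantization buffers add overhead;
# a ~1.5x factor (assumption) lands near the reported 1.5 GB.
print(f"with overhead: {weight_gb * 1.5:.1f} GB")     # ~1.5 GB
```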

## Training

- **Method:** QLoRA (4-bit quantization)
- **Epochs:** 3
- **Learning Rate:** 2e-4
- **LoRA Rank:** 64
- **Target Modules:** All linear layers
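The rank-64 LoRA setup keeps the trainable parameter count small: a LoRA adapter on a linear layer of shape (d_out, d_in) adds r·(d_in + d_out) weights on top of the frozen base weights. A quick illustration with a hypothetical 2048×2048 layer (the dimension is an example, not the model's actual width):

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by a rank-r LoRA adapter
    (A: r x d_in plus B: d_out x r)."""
    return r * (d_in + d_out)

d = 2048          # hypothetical hidden size, for illustration
full = d * d      # parameters of the frozen base linear layer
lora = lora_params(d, d, r=64)

print(full, lora, f"{lora / full:.2%}")  # 4194304 262144 6.25%
```

So per layer, the rank-64 adapter trains roughly 6% of what full fine-tuning would, which is what makes QLoRA fit in modest VRAM alongside the 4-bit base weights.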

## Limitations

- Optimized for German documents
- Best results with clear, high-resolution images
- May struggle with handwritten text

## License

Apache 2.0

## Author

Keyvan Hardani
