OlmOCR Bench
v1.0. 7,010 pass/fail tests built by Allen AI. Feed a page image in, get structured text out. Scored on five axes: does the model render math correctly, preserve table layout, pick up headers and captions, suppress watermarks and page numbers, and maintain reading order. Sources include arXiv papers, old scans, financial filings, and multi-column layouts.
Overall Score = macro average across 8 dataset categories (seven of which appear as columns in the table below)
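The macro average weights every category equally, no matter how many individual pass/fail tests each one contains. A minimal sketch in Python, assuming each category score is a 0–100 pass rate; since the table shows only seven of the eight categories, averaging the visible columns will not exactly reproduce the published Overall:

```python
# Macro average: each category contributes equally, regardless of
# how many individual pass/fail tests it contains.
def overall_score(category_scores):
    """Unweighted mean of per-category pass rates (0-100)."""
    return sum(category_scores.values()) / len(category_scores)

# The seven categories visible in the rankings table, using the
# Nanonets OCR-3 row as sample data.
nanonets = {
    "ArXiv Math": 89.2,
    "H&F": 96.6,
    "Long/Tiny": 93.4,
    "Multi-Col": 87.6,
    "Old Scans": 49.6,
    "Scans Math": 88.9,
    "Tables": 94.2,
}

print(round(overall_score(nanonets), 1))  # 85.6 over the seven shown
# columns; the published 87.4 macro-averages all eight categories.
```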
Rankings
| # | Model | Vendor | Overall | ArXiv Math | H&F | Long/Tiny | Multi-Col | Old Scans | Scans Math | Tables |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Nanonets OCR-3 | Nanonets | 87.4 | 89.2 | 96.6 | 93.4 | 87.6 | 49.6 | 88.9 | 94.2 |
| 2 | Datalab Marker | Datalab | 83.2 | 83.8 | 88.6 | 90.3 | 80.9 | 51.9 | 86.7 | 83.4 |
| 3 | Nanonets OCR2+ | Nanonets | 82.0 | 83.8 | 96.8 | 89.4 | 83.9 | 42.4 | 74.5 | 86.3 |
| 4 | Qwen3-VL-Plus | Alibaba | 77.9 | 88.3 | 34.6 | 88.0 | 86.7 | 51.0 | 88.0 | 86.6 |
| 5 | Qwen3.5-9B | Alibaba | 77.2 | 86.0 | 48.8 | 84.8 | 83.4 | 46.8 | 82.5 | 86.3 |
| 6 | Qwen3-VL-235B | Alibaba | 76.8 | 88.4 | 33.6 | 88.9 | 85.9 | 49.6 | 81.2 | 86.7 |
| 7 | Qwen3.5-4B | Alibaba | 75.4 | 86.7 | 47.2 | 83.9 | 79.2 | 41.1 | 81.9 | 85.0 |
| 8 | GPT-5.4 | OpenAI | 73.4 | 83.1 | 20.1 | 82.6 | 83.7 | 43.9 | 82.3 | 91.1 |
| 9 | Qwen3.5-2B | Alibaba | 71.9 | 82.1 | 49.6 | 77.1 | 74.7 | 38.0 | 75.3 | 80.7 |
| 10 | Mistral Small 4 | Mistral AI | 69.6 | 64.2 | 41.3 | 71.3 | 82.4 | 36.5 | 77.5 | 84.0 |
| 11 | Claude Sonnet 4.6 | Anthropic | 69.3 | 87.5 | 37.4 | 71.3 | 78.5 | 19.8 | 84.3 | 86.0 |
| 12 | Claude Opus 4.6 | Anthropic | 69.3 | 86.6 | 35.5 | 73.8 | 79.5 | 20.5 | 83.0 | 84.5 |
| 13 | GPT-5.2 | OpenAI | 68.7 | 78.4 | 26.2 | 83.3 | 81.0 | 40.9 | 84.5 | 86.8 |
| 14 | GLM-OCR | Zhipu AI | 68.4 | 67.3 | 89.2 | 35.7 | 80.3 | 41.3 | 75.5 | 59.0 |
| 15 | Gemini-3-Pro | Google | 67.7 | 87.1 | 26.7 | 90.5 | 77.3 | 43.7 | 75.1 | 73.6 |
| 16 | Gemini-3-Flash | Google | 65.3 | 80.1 | 27.4 | 90.3 | 75.3 | 45.8 | 73.6 | 64.6 |
| 17 | Qwen3.5-0.8B | Alibaba | 64.8 | 73.8 | 62.2 | 62.0 | 70.0 | 31.6 | 60.3 | 63.8 |
| 18 | Gemini 3.1 Pro | Google | 60.7 | 74.4 | 33.2 | 84.6 | 72.7 | 42.2 | 72.5 | 45.1 |
| 19 | GPT-5-Mini | OpenAI | 59.3 | 50.1 | 25.5 | 84.6 | 82.9 | 41.1 | 60.5 | 70.6 |
| 20 | Qwen-VL-OCR | Alibaba | 59.0 | 66.8 | 24.7 | 74.7 | 75.1 | 37.3 | 50.7 | 43.0 |
| 21 | Ministral-8B | Mistral AI | 58.7 | 48.9 | 53.0 | 56.6 | 82.4 | 27.9 | 62.9 | 79.3 |
| 22 | Claude Haiku 4.5 | Anthropic | 58.2 | 55.4 | 37.6 | 36.2 | 62.8 | 19.6 | 79.5 | 83.4 |
| 23 | GPT-4.1 | OpenAI | 54.0 | 58.2 | 32.1 | 53.4 | 66.3 | 37.6 | 71.4 | 59.2 |
| 24 | Llama-3.2-Vision-11B | Meta | 49.1 | 37.2 | 68.3 | 34.2 | 59.5 | 29.8 | 53.7 | 60.9 |
| 25 | Pixtral-12B | Mistral AI | 38.3 | 30.0 | 72.8 | 12.2 | 26.8 | 29.1 | 50.7 | 46.8 |
| 26 | GPT-5-Nano | OpenAI | 26.8 | 2.5 | 41.6 | 5.0 | 52.1 | 29.7 | 1.7 | 55.2 |
| 27 | Gemma-3-12B-IT | Google | 20.6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
Metrics
All category scores are pass rates on a 0–100 scale; higher is better for every metric:
ArXiv Math, H&F (headers & footers), Long/Tiny (long/tiny text), Multi-Col (multi-column), Old Scans, Scans Math, Tables