OlmOCR Bench

v1.0

7,010 pass/fail tests built by Allen AI. Feed a page image in, get structured text out. The tests check five things: whether the model renders math correctly, preserves table layout, picks up headers and captions, suppresses watermarks and page numbers, and maintains reading order. Sources include arXiv papers, old scans, financial filings, and multi-column layouts.
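To make "pass/fail test" concrete, here is a minimal sketch of how such tests could be checked against a model's output and rolled up into a per-category percentage. It assumes three hypothetical test kinds that mirror the checks described above: a snippet must be present (headers, captions), a snippet must be absent (page numbers, watermarks), or one snippet must appear before another (reading order). The `PageTest` class and the `passes` and `category_scores` functions are illustrative names, not the olmOCR-bench code, and exact substring matching is an assumption; the real benchmark may match text more loosely.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Literal


# Hypothetical test record for illustration only; not the olmOCR-bench schema.
@dataclass
class PageTest:
    page: str                                    # which page image the test belongs to
    category: str                                # e.g. "H&F" or "Multi-Col"
    kind: Literal["present", "absent", "order"]
    text: str
    after: str = ""                              # "order" tests only: `text` must precede `after`


def passes(test: PageTest, output: str) -> bool:
    """Check one test against the model's text for that page.

    Assumption: exact substring matching; the real benchmark may be fuzzier.
    """
    if test.kind == "present":
        return test.text in output
    if test.kind == "absent":
        return test.text not in output
    # "order": both snippets must be found, with `text` first.
    first, second = output.find(test.text), output.find(test.after)
    return first != -1 and second != -1 and first < second


def category_scores(tests: list[PageTest], outputs: dict[str, str]) -> dict[str, float]:
    """Percentage of tests passed in each category."""
    passed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for t in tests:
        total[t.category] += 1
        passed[t.category] += int(passes(t, outputs[t.page]))
    return {c: 100.0 * passed[c] / total[c] for c in total}


# Toy usage: one page of made-up model output and four made-up tests.
outputs = {"p1": "Introduction\nWe study OCR.\nConclusion"}
tests = [
    PageTest("p1", "H&F", "absent", "Page 12"),          # page number suppressed
    PageTest("p1", "Long/Tiny", "present", "We study OCR."),
    PageTest("p1", "Multi-Col", "order", "Introduction", "Conclusion"),
    PageTest("p1", "Multi-Col", "order", "Conclusion", "Introduction"),
]
print(category_scores(tests, outputs))
# {'H&F': 100.0, 'Long/Tiny': 100.0, 'Multi-Col': 50.0}
```

Because every test is binary, a category score is simply the share of its tests that pass, which is why all scores in the rankings fall between 0 and 100.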

Models Evaluated

27

Dataset Size

1,403 pages · 7,010 tests

Metrics

7

Source

View on GitHub

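Overall score = macro (unweighted) average across 8 dataset categories. Only 7 of these appear as columns in the rankings, so Overall cannot in general be recomputed from the displayed columns alone.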
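A macro average gives every category equal weight, regardless of how many tests it contains. The sketch below contrasts it with a test-weighted (micro) average on made-up counts, then applies the macro average to Nanonets OCR-3's seven displayed category scores copied from the rankings. The function names are illustrative, not part of the benchmark's tooling.

```python
from statistics import mean


def macro_average(category_scores: dict[str, float]) -> float:
    """Unweighted mean of per-category percentages: each category counts once,
    no matter how many tests it contains."""
    return mean(category_scores.values())


def micro_average(counts: dict[str, tuple[int, int]]) -> float:
    """Test-weighted alternative, shown for contrast: pool all (passed, total)
    counts, so large categories dominate the result."""
    passed = sum(p for p, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return 100.0 * passed / total


# Made-up counts: category A has 100 tests, category B only 10.
toy = {"A": (90, 100), "B": (5, 10)}
print(macro_average({c: 100.0 * p / t for c, (p, t) in toy.items()}))  # 70.0
print(round(micro_average(toy), 2))  # 86.36

# Nanonets OCR-3's seven displayed category scores, copied from the rankings.
COLUMNS = ["ArXiv Math", "H&F", "Long/Tiny", "Multi-Col", "Old Scans", "Scans Math", "Tables"]
nanonets = dict(zip(COLUMNS, [89.2, 96.6, 93.4, 87.6, 49.6, 88.9, 94.2]))
print(round(macro_average(nanonets), 1))  # 85.6
```

Averaging the seven visible columns gives 85.6, below the reported Overall of 87.4, which is consistent with an eighth category that the rankings do not display contributing to the reported score.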

Rankings

#   Model                 Org         Overall  ArXiv Math  H&F   Long/Tiny  Multi-Col  Old Scans  Scans Math  Tables
1   Nanonets OCR-3        Nanonets    87.4     89.2        96.6  93.4       87.6       49.6       88.9        94.2
2   Datalab Marker        Datalab     83.2     83.8        88.6  90.3       80.9       51.9       86.7        83.4
3   Nanonets OCR2+        Nanonets    82.0     83.8        96.8  89.4       83.9       42.4       74.5        86.3
4   Qwen3-VL-Plus         Alibaba     77.9     88.3        34.6  88.0       86.7       51.0       88.0        86.6
5   Qwen3.5-9B            Alibaba     77.2     86.0        48.8  84.8       83.4       46.8       82.5        86.3
6   Qwen3-VL-235B         Alibaba     76.8     88.4        33.6  88.9       85.9       49.6       81.2        86.7
7   Qwen3.5-4B            Alibaba     75.4     86.7        47.2  83.9       79.2       41.1       81.9        85.0
8   GPT-5.4               OpenAI      73.4     83.1        20.1  82.6       83.7       43.9       82.3        91.1
9   Qwen3.5-2B            Alibaba     71.9     82.1        49.6  77.1       74.7       38.0       75.3        80.7
10  Mistral Small 4       Mistral AI  69.6     64.2        41.3  71.3       82.4       36.5       77.5        84.0
11  Claude Sonnet 4.6     Anthropic   69.3     87.5        37.4  71.3       78.5       19.8       84.3        86.0
12  Claude Opus 4.6       Anthropic   69.3     86.6        35.5  73.8       79.5       20.5       83.0        84.5
13  GPT-5.2               OpenAI      68.7     78.4        26.2  83.3       81.0       40.9       84.5        86.8
14  GLM-OCR               Zhipu AI    68.4     67.3        89.2  35.7       80.3       41.3       75.5        59.0
15  Gemini-3-Pro          Google      67.7     87.1        26.7  90.5       77.3       43.7       75.1        73.6
16  Gemini-3-Flash        Google      65.3     80.1        27.4  90.3       75.3       45.8       73.6        64.6
17  Qwen3.5-0.8B          Alibaba     64.8     73.8        62.2  62.0       70.0       31.6       60.3        63.8
18  Gemini 3.1 Pro        Google      60.7     74.4        33.2  84.6       72.7       42.2       72.5        45.1
19  GPT-5-Mini            OpenAI      59.3     50.1        25.5  84.6       82.9       41.1       60.5        70.6
20  Qwen-VL-OCR           Alibaba     59.0     66.8        24.7  74.7       75.1       37.3       50.7        43.0
21  Ministral-8B          Mistral AI  58.7     48.9        53.0  56.6       82.4       27.9       62.9        79.3
22  Claude Haiku 4.5      Anthropic   58.2     55.4        37.6  36.2       62.8       19.6       79.5        83.4
23  GPT-4.1               OpenAI      54.0     58.2        32.1  53.4       66.3       37.6       71.4        59.2
24  Llama-3.2-Vision-11B  Meta        49.1     37.2        68.3  34.2       59.5       29.8       53.7        60.9
25  Pixtral-12B           Mistral AI  38.3     30.0        72.8  12.2       26.8       29.1       50.7        46.8
26  GPT-5-Nano            OpenAI      26.8     2.5         41.6  5.0        52.1       29.7       1.7         55.2
27  Gemma-3-12B-IT        Google      20.6     0.0         0.0   0.0        0.0        0.0        0.0         0.0

Metrics

ArXiv Math: higher is better

H&F (headers and footers): higher is better

Long/Tiny (long, tiny text): higher is better

Multi-Col (multi-column layouts): higher is better

Old Scans: higher is better

Scans Math (math in old scans): higher is better

Tables: higher is better