
OmniDocBench

v1.5

Built by OpenDataLab, the benchmark covers 1,355 pages drawn from papers, books, slides, exams, newspapers, and magazines. It scores text extraction via edit distance, formula recognition via CDM, table recognition via TEDS, and reading order via edit distance.

Models Evaluated: 27
Dataset Size: 1,355 pages
Metrics: 5
Source: GitHub

Overall Score = ((1 − Text Edit) × 100 + Table TEDS + Formula CDM) / 3
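The Overall formula can be checked by hand against any leaderboard row. Below is a minimal Python sketch; the function name `overall_score` is hypothetical and not part of the benchmark's tooling. It uses the Gemini-3-Flash values from the rankings (Text Edit 0.077, TEDS 87.7, CDM 90.2).

```python
def overall_score(text_edit: float, table_teds: float, formula_cdm: float) -> float:
    """Average of three components on a 0-100 scale.

    text_edit is an error rate in [0, 1], so it is flipped and rescaled
    before averaging with the two percentage-style scores.
    """
    return ((1 - text_edit) * 100 + table_teds + formula_cdm) / 3


# Gemini-3-Flash row: Text Edit 0.077, TEDS 87.7, CDM 90.2
score = overall_score(text_edit=0.077, table_teds=87.7, formula_cdm=90.2)
print(round(score, 1))  # 90.1
```

Note that TEDS-S and Read Order do not enter the Overall score; they are reported alongside it.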

Rankings

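| # | Model | Organization | Overall | Text Edit↓ | CDM↑ | TEDS↑ | TEDS-S↑ | Read Order↓ |
|---|-------|--------------|---------|------------|------|-------|---------|-------------|
| 1 | Gemini-3-Flash | Google | 90.1 | 0.077 | 90.2 | 87.7 | 92.6 | 0.081 |
| 2 | Nanonets OCR-3 | Nanonets | 90.0 | 0.068 | 87.7 | 88.9 | 93.3 | 0.100 |
| 3 | Nanonets OCR2+ | Nanonets | 89.5 | 0.056 | 90.3 | 79.1 | 83.6 | 0.090 |
| 4 | Gemini-3-Pro | Google | 88.8 | 0.078 | 87.3 | 87.0 | 91.7 | 0.084 |
| 5 | GPT-5.2 | OpenAI | 88.0 | 0.111 | 90.1 | 84.9 | 89.5 | 0.098 |
| 6 | Claude Sonnet 4.6 | Anthropic | 86.9 | 0.165 | 90.2 | 87.1 | 91.2 | 0.149 |
| 7 | Claude Opus 4.6 | Anthropic | 85.9 | 0.151 | 88.5 | 84.4 | 89.1 | 0.136 |
| 8 | Datalab Marker | Datalab | 85.5 | 0.109 | 88.3 | 79.1 | 83.7 | 0.106 |
| 9 | Gemini 3.1 Pro | Google | 85.3 | 0.082 | 83.3 | 80.8 | 85.4 | 0.073 |
| 10 | GPT-5.4 | OpenAI | 85.3 | 0.089 | 83.4 | 81.3 | 86.7 | 0.077 |
| 11 | Qwen3-VL-Plus | Alibaba | 82.5 | 0.157 | 76.6 | 86.6 | 90.7 | 0.099 |
| 12 | GPT-5-Mini | OpenAI | 82.5 | 0.138 | 86.7 | 74.6 | 80.1 | 0.121 |
| 13 | Qwen3-VL-235B | Alibaba | 81.9 | 0.162 | 75.1 | 86.8 | 90.6 | 0.101 |
| 14 | GPT-4.1 | OpenAI | 79.9 | 0.167 | 82.2 | 74.0 | 83.8 | 0.115 |
| 15 | Claude Haiku 4.5 | Anthropic | 79.6 | 0.224 | 84.2 | 77.1 | 83.8 | 0.178 |
| 16 | Ministral-8B | Mistral AI | 78.3 | 0.157 | 83.3 | 67.1 | 73.8 | 0.125 |
| 17 | Qwen3.5-9B | Alibaba | 76.7 | 0.253 | 81.4 | 73.9 | 77.6 | 0.116 |
| 18 | Mistral Small 4 | Mistral AI | 76.4 | 0.242 | 78.3 | 75.1 | 82.7 | 0.162 |
| 19 | GLM-OCR | Zhipu AI | 69.2 | 0.144 | 84.7 | 37.4 | 39.3 | 0.141 |
| 20 | Qwen3.5-4B | Alibaba | 67.6 | 0.292 | 71.5 | 60.4 | 64.6 | 0.106 |
| 21 | GPT-5-Nano | OpenAI | 63.4 | 0.319 | 61.0 | 61.2 | 69.5 | 0.243 |
| 22 | Qwen3.5-2B | Alibaba | 48.7 | 0.621 | 62.9 | 45.3 | 48.2 | 0.401 |
| 23 | Qwen3.5-0.8B | Alibaba | 47.3 | 0.583 | 62.3 | 37.9 | 41.0 | 0.352 |
| 24 | Gemma-3-12B-IT | Google | 44.6 | 0.476 | 50.0 | 31.6 | 46.9 | 0.364 |
| 25 | Llama-3.2-Vision-11B | Meta | 44.6 | 0.541 | 55.4 | 32.6 | 42.9 | 0.340 |
| 26 | Pixtral-12B | Mistral AI | 42.3 | 0.641 | 58.8 | 32.1 | 50.8 | 0.422 |
| 27 | Qwen-VL-OCR | Alibaba | 34.1 | 0.823 | 22.6 | 62.1 | 67.7 | 0.810 |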

Metrics

Text Edit (lower is better)

Normalized character-level edit distance between predicted and ground-truth text blocks, reported on a 0 to 1 scale. Lower values indicate more accurate text extraction.
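As a rough illustration of a character-level edit distance on a 0 to 1 scale, here is a minimal Python sketch of Levenshtein distance divided by the longer string's length. Both function names are hypothetical, and normalizing by the longer string is an assumption; the benchmark's own matching and normalization of text blocks may differ.

```python
def levenshtein(a, b) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, truth: str) -> float:
    """Edit distance scaled to [0, 1]; normalizing by the longer string is an assumption."""
    longest = max(len(pred), len(truth))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(pred, truth) / longest


print(levenshtein("kitten", "sitting"))  # 3
```

A perfect extraction scores 0.0, and an output sharing no characters with the reference scores 1.0.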

CDM (higher is better)

Character Detection Matching score for display formulas. Measures structural and symbolic accuracy of recognized mathematical expressions.

TEDS (higher is better)

Tree Edit Distance-based Similarity for tables. Evaluates both content and structure of extracted tables.
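In its usual formulation, TEDS turns a tree edit distance between two table trees into a similarity: one minus the distance divided by the node count of the larger tree. The Python sketch below shows only that final normalization step. The function name `teds_score` is hypothetical, the tree edit distance is assumed to be computed elsewhere, and the 0 to 100 scaling is an assumption chosen to match the leaderboard columns.

```python
def teds_score(tree_edit_distance: float, n_nodes_pred: int, n_nodes_truth: int) -> float:
    """Convert a tree edit distance into a 0-100 similarity.

    n_nodes_pred and n_nodes_truth are the node counts of the predicted and
    ground-truth table trees (for example, parsed from table HTML).
    """
    largest = max(n_nodes_pred, n_nodes_truth)
    if largest == 0:
        return 100.0  # two empty trees are treated as identical
    return (1 - tree_edit_distance / largest) * 100


# Hypothetical numbers: 3 edits between trees of 20 and 24 nodes
print(teds_score(3, 20, 24))  # 87.5
```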

TEDS-S (higher is better)

Structure-only TEDS that ignores cell content. Focuses purely on table layout and cell spanning.
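One way to picture the structure-only variant is to strip the text out of every cell before comparing tables, keeping only tags and span attributes. The sketch below does this with Python's standard `html.parser`. The names `StructureOnlyParser` and `strip_cell_content` are hypothetical, and the benchmark's own preprocessing may differ.

```python
from html.parser import HTMLParser


class StructureOnlyParser(HTMLParser):
    """Collect table tags (with rowspan/colspan) and ignore all cell text."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Keep only the attributes that describe cell spanning.
        spans = "".join(
            f' {name}="{value}"' for name, value in attrs
            if name in ("rowspan", "colspan")
        )
        self.tokens.append(f"<{tag}{spans}>")

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")
    # handle_data is not overridden, so cell text is dropped.


def strip_cell_content(table_html: str) -> str:
    parser = StructureOnlyParser()
    parser.feed(table_html)
    parser.close()
    return "".join(parser.tokens)


a = '<table><tr><td colspan="2">Revenue</td></tr><tr><td>1</td><td>2</td></tr></table>'
b = '<table><tr><td colspan="2">Income</td></tr><tr><td>9</td><td>8</td></tr></table>'
print(strip_cell_content(a) == strip_cell_content(b))  # True
```

The two tables differ in every cell's text but share one layout, so a structure-only comparison treats them as identical while a content-aware one would not.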

Read Order (lower is better)

Normalized edit distance between the predicted and ground-truth reading sequences, measuring how well the model preserves the correct reading order across multi-column and complex layouts.
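To make the reading-order idea concrete, the sketch below applies an edit distance to sequences of block IDs rather than characters. The function name `reading_order_distance` is hypothetical, and both the block-ID representation and the normalization by the longer sequence are assumptions, not the benchmark's confirmed procedure.

```python
def reading_order_distance(pred_order: list, true_order: list) -> float:
    """Edit distance between two block-ID sequences, scaled to [0, 1]."""
    prev = list(range(len(true_order) + 1))
    for i, p in enumerate(pred_order, start=1):
        curr = [i]
        for j, t in enumerate(true_order, start=1):
            cost = 0 if p == t else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    longest = max(len(pred_order), len(true_order))
    return prev[-1] / longest if longest else 0.0


# Two-column page: blocks 1 and 2 are read in the wrong order.
print(reading_order_distance([0, 2, 1, 3], [0, 1, 2, 3]))  # 0.5
```

Swapping two adjacent blocks costs two edits out of four positions, which is why a single column mix-up on a short page moves this score noticeably.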