OmniDocBench Comprehensive Document Parsing
Comprehensive document parsing annotation covering layout detection, text recognition, table structure, and formula recognition. Annotators draw bounding boxes and provide text transcriptions for document elements.
Configuration file: config.yaml
# OmniDocBench Comprehensive Document Parsing Configuration
# Based on Ouyang et al., CVPR 2025
annotation_task_name: "OmniDocBench Document Parsing"
task_dir: "."
data_files:
  - "sample-data.json"
item_properties:
  id_key: "id"
  text_key: "image_url"
  context_key: "document_type"
user_config:
  allow_all_users: true
annotation_schemes:
  - annotation_type: "text"
    name: "element_bounding_boxes"
    description: "Draw bounding boxes around each document element (format: element_type,x,y,width,height per line). Element types: Text, Title, Table, Figure, Formula, Header, Footer, Caption, Reference, Abstract"
  - annotation_type: "text"
    name: "ocr_transcription"
    description: "Provide OCR text transcription for each detected element (format: element_id|transcribed_text per line)"
  - annotation_type: "text"
    name: "table_structure"
    description: "For each table element, describe its structure (format: table_id|num_rows,num_cols|cell_content_summary)"
  - annotation_type: "text"
    name: "formula_latex"
    description: "For each formula element, provide the LaTeX transcription (format: formula_id|latex_expression)"
  - annotation_type: "multiselect"
    name: "element_types_present"
    description: "Select all element types present on this document page"
    labels:
      - name: "Text"
        tooltip: "Regular paragraph body text"
      - name: "Title"
        tooltip: "Document title or major headings"
      - name: "Table"
        tooltip: "Tabular data with rows and columns"
      - name: "Figure"
        tooltip: "Images, charts, diagrams, plots"
      - name: "Formula"
        tooltip: "Mathematical equations or expressions"
      - name: "Header"
        tooltip: "Page headers or running titles"
      - name: "Footer"
        tooltip: "Page footers, page numbers"
      - name: "Caption"
        tooltip: "Captions for figures or tables"
      - name: "Reference"
        tooltip: "Bibliography or reference entries"
      - name: "Abstract"
        tooltip: "Paper or document abstract section"
  - annotation_type: "radio"
    name: "document_language"
    description: "What is the primary language of this document?"
    labels:
      - name: "english"
        tooltip: "English language document"
      - name: "chinese"
        tooltip: "Chinese language document"
      - name: "mixed"
        tooltip: "Multiple languages present"
      - name: "other"
        tooltip: "Other language"
  - annotation_type: "radio"
    name: "parsing_difficulty"
    description: "Rate the overall parsing difficulty of this document page"
    labels:
      - name: "easy"
        tooltip: "Simple layout, clear text, no complex elements"
      - name: "moderate"
        tooltip: "Some tables or figures, standard formatting"
      - name: "hard"
        tooltip: "Complex tables, formulas, multi-column, or degraded quality"
      - name: "very_hard"
        tooltip: "Highly complex layout with many overlapping element types"
interface_config:
  item_display_format: "<img src='{{text}}' style='max-width:100%; max-height:600px; border:1px solid #ccc;'/><br/><small>Document type: {{document_type}} | Page: {{page_number}} | Source: {{source}}</small>"
output_annotation_format: "json"
output_annotation_dir: "annotations"
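Because four of the schemes above collect free text in a line-based format, downstream code has to parse those lines itself. The following is a minimal Python sketch of such a parser, not part of Potato or of this design. The names Box, parse_boxes, and parse_pipe_lines are hypothetical, and the sketch assumes coordinates are plain numbers (the config does not state units) and that only the last pipe-separated field may itself contain a "|" character, as LaTeX often does.

```python
from dataclasses import dataclass

# Element types listed in the element_bounding_boxes description.
ELEMENT_TYPES = {
    "Text", "Title", "Table", "Figure", "Formula",
    "Header", "Footer", "Caption", "Reference", "Abstract",
}


@dataclass
class Box:
    """One annotated element (hypothetical container, not a Potato type)."""
    element_type: str
    x: float
    y: float
    width: float
    height: float


def parse_boxes(raw: str) -> list[Box]:
    """Parse 'element_type,x,y,width,height' lines; blank lines are skipped."""
    boxes = []
    for line_no, line in enumerate(raw.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 5:
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(parts)}")
        element_type, *coords = parts
        if element_type not in ELEMENT_TYPES:
            raise ValueError(f"line {line_no}: unknown element type {element_type!r}")
        x, y, width, height = (float(c) for c in coords)
        if width <= 0 or height <= 0:
            raise ValueError(f"line {line_no}: width and height must be positive")
        boxes.append(Box(element_type, x, y, width, height))
    return boxes


def parse_pipe_lines(raw: str, fields: int) -> list[tuple[str, ...]]:
    """Parse 'a|b' or 'a|b|c' lines into tuples with exactly `fields` entries.

    Splitting stops after fields - 1 separators, so any '|' inside the
    final field (for example in a LaTeX expression) is preserved.
    """
    rows = []
    for line_no, line in enumerate(raw.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        parts = line.split("|", fields - 1)
        if len(parts) != fields:
            raise ValueError(f"line {line_no}: expected {fields} fields, got {len(parts)}")
        rows.append(tuple(p.strip() for p in parts))
    return rows
```

Under these assumptions, parse_boxes handles element_bounding_boxes, parse_pipe_lines(raw, 2) handles ocr_transcription and formula_latex, and parse_pipe_lines(raw, 3) handles table_structure. Rejecting malformed lines early keeps typos in free-text annotations from silently reaching later evaluation steps.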
Sample data: sample-data.json
[
  {
    "id": "omnidoc_001",
    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/8/8e/Pubmed_central_abstract.png/800px-Pubmed_central_abstract.png",
    "document_type": "scientific_paper",
    "page_number": 1,
    "source": "arxiv"
  },
  {
    "id": "omnidoc_002",
    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/2/2e/US_patent_1.png/800px-US_patent_1.png",
    "document_type": "patent",
    "page_number": 1,
    "source": "uspto"
  }
]
// ... and 8 more items
Get this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/image/omnidocbench-document-parsing
potato start config.yaml
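Before starting the server, it can help to confirm that every data item carries the fields the config refers to: the id_key, plus each placeholder in item_display_format. The sketch below is one way to do that in Python. It is not a Potato feature; the function names are hypothetical, and it assumes that the {{text}} placeholder is filled from the field named by text_key (image_url here).

```python
import json
import re


def template_fields(template: str, text_key: str) -> set[str]:
    """Collect {{placeholder}} names, mapping 'text' to the configured text_key.

    Assumption: {{text}} is substituted from the text_key field.
    """
    names = set(re.findall(r"\{\{\s*(\w+)\s*\}\}", template))
    return {text_key if name == "text" else name for name in names}


def check_items(items: list[dict], required: set[str]) -> list[str]:
    """Return human-readable problems: missing fields and duplicate ids."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        missing = sorted(required - set(item))
        if missing:
            problems.append(f"item {index}: missing {', '.join(missing)}")
        item_id = item.get("id")
        if item_id in seen_ids:
            problems.append(f"item {index}: duplicate id {item_id!r}")
        seen_ids.add(item_id)
    return problems


def load_items(path: str) -> list[dict]:
    """Read a JSON array of items such as sample-data.json."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

A possible use, run from the design directory, is check_items(load_items("sample-data.json"), template_fields(display_format, "image_url") | {"id"}), where display_format is the item_display_format string copied from config.yaml. An empty result means no missing fields or duplicate ids were found.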
Related designs
DocBank Document Layout Detection
Document layout analysis benchmark (Li et al., COLING 2020). Detect and classify document elements including titles, abstracts, paragraphs, figures, tables, and captions.
DocLayNet Document Layout Analysis
Document layout analysis with bounding box annotations. Annotators draw bounding boxes around layout elements (text blocks, tables, figures, headers, footers, lists) in document page images.
FLUTE: Figurative Language Understanding through Textual Explanations
Figurative language understanding via NLI. Annotators classify figurative sentences (sarcasm, simile, metaphor, idiom) and provide textual explanations of the figurative meaning. The task combines natural language inference with fine-grained figurative language type classification.