advanced, text

OmniDocBench Comprehensive Document Parsing

Comprehensive document parsing annotation covering layout detection, text recognition, table structure, and formula recognition. For each document element, annotators record a bounding box as a line of text coordinates and transcribe its content.

Configuration file: config.yaml

# OmniDocBench Comprehensive Document Parsing Configuration
# Based on Ouyang et al., CVPR 2025

annotation_task_name: "OmniDocBench Document Parsing"
task_dir: "."

data_files:
  - "sample-data.json"

item_properties:
  id_key: "id"
  text_key: "image_url"
  context_key: "document_type"

user_config:
  allow_all_users: true

annotation_schemes:
  - annotation_type: "text"
    name: "element_bounding_boxes"
    description: "Draw bounding boxes around each document element (format: element_type,x,y,width,height per line). Element types: Text, Title, Table, Figure, Formula, Header, Footer, Caption, Reference, Abstract"

  - annotation_type: "text"
    name: "ocr_transcription"
    description: "Provide OCR text transcription for each detected element (format: element_id|transcribed_text per line)"

  - annotation_type: "text"
    name: "table_structure"
    description: "For each table element, describe its structure (format: table_id|num_rows,num_cols|cell_content_summary)"

  - annotation_type: "text"
    name: "formula_latex"
    description: "For each formula element, provide the LaTeX transcription (format: formula_id|latex_expression)"

  - annotation_type: "multiselect"
    name: "element_types_present"
    description: "Select all element types present on this document page"
    labels:
      - name: "Text"
        tooltip: "Regular paragraph body text"
      - name: "Title"
        tooltip: "Document title or major headings"
      - name: "Table"
        tooltip: "Tabular data with rows and columns"
      - name: "Figure"
        tooltip: "Images, charts, diagrams, plots"
      - name: "Formula"
        tooltip: "Mathematical equations or expressions"
      - name: "Header"
        tooltip: "Page headers or running titles"
      - name: "Footer"
        tooltip: "Page footers, page numbers"
      - name: "Caption"
        tooltip: "Captions for figures or tables"
      - name: "Reference"
        tooltip: "Bibliography or reference entries"
      - name: "Abstract"
        tooltip: "Paper or document abstract section"

  - annotation_type: "radio"
    name: "document_language"
    description: "What is the primary language of this document?"
    labels:
      - name: "english"
        tooltip: "English language document"
      - name: "chinese"
        tooltip: "Chinese language document"
      - name: "mixed"
        tooltip: "Multiple languages present"
      - name: "other"
        tooltip: "Other language"

  - annotation_type: "radio"
    name: "parsing_difficulty"
    description: "Rate the overall parsing difficulty of this document page"
    labels:
      - name: "easy"
        tooltip: "Simple layout, clear text, no complex elements"
      - name: "moderate"
        tooltip: "Some tables or figures, standard formatting"
      - name: "hard"
        tooltip: "Complex tables, formulas, multi-column, or degraded quality"
      - name: "very_hard"
        tooltip: "Highly complex layout with many overlapping element types"

interface_config:
  item_display_format: "<img src='{{text}}' style='max-width:100%; max-height:600px; border:1px solid #ccc;'/><br/><small>Document type: {{document_type}} | Page: {{page_number}} | Source: {{source}}</small>"

output_annotation_format: "json"
output_annotation_dir: "annotations"
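
The four text schemes above ask annotators to type structured lines, so downstream code has to parse them. The following is a minimal Python sketch of such a parser, not part of the showcase itself. The names `Box`, `parse_boxes`, and `parse_pipe_lines` are hypothetical, and the validation rules (positive width and height, blank lines ignored) are assumptions rather than anything the config enforces.

```python
from dataclasses import dataclass

# Element types listed in the element_bounding_boxes description.
ELEMENT_TYPES = {
    "Text", "Title", "Table", "Figure", "Formula",
    "Header", "Footer", "Caption", "Reference", "Abstract",
}


@dataclass
class Box:
    element_type: str
    x: float
    y: float
    width: float
    height: float


def parse_boxes(raw: str) -> list[Box]:
    """Parse 'element_type,x,y,width,height' lines; raise ValueError on bad input."""
    boxes = []
    for line_no, line in enumerate(raw.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines between entries are tolerated
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 5:
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(parts)}")
        element_type, *coords = parts
        if element_type not in ELEMENT_TYPES:
            raise ValueError(f"line {line_no}: unknown element type {element_type!r}")
        x, y, width, height = (float(c) for c in coords)  # ValueError if non-numeric
        if width <= 0 or height <= 0:
            raise ValueError(f"line {line_no}: width and height must be positive")
        boxes.append(Box(element_type, x, y, width, height))
    return boxes


def parse_pipe_lines(raw: str, n_fields: int) -> list[list[str]]:
    """Split 'a|b|...' lines into n_fields parts; the last field may contain '|'."""
    rows = []
    for line_no, line in enumerate(raw.splitlines(), start=1):
        if not line.strip():
            continue
        # Limit the number of splits so LaTeX such as \left|x\right| stays intact.
        parts = line.split("|", n_fields - 1)
        if len(parts) != n_fields:
            raise ValueError(f"line {line_no}: expected {n_fields} fields")
        rows.append([p.strip() for p in parts])
    return rows
```

Under these assumptions, `parse_pipe_lines(text, 2)` would cover `ocr_transcription` and `formula_latex`, and `parse_pipe_lines(text, 3)` would cover `table_structure`. Capping the number of splits matters mainly for formulas, where the LaTeX itself can contain `|`.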

Sample data: sample-data.json

[
  {
    "id": "omnidoc_001",
    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/8/8e/Pubmed_central_abstract.png/800px-Pubmed_central_abstract.png",
    "document_type": "scientific_paper",
    "page_number": 1,
    "source": "arxiv"
  },
  {
    "id": "omnidoc_002",
    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/2/2e/US_patent_1.png/800px-US_patent_1.png",
    "document_type": "patent",
    "page_number": 1,
    "source": "uspto"
  }
]

// ... and 8 more items
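
Before starting the server, it can help to confirm that every item carries the fields the config refers to. The sketch below is a hypothetical pre-flight check (`check_items` is not a Potato function). It assumes that `{{text}}` in `item_display_format` is filled from the field named by `text_key`, which is inferred from the config rather than confirmed. The template is shortened from the one in config.yaml.

```python
import json
import re

# Keys named under item_properties in config.yaml.
REQUIRED_KEYS = ("id", "image_url", "document_type")
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

# Shortened version of item_display_format from config.yaml.
TEMPLATE = (
    "<img src='{{text}}'/><br/><small>Document type: {{document_type}} "
    "| Page: {{page_number}} | Source: {{source}}</small>"
)

# The two items shown in sample-data.json.
ITEMS = json.loads("""
[
  {"id": "omnidoc_001",
   "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/8/8e/Pubmed_central_abstract.png/800px-Pubmed_central_abstract.png",
   "document_type": "scientific_paper", "page_number": 1, "source": "arxiv"},
  {"id": "omnidoc_002",
   "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/2/2e/US_patent_1.png/800px-US_patent_1.png",
   "document_type": "patent", "page_number": 1, "source": "uspto"}
]
""")


def check_items(items, template):
    """Return a list of problems: missing keys and duplicate ids."""
    # Assumption: {{text}} is filled from text_key, which is image_url here.
    needed = {
        "image_url" if name == "text" else name
        for name in PLACEHOLDER.findall(template)
    }
    needed.update(REQUIRED_KEYS)
    problems, seen = [], set()
    for index, item in enumerate(items):
        label = item.get("id", f"item #{index}")
        for key in sorted(needed.difference(item)):
            problems.append(f"{label}: missing {key!r}")
        if "id" in item:
            if item["id"] in seen:
                problems.append(f"{label}: duplicate id")
            seen.add(item["id"])
    return problems


if __name__ == "__main__":
    for problem in check_items(ITEMS, TEMPLATE):
        print(problem)
```

For the two items shown, the check reports nothing. An item without `page_number` or `source` would be flagged, since the display template references both.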

Get this design

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/image/omnidocbench-document-parsing
potato start config.yaml

Details

Annotation types

multiselect, radio, text

Domain

Document AI, PDF Parsing

Use cases

Document Parsing, Layout Detection, OCR, Table Recognition, Formula Recognition

Tags

omnidocbench, document-parsing, pdf, ocr, table-structure, formula, cvpr2025
