Skip to content
Guides5 min read

法律文档标注最佳实践

标注合同、法庭文件和监管申报的专业技术,需要领域专业知识。

Potato Team·

法律文档标注最佳实践

法律文档因其复杂结构、领域特定术语和错误的高风险性,需要专门的标注方法。本指南涵盖了有效法律文本标注的策略。

法律标注的挑战

  • 密集术语:法律术语需要经过培训的标注者
  • 长文档:合同可能长达数百页
  • 交叉引用:各章节相互引用
  • 精确性要求:错误可能导致法律后果
  • 上下文依赖:含义取决于文档类型和管辖权

文档分段

拆分长文档

yaml
annotation_task_name: "Legal Document Annotation"
 
display:
  # Segment by section
  segmentation:
    enabled: true
    method: section_headers
    pattern: '^\d+\.\s+[A-Z]'
 
  # Show document context
  context:
    show_previous_section: true
    show_section_hierarchy: true
 
  # Navigation
  navigation:
    show_outline: true
    jump_to_section: true

章节级标注

yaml
data_files:
  - contracts.json
 
item_properties:
  id_key: id
  text_key: text
 
preprocessing:
  segment_by: sections
  preserve_metadata: true
  include_section_number: true
 
# Each section becomes an annotation item
# {
#   "id": "contract_001_section_3.2",
#   "text": "The Licensor grants...",
#   "section_number": "3.2",
#   "section_title": "License Grant",
#   "document_id": "contract_001"
# }

法律实体识别

合同特定实体

yaml
annotation_schemes:
  - annotation_type: span
    name: legal_entities
    labels:
      - name: PARTY
        color: "#FECACA"
        description: "Contracting parties (Licensor, Licensee, Company, etc.)"
 
      - name: DEFINED_TERM
        color: "#FDE68A"
        description: "Defined terms (usually capitalized)"
 
      - name: DATE
        color: "#BBF7D0"
        description: "Dates and time periods"
 
      - name: MONETARY
        color: "#C4B5FD"
        description: "Dollar amounts, fees, penalties"
 
      - name: OBLIGATION
        color: "#BFDBFE"
        description: "Must, shall, will obligations"
 
      - name: CONDITION
        color: "#FED7AA"
        description: "If, unless, provided that conditions"
 
      - name: REFERENCE
        color: "#E0E7FF"
        description: "References to other sections or documents"

义务检测

yaml
annotation_schemes:
  - annotation_type: multiselect
    name: obligation_type
    question: "What type of obligation is this?"
    options:
      - name: performance
        label: "Performance Obligation"
        description: "Party must do something"
 
      - name: payment
        label: "Payment Obligation"
        description: "Party must pay"
 
      - name: restriction
        label: "Restriction/Prohibition"
        description: "Party must not do something"
 
      - name: condition
        label: "Conditional Obligation"
        description: "Obligation triggered by condition"
 
      - name: warranty
        label: "Warranty/Representation"
        description: "Statement of fact or promise"

条款分类

合同条款类型

yaml
annotation_schemes:
  - annotation_type: radio
    name: clause_type
    question: "What type of clause is this?"
    options:
      - name: definitions
        label: "Definitions"
      - name: grant
        label: "Grant of Rights/License"
      - name: consideration
        label: "Consideration/Payment"
      - name: term
        label: "Term and Termination"
      - name: representations
        label: "Representations & Warranties"
      - name: indemnification
        label: "Indemnification"
      - name: limitation
        label: "Limitation of Liability"
      - name: confidentiality
        label: "Confidentiality"
      - name: ip
        label: "Intellectual Property"
      - name: dispute
        label: "Dispute Resolution"
      - name: boilerplate
        label: "Boilerplate/Miscellaneous"

风险评估

yaml
annotation_schemes:
  - annotation_type: likert
    name: risk_level
    question: "Rate the risk level of this clause for [Party]"
    min_label: "Low Risk"
    max_label: "High Risk"
    size: 5
 
  - annotation_type: text
    name: risk_notes
    question: "Explain the risk factors"
    multiline: true
    required_if:
      field: risk_level
      operator: ">="
      value: 4

法庭文件标注

案件信息提取

yaml
annotation_schemes:
  - annotation_type: span
    name: case_entities
    labels:
      - name: CASE_NUMBER
        description: "Case identifier"
 
      - name: COURT
        description: "Court name and jurisdiction"
 
      - name: JUDGE
        description: "Presiding judge"
 
      - name: PLAINTIFF
        description: "Plaintiff/Petitioner"
 
      - name: DEFENDANT
        description: "Defendant/Respondent"
 
      - name: ATTORNEY
        description: "Attorneys/Legal representatives"
 
      - name: LEGAL_CITATION
        description: "Citations to cases, statutes, regulations"
 
      - name: RULING
        description: "Court's ruling or order"

论证结构

yaml
annotation_schemes:
  - annotation_type: span
    name: argument_structure
    labels:
      - name: CLAIM
        color: "#FECACA"
        description: "Main claim or assertion"
 
      - name: PREMISE
        color: "#BBF7D0"
        description: "Supporting premise"
 
      - name: EVIDENCE
        color: "#BFDBFE"
        description: "Evidence cited"
 
      - name: REBUTTAL
        color: "#FED7AA"
        description: "Counter-argument"
 
      - name: CONCLUSION
        color: "#E0E7FF"
        description: "Conclusion drawn"

法律术语高亮

yaml
display:
  keyword_highlighting:
    enabled: true
 
    categories:
      - name: obligation_words
        color: "#FEE2E2"
        keywords:
          - shall
          - must
          - will
          - agrees to
          - is required to
          - is obligated to
 
      - name: permission_words
        color: "#D1FAE5"
        keywords:
          - may
          - is permitted to
          - has the right to
          - is entitled to
 
      - name: prohibition_words
        color: "#FEF3C7"
        keywords:
          - shall not
          - must not
          - may not
          - is prohibited from
 
      - name: condition_words
        color: "#DBEAFE"
        keywords:
          - if
          - unless
          - provided that
          - subject to
          - contingent upon
          - in the event that

法律标注的质量控制

yaml
quality_control:
  # Require legal training
  qualification:
    required_training: legal_annotation_training
    training_accuracy: 0.85
 
  # Domain expertise check
  attention_checks:
    enabled: true
    items:
      - text: |
          "Notwithstanding any provision herein to the contrary,
          Licensee shall indemnify Licensor against all claims."
        expected:
          obligation_type: indemnification
          obligated_party: "Licensee"
        type: domain_knowledge
 
  # High agreement required
  redundancy:
    annotations_per_item: 3
    agreement_threshold: 0.8
    on_disagreement: expert_review
 
  # Expert review layer
  expert_review:
    enabled: true
    review_threshold: 0.7
    expert_users: [legal_expert_1, legal_expert_2]

完整法律标注配置

yaml
annotation_task_name: "Contract Clause Analysis"
 
display:
  text_display: html
 
  # Section context
  context:
    show_document_metadata: true
    show_section_hierarchy: true
 
  # Legal term highlighting
  keyword_highlighting:
    enabled: true
    categories:
      - name: obligations
        color: "#FEE2E2"
        keywords: [shall, must, will, agrees]
      - name: conditions
        color: "#DBEAFE"
        keywords: [if, unless, provided that, subject to]
      - name: defined_terms
        pattern: '\b[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)*\b'
        color: "#FEF3C7"
 
annotation_schemes:
  # Clause type
  - annotation_type: radio
    name: clause_type
    question: "Classify this clause"
    options:
      - name: license_grant
        label: "License Grant"
      - name: payment
        label: "Payment/Consideration"
      - name: term
        label: "Term/Termination"
      - name: indemnification
        label: "Indemnification"
      - name: limitation
        label: "Limitation of Liability"
      - name: confidentiality
        label: "Confidentiality"
      - name: other
        label: "Other"
 
  # Entity spans
  - annotation_type: span
    name: entities
    labels:
      - name: PARTY
        color: "#FECACA"
      - name: DEFINED_TERM
        color: "#FDE68A"
      - name: MONETARY
        color: "#C4B5FD"
      - name: DATE
        color: "#BBF7D0"
      - name: OBLIGATION
        color: "#BFDBFE"
 
  # Risk assessment
  - annotation_type: likert
    name: risk
    question: "Risk level for the receiving party?"
    size: 5
    min_label: "Low"
    max_label: "High"
 
  # Key issues
  - annotation_type: text
    name: issues
    question: "Note any unusual or problematic language"
    multiline: true
 
quality_control:
  redundancy:
    annotations_per_item: 2
    agreement_threshold: 0.75
 
  qualification:
    required_training: true
    training_items: 20
    training_accuracy: 0.8

标注者指南示例

创建法律标注指南时:

  1. 定义范围:哪些文档,哪些管辖权
  2. 术语表:为标注者定义法律术语
  3. 边缘案例:如何处理模糊语言
  4. 交叉引用:何时标注 vs 忽略引用
  5. 精确性要求:精确的 span 边界

最佳实践

  1. 使用经过培训的标注者:法律标注需要领域知识
  2. 分段长文档:拆分为可管理的章节
  3. 高亮关键术语:引导注意力到法律语言
  4. 高冗余度:法律错误代价高昂
  5. 专家审核层:让律师审核边缘案例
  6. 清晰的指南:明确定义每个标签的含义
  7. 上下文标注:展示文档结构和相关章节

完整文档请参阅 /docs/core-concepts/annotation-types