# Legal Document Annotation Best Practices

Source: https://www.potatoannotator.com/blog/legal-document-annotation

Legal text is its own kind of annotation problem. The structure is dense, the vocabulary is specialized, and a mislabel can carry real legal consequences. This guide covers how to annotate it well.

## What makes legal annotation hard

The jargon alone means you need trained annotators, not crowd workers picked at random. Documents run long, with contracts sometimes stretching to hundreds of pages, and sections constantly point at other sections. Precision matters more than usual, since a sloppy span can change the meaning of a clause. And much of that meaning depends on the document type and the jurisdiction, so context is never optional.

For the underlying span and text annotation mechanics, see the [source documentation](https://github.com/davidjurgens/potato/blob/master/docs/annotation-types/text/text_annotation.md).

## Document segmentation

### Breaking down long documents

```yaml
annotation_task_name: "Legal Document Annotation"

display:
  # Segment by section
  segmentation:
    enabled: true
    method: section_headers
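    # Matches top-level headers such as "1. Definitions"; a numbering
    # style like "3.2 License Grant" would need a broader pattern
    pattern: '^\d+\.\s+[A-Z]'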

  # Show document context
  context:
    show_previous_section: true
    show_section_hierarchy: true

  # Navigation
  navigation:
    show_outline: true
    jump_to_section: true
```
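To see what the `pattern` above treats as a section boundary, here is a minimal Python sketch that applies the same regular expression outside the tool. The `split_sections` helper and the sample contract text are hypothetical and only illustrate the regex; how Potato itself applies the pattern is not shown in this post.

```python
import re

# Same expression as `segmentation.pattern` in the config above.
SECTION_HEADER = re.compile(r"^\d+\.\s+[A-Z]", re.MULTILINE)

def split_sections(text):
    """Split text at every line that starts a top-level numbered section."""
    starts = [m.start() for m in SECTION_HEADER.finditer(text)]
    if not starts:
        return [text]
    # Keep any preamble before the first header as its own segment.
    bounds = ([0] if starts[0] > 0 else []) + starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]

# Hypothetical sample document.
sample = (
    "LICENSE AGREEMENT\n"
    "1. Definitions\n"
    "\"Software\" means the licensed program.\n"
    "2. License Grant\n"
    "2.1 Scope. The Licensor grants a non-exclusive license.\n"
)

sections = split_sections(sample)
for section in sections:
    print(repr(section.splitlines()[0]))
```

The line beginning `2.1 Scope.` stays inside section 2, because the pattern requires whitespace right after the first dot. If subsections should become their own items, the pattern has to allow multi-level numbers.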

### Section-Level Annotation

```yaml
data_files:
  - contracts.json

item_properties:
  id_key: id
  text_key: text

preprocessing:
  segment_by: sections
  preserve_metadata: true
  include_section_number: true

# Each section becomes an annotation item
# {
#   "id": "contract_001_section_3.2",
#   "text": "The Licensor grants...",
#   "section_number": "3.2",
#   "section_title": "License Grant",
#   "document_id": "contract_001"
# }
```
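If you prepare the data file yourself rather than relying on preprocessing, the item shape in the comment above can be produced with a few lines of Python. This is a sketch under assumptions: the `section_items` helper and its input format (a list of dicts with `number`, `title`, and `text`) are hypothetical, and only the output keys come from the example above.

```python
import json

def section_items(document_id, sections):
    """Turn parsed sections into one annotation item per section."""
    items = []
    for sec in sections:
        items.append({
            # The id combines the document id and the section number,
            # so every item stays traceable to its source contract.
            "id": f"{document_id}_section_{sec['number']}",
            "text": sec["text"],
            "section_number": sec["number"],
            "section_title": sec["title"],
            "document_id": document_id,
        })
    return items

items = section_items("contract_001", [
    {"number": "3.2", "title": "License Grant", "text": "The Licensor grants..."},
])
print(json.dumps(items[0], indent=2))
```

Carrying `document_id` and `section_number` on every item is what lets reviewers pull up the neighboring sections later.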

## Legal Entity Recognition

### Contract-Specific Entities

```yaml
annotation_schemes:
  - annotation_type: span
    name: legal_entities
    labels:
      - name: PARTY
        color: "#FECACA"
        description: "Contracting parties (Licensor, Licensee, Company, etc.)"

      - name: DEFINED_TERM
        color: "#FDE68A"
        description: "Defined terms (usually capitalized)"

      - name: DATE
        color: "#BBF7D0"
        description: "Dates and time periods"

      - name: MONETARY
        color: "#C4B5FD"
        description: "Dollar amounts, fees, penalties"

      - name: OBLIGATION
        color: "#BFDBFE"
        description: "Must, shall, will obligations"

      - name: CONDITION
        color: "#FED7AA"
        description: "If, unless, provided that conditions"

      - name: REFERENCE
        color: "#E0E7FF"
        description: "References to other sections or documents"
```

### Obligation Detection

```yaml
annotation_schemes:
  - annotation_type: multiselect
    name: obligation_type
    question: "What type of obligation is this?"
    options:
      - name: performance
        label: "Performance Obligation"
        description: "Party must do something"

      - name: payment
        label: "Payment Obligation"
        description: "Party must pay"

      - name: restriction
        label: "Restriction/Prohibition"
        description: "Party must not do something"

      - name: condition
        label: "Conditional Obligation"
        description: "Obligation triggered by condition"

      - name: warranty
        label: "Warranty/Representation"
        description: "Statement of fact or promise"
```

## Clause Classification

### Contract Clause Types

```yaml
annotation_schemes:
  - annotation_type: radio
    name: clause_type
    question: "What type of clause is this?"
    options:
      - name: definitions
        label: "Definitions"
      - name: grant
        label: "Grant of Rights/License"
      - name: consideration
        label: "Consideration/Payment"
      - name: term
        label: "Term and Termination"
      - name: representations
        label: "Representations & Warranties"
      - name: indemnification
        label: "Indemnification"
      - name: limitation
        label: "Limitation of Liability"
      - name: confidentiality
        label: "Confidentiality"
      - name: ip
        label: "Intellectual Property"
      - name: dispute
        label: "Dispute Resolution"
      - name: boilerplate
        label: "Boilerplate/Miscellaneous"
```

### Risk Assessment

```yaml
annotation_schemes:
  - annotation_type: likert
    name: risk_level
    question: "Rate the risk level of this clause for [Party]"
    min_label: "Low Risk"
    max_label: "High Risk"
    size: 5

  - annotation_type: text
    name: risk_notes
    question: "Explain the risk factors"
    multiline: true
    required_if:
      field: risk_level
      operator: ">="
      value: 4
```
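The `required_if` rule above reads as "the explanation becomes mandatory once the risk rating reaches 4". The following Python sketch spells out that logic for checking exported annotations after the fact. The `missing_required` function and the response dict layout are assumptions for illustration, not the tool's own validation code.

```python
import operator

# Comparison operators a rule may name; only ">=" appears in the config above.
OPS = {
    ">=": operator.ge,
    ">": operator.gt,
    "==": operator.eq,
    "<=": operator.le,
    "<": operator.lt,
}

def missing_required(response, rule, target):
    """Return True when `target` is required by `rule` but left empty."""
    trigger = response.get(rule["field"])
    if trigger is None:
        return False  # the triggering question was not answered yet
    required = OPS[rule["operator"]](trigger, rule["value"])
    answer = (response.get(target) or "").strip()
    return required and not answer

rule = {"field": "risk_level", "operator": ">=", "value": 4}

print(missing_required({"risk_level": 4, "risk_notes": ""}, rule, "risk_notes"))
print(missing_required({"risk_level": 3}, rule, "risk_notes"))
```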

## Court Document Annotation

### Case Information Extraction

```yaml
annotation_schemes:
  - annotation_type: span
    name: case_entities
    labels:
      - name: CASE_NUMBER
        description: "Case identifier"

      - name: COURT
        description: "Court name and jurisdiction"

      - name: JUDGE
        description: "Presiding judge"

      - name: PLAINTIFF
        description: "Plaintiff/Petitioner"

      - name: DEFENDANT
        description: "Defendant/Respondent"

      - name: ATTORNEY
        description: "Attorneys/Legal representatives"

      - name: LEGAL_CITATION
        description: "Citations to cases, statutes, regulations"

      - name: RULING
        description: "Court's ruling or order"
```

### Argument Structure

```yaml
annotation_schemes:
  - annotation_type: span
    name: argument_structure
    labels:
      - name: CLAIM
        color: "#FECACA"
        description: "Main claim or assertion"

      - name: PREMISE
        color: "#BBF7D0"
        description: "Supporting premise"

      - name: EVIDENCE
        color: "#BFDBFE"
        description: "Evidence cited"

      - name: REBUTTAL
        color: "#FED7AA"
        description: "Counter-argument"

      - name: CONCLUSION
        color: "#E0E7FF"
        description: "Conclusion drawn"
```

## Highlighting Legal Terms

```yaml
display:
  keyword_highlighting:
    enabled: true

    categories:
      - name: obligation_words
        color: "#FEE2E2"
        keywords:
          - shall
          - must
          - will
          - agrees to
          - is required to
          - is obligated to

      - name: permission_words
        color: "#D1FAE5"
        keywords:
          - may
          - is permitted to
          - has the right to
          - is entitled to

      - name: prohibition_words
        color: "#FEF3C7"
        keywords:
          - shall not
          - must not
          - may not
          - is prohibited from

      - name: condition_words
        color: "#DBEAFE"
        keywords:
          - if
          - unless
          - provided that
          - subject to
          - contingent upon
          - in the event that
```
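Several keywords above overlap: "shall" is an obligation word, while "shall not" is a prohibition. The post does not say how the tool resolves such overlaps, so the sketch below shows one common approach, trying longer phrases first, in plain Python. The `find_keywords` function is hypothetical; the keyword lists are copied from the config.

```python
import re

CATEGORIES = {
    "obligation_words": ["shall", "must", "will", "agrees to",
                         "is required to", "is obligated to"],
    "permission_words": ["may", "is permitted to", "has the right to",
                         "is entitled to"],
    "prohibition_words": ["shall not", "must not", "may not",
                          "is prohibited from"],
    "condition_words": ["if", "unless", "provided that", "subject to",
                        "contingent upon", "in the event that"],
}

KEYWORD_TO_CATEGORY = {kw: cat for cat, kws in CATEGORIES.items() for kw in kws}

# Regex alternation takes the first alternative that matches, so listing
# longer phrases first makes "shall not" win over "shall".
ordered = sorted(KEYWORD_TO_CATEGORY, key=len, reverse=True)
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in ordered) + r")\b",
    re.IGNORECASE,
)

def find_keywords(text):
    """Return (matched text, category) pairs in order of appearance."""
    return [(m.group(1), KEYWORD_TO_CATEGORY[m.group(1).lower()])
            for m in PATTERN.finditer(text)]

hits = find_keywords(
    "Licensee shall not sublicense the Software unless Licensor agrees to it."
)
print(hits)
```

Whatever the tool does internally, it is worth checking on a sample clause that "shall not" shows up in the prohibition color rather than the obligation color.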

## Quality Control for Legal Annotation

```yaml
quality_control:
  # Require legal training
  qualification:
    required_training: legal_annotation_training
    training_accuracy: 0.85

  # Domain expertise check
  attention_checks:
    enabled: true
    items:
      - text: |
          "Notwithstanding any provision herein to the contrary,
          Licensee shall indemnify Licensor against all claims."
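        expected:
          # Keys must name schemes defined in annotation_schemes;
          # "indemnification" is a clause_type option, not an obligation_type one
          clause_type: indemnification
          obligated_party: "Licensee"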
        type: domain_knowledge

  # High agreement required
  redundancy:
    annotations_per_item: 3
    agreement_threshold: 0.8
    on_disagreement: expert_review

  # Expert review layer
  expert_review:
    enabled: true
    review_threshold: 0.7
    expert_users: [legal_expert_1, legal_expert_2]
```
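It helps to work out what `agreement_threshold: 0.8` means with three annotators. The post does not state which agreement metric the tool uses, so the sketch below assumes the simplest one: the share of annotators who picked the most common label. Both function names are hypothetical.

```python
from collections import Counter

def majority_agreement(labels):
    """Fraction of annotators who chose the most common label."""
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels)

def route(labels, agreement_threshold=0.8):
    """Accept the item, or send it to expert review on low agreement."""
    if majority_agreement(labels) >= agreement_threshold:
        return "accept"
    return "expert_review"

unanimous = ["indemnification", "indemnification", "indemnification"]
split = ["indemnification", "indemnification", "limitation"]

print(route(unanimous))
print(route(split))
```

Under this assumed metric, a two-to-one split scores about 0.67 and falls below 0.8, so only unanimous items skip expert review. A chance-corrected metric would give different numbers, so confirm which one applies before sizing the expert workload.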

## Complete Legal Annotation Config

```yaml
annotation_task_name: "Contract Clause Analysis"

display:
  text_display: html

  # Section context
  context:
    show_document_metadata: true
    show_section_hierarchy: true

  # Legal term highlighting
  keyword_highlighting:
    enabled: true
    categories:
      - name: obligations
        color: "#FEE2E2"
        keywords: [shall, must, will, agrees]
      - name: conditions
        color: "#DBEAFE"
        keywords: [if, unless, provided that, subject to]
      - name: defined_terms
        pattern: '\b[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)*\b'
        color: "#FEF3C7"

annotation_schemes:
  # Clause type
  - annotation_type: radio
    name: clause_type
    question: "Classify this clause"
    options:
      - name: license_grant
        label: "License Grant"
      - name: payment
        label: "Payment/Consideration"
      - name: term
        label: "Term/Termination"
      - name: indemnification
        label: "Indemnification"
      - name: limitation
        label: "Limitation of Liability"
      - name: confidentiality
        label: "Confidentiality"
      - name: other
        label: "Other"

  # Entity spans
  - annotation_type: span
    name: entities
    labels:
      - name: PARTY
        color: "#FECACA"
      - name: DEFINED_TERM
        color: "#FDE68A"
      - name: MONETARY
        color: "#C4B5FD"
      - name: DATE
        color: "#BBF7D0"
      - name: OBLIGATION
        color: "#BFDBFE"

  # Risk assessment
  - annotation_type: likert
    name: risk
    question: "Risk level for the receiving party?"
    size: 5
    min_label: "Low"
    max_label: "High"

  # Key issues
  - annotation_type: text
    name: issues
    question: "Note any unusual or problematic language"
    multiline: true

quality_control:
  redundancy:
    annotations_per_item: 2
    agreement_threshold: 0.75

  qualification:
    required_training: true
    training_items: 20
    training_accuracy: 0.8
```
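The `defined_terms` pattern in this config highlights any run of capitalized words, which is broader than "defined terms" in the legal sense. A quick Python check of the same regular expression on a hypothetical sentence shows the effect:

```python
import re

# Same expression as the `defined_terms` pattern in the config above.
DEFINED_TERM = re.compile(r"\b[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)*\b")

# Hypothetical sample sentence.
text = ("The Licensee shall pay the License Fee within thirty days "
        "of the Effective Date.")

matches = DEFINED_TERM.findall(text)
print(matches)
```

The sentence-initial "The" is pulled into the first match along with "Licensee". Treat this highlight as a hint for annotators, not as a pre-annotation, and consider driving it from the contract's definitions section if false positives become distracting.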

## Writing annotator guidelines

Good guidelines for a legal task spell out the scope (which documents, which jurisdictions) and include a glossary so annotators read terms the same way you do. Be explicit about how to handle ambiguous language and about when a cross-reference should be annotated versus ignored. Also define what counts as a correct span boundary, since "close enough" is rarely good enough in this domain.
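One way to make the span-boundary rule concrete is to decide up front how two annotators' spans will be compared. The sketch below contrasts exact matching with a character-overlap score; both helpers and the sample clause are hypothetical, and spans are assumed to be `(start, end)` character offsets with an exclusive end.

```python
def exact_match(span_a, span_b):
    """True only when both spans share the same start and end offsets."""
    return span_a == span_b

def overlap_ratio(span_a, span_b):
    """Shared characters divided by characters covered by either span."""
    inter = max(0, min(span_a[1], span_b[1]) - max(span_a[0], span_b[0]))
    union = (span_a[1] - span_a[0]) + (span_b[1] - span_b[0]) - inter
    return inter / union if union else 0.0

# Hypothetical clause and two MONETARY spans from different annotators.
text = "Licensee shall pay Licensor $5,000 within 30 days."

amount_only = (text.index("$5,000"), text.index("$5,000") + len("$5,000"))
with_payee = (text.index("Licensor"), amount_only[1])

print(text[amount_only[0]:amount_only[1]], "|", text[with_payee[0]:with_payee[1]])
print(exact_match(amount_only, with_payee))
print(overlap_ratio(amount_only, with_payee))
```

The two annotators agree that the clause contains a monetary amount but disagree on where it starts. Guidelines that state whether the payee belongs inside a MONETARY span remove this kind of disagreement before it reaches review.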

## Best practices

Use annotators who actually know the domain, because legal labeling does not work without it. Break long documents into sections people can hold in their head. Highlight the legal language that carries weight so it does not get skimmed past. Keep redundancy high, since the cost of an error is high. Add an expert review layer where attorneys handle the edge cases. Define each label precisely. And give annotators the surrounding structure and related sections, because a clause in isolation can mislead.

---

*Full documentation at [/docs/core-concepts/annotation-schemes](/docs/core-concepts/annotation-schemes).*
