Legal Document Annotation Best Practices
Specialized techniques for annotating contracts, court documents, and regulatory filings with domain expertise.
By Potato Team
Legal documents require specialized annotation approaches due to their complex structure, domain-specific terminology, and the high stakes of errors. This guide covers strategies for effective legal text annotation.
Challenges in Legal Annotation
- Dense terminology: Legal jargon requires trained annotators
- Long documents: Contracts can span hundreds of pages
- Cross-references: Sections reference other sections
- Precision required: Errors can have legal consequences
- Context dependence: Meaning depends on document type and jurisdiction
Document Segmentation
Breaking Down Long Documents
annotation_task_name: "Legal Document Annotation"
display:
  # Segment by section
  segmentation:
    enabled: true
    method: section_headers
    pattern: '^\d+\.\s+[A-Z]'
  # Show document context
  context:
    show_previous_section: true
    show_section_hierarchy: true
  # Navigation
  navigation:
    show_outline: true
    jump_to_section: true
Section-Level Annotation
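To make section-level segmentation concrete, here is a minimal Python sketch of splitting a contract on numbered section headers. It reuses the regular expression from the `segmentation.pattern` setting, and the item fields mirror the example item shown in this section. The function name `split_into_sections` and the sample contract are assumptions for illustration, not Potato's implementation.

```python
import re

# Same expression as `segmentation.pattern`: digits, a dot, whitespace,
# then a capital letter. It matches "3. License Grant" but not
# "3.2 License Grant", so only top-level sections are split here.
SECTION_HEADER = re.compile(r"^\d+\.\s+[A-Z]", re.MULTILINE)


def split_into_sections(document_id: str, text: str) -> list[dict]:
    """Return one annotation item per top-level numbered section.

    Text before the first header (a preamble, for example) is dropped.
    """
    starts = [m.start() for m in SECTION_HEADER.finditer(text)]
    items = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(text)
        block = text[start:end].strip()
        header, _, body = block.partition("\n")
        match = re.match(r"(\d+)\.\s*(.*)", header)
        number, title = match.group(1), match.group(2).strip()
        items.append({
            "id": f"{document_id}_section_{number}",
            "text": body.strip(),
            "section_number": number,
            "section_title": title,
            "document_id": document_id,
        })
    return items


# Hypothetical two-section contract used only to exercise the function.
contract = (
    "1. Definitions\n"
    "\"Software\" means the licensed program.\n"
    "2. License Grant\n"
    "The Licensor grants the Licensee a non-exclusive license.\n"
)
sections = split_into_sections("contract_001", contract)
```

If your contracts use nested numbering such as 3.2, the pattern would need to be widened (for example to allow repeated `\d+\.` groups) before subsections become separate items.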
data_files:
  - contracts.json
item_properties:
  id_key: id
  text_key: text
preprocessing:
  segment_by: sections
  preserve_metadata: true
  include_section_number: true
# Each section becomes an annotation item
# {
#   "id": "contract_001_section_3.2",
#   "text": "The Licensor grants...",
#   "section_number": "3.2",
#   "section_title": "License Grant",
#   "document_id": "contract_001"
# }
Legal Entity Recognition
Contract-Specific Entities
annotation_schemes:
  - annotation_type: span
    name: legal_entities
    labels:
      - name: PARTY
        color: "#FECACA"
        description: "Contracting parties (Licensor, Licensee, Company, etc.)"
      - name: DEFINED_TERM
        color: "#FDE68A"
        description: "Defined terms (usually capitalized)"
      - name: DATE
        color: "#BBF7D0"
        description: "Dates and time periods"
      - name: MONETARY
        color: "#C4B5FD"
        description: "Dollar amounts, fees, penalties"
      - name: OBLIGATION
        color: "#BFDBFE"
        description: "Must, shall, will obligations"
      - name: CONDITION
        color: "#FED7AA"
        description: "If, unless, provided that conditions"
      - name: REFERENCE
        color: "#E0E7FF"
        description: "References to other sections or documents"
Obligation Detection
annotation_schemes:
  - annotation_type: multiselect
    name: obligation_type
    question: "What type of obligation is this?"
    options:
      - name: performance
        label: "Performance Obligation"
        description: "Party must do something"
      - name: payment
        label: "Payment Obligation"
        description: "Party must pay"
      - name: restriction
        label: "Restriction/Prohibition"
        description: "Party must not do something"
      - name: condition
        label: "Conditional Obligation"
        description: "Obligation triggered by condition"
      - name: warranty
        label: "Warranty/Representation"
        description: "Statement of fact or promise"
Clause Classification
Contract Clause Types
annotation_schemes:
  - annotation_type: radio
    name: clause_type
    question: "What type of clause is this?"
    options:
      - name: definitions
        label: "Definitions"
      - name: grant
        label: "Grant of Rights/License"
      - name: consideration
        label: "Consideration/Payment"
      - name: term
        label: "Term and Termination"
      - name: representations
        label: "Representations & Warranties"
      - name: indemnification
        label: "Indemnification"
      - name: limitation
        label: "Limitation of Liability"
      - name: confidentiality
        label: "Confidentiality"
      - name: ip
        label: "Intellectual Property"
      - name: dispute
        label: "Dispute Resolution"
      - name: boilerplate
        label: "Boilerplate/Miscellaneous"
Risk Assessment
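The config in this section asks for written risk notes only when the risk rating is high. As a minimal sketch of that conditional-required check, written as standalone Python for illustration: the rule dictionary mirrors the `field`, `operator`, and `value` keys of `required_if`, while the function name `missing_required_fields` and the set of supported operators are assumptions rather than a documented API.

```python
import operator

# Comparison operators a rule may name. This set is an assumption.
OPERATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


def missing_required_fields(annotation: dict, rules: dict) -> list[str]:
    """Return fields that a rule makes required but that are still empty."""
    missing = []
    for field_name, rule in rules.items():
        trigger_value = annotation.get(rule["field"])
        if trigger_value is None:
            continue  # the triggering question has not been answered yet
        compare = OPERATORS[rule["operator"]]
        if compare(trigger_value, rule["value"]):
            value = annotation.get(field_name) or ""
            if not str(value).strip():
                missing.append(field_name)
    return missing


# Mirrors the rule in this section: notes are required at risk level 4 or 5.
rules = {"risk_notes": {"field": "risk_level", "operator": ">=", "value": 4}}
```

A check like this can run before an annotation is accepted, so that high-risk ratings never arrive without an explanation for the reviewing attorney.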
annotation_schemes:
  - annotation_type: likert
    name: risk_level
    question: "Rate the risk level of this clause for [Party]"
    min_label: "Low Risk"
    max_label: "High Risk"
    size: 5
  - annotation_type: text
    name: risk_notes
    question: "Explain the risk factors"
    multiline: true
    required_if:
      field: risk_level
      operator: ">="
      value: 4
Court Document Annotation
Case Information Extraction
annotation_schemes:
  - annotation_type: span
    name: case_entities
    labels:
      - name: CASE_NUMBER
        description: "Case identifier"
      - name: COURT
        description: "Court name and jurisdiction"
      - name: JUDGE
        description: "Presiding judge"
      - name: PLAINTIFF
        description: "Plaintiff/Petitioner"
      - name: DEFENDANT
        description: "Defendant/Respondent"
      - name: ATTORNEY
        description: "Attorneys/Legal representatives"
      - name: LEGAL_CITATION
        description: "Citations to cases, statutes, regulations"
      - name: RULING
        description: "Court's ruling or order"
Argument Structure
annotation_schemes:
  - annotation_type: span
    name: argument_structure
    labels:
      - name: CLAIM
        color: "#FECACA"
        description: "Main claim or assertion"
      - name: PREMISE
        color: "#BBF7D0"
        description: "Supporting premise"
      - name: EVIDENCE
        color: "#BFDBFE"
        description: "Evidence cited"
      - name: REBUTTAL
        color: "#FED7AA"
        description: "Counter-argument"
      - name: CONCLUSION
        color: "#E0E7FF"
        description: "Conclusion drawn"
Highlighting Legal Terms
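Keyword lists like the ones in this section overlap: "shall not" is a prohibition, but it also contains the obligation word "shall". How a given tool resolves that overlap is not stated here, so the following is only a sketch of one reasonable approach, matching the longest phrase first. The category names follow the config in this section; the functions `build_matcher` and `find_keywords` are hypothetical.

```python
import re

# A subset of the keyword categories from this section.
CATEGORIES = {
    "obligation_words": ["shall", "must", "will", "agrees to"],
    "permission_words": ["may", "is permitted to"],
    "prohibition_words": ["shall not", "must not", "may not"],
}


def build_matcher(categories):
    """Compile one case-insensitive pattern and a keyword-to-category lookup."""
    lookup = {kw.lower(): name for name, kws in categories.items() for kw in kws}
    # Longest phrases go first so "shall not" wins over "shall".
    ordered = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(kw) for kw in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def find_keywords(text, categories=CATEGORIES):
    """Return (matched text, category) pairs in document order."""
    pattern, lookup = build_matcher(categories)
    return [(m.group(0), lookup[m.group(0).lower()]) for m in pattern.finditer(text)]


hits = find_keywords(
    "Licensee shall not sublicense, but may assign, and shall pay fees."
)
```

Without the longest-first ordering, "shall not sublicense" would be highlighted as an obligation, which points annotators toward the wrong label.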
display:
  keyword_highlighting:
    enabled: true
    categories:
      - name: obligation_words
        color: "#FEE2E2"
        keywords:
          - shall
          - must
          - will
          - agrees to
          - is required to
          - is obligated to
      - name: permission_words
        color: "#D1FAE5"
        keywords:
          - may
          - is permitted to
          - has the right to
          - is entitled to
      - name: prohibition_words
        color: "#FEF3C7"
        keywords:
          - shall not
          - must not
          - may not
          - is prohibited from
      - name: condition_words
        color: "#DBEAFE"
        keywords:
          - if
          - unless
          - provided that
          - subject to
          - contingent upon
          - in the event that
Quality Control for Legal
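The redundancy settings in this section depend on how agreement is computed, which the config itself does not spell out. As a rough sketch, assuming agreement means simple pairwise percent agreement on one categorical label per item (the function names and the returned dictionary are hypothetical, not Potato's API):

```python
from collections import Counter
from itertools import combinations


def pairwise_agreement(labels: list[str]) -> float:
    """Fraction of annotator pairs that chose the same label for one item."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        return 1.0  # a single annotation trivially agrees with itself
    return sum(a == b for a, b in pairs) / len(pairs)


def resolve_item(labels: list[str], threshold: float = 0.8) -> dict:
    """Accept the majority label or route the item to expert review."""
    agreement = pairwise_agreement(labels)
    if agreement >= threshold:
        label, _ = Counter(labels).most_common(1)[0]
        return {"status": "accepted", "label": label, "agreement": agreement}
    return {"status": "expert_review", "label": None, "agreement": agreement}


unanimous = resolve_item(["indemnification"] * 3)
split = resolve_item(["indemnification", "indemnification", "limitation"])
```

Under this measure, three annotators with a two-to-one split reach only one third agreement, so a 0.8 threshold effectively requires unanimity. A chance-corrected statistic computed over many items would behave differently.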
quality_control:
  # Require legal training
  qualification:
    required_training: legal_annotation_training
    training_accuracy: 0.85
  # Domain expertise check
  attention_checks:
    enabled: true
    items:
      - text: |
          "Notwithstanding any provision herein to the contrary,
          Licensee shall indemnify Licensor against all claims."
        expected:
          obligation_type: indemnification
          obligated_party: "Licensee"
        type: domain_knowledge
  # High agreement required
  redundancy:
    annotations_per_item: 3
    agreement_threshold: 0.8
    on_disagreement: expert_review
  # Expert review layer
  expert_review:
    enabled: true
    review_threshold: 0.7
    expert_users: [legal_expert_1, legal_expert_2]
Complete Legal Annotation Config
annotation_task_name: "Contract Clause Analysis"
display:
  text_display: html
  # Section context
  context:
    show_document_metadata: true
    show_section_hierarchy: true
  # Legal term highlighting
  keyword_highlighting:
    enabled: true
    categories:
      - name: obligations
        color: "#FEE2E2"
        keywords: [shall, must, will, agrees]
      - name: conditions
        color: "#DBEAFE"
        keywords: [if, unless, provided that, subject to]
      - name: defined_terms
        pattern: '\b[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)*\b'
        color: "#FEF3C7"
annotation_schemes:
  # Clause type
  - annotation_type: radio
    name: clause_type
    question: "Classify this clause"
    options:
      - name: license_grant
        label: "License Grant"
      - name: payment
        label: "Payment/Consideration"
      - name: term
        label: "Term/Termination"
      - name: indemnification
        label: "Indemnification"
      - name: limitation
        label: "Limitation of Liability"
      - name: confidentiality
        label: "Confidentiality"
      - name: other
        label: "Other"
  # Entity spans
  - annotation_type: span
    name: entities
    labels:
      - name: PARTY
        color: "#FECACA"
      - name: DEFINED_TERM
        color: "#FDE68A"
      - name: MONETARY
        color: "#C4B5FD"
      - name: DATE
        color: "#BBF7D0"
      - name: OBLIGATION
        color: "#BFDBFE"
  # Risk assessment
  - annotation_type: likert
    name: risk
    question: "Risk level for the receiving party?"
    size: 5
    min_label: "Low"
    max_label: "High"
  # Key issues
  - annotation_type: text
    name: issues
    question: "Note any unusual or problematic language"
    multiline: true
quality_control:
  redundancy:
    annotations_per_item: 2
    agreement_threshold: 0.75
  qualification:
    required_training: true
    training_items: 20
    training_accuracy: 0.8
Annotator Guidelines Example
When creating guidelines for legal annotation:
- Define scope: What documents, what jurisdictions
- Terminology glossary: Define legal terms for annotators
- Edge cases: How to handle ambiguous language
- Cross-references: When to annotate vs. ignore references
- Precision requirements: Exact span boundaries
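The last guideline, exact span boundaries, is easier to enforce when boundaries are normalized before spans are compared. A minimal sketch, assuming spans are character offsets into the section text and that the guidelines say surrounding whitespace is never part of a span. Both function names are hypothetical, and whether trailing punctuation belongs in a span is left as a decision for your own guidelines.

```python
def normalize_span(text: str, start: int, end: int) -> tuple[int, int]:
    """Shrink a span so it neither starts nor ends on whitespace."""
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return start, end


def spans_match(text: str, span_a: tuple[int, int], span_b: tuple[int, int]) -> bool:
    """True when two annotators' spans cover the same text after normalization."""
    return normalize_span(text, *span_a) == normalize_span(text, *span_b)


clause = "Licensee shall pay $5,000 within 30 days."
# One annotator dragged the selection across the neighboring spaces.
sloppy = (18, 26)
exact = (19, 25)
```

Normalizing first means two annotators who both meant "$5,000" are not counted as disagreeing because one selection included a stray space.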
Best Practices
- Use trained annotators: Legal annotation requires domain knowledge
- Segment long documents: Break into manageable sections
- Highlight key terms: Guide attention to legal language
- Use high redundancy: Legal errors are costly, so collect several annotations per item
- Expert review layer: Have attorneys review edge cases
- Clear guidelines: Define exactly what each label means
- Contextual annotation: Show document structure and related sections
Full documentation at /docs/core-concepts/annotation-types.