Skip to content
Guides7 min read

Bonnes pratiques d'annotation de documents juridiques

Annotez des documents juridiques dans Potato, contrats, dépôts judiciaires et textes réglementaires, avec étiquetage de spans, extraction d'entités et déploiement auto-hébergé respectueux de la confidentialité.

Potato Team

Le texte juridique constitue un problème d'annotation à part entière. La structure est complexe, le vocabulaire est spécialisé, et une erreur d'étiquetage peut avoir de réelles conséquences juridiques. Ce guide explique comment bien l'annoter.

Ce qui rend l'annotation juridique difficile

Le jargon à lui seul impose de recourir à des annotateurs formés, et non à des contributeurs choisis au hasard. Les documents sont longs, les contrats s'étendant parfois sur des centaines de pages, et les sections renvoient sans cesse à d'autres sections. La précision compte plus que d'habitude, car un span imprécis peut changer le sens d'une clause. Et une grande partie de ce sens dépend du type de document et de la juridiction, si bien que le contexte n'est jamais optionnel.

Pour les mécanismes sous-jacents d'annotation de spans et de texte, consultez la documentation source.

Segmentation de documents

Découpage de documents longs

yaml
annotation_task_name: "Legal Document Annotation"
 
display:
  # Segment by section
  segmentation:
    enabled: true
    method: section_headers
    pattern: '^\d+\.\s+[A-Z]'
 
  # Show document context
  context:
    show_previous_section: true
    show_section_hierarchy: true
 
  # Navigation
  navigation:
    show_outline: true
    jump_to_section: true

Annotation au niveau des sections

yaml
data_files:
  - contracts.json
 
item_properties:
  id_key: id
  text_key: text
 
preprocessing:
  segment_by: sections
  preserve_metadata: true
  include_section_number: true
 
# Each section becomes an annotation item
# {
#   "id": "contract_001_section_3.2",
#   "text": "The Licensor grants...",
#   "section_number": "3.2",
#   "section_title": "License Grant",
#   "document_id": "contract_001"
# }

Reconnaissance d'entités juridiques

Entités spécifiques aux contrats

yaml
annotation_schemes:
  - annotation_type: span
    name: legal_entities
    labels:
      - name: PARTY
        color: "#FECACA"
        description: "Contracting parties (Licensor, Licensee, Company, etc.)"
 
      - name: DEFINED_TERM
        color: "#FDE68A"
        description: "Defined terms (usually capitalized)"
 
      - name: DATE
        color: "#BBF7D0"
        description: "Dates and time periods"
 
      - name: MONETARY
        color: "#C4B5FD"
        description: "Dollar amounts, fees, penalties"
 
      - name: OBLIGATION
        color: "#BFDBFE"
        description: "Must, shall, will obligations"
 
      - name: CONDITION
        color: "#FED7AA"
        description: "If, unless, provided that conditions"
 
      - name: REFERENCE
        color: "#E0E7FF"
        description: "References to other sections or documents"

Détection des obligations

yaml
annotation_schemes:
  - annotation_type: multiselect
    name: obligation_type
    question: "What type of obligation is this?"
    options:
      - name: performance
        label: "Performance Obligation"
        description: "Party must do something"
 
      - name: payment
        label: "Payment Obligation"
        description: "Party must pay"
 
      - name: restriction
        label: "Restriction/Prohibition"
        description: "Party must not do something"
 
      - name: condition
        label: "Conditional Obligation"
        description: "Obligation triggered by condition"
 
      - name: warranty
        label: "Warranty/Representation"
        description: "Statement of fact or promise"

Classification des clauses

Types de clauses contractuelles

yaml
annotation_schemes:
  - annotation_type: radio
    name: clause_type
    question: "What type of clause is this?"
    options:
      - name: definitions
        label: "Definitions"
      - name: grant
        label: "Grant of Rights/License"
      - name: consideration
        label: "Consideration/Payment"
      - name: term
        label: "Term and Termination"
      - name: representations
        label: "Representations & Warranties"
      - name: indemnification
        label: "Indemnification"
      - name: limitation
        label: "Limitation of Liability"
      - name: confidentiality
        label: "Confidentiality"
      - name: ip
        label: "Intellectual Property"
      - name: dispute
        label: "Dispute Resolution"
      - name: boilerplate
        label: "Boilerplate/Miscellaneous"

Évaluation des risques

yaml
annotation_schemes:
  - annotation_type: likert
    name: risk_level
    question: "Rate the risk level of this clause for [Party]"
    min_label: "Low Risk"
    max_label: "High Risk"
    size: 5
 
  - annotation_type: text
    name: risk_notes
    question: "Explain the risk factors"
    multiline: true
    required_if:
      field: risk_level
      operator: ">="
      value: 4

Annotation de documents judiciaires

Extraction des informations de l'affaire

yaml
annotation_schemes:
  - annotation_type: span
    name: case_entities
    labels:
      - name: CASE_NUMBER
        description: "Case identifier"
 
      - name: COURT
        description: "Court name and jurisdiction"
 
      - name: JUDGE
        description: "Presiding judge"
 
      - name: PLAINTIFF
        description: "Plaintiff/Petitioner"
 
      - name: DEFENDANT
        description: "Defendant/Respondent"
 
      - name: ATTORNEY
        description: "Attorneys/Legal representatives"
 
      - name: LEGAL_CITATION
        description: "Citations to cases, statutes, regulations"
 
      - name: RULING
        description: "Court's ruling or order"

Structure de l'argumentation

yaml
annotation_schemes:
  - annotation_type: span
    name: argument_structure
    labels:
      - name: CLAIM
        color: "#FECACA"
        description: "Main claim or assertion"
 
      - name: PREMISE
        color: "#BBF7D0"
        description: "Supporting premise"
 
      - name: EVIDENCE
        color: "#BFDBFE"
        description: "Evidence cited"
 
      - name: REBUTTAL
        color: "#FED7AA"
        description: "Counter-argument"
 
      - name: CONCLUSION
        color: "#E0E7FF"
        description: "Conclusion drawn"

Surlignage des termes juridiques

yaml
display:
  keyword_highlighting:
    enabled: true
 
    categories:
      - name: obligation_words
        color: "#FEE2E2"
        keywords:
          - shall
          - must
          - will
          - agrees to
          - is required to
          - is obligated to
 
      - name: permission_words
        color: "#D1FAE5"
        keywords:
          - may
          - is permitted to
          - has the right to
          - is entitled to
 
      - name: prohibition_words
        color: "#FEF3C7"
        keywords:
          - shall not
          - must not
          - may not
          - is prohibited from
 
      - name: condition_words
        color: "#DBEAFE"
        keywords:
          - if
          - unless
          - provided that
          - subject to
          - contingent upon
          - in the event that

Contrôle qualité pour le juridique

yaml
quality_control:
  # Require legal training
  qualification:
    required_training: legal_annotation_training
    training_accuracy: 0.85
 
  # Domain expertise check
  attention_checks:
    enabled: true
    items:
      - text: |
          "Notwithstanding any provision herein to the contrary,
          Licensee shall indemnify Licensor against all claims."
        expected:
          obligation_type: indemnification
          obligated_party: "Licensee"
        type: domain_knowledge
 
  # High agreement required
  redundancy:
    annotations_per_item: 3
    agreement_threshold: 0.8
    on_disagreement: expert_review
 
  # Expert review layer
  expert_review:
    enabled: true
    review_threshold: 0.7
    expert_users: [legal_expert_1, legal_expert_2]

Configuration complète d'annotation juridique

yaml
annotation_task_name: "Contract Clause Analysis"
 
display:
  text_display: html
 
  # Section context
  context:
    show_document_metadata: true
    show_section_hierarchy: true
 
  # Legal term highlighting
  keyword_highlighting:
    enabled: true
    categories:
      - name: obligations
        color: "#FEE2E2"
        keywords: [shall, must, will, agrees]
      - name: conditions
        color: "#DBEAFE"
        keywords: [if, unless, provided that, subject to]
      - name: defined_terms
        pattern: '\b[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)*\b'
        color: "#FEF3C7"
 
annotation_schemes:
  # Clause type
  - annotation_type: radio
    name: clause_type
    question: "Classify this clause"
    options:
      - name: license_grant
        label: "License Grant"
      - name: payment
        label: "Payment/Consideration"
      - name: term
        label: "Term/Termination"
      - name: indemnification
        label: "Indemnification"
      - name: limitation
        label: "Limitation of Liability"
      - name: confidentiality
        label: "Confidentiality"
      - name: other
        label: "Other"
 
  # Entity spans
  - annotation_type: span
    name: entities
    labels:
      - name: PARTY
        color: "#FECACA"
      - name: DEFINED_TERM
        color: "#FDE68A"
      - name: MONETARY
        color: "#C4B5FD"
      - name: DATE
        color: "#BBF7D0"
      - name: OBLIGATION
        color: "#BFDBFE"
 
  # Risk assessment
  - annotation_type: likert
    name: risk
    question: "Risk level for the receiving party?"
    size: 5
    min_label: "Low"
    max_label: "High"
 
  # Key issues
  - annotation_type: text
    name: issues
    question: "Note any unusual or problematic language"
    multiline: true
 
quality_control:
  redundancy:
    annotations_per_item: 2
    agreement_threshold: 0.75
 
  qualification:
    required_training: true
    training_items: 20
    training_accuracy: 0.8

Rédiger les consignes pour les annotateurs

De bonnes consignes pour une tâche juridique précisent la portée (quels documents, quelles juridictions) et incluent un glossaire afin que les annotateurs interprètent les termes de la même façon que vous. Soyez explicite sur le langage ambigu et sur les cas où une référence croisée doit être annotée plutôt qu'ignorée. Et indiquez ce qui constitue une délimitation de span correcte, car « à peu près » suffit rarement dans ce domaine.

Bonnes pratiques

Faites appel à des annotateurs qui connaissent réellement le domaine, car l'étiquetage juridique ne fonctionne pas sans cela. Découpez les documents longs en sections que l'on peut garder à l'esprit. Surlignez le langage juridique qui pèse, pour qu'il ne soit pas survolé. Maintenez une redondance élevée, car le coût d'une erreur est élevé. Ajoutez une couche de révision experte où des avocats traitent les cas limites. Définissez chaque étiquette avec précision. Et donnez aux annotateurs la structure environnante et les sections connexes, car une clause isolée peut induire en erreur.


Documentation complète sur /docs/core-concepts/annotation-schemes.