
Best Practices for Legal Document Annotation

Annotate legal documents such as contracts, court filings, and regulatory texts in Potato using span labeling, entity extraction, and a privacy-first self-hosted deployment.

Potato Team

Legal text is an annotation problem of its own. The structure is complex, the vocabulary is specialized, and a single wrong label can carry real legal consequences. This guide covers how to annotate such text well.

Why Legal Annotation Is Hard

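The terminology alone calls for trained annotators rather than randomly sampled crowd workers. Documents are long, with contracts sometimes running to hundreds of pages, and clauses constantly point to other clauses. A single sloppy span can change what a clause means, so precision matters more than usual. And because much of that meaning depends on document type and jurisdiction, context is never optional.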

For the underlying span and text annotation mechanics, see the original documentation.

Document Segmentation

Splitting Long Documents

yaml
annotation_task_name: "Legal Document Annotation"
 
display:
  # Segment by section
  segmentation:
    enabled: true
    method: section_headers
    pattern: '^\d+\.\s+[A-Z]'
 
  # Show document context
  context:
    show_previous_section: true
    show_section_hierarchy: true
 
  # Navigation
  navigation:
    show_outline: true
    jump_to_section: true

Section-Level Annotation

yaml
data_files:
  - contracts.json
 
item_properties:
  id_key: id
  text_key: text
 
preprocessing:
  segment_by: sections
  preserve_metadata: true
  include_section_number: true
 
# Each section becomes an annotation item
# {
#   "id": "contract_001_section_3.2",
#   "text": "The Licensor grants...",
#   "section_number": "3.2",
#   "section_title": "License Grant",
#   "document_id": "contract_001"
# }
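
To make the item shape above concrete, here is a minimal Python sketch of how a contract could be split into one item per numbered section. It is not Potato's preprocessing code: `split_into_sections` and `SECTION_HEADER` are hypothetical names, and the regex is the `pattern` from the segmentation config with capture groups added.

```python
import re

# The segmentation pattern from the config above, with capture groups added
# (hypothetical helper; not part of Potato itself).
SECTION_HEADER = re.compile(r"^(\d+)\.\s+([A-Z].*)$", re.MULTILINE)


def split_into_sections(document_id: str, text: str) -> list[dict]:
    """Return one annotation item per top-level numbered section."""
    headers = list(SECTION_HEADER.finditer(text))
    items = []
    for i, header in enumerate(headers):
        # A section body runs from the end of its header to the next header.
        body_end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        number = header.group(1)
        items.append({
            "id": f"{document_id}_section_{number}",
            "text": text[header.end():body_end].strip(),
            "section_number": number,
            "section_title": header.group(2).strip(),
            "document_id": document_id,
        })
    return items


contract = (
    "1. Definitions\n"
    '"Software" means the licensed program.\n'
    "2. License Grant\n"
    "The Licensor grants the Licensee a non-exclusive license.\n"
    "2.1 Scope\n"
    "The license covers internal use only.\n"
)

for item in split_into_sections("contract_001", contract):
    print(item["id"], "|", item["section_title"])
```

Note that `2.1 Scope` stays inside section 2: the pattern requires whitespace right after the first dot, so subsection numbers such as `3.2` in the example item would need a broader pattern.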

Legal Entity Recognition

Contract-Specific Entities

yaml
annotation_schemes:
  - annotation_type: span
    name: legal_entities
    labels:
      - name: PARTY
        color: "#FECACA"
        description: "Contracting parties (Licensor, Licensee, Company, etc.)"
 
      - name: DEFINED_TERM
        color: "#FDE68A"
        description: "Defined terms (usually capitalized)"
 
      - name: DATE
        color: "#BBF7D0"
        description: "Dates and time periods"
 
      - name: MONETARY
        color: "#C4B5FD"
        description: "Dollar amounts, fees, penalties"
 
      - name: OBLIGATION
        color: "#BFDBFE"
        description: "Must, shall, will obligations"
 
      - name: CONDITION
        color: "#FED7AA"
        description: "If, unless, provided that conditions"
 
      - name: REFERENCE
        color: "#E0E7FF"
        description: "References to other sections or documents"

Obligation Detection

yaml
annotation_schemes:
  - annotation_type: multiselect
    name: obligation_type
    question: "What type of obligation is this?"
    options:
      - name: performance
        label: "Performance Obligation"
        description: "Party must do something"
 
      - name: payment
        label: "Payment Obligation"
        description: "Party must pay"
 
      - name: restriction
        label: "Restriction/Prohibition"
        description: "Party must not do something"
 
      - name: condition
        label: "Conditional Obligation"
        description: "Obligation triggered by condition"
 
      - name: warranty
        label: "Warranty/Representation"
        description: "Statement of fact or promise"

Clause Classification

Contract Clause Types

yaml
annotation_schemes:
  - annotation_type: radio
    name: clause_type
    question: "What type of clause is this?"
    options:
      - name: definitions
        label: "Definitions"
      - name: grant
        label: "Grant of Rights/License"
      - name: consideration
        label: "Consideration/Payment"
      - name: term
        label: "Term and Termination"
      - name: representations
        label: "Representations & Warranties"
      - name: indemnification
        label: "Indemnification"
      - name: limitation
        label: "Limitation of Liability"
      - name: confidentiality
        label: "Confidentiality"
      - name: ip
        label: "Intellectual Property"
      - name: dispute
        label: "Dispute Resolution"
      - name: boilerplate
        label: "Boilerplate/Miscellaneous"

Risk Assessment

yaml
annotation_schemes:
  - annotation_type: likert
    name: risk_level
    question: "Rate the risk level of this clause for [Party]"
    min_label: "Low Risk"
    max_label: "High Risk"
    size: 5
 
  - annotation_type: text
    name: risk_notes
    question: "Explain the risk factors"
    multiline: true
    required_if:
      field: risk_level
      operator: ">="
      value: 4

Court Document Annotation

Case Information Extraction

yaml
annotation_schemes:
  - annotation_type: span
    name: case_entities
    labels:
      - name: CASE_NUMBER
        description: "Case identifier"
 
      - name: COURT
        description: "Court name and jurisdiction"
 
      - name: JUDGE
        description: "Presiding judge"
 
      - name: PLAINTIFF
        description: "Plaintiff/Petitioner"
 
      - name: DEFENDANT
        description: "Defendant/Respondent"
 
      - name: ATTORNEY
        description: "Attorneys/Legal representatives"
 
      - name: LEGAL_CITATION
        description: "Citations to cases, statutes, regulations"
 
      - name: RULING
        description: "Court's ruling or order"

Argument Structure

yaml
annotation_schemes:
  - annotation_type: span
    name: argument_structure
    labels:
      - name: CLAIM
        color: "#FECACA"
        description: "Main claim or assertion"
 
      - name: PREMISE
        color: "#BBF7D0"
        description: "Supporting premise"
 
      - name: EVIDENCE
        color: "#BFDBFE"
        description: "Evidence cited"
 
      - name: REBUTTAL
        color: "#FED7AA"
        description: "Counter-argument"
 
      - name: CONCLUSION
        color: "#E0E7FF"
        description: "Conclusion drawn"

Legal Term Highlighting

yaml
display:
  keyword_highlighting:
    enabled: true
 
    categories:
      - name: obligation_words
        color: "#FEE2E2"
        keywords:
          - shall
          - must
          - will
          - agrees to
          - is required to
          - is obligated to
 
      - name: permission_words
        color: "#D1FAE5"
        keywords:
          - may
          - is permitted to
          - has the right to
          - is entitled to
 
      - name: prohibition_words
        color: "#FEF3C7"
        keywords:
          - shall not
          - must not
          - may not
          - is prohibited from
 
      - name: condition_words
        color: "#DBEAFE"
        keywords:
          - if
          - unless
          - provided that
          - subject to
          - contingent upon
          - in the event that
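
Several keywords in the lists above overlap: "shall not" contains "shall", and "may not" contains "may". The Python sketch below shows one way to resolve this by trying longer phrases first, so a prohibition is not highlighted as an obligation or a permission. How Potato itself resolves overlaps is not stated in this guide; `build_matcher` and `find_keywords` are hypothetical helpers, and only a subset of the keywords is included.

```python
import re

# A subset of the categories from the config above.
CATEGORIES = {
    "obligation_words": ["shall", "must", "will", "agrees to"],
    "permission_words": ["may", "is permitted to"],
    "prohibition_words": ["shall not", "must not", "may not"],
    "condition_words": ["if", "unless", "provided that"],
}


def build_matcher(categories: dict) -> tuple:
    """Compile one regex over all keywords and map each keyword to its category."""
    lookup = {kw: cat for cat, kws in categories.items() for kw in kws}
    # Longest phrases first, so "shall not" wins over "shall".
    ordered = sorted(lookup, key=len, reverse=True)
    alternation = "|".join(re.escape(kw) for kw in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE), lookup


def find_keywords(text: str) -> list:
    """Return (matched text, category) pairs in document order."""
    pattern, lookup = build_matcher(CATEGORIES)
    return [(m.group(0), lookup[m.group(0).lower()]) for m in pattern.finditer(text)]


clause = "Licensee shall not sublicense the Software, but may assign it if Licensor agrees."
for phrase, category in find_keywords(clause):
    print(f"{phrase!r:12} -> {category}")
```

Without the length-based ordering, the alternation would match "shall" first and label the clause above as an obligation rather than a prohibition.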

Quality Control for the Legal Domain

yaml
quality_control:
  # Require legal training
  qualification:
    required_training: legal_annotation_training
    training_accuracy: 0.85
 
  # Domain expertise check
  attention_checks:
    enabled: true
    items:
      - text: |
          "Notwithstanding any provision herein to the contrary,
          Licensee shall indemnify Licensor against all claims."
        expected:
          clause_type: indemnification
          obligated_party: "Licensee"
        type: domain_knowledge
 
  # High agreement required
  redundancy:
    annotations_per_item: 3
    agreement_threshold: 0.8
    on_disagreement: expert_review
 
  # Expert review layer
  expert_review:
    enabled: true
    review_threshold: 0.7
    expert_users: [legal_expert_1, legal_expert_2]
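
To see what `annotations_per_item: 3` combined with `agreement_threshold: 0.8` implies, here is a small Python sketch that routes an item either to acceptance or to expert review. It assumes agreement means the share of annotators who chose the most common label; the guide does not say which metric Potato uses, and `majority_agreement` and `route` are hypothetical names.

```python
from collections import Counter


def majority_agreement(labels: list) -> tuple:
    """Return the most common label and the share of annotators who chose it."""
    label, votes = Counter(labels).most_common(1)[0]
    return label, votes / len(labels)


def route(labels: list, agreement_threshold: float = 0.8) -> dict:
    """Accept the majority label or escalate to expert review on low agreement."""
    label, agreement = majority_agreement(labels)
    if agreement >= agreement_threshold:
        return {"status": "accepted", "label": label, "agreement": agreement}
    return {"status": "expert_review", "label": None, "agreement": agreement}


print(route(["indemnification", "indemnification", "indemnification"]))
print(route(["indemnification", "indemnification", "limitation"]))
```

Under this assumption, three annotators can only reach agreement values of 1/3, 2/3, or 1, so a threshold of 0.8 sends every item with a single dissenting label to expert review. That is a strict setting, which fits the high cost of errors in legal work, but it is worth budgeting expert time accordingly.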

Complete Legal Annotation Configuration

yaml
annotation_task_name: "Contract Clause Analysis"
 
display:
  text_display: html
 
  # Section context
  context:
    show_document_metadata: true
    show_section_hierarchy: true
 
  # Legal term highlighting
  keyword_highlighting:
    enabled: true
    categories:
      - name: obligations
        color: "#FEE2E2"
        keywords: [shall, must, will, agrees]
      - name: conditions
        color: "#DBEAFE"
        keywords: [if, unless, provided that, subject to]
      - name: defined_terms
        pattern: '\b[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)*\b'
        color: "#FEF3C7"
 
annotation_schemes:
  # Clause type
  - annotation_type: radio
    name: clause_type
    question: "Classify this clause"
    options:
      - name: license_grant
        label: "License Grant"
      - name: payment
        label: "Payment/Consideration"
      - name: term
        label: "Term/Termination"
      - name: indemnification
        label: "Indemnification"
      - name: limitation
        label: "Limitation of Liability"
      - name: confidentiality
        label: "Confidentiality"
      - name: other
        label: "Other"
 
  # Entity spans
  - annotation_type: span
    name: entities
    labels:
      - name: PARTY
        color: "#FECACA"
      - name: DEFINED_TERM
        color: "#FDE68A"
      - name: MONETARY
        color: "#C4B5FD"
      - name: DATE
        color: "#BBF7D0"
      - name: OBLIGATION
        color: "#BFDBFE"
 
  # Risk assessment
  - annotation_type: likert
    name: risk
    question: "Risk level for the receiving party?"
    size: 5
    min_label: "Low"
    max_label: "High"
 
  # Key issues
  - annotation_type: text
    name: issues
    question: "Note any unusual or problematic language"
    multiline: true
 
quality_control:
  redundancy:
    annotations_per_item: 2
    agreement_threshold: 0.75
 
  qualification:
    required_training: true
    training_items: 20
    training_accuracy: 0.8

Writing Annotator Guidelines

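Good guidelines for legal work make the scope clear (which documents, which jurisdictions) and include a glossary so that annotators read terms the same way you do. Be explicit about ambiguous language, and about when cross-references should be annotated and when they should be ignored. And say what counts as a correct span boundary, because in this field "roughly right" is rarely enough.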

Best Practices

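Use annotators who actually know the field, because legal labeling does not work without that knowledge. Split long documents into sections a person can hold in their head. Highlight weighty legal phrasing so it does not get skimmed past. Keep redundancy high, because errors are costly. Add an expert review layer in which lawyers handle the edge cases. Define each label precisely. And give annotators the surrounding structure and related sections, because a clause read in isolation can mislead.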


The full documentation is at /docs/core-concepts/annotation-schemes.