독성 콘텐츠를 라벨링하는 일은 영화 리뷰를 라벨링하는 것과 다릅니다. 이 작업은 사람을 지치게 하고, 지침은 늘 조금 모호하며, 가장 중요한 사례들은 보통 어노테이터들이 서로 의견이 갈리는 것들입니다. 이 가이드는 이러한 문제들을 진지하게 다루는 콘텐츠 모더레이션 작업을 Potato에서 구성하는 방법을 다루며, 그 일을 하는 사람들에서 출발합니다.

어노테이터 웰빙

혐오스럽고 충격적인 콘텐츠를 몇 시간씩 읽는 것은 그 일을 하는 사람들에게 실제로 영향을 미칩니다. 몇 가지 설정이 작업을 견딜 만하게 유지하는 데 도움이 됩니다.

웰빙 구성

yaml

wellbeing:
  # Content warnings
  warnings:
    enabled: true
    show_before_session: true
    message: |
      This task involves reviewing potentially offensive content including
      hate speech, harassment, and explicit material. Take breaks as needed.
 
  # Break reminders
  breaks:
    enabled: true
    reminder_interval: 30  # minutes
    break_duration: 5  # suggested minutes
    message: "Consider taking a short break. Your wellbeing matters."
 
  # Session limits
  limits:
    max_session_duration: 120  # minutes
    max_items_per_session: 100
    cooldown_between_sessions: 60  # minutes
 
  # Easy exit
  exit:
    allow_immediate_exit: true
    no_penalty_exit: true
    exit_button_prominent: true
    exit_message: "No problem. Take care of yourself."
 
  # Support resources
  resources:
    show_support_link: true
    support_url: "https://yourorg.com/support"
    hotline_number: "1-800-XXX-XXXX"

콘텐츠 흐리게 처리

yaml

display:
  # Blur images by default
  image_display:
    blur_by_default: true
    blur_amount: 20
    click_to_reveal: true
    reveal_duration: 10  # auto-blur after 10 seconds
 
  # Text content warnings
  text_display:
    show_severity_indicator: true
    expandable_content: true
    default_collapsed: true

독성 분류

다단계 독성

yaml

annotation_schemes:
  - annotation_type: radio
    name: toxicity_level
    question: "Rate the toxicity level of this content"
    options:
      - name: none
        label: "Not Toxic"
        description: "No harmful content"
 
      - name: mild
        label: "Mildly Toxic"
        description: "Rude or insensitive but not severe"
 
      - name: moderate
        label: "Moderately Toxic"
        description: "Clearly offensive or harmful"
 
      - name: severe
        label: "Severely Toxic"
        description: "Extremely offensive, threatening, or dangerous"

독성 범주

yaml

annotation_schemes:
  - annotation_type: multiselect
    name: toxicity_types
    question: "Select all types of toxicity present"
    options:
      - name: profanity
        label: "Profanity/Obscenity"
        description: "Swear words, vulgar language"
 
      - name: insult
        label: "Insults"
        description: "Personal attacks, name-calling"
 
      - name: threat
        label: "Threats"
        description: "Threats of violence or harm"
 
      - name: hate_speech
        label: "Hate Speech"
        description: "Targeting protected groups"
 
      - name: harassment
        label: "Harassment"
        description: "Targeted, persistent hostility"
 
      - name: sexual
        label: "Sexual Content"
        description: "Explicit or suggestive content"
 
      - name: self_harm
        label: "Self-Harm/Suicide"
        description: "Promoting or glorifying self-harm"
 
      - name: misinformation
        label: "Misinformation"
        description: "Demonstrably false claims"
 
      - name: spam
        label: "Spam/Scam"
        description: "Unwanted promotional content"

혐오 발언 탐지

대상 집단

yaml

annotation_schemes:
  - annotation_type: multiselect
    name: target_groups
    question: "Which groups are targeted? (if hate speech detected)"
    depends_on:
      field: toxicity_types
      contains: hate_speech
 
    options:
      - name: race_ethnicity
        label: "Race/Ethnicity"
 
      - name: religion
        label: "Religion"
 
      - name: gender
        label: "Gender"
 
      - name: sexual_orientation
        label: "Sexual Orientation"
 
      - name: disability
        label: "Disability"
 
      - name: nationality
        label: "Nationality/Origin"
 
      - name: age
        label: "Age"
 
      - name: other
        label: "Other Protected Group"

혐오 발언 심각도

yaml

annotation_schemes:
  - annotation_type: radio
    name: hate_severity
    question: "Severity of hate speech"
    depends_on:
      field: toxicity_types
      contains: hate_speech
 
    options:
      - name: implicit
        label: "Implicit"
        description: "Coded language, dog whistles"
 
      - name: explicit_mild
        label: "Explicit - Mild"
        description: "Clear but not threatening"
 
      - name: explicit_severe
        label: "Explicit - Severe"
        description: "Dehumanizing, threatening, or violent"

맥락 기반 모더레이션

플랫폼별 규칙

yaml

# Context affects what's acceptable
annotation_schemes:
  - annotation_type: radio
    name: context_appropriate
    question: "Is this content appropriate for the platform context?"
    context_info:
      platform: "{{metadata.platform}}"
      community: "{{metadata.community}}"
      audience: "{{metadata.audience}}"
 
    options:
      - name: appropriate
        label: "Appropriate for Context"
 
      - name: borderline
        label: "Borderline"
 
      - name: inappropriate
        label: "Inappropriate for Context"
 
  - annotation_type: text
    name: context_notes
    question: "Explain your contextual reasoning"
    depends_on:
      field: context_appropriate
      value: borderline

의도 탐지

yaml

annotation_schemes:
  - annotation_type: radio
    name: intent
    question: "What is the apparent intent?"
    options:
      - name: genuine_attack
        label: "Genuine Attack"
        description: "Intent to harm or offend"
 
      - name: satire
        label: "Satire/Parody"
        description: "Mocking toxic behavior"
 
      - name: quote
        label: "Quote/Report"
        description: "Reporting or discussing toxic content"
 
      - name: reclaimed
        label: "Reclaimed Language"
        description: "In-group use of slurs"
 
      - name: unclear
        label: "Unclear Intent"

이미지 콘텐츠 모더레이션

시각 콘텐츠 분류

yaml

annotation_schemes:
  - annotation_type: multiselect
    name: image_violations
    question: "Select all policy violations"
    options:
      - name: nudity
        label: "Nudity/Sexual Content"
 
      - name: violence_graphic
        label: "Graphic Violence"
 
      - name: gore
        label: "Gore/Disturbing Content"
 
      - name: hate_symbols
        label: "Hate Symbols"
 
      - name: dangerous_acts
        label: "Dangerous Acts"
 
      - name: child_safety
        label: "Child Safety Concern"
        priority: critical
        escalate: true
 
      - name: none
        label: "No Violations"
 
  - annotation_type: radio
    name: action_recommendation
    question: "Recommended action"
    options:
      - name: approve
        label: "Approve"
      - name: age_restrict
        label: "Age-Restrict"
      - name: warning_label
        label: "Add Warning Label"
      - name: remove
        label: "Remove"
      - name: escalate
        label: "Escalate to Specialist"

품질 관리

yaml

quality_control:
  # Calibration for subjective content
  calibration:
    enabled: true
    frequency: 20  # Every 20 items
    items: calibration/moderation_gold.json
    feedback: true
    recalibrate_on_drift: true
 
  # High redundancy for borderline cases
  redundancy:
    annotations_per_item: 3
    increase_for_borderline: 5
    agreement_threshold: 0.67
 
  # Expert escalation
  escalation:
    enabled: true
    triggers:
      - field: toxicity_level
        value: severe
      - field: image_violations
        contains: child_safety
    escalate_to: trust_safety_team
 
  # Distribution monitoring
  monitoring:
    track_distribution: true
    alert_on_skew: true
    expected_distribution:
      none: 0.4
      mild: 0.3
      moderate: 0.2
      severe: 0.1

전체 구성

yaml

annotation_task_name: "Content Moderation"
 
# Wellbeing first
wellbeing:
  warnings:
    enabled: true
    message: "This task contains potentially offensive content."
  breaks:
    reminder_interval: 30
    message: "Remember to take breaks."
  limits:
    max_session_duration: 90
    max_items_per_session: 75
 
display:
  # Blur sensitive content
  image_display:
    blur_by_default: true
    click_to_reveal: true
 
  # Show platform context
  metadata_display:
    show_fields: [platform, community, report_reason]
 
annotation_schemes:
  # Toxicity level
  - annotation_type: radio
    name: toxicity
    question: "Toxicity level"
    options:
      - name: none
        label: "None"
      - name: mild
        label: "Mild"
      - name: moderate
        label: "Moderate"
      - name: severe
        label: "Severe"
 
  # Categories
  - annotation_type: multiselect
    name: categories
    question: "Types of harmful content (select all)"
    options:
      - name: hate
        label: "Hate Speech"
      - name: harassment
        label: "Harassment"
      - name: violence
        label: "Violence/Threats"
      - name: sexual
        label: "Sexual Content"
      - name: self_harm
        label: "Self-Harm"
      - name: spam
        label: "Spam"
      - name: none
        label: "None"
 
  # Confidence
  - annotation_type: likert
    name: confidence
    question: "How confident are you?"
    size: 5
    min_label: "Uncertain"
    max_label: "Very Confident"
 
  # Notes
  - annotation_type: text
    name: notes
    question: "Additional notes (optional)"
    multiline: true
 
quality_control:
  redundancy:
    annotations_per_item: 3
  calibration:
    enabled: true
    frequency: 25
  escalation:
    enabled: true
    triggers:
      - field: toxicity
        value: severe
 
output_annotation_dir: annotations/
export_annotation_format: jsonl

지침 작성하기

어노테이터 간 의견 차이는 대부분 모호한 지침으로 거슬러 올라가므로, 바로 여기에 들이는 노력이 보답을 줍니다. 어노테이터가 짐작하도록 내버려 두는 대신 "mild"와 "moderate"를 무엇이 가르는지 명확히 적으세요. 경계 사례를 보여주고, 각 사례가 왜 그 자리에 놓였는지 설명을 덧붙이세요. 같은 문장이 한 커뮤니티에서는 괜찮고 다른 곳에서는 위반일 수 있으므로, 플랫폼과 청중이 판단을 어떻게 바꾸는지 밝히세요. 그리고 의도가 정말로 불분명할 때 어떻게 해야 하는지를, 그런 일이 결코 없는 척하지 말고 어노테이터에게 알려주세요. 새로운 종류의 콘텐츠가 등장하면서 이 모든 것을 수정하게 되리라 예상하세요.

어노테이터 지원하기

한 사람을 하루 종일 독성 콘텐츠에 붙박아 두지 말고, 작업을 돌려 가며 맡기세요. 정신 건강 자원을 한 번의 클릭으로 닿을 수 있게 두고, 우려를 알릴 진짜 통로를 사람들에게 주며, 이것이 힘든 일임을 인정하세요. 실제로 그렇습니다.

기저의 분류 스키마가 어떻게 작동하는지는 텍스트 어노테이션 문서를 참고하세요. 품질 관리 가이드는 중복 처리와 보정을 더 자세히 다룹니다.

전체 문서는 /docs/core-concepts/annotation-schemes에서 확인하세요.