
Setting Up Content Moderation Annotation

Learn how to configure Potato for toxicity detection, hate speech classification, and sensitive content labeling while keeping annotator wellbeing in mind.

Potato Team

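Labeling toxic content is not like labeling movie reviews. The work wears people down, the guidelines are always a little ambiguous, and the cases that matter most are usually the ones annotators disagree on. This guide covers how to set up a content moderation task in Potato that takes these problems seriously, starting with the people who do the work.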

Annotator Wellbeing

Reading hateful and disturbing content for hours at a time takes a real toll on the people who do it. A few settings help keep the work bearable.

Wellbeing Configuration

yaml
wellbeing:
  # Content warnings
  warnings:
    enabled: true
    show_before_session: true
    message: |
      This task involves reviewing potentially offensive content including
      hate speech, harassment, and explicit material. Take breaks as needed.
 
  # Break reminders
  breaks:
    enabled: true
    reminder_interval: 30  # minutes
    break_duration: 5  # suggested minutes
    message: "Consider taking a short break. Your wellbeing matters."
 
  # Session limits
  limits:
    max_session_duration: 120  # minutes
    max_items_per_session: 100
    cooldown_between_sessions: 60  # minutes
 
  # Easy exit
  exit:
    allow_immediate_exit: true
    no_penalty_exit: true
    exit_button_prominent: true
    exit_message: "No problem. Take care of yourself."
 
  # Support resources
  resources:
    show_support_link: true
    support_url: "https://yourorg.com/support"
    hotline_number: "1-800-XXX-XXXX"
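
The session limits and break reminders above amount to a few threshold checks. The following Python sketch shows one way that logic could look. It is an illustration, not Potato's implementation, and the names `SessionLimits`, `session_should_end`, and `break_due` are hypothetical. The default values mirror the configuration above.

```python
from dataclasses import dataclass


@dataclass
class SessionLimits:
    """Hypothetical container mirroring the wellbeing settings above."""
    max_session_duration: int = 120  # minutes
    max_items_per_session: int = 100
    reminder_interval: int = 30  # minutes between break reminders


def session_should_end(limits: SessionLimits, elapsed_minutes: float, items_done: int) -> bool:
    """Return True once either the time limit or the item limit is reached."""
    return (
        elapsed_minutes >= limits.max_session_duration
        or items_done >= limits.max_items_per_session
    )


def break_due(limits: SessionLimits, minutes_since_break: float) -> bool:
    """Return True when a break reminder should be shown."""
    return minutes_since_break >= limits.reminder_interval


limits = SessionLimits()
# The item limit is reached here even though the time limit is not.
print(session_should_end(limits, elapsed_minutes=45, items_done=100))
print(break_due(limits, minutes_since_break=12))
```

Checking both limits with `or` means whichever cap is hit first ends the session, which is the more protective reading of the settings.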

Content Blurring

yaml
display:
  # Blur images by default
  image_display:
    blur_by_default: true
    blur_amount: 20
    click_to_reveal: true
    reveal_duration: 10  # auto-blur after 10 seconds
 
  # Text content warnings
  text_display:
    show_severity_indicator: true
    expandable_content: true
    default_collapsed: true

Toxicity Classification

Multi-Level Toxicity

yaml
annotation_schemes:
  - annotation_type: radio
    name: toxicity_level
    question: "Rate the toxicity level of this content"
    options:
      - name: none
        label: "Not Toxic"
        description: "No harmful content"
 
      - name: mild
        label: "Mildly Toxic"
        description: "Rude or insensitive but not severe"
 
      - name: moderate
        label: "Moderately Toxic"
        description: "Clearly offensive or harmful"
 
      - name: severe
        label: "Severely Toxic"
        description: "Extremely offensive, threatening, or dangerous"
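
Because the four levels form an ordered scale, ratings from several annotators can be summarized by their median rather than by a simple vote. The sketch below assumes that ordering and uses a hypothetical helper, `median_level`. It is one possible aggregation, not something Potato is documented to do.

```python
from statistics import median_low

# Option names from the scheme above, ordered from least to most severe.
LEVELS = ["none", "mild", "moderate", "severe"]


def median_level(labels):
    """Return the lower median of the given toxicity labels on the ordinal scale."""
    ranks = [LEVELS.index(label) for label in labels]
    return LEVELS[median_low(ranks)]


print(median_level(["mild", "severe", "moderate"]))
```

`median_low` resolves an even split toward the less severe label. `median_high` from the same module would resolve it toward the more severe one, which may suit a stricter policy.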

Toxicity Categories

yaml
annotation_schemes:
  - annotation_type: multiselect
    name: toxicity_types
    question: "Select all types of toxicity present"
    options:
      - name: profanity
        label: "Profanity/Obscenity"
        description: "Swear words, vulgar language"
 
      - name: insult
        label: "Insults"
        description: "Personal attacks, name-calling"
 
      - name: threat
        label: "Threats"
        description: "Threats of violence or harm"
 
      - name: hate_speech
        label: "Hate Speech"
        description: "Targeting protected groups"
 
      - name: harassment
        label: "Harassment"
        description: "Targeted, persistent hostility"
 
      - name: sexual
        label: "Sexual Content"
        description: "Explicit or suggestive content"
 
      - name: self_harm
        label: "Self-Harm/Suicide"
        description: "Promoting or glorifying self-harm"
 
      - name: misinformation
        label: "Misinformation"
        description: "Demonstrably false claims"
 
      - name: spam
        label: "Spam/Scam"
        description: "Unwanted promotional content"

Hate Speech Detection

Target Groups

yaml
annotation_schemes:
  - annotation_type: multiselect
    name: target_groups
    question: "Which groups are targeted? (if hate speech detected)"
    depends_on:
      field: toxicity_types
      contains: hate_speech
 
    options:
      - name: race_ethnicity
        label: "Race/Ethnicity"
 
      - name: religion
        label: "Religion"
 
      - name: gender
        label: "Gender"
 
      - name: sexual_orientation
        label: "Sexual Orientation"
 
      - name: disability
        label: "Disability"
 
      - name: nationality
        label: "Nationality/Origin"
 
      - name: age
        label: "Age"
 
      - name: other
        label: "Other Protected Group"

Hate Speech Severity

yaml
annotation_schemes:
  - annotation_type: radio
    name: hate_severity
    question: "Severity of hate speech"
    depends_on:
      field: toxicity_types
      contains: hate_speech
 
    options:
      - name: implicit
        label: "Implicit"
        description: "Coded language, dog whistles"
 
      - name: explicit_mild
        label: "Explicit - Mild"
        description: "Clear but not threatening"
 
      - name: explicit_severe
        label: "Explicit - Severe"
        description: "Dehumanizing, threatening, or violent"

Context-Aware Moderation

Platform-Specific Rules

yaml
# Context affects what's acceptable
annotation_schemes:
  - annotation_type: radio
    name: context_appropriate
    question: "Is this content appropriate for the platform context?"
    context_info:
      platform: "{{metadata.platform}}"
      community: "{{metadata.community}}"
      audience: "{{metadata.audience}}"
 
    options:
      - name: appropriate
        label: "Appropriate for Context"
 
      - name: borderline
        label: "Borderline"
 
      - name: inappropriate
        label: "Inappropriate for Context"
 
  - annotation_type: text
    name: context_notes
    question: "Explain your contextual reasoning"
    depends_on:
      field: context_appropriate
      value: borderline
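
The `depends_on` conditions in the hate speech and context schemes come in two shapes: `contains` for multiselect answers and `value` for single answers. A minimal Python sketch of how such a condition could be evaluated follows. `is_visible` and the `answers` dictionary are hypothetical and do not reflect Potato's internals.

```python
def is_visible(depends_on, answers):
    """Decide whether a dependent question should be shown.

    depends_on: dict with "field" plus either "contains" or "value", or None.
    answers: dict mapping scheme names to the annotator's current answers.
    """
    if depends_on is None:
        return True  # unconditional question
    current = answers.get(depends_on["field"])
    if "contains" in depends_on:
        # Multiselect answers are assumed to be stored as a collection of option names.
        return isinstance(current, (list, set, tuple)) and depends_on["contains"] in current
    if "value" in depends_on:
        return current == depends_on["value"]
    return True


answers = {"toxicity_types": ["insult", "hate_speech"], "context_appropriate": "appropriate"}
print(is_visible({"field": "toxicity_types", "contains": "hate_speech"}, answers))
print(is_visible({"field": "context_appropriate", "value": "borderline"}, answers))
```

In this sketch an unanswered parent question hides the dependent one, so annotators are not asked about target groups before they have flagged hate speech.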

Intent Detection

yaml
annotation_schemes:
  - annotation_type: radio
    name: intent
    question: "What is the apparent intent?"
    options:
      - name: genuine_attack
        label: "Genuine Attack"
        description: "Intent to harm or offend"
 
      - name: satire
        label: "Satire/Parody"
        description: "Mocking toxic behavior"
 
      - name: quote
        label: "Quote/Report"
        description: "Reporting or discussing toxic content"
 
      - name: reclaimed
        label: "Reclaimed Language"
        description: "In-group use of slurs"
 
      - name: unclear
        label: "Unclear Intent"

Image Content Moderation

Visual Content Classification

yaml
annotation_schemes:
  - annotation_type: multiselect
    name: image_violations
    question: "Select all policy violations"
    options:
      - name: nudity
        label: "Nudity/Sexual Content"
 
      - name: violence_graphic
        label: "Graphic Violence"
 
      - name: gore
        label: "Gore/Disturbing Content"
 
      - name: hate_symbols
        label: "Hate Symbols"
 
      - name: dangerous_acts
        label: "Dangerous Acts"
 
      - name: child_safety
        label: "Child Safety Concern"
        priority: critical
        escalate: true
 
      - name: none
        label: "No Violations"
 
  - annotation_type: radio
    name: action_recommendation
    question: "Recommended action"
    options:
      - name: approve
        label: "Approve"
      - name: age_restrict
        label: "Age-Restrict"
      - name: warning_label
        label: "Add Warning Label"
      - name: remove
        label: "Remove"
      - name: escalate
        label: "Escalate to Specialist"

Quality Control

yaml
quality_control:
  # Calibration for subjective content
  calibration:
    enabled: true
    frequency: 20  # Every 20 items
    items: calibration/moderation_gold.json
    feedback: true
    recalibrate_on_drift: true
 
  # High redundancy for borderline cases
  redundancy:
    annotations_per_item: 3
    increase_for_borderline: 5
    agreement_threshold: 0.67
 
  # Expert escalation
  escalation:
    enabled: true
    triggers:
      - field: toxicity_level
        value: severe
      - field: image_violations
        contains: child_safety
    escalate_to: trust_safety_team
 
  # Distribution monitoring
  monitoring:
    track_distribution: true
    alert_on_skew: true
    expected_distribution:
      none: 0.4
      mild: 0.3
      moderate: 0.2
      severe: 0.1
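
Two of the checks configured above reduce to simple arithmetic: the share of annotators who chose the majority label, and the gap between observed and expected label frequencies. The Python sketch below illustrates both under stated assumptions. The function names, the strict `<` comparison, and the `tolerance` parameter are hypothetical. This guide does not say how Potato computes agreement or skew.

```python
from collections import Counter


def majority_share(labels):
    """Return the most common label and the fraction of annotators who chose it."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)


def needs_more_annotations(labels, agreement_threshold=0.67):
    """True when the majority share falls below the threshold (assumed strict comparison)."""
    _, share = majority_share(labels)
    return share < agreement_threshold


def skewed_labels(labels, expected, tolerance=0.15):
    """List labels whose observed share differs from the expected share by more than tolerance."""
    counts = Counter(labels)
    total = len(labels)
    return sorted(
        name for name, share in expected.items()
        if abs(counts[name] / total - share) > tolerance
    )


expected = {"none": 0.4, "mild": 0.3, "moderate": 0.2, "severe": 0.1}
observed = ["none"] * 7 + ["mild"] * 2 + ["severe"]
print(needs_more_annotations(["mild", "mild", "moderate"]))
print(skewed_labels(observed, expected))
```

With three annotations, a 2-of-3 majority has a share of about 0.667, which is just under 0.67 in this strict comparison, so such an item would be sent for more annotations. Whether that matches Potato's behavior is not stated in this guide, so it is worth verifying before relying on the threshold.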

Complete Configuration

yaml
annotation_task_name: "Content Moderation"
 
# Wellbeing first
wellbeing:
  warnings:
    enabled: true
    message: "This task contains potentially offensive content."
  breaks:
    reminder_interval: 30
    message: "Remember to take breaks."
  limits:
    max_session_duration: 90
    max_items_per_session: 75
 
display:
  # Blur sensitive content
  image_display:
    blur_by_default: true
    click_to_reveal: true
 
  # Show platform context
  metadata_display:
    show_fields: [platform, community, report_reason]
 
annotation_schemes:
  # Toxicity level
  - annotation_type: radio
    name: toxicity
    question: "Toxicity level"
    options:
      - name: none
        label: "None"
      - name: mild
        label: "Mild"
      - name: moderate
        label: "Moderate"
      - name: severe
        label: "Severe"
 
  # Categories
  - annotation_type: multiselect
    name: categories
    question: "Types of harmful content (select all)"
    options:
      - name: hate
        label: "Hate Speech"
      - name: harassment
        label: "Harassment"
      - name: violence
        label: "Violence/Threats"
      - name: sexual
        label: "Sexual Content"
      - name: self_harm
        label: "Self-Harm"
      - name: spam
        label: "Spam"
      - name: none
        label: "None"
 
  # Confidence
  - annotation_type: likert
    name: confidence
    question: "How confident are you?"
    size: 5
    min_label: "Uncertain"
    max_label: "Very Confident"
 
  # Notes
  - annotation_type: text
    name: notes
    question: "Additional notes (optional)"
    multiline: true
 
quality_control:
  redundancy:
    annotations_per_item: 3
  calibration:
    enabled: true
    frequency: 25
  escalation:
    enabled: true
    triggers:
      - field: toxicity
        value: severe
 
output_annotation_dir: annotations/
export_annotation_format: jsonl
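
Escalation triggers refer to schemes by `name`, so a trigger that points at a renamed or missing scheme may never match. The Python sketch below checks for that, using a hand-written dictionary that mirrors part of the configuration above. The helper `undefined_trigger_fields` is hypothetical and is not part of Potato.

```python
def undefined_trigger_fields(config):
    """Return escalation trigger fields that do not match any annotation scheme name."""
    defined = {scheme["name"] for scheme in config.get("annotation_schemes", [])}
    triggers = (
        config.get("quality_control", {})
        .get("escalation", {})
        .get("triggers", [])
    )
    return sorted({trigger["field"] for trigger in triggers} - defined)


# Hand-written subset of the complete configuration, for illustration only.
config = {
    "annotation_schemes": [
        {"name": "toxicity"},
        {"name": "categories"},
        {"name": "confidence"},
        {"name": "notes"},
    ],
    "quality_control": {
        "escalation": {
            "enabled": True,
            "triggers": [{"field": "toxicity", "value": "severe"}],
        }
    },
}

print(undefined_trigger_fields(config))
```

A check like this is useful when copying snippets between sections of this guide, because the standalone examples name the scheme `toxicity_level` while the complete configuration names it `toxicity`.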

Writing Guidelines

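Most annotator disagreement traces back to vague guidelines, so effort spent here pays off. Spell out what separates "mild" from "moderate" rather than leaving annotators to guess. Show borderline cases and explain why each one landed where it did. The same sentence can be fine in one community and a violation in another, so state how platform and audience change the call. Tell annotators what to do when intent is genuinely unclear, rather than pretending that never happens. And expect to revise all of this as new kinds of content appear.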

Supporting Annotators

Rotate assignments rather than keeping one person on toxic content all day. Keep mental health resources one click away, give people a real channel for raising concerns, and acknowledge that this is hard work. It is.

See the text annotation documentation for how the underlying classification schemes work. The quality control guide covers redundancy and calibration in more detail.


See the full documentation at /docs/core-concepts/annotation-schemes.