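Sentiment analysis is a fundamental NLP task, and Potato makes it straightforward to collect high-quality sentiment labels. In this tutorial, we build a complete, production-ready sentiment annotation interface.

Project Overview

We will create an interface for annotating social media posts that includes:

Three-way sentiment classification (positive, negative, neutral)
A confidence rating for each annotation
An optional free-text explanation
Keyboard shortcuts for faster labeling
Quality control measures

Full Configuration

Here is the complete config.yaml: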

yaml

annotation_task_name: "Social Media Sentiment Analysis"
 
# Data configuration
data_files:
  - "data/tweets.json"
 
item_properties:
  id_key: id
  text_key: text
 
# Annotation interface
annotation_schemes:
  # Primary sentiment label
  - annotation_type: radio
    name: sentiment
    description: "What is the overall sentiment of this post?"
    labels:
      - name: Positive
        tooltip: "Expresses happiness, satisfaction, or approval"
        keyboard_shortcut: "1"
      - name: Negative
        tooltip: "Expresses sadness, frustration, or disapproval"
        keyboard_shortcut: "2"
      - name: Neutral
        tooltip: "Factual, objective, or lacks emotional content"
        keyboard_shortcut: "3"
    required: true
 
  # Confidence rating
  - annotation_type: likert
    name: confidence
    description: "How confident are you in your sentiment label?"
    size: 5
    min_label: "Not confident"
    max_label: "Very confident"
    required: true
 
  # Optional explanation
  - annotation_type: text
    name: explanation
    description: "Why did you choose this label? (Optional)"
    multiline: true
    required: false
    placeholder: "Explain your reasoning..."
 
# Guidelines
annotation_guidelines:
  title: "Sentiment Annotation Guidelines"
  content: |
    ## Your Task
    Classify the sentiment expressed in each social media post.
 
    ## Labels
 
    **Positive**: The author expresses positive emotions or opinions
    - Happiness, excitement, gratitude
    - Praise, recommendations, approval
    - Examples: "Love this!", "Best day ever!", "Highly recommend"
 
    **Negative**: The author expresses negative emotions or opinions
    - Anger, frustration, sadness
    - Complaints, criticism, disapproval
    - Examples: "Terrible service", "So disappointed", "Worst experience"
 
    **Neutral**: Factual or lacking clear sentiment
    - News, announcements, questions
    - Mixed or balanced opinions
    - Examples: "The store opens at 9am", "Has anyone tried this?"
 
    ## Tips
    - Focus on the author's sentiment, not the topic
    - Sarcasm should be labeled based on intended meaning
    - When unsure, lower your confidence rating
 
# User management
automatic_assignment:
  on: true
  sampling_strategy: random
  labels_per_instance: 1
  instance_per_annotator: 100
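
The assignment settings above imply a simple staffing calculation. The sketch below is not part of Potato. It assumes that `labels_per_instance` is the number of labels collected per item and that `instance_per_annotator` caps how many items a single annotator receives. The helper name `annotators_needed` and the 1,000-item dataset size are hypothetical.

```python
import math


def annotators_needed(num_items, labels_per_instance, instance_per_annotator):
    """Minimum number of annotators needed to label every item (hypothetical helper)."""
    if instance_per_annotator <= 0:
        raise ValueError("instance_per_annotator must be positive")
    # Total labels to collect across the whole dataset.
    total_labels = num_items * labels_per_instance
    # Each annotator contributes at most instance_per_annotator labels.
    return math.ceil(total_labels / instance_per_annotator)


# Assumed dataset size of 1,000 posts with the settings from the config above.
print(annotators_needed(1000, labels_per_instance=1, instance_per_annotator=100))
```

Raising `labels_per_instance` or lowering `instance_per_annotator` increases the number of annotators you need to recruit proportionally.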

Sample Data Format

Create data/tweets.json with one JSON object per line:

json

{"id": "t001", "text": "Just got my new laptop and I'm absolutely loving it! Best purchase of the year! #happy"}
{"id": "t002", "text": "Waited 2 hours for customer service and they still couldn't help me. Never shopping here again."}
{"id": "t003", "text": "The new coffee shop on Main Street opens tomorrow at 7am."}
{"id": "t004", "text": "This movie was okay I guess. Some good parts, some boring parts."}
{"id": "t005", "text": "Can't believe how beautiful the sunset was tonight! Nature is amazing."}
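
Before starting the server, it can help to check that every line of the data file parses and carries the keys named in `item_properties`. The following is a minimal sketch, not a Potato feature; `validate_items` is a hypothetical helper, and it assumes the one-object-per-line layout shown above.

```python
import json


def validate_items(lines, id_key="id", text_key="text"):
    """Return a list of human-readable problems found in JSON Lines input."""
    problems = []
    seen_ids = set()
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            item = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(item, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        # Both keys must be present and hold non-empty strings.
        for key in (id_key, text_key):
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"line {lineno}: missing or empty '{key}'")
        item_id = item.get(id_key)
        if isinstance(item_id, str):
            if item_id in seen_ids:
                problems.append(f"line {lineno}: duplicate id '{item_id}'")
            seen_ids.add(item_id)
    return problems


# In-memory demo; the second and third lines are deliberately broken.
sample_lines = [
    '{"id": "t001", "text": "Love this!"}',
    '{"id": "t001", "text": ""}',
    "not json",
]
for problem in validate_items(sample_lines):
    print(problem)
```

To check the real file, pass an open file handle instead of the sample list, for example `validate_items(open("data/tweets.json", encoding="utf-8"))`.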

Running the Task

Start the annotation server:

bash

potato start config.yaml

Open http://localhost:8000 in your browser and log in to start annotating.

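Understanding the Interface

Main Annotation Area

The interface displays:

The text to annotate (with URLs, mentions, and hashtags highlighted)
Sentiment radio buttons with tooltips
The confidence Likert scale
The optional explanation text box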

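Keyboard Workflow

For maximum efficiency:

Read the text
Press 1, 2, or 3 to select the sentiment
Click a confidence level (this step uses the mouse)
Press Enter to submit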

Progress Tracking

The interface displays:

Current progress (for example, "15 / 100")
Estimated time remaining
Session statistics

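Output Format

Annotations are saved to annotations/username.jsonl, one record per line (the record below is pretty-printed for readability):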

json

{
  "id": "t001",
  "text": "Just got my new laptop and I'm absolutely loving it!...",
  "annotations": {
    "sentiment": "Positive",
    "confidence": 5,
    "explanation": "Clear expression of happiness with the purchase"
  },
  "annotator": "john_doe",
  "timestamp": "2026-01-15T14:30:00Z"
}
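
Since each annotator writes to a separate file, a small loader that merges them is often the first step of any analysis. The sketch below assumes the `annotations/<username>.jsonl` layout and record shape shown above; `load_annotations` is a hypothetical helper, and falling back to the file name when a record lacks an `annotator` field is my own assumption.

```python
import json
import tempfile
from pathlib import Path


def load_annotations(directory):
    """Read every *.jsonl file in a directory into one list of records."""
    records = []
    for path in sorted(Path(directory).glob("*.jsonl")):
        with path.open(encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                record = json.loads(line)
                # Assumption: use the file name if the record has no annotator field.
                record.setdefault("annotator", path.stem)
                records.append(record)
    return records


# Demo on a temporary directory so the sketch runs without real annotation files.
with tempfile.TemporaryDirectory() as tmp:
    sample = {"id": "t001", "annotations": {"sentiment": "Positive", "confidence": 5}}
    Path(tmp, "john_doe.jsonl").write_text(json.dumps(sample) + "\n", encoding="utf-8")
    records = load_annotations(tmp)

print(records[0]["annotator"], records[0]["annotations"]["sentiment"])
```

In a real project you would call `load_annotations("annotations")` from the directory that contains the task.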

Adding Quality Control

Attention Checks

Add gold-standard items to verify that annotators are paying attention:

yaml

quality_control:
  attention_checks:
    enabled: true
    frequency: 10  # Every 10th item
    items:
      - text: "I am extremely happy and satisfied! This is the best!"
        expected:
          sentiment: "Positive"
      - text: "This is absolutely terrible and I hate it completely."
        expected:
          sentiment: "Negative"
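
Once annotations come in, you can score each annotator against these gold items. The following sketch is an offline check, not Potato's built-in behavior. It assumes that attention-check items appear in the output records with their full text, and `attention_pass_rate` is a hypothetical helper.

```python
# Expected labels, keyed by the attention-check text from the config above.
EXPECTED = {
    "I am extremely happy and satisfied! This is the best!": "Positive",
    "This is absolutely terrible and I hate it completely.": "Negative",
}


def attention_pass_rate(records, expected_by_text=EXPECTED):
    """Fraction of attention checks answered correctly, or None if none were seen."""
    checked = 0
    passed = 0
    for rec in records:
        expected = expected_by_text.get(rec["text"])
        if expected is None:
            continue  # a regular item, not an attention check
        checked += 1
        if rec["annotations"].get("sentiment") == expected:
            passed += 1
    return passed / checked if checked else None


# Made-up records for illustration: one correct check, one failed check, one regular item.
demo_records = [
    {"text": "I am extremely happy and satisfied! This is the best!",
     "annotations": {"sentiment": "Positive"}},
    {"text": "This is absolutely terrible and I hate it completely.",
     "annotations": {"sentiment": "Neutral"}},
    {"text": "The store opens at 9am", "annotations": {"sentiment": "Neutral"}},
]
print(attention_pass_rate(demo_records))
```

A pass rate well below 1.0 is a signal to review that annotator's other labels before using them.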

Inter-Annotator Agreement

For research projects, enable multiple annotations per item:

yaml

automatic_assignment:
  on: true
  sampling_strategy: random
  labels_per_instance: 3  # Each item annotated by 3 people
  instance_per_annotator: 50
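
With three labels per item, agreement can be summarized with Fleiss' kappa. The tutorial does not prescribe a metric, so this choice is mine. The sketch below assumes every item received the same number of labels and uses the record shape from the output format section; `labels_by_item` and `fleiss_kappa` are hypothetical helpers, and the demo records are made up.

```python
from collections import Counter, defaultdict

CATEGORIES = ("Positive", "Negative", "Neutral")


def labels_by_item(records):
    """Group sentiment labels by item id across all annotators."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["id"]].append(rec["annotations"]["sentiment"])
    return grouped


def fleiss_kappa(label_lists, categories=CATEGORIES):
    """Fleiss' kappa for items that each carry the same number of labels."""
    if not label_lists:
        raise ValueError("no items to score")
    n_raters = len(label_lists[0])
    if n_raters < 2 or any(len(labels) != n_raters for labels in label_lists):
        raise ValueError("every item needs the same number (>= 2) of labels")
    n_items = len(label_lists)
    counts = [Counter(labels) for labels in label_lists]
    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    observed = sum(per_item) / n_items
    # Chance agreement from the overall label proportions.
    proportions = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    expected = sum(p * p for p in proportions)
    if expected == 1.0:
        return 1.0  # convention used here: all labels identical counts as full agreement
    return (observed - expected) / (1.0 - expected)


# Made-up records: t001 is unanimous, t002 has one dissenting label.
demo = [
    {"id": "t001", "annotations": {"sentiment": "Positive"}},
    {"id": "t001", "annotations": {"sentiment": "Positive"}},
    {"id": "t001", "annotations": {"sentiment": "Positive"}},
    {"id": "t002", "annotations": {"sentiment": "Negative"}},
    {"id": "t002", "annotations": {"sentiment": "Negative"}},
    {"id": "t002", "annotations": {"sentiment": "Neutral"}},
]
kappa = fleiss_kappa(list(labels_by_item(demo).values()))
print(f"Fleiss' kappa: {kappa:.3f}")
```

Items that did not reach the full label count should be filtered out before calling `fleiss_kappa`, since the formula requires a fixed number of raters per item.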

Analyzing Results

Export and analyze your annotations:

python

import json
from collections import Counter

# Load annotations (one JSON record per line)
annotations = []
with open('annotations/annotator1.jsonl', encoding='utf-8') as f:
    for line in f:
        if line.strip():  # skip blank lines
            annotations.append(json.loads(line))

# Sentiment distribution
sentiments = Counter(a['annotations']['sentiment'] for a in annotations)
print(f"Sentiment distribution: {dict(sentiments)}")

# Average confidence (guard against an empty file)
confidences = [a['annotations']['confidence'] for a in annotations]
if confidences:
    print(f"Average confidence: {sum(confidences)/len(confidences):.2f}")
else:
    print("No annotations found")
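
The confidence rating becomes more useful when broken down by label and used to flag items for review. The sketch below assumes the record shape from the output format section. The helper names and the review threshold of 2 are hypothetical choices, and the sample records are made up.

```python
from collections import defaultdict


def confidence_by_sentiment(records):
    """Average confidence rating for each sentiment label."""
    ratings = defaultdict(list)
    for rec in records:
        ann = rec["annotations"]
        ratings[ann["sentiment"]].append(ann["confidence"])
    return {label: sum(values) / len(values) for label, values in ratings.items()}


def low_confidence_ids(records, threshold=2):
    """Ids of items whose confidence is at or below the threshold."""
    return [rec["id"] for rec in records if rec["annotations"]["confidence"] <= threshold]


# Made-up records for illustration.
sample_records = [
    {"id": "t001", "annotations": {"sentiment": "Positive", "confidence": 5}},
    {"id": "t004", "annotations": {"sentiment": "Neutral", "confidence": 2}},
    {"id": "t005", "annotations": {"sentiment": "Positive", "confidence": 4}},
]
print(confidence_by_sentiment(sample_records))
print(low_confidence_ids(sample_records))
```

Items returned by `low_confidence_ids` are natural candidates for a second annotation pass or for refining the guidelines.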

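Next Steps

Set up crowdsourcing for large-scale annotation
Add AI suggestions to speed up annotation
Implement active learning to prioritize difficult cases

Explore more annotation types in our documentation.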