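Creating a Sentiment Analysis Task
Build a complete sentiment classification task with radio buttons, tooltips, and keyboard shortcuts for efficient annotation.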
Potato Team
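Sentiment analysis is a fundamental NLP task, and Potato makes it easy to collect high-quality sentiment labels. In this tutorial, we'll build a full-featured, production-ready sentiment annotation interface.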
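Project Overview
We'll create an interface for annotating social media posts, with:
- Three-way sentiment classification (Positive, Negative, Neutral)
- A confidence rating for each annotation
- An optional free-text explanation
- Keyboard shortcuts for speed
- Quality control measures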
Complete Configuration
Here is the complete config.yaml:
```yaml
annotation_task_name: "Social Media Sentiment Analysis"

# Data configuration
data_files:
  - "data/tweets.json"

item_properties:
  id_key: id
  text_key: text

# Annotation interface
annotation_schemes:
  # Primary sentiment label
  - annotation_type: radio
    name: sentiment
    description: "What is the overall sentiment of this post?"
    labels:
      - name: Positive
        tooltip: "Expresses happiness, satisfaction, or approval"
        keyboard_shortcut: "1"
      - name: Negative
        tooltip: "Expresses sadness, frustration, or disapproval"
        keyboard_shortcut: "2"
      - name: Neutral
        tooltip: "Factual, objective, or lacks emotional content"
        keyboard_shortcut: "3"
    required: true

  # Confidence rating
  - annotation_type: likert
    name: confidence
    description: "How confident are you in your sentiment label?"
    size: 5
    min_label: "Not confident"
    max_label: "Very confident"
    required: true

  # Optional explanation
  - annotation_type: text
    name: explanation
    description: "Why did you choose this label? (Optional)"
    textarea: true
    required: false
    placeholder: "Explain your reasoning..."

# Guidelines
annotation_guidelines:
  title: "Sentiment Annotation Guidelines"
  content: |
    ## Your Task
    Classify the sentiment expressed in each social media post.

    ## Labels
    **Positive**: The author expresses positive emotions or opinions
    - Happiness, excitement, gratitude
    - Praise, recommendations, approval
    - Examples: "Love this!", "Best day ever!", "Highly recommend"

    **Negative**: The author expresses negative emotions or opinions
    - Anger, frustration, sadness
    - Complaints, criticism, disapproval
    - Examples: "Terrible service", "So disappointed", "Worst experience"

    **Neutral**: Factual or lacking clear sentiment
    - News, announcements, questions
    - Mixed or balanced opinions
    - Examples: "The store opens at 9am", "Has anyone tried this?"

    ## Tips
    - Focus on the author's sentiment, not the topic
    - Sarcasm should be labeled based on intended meaning
    - When unsure, lower your confidence rating

# User management
automatic_assignment:
  on: true
  sampling_strategy: random
  labels_per_instance: 1
  instance_per_annotator: 100
```

Example Data Format
Create data/tweets.json, with one JSON object per line:
```json
{"id": "t001", "text": "Just got my new laptop and I'm absolutely loving it! Best purchase of the year! #happy"}
{"id": "t002", "text": "Waited 2 hours for customer service and they still couldn't help me. Never shopping here again."}
{"id": "t003", "text": "The new coffee shop on Main Street opens tomorrow at 7am."}
{"id": "t004", "text": "This movie was okay I guess. Some good parts, some boring parts."}
{"id": "t005", "text": "Can't believe how beautiful the sunset was tonight! Nature is amazing."}
```

Running the Task
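Before starting the server, it can help to confirm that the data file parses cleanly. The sketch below is not part of Potato: `validate_lines` and `validate_file` are hypothetical helpers that assume the layout above (one JSON object per line, with the `id` and `text` keys named by `id_key` and `text_key`), and they report malformed lines, missing keys, and duplicate ids.

```python
import json


def validate_lines(lines, id_key="id", text_key="text"):
    """Return human-readable problems found in one-object-per-line JSON input."""
    problems = []
    seen_ids = set()
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, such as a trailing newline
        try:
            item = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append(f"line {lineno}: invalid JSON ({err.msg})")
            continue
        if not isinstance(item, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        missing = [key for key in (id_key, text_key) if key not in item]
        if missing:
            problems.append(f"line {lineno}: missing keys {missing}")
            continue
        if item[id_key] in seen_ids:
            problems.append(f"line {lineno}: duplicate id {item[id_key]!r}")
        seen_ids.add(item[id_key])
    return problems


def validate_file(path, **keys):
    """Open a data file and validate it line by line."""
    with open(path, encoding="utf-8") as f:
        return validate_lines(f, **keys)
```

On the five sample posts above, `validate_file("data/tweets.json")` returns an empty list.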
Start the annotation server:
```bash
potato start config.yaml
```

Navigate to http://localhost:8000 and log in to start annotating.
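Understanding the Interface
Main Annotation Area
The interface displays:
- The text to annotate, with URLs, @mentions, and #hashtags highlighted
- Sentiment radio buttons with tooltips
- A Likert scale for confidence
- An optional explanation text box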
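Potato performs this highlighting itself, and its exact rules aren't described in this tutorial. Purely as an illustration, here is a minimal regex-based sketch of how URLs, mentions, and hashtags can be located in a post; the patterns and the `find_highlights` name are simplified assumptions rather than Potato's implementation (for instance, a URL runs to the next whitespace, so trailing punctuation is included).

```python
import re

# Simplified, illustrative patterns (assumptions, not Potato's own rules).
HIGHLIGHT_PATTERN = re.compile(
    r"(?P<url>https?://\S+)"  # http(s) link up to the next whitespace
    r"|(?P<mention>@\w+)"     # @username mention
    r"|(?P<hashtag>#\w+)"     # #hashtag
)


def find_highlights(text):
    """Return (kind, matched_text) pairs in the order they occur in text."""
    return [(m.lastgroup, m.group()) for m in HIGHLIGHT_PATTERN.finditer(text)]
```

Because each alternative is a named group, `Match.lastgroup` reports which kind of token matched, which is enough to pick a highlight style.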
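Keyboard Workflow
For maximum efficiency:
- Read the text
- Press `1`, `2`, or `3` to select the sentiment
- Click a confidence level with the mouse
- Press `Enter` to submit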
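The `1`, `2`, and `3` keys come from the `keyboard_shortcut` values in the `sentiment` scheme, and when you add labels it is easy to assign the same key twice. A minimal sketch, with the label definitions copied into Python instead of parsed from config.yaml and a hypothetical `build_key_map` helper, that resolves keys to labels and rejects duplicates:

```python
from collections import defaultdict

# Label definitions mirrored by hand from the sentiment scheme in config.yaml.
SENTIMENT_LABELS = [
    {"name": "Positive", "keyboard_shortcut": "1"},
    {"name": "Negative", "keyboard_shortcut": "2"},
    {"name": "Neutral", "keyboard_shortcut": "3"},
]


def build_key_map(labels):
    """Map each shortcut key to its label name; raise if a key is reused."""
    owners = defaultdict(list)
    for label in labels:
        key = label.get("keyboard_shortcut")
        if key is not None:
            owners[str(key)].append(label["name"])
    clashes = {key: names for key, names in owners.items() if len(names) > 1}
    if clashes:
        raise ValueError(f"shortcut keys assigned to more than one label: {clashes}")
    return {key: names[0] for key, names in owners.items()}


key_map = build_key_map(SENTIMENT_LABELS)
print(key_map.get("2"))  # Negative
```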
Progress Tracking
The interface displays:
- Current progress (e.g., "15 / 100")
- Estimated time remaining
- Session statistics
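How Potato computes its remaining-time estimate isn't specified here. One simple approach, shown as an assumption rather than Potato's formula, extrapolates from the average time spent per completed item; `estimate_remaining` and `progress_label` are hypothetical names.

```python
from datetime import timedelta


def estimate_remaining(completed, total, elapsed_seconds):
    """Extrapolate remaining time from the average pace so far.

    Returns None until at least one item is done, since no pace is known yet.
    """
    if completed <= 0:
        return None
    remaining_items = max(total - completed, 0)
    seconds_per_item = elapsed_seconds / completed
    return timedelta(seconds=remaining_items * seconds_per_item)


def progress_label(completed, total):
    """Format progress like the interface's counter, e.g. "15 / 100"."""
    return f"{completed} / {total}"


# 15 of 100 items in 10 minutes: 40 s per item, 85 items left, 3400 s in total.
print(progress_label(15, 100), estimate_remaining(15, 100, 600))
```

Averaging over the whole session smooths out a few slow items but reacts slowly when an annotator speeds up; a moving average over the most recent items is a common alternative.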
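Output Format
Annotations are saved to annotations/username.jsonl, one JSON record per line (the record below is expanded for readability):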
```json
{
  "id": "t001",
  "text": "Just got my new laptop and I'm absolutely loving it!...",
  "annotations": {
    "sentiment": "Positive",
    "confidence": 5,
    "explanation": "Clear expression of happiness with the purchase"
  },
  "annotator": "john_doe",
  "timestamp": "2026-01-15T14:30:00Z"
}
```

Adding Quality Control
Attention Checks
Add gold-standard items to verify that annotators are paying attention:
```yaml
quality_control:
  attention_checks:
    enabled: true
    frequency: 10  # Every 10th item
    items:
      - text: "I am extremely happy and satisfied! This is the best!"
        expected:
          sentiment: "Positive"
      - text: "This is absolutely terrible and I hate it completely."
        expected:
          sentiment: "Negative"
```

Inter-Annotator Agreement
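Agreement between annotators is usually reported with a chance-corrected statistic, and when every item receives the same number of labels, Fleiss' kappa is a standard choice. Potato's own agreement reporting isn't covered in this tutorial, so the following is an independent sketch: `fleiss_kappa` (a hypothetical helper) takes one list of labels per item, such as the three sentiment labels each post receives under the configuration below, and compares observed pairwise agreement with the agreement expected from the overall label proportions.

```python
from collections import Counter


def fleiss_kappa(item_labels):
    """Fleiss' kappa for items that each received the same number of labels.

    item_labels: one list of labels per item, e.g.
    [["Positive", "Positive", "Neutral"], ["Negative", "Negative", "Negative"]]
    """
    if not item_labels:
        raise ValueError("need at least one item")
    n = len(item_labels[0])  # raters per item
    if n < 2 or any(len(labels) != n for labels in item_labels):
        raise ValueError("every item needs the same number (at least 2) of labels")

    counts = [Counter(labels) for labels in item_labels]
    num_items = len(counts)
    categories = set().union(*counts)

    # Observed agreement: fraction of agreeing rater pairs, averaged over items.
    p_bar = sum(
        (sum(c * c for c in item.values()) - n) / (n * (n - 1)) for item in counts
    ) / num_items

    # Chance agreement: sum of squared overall label proportions.
    p_e = sum(
        (sum(item[cat] for item in counts) / (num_items * n)) ** 2
        for cat in categories
    )
    if p_e == 1:
        return float("nan")  # only one label ever used: kappa is undefined
    return (p_bar - p_e) / (1 - p_e)
```

Values near 1 indicate strong agreement, while values near 0 mean agreement is no better than chance given the label distribution.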
For research projects, enable multiple annotations per item:
```yaml
automatic_assignment:
  on: true
  sampling_strategy: random
  labels_per_instance: 3  # Each item annotated by 3 people
  instance_per_annotator: 50
```

Analyzing Results
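When `labels_per_instance` is greater than 1, each item appears in several annotators' files. A minimal sketch for combining them, assuming the output record layout shown earlier and one file per annotator: the hypothetical `load_labels` groups labels by item `id`, and the hypothetical `majority_vote` picks the most frequent label, returning `None` on ties so those items can be adjudicated by hand.

```python
import json
from collections import Counter, defaultdict


def load_labels(paths, scheme="sentiment"):
    """Group every annotator's label for one scheme by item id."""
    labels = defaultdict(list)
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.strip():  # skip blank lines
                    record = json.loads(line)
                    labels[record["id"]].append(record["annotations"][scheme])
    return labels


def majority_vote(item_labels):
    """Return the most frequent label, or None if there is none or the top is tied."""
    ranked = Counter(item_labels).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

For example, `load_labels(sorted(glob.glob("annotations/*.jsonl")))` collects the labels from every file in the output directory; with three annotators and three classes, a 1-1-1 split is the only case that yields a tie.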
Export and analyze your annotations:
```python
import json
from collections import Counter

# Load one annotator's file (one JSON record per line)
annotations = []
with open('annotations/annotator1.jsonl', encoding='utf-8') as f:
    for line in f:
        if line.strip():  # skip blank lines
            annotations.append(json.loads(line))

# Sentiment distribution
sentiments = Counter(a['annotations']['sentiment'] for a in annotations)
print(f"Sentiment distribution: {dict(sentiments)}")

# Average confidence (guarded against an empty file)
confidences = [a['annotations']['confidence'] for a in annotations]
if confidences:
    print(f"Average confidence: {sum(confidences) / len(confidences):.2f}")
```

Next Steps
Explore more annotation types in our documentation.