Configuration Basics
Learn the basics of Potato configuration files.
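Potato uses YAML configuration files to define annotation tasks. This guide covers the essential configuration options.
Configuration file structure
A basic Potato configuration contains the following main sections: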
```yaml
# Task settings
annotation_task_name: "My Annotation Task"
port: 8000

# Data configuration
data_files:
  - data.json
item_properties:
  id_key: id
  text_key: text

# Output settings
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

# Annotation schemes
annotation_schemes:
  - annotation_type: radio
    name: my_annotation
    labels:
      - Label 1
      - Label 2

# User settings
allow_all_users: true
```

Basic settings
Task and server configuration
```yaml
annotation_task_name: "My Task"  # Display name for your task
port: 8000                       # Port to run the server on
```

Data configuration
```yaml
data_files:
  - data.json       # Path to your data file(s)
  - more_data.json  # You can specify multiple files
item_properties:
  id_key: id        # Field containing unique ID
  text_key: text    # Field containing text to annotate
```

Supported data formats:
- JSON (`.json`)
- JSON Lines (`.jsonl`)
- CSV (`.csv`)
- TSV (`.tsv`)
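As a rough illustration of input that matches the `item_properties` mapping above, the following Python sketch writes the same two items as JSON Lines and as CSV, using `id` and `text` as the field names. The helper functions, file names, and sample sentences are invented for illustration, and the exact layout Potato expects inside each format should be checked against its own documentation.

```python
import csv
import json

# Hypothetical sample items; the field names match id_key and text_key above.
ITEMS = [
    {"id": "item_1", "text": "The battery lasts all day."},
    {"id": "item_2", "text": "The screen cracked within a week."},
]


def write_jsonl(items, path):
    """Write one JSON object per line (the JSON Lines convention)."""
    with open(path, "w", encoding="utf-8") as handle:
        for item in items:
            handle.write(json.dumps(item, ensure_ascii=False) + "\n")


def write_csv(items, path):
    """Write a CSV file whose header row names the id and text columns."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "text"])
        writer.writeheader()
        writer.writerows(items)
```

Calling `write_jsonl(ITEMS, "data.jsonl")` or `write_csv(ITEMS, "data.csv")` produces a small file whose path can then be listed under `data_files`.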
Output configuration
```yaml
output_annotation_dir: "annotation_output/"  # Directory for annotation files
output_annotation_format: "json"             # Format: json, jsonl, csv, tsv
```

Annotation schemes
Define one or more annotation schemes:
```yaml
annotation_schemes:
  - annotation_type: radio                # Type of annotation
    name: sentiment                       # Internal name
    description: "Select the sentiment"   # Instructions
    labels:                               # Options for annotators
      - Positive
      - Negative
      - Neutral
```

Available annotation types
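| Type | Description |
|---|---|
| `radio` | Single choice |
| `multiselect` | Multiple choice |
| `likert` | Likert-scale rating |
| `text` | Free-text input |
| `number` | Numeric input |
| `span` | Text span highlighting |
| `slider` | Continuous range selection |
| `multirate` | Rating of multiple items |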
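To make the list of types concrete, here is a minimal Python sketch that checks whether a scheme, written as a plain dictionary mirroring the YAML above, names one of the listed types. `check_scheme` and `KNOWN_TYPES` are hypothetical helpers for illustration and are not part of Potato; the check only covers the fields shown in this guide, not everything Potato itself may validate.

```python
# Annotation types listed in this guide.
KNOWN_TYPES = {
    "radio", "multiselect", "likert", "text",
    "number", "span", "slider", "multirate",
}


def check_scheme(scheme):
    """Return a list of problems found in one annotation scheme dict.

    Hypothetical helper: it only looks at annotation_type and name,
    the two fields every example in this guide provides.
    """
    problems = []
    kind = scheme.get("annotation_type")
    if kind is None:
        problems.append("missing annotation_type")
    elif kind not in KNOWN_TYPES:
        problems.append(f"unknown annotation_type: {kind}")
    if not scheme.get("name"):
        problems.append("missing name")
    return problems


sentiment = {
    "annotation_type": "radio",
    "name": "sentiment",
    "description": "Select the sentiment",
    "labels": ["Positive", "Negative", "Neutral"],
}
print(check_scheme(sentiment))                        # prints []
print(check_scheme({"annotation_type": "checkbox"}))  # reports two problems
```

A check like this can catch a misspelled type name before the server is started, although Potato's own error messages remain the authoritative source.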
User configuration
Allow all users
```yaml
allow_all_users: true
```

Restrict to specific users
```yaml
allow_all_users: false
authorized_users:
  - user1@example.com
  - user2@example.com
```

Task directory
The `task_dir` setting defines the root directory against which relative paths are resolved:
```yaml
task_dir: ./my-task/
data_files:
  - data/input.json  # Resolves to ./my-task/data/input.json
```

Complete example
The following is a complete configuration for a sentiment analysis task:
```yaml
# config.yaml
annotation_task_name: "Sentiment Analysis"
port: 8000
task_dir: ./

# Data
data_files:
  - data/tweets.json
item_properties:
  id_key: id
  text_key: text
  context_key: metadata

# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

# Annotation
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment expressed in this tweet?"
    labels:
      - name: Positive
        key_value: "1"
      - name: Negative
        key_value: "2"
      - name: Neutral
        key_value: "3"
    sequential_key_binding: true

# Users
allow_all_users: true

# Assignment
instances_per_annotator: 100
annotation_per_instance: 2
```
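Two parts of this example can be sanity-checked with a few lines of Python. The sketch below first resolves a data file against `task_dir`, following the rule given in the task directory section, and then estimates how many annotators the assignment settings imply. Both helpers are hypothetical and not part of Potato. The estimate assumes that `instances_per_annotator` caps the number of items each annotator receives and that `annotation_per_instance` is the number of annotations wanted per item, which this guide does not spell out. Leaving absolute paths unchanged is likewise an assumption.

```python
import math
from pathlib import PurePosixPath


def resolve_data_file(task_dir, data_file):
    """Join a relative data file path onto task_dir.

    Assumption: absolute paths are returned unchanged.
    """
    path = PurePosixPath(data_file)
    if path.is_absolute():
        return path
    return PurePosixPath(task_dir) / path


def annotators_needed(num_items, instances_per_annotator, annotation_per_instance):
    """Estimate the minimum number of annotators for full coverage.

    Assumption: every item needs annotation_per_instance annotations and
    no annotator labels more than instances_per_annotator items.
    """
    if num_items == 0:
        return 0
    total_annotations = num_items * annotation_per_instance
    by_capacity = math.ceil(total_annotations / instances_per_annotator)
    # The same person should not annotate an item twice, so at least
    # annotation_per_instance distinct annotators are needed.
    return max(by_capacity, annotation_per_instance)


print(resolve_data_file("./my-task/", "data/input.json"))  # my-task/data/input.json
print(resolve_data_file("./", "data/tweets.json"))         # data/tweets.json
# 1000 is a made-up item count used only for this illustration.
print(annotators_needed(1000, 100, 2))                     # 20
```

Under these assumptions, a file of 1000 tweets with the settings above would need at least 20 annotators to give every tweet two annotations.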