Configuration Basics

Learn the basics of Potato configuration files.

Potato uses YAML configuration files to define annotation tasks. This guide covers the basic configuration options.

Configuration File Structure

A basic Potato configuration contains the following main sections:

```yaml
# Task settings
annotation_task_name: "My Annotation Task"
port: 8000

# Data configuration
data_files:
  - data.json

item_properties:
  id_key: id
  text_key: text

# Output settings
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

# Annotation schemes
annotation_schemes:
  - annotation_type: radio
    name: my_annotation
    labels:
      - Label 1
      - Label 2

# User settings
allow_all_users: true
```

Basic Settings

Task and Server Configuration

```yaml
annotation_task_name: "My Task"  # Display name for your task
port: 8000                       # Port to run the server on
```

Data Configuration

```yaml
data_files:
  - data.json           # Path to your data file(s)
  - more_data.json      # You can specify multiple files

item_properties:
  id_key: id            # Field containing unique ID
  text_key: text        # Field containing text to annotate
```

Supported data formats:

  • JSON (.json)
  • JSON Lines (.jsonl)
  • CSV (.csv)
  • TSV (.tsv)
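To make the roles of id_key and text_key concrete, here is a minimal Python sketch of reading items from the four supported formats. It is not Potato's loader. The names load_items and _parse_json_text are hypothetical, and the sketch assumes that CSV/TSV files have a header row and that a .json file holds either a single JSON array or one object per line.

```python
import csv
import json
import tempfile
from pathlib import Path


def _parse_json_text(text):
    """Parse a whole JSON array, or fall back to one JSON object per line."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document, so treat it as JSON Lines.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def load_items(path, id_key="id", text_key="text"):
    """Hypothetical loader: return (id, text) pairs using id_key/text_key."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix in (".json", ".jsonl"):
        records = _parse_json_text(path.read_text(encoding="utf-8"))
    elif suffix in (".csv", ".tsv"):
        delimiter = "\t" if suffix == ".tsv" else ","
        with path.open(encoding="utf-8", newline="") as f:
            records = list(csv.DictReader(f, delimiter=delimiter))
    else:
        raise ValueError(f"unsupported data format: {suffix}")
    # id_key and text_key pick the fields, mirroring item_properties.
    return [(str(r[id_key]), r[text_key]) for r in records]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "data.jsonl"
        sample.write_text('{"id": 1, "text": "great movie"}\n'
                          '{"id": 2, "text": "not for me"}\n', encoding="utf-8")
        print(load_items(sample))
        # [('1', 'great movie'), ('2', 'not for me')]
```

The IDs are converted to strings so that items loaded from JSON (where an ID may be a number) and from CSV/TSV (where every field is text) can be compared the same way.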

Output Configuration

```yaml
output_annotation_dir: "annotation_output/"   # Directory for annotation files
output_annotation_format: "json"              # Format: json, jsonl, csv, tsv
```

Annotation Schemes

Define one or more annotation schemes:

```yaml
annotation_schemes:
  - annotation_type: radio      # Type of annotation
    name: sentiment             # Internal name
    description: "Select the sentiment"  # Instructions
    labels:                     # Options for annotators
      - Positive
      - Negative
      - Neutral
```

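Available Annotation Types

| Type | Description |
|---|---|
| radio | Single choice |
| multiselect | Multiple choice |
| likert | Rating scale |
| text | Free-text input |
| number | Numeric input |
| span | Text span highlighting |
| slider | Selection on a continuous range |
| multirate | Rating multiple items |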
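If you generate or edit configurations programmatically, a quick sanity check can catch a mistyped annotation_type before the server starts. The sketch below is hypothetical: check_schemes is not part of Potato. It works on the YAML after it has been parsed into Python lists and dicts (for example with PyYAML's yaml.safe_load) and knows only the types in the table above. It also assumes that scheme names should be unique and that radio and multiselect schemes need a non-empty labels list.

```python
# Types from the table above; Potato may support others (assumption).
KNOWN_TYPES = {"radio", "multiselect", "likert", "text",
               "number", "span", "slider", "multirate"}
# Assumption: these types need an explicit labels list.
LABELED_TYPES = {"radio", "multiselect"}


def check_schemes(schemes):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    seen_names = set()
    for i, scheme in enumerate(schemes):
        kind = scheme.get("annotation_type")
        name = scheme.get("name")
        if kind not in KNOWN_TYPES:
            problems.append(f"scheme {i}: unknown annotation_type {kind!r}")
        if not name:
            problems.append(f"scheme {i}: missing name")
        elif name in seen_names:
            problems.append(f"scheme {i}: duplicate name {name!r}")
        else:
            seen_names.add(name)
        if kind in LABELED_TYPES and not scheme.get("labels"):
            problems.append(f"scheme {i}: {kind} scheme needs labels")
    return problems


schemes = [
    {"annotation_type": "radio", "name": "sentiment",
     "labels": ["Positive", "Negative", "Neutral"]},
    {"annotation_type": "checkbox", "name": "topics"},
]
print(check_schemes(schemes))
# ["scheme 1: unknown annotation_type 'checkbox'"]
```

The function collects all problems instead of stopping at the first one, so a single run reports every issue in the file.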

User Configuration

Allow all users

```yaml
allow_all_users: true
```

Restrict access to specific users

```yaml
allow_all_users: false
authorized_users:
  - user1@example.com
  - user2@example.com
```
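The two access modes can be summarized as a small check. This is a hypothetical Python sketch of the documented behavior, not Potato's login code. It takes the parsed configuration as a dict and treats a missing allow_all_users as false, which is an assumption.

```python
def is_authorized(user, config):
    """Hypothetical check mirroring allow_all_users / authorized_users."""
    # Open task: anyone may annotate.
    if config.get("allow_all_users", False):  # missing key -> False (assumption)
        return True
    # Restricted task: the user must be listed explicitly.
    return user in config.get("authorized_users", [])


restricted = {
    "allow_all_users": False,
    "authorized_users": ["user1@example.com", "user2@example.com"],
}
print(is_authorized("user1@example.com", restricted))  # True
print(is_authorized("user3@example.com", restricted))  # False
```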

Task Directory

The task_dir setting defines the base directory against which relative paths are resolved:

```yaml
task_dir: ./my-task/
data_files:
  - data/input.json    # Resolves to ./my-task/data/input.json
```
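The resolution rule can be sketched with Python's pathlib. resolve_data_path is a hypothetical helper, not Potato code. It uses PurePosixPath so the result does not depend on the operating system, and it assumes that absolute entries in data_files are used unchanged. pathlib drops the leading ./, so my-task/data/input.json names the same location as ./my-task/data/input.json.

```python
from pathlib import PurePosixPath


def resolve_data_path(task_dir, data_file):
    """Hypothetical sketch: resolve one data_files entry against task_dir."""
    entry = PurePosixPath(data_file)
    # Assumption: absolute paths are not re-rooted under task_dir.
    if entry.is_absolute():
        return entry
    # Relative entries are joined onto task_dir.
    return PurePosixPath(task_dir) / entry


print(resolve_data_path("./my-task/", "data/input.json"))  # my-task/data/input.json
print(resolve_data_path("./my-task/", "/srv/data.json"))   # /srv/data.json
```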

Complete Example

Here is a complete configuration for a sentiment analysis task:

```yaml
# config.yaml
annotation_task_name: "Sentiment Analysis"
port: 8000
task_dir: ./

# Data
data_files:
  - data/tweets.json

item_properties:
  id_key: id
  text_key: text
  context_key: metadata

# Output
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

# Annotation
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment expressed in this tweet?"
    labels:
      - name: Positive
        key_value: "1"
      - name: Negative
        key_value: "2"
      - name: Neutral
        key_value: "3"
    sequential_key_binding: true

# Users
allow_all_users: true

# Assignment
instances_per_annotator: 100
annotation_per_instance: 2
```
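Reading the two assignment keys by their names, each annotator is assigned at most instances_per_annotator items and each item collects annotation_per_instance annotations. Under that reading, which is an assumption here, you can estimate the minimum number of annotators to recruit. min_annotators is a hypothetical planning helper, and the dataset size used below is only an illustration.

```python
def min_annotators(num_items, annotation_per_instance, instances_per_annotator):
    """Lower bound on annotators so every item can get its annotations."""
    total = num_items * annotation_per_instance
    # Integer ceiling division: total annotations / capacity per annotator.
    by_capacity = -(-total // instances_per_annotator)
    # Assumption: the same person cannot annotate one item twice.
    return max(by_capacity, annotation_per_instance)


print(min_annotators(1000, 2, 100))  # 20
```

With the values above and a hypothetical 1,000 tweets, the task needs 2,000 annotations in total, so at least 20 annotators at 100 items each. One extra tweet (1,001) pushes the estimate to 21 because of the ceiling.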

Next Steps