Configuration Basics

Learn Potato's YAML configuration format: task settings, data file paths, annotation schemes, output formats, and user management essentials.

Potato uses YAML configuration files to define annotation tasks. This guide covers the essential configuration options.

Configuration File Structure

A basic Potato configuration has these main sections:

yaml
# Task settings
annotation_task_name: "My Annotation Task"
port: 8000
 
# Data configuration
data_files:
  - data.json
 
item_properties:
  id_key: id
  text_key: text
 
# Output settings
output_annotation_dir: "annotation_output/"
export_annotation_format: "json"
 
# Annotation schemes
annotation_schemes:
  - annotation_type: radio
    name: my_annotation
    labels:
      - Label 1
      - Label 2
 
# User settings
user_config:
  allow_all_users: true

Essential Settings

Task and Server Configuration

yaml
annotation_task_name: "My Task"  # Display name for your task
port: 8000                       # Port to run the server on

Data Configuration

yaml
data_files:
  - data.json           # Path to your data file(s)
  - more_data.json      # You can specify multiple files
 
item_properties:
  id_key: id            # Field containing unique ID
  text_key: text        # Field containing text to annotate

Supported data formats:

  • JSON (.json)
  • JSON Lines (.jsonl)
  • CSV (.csv)
  • TSV (.tsv)
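
As a minimal sketch of preparing input in one of these formats, the Python snippet below writes a JSON Lines file whose records carry the `id` and `text` fields referenced by `id_key` and `text_key` above. The helper names `write_jsonl` and `read_jsonl`, the file name, and the sample records are illustrative assumptions, not part of Potato itself.

```python
import json
import os
import tempfile


def write_jsonl(path, records):
    """Write one JSON object per line (the JSON Lines layout)."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Hypothetical sample items; "id" and "text" match id_key and text_key.
sample_records = [
    {"id": "item_1", "text": "I love this product."},
    {"id": "item_2", "text": "The delivery was late."},
]

# Written to a temporary directory so the sketch leaves no files behind.
sample_path = os.path.join(tempfile.mkdtemp(), "data.jsonl")
write_jsonl(sample_path, sample_records)
```

Reading the file back with `read_jsonl(sample_path)` is a quick way to confirm that every record has both fields before pointing `data_files` at it.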

Output Configuration

yaml
output_annotation_dir: "annotation_output/"   # Directory for annotation files
export_annotation_format: "json"              # Format: json, jsonl, csv, tsv

Annotation Schemes

Define one or more annotation schemes:

yaml
annotation_schemes:
  - annotation_type: radio      # Type of annotation
    name: sentiment             # Internal name
    description: "Select the sentiment"  # Instructions
    labels:                     # Options for annotators
      - Positive
      - Negative
      - Neutral

Available Annotation Types

Type                      Description
radio                     Single choice selection
multiselect               Multiple choice selection
likert                    Rating on a scale
text                      Free text input
number                    Numeric input
span                      Text span highlighting
slider                    Continuous range selection
multirate                 Rate multiple items
select                    Dropdown single selection
pairwise                  Pairwise comparison
best_worst                Best-worst scaling
soft_label                Soft label distribution
confidence_annotation     Annotation with confidence
constant_sum              Constant sum allocation
range_slider              Range slider selection
semantic_differential     Semantic differential scale
hierarchical_multiselect  Hierarchical multi-level selection
card_sort                 Card sorting
rubric_eval               Rubric-based evaluation
extractive_qa             Extractive question answering
error_span                Error span highlighting
triage                    Triage classification
coreference               Coreference annotation
span_link                 Span linking
entity_linking            Entity linking
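
A misspelled `annotation_type` is easy to catch before launching the server. The sketch below is a hypothetical pre-flight check, not something Potato provides. It assumes the config has already been loaded into a Python dict (for example with a YAML parser) and that the type names listed above are the complete set.

```python
# Type names copied from the list of available annotation types above.
KNOWN_ANNOTATION_TYPES = {
    "radio", "multiselect", "likert", "text", "number", "span", "slider",
    "multirate", "select", "pairwise", "best_worst", "soft_label",
    "confidence_annotation", "constant_sum", "range_slider",
    "semantic_differential", "hierarchical_multiselect", "card_sort",
    "rubric_eval", "extractive_qa", "error_span", "triage", "coreference",
    "span_link", "entity_linking",
}


def find_unknown_types(config):
    """Return (scheme name, annotation_type) pairs that are not recognized."""
    unknown = []
    for scheme in config.get("annotation_schemes", []):
        annotation_type = scheme.get("annotation_type")
        if annotation_type not in KNOWN_ANNOTATION_TYPES:
            unknown.append((scheme.get("name"), annotation_type))
    return unknown


# Hypothetical config dict; the second scheme contains a deliberate typo.
example_config = {
    "annotation_schemes": [
        {"annotation_type": "radio", "name": "sentiment"},
        {"annotation_type": "radoi", "name": "typo_example"},
    ]
}

problems = find_unknown_types(example_config)
```

Here `problems` holds one entry for the misspelled scheme, while an empty list would mean every scheme uses a listed type.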

User Configuration

Allow all users

yaml
user_config:
  allow_all_users: true

Restrict to specific users

yaml
user_config:
  allow_all_users: false
  authorized_users:
    - user1@example.com
    - user2@example.com
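
The two modes amount to a simple membership check. The function below only illustrates the behavior the settings above imply. It assumes that `authorized_users` entries are matched exactly against the login name and that access is denied when `allow_all_users` is absent. `is_authorized` is a hypothetical helper, not Potato's actual implementation.

```python
def is_authorized(user_config, username):
    """Return True if username may annotate under the given user_config."""
    # Assumption: a missing allow_all_users is treated as False.
    if user_config.get("allow_all_users", False):
        return True
    # Assumption: exact, case-sensitive match against the listed users.
    return username in user_config.get("authorized_users", [])


open_config = {"allow_all_users": True}
restricted_config = {
    "allow_all_users": False,
    "authorized_users": ["user1@example.com", "user2@example.com"],
}
```

With `restricted_config`, only the two listed addresses pass the check. With `open_config`, any username does.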

Task Directory

The task_dir setting sets the root directory against which relative paths in the configuration, such as the entries in data_files, are resolved:

yaml
task_dir: ./my-task/
data_files:
  - data/input.json    # Resolves to ./my-task/data/input.json
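
The resolution in the comment above can be reproduced with a plain path join. This is a sketch of the assumed rule (relative paths are appended to `task_dir`), and Potato's internal handling may differ in edge cases. `resolve_against_task_dir` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def resolve_against_task_dir(task_dir, relative_path):
    """Join a relative config path onto task_dir (assumed resolution rule)."""
    # PurePosixPath keeps the example platform-independent and touches no files.
    return PurePosixPath(task_dir) / relative_path


resolved = resolve_against_task_dir("./my-task/", "data/input.json")
# pathlib normalizes away the leading "./", so this is the same location
# as ./my-task/data/input.json.
print(resolved)  # my-task/data/input.json
```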

Full Example

Here's a complete configuration for a sentiment analysis task:

yaml
# config.yaml
annotation_task_name: "Sentiment Analysis"
port: 8000
task_dir: ./
 
# Data
data_files:
  - data/tweets.json
 
item_properties:
  id_key: id
  text_key: text
  context_key: metadata
 
# Output
output_annotation_dir: "annotation_output/"
export_annotation_format: "json"
 
# Annotation
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment expressed in this tweet?"
    labels:
      - name: Positive
        key_value: "1"
      - name: Negative
        key_value: "2"
      - name: Neutral
        key_value: "3"
    sequential_key_binding: true
 
# Users
user_config:
  allow_all_users: true
 
# Assignment
instances_per_annotator: 100
annotation_per_instance: 2
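
The two assignment settings together determine how many annotators a dataset needs. The arithmetic below is a rough sketch. It assumes, from the setting names alone, that `instances_per_annotator` caps the number of items each annotator receives and that `annotation_per_instance` is the number of annotations each item should collect. The dataset size of 1000 items is a made-up figure for illustration.

```python
def min_annotators(num_items, annotation_per_instance, instances_per_annotator):
    """Smallest number of annotators that can cover every required annotation."""
    total_annotations = num_items * annotation_per_instance
    # Integer ceiling division: round up when the load does not divide evenly.
    return (total_annotations + instances_per_annotator - 1) // instances_per_annotator


# Hypothetical dataset of 1000 items with the settings from the full example.
needed = min_annotators(
    num_items=1000,
    annotation_per_instance=2,
    instances_per_annotator=100,
)
```

Under these assumptions, 1000 items at 2 annotations each is 2000 annotations, which at 100 items per annotator requires at least 20 annotators.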

Next Steps