Training Phase
Train and qualify annotators with practice questions before the main task.
Potato 2.0 includes an optional training phase that helps qualify annotators before they begin the main annotation task. Annotators answer practice questions with known correct answers and receive feedback on their performance.
Use Cases
- Ensure annotators understand the task
- Filter out low-quality annotators
- Provide guided practice before real annotations
- Collect baseline quality metrics
- Teach annotation guidelines through examples
How It Works
- Annotators complete a set of training questions
- They receive immediate feedback on each answer
- Progress is tracked against passing criteria
- Only annotators who pass can proceed to the main task
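The flow above can be sketched in a few lines of Python. This is an illustrative model only, not Potato's implementation: `run_training`, the `answers` mapping, and the returned dictionary are hypothetical, while the item fields (`id`, `correct_answer`, `explanation`) follow the training data format described later on this page.

```python
def run_training(items, answers, min_correct):
    """Walk through training items, give feedback, and report pass/fail.

    items:       list of dicts with "id", "correct_answer", optional "explanation"
    answers:     hypothetical mapping of item id -> the annotator's answer
    min_correct: number of correct answers needed to pass
    """
    correct = 0
    feedback = []
    for item in items:
        if answers.get(item["id"]) == item["correct_answer"]:
            correct += 1
            feedback.append((item["id"], "correct"))
        else:
            # Use the explanation, if one was provided, as the immediate feedback
            feedback.append((item["id"], item.get("explanation", "incorrect")))
    return {"correct": correct, "passed": correct >= min_correct, "feedback": feedback}


items = [
    {"id": "train_1", "correct_answer": {"sentiment": "Positive"}, "explanation": "Strong positive wording."},
    {"id": "train_2", "correct_answer": {"sentiment": "Negative"}, "explanation": "Clear complaint."},
]
answers = {
    "train_1": {"sentiment": "Positive"},
    "train_2": {"sentiment": "Neutral"},
}
result = run_training(items, answers, min_correct=2)
print(result["passed"])  # False: only 1 of 2 answers is correct
```

Retries and per-question mistake limits are left out here to keep the sketch short.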
Configuration
Basic Setup
phases:
  training:
    enabled: true
    data_file: "data/training_data.json"
    schema_name: sentiment  # Which annotation scheme to train
    # Passing criteria
    passing_criteria:
      min_correct: 8  # Must get at least 8 correct
      total_questions: 10
Full Configuration
phases:
  training:
    enabled: true
    data_file: "data/training_data.json"
    schema_name: sentiment
    passing_criteria:
      # Different criteria options (choose one or combine)
      min_correct: 8
      require_all_correct: false
      max_mistakes: 3
      max_mistakes_per_question: 2
    # Allow retries
    retries:
      enabled: true
      max_retries: 3
    # Show explanations for incorrect answers
    show_explanations: true
    # Randomize question order
    randomize: true
Passing Criteria
You can set various criteria for passing the training phase:
Minimum Correct
passing_criteria:
  min_correct: 8
  total_questions: 10
The annotator must answer at least 8 out of 10 questions correctly.
Require All Correct
passing_criteria:
  require_all_correct: true
The annotator must answer every question correctly to pass.
Maximum Mistakes
passing_criteria:
  max_mistakes: 3
The annotator is disqualified after 3 total mistakes.
Maximum Mistakes Per Question
passing_criteria:
  max_mistakes_per_question: 2
The annotator is disqualified after 2 mistakes on any single question.
Combined Criteria
passing_criteria:
  min_correct: 8
  max_mistakes_per_question: 3
The annotator must answer at least 8 questions correctly and must stay within the limit of 3 mistakes on any single question.
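To make the interaction between criteria concrete, here is a minimal Python sketch of how combined criteria might be evaluated. It is not Potato's code: `passes_training` and its arguments are hypothetical, "all correct" is modeled as zero wrong attempts, and a limit of N is assumed to mean "disqualified on reaching N mistakes", following the wording of the Maximum Mistakes section. The actual boundary behavior in Potato may differ.

```python
def passes_training(criteria, correct, mistakes_by_question):
    """Return True if the annotator meets every configured criterion.

    criteria:             dict shaped like the `passing_criteria` block
    correct:              number of questions answered correctly
    mistakes_by_question: mapping of question id -> number of wrong attempts
    """
    total_mistakes = sum(mistakes_by_question.values())
    worst_question = max(mistakes_by_question.values(), default=0)

    # Assumption: "all correct" means no wrong attempt on any question
    if criteria.get("require_all_correct") and total_mistakes > 0:
        return False
    if "min_correct" in criteria and correct < criteria["min_correct"]:
        return False
    # Assumption: reaching the limit disqualifies ("disqualified after N mistakes")
    if "max_mistakes" in criteria and total_mistakes >= criteria["max_mistakes"]:
        return False
    if ("max_mistakes_per_question" in criteria
            and worst_question >= criteria["max_mistakes_per_question"]):
        return False
    return True


criteria = {"min_correct": 8, "max_mistakes_per_question": 3}
print(passes_training(criteria, correct=9, mistakes_by_question={"train_4": 1}))  # True
print(passes_training(criteria, correct=9, mistakes_by_question={"train_4": 3}))  # False
```

Every configured criterion has to hold at once, so adding a criterion can only make passing harder.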
Training Data Format
Training data must include correct answers and optional explanations:
[
  {
    "id": "train_1",
    "text": "I absolutely love this product! Best purchase ever!",
    "correct_answer": {
      "sentiment": "Positive"
    },
    "explanation": "This text expresses strong positive sentiment with words like 'love' and 'best'."
  },
  {
    "id": "train_2",
    "text": "This is the worst service I've ever experienced.",
    "correct_answer": {
      "sentiment": "Negative"
    },
    "explanation": "The words 'worst' and the overall complaint indicate negative sentiment."
  },
  {
    "id": "train_3",
    "text": "The package arrived on time.",
    "correct_answer": {
      "sentiment": "Neutral"
    },
    "explanation": "This is a factual statement without emotional indicators."
  }
]
Multiple Schema Training
For tasks with multiple annotation schemes:
{
  "id": "train_1",
  "text": "Apple announced new iPhone features yesterday.",
  "correct_answer": {
    "sentiment": "Neutral",
    "topic": "Technology"
  },
  "explanation": {
    "sentiment": "This is a factual news statement.",
    "topic": "The text discusses Apple and iPhone, which are tech topics."
  }
}
User Experience
Training Flow
- User sees "Training Phase" indicator
- Question is displayed with annotation form
- User submits their answer
- Feedback is shown immediately:
  - Correct: Green checkmark, proceed to next
  - Incorrect: Red X, explanation shown, retry option
Feedback Display
When an annotator answers incorrectly:
- The correct answer is highlighted
- The provided explanation is shown
- Retry button appears (if retries enabled)
- Progress toward passing criteria is displayed
Admin Monitoring
Track training performance in the admin dashboard:
- Completion rates
- Average correct answers
- Pass/fail rates
- Time spent on training
- Per-question accuracy
The same statistics are available through the admin API endpoints:
GET /api/admin/training/stats
GET /api/admin/training/user/{user_id}
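A minimal sketch of calling these endpoints from Python using only the standard library. The base URL (a local server on port 8000), the JSON response format, and the absence of authentication are all assumptions; this page does not specify them. The helper names `stats_url`, `user_url`, and `fetch_json` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: local server on the example port


def stats_url(base_url=BASE_URL):
    """URL of the aggregate training statistics endpoint."""
    return f"{base_url}/api/admin/training/stats"


def user_url(user_id, base_url=BASE_URL):
    """URL of the per-user training endpoint."""
    # Percent-encode the id so characters such as "@" or "/" stay in one path segment
    return f"{base_url}/api/admin/training/user/{quote(user_id, safe='')}"


def fetch_json(url, timeout=10):
    """GET a URL and decode the body as JSON."""
    # Assumption: the endpoint returns JSON and needs no extra auth headers
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(user_url("alice@example.com"))
# http://localhost:8000/api/admin/training/user/alice%40example.com
```

With a server running, `fetch_json(stats_url())` would return whatever the stats endpoint reports. If the admin endpoints require credentials in your deployment, add the appropriate headers.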
Example: Sentiment Analysis Training
task_name: "Sentiment Analysis"
task_dir: "."
port: 8000
# Main annotation data
data_files:
  - "data/reviews.json"
item_properties:
  id_key: id
  text_key: text
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment of this review?"
    labels:
      - Positive
      - Negative
      - Neutral
# Training phase configuration
phases:
  training:
    enabled: true
    data_file: "data/training_questions.json"
    schema_name: sentiment
    passing_criteria:
      min_correct: 8
      total_questions: 10
      max_mistakes_per_question: 2
    retries:
      enabled: true
      max_retries: 3
    show_explanations: true
    randomize: true
output_annotation_dir: "output/"
output_annotation_format: "json"
allow_all_users: true
Example: NER Training
annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight named entities"
    labels:
      - Person
      - Organization
      - Location
      - Date
phases:
  training:
    enabled: true
    data_file: "data/ner_training.json"
    schema_name: entities
    passing_criteria:
      min_correct: 7
      total_questions: 10
    show_explanations: true
Training data for span annotation:
{
  "id": "train_1",
  "text": "Tim Cook announced that Apple will open a new store in New York on March 15.",
  "correct_answer": {
    "entities": [
      {"start": 0, "end": 8, "label": "Person"},
      {"start": 24, "end": 29, "label": "Organization"},
      {"start": 55, "end": 63, "label": "Location"},
      {"start": 67, "end": 75, "label": "Date"}
    ]
  },
  "explanation": "Tim Cook is a Person, Apple is an Organization, New York is a Location, and March 15 is a Date."
}
Best Practices
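For span-based training data such as the NER example above, one useful check is to slice the text with each span's offsets and confirm the result is the intended entity. The sketch below assumes 0-based character offsets with an exclusive `end`, which matches the first two spans of that example. `check_spans` is a hypothetical helper, not part of Potato.

```python
def check_spans(item, schema="entities"):
    """Return (label, start, end, covered_text, looks_clean) for each span.

    Assumption: offsets are 0-based character positions with an exclusive end.
    """
    text = item["text"]
    report = []
    for span in item["correct_answer"][schema]:
        covered = text[span["start"]:span["end"]]
        # A leading/trailing space or an empty slice usually means an off-by-one offset
        looks_clean = bool(covered) and covered == covered.strip()
        report.append((span["label"], span["start"], span["end"], covered, looks_clean))
    return report


# A shortened item using the same sentence as the NER example
item = {
    "text": "Tim Cook announced that Apple will open a new store in New York on March 15.",
    "correct_answer": {
        "entities": [
            {"start": 0, "end": 8, "label": "Person"},
            {"start": 55, "end": 63, "label": "Location"},
        ]
    },
}
for row in check_spans(item):
    print(row)
```

A span whose covered text starts or ends with a space is flagged as not clean, which catches offsets that are shifted by one character.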
1. Start Simple
Begin with straightforward examples before introducing edge cases:
[
  {"text": "I love this!", "correct_answer": {"sentiment": "Positive"}},
  {"text": "I hate this!", "correct_answer": {"sentiment": "Negative"}},
  {"text": "It arrived yesterday.", "correct_answer": {"sentiment": "Neutral"}}
]
2. Cover All Labels
Ensure training includes examples of every possible label:
[
  {"correct_answer": {"sentiment": "Positive"}},
  {"correct_answer": {"sentiment": "Negative"}},
  {"correct_answer": {"sentiment": "Neutral"}}
]
3. Write Clear Explanations
Explanations should teach the annotation guidelines:
{
  "explanation": "While this text mentions a problem, the overall tone is constructive and the reviewer expresses satisfaction with the resolution. This makes it Positive rather than Negative."
}
4. Set Reasonable Criteria
Don't require perfection unnecessarily:
# Too strict - may lose good annotators
passing_criteria:
  require_all_correct: true

# Better - allows for learning
passing_criteria:
  min_correct: 8
  total_questions: 10
5. Include Edge Cases
Add tricky examples to prepare annotators:
{
  "text": "Not bad at all, I guess it could be worse.",
  "correct_answer": {"sentiment": "Neutral"},
  "explanation": "Despite negative words like 'not bad' and 'worse', this is actually a lukewarm endorsement - neutral rather than positive or negative."
}
Integration with Workflows
Training integrates with multi-phase workflows:
phases:
  consent:
    enabled: true
    data_file: "data/consent.json"
  prestudy:
    enabled: true
    data_file: "data/demographics.json"
  instructions:
    enabled: true
    content: "data/instructions.html"
  training:
    enabled: true
    data_file: "data/training.json"
    schema_name: sentiment
    passing_criteria:
      min_correct: 8
  annotation:
    # Main task - always enabled
    enabled: true
  poststudy:
    enabled: true
    data_file: "data/feedback.json"
Performance Considerations
- Training data is loaded at startup
- Progress is stored in memory per session
- Minimal performance impact on main annotation
- Consider separating complex training into multiple phases
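Because training data is loaded at startup, it can be worth checking the file before launching the server. The following sketch is a hypothetical pre-flight check, not a Potato feature: it verifies that each item has the `id`, `text`, and `correct_answer` fields used on this page and reports any label that never appears as a correct answer. It handles single-label schemes such as `radio` only; span answers would need a separate check.

```python
import json


def validate_training_items(items, labels_by_schema):
    """Return a list of problems found in a list of training items.

    labels_by_schema: mapping of schema name -> allowed labels,
                      e.g. {"sentiment": ["Positive", "Negative", "Neutral"]}
    """
    problems = []
    seen = {schema: set() for schema in labels_by_schema}
    for index, item in enumerate(items):
        item_id = item.get("id", f"<item {index}>")
        for key in ("id", "text", "correct_answer"):
            if key not in item:
                problems.append(f"{item_id}: missing '{key}'")
        for schema, answer in item.get("correct_answer", {}).items():
            if schema not in labels_by_schema:
                problems.append(f"{item_id}: unknown schema '{schema}'")
            elif answer not in labels_by_schema[schema]:
                problems.append(f"{item_id}: unknown label '{answer}' for '{schema}'")
            else:
                seen[schema].add(answer)
    # Report labels that never appear as a correct answer
    for schema, labels in labels_by_schema.items():
        for label in labels:
            if label not in seen[schema]:
                problems.append(f"no training example for {schema}={label}")
    return problems


items = json.loads(
    '[{"id": "train_1", "text": "I love this!", "correct_answer": {"sentiment": "Positive"}}]'
)
labels = {"sentiment": ["Positive", "Negative", "Neutral"]}
for problem in validate_training_items(items, labels):
    print(problem)
```

For a real file, replace the inline JSON string with the contents of your training data file, for example by reading it with `json.load`.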