Migrating from Label Studio to Potato
Step-by-step guide for manually converting Label Studio projects, templates, and annotations to Potato format.
This guide helps you manually migrate existing Label Studio projects to Potato. Migration involves converting configurations by hand and writing Python scripts to transform your data formats.
There is no official migration tool: this is a manual process that requires understanding both platforms.
Why Migrate?
Potato offers advantages for certain use cases:
- Research focus: Built for academic annotation studies
- Crowdsourcing: Native Prolific and MTurk integration
- Simplicity: YAML-based configuration, no database required
- Customization: Easy to extend with Python
- Lightweight: File-based storage, easy to deploy
Migration Overview
Migration is a manual process involving these steps:
- Manually convert Label Studio XML template to Potato YAML configuration
- Write Python scripts to transform data format (JSON to JSONL)
- Write scripts to migrate existing annotations (if any)
- Test thoroughly and validate your converted data
Template Conversion
Text Classification
Label Studio XML:
```xml
<View>
  <Text name="text" value="$text"/>
  <Choices name="sentiment" toName="text" choice="single">
    <Choice value="Positive"/>
    <Choice value="Negative"/>
    <Choice value="Neutral"/>
  </Choices>
</View>
```
Potato YAML:
```yaml
annotation_task_name: "Sentiment Classification"
data_files:
  - "data/items.jsonl"
item_properties:
  id_key: id
  text_key: text
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment?"
    labels:
      - name: positive
        tooltip: "Positive sentiment"
      - name: negative
        tooltip: "Negative sentiment"
      - name: neutral
        tooltip: "Neutral sentiment"
```
Multi-Label Classification
Label Studio XML:
```xml
<View>
  <Text name="text" value="$text"/>
  <Choices name="topics" toName="text" choice="multiple">
    <Choice value="Politics"/>
    <Choice value="Sports"/>
    <Choice value="Technology"/>
    <Choice value="Entertainment"/>
  </Choices>
</View>
```
Potato YAML:
```yaml
annotation_schemes:
  - annotation_type: multiselect
    name: topics
    description: "Select all relevant topics"
    labels:
      - name: politics
        tooltip: "Politics content"
      - name: sports
        tooltip: "Sports content"
      - name: technology
        tooltip: "Technology content"
      - name: entertainment
        tooltip: "Entertainment content"
```
Named Entity Recognition
Label Studio XML:
```xml
<View>
  <Labels name="entities" toName="text">
    <Label value="PERSON" background="#FFC0CB"/>
    <Label value="ORG" background="#90EE90"/>
    <Label value="LOCATION" background="#ADD8E6"/>
  </Labels>
  <Text name="text" value="$text"/>
</View>
```
Potato YAML:
```yaml
annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Select entity spans in the text"
    labels:
      - name: PERSON
        tooltip: "Person names"
      - name: ORG
        tooltip: "Organization names"
      - name: LOCATION
        tooltip: "Location names"
```
Note: Potato's span annotation may use different highlighting than Label Studio, and the `background` colors from the Label Studio template are not carried over by this conversion. Test your converted config to verify that the display meets your needs.
Image Classification
Label Studio XML:
```xml
<View>
  <Image name="image" value="$image_url"/>
  <Choices name="category" toName="image">
    <Choice value="Cat"/>
    <Choice value="Dog"/>
    <Choice value="Other"/>
  </Choices>
</View>
```
Potato YAML:
```yaml
data_files:
  - "data/images.jsonl"
item_properties:
  id_key: id
  text_key: image_url
annotation_schemes:
  - annotation_type: radio
    name: category
    description: "What animal is in the image?"
    labels:
      - name: cat
        tooltip: "Cat"
      - name: dog
        tooltip: "Dog"
      - name: other
        tooltip: "Other animal"
```
Bounding Box Annotation
Label Studio XML:
```xml
<View>
  <Image name="image" value="$image_url"/>
  <RectangleLabels name="objects" toName="image">
    <Label value="Car"/>
    <Label value="Person"/>
    <Label value="Bicycle"/>
  </RectangleLabels>
</View>
```
Potato YAML:
```yaml
annotation_schemes:
  - annotation_type: bounding_box
    name: objects
    description: "Draw boxes around objects"
    labels:
      - name: car
        tooltip: "Car"
      - name: person
        tooltip: "Person"
      - name: bicycle
        tooltip: "Bicycle"
```
Note: Bounding box support in Potato may differ from Label Studio. Check the documentation for current capabilities.
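Existing bounding-box annotations need converting too, and the annotation migration script in this guide only handles `choices`, `labels` and `rating` results. Label Studio stores a `rectanglelabels` result with `x`, `y`, `width` and `height` as percentages (0 to 100) of the image size, alongside `original_width` and `original_height` in pixels. The sketch below converts one such result to pixel coordinates. `convert_rectangle` is a hypothetical helper, and its output field names are placeholders: check them against the format Potato's `bounding_box` scheme actually expects before relying on them.

```python
def convert_rectangle(result):
    """Convert one Label Studio rectanglelabels result to pixel coordinates.

    Label Studio gives x, y, width and height as percentages (0-100) of the
    image size. The output keys are hypothetical placeholders, and any
    rotation stored in the result is ignored.
    """
    value = result["value"]
    img_w = result["original_width"]   # image width in pixels
    img_h = result["original_height"]  # image height in pixels
    return {
        "label": value["rectanglelabels"][0],
        # Multiply before dividing to keep float error small, then round to whole pixels
        "x": round(value["x"] * img_w / 100),
        "y": round(value["y"] * img_h / 100),
        "width": round(value["width"] * img_w / 100),
        "height": round(value["height"] * img_h / 100),
    }
```

Rotated boxes (a non-zero `rotation` in the result) need extra handling that this sketch leaves out.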
Rating Scales
Label Studio XML:
```xml
<View>
  <Text name="text" value="$text"/>
  <Rating name="quality" toName="text" maxRating="5"/>
</View>
```
Potato YAML:
```yaml
annotation_schemes:
  - annotation_type: likert
    name: quality
    description: "Rate the quality"
    size: 5
    labels:
      - name: "1"
        tooltip: "Poor"
      - name: "2"
        tooltip: "Below average"
      - name: "3"
        tooltip: "Average"
      - name: "4"
        tooltip: "Good"
      - name: "5"
        tooltip: "Excellent"
```
Data Format Conversion
Label Studio JSON to Potato JSONL
Label Studio format:
```json
[
  {
    "id": 1,
    "data": {
      "text": "This is great!",
      "meta_info": "source1"
    }
  },
  {
    "id": 2,
    "data": {
      "text": "This is terrible.",
      "meta_info": "source2"
    }
  }
]
```
Potato JSONL format:
```json
{"id": "1", "text": "This is great!", "metadata": {"meta_info": "source1"}}
{"id": "2", "text": "This is terrible.", "metadata": {"meta_info": "source2"}}
```
Conversion Script
```python
import json

def convert_label_studio_to_potato(ls_file, potato_file):
    """Convert a Label Studio JSON export to Potato JSONL (one item per line)."""
    with open(ls_file, "r", encoding="utf-8") as f:
        ls_data = json.load(f)

    with open(potato_file, "w", encoding="utf-8") as f:
        for item in ls_data:
            data = item.get("data", {})
            potato_item = {
                "id": str(item["id"]),  # write IDs as strings
                "text": data.get("text", ""),
            }

            # Image tasks: expose the image as a top-level "image_url" field
            # (Label Studio templates typically reference $image or $image_url)
            for image_key in ("image", "image_url"):
                if image_key in data:
                    potato_item["image_url"] = data[image_key]

            # Keep every other data field, under its original name, in "metadata"
            metadata = {k: v for k, v in data.items() if k not in ("text", "image", "image_url")}
            if metadata:
                potato_item["metadata"] = metadata

            f.write(json.dumps(potato_item, ensure_ascii=False) + "\n")

    print(f"Converted {len(ls_data)} items")

# Usage
convert_label_studio_to_potato("label_studio_export.json", "data/items.jsonl")
```
Annotation Migration
Converting Existing Annotations
```python
import json

def convert_annotations(ls_export, potato_output):
    """Convert Label Studio annotations to one JSON record per (item, annotation)."""
    with open(ls_export, "r", encoding="utf-8") as f:
        ls_data = json.load(f)

    with open(potato_output, "w", encoding="utf-8") as f:
        for item in ls_data:
            if not item.get("annotations"):
                continue  # skip items nobody annotated

            for annotation in item["annotations"]:
                # completed_by is a numeric user ID in some exports and a user object in others
                user = annotation.get("completed_by")
                annotator = user.get("email", "unknown") if isinstance(user, dict) else str(user or "unknown")

                potato_ann = {
                    "id": str(item["id"]),
                    "text": item["data"].get("text", ""),
                    "annotations": {},
                    "annotator": annotator,
                    "timestamp": annotation.get("created_at", ""),
                }

                # Convert results
                for result in annotation.get("result", []):
                    scheme_name = result.get("from_name", "unknown")
                    value = result["value"]
                    if result["type"] == "choices":
                        # Classification: keep every selected choice so multi-label answers
                        # are not truncated (single-choice answers become one-element lists)
                        potato_ann["annotations"][scheme_name] = value["choices"]
                    elif result["type"] == "labels":
                        # NER spans (character offsets)
                        potato_ann["annotations"].setdefault(scheme_name, []).append({
                            "start": value["start"],
                            "end": value["end"],
                            "label": value["labels"][0],
                            "text": value["text"],
                        })
                    elif result["type"] == "rating":
                        potato_ann["annotations"][scheme_name] = value["rating"]

                f.write(json.dumps(potato_ann, ensure_ascii=False) + "\n")

# Usage
convert_annotations("ls_annotated_export.json", "annotations/migrated.jsonl")
```
Span Annotation Conversion
Label Studio and Potato both record spans as character offsets into the item text, so span results can be copied over field by field:
```python
def convert_spans(ls_spans):
    """Convert Label Studio span results to Potato span dictionaries."""
    potato_spans = []
    for span in ls_spans:
        potato_spans.append({
            "start": span["value"]["start"],
            "end": span["value"]["end"],
            "label": span["value"]["labels"][0],
            "text": span["value"]["text"],
        })
    return potato_spans
```
Feature Mapping
| Label Studio tag | Potato `annotation_type` |
|---|---|
| Choices (single) | radio |
| Choices (multiple) | multiselect |
| Labels | span |
| Rating | likert |
| TextArea | text |
| RectangleLabels | bounding_box |
| PolygonLabels | polygon |
| Taxonomy | (use nested multiselect) |
| Pairwise | comparison |
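Because the mapping above is mostly one-to-one, a first draft of the `annotation_schemes` section can be generated from the Label Studio XML and then edited by hand. The sketch below uses Python's standard `xml.etree.ElementTree`; `template_to_schemes` and `TAG_TO_POTATO` are hypothetical names, descriptions are left empty, tooltips simply repeat the label value, and `Taxonomy` and `Pairwise` are skipped because they have no direct one-to-one counterpart.

```python
import xml.etree.ElementTree as ET

# Label Studio tag -> Potato annotation_type, following the mapping table
# (Choices is handled separately because it depends on the choice attribute)
TAG_TO_POTATO = {
    "Labels": "span",
    "Rating": "likert",
    "TextArea": "text",
    "RectangleLabels": "bounding_box",
    "PolygonLabels": "polygon",
}

def template_to_schemes(xml_string):
    """Draft Potato annotation_schemes entries from a Label Studio XML template."""
    schemes = []
    for el in ET.fromstring(xml_string).iter():
        if el.tag == "Choices":
            # Label Studio treats a missing choice attribute as single choice
            multiple = el.get("choice") == "multiple"
            scheme = {"annotation_type": "multiselect" if multiple else "radio"}
        elif el.tag in TAG_TO_POTATO:
            scheme = {"annotation_type": TAG_TO_POTATO[el.tag]}
        else:
            continue  # View, Text, Image, Choice, Label, ... define no scheme of their own
        scheme["name"] = el.get("name")
        scheme["description"] = ""  # fill in by hand
        labels = [child.get("value") for child in el if child.tag in ("Choice", "Label")]
        if labels:
            scheme["labels"] = [{"name": value, "tooltip": value} for value in labels]
        if el.tag == "Rating":
            scheme["size"] = int(el.get("maxRating", "5"))
        schemes.append(scheme)
    return schemes
```

The returned list can be dumped to YAML, for example with PyYAML's `yaml.safe_dump(schemes, sort_keys=False)`, and pasted into the config. Label names, descriptions and tooltips still need a manual pass.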
Quality Control Considerations
When migrating from Label Studio, you'll need to manually implement quality control measures. Potato provides some basic QC capabilities, but there is no comprehensive built-in QC system.
Approaches for Quality Control
Attention checks: You can manually add attention check items to your data file. These are regular items with known correct answers that you include to verify annotator attentiveness:
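```python
import json
import random

# Attention check items: ordinary items with a known correct answer
attention_items = [
    {"id": "attn_1", "text": "ATTENTION CHECK: Please select 'Positive'", "is_attention": True},
    {"id": "attn_2", "text": "ATTENTION CHECK: Please select 'Negative'", "is_attention": True},
]

# Load the regular items written by the conversion script
with open("data/items.jsonl", encoding="utf-8") as f:
    regular_items = [json.loads(line) for line in f if line.strip()]

# Intersperse the checks among regular items (the output file name is an example)
all_items = regular_items + attention_items
random.shuffle(all_items)
with open("data/items_with_checks.jsonl", "w", encoding="utf-8") as f:
    f.writelines(json.dumps(item) + "\n" for item in all_items)
```
Agreement calculation: Compute inter-annotator agreement offline using your collected annotations: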
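```python
import json
from collections import defaultdict
from sklearn.metrics import cohen_kappa_score

def compute_agreement(annotations_file, scheme, annotator_a, annotator_b):
    """Cohen's kappa for two annotators on one scheme (Potato has no built-in agreement calculation)."""
    # Expects one JSON record per line with "id", "annotator" and "annotations" keys
    labels = defaultdict(dict)  # item id -> {annotator: label}
    with open(annotations_file, encoding="utf-8") as f:
        for record in map(json.loads, f):
            labels[record["id"]][record["annotator"]] = str(record["annotations"].get(scheme))
    shared = [i for i, by_user in labels.items() if annotator_a in by_user and annotator_b in by_user]
    return cohen_kappa_score([labels[i][annotator_a] for i in shared],
                             [labels[i][annotator_b] for i in shared])
```
Redundant annotation: Configure multiple annotators per item by assigning items to multiple users in your data management process.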
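One simple way to plan that assignment outside Potato is a round-robin over annotators, so each item goes to a fixed number of distinct people and the workload stays balanced. The sketch below only computes the assignment; `assign_items` is a hypothetical helper, and how you hand the resulting lists to annotators depends on your deployment.

```python
from itertools import cycle

def assign_items(item_ids, annotators, per_item):
    """Assign each item to `per_item` distinct annotators, round-robin."""
    if per_item > len(annotators):
        raise ValueError("per_item cannot exceed the number of annotators")
    assignments = {name: [] for name in annotators}
    rotation = cycle(annotators)
    for item_id in item_ids:
        # Any per_item consecutive draws from the cycle are distinct annotators
        for _ in range(per_item):
            assignments[next(rotation)].append(item_id)
    return assignments
```

For example, `assign_items(["a", "b", "c"], ["u1", "u2", "u3"], per_item=2)` gives each annotator two items and each item two annotators, which is the overlap needed to compute pairwise agreement later.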
User Migration
Export Users from Label Studio
```python
# Label Studio API call to list users
import requests

def export_ls_users(ls_url, api_key):
    response = requests.get(
        f"{ls_url}/api/users/",
        headers={"Authorization": f"Token {api_key}"},
    )
    response.raise_for_status()  # fail loudly on a bad token or URL
    return response.json()
```
Create Potato User Config
```yaml
user_config:
  # Simple auth for migrated users
  auth_type: password
  users:
    - username: user1@example.com
      password_hash: "..."  # Generate new passwords
    - username: user2@example.com
      password_hash: "..."
```
Testing Migration
Validation Script
```python
import json

def validate_migration(original_ls, converted_potato):
    """Check that the converted JSONL matches the original Label Studio export."""
    with open(original_ls, encoding="utf-8") as f:
        ls_data = json.load(f)
    with open(converted_potato, encoding="utf-8") as f:
        potato_data = [json.loads(line) for line in f if line.strip()]

    # Check item count
    assert len(ls_data) == len(potato_data), "Item count mismatch"

    # Check IDs preserved (converted IDs are strings, original IDs may be integers)
    ls_by_id = {str(item["id"]): item for item in ls_data}
    potato_by_id = {item["id"]: item for item in potato_data}
    assert ls_by_id.keys() == potato_by_id.keys(), "ID mismatch"

    # Check text content, matching items by ID rather than by sort order
    for item_id, ls_item in ls_by_id.items():
        assert ls_item["data"].get("text", "") == potato_by_id[item_id]["text"], \
            f"Text mismatch for item {item_id}"

    print("Validation passed!")

validate_migration("label_studio_export.json", "data/items.jsonl")
```
Migration Checklist
- Export data from Label Studio (JSON format)
- Manually convert template XML to Potato YAML
- Write and run Python scripts to transform data format (JSON to JSONL)
- Write and run scripts to convert existing annotations (if any)
- Set up Potato project structure
- Test with sample data
- Validate converted data matches original
- Train annotators on new interface
- Run pilot annotation batch
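For the "Test with sample data" and "Run pilot annotation batch" steps, a small random subset of the converted data is usually enough. A minimal sketch, assuming the converted items live in a JSONL file; `make_pilot_file` and the pilot file name are hypothetical.

```python
import json
import random

def make_pilot_file(data_file, pilot_file, n_items, seed=0):
    """Write a random sample of n_items from a JSONL data file (hypothetical helper)."""
    with open(data_file, encoding="utf-8") as f:
        items = [json.loads(line) for line in f if line.strip()]
    rng = random.Random(seed)  # fixed seed so the pilot sample is reproducible
    sample = rng.sample(items, min(n_items, len(items)))
    with open(pilot_file, "w", encoding="utf-8") as f:
        for item in sample:
            f.write(json.dumps(item, ensure_ascii=False) + "\n")
    return len(sample)
```

For example, `make_pilot_file("data/items.jsonl", "data/pilot.jsonl", n_items=50)`, then point `data_files` in a copy of your config at `data/pilot.jsonl` for the pilot run.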
Common Issues
Character Encoding
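Label Studio and Potato both use UTF-8, but check your data for encoding issues, and open files with an explicit `encoding="utf-8"` in your conversion scripts, since Python's default file encoding depends on the platform.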
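Encoding problems often show up as span offsets that no longer line up with the text. A quick sanity check is to confirm that each migrated span's `start` and `end` offsets still select its recorded `text`. The sketch below assumes records shaped like the ones `convert_annotations` writes; `find_offset_mismatches` and `check_migrated_file` are hypothetical helpers. If mismatches cluster around emoji or other characters outside the Basic Multilingual Plane, one possible cause is offsets counted in a different unit (such as UTF-16 code units) than Python string indices.

```python
import json

def find_offset_mismatches(text, spans):
    """Return the spans whose [start, end) offsets do not select their recorded text."""
    return [span for span in spans if text[span["start"]:span["end"]] != span["text"]]

def check_migrated_file(path):
    """Scan a migrated annotations JSONL file for span offset mismatches.

    Assumes one record per line with "id", "text" and "annotations" keys, where
    span schemes hold lists of {"start", "end", "label", "text"} dictionaries.
    """
    problems = []
    with open(path, encoding="utf-8") as f:
        for record in map(json.loads, f):
            for value in record["annotations"].values():
                # Only span schemes hold lists of dictionaries; skip choices and ratings
                if isinstance(value, list) and value and isinstance(value[0], dict):
                    for span in find_offset_mismatches(record["text"], value):
                        problems.append((record["id"], span))
    return problems
```

Running `check_migrated_file("annotations/migrated.jsonl")` returns a list of `(item id, span)` pairs to inspect; an empty list means every span still matches its text.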
Image Paths
Convert local paths to URLs or update paths to match Potato's expected format.
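Files uploaded to Label Studio typically appear in exports as relative paths of the form `/data/upload/<project id>/<file name>`, which are meaningless outside that Label Studio server. A minimal sketch, assuming you have copied the media files to a web server reachable at a base URL of your choosing (`base_url` and both helper names are hypothetical), rewrites those paths into absolute URLs and leaves existing URLs untouched.

```python
import json

UPLOAD_PREFIX = "/data/upload/"  # typical Label Studio prefix for uploaded files

def rewrite_image_path(path, base_url):
    """Turn a Label Studio upload path into a URL under base_url (hypothetical host)."""
    if path.startswith(("http://", "https://")):
        return path  # already a URL, keep as-is
    if path.startswith(UPLOAD_PREFIX):
        path = path[len(UPLOAD_PREFIX):]  # keep the "<project id>/<file name>" part
    return base_url.rstrip("/") + "/" + path.lstrip("/")

def rewrite_data_file(in_path, out_path, base_url, key="image_url"):
    """Rewrite the image field of every item in a JSONL data file."""
    with open(in_path, encoding="utf-8") as src, open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            item = json.loads(line)
            if key in item:
                item[key] = rewrite_image_path(item[key], base_url)
            dst.write(json.dumps(item, ensure_ascii=False) + "\n")
```

Projects that used cloud or local storage connections rather than direct uploads use different path forms, so inspect a few exported items before applying a rewrite like this.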
Custom Components
Label Studio custom components need to be recreated as Potato custom templates.
API Differences
If you automated Label Studio, update scripts to use Potato's API.
Need help migrating? Check the full documentation or reach out on GitHub.