# Migrating from Label Studio to Potato

Source: https://www.potatoannotator.com/blog/label-studio-migration

This guide walks through moving an existing Label Studio project over to Potato. Be warned up front: there is no official migration tool. You convert the config by hand and write a bit of Python to reshape your data, so you will need to know your way around both platforms.

For a side-by-side feature comparison and Potato's own migration tooling, see the [source documentation](https://github.com/davidjurgens/potato/blob/master/docs/comparison.md) and the [migration CLI docs](https://github.com/davidjurgens/potato/blob/master/docs/tools/migration_cli.md).

## Why Migrate?

Potato is a better fit for some projects. It is built for academic annotation studies, ships with Prolific and MTurk integration, and configures through YAML with no database to stand up. It is easy to extend in Python, and because storage is just files, it is easy to deploy.

## Migration Overview

The process is manual and runs roughly like this:

1. Manually convert the Label Studio XML template to a Potato YAML configuration
2. Write Python scripts to transform the data format (JSON to JSONL)
3. Write scripts to migrate existing annotations (if any)
4. Test thoroughly and validate the converted data

## Template Conversion

### Text Classification

**Label Studio XML:**
```xml
<View>
  <Text name="text" value="$text"/>
  <Choices name="sentiment" toName="text" choice="single">
    <Choice value="Positive"/>
    <Choice value="Negative"/>
    <Choice value="Neutral"/>
  </Choices>
</View>
```

**Potato YAML:**
```yaml
annotation_task_name: "Sentiment Classification"

data_files:
  - "data/items.jsonl"

item_properties:
  id_key: id
  text_key: text

annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment?"
    labels:
      - name: positive
        tooltip: "Positive sentiment"
      - name: negative
        tooltip: "Negative sentiment"
      - name: neutral
        tooltip: "Neutral sentiment"
```

### Multi-Label Classification

**Label Studio XML:**
```xml
<View>
  <Text name="text" value="$text"/>
  <Choices name="topics" toName="text" choice="multiple">
    <Choice value="Politics"/>
    <Choice value="Sports"/>
    <Choice value="Technology"/>
    <Choice value="Entertainment"/>
  </Choices>
</View>
```

**Potato YAML:**
```yaml
annotation_schemes:
  - annotation_type: multiselect
    name: topics
    description: "Select all relevant topics"
    labels:
      - name: politics
        tooltip: "Politics content"
      - name: sports
        tooltip: "Sports content"
      - name: technology
        tooltip: "Technology content"
      - name: entertainment
        tooltip: "Entertainment content"
```
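For templates with many choice lists, the hand conversion above can be partly scripted. The following is a minimal sketch, not an official tool: `choices_to_scheme` is a hypothetical helper, the output keys simply mirror the YAML shown in this guide, and lowercasing label names is an assumption copied from the examples. Descriptions and tooltips still need to be written by hand.

```python
import xml.etree.ElementTree as ET

def choices_to_scheme(xml_template: str) -> list[dict]:
    """Build Potato-style scheme dicts from every <Choices> tag in a template."""
    root = ET.fromstring(xml_template)
    schemes = []
    for choices in root.iter("Choices"):
        # Label Studio treats a missing `choice` attribute as single choice
        multiple = choices.get("choice", "single") == "multiple"
        schemes.append({
            "annotation_type": "multiselect" if multiple else "radio",
            "name": choices.get("name"),
            "description": "TODO: write a prompt for annotators",
            "labels": [
                # Assumption: lowercase names, original value as tooltip
                {"name": c.get("value").lower(), "tooltip": c.get("value")}
                for c in choices.iter("Choice")
            ],
        })
    return schemes

template = """
<View>
  <Text name="text" value="$text"/>
  <Choices name="topics" toName="text" choice="multiple">
    <Choice value="Politics"/>
    <Choice value="Sports"/>
  </Choices>
</View>
"""

for scheme in choices_to_scheme(template):
    print(scheme["name"], scheme["annotation_type"],
          [label["name"] for label in scheme["labels"]])
```

The resulting dicts can be written out under `annotation_schemes` with whichever YAML library you already use, then reviewed and edited before running Potato.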

### Named Entity Recognition

**Label Studio XML:**
```xml
<View>
  <Labels name="entities" toName="text">
    <Label value="PERSON" background="#FFC0CB"/>
    <Label value="ORG" background="#90EE90"/>
    <Label value="LOCATION" background="#ADD8E6"/>
  </Labels>
  <Text name="text" value="$text"/>
</View>
```

**Potato YAML:**
```yaml
annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Select entity spans in the text"
    labels:
      - name: PERSON
        tooltip: "Person names"
      - name: ORG
        tooltip: "Organization names"
      - name: LOCATION
        tooltip: "Location names"
```

Note: Potato's span annotation may use different highlighting than Label Studio. Test your converted config to verify the display meets your needs.

### Image Classification

**Label Studio XML:**
```xml
<View>
  <Image name="image" value="$image_url"/>
  <Choices name="category" toName="image">
    <Choice value="Cat"/>
    <Choice value="Dog"/>
    <Choice value="Other"/>
  </Choices>
</View>
```

**Potato YAML:**
```yaml
data_files:
  - "data/images.jsonl"

item_properties:
  id_key: id
  text_key: image_url

annotation_schemes:
  - annotation_type: radio
    name: category
    description: "What animal is in the image?"
    labels:
      - name: cat
        tooltip: "Cat"
      - name: dog
        tooltip: "Dog"
      - name: other
        tooltip: "Other animal"
```

### Bounding Box Annotation

**Label Studio XML:**
```xml
<View>
  <Image name="image" value="$image_url"/>
  <RectangleLabels name="objects" toName="image">
    <Label value="Car"/>
    <Label value="Person"/>
    <Label value="Bicycle"/>
  </RectangleLabels>
</View>
```

**Potato YAML:**
```yaml
annotation_schemes:
  - annotation_type: bounding_box
    name: objects
    description: "Draw boxes around objects"
    labels:
      - name: car
        tooltip: "Car"
      - name: person
        tooltip: "Person"
      - name: bicycle
        tooltip: "Bicycle"
```

Note: Bounding box support in Potato may differ from Label Studio. Check the documentation for current capabilities.

### Rating Scales

**Label Studio XML:**
```xml
<View>
  <Text name="text" value="$text"/>
  <Rating name="quality" toName="text" maxRating="5"/>
</View>
```

**Potato YAML:**
```yaml
annotation_schemes:
  - annotation_type: likert
    name: quality
    description: "Rate the quality"
    size: 5
    labels:
      - name: "1"
        tooltip: "Poor"
      - name: "2"
        tooltip: "Below average"
      - name: "3"
        tooltip: "Average"
      - name: "4"
        tooltip: "Good"
      - name: "5"
        tooltip: "Excellent"
```

## Data Format Conversion

### Label Studio JSON to Potato JSONL

**Label Studio format:**
```json
[
  {
    "id": 1,
    "data": {
      "text": "This is great!",
      "meta_info": "source1"
    }
  },
  {
    "id": 2,
    "data": {
      "text": "This is terrible.",
      "meta_info": "source2"
    }
  }
]
```

**Potato JSONL format:**
```json
{"id": "1", "text": "This is great!", "metadata": {"meta_info": "source1"}}
{"id": "2", "text": "This is terrible.", "metadata": {"meta_info": "source2"}}
```

### Conversion Script

```python
import json

def convert_label_studio_to_potato(ls_file, potato_file):
    """Convert Label Studio JSON to Potato JSONL"""

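    with open(ls_file, 'r', encoding='utf-8') as f:
        ls_data = json.load(f)

    with open(potato_file, 'w', encoding='utf-8') as f:
        for item in ls_data:
            # Tolerate tasks that have no "data" block at all
            data = item.get("data", {})

            potato_item = {
                "id": str(item["id"]),
                "text": data.get("text", ""),
            }

            # Handle image URLs: the templates in this guide bind $image_url,
            # but a plain "image" key is accepted as well
            image = data.get("image_url", data.get("image"))
            if image is not None:
                potato_item["image_url"] = image

            # Keep every remaining data field under "metadata"
            metadata = {
                key: value for key, value in data.items()
                if key not in ("text", "image", "image_url")
            }
            if metadata:
                potato_item["metadata"] = metadata

            # ensure_ascii=False keeps non-ASCII text readable in the output
            f.write(json.dumps(potato_item, ensure_ascii=False) + "\n")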

    print(f"Converted {len(ls_data)} items")

# Usage
convert_label_studio_to_potato("label_studio_export.json", "data/items.jsonl")
```

## Annotation Migration

### Converting Existing Annotations

```python
import json

def convert_annotations(ls_export, potato_output):
    """Convert Label Studio annotations to Potato format"""

    with open(ls_export, 'r', encoding='utf-8') as f:
        ls_data = json.load(f)

    with open(potato_output, 'w', encoding='utf-8') as f:
        for item in ls_data:
            # Tasks without annotations produce no output lines
            for annotation in item.get("annotations") or []:
                # completed_by may be a user object or a bare user ID,
                # depending on how the export was produced
                completed_by = annotation.get("completed_by")
                if isinstance(completed_by, dict):
                    annotator = completed_by.get("email", "unknown")
                elif completed_by is not None:
                    annotator = str(completed_by)
                else:
                    annotator = "unknown"

                potato_ann = {
                    "id": str(item["id"]),
                    "text": item.get("data", {}).get("text", ""),
                    "annotations": {},
                    "annotator": annotator,
                    "timestamp": annotation.get("created_at", "")
                }

                # Convert results. Label values are copied as-is; rename them
                # here if your Potato config uses different names (e.g. lowercase).
                for result in annotation.get("result", []):
                    scheme_name = result.get("from_name", "unknown")
                    result_type = result.get("type")
                    value = result.get("value", {})

                    if result_type == "choices":
                        # Classification: one label for a single choice,
                        # the full list when several were selected
                        choices = value["choices"]
                        potato_ann["annotations"][scheme_name] = (
                            choices[0] if len(choices) == 1 else choices
                        )

                    elif result_type == "labels":
                        # NER spans: collect every span under the scheme name
                        spans = potato_ann["annotations"].setdefault(scheme_name, [])
                        spans.append({
                            "start": value["start"],
                            "end": value["end"],
                            "label": value["labels"][0],
                            "text": value.get("text", "")
                        })

                    elif result_type == "rating":
                        potato_ann["annotations"][scheme_name] = value["rating"]

                f.write(json.dumps(potato_ann, ensure_ascii=False) + "\n")

# Usage
convert_annotations("ls_annotated_export.json", "annotations/migrated.jsonl")
```

### Span Annotation Conversion

Both Label Studio and Potato store spans as character offsets, so conversion is a direct field-by-field copy:

```python
def convert_spans(ls_spans):
    """Convert Label Studio span format to Potato format"""
    potato_spans = []

    for span in ls_spans:
        potato_spans.append({
            "start": span["value"]["start"],
            "end": span["value"]["end"],
            "label": span["value"]["labels"][0],
            "text": span["value"]["text"]
        })

    return potato_spans
```
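Because the offsets are copied unchanged, a quick sanity check is to confirm that each span's stored text really is the slice of the item text it points at. The sketch below assumes spans in the shape returned by `convert_spans`; `find_misaligned_spans` is a hypothetical helper name, not part of either tool.

```python
def find_misaligned_spans(text, spans):
    """Return the spans whose stored text does not equal text[start:end]."""
    bad = []
    for span in spans:
        if text[span["start"]:span["end"]] != span["text"]:
            bad.append(span)
    return bad

text = "Ada Lovelace worked in London."
spans = [
    {"start": 0, "end": 12, "label": "PERSON", "text": "Ada Lovelace"},
    # Deliberately wrong: the end offset swallows the trailing period
    {"start": 23, "end": 30, "label": "LOCATION", "text": "London"},
]

for span in find_misaligned_spans(text, spans):
    print("Misaligned:", span["label"], span["start"], span["end"])
```

Mismatches usually point at text that was normalized (whitespace, line endings, encoding) somewhere between export and conversion.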

## Feature Mapping

| Label Studio | Potato |
|--------------|--------|
| Choices (single) | radio |
| Choices (multiple) | multiselect |
| Labels | span |
| Rating | likert |
| TextArea | text |
| RectangleLabels | bounding_box |
| PolygonLabels | polygon |
| Taxonomy | (use nested multiselect) |
| Pairwise | comparison |
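
Before converting a large template by hand, it can help to list which control tags it contains and where each one lands in the table above. The sketch below is illustrative only: `plan_conversion` and the `IGNORED_TAGS` set are assumptions made for this example, and any tag the table does not cover is reported as `None` so it can be reviewed manually.

```python
import xml.etree.ElementTree as ET

# Mapping copied from the table above
LS_TO_POTATO = {
    "Labels": "span",
    "Rating": "likert",
    "TextArea": "text",
    "RectangleLabels": "bounding_box",
    "PolygonLabels": "polygon",
    "Taxonomy": "multiselect (nested)",
    "Pairwise": "comparison",
}

# Layout, data, and child tags that need no scheme of their own (assumed list)
IGNORED_TAGS = {"View", "Header", "Text", "Image", "Choice", "Label"}

def plan_conversion(xml_template):
    """Return (name, Label Studio tag, Potato type or None) per control tag."""
    plan = []
    for el in ET.fromstring(xml_template).iter():
        if el.tag in IGNORED_TAGS:
            continue
        if el.tag == "Choices":
            target = "multiselect" if el.get("choice") == "multiple" else "radio"
        else:
            target = LS_TO_POTATO.get(el.tag)  # None means no mapping in the table
        plan.append((el.get("name"), el.tag, target))
    return plan

template = """
<View>
  <Text name="text" value="$text"/>
  <Choices name="topics" toName="text" choice="multiple">
    <Choice value="Politics"/>
  </Choices>
  <Rating name="quality" toName="text" maxRating="5"/>
  <BrushLabels name="mask" toName="text"/>
</View>
"""

for name, tag, target in plan_conversion(template):
    print(f"{name}: {tag} -> {target or 'no mapping, review manually'}")
```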

## Quality Control Considerations

Quality control is one area where you will have to do some work by hand. Potato has some basic QC, but not a single comprehensive system, so plan to fill the gaps yourself.

### Approaches for Quality Control

**Attention checks**: You can manually add attention check items to your data file. These are regular items with known correct answers that you include to verify annotator attentiveness:

```python
import random

# Regular items loaded from your data file (placeholder values shown here)
regular_items = [
    {"id": "1", "text": "This is great!"},
    {"id": "2", "text": "This is terrible."},
]

# Add attention check items to your data
attention_items = [
    {"id": "attn_1", "text": "ATTENTION CHECK: Please select 'Positive'", "is_attention": True},
    {"id": "attn_2", "text": "ATTENTION CHECK: Please select 'Negative'", "is_attention": True},
]

# Intersperse with regular items
all_items = regular_items + attention_items
random.shuffle(all_items)
```
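
Once annotations come back, the attention checks still need to be scored. A minimal sketch follows, assuming records shaped like the migrated annotation lines earlier in this guide (`id`, `annotator`, `annotations`); Potato's own output files may be laid out differently. The `expected` mapping and the `attention_pass_rates` helper are hypothetical names introduced for this example.

```python
from collections import defaultdict

def attention_pass_rates(records, expected, scheme):
    """Fraction of attention checks each annotator answered correctly."""
    passed = defaultdict(int)
    seen = defaultdict(int)
    for rec in records:
        if rec["id"] not in expected:
            continue  # regular item, not an attention check
        seen[rec["annotator"]] += 1
        if rec["annotations"].get(scheme) == expected[rec["id"]]:
            passed[rec["annotator"]] += 1
    return {annotator: passed[annotator] / seen[annotator] for annotator in seen}

# Known correct answers for the attention check items
expected = {"attn_1": "positive", "attn_2": "negative"}

records = [
    {"id": "attn_1", "annotator": "alice", "annotations": {"sentiment": "positive"}},
    {"id": "attn_2", "annotator": "alice", "annotations": {"sentiment": "negative"}},
    {"id": "attn_1", "annotator": "bob", "annotations": {"sentiment": "neutral"}},
    {"id": "17", "annotator": "bob", "annotations": {"sentiment": "positive"}},
]

rates = attention_pass_rates(records, expected, "sentiment")
print(rates)
```

Annotators who never saw an attention check do not appear in the result, so a missing name means "no evidence" rather than a pass.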

**Agreement calculation**: Compute inter-annotator agreement offline using your collected annotations:

```python
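import json
from collections import defaultdict
from sklearn.metrics import cohen_kappa_score

def compute_agreement(annotations_file, scheme, annotator_a, annotator_b):
    """Cohen's kappa for two annotators on a single-label scheme, computed offline."""
    # Assumes one JSON object per line with "id", "annotator", and "annotations" keys
    labels = defaultdict(dict)  # item id -> {annotator: label}
    with open(annotations_file, encoding="utf-8") as f:
        for line in f:
            row = json.loads(line)
            if scheme in row["annotations"]:
                labels[row["id"]][row["annotator"]] = row["annotations"][scheme]
    # Only items labeled by both annotators can be compared
    shared = [v for v in labels.values() if annotator_a in v and annotator_b in v]
    return cohen_kappa_score([v[annotator_a] for v in shared], [v[annotator_b] for v in shared])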
```

**Redundant annotation**: Configure multiple annotators per item by assigning items to multiple users in your data management process.
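
One way to plan redundant annotation is to compute the assignment up front. The sketch below is an assumption about how you might do this in your own data management scripts, not a Potato feature: `assign_items` is a hypothetical helper that hands each item to `per_item` distinct annotators in round-robin order, which also keeps workloads balanced.

```python
def assign_items(item_ids, annotators, per_item=2):
    """Give each item `per_item` distinct annotators, round-robin."""
    if per_item > len(annotators):
        raise ValueError("per_item cannot exceed the number of annotators")
    assignments = {annotator: [] for annotator in annotators}
    n = len(annotators)
    for i, item_id in enumerate(item_ids):
        # Consecutive slots modulo n are distinct because per_item <= n
        for k in range(per_item):
            assignments[annotators[(i * per_item + k) % n]].append(item_id)
    return assignments

plan = assign_items(["1", "2", "3"], ["alice", "bob", "carol"], per_item=2)
for annotator, items in plan.items():
    print(annotator, items)
```

The result is a plain mapping from annotator to item IDs; how it is wired into your deployment depends on how you distribute data files to annotators.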

## User Migration

### Export Users from Label Studio

```python
# Label Studio API call to get users
import requests

def export_ls_users(ls_url, api_key):
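    response = requests.get(
        f"{ls_url}/api/users",
        headers={"Authorization": f"Token {api_key}"},
        timeout=30,  # avoid hanging forever on an unreachable server
    )
    response.raise_for_status()  # fail loudly on a bad URL or token
    return response.json()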
```

### Create Potato User Config

```yaml
user_config:
  # Simple auth for migrated users
  auth_type: password

  user_config:
    - username: user1@example.com
      password_hash: "..."  # Generate new passwords

    - username: user2@example.com
      password_hash: "..."
```

## Testing Migration

### Validation Script

```python
import json

def validate_migration(original_ls, converted_potato):
    """Validate converted data matches original"""

    with open(original_ls, encoding="utf-8") as f:
        ls_data = json.load(f)

    with open(converted_potato, encoding="utf-8") as f:
        potato_data = [json.loads(line) for line in f if line.strip()]

    # Check item count
    assert len(ls_data) == len(potato_data), "Item count mismatch"

    # Check IDs preserved (converted IDs are strings)
    potato_by_id = {item["id"]: item for item in potato_data}
    ls_ids = {str(item["id"]) for item in ls_data}
    assert ls_ids == set(potato_by_id), "ID mismatch"

    # Check text content, pairing items by ID rather than by sort order:
    # integer and string IDs sort differently (2 < 10, but "10" < "2")
    for ls_item in ls_data:
        potato_item = potato_by_id[str(ls_item["id"])]
        assert ls_item["data"].get("text", "") == potato_item["text"], \
            f"Text mismatch for item {ls_item['id']}"

    print("Validation passed!")

validate_migration("label_studio_export.json", "data/items.jsonl")
```

## Migration Checklist

- [ ] Export data from Label Studio (JSON format)
- [ ] Manually convert template XML to Potato YAML
- [ ] Write and run Python scripts to transform data format (JSON to JSONL)
- [ ] Write and run scripts to convert existing annotations (if any)
- [ ] Set up Potato project structure
- [ ] Test with sample data
- [ ] Validate converted data matches original
- [ ] Train annotators on new interface
- [ ] Run pilot annotation batch

## Common Issues

A few things tend to trip people up. Both tools use UTF-8, but it is still worth checking your data for encoding gremlins. Local image paths usually need to become URLs, or at least match the format Potato expects. Any custom components you built in Label Studio have to be rebuilt as Potato custom templates. And if you scripted against the Label Studio API, those scripts need to point at Potato's API instead.
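
For the encoding check, a rough heuristic is often enough to catch the worst cases. The sketch below scans a converted JSONL file for the Unicode replacement character and for the character sequences that typically appear when UTF-8 text was decoded with the wrong codec. The marker list and the helper names are assumptions for illustration, and the check can give false positives on text that legitimately contains "Ã".

```python
import json

# U+FFFD, plus the "Ã" and "â€" prefixes typical of mis-decoded UTF-8
SUSPECT_MARKERS = ("\ufffd", "\u00c3", "\u00e2\u20ac")

def looks_garbled(text):
    """True if the text contains any marker of a likely encoding problem."""
    return any(marker in text for marker in SUSPECT_MARKERS)

def find_encoding_problems(jsonl_path, text_key="text"):
    """Return (line number, item id) for every suspicious item in a JSONL file."""
    problems = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue
            item = json.loads(line)
            if looks_garbled(str(item.get(text_key, ""))):
                problems.append((line_no, item.get("id")))
    return problems

print(looks_garbled("caf\u00c3\u00a9"))  # "café" after a wrong decode
print(looks_garbled("café"))
```

Running `find_encoding_problems("data/items.jsonl")` after conversion gives a short list of items to inspect by hand.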

---

*Need help migrating? Check the [full documentation](/docs/getting-started/quick-start) or reach out on [GitHub](https://github.com/davidjurgens/potato/discussions).*
