
Potato 2.1: Instance Display, Visual AI, and Span Linking

Potato 2.1.0 brings the instance display system, visual AI support for image and video annotation, span linking, multi-field spans, and layout customization.

By Potato Team


We're excited to announce Potato 2.1.0, a feature-packed release that brings five major capabilities to the annotation platform. This update focuses on multi-modal content display, AI-powered visual annotation, and richer relationship annotation.

Instance Display System

The headline feature of v2.1 is the new instance_display configuration block. Previously, displaying an image alongside radio buttons required awkward workarounds like creating an image_annotation schema with min_annotations: 0. Now you can explicitly separate what content to show from what annotations to collect.

yaml
instance_display:
  layout:
    direction: horizontal
    gap: 24px
  fields:
    - key: image_url
      type: image
      label: "Image to Classify"
      display_options:
        max_width: 600
        zoomable: true
    - key: description
      type: text
      label: "Context"
 
annotation_schemes:
  - annotation_type: radio
    name: category
    labels: [nature, urban, people, objects]

Instance display supports 11 content types: text, html, image, video, audio, dialogue, pairwise, code, spreadsheet, document, and pdf. You can combine multiple display fields with any annotation scheme, arrange them horizontally or vertically, and enable span annotation on text fields with span_target: true.
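For instance, enabling span annotation on a display field is a matter of flagging it with `span_target: true` (a minimal sketch — the field key and labels here are illustrative):

```yaml
instance_display:
  fields:
    - key: passage           # illustrative field key
      type: text
      label: "Passage"
      span_target: true      # allow span annotation on this field

annotation_schemes:
  - annotation_type: span
    name: highlights
    labels: [CLAIM, EVIDENCE]
```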

A standout feature is per-turn dialogue ratings — you can add inline Likert-scale rating widgets to individual conversation turns, allowing annotators to rate specific speakers without leaving the conversation view.
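A per-turn rating setup might look roughly like this — note that the keys under the dialogue field (such as `turn_ratings`) are illustrative placeholders, not the exact schema; the Instance Display documentation has the real configuration:

```yaml
instance_display:
  fields:
    - key: conversation
      type: dialogue
      # hypothetical keys sketching inline per-turn Likert ratings
      turn_ratings:
        enabled: true
        scale: 5                  # 1-5 Likert scale per turn
        speakers: ["assistant"]   # only rate assistant turns
```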

Read the full Instance Display documentation →

Multi-Field Span Annotation

Span annotation now supports a target_field option, enabling annotation across multiple text fields in the same data instance. This is essential for tasks like summarization evaluation where you need to annotate entities in both a source document and its summary.

yaml
annotation_schemes:
  - annotation_type: span
    name: source_entities
    target_field: "source_text"
    labels: [PERSON, ORGANIZATION, LOCATION]
 
  - annotation_type: span
    name: summary_entities
    target_field: "summary"
    labels: [PERSON, ORGANIZATION, LOCATION]

Output annotations are keyed by field name, making it clear which text field each span belongs to.
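Conceptually, the exported spans group under each scheme by the field they were drawn on — something like the following (an illustrative shape, not the exact export format):

```json
{
  "source_entities": {
    "source_text": [
      {"start": 0, "end": 12, "label": "PERSON", "text": "Ada Lovelace"}
    ]
  },
  "summary_entities": {
    "summary": [
      {"start": 23, "end": 35, "label": "PERSON", "text": "Ada Lovelace"}
    ]
  }
}
```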

Read the updated Span Annotation documentation →

Span Linking

The new span_link annotation type enables relation extraction by creating typed relationships between annotated spans. This unlocks tasks like knowledge graph construction, coreference resolution, and discourse analysis.

yaml
annotation_schemes:
  - annotation_type: span
    name: entities
    labels:
      - name: "PERSON"
        color: "#3b82f6"
      - name: "ORGANIZATION"
        color: "#22c55e"
 
  - annotation_type: span_link
    name: relations
    span_schema: entities
    link_types:
      - name: "WORKS_FOR"
        directed: true
        allowed_source_labels: ["PERSON"]
        allowed_target_labels: ["ORGANIZATION"]
        color: "#dc2626"
      - name: "COLLABORATES_WITH"
        directed: false
        allowed_source_labels: ["PERSON"]
        allowed_target_labels: ["PERSON"]
        color: "#06b6d4"

Key capabilities include directed and undirected links, n-ary relationships (links between more than two spans), visual arc display above the text, and label constraints that restrict which entity types can participate in each relationship type.

Read the full Span Linking documentation →

Visual AI Support

Potato 2.1 introduces four new vision endpoints that bring AI-powered assistance to image and video annotation tasks. This is a major expansion of Potato's AI capabilities beyond text.

Four Vision Endpoints

YOLO — Best for fast, precise object detection using local inference. Supports YOLOv8 variants and YOLO-World for open-vocabulary detection.

yaml
ai_support:
  enabled: true
  endpoint_type: "yolo"
  ai_config:
    model: "yolov8m.pt"
    confidence_threshold: 0.5
    iou_threshold: 0.45

Ollama Vision — Run vision-language models locally with Ollama. Supports LLaVA, Llama 3.2 Vision, Qwen2.5-VL, BakLLaVA, and Moondream.

yaml
ai_support:
  enabled: true
  endpoint_type: "ollama_vision"
  ai_config:
    model: "llava:latest"
    base_url: "http://localhost:11434"

OpenAI Vision — Cloud-based vision analysis using GPT-4o with configurable detail levels.

yaml
ai_support:
  enabled: true
  endpoint_type: "openai_vision"
  ai_config:
    api_key: "${OPENAI_API_KEY}"
    model: "gpt-4o"
    detail: "auto"

Anthropic Vision — Claude with vision capabilities for image understanding and classification.

yaml
ai_support:
  enabled: true
  endpoint_type: "anthropic_vision"
  ai_config:
    api_key: "${ANTHROPIC_API_KEY}"
    model: "claude-sonnet-4-20250514"

Image AI Features

For image annotation tasks, visual AI provides four assistance modes:

  • Detection — Finds objects matching your configured labels and draws suggestion bounding boxes as dashed overlays
  • Pre-annotation (Auto) — Automatically detects all objects in the image and creates suggestions for human review
  • Classification — Classifies a selected region or the entire image with a confidence score
  • Hints — Provides guidance without revealing exact locations, useful for annotator training

yaml
annotation_schemes:
  - annotation_type: image_annotation
    name: object_detection
    tools: [bbox, polygon]
    labels:
      - name: "person"
        color: "#FF6B6B"
      - name: "car"
        color: "#4ECDC4"
    ai_support:
      enabled: true
      features:
        detection: true
        pre_annotate: true
        classification: false
        hint: true

Video AI Features

For video tasks, visual AI adds scene detection (identifying scene boundaries and suggesting temporal segments), keyframe detection (finding significant moments), and object tracking (suggesting positions across frames).
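By analogy with the image feature toggles above, a per-scheme video configuration might look like the sketch below — the scheme name and feature keys are illustrative assumptions, not confirmed config options:

```yaml
annotation_schemes:
  - annotation_type: video_annotation   # illustrative scheme name
    name: scene_review
    ai_support:
      enabled: true
      features:
        scene_detection: true   # suggest scene boundaries / temporal segments
        keyframes: true         # flag significant moments
        tracking: false         # suggest object positions across frames
```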

Accept/Reject Workflow

AI suggestions appear as dashed overlays that annotators can accept (double-click), reject (right-click), accept all, or clear all — keeping humans in the loop while accelerating annotation.

Separate Visual and Text Endpoints

You can configure different AI endpoints for text and visual tasks, using the best model for each content type:

yaml
ai_support:
  enabled: true
  endpoint_type: "ollama"          # Text annotations
  visual_endpoint_type: "yolo"     # Image/video annotations
  ai_config:
    model: "llama3.2"
  visual_ai_config:
    model: "yolov8m.pt"
    confidence_threshold: 0.5

Read the full Visual AI Support documentation →

Layout Customization

Potato 2.1 adds support for fully custom visual layouts. By default, Potato generates an editable layouts/task_layout.html file; you can replace it with your own HTML template using CSS grid layouts, color-coded options, and custom section styling.

yaml
task_layout: layouts/custom_task_layout.html

Three example layouts are included in project-hub/layout-examples/:

  • Content moderation — Warning banner, 2-column grid, color-coded severity
  • Dialogue QA — Case metadata, circular Likert ratings, grouped assessments
  • Medical review — Professional medical styling, structured reporting

Custom layouts work alongside the new instance_display system — display content renders above your custom annotation forms.

Read the full Layout Customization documentation →

Other Improvements

Label Rationales

A fourth AI capability joins hints, keyword highlighting, and label suggestions. Rationales generate balanced explanations for why each label might apply, helping annotators understand the reasoning behind different classifications.

yaml
ai_support:
  features:
    rationales:
      enabled: true

Bug Fixes and Testing

  • 50+ new tests for improved reliability
  • Responsive design improvements across annotation types
  • Enhanced project-hub organization with layout examples

Upgrading to v2.1

bash
pip install --upgrade potato-annotation

Existing v2.0 configurations work without changes — all new features are opt-in through additional config blocks like instance_display, span_link schemes, and visual AI endpoints.

Getting Started

Have questions or feedback? Join our Discord or open an issue on GitHub.