
Potato 2.4.0: Web Agent Annotation, Live Evaluation, and HuggingFace Integration

Potato 2.4.0 ships web agent trace review, real-time live agent evaluation, an LLM chat sidebar, HuggingFace Hub export, webhooks, SSO/OAuth, and five active learning strategies.

Potato Team

We're releasing Potato 2.4.0, the biggest update since the agentic annotation launch in 2.3. This release makes Potato the most complete platform for evaluating AI agents and ships a set of long-requested enterprise and integration features.

Web Agent Annotation

Evaluating web-browsing agents is hard. You need to see exactly what the agent saw, where it clicked, how it scrolled, and whether each step made sense. Potato 2.4 introduces a dedicated Web Agent Trace Viewer built for exactly this.

Review Mode gives annotators a filmstrip view for stepping through pre-recorded screenshots. SVG overlays mark click targets, bounding boxes, mouse paths, and scroll positions, so evaluators can follow each action with the annotation controls inline.

Creation Mode flips the interface: annotators browse a live website inside an iframe, and Potato automatically records every interaction as an annotation-ready trace. Import existing traces from WebArena, Mind2Web, and Anthropic Computer Use formats, or create new ones on the fly.

yaml
display:
  type: web_agent_trace
  mode: review          # or "creation"
  show_overlays: true
  keyboard_shortcuts: true

Live Agent Evaluation

Sometimes you need to evaluate agents while they run, not after the fact. The new Live Agent Evaluation system lets annotators watch AI agents execute tasks in real time and annotate their behavior mid-execution.

Potato manages parallel agent execution through the Agent Runner Manager, captures traces as they arrive via a webhook receiver, and presents annotators with a real-time evaluation interface. Step-level inter-annotator agreement is tracked automatically.
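
To make "step-level agreement" concrete, here is a minimal sketch of one common agreement measure, Cohen's kappa, computed over the labels two annotators gave to the same sequence of agent steps. The choice of metric and the function name are our assumptions for illustration; this post does not specify which statistic Potato computes internally.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same steps, in order.

    Hypothetical helper for illustration; not part of Potato's API.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of steps where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# One label per agent step, from two annotators watching the same run.
annotator_1 = ["ok", "ok", "bad", "ok"]
annotator_2 = ["ok", "bad", "bad", "ok"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

A kappa near 1 means the annotators agree far more often than chance would predict, while a value near 0 suggests the step-level guidelines need tightening.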

LLM Chat Sidebar

Difficult annotation decisions benefit from a second opinion. The new LLM Chat Sidebar gives annotators an AI assistant panel they can consult mid-task without leaving the interface.

The sidebar supports multi-turn conversations with full task context injected automatically. It works with OpenAI, Anthropic, and Ollama endpoints, and every conversation is logged as behavioral data — useful for studying how annotators use AI assistance.

yaml
llm_sidebar:
  enabled: true
  provider: anthropic
  model: claude-3-5-sonnet-20241022
  system_prompt: "You are a helpful annotation assistant for this {task_name} task."
  collapsible: true

HuggingFace Ecosystem Integration

Potato now has deep HuggingFace integration:

  • Push to Hub: Export annotations directly to HuggingFace Hub datasets with auto-generated DatasetCards
  • Load as HF Dataset: Access annotations as datasets.Dataset objects with zero round-trips
  • One-click Spaces deployment: Deploy your Potato instance to HuggingFace Spaces
  • LangChain callback: Automatic trace ingestion when running LangChain agents
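
bash
pip install "potato-annotation[huggingface]"

python
from potato import PotatoDataset

# Load annotations from Potato's output directory
ds = PotatoDataset.from_output("annotations/")
# Publish them as a dataset on the HuggingFace Hub
ds.push_to_hub("my-org/my-annotation-dataset")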

Webhook System

Potato 2.4 ships a full webhook system for event-driven integrations. It emits five event types, and each delivery is signed with HMAC-SHA256 per the Standard Webhooks spec:

Event                             Triggers when
annotation.created                An annotator submits a label
item.fully_annotated              An item reaches its required overlap count
task.completed                    All items in a task are annotated
user.phase_completed              A user finishes a phase (Solo Mode)
quality.attention_check_failed    An annotator fails an attention check

Webhook deliveries are non-blocking, support configurable retries, and are managed through the admin API.

yaml
webhooks:
  - url: https://your-system.example.com/potato-events
    secret: your-signing-secret
    events: [annotation.created, item.fully_annotated]
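
On the receiving side, you will want to verify each signature before trusting the payload. The sketch below follows the Standard Webhooks conventions as we understand them: `webhook-id`, `webhook-timestamp`, and `webhook-signature` headers, with the signature computed over `id.timestamp.body` and encoded as `v1,<base64>`. It assumes the configured secret is used as raw bytes. If your deployment uses the spec's `whsec_`-prefixed base64 secrets, decode the secret first. The function names are ours, not part of Potato.

```python
import base64
import hashlib
import hmac
import time


def sign(secret: bytes, msg_id: str, timestamp: int, body: bytes) -> str:
    """Build a Standard Webhooks style signature: 'v1,<base64 HMAC-SHA256>'."""
    signed_content = f"{msg_id}.{timestamp}.".encode() + body
    digest = hmac.new(secret, signed_content, hashlib.sha256).digest()
    return "v1," + base64.b64encode(digest).decode()


def verify(secret: bytes, headers: dict, body: bytes,
           tolerance_s: int = 300, now=None) -> bool:
    """Return True only if the signature matches and the timestamp is fresh.

    Assumes header names are already lowercased by the web framework.
    """
    try:
        msg_id = headers["webhook-id"]
        timestamp = int(headers["webhook-timestamp"])
        # The header may carry several space-separated signatures.
        candidates = headers["webhook-signature"].split()
    except (KeyError, ValueError):
        return False
    now = time.time() if now is None else now
    if abs(now - timestamp) > tolerance_s:
        return False  # reject stale or replayed deliveries
    expected = sign(secret, msg_id, timestamp, body).encode()
    # Constant-time comparison avoids leaking how many characters matched.
    return any(hmac.compare_digest(expected, c.encode()) for c in candidates)
```

Always verify against the raw request body, not a re-serialized copy of the parsed JSON, since any change in whitespace or key order changes the digest.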

Advanced Active Learning: 5 Strategies + LLM Cold-Start

The active learning system now ships five query strategies:

  1. Uncertainty sampling — Select instances the model is least confident about
  2. Diversity-based selection — Maximize coverage of the input space
  3. BADGE — Batch Active Learning by Diverse Gradient Embeddings
  4. BALD — Bayesian Active Learning by Disagreement
  5. Hybrid ensemble — Combine strategies for robust selection
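
As a rough illustration of the first strategy, uncertainty sampling can be sketched as ranking unlabeled items by the entropy of the model's predicted class probabilities and sending the highest-entropy items to annotators first. The function names and input shape below are hypothetical, and Potato's own implementation may score uncertainty differently.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions, k):
    """Return the k item ids whose predictions have the highest entropy.

    predictions: dict mapping item id -> list of class probabilities.
    Illustrative helper only; not Potato's API.
    """
    ranked = sorted(predictions, key=lambda item: entropy(predictions[item]),
                    reverse=True)
    return ranked[:k]


predictions = {
    "a": [0.98, 0.02],  # confident prediction, low priority
    "b": [0.50, 0.50],  # maximally uncertain for two classes
    "c": [0.70, 0.30],
}
queue = select_most_uncertain(predictions, k=2)
```

Here `queue` puts item "b" ahead of "c", and the confidently classified item "a" is left for later.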

New in 2.4: LLM cold-start for intelligent instance selection before any labels exist. Use a language model to identify challenging or representative instances to seed the annotation process. Also new: CoverICL for selecting diverse in-context learning examples.

Password Management and SSO/OAuth

Two long-requested authentication features:

Password Management: PBKDF2-SHA256 hashing with per-user salts, admin CLI and API password reset, and a self-service token-based reset flow backed by SQLite or PostgreSQL.
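
For readers unfamiliar with the scheme, here is a minimal sketch of PBKDF2-SHA256 hashing with a per-user salt, using only Python's standard library. The iteration count, salt length, and function names are our assumptions for illustration and are not Potato's actual settings or API.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

ITERATIONS = 600_000  # assumed work factor, not Potato's configured value


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a PBKDF2-SHA256 digest, generating a fresh salt if none is given."""
    salt = secrets.token_bytes(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Because every user gets a different random salt, two users with the same password end up with different stored digests, which defeats precomputed lookup tables.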

SSO/OAuth: Single sign-on via Google, GitHub, or any generic OIDC provider through Authlib.

bash
pip install "potato-annotation[auth]"

Updated Counts

Capability                    2.3    2.4
Annotation types              20     21
Display types                 15     17+
AI endpoints                  7      11
Example projects              15     40+
Active learning strategies    1      5
Webhook event types           0      5
Agent example projects        0      14

Install

bash
pip install potato-annotation                   # core
pip install "potato-annotation[ai]"             # OpenAI, Ollama
pip install "potato-annotation[huggingface]"    # HF Hub + Spaces
pip install "potato-annotation[langchain]"      # LangChain callback
pip install "potato-annotation[auth]"           # SSO/OAuth
pip install "potato-annotation[all]"            # everything

Try It

The fastest way to see 2.4 in action is the live demo on HuggingFace Spaces — no installation needed. It showcases agent trace evaluation with radio buttons, likert scales, span annotation, and free-text notes:

Try the live demo →

Or run an example locally:

bash
git clone https://github.com/davidjurgens/potato.git
cd potato
pip install -e .
python potato/flask_server.py start examples/agent-traces/complex-annotation/config.yaml -p 8000

Full release notes and changelog are in the GitHub repository.