Potato 2.4.0: Web Agent Annotation, Live Evaluation, and HuggingFace Integration

Potato 2.4.0 ships web agent trace review, real-time live agent evaluation, an LLM chat sidebar, HuggingFace Hub export, webhooks, SSO/OAuth, and five active learning strategies.

Potato Team

Note: The feature counts in this post reflect the state at the v2.4.0 release. Potato now supports 30+ annotation types. See the annotation types documentation for the full list.

Potato 2.4.0 is out. It's our biggest update since agentic annotation landed in 2.3, and it adds the agent-evaluation features people kept asking for, plus a batch of enterprise and integration work.

Web Agent Annotation

Evaluating web-browsing agents is hard. You need to see what the agent saw, where it clicked, how it scrolled, and whether each step made sense. Potato 2.4 adds a Web Agent Trace Viewer for this.

Review Mode gives annotators a filmstrip view for stepping through pre-recorded screenshots. SVG overlays mark click targets, bounding boxes, mouse paths, and scroll positions on each frame, and the annotation controls sit inline next to the step being judged.

Creation Mode flips the interface around. Annotators browse a live website inside an iframe, and Potato records every interaction as an annotation-ready trace. You can import existing traces in WebArena, Mind2Web, and Anthropic Computer Use formats, or record new ones as you go.

yaml
display:
  type: web_agent_trace
  mode: review          # or "creation"
  show_overlays: true
  keyboard_shortcuts: true

Live Agent Evaluation

Sometimes you need to evaluate agents while they run, not after the fact. The new Live Agent Evaluation system lets annotators watch AI agents execute tasks in real time and annotate their behavior mid-execution.

Potato runs agents in parallel through the Agent Runner Manager, captures traces as they arrive via a webhook receiver, and shows annotators a real-time evaluation interface. It tracks step-level inter-annotator agreement automatically.
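To make "step-level inter-annotator agreement" concrete, here is a minimal sketch that computes Cohen's kappa between two annotators over the per-step labels of a single trace. The choice of statistic, the function name, and the label values are assumptions for illustration; this post doesn't say which agreement measure Potato computes.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same sequence of steps."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of steps where both gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical per-step judgments for one agent trace.
annotator_1 = ["correct", "correct", "wrong", "correct"]
annotator_2 = ["correct", "wrong", "wrong", "correct"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on three of four steps, but half of that agreement is expected by chance given how often each uses each label, so kappa comes out at 0.5 rather than 0.75.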

LLM Chat Sidebar

Hard annotation calls benefit from a second opinion. The new LLM Chat Sidebar gives annotators an AI assistant panel they can consult mid-task without leaving the interface.

The sidebar handles multi-turn conversations and injects the full task context automatically. It works with OpenAI, Anthropic, and Ollama endpoints, and it logs every conversation as behavioral data, which is handy if you want to study how annotators lean on AI assistance.

yaml
llm_sidebar:
  enabled: true
  provider: anthropic
  model: claude-3-5-sonnet-20241022
  system_prompt: "You are a helpful annotation assistant for this {task_name} task."
  collapsible: true

HuggingFace Ecosystem Integration

Potato now connects to HuggingFace in a few ways. You can push annotations straight to Hub datasets with auto-generated DatasetCards, load them back as datasets.Dataset objects without a round trip, deploy a Potato instance to HuggingFace Spaces, and ingest traces automatically when you run LangChain agents through the LangChain callback.

bash
pip install potato-annotation[huggingface]

python
from potato import PotatoDataset

ds = PotatoDataset.from_output("annotations/")
ds.push_to_hub("my-org/my-annotation-dataset")

Webhook System

Potato 2.4 ships a full webhook system for event-driven integrations. It emits five event types, each signed with HMAC-SHA256 per the Standard Webhooks spec:

| Event | Triggers when |
| --- | --- |
| annotation.created | An annotator submits a label |
| item.fully_annotated | An item reaches its required overlap count |
| task.completed | All items in a task are annotated |
| user.phase_completed | A user finishes a phase (Solo Mode) |
| quality.attention_check_failed | An annotator fails an attention check |

Delivery is non-blocking with configurable retries, and webhooks are managed through the admin API.

yaml
webhooks:
  - url: https://your-system.example.com/potato-events
    secret: your-signing-secret
    events: [annotation.created, item.fully_annotated]
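On the receiving side, the signature should be checked before the payload is trusted. The Standard Webhooks spec signs the string `{id}.{timestamp}.{payload}` with HMAC-SHA256 and sends the base64 result as `v1,<signature>` in the `webhook-signature` header, alongside `webhook-id` and `webhook-timestamp`. The sketch below assumes Potato follows those header names and uses the configured `secret` as raw key bytes (the spec itself uses a base64 secret with a `whsec_` prefix). The payload shape and message id are hypothetical, so check the Potato docs before relying on either.

```python
import base64
import hashlib
import hmac
import time


def sign(secret: bytes, msg_id: str, timestamp: str, body: str) -> str:
    """Return a Standard Webhooks style signature: 'v1,' + base64(HMAC-SHA256)."""
    signed_content = f"{msg_id}.{timestamp}.{body}".encode("utf-8")
    digest = hmac.new(secret, signed_content, hashlib.sha256).digest()
    return "v1," + base64.b64encode(digest).decode("ascii")


def verify(secret: bytes, headers: dict, body: str, tolerance_s: int = 300) -> bool:
    """Check the timestamp window and the signature of one delivery."""
    msg_id = headers["webhook-id"]
    timestamp = headers["webhook-timestamp"]
    # Reject stale or future-dated deliveries to limit replay attacks.
    if abs(time.time() - int(timestamp)) > tolerance_s:
        return False
    expected = sign(secret, msg_id, timestamp, body).encode("utf-8")
    # The header may carry several space-separated signatures (key rotation).
    candidates = headers["webhook-signature"].split()
    return any(hmac.compare_digest(expected, c.encode("utf-8")) for c in candidates)


# Simulate a delivery locally (assumed key encoding and hypothetical payload).
secret = b"your-signing-secret"
body = '{"event": "annotation.created"}'
now = str(int(time.time()))
headers = {
    "webhook-id": "msg_123",
    "webhook-timestamp": now,
    "webhook-signature": sign(secret, "msg_123", now, body),
}
is_valid = verify(secret, headers, body)
```

Verify against the raw request body exactly as received. Re-serializing the JSON first can change whitespace or key order and break the signature.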

Advanced Active Learning: 5 Strategies + LLM Cold-Start

The active learning system now ships five query strategies:

  1. Uncertainty sampling: Select instances the model is least confident about
  2. Diversity-based selection: Maximize coverage of the input space
  3. BADGE: Batch Active Learning by Diverse Gradient Embeddings
  4. BALD: Bayesian Active Learning by Disagreement
  5. Hybrid ensemble: Combine strategies for robust selection
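As an illustration of the first strategy in the list, uncertainty sampling can be as simple as ranking unlabeled items by the entropy of the model's predicted class distribution. This is a generic sketch, not Potato's implementation; the function names, item ids, and probabilities are made up for the example.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions, k):
    """Return the ids of the k items with the highest predictive entropy."""
    ranked = sorted(
        predictions,
        key=lambda item_id: entropy(predictions[item_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical model outputs for four unlabeled items (3 classes each).
pool = {
    "item-1": [0.98, 0.01, 0.01],
    "item-2": [0.40, 0.35, 0.25],
    "item-3": [0.70, 0.20, 0.10],
    "item-4": [0.34, 0.33, 0.33],
}
batch = select_most_uncertain(pool, k=2)
```

The near-uniform predictions for `item-4` and `item-2` rank highest, so those two go to annotators first. Diversity-based and hybrid strategies add a term that keeps the selected batch from clustering in one region of the input space.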

There's also LLM cold-start, which picks instances before any labels exist. You point a language model at your pool and let it surface the challenging or representative items to seed annotation. CoverICL is new too, for picking diverse in-context learning examples.

Password Management and SSO/OAuth

Two authentication features people kept requesting:

Password management uses PBKDF2-SHA256 hashing with per-user salts, supports admin CLI and API password resets, and includes a self-service token-based reset flow backed by SQLite or PostgreSQL.
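For readers unfamiliar with the scheme, PBKDF2-SHA256 with a per-user salt can be sketched with Python's standard library. This shows the technique only, not Potato's code: the iteration count, salt length, and function names are assumptions.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 600_000  # assumed work factor; Potato's actual setting isn't stated here


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a PBKDF2-HMAC-SHA256 digest, generating a fresh salt if none is given."""
    if salt is None:
        salt = os.urandom(16)  # random per-user salt, stored alongside the digest
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected_digest: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


salt, stored = hash_password("correct horse battery staple")
ok = verify_password("correct horse battery staple", salt, stored)
```

Because each user gets a different salt, two users with the same password end up with different stored digests, which defeats precomputed lookup tables.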

SSO/OAuth handles single sign-on through Google, GitHub, or any generic OIDC provider via Authlib.

bash
pip install potato-annotation[auth]

Updated Counts

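| Capability | 2.3 | 2.4 |
| --- | --- | --- |
| Annotation types | 20 | 21 |
| Display types | 15 | 17+ |
| AI endpoints | 7 | 11 |
| Example projects | 15 | 40+ |
| Active learning strategies | 1 | 5 |
| Webhook event types | 0 | 5 |
| Agent example projects | 0 | 14 |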

Install

bash
pip install potato-annotation           # core
pip install potato-annotation[ai]       # OpenAI, Ollama
pip install potato-annotation[huggingface]  # HF Hub + Spaces
pip install potato-annotation[langchain]    # LangChain callback
pip install potato-annotation[auth]         # SSO/OAuth
pip install potato-annotation[all]          # everything

Try It

The fastest way to see 2.4 in action is the live demo on HuggingFace Spaces, with no installation needed. It runs an agent trace evaluation task with radio buttons, Likert scales, span annotation, and free-text notes:

Try the live demo →

Or run an example locally:

bash
git clone https://github.com/davidjurgens/potato.git
cd potato
pip install -e .
python potato/flask_server.py start examples/agent-traces/complex-annotation/config.yaml -p 8000

For the complete changelog, see the v2.4.0 release notes. The rest of the documentation lives in the GitHub repository.