# Model Arena

Source: https://www.potatoannotator.com/docs/agent-evaluation/model-arena

**The Model Arena sends one prompt to several models at once, shows the responses side by side so you can pick the best, and builds a win-rate leaderboard from those picks.** It is **provider-agnostic**: every model is built through Potato's endpoint factory, so you can compare OpenAI, Anthropic, Ollama, vLLM, Gemini, and OpenRouter models in the same view.

## Enabling

```yaml
arena:
  enabled: true
  models:
    - {label: "GPT-4o",  endpoint_type: openai,    model: gpt-4o}
    - {label: "Claude",  endpoint_type: anthropic, model: claude-sonnet-4-6}
    - {label: "Llama",   endpoint_type: ollama,    model: llama3.2, base_url: http://localhost:11434}
```

Each entry maps to an endpoint config: `endpoint_type`, `model`, `base_url`, `temperature`, and an optional `ai_config` for keys and other parameters. When the arena is enabled, the admin dashboard shows an **Arena** link.

## How it works

1. Enter a prompt — it's sent to every model **concurrently**. One model failing (bad key, provider down) never blocks the others; its card shows the error.
2. Responses render side by side, each with per-model latency.
3. Click **Pick as best** — this records a preference and updates the **leaderboard** (wins / comparisons / win-rate per model).
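
The three steps above can be sketched in plain Python. This is an illustrative model of the behavior, not Potato's implementation: `run_arena`, `Leaderboard`, and the result field names are hypothetical, each model is stood in for by a simple callable, and the sketch assumes that every model shown in a comparison counts one comparison toward its win-rate.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_arena(prompt, models):
    """Send one prompt to every model concurrently (hypothetical helper).

    `models` maps a label to a callable that takes the prompt and returns text.
    """
    def call(label):
        start = time.perf_counter()
        try:
            result = {"label": label, "response": models[label](prompt), "error": None}
        except Exception as exc:
            # A failing model is reported on its own card and never blocks the others.
            result = {"label": label, "response": None, "error": str(exc)}
        result["latency_s"] = time.perf_counter() - start  # per-model latency
        return result

    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        # pool.map keeps results in the same order as the configured models.
        return list(pool.map(call, models))


class Leaderboard:
    """Tracks wins / comparisons / win-rate per model (assumed bookkeeping)."""

    def __init__(self):
        self.wins = {}
        self.comparisons = {}

    def record(self, winner, participants):
        if winner not in participants:
            raise ValueError(f"{winner!r} was not part of this comparison")
        for label in participants:
            self.comparisons[label] = self.comparisons.get(label, 0) + 1
            self.wins.setdefault(label, 0)
        self.wins[winner] += 1

    def rows(self):
        table = [
            {"model": m, "wins": self.wins[m], "comparisons": n, "win_rate": self.wins[m] / n}
            for m, n in self.comparisons.items()
        ]
        return sorted(table, key=lambda row: row["win_rate"], reverse=True)
```

Wrapping each call in its own `try`/`except` inside the worker is what keeps a bad key or an unavailable provider from affecting the other responses: the error becomes data on that model's result instead of an exception that aborts the whole run.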

## API

| Method | Path | Purpose |
|--------|------|---------|
| POST | `/admin/arena/api/run` | `{prompt}` → per-model responses |
| POST | `/admin/arena/api/preference` | `{prompt, winner, ranking?}` → record a pick |
| GET | `/admin/arena/api/leaderboard` | win-rate per model |

```bash
curl -X POST localhost:8000/admin/arena/api/run -H "X-API-Key: <key>" \
  -H "Content-Type: application/json" -d '{"prompt": "Explain RLHF in one sentence."}'
```
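
The same calls can be made from Python using only the standard library. This is a minimal sketch under several assumptions: the server is at `http://localhost:8000` as in the curl example, responses are JSON, and `winner` is a model label such as `"Claude"`. The helper names `arena_request`, `send`, and `demo` are illustrative and not part of Potato.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: same host and port as the curl example


def arena_request(path, api_key, payload=None):
    """Build (but do not send) a request for one of the arena endpoints.

    A payload makes it a JSON POST; no payload makes it a GET.
    """
    headers = {"X-API-Key": api_key}
    data = None
    if payload is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    method = "POST" if payload is not None else "GET"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request):
    """Send a request and decode the body, assuming the server replies with JSON."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


def demo(api_key):
    """Run a prompt, record a pick, then fetch the leaderboard."""
    prompt = "Explain RLHF in one sentence."
    responses = send(arena_request("/admin/arena/api/run", api_key, {"prompt": prompt}))
    # Assumption: `winner` is the label of the preferred model.
    send(arena_request("/admin/arena/api/preference", api_key,
                       {"prompt": prompt, "winner": "Claude"}))
    leaderboard = send(arena_request("/admin/arena/api/leaderboard", api_key))
    return responses, leaderboard
```

Calling `demo("<key>")` against a running server performs all three steps in order. The response bodies are returned as decoded JSON without further interpretation, since their exact shape is described in the full reference rather than here.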

## Related

- [Full reference on Read the Docs](https://potatoannotator.readthedocs.io/en/latest/agent-evaluation/model_arena/) — full config and API, version-matched
- [Datasets & Experiments](/docs/agent-evaluation/datasets-and-experiments) — for offline, dataset-scale comparison
- [Pairwise Comparison](/docs/annotation-types/pairwise-comparison) — annotate A/B preferences in the main flow
