Model Arena
Send one prompt to N models side by side, compare their responses, and pick the best to build a win-rate leaderboard. Provider-agnostic — compare OpenAI, Anthropic, Ollama, vLLM, and Gemini in one view, free and self-hosted.
The Model Arena sends one prompt to several models at once, shows the responses side by side so you can pick the best, and builds a win-rate leaderboard from those picks. It is provider-agnostic: every model is built through Potato's endpoint factory, so OpenAI, Anthropic, Ollama, vLLM, Gemini, and OpenRouter models can be compared in the same view rather than one vendor at a time.
Enabling
```yaml
arena:
  enabled: true
  models:
    - {label: "GPT-4o", endpoint_type: openai, model: gpt-4o}
    - {label: "Claude", endpoint_type: anthropic, model: claude-sonnet-4-6}
    - {label: "Llama", endpoint_type: ollama, model: llama3.2, base_url: http://localhost:11434}
```

Each entry maps to an endpoint config (endpoint_type, model, base_url, temperature, optional ai_config for keys/params). When enabled, the admin dashboard shows an Arena link.
How it works
- Enter a prompt — it's sent to every model concurrently. One model failing (bad key, provider down) never blocks the others; its card shows the error.
- Responses render side by side, each with per-model latency.
- Click Pick as best — this records a preference and updates the leaderboard (wins / comparisons / win-rate per model).
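The flow above can be sketched in a few lines of Python. This is an illustration only, not Potato's implementation: `run_arena`, `Leaderboard`, and the callables standing in for model endpoints are hypothetical names, and the rule that every model shown in a comparison counts one comparison is an assumption.

```python
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_arena(prompt, models):
    """Send `prompt` to every model concurrently, isolating failures.

    `models` maps a label to a callable that takes the prompt and returns
    text (a hypothetical stand-in for a real endpoint client).
    """
    def call(label):
        start = time.perf_counter()
        response, error = None, None
        try:
            response = models[label](prompt)
        except Exception as exc:  # one failing model must not block the others
            error = str(exc)
        latency = time.perf_counter() - start
        return {"label": label, "response": response,
                "error": error, "latency_s": latency}

    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        # pool.map preserves the order of the input labels
        return list(pool.map(call, models))


class Leaderboard:
    """Tally wins and comparisons per model (assumed bookkeeping)."""

    def __init__(self):
        self.wins = defaultdict(int)
        self.comparisons = defaultdict(int)

    def record_pick(self, winner, compared):
        if winner not in compared:
            raise ValueError("winner must be one of the compared models")
        for label in compared:
            self.comparisons[label] += 1
        self.wins[winner] += 1

    def rows(self):
        rows = [
            {"model": m, "wins": self.wins[m], "comparisons": n,
             "win_rate": self.wins[m] / n}
            for m, n in self.comparisons.items()
        ]
        return sorted(rows, key=lambda r: r["win_rate"], reverse=True)
```

Each result carries either a response or an error string, never both, which mirrors the per-card error display described above. The win rate here is simply wins divided by comparisons.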
API
| Method | Path | Purpose |
|---|---|---|
| POST | /admin/arena/api/run | {prompt} → per-model responses |
| POST | /admin/arena/api/preference | {prompt, winner, ranking?} → record a pick |
| GET | /admin/arena/api/leaderboard | win-rate per model |
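The endpoints in the table can also be called from a script. The sketch below builds the three requests with Python's standard library without sending them. `arena_request` is a hypothetical helper; the host, port, and `X-API-Key` header follow this page's curl example, and two details are assumptions rather than documented behavior: that `winner` takes a model label, and that the leaderboard endpoint expects the same API key header. The response shapes are not documented here, so none are assumed.

```python
import json
import urllib.request

# Assumption: same host and port as this page's curl example.
BASE_URL = "http://localhost:8000"


def arena_request(path, api_key, payload=None):
    """Build (but do not send) a request for an Arena endpoint.

    A JSON payload makes it a POST; no payload makes it a GET.
    """
    headers = {"X-API-Key": api_key}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "GET" if payload is None else "POST"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


prompt = "Explain RLHF in one sentence."
run_req = arena_request("/admin/arena/api/run", "<key>", {"prompt": prompt})
# Assumption: `winner` is the label of the picked model.
pick_req = arena_request(
    "/admin/arena/api/preference", "<key>",
    {"prompt": prompt, "winner": "Claude"},
)
board_req = arena_request("/admin/arena/api/leaderboard", "<key>")

# To actually send one of them against a running server:
# with urllib.request.urlopen(run_req) as resp:
#     print(resp.read().decode("utf-8"))
```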
```bash
curl -X POST localhost:8000/admin/arena/api/run -H "X-API-Key: <key>" \
  -H "Content-Type: application/json" -d '{"prompt": "Explain RLHF in one sentence."}'
```

Related
- Full reference on Read the Docs — full config and API, version-matched
- Datasets & Experiments — for offline, dataset-scale comparison
- Pairwise Comparison — annotate A/B preferences in the main flow