
Model Arena

The Model Arena sends one prompt to several models side by side, lets you compare the responses and pick the best, and builds a win-rate leaderboard from those picks. It is free, self-hosted, and provider-agnostic: every model is built through Potato's endpoint factory, so you can compare OpenAI, Anthropic, Ollama, vLLM, Gemini, and OpenRouter in the same view rather than a single vendor's models.

Enabling

```yaml
arena:
  enabled: true
  models:
    - {label: "GPT-4o",  endpoint_type: openai,    model: gpt-4o}
    - {label: "Claude",  endpoint_type: anthropic, model: claude-sonnet-4-6}
    - {label: "Llama",   endpoint_type: ollama,    model: llama3.2, base_url: http://localhost:11434}
```

Each entry maps to an endpoint config: `endpoint_type`, `model`, `base_url`, `temperature`, and an optional `ai_config` for API keys and extra parameters. When the arena is enabled, the admin dashboard shows an Arena link.

How it works

1. Enter a prompt. It is sent to every model concurrently. One model failing (bad key, provider down) never blocks the others; its card shows the error.
2. Responses render side by side, each with per-model latency.
3. Click Pick as best. This records a preference and updates the leaderboard (wins / comparisons / win-rate per model).
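
The fan-out behaviour described above can be illustrated with a minimal sketch. This is not Potato's implementation: `call_one`, `run_arena`, and the plain-callable model stand-ins are hypothetical names chosen for illustration. It only shows the general pattern of calling every model concurrently, timing each call, and keeping a failure confined to the model that raised it.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical stand-in for a model endpoint: a callable from prompt to text.
ModelFn = Callable[[str], str]


def call_one(label: str, fn: ModelFn, prompt: str) -> dict:
    """Call one model, capturing latency and any error instead of raising."""
    start = time.perf_counter()
    try:
        return {"label": label, "response": fn(prompt), "error": None,
                "latency_s": time.perf_counter() - start}
    except Exception as exc:  # the failure stays on this model's card
        return {"label": label, "response": None, "error": str(exc),
                "latency_s": time.perf_counter() - start}


def run_arena(models: Dict[str, ModelFn], prompt: str) -> list:
    """Send the prompt to every model concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = [pool.submit(call_one, label, fn, prompt)
                   for label, fn in models.items()]
        return [f.result() for f in futures]


def broken(prompt: str) -> str:
    """Simulates a provider that rejects the request."""
    raise RuntimeError("bad key")


results = run_arena({"echo": lambda p: p.upper(), "broken": broken}, "hi")
for r in results:
    print(r["label"], r["response"], r["error"])
```

Because each call catches its own exception, the `broken` model yields an error entry while `echo` still returns its response.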

API

| Method | Path | Purpose |
| --- | --- | --- |
| POST | `/admin/arena/api/run` | `{prompt}` → per-model responses |
| POST | `/admin/arena/api/preference` | `{prompt, winner, ranking?}` → record a pick |
| GET | `/admin/arena/api/leaderboard` | win-rate per model |
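
The leaderboard reports wins, comparisons, and win-rate per model. One plausible way to derive those numbers from recorded picks is sketched below. The record shape (a `winner` label plus a `models` list naming every model shown in that comparison) is an assumption made for illustration, as is the `leaderboard` function; the actual storage format and response schema may differ.

```python
from collections import defaultdict


def leaderboard(preferences):
    """Aggregate picks into per-model wins, comparisons, and win-rate.

    Each preference is assumed to be a dict with a 'winner' label and a
    'models' list of every label that took part in the comparison.
    """
    wins = defaultdict(int)
    comparisons = defaultdict(int)
    for pref in preferences:
        for label in pref["models"]:
            comparisons[label] += 1
        wins[pref["winner"]] += 1
    rows = [
        {"model": m, "wins": wins[m], "comparisons": n, "win_rate": wins[m] / n}
        for m, n in comparisons.items()
    ]
    # Highest win-rate first.
    return sorted(rows, key=lambda r: r["win_rate"], reverse=True)


prefs = [
    {"winner": "GPT-4o", "models": ["GPT-4o", "Claude", "Llama"]},
    {"winner": "Claude", "models": ["GPT-4o", "Claude", "Llama"]},
    {"winner": "GPT-4o", "models": ["GPT-4o", "Claude", "Llama"]},
    {"winner": "GPT-4o", "models": ["GPT-4o", "Claude"]},
]
board = leaderboard(prefs)
for row in board:
    print(row)
```

Here win-rate is wins divided by the number of comparisons a model appeared in, so a model that was absent from a run is not penalised for it.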

```bash
curl -X POST localhost:8000/admin/arena/api/run -H "X-API-Key: <key>" \
  -H "Content-Type: application/json" -d '{"prompt": "Explain RLHF in one sentence."}'
```
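
Recording a pick from a script might look like the sketch below, which builds a request for the preference endpoint using only the Python standard library. Several details are assumptions: that `winner` takes the model's `label`, that `ranking` is an ordered list of labels, and that the preference endpoint accepts the same `X-API-Key` header as the run call above. `build_preference_request` is a hypothetical helper, not part of Potato.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # host and port taken from the curl example


def build_preference_request(api_key, prompt, winner, ranking=None):
    """Build (but do not send) a POST to the preference endpoint."""
    payload = {"prompt": prompt, "winner": winner}
    if ranking is not None:  # ranking is optional in the API table
        payload["ranking"] = ranking
    return urllib.request.Request(
        f"{BASE_URL}/admin/arena/api/preference",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_preference_request("<key>", "Explain RLHF in one sentence.", "Claude")
print(req.get_method(), req.full_url)
# To send it against a running server: urllib.request.urlopen(req)
```

Building the request separately from sending it makes the payload easy to inspect before anything reaches the server.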

Related

- Full reference on Read the Docs: full config and API, version-matched
- Datasets & Experiments: for offline, dataset-scale comparison
- Pairwise Comparison: annotate A/B preferences in the main flow
