Model Arena

The Model Arena sends one prompt to several models at once, shows the responses side by side so you can pick the best one, and builds a win-rate leaderboard from those picks. It is free, self-hosted, and provider-agnostic: every model is built through Potato's endpoint factory, so you can compare OpenAI, Anthropic, Ollama, vLLM, Gemini, and OpenRouter in the same view rather than being limited to one vendor.

Enabling

yaml
arena:
  enabled: true
  models:
    - {label: "GPT-4o",  endpoint_type: openai,    model: gpt-4o}
    - {label: "Claude",  endpoint_type: anthropic, model: claude-sonnet-4-6}
    - {label: "Llama",   endpoint_type: ollama,    model: llama3.2, base_url: http://localhost:11434}

Each entry maps to an endpoint config (endpoint_type, model, base_url, temperature, and an optional ai_config for keys and other parameters). When the arena is enabled, the admin dashboard shows an Arena link.

How it works

  1. Enter a prompt — it's sent to every model concurrently. One model failing (bad key, provider down) never blocks the others; its card shows the error.
  2. Responses render side by side, each with per-model latency.
  3. Click Pick as best — this records a preference and updates the leaderboard (wins / comparisons / win-rate per model).
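
The three steps above can be sketched in plain Python. This is an illustrative sketch only, not Potato's actual implementation: `call_model`, `run_one`, `run_arena`, and `leaderboard` are hypothetical names, the `fail` flag exists only to simulate a broken provider, and the shape of a stored preference (`models` plus `winner`) is an assumption. It shows one way to fan a prompt out concurrently, keep one model's failure on its own card, time each call, and tally wins, comparisons, and win-rate.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_model(entry, prompt):
    """Hypothetical stand-in for a real endpoint call; raises on failure."""
    if entry.get("fail"):  # simulated bad key / provider outage
        raise RuntimeError("invalid API key")
    return f"[{entry['label']}] answer to: {prompt}"


def run_one(entry, prompt):
    """Call one model and record its response or error plus its latency."""
    start = time.perf_counter()
    card = {"label": entry["label"], "response": None, "error": None}
    try:
        card["response"] = call_model(entry, prompt)
    except Exception as exc:  # the error stays on this model's card
        card["error"] = str(exc)
    card["latency_s"] = time.perf_counter() - start
    return card


def run_arena(models, prompt):
    """Send the prompt to every model concurrently; cards keep config order."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        return list(pool.map(lambda m: run_one(m, prompt), models))


def leaderboard(preferences):
    """Tally wins, comparisons, and win-rate per model.

    Assumed record shape: {"models": [labels shown], "winner": label}.
    """
    stats = {}
    for pref in preferences:
        for label in pref["models"]:
            row = stats.setdefault(label, {"wins": 0, "comparisons": 0})
            row["comparisons"] += 1
            if label == pref["winner"]:
                row["wins"] += 1
    for row in stats.values():
        row["win_rate"] = row["wins"] / row["comparisons"]
    return stats


if __name__ == "__main__":
    models = [{"label": "GPT-4o"}, {"label": "Claude", "fail": True}, {"label": "Llama"}]
    for card in run_arena(models, "Explain RLHF in one sentence."):
        print(card)
    print(leaderboard([{"models": ["GPT-4o", "Claude", "Llama"], "winner": "GPT-4o"}]))
```

In this sketch a model counts one comparison for every recorded pick it took part in, so win-rate is simply wins divided by comparisons. The real leaderboard may count comparisons differently.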

API

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /admin/arena/api/run | {prompt} → per-model responses |
| POST | /admin/arena/api/preference | {prompt, winner, ranking?} → record a pick |
| GET | /admin/arena/api/leaderboard | win-rate per model |

bash
curl -X POST localhost:8000/admin/arena/api/run -H "X-API-Key: <key>" \
  -H "Content-Type: application/json" -d '{"prompt": "Explain RLHF in one sentence."}'
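
The same endpoints can be called from a script. The following is a minimal sketch using only the Python standard library, under several assumptions: the base URL mirrors the curl example, responses are JSON, and the `winner` value is a model label (the source does not say whether it expects a label or a model id). `arena_request` and `send` are hypothetical helper names, not part of Potato.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: same host and port as the curl example


def arena_request(method, path, api_key, payload=None):
    """Build (but do not send) a request for an Arena admin endpoint."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"X-API-Key": api_key}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def send(request):
    """Send a prepared request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    key = "<key>"  # replace with a real admin API key
    prompt = "Explain RLHF in one sentence."

    # 1. Run the prompt against every configured model.
    print(send(arena_request("POST", "/admin/arena/api/run", key, {"prompt": prompt})))

    # 2. Record a pick; "GPT-4o" as the winner value is an assumed format.
    pick = {"prompt": prompt, "winner": "GPT-4o"}
    print(send(arena_request("POST", "/admin/arena/api/preference", key, pick)))

    # 3. Read the win-rate leaderboard.
    print(send(arena_request("GET", "/admin/arena/api/leaderboard", key)))
```

Building the request separately from sending it makes it easy to inspect the method, headers, and body before anything touches the server.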