beginner, evaluation

MMLU Knowledge Evaluation

Multiple-choice knowledge evaluation across diverse academic subjects, based on the Massive Multitask Language Understanding (MMLU) benchmark (Hendrycks et al., ICLR 2021). Annotators select the correct answer from four options and can optionally add a brief explanation of their reasoning.


Configuration File: config.yaml

# MMLU Knowledge Evaluation
# Based on Hendrycks et al., ICLR 2021
# Paper: https://arxiv.org/abs/2009.03300
# Dataset: https://huggingface.co/datasets/cais/mmlu
#
# Multiple-choice knowledge evaluation based on the Massive Multitask
# Language Understanding benchmark. Each question covers one of 57
# academic subjects spanning STEM, humanities, social sciences, and more.
# Annotators select the correct answer and may optionally provide an explanation.
#
# Answer Options:
# - A, B, C, D: Four possible answers; exactly one is correct
#
# Annotation Guidelines:
# 1. Read the question carefully
# 2. Consider all four answer options before selecting
# 3. Choose the single best answer
# 4. Optionally provide a brief explanation of your reasoning

annotation_task_name: "MMLU Knowledge Evaluation"
task_dir: "."

data_files:
  - sample-data.json

item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

port: 8000
server_name: localhost

annotation_schemes:
  # Step 1: Select the correct answer
  - annotation_type: radio
    name: answer
    description: "Select the correct answer to this question."
    labels:
      - "A"
      - "B"
      - "C"
      - "D"
    keyboard_shortcuts:
      "A": "1"
      "B": "2"
      "C": "3"
      "D": "4"
    tooltips:
      "A": "Select option A"
      "B": "Select option B"
      "C": "Select option C"
      "D": "Select option D"

  # Step 2: Explanation
  - annotation_type: text
    name: explanation
    description: "Briefly explain why you chose this answer."
    textarea: true
    required: false
    placeholder: "Explain your reasoning..."

annotation_instructions: |
  You will answer multiple-choice knowledge questions from the MMLU benchmark.

  For each item:
  1. Read the question and note the subject area.
  2. Read all four answer options (A, B, C, D) carefully.
  3. Select the single correct answer.
  4. Optionally, provide a brief explanation of your reasoning.

  Tips:
  - Questions span many subjects; use your best knowledge.
  - Eliminate clearly wrong options first to narrow your choice.
  - If unsure, make your best guess rather than skipping.

html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #e8eaf6; padding: 8px 15px; border-radius: 8px; margin-bottom: 16px;">
      <strong>Subject:</strong> {{subject}}
    </div>
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Question:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
    <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 10px;">
      <div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 12px;">
        <strong style="color: #475569;">A:</strong> {{option_a}}
      </div>
      <div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 12px;">
        <strong style="color: #475569;">B:</strong> {{option_b}}
      </div>
      <div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 12px;">
        <strong style="color: #475569;">C:</strong> {{option_c}}
      </div>
      <div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 12px;">
        <strong style="color: #475569;">D:</strong> {{option_d}}
      </div>
    </div>
  </div>

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
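
The `html_layout` template above pulls its content from fields of each data item through `{{name}}` placeholders. A minimal Python sketch of how to list those placeholders, so the required data fields are easy to see. The helper name `template_fields` and the regular expression are illustrative assumptions, not part of Potato, and the layout string is an abbreviated copy of the one in the config.

```python
import re

# Assumption: placeholders look like {{name}}, where name is made of
# letters, digits, and underscores, as in the layout above.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def template_fields(layout: str) -> list[str]:
    """Return placeholder names in order of first appearance, without duplicates."""
    seen: list[str] = []
    for name in PLACEHOLDER.findall(layout):
        if name not in seen:
            seen.append(name)
    return seen


# Abbreviated copy of the html_layout value (styling removed).
layout = """
<strong>Subject:</strong> {{subject}}
<p>{{text}}</p>
<strong>A:</strong> {{option_a}}
<strong>B:</strong> {{option_b}}
<strong>C:</strong> {{option_c}}
<strong>D:</strong> {{option_d}}
"""

print(template_fields(layout))
```

For this layout the sketch reports `subject`, `text`, and `option_a` through `option_d`, which matches the keys used in the sample data.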

Sample Data: sample-data.json

[
  {
    "id": "mmlu_001",
    "text": "What is the primary function of mitochondria in eukaryotic cells?",
    "option_a": "Protein synthesis",
    "option_b": "ATP production through cellular respiration",
    "option_c": "DNA replication",
    "option_d": "Lipid storage",
    "subject": "Biology"
  },
  {
    "id": "mmlu_002",
    "text": "In economics, what does GDP stand for?",
    "option_a": "General Domestic Product",
    "option_b": "Gross Domestic Product",
    "option_c": "Global Development Program",
    "option_d": "Gross Development Percentage",
    "subject": "Economics"
  }
]

// ... and 8 more items
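
Before starting the server, it can help to confirm that every item carries the fields the config and layout refer to (`id`, `text`, `subject`, and the four options). The following is a rough Python sketch under that assumption. The function name `find_problems` and the exact checks are hypothetical, and the two items are copied from the sample above rather than read from disk.

```python
import json

# Fields referenced by item_properties and html_layout in config.yaml.
REQUIRED = ("id", "text", "option_a", "option_b", "option_c", "option_d", "subject")


def find_problems(items: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the items look usable."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        # Every required field must be a non-empty string.
        for key in REQUIRED:
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: missing or empty '{key}'")
        # Item ids should be unique across the file.
        item_id = item.get("id")
        if item_id is not None:
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id '{item_id}'")
            seen_ids.add(item_id)
    return problems


sample = json.loads("""
[
  {"id": "mmlu_001",
   "text": "What is the primary function of mitochondria in eukaryotic cells?",
   "option_a": "Protein synthesis",
   "option_b": "ATP production through cellular respiration",
   "option_c": "DNA replication",
   "option_d": "Lipid storage",
   "subject": "Biology"},
  {"id": "mmlu_002",
   "text": "In economics, what does GDP stand for?",
   "option_a": "General Domestic Product",
   "option_b": "Gross Domestic Product",
   "option_c": "Global Development Program",
   "option_d": "Gross Development Percentage",
   "subject": "Economics"}
]
""")

for problem in find_problems(sample):
    print(problem)
```

To check the full file, replace the inline string with `json.load(open("sample-data.json"))`, assuming the script runs from the same directory as the config.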

Get This Design

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/evaluation/mmlu-knowledge-eval
potato start config.yaml

Details

Annotation Types

radio, text

Domain

NLP, AI Evaluation

Use Cases

Knowledge Evaluation, LLM Benchmarking, Question Answering

Tags

mmlu, multiple-choice, knowledge, benchmark, iclr2021
