# User Simulator

Automated testing tool for simulating multiple annotators with configurable behaviors.

The User Simulator enables automated testing of Potato annotation tasks by simulating multiple users with configurable behaviors and competence levels.
## Overview

The simulator is useful for:

- **Quality control testing**: Test attention checks, gold standards, and blocking behavior
- **Dashboard testing**: Generate realistic annotation data for the admin dashboard
- **Scalability testing**: Stress-test the server with many concurrent users
- **AI assistance evaluation**: Compare LLM accuracy against human-like behaviors
- **Active learning testing**: Simulate iterative annotation workflows
## Quick Start

```bash
# Basic random simulation with 10 users
python -m potato.simulator --server http://localhost:8000 --users 10

# With a configuration file
python -m potato.simulator --config simulator-config.yaml --server http://localhost:8000

# Fast scalability test (no waiting between annotations)
python -m potato.simulator --server http://localhost:8000 --users 50 --parallel 10 --fast-mode
```

## Configuration

### YAML Configuration File

Create a YAML file with simulator settings:
```yaml
simulator:
  # User configuration
  users:
    count: 20
    competence_distribution:
      good: 0.5     # 50% will be "good" annotators (80-90% accuracy)
      average: 0.3  # 30% "average" (60-70% accuracy)
      poor: 0.2     # 20% "poor" (40-50% accuracy)

  # Annotation strategy
  strategy: random  # random, biased, llm, pattern

  # Timing configuration
  timing:
    annotation_time:
      min: 2.0
      max: 45.0
      mean: 12.0
      std: 6.0
    distribution: normal  # uniform, normal, exponential

  # Execution
  execution:
    parallel_users: 5
    delay_between_users: 0.5
    max_annotations_per_user: 50

server:
  url: http://localhost:8000
```

### Competence Levels
| Level | Accuracy | Description |
|---|---|---|
| perfect | 100% | Always matches gold standard |
| good | 80-90% | High-quality annotator |
| average | 60-70% | Typical crowdworker |
| poor | 40-50% | Low-quality annotator |
| random | ~1/N | Random selection from labels |
| adversarial | 0% | Intentionally wrong (for testing QC) |
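These levels can be thought of as the probability that a simulated user matches the gold label. A minimal sketch of that idea (illustrative only; `simulate_label` is not part of the simulator's API):

```python
import random

def simulate_label(gold_label, labels, accuracy, rng=random):
    """Return gold_label with probability `accuracy`; otherwise pick a
    different label uniformly at random (accuracy 0.0 models "adversarial")."""
    if rng.random() < accuracy:
        return gold_label
    return rng.choice([label for label in labels if label != gold_label])

labels = ["positive", "negative", "neutral"]
# A "good" annotator (~85% accuracy) matches gold on most instances
hits = sum(simulate_label("positive", labels, 0.85) == "positive"
           for _ in range(10_000))
print(f"empirical accuracy: {hits / 10_000:.2f}")
```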
## Annotation Strategies

### Random Strategy (default)

Selects labels uniformly at random:

```yaml
strategy: random
```

### Biased Strategy

Weighted selection based on label preferences:

```yaml
strategy: biased
biased_config:
  label_weights:
    positive: 0.6
    negative: 0.3
    neutral: 0.1
```

### LLM Strategy
Uses an LLM to generate annotations based on the text content:

```yaml
strategy: llm
llm_config:
  endpoint_type: openai
  model: gpt-4o-mini
  api_key: ${OPENAI_API_KEY}
  temperature: 0.1
  add_noise: true
  noise_rate: 0.05
```

For local LLMs with Ollama:

```yaml
strategy: llm
llm_config:
  endpoint_type: ollama
  model: llama3.2
  base_url: http://localhost:11434
```

## CLI Options
```text
Usage: python -m potato.simulator [OPTIONS]

Required:
  --server, -s URL        Potato server URL

User Configuration:
  --users, -u NUM         Number of simulated users (default: 10)
  --competence DIST       Competence distribution

Strategy:
  --strategy TYPE         Strategy: random, biased, llm, pattern
  --llm-endpoint TYPE     LLM endpoint: openai, anthropic, ollama
  --llm-model NAME        LLM model name

Execution:
  --parallel, -p NUM      Max concurrent users (default: 5)
  --max-annotations, -m   Max annotations per user
  --fast-mode             Disable waiting between annotations

Output:
  --output-dir, -o DIR    Output directory (default: simulator_output)
```
## Quality Control Testing

Test attention check detection:

```yaml
simulator:
  users:
    count: 10
    competence_distribution:
      adversarial: 1.0  # All users will fail
  quality_control:
    attention_check_fail_rate: 0.5
    respond_fast_rate: 0.3
```

## Output Files
After a simulation, results are exported to the output directory:

- `summary_{timestamp}.json` - Aggregate statistics
- `user_results_{timestamp}.json` - Per-user detailed results
- `annotations_{timestamp}.csv` - All annotations in flat format
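Because exports are timestamped, picking up the latest run takes a little glue code. A minimal sketch (the `latest_summary` helper is illustrative, not part of the simulator; filename patterns are taken from the list above):

```python
import glob
import json

def latest_summary(output_dir="simulator_output"):
    """Parse the newest summary_*.json export, or return None if the
    directory has no summary files yet (filenames sort by timestamp)."""
    paths = sorted(glob.glob(f"{output_dir}/summary_*.json"))
    if not paths:
        return None
    with open(paths[-1]) as f:
        return json.load(f)

summary = latest_summary()
if summary:
    print(f"{summary['user_count']} users produced "
          f"{summary['total_annotations']} annotations")
```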
### Summary Example

```json
{
  "user_count": 20,
  "total_annotations": 400,
  "total_time_seconds": 125.3,
  "attention_checks": {
    "passed": 18,
    "failed": 2,
    "pass_rate": 0.9
  }
}
```

## Programmatic Usage
```python
from potato.simulator import SimulatorManager, SimulatorConfig

# Create configuration
config = SimulatorConfig(
    user_count=10,
    strategy="random",
    competence_distribution={"good": 0.5, "average": 0.5},
)

# Create and run the simulator
manager = SimulatorManager(config, "http://localhost:8000")
results = manager.run_parallel(max_annotations_per_user=20)

# Print summary and export results
manager.print_summary()
manager.export_results()
```

## Integration with Tests
The simulator can be used in pytest fixtures:

```python
import pytest
import requests

from potato.simulator import SimulatorManager, SimulatorConfig


@pytest.fixture
def simulated_annotations(flask_test_server):
    config = SimulatorConfig(user_count=5, strategy="random")
    manager = SimulatorManager(config, flask_test_server.base_url)
    return manager.run_parallel(max_annotations_per_user=10)


def test_dashboard_shows_annotations(simulated_annotations, flask_test_server):
    response = requests.get(f"{flask_test_server.base_url}/admin/api/overview")
    assert response.json()["total_annotations"] > 0
```

## Troubleshooting
### Login failures

- Ensure the server allows anonymous registration or has `require_password: false`
- Check server logs for authentication errors
### No instances available

- Verify that data files are loaded correctly
- Check the assignment strategy settings

### LLM strategy not working

- Verify that the API key is set
- For Ollama, ensure the Ollama server is running
- Check that the model name is correct
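When in doubt, a quick reachability check can separate connection problems from configuration problems. The `reachable` helper below is an ad-hoc sketch (not part of Potato); the ports are the defaults used elsewhere on this page:

```python
import urllib.error
import urllib.request

def reachable(url, timeout=5):
    """True if the host answers at all (any HTTP status counts as up);
    False on refused connections or timeouts."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True  # server responded with an error status, but it is up
    except (urllib.error.URLError, OSError):
        return False

print("Potato:", reachable("http://localhost:8000"))
print("Ollama:", reachable("http://localhost:11434"))
```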
## Further Reading

- Quality Control - Test attention checks and gold standards
- Admin Dashboard - View simulated data
- Debugging Guide - Troubleshoot issues

For implementation details, see the source documentation.