# User Simulator

Automated testing tool for simulating multiple annotators with configurable behaviors.

The User Simulator enables automated testing of Potato annotation tasks by simulating multiple users with configurable behaviors and competence levels.
## Overview

The simulator is useful for:

- **Quality control testing**: Test attention checks, gold standards, and blocking behavior
- **Dashboard testing**: Generate realistic annotation data for the admin dashboard
- **Scalability testing**: Stress-test the server with many concurrent users
- **AI assistance evaluation**: Compare LLM accuracy against human-like behaviors
- **Active learning testing**: Simulate iterative annotation workflows
## Quick Start

```bash
# Basic random simulation with 10 users
python -m potato.simulator --server http://localhost:8000 --users 10

# With a configuration file
python -m potato.simulator --config simulator-config.yaml --server http://localhost:8000

# Fast scalability test (no waiting between annotations)
python -m potato.simulator --server http://localhost:8000 --users 50 --parallel 10 --fast-mode
```

## Configuration

### YAML Configuration File

Create a YAML file with simulator settings:
```yaml
simulator:
  # User configuration
  users:
    count: 20
    competence_distribution:
      good: 0.5     # 50% will be "good" annotators (80-90% accuracy)
      average: 0.3  # 30% "average" (60-70% accuracy)
      poor: 0.2     # 20% "poor" (40-50% accuracy)

  # Annotation strategy
  strategy: random  # random, biased, llm, pattern

  # Timing configuration
  timing:
    annotation_time:
      min: 2.0
      max: 45.0
      mean: 12.0
      std: 6.0
    distribution: normal  # uniform, normal, exponential

  # Execution
  execution:
    parallel_users: 5
    delay_between_users: 0.5
    max_annotations_per_user: 50

server:
  url: http://localhost:8000
```

### Competence Levels
| Level | Accuracy | Description |
|---|---|---|
| perfect | 100% | Always matches gold standard |
| good | 80-90% | High-quality annotator |
| average | 60-70% | Typical crowdworker |
| poor | 40-50% | Low-quality annotator |
| random | ~1/N | Random selection from labels |
| adversarial | 0% | Intentionally wrong (for testing QC) |
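These levels can be thought of as the probability that a simulated user matches the gold label. A minimal sketch of that idea (illustrative only; `simulate_label` is not part of the simulator's API):

```python
import random

def simulate_label(gold_label, labels, accuracy, rng=random):
    """Return gold_label with probability `accuracy`; otherwise pick a
    different label uniformly at random (accuracy 0.0 models "adversarial")."""
    if rng.random() < accuracy:
        return gold_label
    return rng.choice([label for label in labels if label != gold_label])

labels = ["positive", "negative", "neutral"]
# A "good" annotator (~85% accuracy) matches gold on most instances
hits = sum(simulate_label("positive", labels, 0.85) == "positive"
           for _ in range(10_000))
print(f"empirical accuracy: {hits / 10_000:.2f}")
```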
## Annotation Strategies

### Random Strategy (default)

Selects labels uniformly at random:

```yaml
strategy: random
```

### Biased Strategy

Weighted selection based on label preferences:

```yaml
strategy: biased
biased_config:
  label_weights:
    positive: 0.6
    negative: 0.3
    neutral: 0.1
```

### LLM Strategy
Uses an LLM to generate annotations based on the text content:

```yaml
strategy: llm
llm_config:
  endpoint_type: openai
  model: gpt-4o-mini
  api_key: ${OPENAI_API_KEY}
  temperature: 0.1
  add_noise: true
  noise_rate: 0.05
```

For local LLMs with Ollama:

```yaml
strategy: llm
llm_config:
  endpoint_type: ollama
  model: llama3.2
  base_url: http://localhost:11434
```

## CLI Options
```text
Usage: python -m potato.simulator [OPTIONS]

Required:
  --server, -s URL        Potato server URL

User Configuration:
  --users, -u NUM         Number of simulated users (default: 10)
  --competence DIST       Competence distribution

Strategy:
  --strategy TYPE         Strategy: random, biased, llm, pattern
  --llm-endpoint TYPE     LLM endpoint: openai, anthropic, ollama
  --llm-model NAME        LLM model name

Execution:
  --parallel, -p NUM      Max concurrent users (default: 5)
  --max-annotations, -m   Max annotations per user
  --fast-mode             Disable waiting between annotations

Output:
  --output-dir, -o DIR    Output directory (default: simulator_output)
```
## Quality Control Testing

Test attention check detection:

```yaml
simulator:
  users:
    count: 10
    competence_distribution:
      adversarial: 1.0  # All users will fail
  quality_control:
    attention_check_fail_rate: 0.5
    respond_fast_rate: 0.3
```

## Output Files
After a simulation, results are exported to the output directory:

- `summary_{timestamp}.json` - Aggregate statistics
- `user_results_{timestamp}.json` - Per-user detailed results
- `annotations_{timestamp}.csv` - All annotations in flat format
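Because exports are timestamped, picking up the latest run takes a little glue code. A minimal sketch (the `latest_summary` helper is illustrative, not part of the simulator; filename patterns are taken from the list above):

```python
import glob
import json

def latest_summary(output_dir="simulator_output"):
    """Parse the newest summary_*.json export, or return None if the
    directory has no summary files yet (filenames sort by timestamp)."""
    paths = sorted(glob.glob(f"{output_dir}/summary_*.json"))
    if not paths:
        return None
    with open(paths[-1]) as f:
        return json.load(f)

summary = latest_summary()
if summary:
    print(f"{summary['user_count']} users produced "
          f"{summary['total_annotations']} annotations")
```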
### Summary Example

```json
{
  "user_count": 20,
  "total_annotations": 400,
  "total_time_seconds": 125.3,
  "attention_checks": {
    "passed": 18,
    "failed": 2,
    "pass_rate": 0.9
  }
}
```

## Programmatic Usage
```python
from potato.simulator import SimulatorManager, SimulatorConfig

# Create configuration
config = SimulatorConfig(
    user_count=10,
    strategy="random",
    competence_distribution={"good": 0.5, "average": 0.5},
)

# Create and run the simulator
manager = SimulatorManager(config, "http://localhost:8000")
results = manager.run_parallel(max_annotations_per_user=20)

# Print summary and export results
manager.print_summary()
manager.export_results()
```

## Integration with Tests
The simulator can be used in pytest fixtures:

```python
import pytest
import requests

from potato.simulator import SimulatorManager, SimulatorConfig


@pytest.fixture
def simulated_annotations(flask_test_server):
    config = SimulatorConfig(user_count=5, strategy="random")
    manager = SimulatorManager(config, flask_test_server.base_url)
    return manager.run_parallel(max_annotations_per_user=10)


def test_dashboard_shows_annotations(simulated_annotations, flask_test_server):
    response = requests.get(f"{flask_test_server.base_url}/admin/api/overview")
    assert response.json()["total_annotations"] > 0
```

## Troubleshooting
### Login failures

- Ensure the server allows anonymous registration or has `require_password: false`
- Check server logs for authentication errors
### No instances available

- Verify that data files are loaded correctly
- Check the assignment strategy settings

### LLM strategy not working

- Verify that the API key is set
- For Ollama, ensure the Ollama server is running
- Check that the model name is correct
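When in doubt, a quick reachability check can separate connection problems from configuration problems. The `reachable` helper below is an ad-hoc sketch (not part of Potato); the ports are the defaults used elsewhere on this page:

```python
import urllib.error
import urllib.request

def reachable(url, timeout=5):
    """True if the host answers at all (any HTTP status counts as up);
    False on refused connections or timeouts."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True  # server responded with an error status, but it is up
    except (urllib.error.URLError, OSError):
        return False

print("Potato:", reachable("http://localhost:8000"))
print("Ollama:", reachable("http://localhost:11434"))
```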
## Further Reading

- Quality Control - Test attention checks and gold standards
- Admin Dashboard - View simulated data
- Debugging Guide - Troubleshoot issues

For implementation details, see the source documentation.