Category Assignment
Route annotation items to annotators based on their demonstrated expertise.
Category-Based Assignment
Category-based assignment automatically matches annotators with annotation instances based on their demonstrated expertise. Annotators are assessed during training phases on category-specific questions and only receive instances from categories they have qualified for.
Overview
The category-based assignment system works as follows:
- Data Tagging: Instances in your data files are tagged with categories
- Training Assessment: Training questions are also tagged with categories
- Performance Tracking: The system tracks accuracy per category during training
- Qualification: Users who meet the threshold accuracy are "qualified"
- Assignment: Users only receive instances from their qualified categories
Configuration
Basic Setup
# Enable category-based assignment strategy
assignment_strategy: category_based
# Configure category key in item_properties
item_properties:
  id_key: id
  text_key: text
  category_key: category  # Field containing category
# Category assignment settings
category_assignment:
  enabled: true
  qualification:
    source: training  # Where qualification comes from
    threshold: 0.7  # 70% accuracy required
    min_questions: 2  # At least 2 questions per category
  fallback: uncategorized  # What to do if user qualifies for nothing
Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | true | Enable/disable category assignment |
| qualification.source | string | "training" | Source: "training", "prestudy", or "both" |
| qualification.threshold | float | 0.7 | Minimum accuracy (0.0-1.0) to qualify |
| qualification.min_questions | integer | 1 | Minimum questions per category |
| fallback | string | "uncategorized" | Behavior when user doesn't qualify |
Fallback Options
- uncategorized: Assign instances that have no category
- random: Assign randomly from all remaining instances
- none: Don't assign any instances
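As a rough illustration of how the fallback options interact with qualified categories, the sketch below filters a list of instances for one user. It is not the tool's implementation: the helpers categories_of and eligible_instances are hypothetical names, and the sketch assumes the fallback applies only when the user has no qualified categories.

```python
from __future__ import annotations

import random


def categories_of(instance: dict, key: str = "category") -> set[str]:
    """Normalize a category field (string, list, or null) to a set of names."""
    value = instance.get(key)
    if value is None:
        return set()
    if isinstance(value, str):
        return {value}
    return set(value)


def eligible_instances(instances, qualified, fallback="uncategorized", rng=None):
    """Hypothetical helper: instances a user may receive.

    Assumption: the fallback is consulted only when `qualified` is empty.
    """
    if qualified:
        # Keep instances sharing at least one category with the user.
        return [i for i in instances if categories_of(i) & set(qualified)]
    if fallback == "uncategorized":
        return [i for i in instances if not categories_of(i)]
    if fallback == "random":
        pool = list(instances)
        (rng or random).shuffle(pool)
        return pool
    if fallback == "none":
        return []
    raise ValueError(f"Unknown fallback option: {fallback!r}")
```

With this sketch, a user qualified only for "economics" would be offered single-category economics instances as well as multi-category instances that include economics.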
Data Format
Instance Data
Include the category field in your data files:
{"id": "econ_001", "text": "Market analysis...", "category": "economics"}
{"id": "sci_001", "text": "Research findings...", "category": "science"}
{"id": "multi_001", "text": "Interdisciplinary...", "category": ["economics", "science"]}
{"id": "general_001", "text": "General content...", "category": null}
Training Data
Training instances should include categories:
{
  "training_instances": [
    {
      "id": "train_econ_1",
      "text": "Question about economic concepts...",
      "category": "economics",
      "correct_answers": {"topic": "Economics"},
      "explanation": "This is an economics topic because..."
    }
  ]
}
How Qualification Works
During Training
As users answer training questions:
- The system records the category of each question
- For each category, it tracks:
- Total questions answered
- Number of correct answers
- Accuracy (correct / total)
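The per-category bookkeeping described above can be pictured with a small stand-in class. CategoryTally is a hypothetical name used only for illustration; it is not the library's TrainingState, and the real internals may differ.

```python
from collections import defaultdict


class CategoryTally:
    """Illustrative per-category tracker (not the library's own class)."""

    def __init__(self):
        # Maps category name -> counts of correct and total answers.
        self._counts = defaultdict(lambda: {"correct": 0, "total": 0})

    def record(self, categories, is_correct):
        """Count one answer toward every category the question is tagged with."""
        for category in categories:
            counts = self._counts[category]
            counts["total"] += 1
            if is_correct:
                counts["correct"] += 1

    def score(self, category):
        """Return correct, total, and accuracy (0.0 when nothing was answered)."""
        counts = self._counts.get(category, {"correct": 0, "total": 0})
        total = counts["total"]
        accuracy = counts["correct"] / total if total else 0.0
        return {"correct": counts["correct"], "total": total, "accuracy": accuracy}
```

A question tagged with several categories counts toward each of them in this sketch; whether the tool does the same is an assumption.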
After Training Completes
When a user passes training:
- Accuracy is calculated for each category
- Categories meeting both the threshold AND minimum questions are added to "qualified categories"
- Qualifications persist for the session
Example
If threshold is 0.7 (70%) and min_questions is 2:
| Category | Questions | Correct | Accuracy | Qualified? |
|---|---|---|---|---|
| Economics | 3 | 3 | 100% | Yes |
| Science | 2 | 1 | 50% | No (below threshold) |
| Sports | 1 | 1 | 100% | No (below min_questions) |
This user would receive only "Economics" instances.
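The table can be reproduced with a few lines of Python. This is a minimal sketch: qualified_categories is a hypothetical function, and it assumes that an accuracy exactly equal to the threshold counts as qualifying.

```python
def qualified_categories(scores, threshold=0.7, min_questions=2):
    """Return categories meeting both the accuracy threshold and the minimum
    number of questions. `scores` maps category -> (correct, total)."""
    qualified = []
    for category, (correct, total) in scores.items():
        if total == 0 or total < min_questions:
            continue  # Not enough questions answered in this category.
        if correct / total >= threshold:  # Assumption: ties qualify.
            qualified.append(category)
    return qualified


scores = {"Economics": (3, 3), "Science": (1, 2), "Sports": (1, 1)}
print(qualified_categories(scores))  # ['Economics']
```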
Use Cases
Expert Routing
Route specialized content to qualified annotators:
- Medical texts to annotators with medical knowledge
- Legal documents to those who understand legal terminology
- Technical content to those with technical expertise
Quality Control
Ensure quality by only assigning content to qualified individuals:
- Annotators prove competence before receiving real work
- Different quality thresholds for different content types
Workload Distribution
Distribute work based on expertise:
- High-complexity items to expert annotators
- General items to all annotators
Dynamic Expertise Mode
Dynamic expertise enables on-the-fly assessment during annotation without gold-labeled training data:
category_assignment:
  enabled: true
  dynamic:
    enabled: true
    agreement_method: majority_vote
    min_annotations_for_consensus: 2
    learning_rate: 0.1
    update_interval_seconds: 60
    base_probability: 0.1
How Dynamic Mode Works
- Initial State: All annotators start with neutral expertise (0.5) for all categories
- Probabilistic Assignment: Categories with higher expertise have higher assignment probability
- Background Processing: Periodically calculates consensus and updates expertise scores
- Expertise Updates: Scores increase when agreeing with consensus, decrease when disagreeing
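The exact update and sampling formulas are not given here, so the following is only one plausible reading of the steps above. It assumes an exponential-moving-average style update toward 1.0 (agreement) or 0.0 (disagreement), and treats base_probability as a floor on each category's weight before normalization. All function names are hypothetical.

```python
import random


def update_expertise(score, agreed, learning_rate=0.1):
    """Move the score toward 1.0 on agreement with consensus, 0.0 otherwise.
    Assumed update rule, not taken from the tool."""
    target = 1.0 if agreed else 0.0
    return score + learning_rate * (target - score)


def assignment_weights(expertise, base_probability=0.1):
    """Turn expertise scores into sampling weights that sum to 1.
    Assumption: base_probability floors each score before normalizing."""
    floored = {c: max(s, base_probability) for c, s in expertise.items()}
    total = sum(floored.values())
    return {c: w / total for c, w in floored.items()}


def pick_category(expertise, base_probability=0.1, rng=random):
    """Sample a category, favoring those with higher expertise."""
    weights = assignment_weights(expertise, base_probability)
    categories = list(weights)
    return rng.choices(categories, weights=[weights[c] for c in categories], k=1)[0]
```

Under these assumptions an annotator starting at 0.5 moves to 0.55 after one agreement and to 0.45 after one disagreement, and a category with a score of 0.0 still keeps a small chance of being assigned.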
Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| agreement_method | string | "majority_vote" | How to calculate consensus |
| min_annotations_for_consensus | integer | 2 | Min annotations before calculating |
| learning_rate | float | 0.1 | How quickly expertise scores change |
| base_probability | float | 0.1 | Minimum probability for any category |
API Reference
TrainingState Methods
# Record an answer for category tracking
training_state.record_category_answer(categories=['economics'], is_correct=True)
# Get score for a specific category
score = training_state.get_category_score('economics')
# Returns: {'correct': 3, 'total': 4, 'accuracy': 0.75}
# Get qualified categories based on threshold
qualified = training_state.get_qualified_categories(threshold=0.7, min_questions=2)
UserState Methods
# Add a qualified category
user_state.add_qualified_category('economics', score=0.85)
# Check if user is qualified for a category
is_qualified = user_state.is_qualified_for_category('economics')
# Get all qualified categories
categories = user_state.get_qualified_categories()
Troubleshooting
Users Not Getting Assigned Instances
- Check if the user has qualified categories (review training performance)
- Are there instances in those categories that haven't been annotated?
- Is fallback set appropriately?
Categories Not Being Tracked
- Verify category_key is set in item_properties
- Training instances have the category field
- category_assignment.enabled is true
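When checking these points, it can help to scan the data file directly. The sketch below is a hypothetical standalone helper, not part of the tool; it assumes JSON Lines input like the Instance Data example and the default key name category.

```python
import json
from collections import Counter


def category_report(lines, key="category"):
    """Count instances per category and list instances lacking the key.

    `lines` is any iterable of JSON Lines strings, such as an open file.
    """
    counts = Counter()
    missing_key = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # Skip blank lines.
        instance = json.loads(line)
        if key not in instance:
            missing_key.append(instance.get("id", f"line {line_number}"))
            continue
        value = instance[key]
        if value is None:
            counts["(uncategorized)"] += 1
        elif isinstance(value, str):
            counts[value] += 1
        else:
            counts.update(value)  # A list of categories counts toward each one.
    return counts, missing_key
```

Passing an open data file to category_report returns the per-category counts along with the ids of any instances that lack the category key entirely.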
Further Reading
- Task Assignment - General assignment strategies
- Training Phase - Training configuration
- Quality Control - Attention checks and gold standards
For implementation details, see the source documentation.