Category Assignment

Route annotation items to annotators based on their demonstrated expertise.

Category-Based Assignment

Category-based assignment automatically matches annotators with annotation instances based on their demonstrated expertise. Annotators are assessed during training phases on category-specific questions and only receive instances from categories they have qualified for.

Overview

The category-based assignment system works as follows:

  1. Data Tagging: Instances in your data files are tagged with categories
  2. Training Assessment: Training questions are also tagged with categories
  3. Performance Tracking: The system tracks accuracy per category during training
  4. Qualification: Users who meet the accuracy threshold in a category are marked as qualified for that category
  5. Assignment: Users only receive instances from their qualified categories

Configuration

Basic Setup

yaml
# Enable category-based assignment strategy
assignment_strategy: category_based
 
# Configure category key in item_properties
item_properties:
  id_key: id
  text_key: text
  category_key: category  # Field containing category
 
# Category assignment settings
category_assignment:
  enabled: true
  qualification:
    source: training      # Where qualification comes from
    threshold: 0.7        # 70% accuracy required
    min_questions: 2      # At least 2 questions per category
  fallback: uncategorized # What to do if user qualifies for nothing

Configuration Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | true | Enable or disable category assignment |
| qualification.source | string | "training" | Qualification source: "training", "prestudy", or "both" |
| qualification.threshold | float | 0.7 | Minimum accuracy (0.0-1.0) to qualify |
| qualification.min_questions | integer | 1 | Minimum questions answered per category |
| fallback | string | "uncategorized" | Behavior when a user qualifies for no category |

Fallback Options

  • uncategorized: Assign instances that have no category
  • random: Assign randomly from all remaining instances
  • none: Don't assign any instances
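
A minimal sketch of how the three fallback values could be interpreted. The function name `pick_fallback_instance`, its arguments, and the rule that a missing, null, or empty `category` counts as uncategorized are assumptions for illustration, not the tool's actual implementation.

```python
import random


def pick_fallback_instance(instances, fallback, rng=None):
    """Pick an instance for a user who has no qualified categories.

    `instances` is a list of dicts that still need annotation.
    Returns the chosen instance, or None when nothing should be assigned.
    """
    rng = rng or random.Random()
    if fallback == "none":
        return None
    if fallback == "uncategorized":
        # Assumption: missing, null, or empty category means "uncategorized".
        pool = [inst for inst in instances if not inst.get("category")]
    elif fallback == "random":
        pool = list(instances)
    else:
        raise ValueError(f"unknown fallback: {fallback!r}")
    return rng.choice(pool) if pool else None
```

Passing a seeded `random.Random` as `rng` makes the `random` fallback reproducible in tests.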

Data Format

Instance Data

Include the category field in your data files:

json
{"id": "econ_001", "text": "Market analysis...", "category": "economics"}
{"id": "sci_001", "text": "Research findings...", "category": "science"}
{"id": "multi_001", "text": "Interdisciplinary...", "category": ["economics", "science"]}
{"id": "general_001", "text": "General content...", "category": null}
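
The examples above show three shapes for the category field: a single string, a list of strings, and null. A consumer of this data would likely normalize all three into one representation. The helper below is a sketch under that assumption; `instance_categories` is a hypothetical name, not part of the documented API.

```python
import json


def instance_categories(instance, category_key="category"):
    """Return the categories of one instance as a frozenset (empty if none)."""
    value = instance.get(category_key)
    if value is None:
        return frozenset()
    if isinstance(value, str):
        return frozenset([value])
    return frozenset(value)  # assumed to be a list of category names


# Three of the sample lines from above, one JSON object per line.
jsonl = """{"id": "econ_001", "text": "Market analysis...", "category": "economics"}
{"id": "multi_001", "text": "Interdisciplinary...", "category": ["economics", "science"]}
{"id": "general_001", "text": "General content...", "category": null}"""

categories_by_id = {}
for line in jsonl.splitlines():
    record = json.loads(line)
    categories_by_id[record["id"]] = instance_categories(record)
```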

Training Data

Training instances should include categories:

json
{
  "training_instances": [
    {
      "id": "train_econ_1",
      "text": "Question about economic concepts...",
      "category": "economics",
      "correct_answers": {"topic": "Economics"},
      "explanation": "This is an economics topic because..."
    }
  ]
}

How Qualification Works

During Training

As users answer training questions:

  1. The system records the category of each question
  2. For each category, it tracks:
    • Total questions answered
    • Number of correct answers
    • Accuracy (correct / total)
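
The per-category bookkeeping described above can be sketched as a small counter class. `CategoryTracker` and its method names are hypothetical; only the shape of the returned score dictionary follows the API Reference section of this page.

```python
from collections import defaultdict


class CategoryTracker:
    """Tracks answered and correct counts per category (illustrative only)."""

    def __init__(self):
        self._total = defaultdict(int)
        self._correct = defaultdict(int)

    def record(self, categories, is_correct):
        # A question tagged with several categories counts toward each one.
        for category in categories:
            self._total[category] += 1
            if is_correct:
                self._correct[category] += 1

    def score(self, category):
        total = self._total.get(category, 0)
        correct = self._correct.get(category, 0)
        accuracy = correct / total if total else 0.0
        return {"correct": correct, "total": total, "accuracy": accuracy}
```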

After Training Completes

When a user passes training:

  1. Accuracy is calculated for each category
  2. Categories meeting both the threshold AND minimum questions are added to "qualified categories"
  3. Qualifications persist for the session

Example

If threshold is 0.7 (70%) and min_questions is 2:

| Category | Questions | Correct | Accuracy | Qualified? |
| --- | --- | --- | --- | --- |
| Economics | 3 | 3 | 100% | Yes |
| Science | 2 | 1 | 50% | No (below threshold) |
| Sports | 1 | 1 | 100% | No (below min_questions) |

This user would receive only "Economics" instances.
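
The same example can be checked in a few lines of code. This is a sketch, not the tool's implementation: `qualified_categories` is a hypothetical function, and treating the threshold as inclusive (accuracy >= threshold) is an assumption based on the phrase "meet the threshold".

```python
def qualified_categories(scores, threshold=0.7, min_questions=2):
    """Return categories that satisfy both the threshold and min_questions.

    `scores` maps category -> (questions_answered, correct_answers).
    """
    qualified = []
    for category, (total, correct) in scores.items():
        if total == 0 or total < min_questions:
            continue  # not enough questions answered in this category
        if correct / total >= threshold:  # assumption: threshold is inclusive
            qualified.append(category)
    return qualified


scores = {"Economics": (3, 3), "Science": (2, 1), "Sports": (1, 1)}
print(qualified_categories(scores))  # ['Economics']
```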

Use Cases

Expert Routing

Route specialized content to qualified annotators:

  • Medical texts to annotators with medical knowledge
  • Legal documents to those who understand legal terminology
  • Technical content to those with technical expertise

Quality Control

Ensure quality by only assigning content to qualified individuals:

  • Annotators prove competence before receiving real work
  • Different quality thresholds for different content types

Workload Distribution

Distribute work based on expertise:

  • High-complexity items to expert annotators
  • General items to all annotators

Dynamic Expertise Mode

Dynamic expertise enables on-the-fly assessment during annotation without gold-labeled training data:

yaml
category_assignment:
  enabled: true
  dynamic:
    enabled: true
    agreement_method: majority_vote
    min_annotations_for_consensus: 2
    learning_rate: 0.1
    update_interval_seconds: 60
    base_probability: 0.1

How Dynamic Mode Works

  1. Initial State: All annotators start with neutral expertise (0.5) for all categories
  2. Probabilistic Assignment: Categories with higher expertise have higher assignment probability
  3. Background Processing: Periodically calculates consensus and updates expertise scores
  4. Expertise Updates: Scores increase when agreeing with consensus, decrease when disagreeing

Configuration Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| agreement_method | string | "majority_vote" | How to calculate consensus |
| min_annotations_for_consensus | integer | 2 | Minimum annotations before consensus is calculated |
| learning_rate | float | 0.1 | How quickly expertise scores change |
| base_probability | float | 0.1 | Minimum probability for any category |
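
This page does not state the exact update rule, so the following is only one plausible reading of the dynamic mode steps: majority-vote consensus, a score that moves toward 1.0 or 0.0 by a fraction `learning_rate`, and assignment weights floored at `base_probability`. All three function names and the specific formulas are assumptions. The periodic scheduling implied by `update_interval_seconds` is not modeled.

```python
from collections import Counter


def majority_label(labels, min_annotations=2):
    """Return the majority label, or None if there are too few labels or a tie."""
    if len(labels) < min_annotations:
        return None
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: no consensus
    return counts[0][0]


def update_expertise(score, agreed, learning_rate=0.1):
    """Move the score toward 1.0 on agreement and toward 0.0 on disagreement."""
    target = 1.0 if agreed else 0.0
    return score + learning_rate * (target - score)


def category_weights(expertise, base_probability=0.1):
    """Turn expertise scores into normalized assignment weights with a floor."""
    floored = {c: max(s, base_probability) for c, s in expertise.items()}
    total = sum(floored.values())
    return {c: w / total for c, w in floored.items()}
```

Under these assumptions, an annotator starting at the neutral score of 0.5 moves to 0.55 after one agreement and to 0.45 after one disagreement, and a category with zero expertise still keeps a nonzero chance of being assigned.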

API Reference

TrainingState Methods

python
# Record an answer for category tracking
training_state.record_category_answer(categories=['economics'], is_correct=True)
 
# Get score for a specific category
score = training_state.get_category_score('economics')
# Returns: {'correct': 3, 'total': 4, 'accuracy': 0.75}
 
# Get qualified categories based on threshold
qualified = training_state.get_qualified_categories(threshold=0.7, min_questions=2)

UserState Methods

python
# Add a qualified category
user_state.add_qualified_category('economics', score=0.85)
 
# Check if user is qualified for a category
is_qualified = user_state.is_qualified_for_category('economics')
 
# Get all qualified categories
categories = user_state.get_qualified_categories()
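
The two sets of methods above might be combined when training ends, roughly as follows. The stub classes exist only to make the sketch runnable and mimic the documented method names; their internals, and the glue function `promote_qualifications`, are hypothetical and do not reflect the real implementation.

```python
class StubTrainingState:
    """Stand-in exposing the documented TrainingState method names."""

    def __init__(self, scores):
        self._scores = scores  # {category: (correct, total)}

    def get_category_score(self, category):
        correct, total = self._scores[category]
        accuracy = correct / total if total else 0.0
        return {"correct": correct, "total": total, "accuracy": accuracy}

    def get_qualified_categories(self, threshold=0.7, min_questions=1):
        return [
            category
            for category, (correct, total) in self._scores.items()
            if total > 0 and total >= min_questions and correct / total >= threshold
        ]


class StubUserState:
    """Stand-in exposing the documented UserState method names."""

    def __init__(self):
        self._qualified = {}

    def add_qualified_category(self, category, score):
        self._qualified[category] = score

    def is_qualified_for_category(self, category):
        return category in self._qualified

    def get_qualified_categories(self):
        return list(self._qualified)


def promote_qualifications(training_state, user_state, threshold=0.7, min_questions=2):
    """Copy every category the user qualified for into the user state."""
    qualified = training_state.get_qualified_categories(
        threshold=threshold, min_questions=min_questions
    )
    for category in qualified:
        accuracy = training_state.get_category_score(category)["accuracy"]
        user_state.add_qualified_category(category, score=accuracy)
```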

Troubleshooting

Users Not Getting Assigned Instances

  1. Check that the user has qualified categories (review their training performance)
  2. Check that those categories still contain instances that have not been annotated
  3. Check that fallback is set appropriately

Categories Not Being Tracked

  1. Verify that category_key is set in item_properties
  2. Verify that training instances include the category field
  3. Verify that category_assignment.enabled is true
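
These three checks can be automated against a parsed config. The helper below is a hypothetical sketch: it assumes the YAML has already been loaded into a dict, that training instances use the same field name as `category_key`, and that an absent `enabled` value falls back to the documented default of true.

```python
def category_tracking_problems(config, training_instances):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []

    category_key = config.get("item_properties", {}).get("category_key")
    if not category_key:
        problems.append("item_properties.category_key is not set")

    # Assumption: a missing `enabled` value uses the documented default (true).
    if config.get("category_assignment", {}).get("enabled", True) is not True:
        problems.append("category_assignment.enabled is not true")

    if category_key:
        missing = [
            inst.get("id") for inst in training_instances if category_key not in inst
        ]
        if missing:
            problems.append(f"training instances without '{category_key}': {missing}")

    return problems
```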

Further Reading

For implementation details, see the source documentation.