# Diversity Ordering

Source: https://www.potatoannotator.com/docs/features/diversity-ordering

Diversity ordering uses sentence-transformer embeddings to group similar items into clusters, then samples items round-robin across those clusters. Annotators see varied content instead of runs of similar items.

## Benefits

- **Reduce annotator fatigue** from repetitive content
- **Improve annotation quality** through varied context
- **Cover the full topic space faster** by drawing from every cluster early

## Quick Start

```yaml
assignment_strategy: diversity_clustering

diversity_ordering:
  enabled: true
  prefill_count: 100
```

## How It Works

1. **Startup**: The first `prefill_count` items are embedded with sentence-transformers and clustered with k-means
2. **Assignment**: Items are sampled round-robin from clusters for variety
3. **Annotation**: New items are embedded asynchronously as they're annotated
4. **Re-clustering**: Once a user has sampled from the configured fraction of clusters (`recluster_threshold`; all clusters by default), the system re-clusters
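
The assignment step can be illustrated with a minimal sketch. It assumes cluster labels are already available (for example, from k-means over the embeddings) and uses a hypothetical helper, `round_robin_order`, that is not part of Potato's API. The actual implementation may order clusters and break ties differently.

```python
from collections import defaultdict, deque


def round_robin_order(item_ids, cluster_labels):
    """Interleave items so consecutive picks come from different clusters.

    Hypothetical helper for illustration; not Potato's actual code.
    """
    # Group item ids by cluster label, keeping their original order.
    clusters = defaultdict(deque)
    for item_id, label in zip(item_ids, cluster_labels):
        clusters[label].append(item_id)

    ordered = []
    queues = [clusters[label] for label in sorted(clusters)]
    while queues:
        # Take one item from each non-empty cluster per round.
        for queue in queues:
            ordered.append(queue.popleft())
        # Drop clusters that have run out of items.
        queues = [queue for queue in queues if queue]
    return ordered


# Labels are hard-coded here; in practice they would come from clustering.
items = ["a1", "a2", "a3", "b1", "b2", "c1"]
labels = [0, 0, 0, 1, 1, 2]
order = round_robin_order(items, labels)
print(order)
```

Each round takes one item per cluster, so larger clusters only start to repeat once smaller ones are exhausted.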

## Configuration

```yaml
diversity_ordering:
  enabled: true

  # Sentence-transformer model
  model_name: "all-MiniLM-L6-v2"

  # Clustering parameters
  num_clusters: 10              # Used when auto_clusters is false
  items_per_cluster: 20         # Target cluster size when auto_clusters is true
  auto_clusters: true           # Auto-calculate cluster count from data size

  # Prefill on startup
  prefill_count: 100
  batch_size: 32

  # Re-clustering behavior
  recluster_threshold: 1.0      # Recluster when all clusters sampled

  # Order preservation
  preserve_visited: true

  # AI integration
  trigger_ai_prefetch: true
```

## Configuration Reference

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enabled` | boolean | `false` | Enable diversity ordering |
| `model_name` | string | `"all-MiniLM-L6-v2"` | Sentence-transformers model |
| `num_clusters` | integer | `10` | Number of clusters (when `auto_clusters=false`) |
| `items_per_cluster` | integer | `20` | Target cluster size (when `auto_clusters=true`) |
| `auto_clusters` | boolean | `true` | Automatically calculate cluster count |
| `prefill_count` | integer | `100` | Items to embed at startup |
| `batch_size` | integer | `32` | Batch size for embedding computation |
| `recluster_threshold` | float | `1.0` | Fraction of clusters to sample before reclustering |
| `preserve_visited` | boolean | `true` | Keep visited/skipped items in place |
| `trigger_ai_prefetch` | boolean | `true` | Prefetch AI hints for reordered items after reordering |
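
To make the clustering options concrete, here is a rough sketch of how `items_per_cluster` and `recluster_threshold` could be interpreted. Both formulas are assumptions inferred from the descriptions in the table, and the function names are hypothetical. Potato's actual calculation may differ.

```python
import math


def auto_cluster_count(n_items, items_per_cluster=20):
    """Assumed rule: enough clusters to hit the target cluster size."""
    return max(1, math.ceil(n_items / items_per_cluster))


def should_recluster(sampled_clusters, total_clusters, recluster_threshold=1.0):
    """Assumed rule: recluster once the sampled fraction reaches the threshold."""
    if total_clusters == 0:
        return False
    return len(sampled_clusters) / total_clusters >= recluster_threshold


# 100 prefilled items with a target of 20 items per cluster.
n_clusters = auto_cluster_count(100, items_per_cluster=20)

# A user who has seen 3 of 5 clusters has not crossed the default threshold.
partial = should_recluster({0, 1, 2}, n_clusters)

# After sampling all 5 clusters, the default threshold of 1.0 is met.
full = should_recluster({0, 1, 2, 3, 4}, n_clusters)
```

Under these assumptions, lowering `recluster_threshold` to `0.5` would trigger reclustering after a user has sampled from half of the clusters.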

## Requirements

```bash
pip install sentence-transformers scikit-learn
```

These dependencies are optional. If they are not installed, diversity ordering is disabled with a warning.
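
To check ahead of time whether both packages are importable in the current environment, a small script along these lines may help. It is a standalone sketch, not Potato's own check, and `missing_modules` is a hypothetical helper. Note that the import names (`sentence_transformers`, `sklearn`) differ from the pip package names.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Import names for the sentence-transformers and scikit-learn packages.
REQUIRED = ["sentence_transformers", "sklearn"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing optional dependencies:", ", ".join(missing))
else:
    print("All diversity ordering dependencies are available.")
```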

## Performance

- **Startup**: ~10 seconds for 100 items, ~30 seconds for 500 items on the first run; later runs reuse cached embeddings
- **Memory**: ~1.5 KB of embedding data per item with all-MiniLM-L6-v2, or ~15 MB for 10,000 items
- **Cache**: Embeddings persisted to disk in `.diversity_cache/`
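
The memory figures can be reproduced with simple arithmetic. all-MiniLM-L6-v2 produces 384-dimensional embeddings; the sketch below assumes they are stored as 32-bit floats and ignores any overhead from cluster bookkeeping or the cache format.

```python
EMBEDDING_DIM = 384      # output dimension of all-MiniLM-L6-v2
BYTES_PER_VALUE = 4      # assumption: embeddings stored as float32


def embedding_memory_bytes(n_items, dim=EMBEDDING_DIM, bytes_per_value=BYTES_PER_VALUE):
    """Estimate raw embedding storage for n_items vectors."""
    return n_items * dim * bytes_per_value


per_item_bytes = embedding_memory_bytes(1)               # 1536 bytes, about 1.5 KB
total_mb = embedding_memory_bytes(10_000) / 1_000_000    # about 15 MB
print(per_item_bytes, round(total_mb, 2))
```

A model with a larger embedding dimension scales these numbers proportionally.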

## Interaction with Other Features

- **AI Support**: When `trigger_ai_prefetch: true`, AI hints are automatically prefetched for reordered items
- **Active Learning**: The two can be combined: start with diversity clustering for initial coverage, then switch to active learning
- **Order Preservation**: When `preserve_visited: true`, visited and skipped items keep their position when the remaining items are reordered
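
One way to picture order preservation is as a slot-filling step: visited items stay in their slots, and only the unvisited slots receive the new diverse order. The sketch below is an assumed interpretation of `preserve_visited`, using a hypothetical helper rather than Potato's actual code.

```python
def reorder_preserving_visited(current_order, visited, diverse_order):
    """Keep visited items in place; fill other slots in diverse order.

    Assumes diverse_order contains every unvisited item in current_order.
    """
    remaining = iter(item for item in diverse_order if item not in visited)
    return [
        item if item in visited else next(remaining)
        for item in current_order
    ]


current = ["a", "b", "c", "d", "e"]
visited = {"a", "c"}                 # already seen or skipped
diverse = ["e", "b", "d"]            # new order for the unvisited items
result = reorder_preserving_visited(current, visited, diverse)
print(result)
```

Here `a` and `c` keep their first and third positions, while the unvisited slots are filled with `e`, `b`, and `d` in that order.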

## Full Example

```yaml
annotation_task_name: "Diversity Ordering Test"

assignment_strategy: diversity_clustering

diversity_ordering:
  enabled: true
  model_name: "all-MiniLM-L6-v2"
  num_clusters: 5
  auto_clusters: false
  prefill_count: 100
  batch_size: 16
  recluster_threshold: 1.0
  preserve_visited: true

annotation_schemes:
  - annotation_type: radio
    name: topic
    description: "What is the main topic of this text?"
    labels:
      - name: Sports
      - name: Technology
      - name: Food
      - name: Travel
      - name: Health
```

## Further Reading

- [AI Support](/docs/features/ai-support) - AI label suggestions
- [Active Learning](/docs/features/active-learning) - ML-based instance prioritization
- [Option Highlighting](/docs/features/option-highlighting) - AI-assisted option guidance

For implementation details, see the [source documentation](https://github.com/davidjurgens/potato/blob/main/docs/diversity_ordering.md).
