# ICL Labeling

Source: https://www.potatoannotator.com/docs/features/icl-labeling

Potato's ICL (In-Context Learning) labeling feature enables AI-assisted annotation: high-confidence human annotations serve as in-context examples that guide an LLM in labeling the remaining data. The system tracks the LLM's confidence and routes a sample of its predictions back to humans for verification.

## Overview

The ICL labeling system:

1. **Collects High-Confidence Examples**: Identifies instances where annotators agree (e.g., 80%+ agreement)
2. **Labels with LLM**: Prompts an LLM with those examples to label unlabeled instances
3. **Tracks Confidence**: Records LLM confidence scores for each prediction
4. **Verifies Accuracy**: Routes a sample of LLM-labeled instances to humans for blind verification
5. **Reports Metrics**: Calculates and displays LLM accuracy based on verification results

## Features

### Automatic Example Collection

The system automatically identifies high-confidence examples where multiple annotators agree:

- Configurable agreement threshold (default: 80%)
- Minimum annotator count requirement (default: 2)
- Automatic refresh on configurable interval
- Per-schema example pools
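
As a rough illustration of the agreement check, the sketch below treats agreement as the share of annotators who chose the most common label. That definition, and the function name `consensus_label`, are assumptions made for this example; the page does not specify how Potato computes agreement internally.

```python
from collections import Counter

def consensus_label(labels, min_agreement=0.8, min_annotators=2):
    """Return the majority label if the instance qualifies as an example, else None.

    Assumption: agreement = (votes for the most common label) / (total votes).
    The defaults mirror the documented 80% threshold and 2-annotator minimum.
    """
    if len(labels) < min_annotators:
        return None  # not enough annotations to judge consensus
    label, count = Counter(labels).most_common(1)[0]
    agreement = count / len(labels)
    return label if agreement >= min_agreement else None

# Four of five annotators agree (0.8), so the instance qualifies.
print(consensus_label(["pos", "pos", "pos", "pos", "neg"]))
# A 50/50 split falls below the threshold.
print(consensus_label(["pos", "neg"]))
```

Under this definition, an instance with exactly two annotators only qualifies at the default threshold when both agree.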

### LLM Labeling with Limits

To support iterative improvement rather than bulk labeling, LLM labeling is bounded by several limits:

- **Max total labels**: Caps the total number of LLM predictions
- **Max unlabeled ratio**: Labels at most a configured fraction of the remaining unlabeled data
- **Pause on low accuracy**: Automatically pauses labeling if verified accuracy drops below `min_accuracy_threshold`
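
One way these limits could combine is sketched below. The function `labeling_budget` is hypothetical, and the choice to apply the ratio cap to the current unlabeled count for each batch is an assumption; the parameter names and defaults are taken from the configuration keys on this page.

```python
def labeling_budget(unlabeled_count, labels_so_far, batch_size=20,
                    max_total_labels=100, max_unlabeled_ratio=0.5,
                    accuracy=None, min_accuracy_threshold=0.7,
                    pause_on_low_accuracy=True):
    """Return how many instances the next LLM batch may label (0 means paused).

    Assumption: the ratio cap applies to the current number of unlabeled
    instances; Potato may account for it differently.
    """
    # Pause entirely when verified accuracy is known and too low.
    if pause_on_low_accuracy and accuracy is not None and accuracy < min_accuracy_threshold:
        return 0
    remaining_total = max(0, max_total_labels - labels_so_far)
    ratio_cap = int(unlabeled_count * max_unlabeled_ratio)
    return min(batch_size, remaining_total, ratio_cap)

print(labeling_budget(unlabeled_count=1000, labels_so_far=90))   # limited by max_total_labels
print(labeling_budget(unlabeled_count=10, labels_so_far=0))      # limited by max_unlabeled_ratio
print(labeling_budget(unlabeled_count=1000, labels_so_far=0, accuracy=0.6))  # paused
```

The smallest of the three caps wins, which is what keeps a single batch from exhausting the unlabeled pool.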

### Blind Verification

Verification uses "blind labeling": annotators see the instance as a normal task, without seeing the LLM's prediction:

- Configurable sample rate (default: 20% of LLM labels)
- Multiple selection strategies: `low_confidence`, `random`, `mixed`
- Verification tasks mixed naturally with regular assignments

## Configuration

ICL labeling requires `ai_support` to be enabled:

```yaml
# AI endpoint configuration (required)
ai_support:
  enabled: true
  endpoint_type: "openai"
  ai_config:
    model: "gpt-4o-mini"
    api_key: "${OPENAI_API_KEY}"

# ICL labeling configuration
icl_labeling:
  enabled: true

  # Example selection settings
  example_selection:
    min_agreement_threshold: 0.8      # At least 80% of annotators must agree
    min_annotators_per_instance: 2    # Minimum annotations for consensus
    max_examples_per_schema: 10       # Max examples per schema in prompt
    refresh_interval_seconds: 300     # How often to refresh examples

  # LLM labeling settings
  llm_labeling:
    batch_size: 20
    trigger_threshold: 5              # Min examples before LLM labeling starts
    confidence_threshold: 0.7         # Min confidence to accept prediction
    batch_interval_seconds: 600
    max_total_labels: 100             # Max instances to label total
    max_unlabeled_ratio: 0.5          # Max portion of unlabeled to label
    pause_on_low_accuracy: true
    min_accuracy_threshold: 0.7

  # Human verification settings
  verification:
    enabled: true
    sample_rate: 0.2                  # 20% of LLM labels verified
    selection_strategy: "low_confidence"
    mix_with_regular_assignments: true
    assignment_mix_rate: 0.2
```

### Selection Strategies

- **low_confidence**: Prioritizes verifying LLM's least confident predictions first
- **random**: Random sampling from all predictions
- **mixed**: 50% low confidence + 50% random
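
The three strategies can be sketched as follows. This is a minimal illustration and not Potato's implementation: the function name, the `(instance_id, confidence)` tuple format, and the way `mixed` draws its random half from the predictions not already chosen are all assumptions.

```python
import random

def select_for_verification(predictions, sample_rate=0.2,
                            strategy="low_confidence", seed=None):
    """Pick a sample of LLM predictions to send to humans for blind verification.

    predictions: list of (instance_id, confidence) tuples (hypothetical format).
    """
    rng = random.Random(seed)
    k = round(len(predictions) * sample_rate)
    by_confidence = sorted(predictions, key=lambda p: p[1])  # least confident first
    if strategy == "low_confidence":
        return by_confidence[:k]
    if strategy == "random":
        return rng.sample(predictions, k)
    if strategy == "mixed":
        low = by_confidence[: k // 2]
        rest = by_confidence[k // 2:]
        # Fill the remainder at random from the predictions not already selected.
        return low + rng.sample(rest, k - len(low))
    raise ValueError(f"unknown strategy: {strategy}")

preds = [(f"inst-{n}", n / 10) for n in range(10)]
print(select_for_verification(preds, strategy="low_confidence"))
print(select_for_verification(preds, sample_rate=0.4, strategy="mixed", seed=0))
```

Note that a `low_confidence` sample is biased toward hard cases, so accuracy measured on it will tend to understate overall LLM accuracy.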

## Admin API

### Status Endpoint

```http
GET /admin/api/icl/status
```

Returns overall ICL labeler status including examples per schema, predictions made, verification queue size, and accuracy metrics.

### Examples Endpoint

```http
GET /admin/api/icl/examples?schema=sentiment
```

Returns high-confidence examples, optionally filtered by schema.

### Accuracy Endpoint

```http
GET /admin/api/icl/accuracy?schema=sentiment
```

Returns accuracy metrics based on human verification results.

### Manual Trigger Endpoint

```http
POST /admin/api/icl/trigger
Content-Type: application/json

{"schema_name": "sentiment"}
```

Manually trigger batch labeling for a specific schema.
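
The four endpoints above can also be called from a script. The sketch below uses only the Python standard library; the helper names `icl_request` and `send` are hypothetical, the base URL follows the `curl` examples on this page, and it assumes responses are JSON. Any authentication your deployment requires for admin endpoints is not shown.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # adjust for your deployment

def icl_request(path, schema=None, body=None):
    """Build a request for an ICL admin endpoint (GET unless a JSON body is given)."""
    url = f"{BASE_URL}/admin/api/icl/{path}"
    if schema is not None:
        url += "?" + urllib.parse.urlencode({"schema": schema})
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    method = "POST" if body is not None else "GET"
    return urllib.request.Request(url, data=data, headers=headers, method=method)

def send(request):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Example calls (require a running Potato server):
#   send(icl_request("status"))
#   send(icl_request("accuracy", schema="sentiment"))
#   send(icl_request("trigger", body={"schema_name": "sentiment"}))
```

The response fields are not documented here, so inspect the returned dictionary before relying on specific keys.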

## Usage Workflow

### 1. Configure Your Project

```yaml
ai_support:
  enabled: true
  endpoint_type: "openai"
  ai_config:
    model: "gpt-4o-mini"
    api_key: "${OPENAI_API_KEY}"

icl_labeling:
  enabled: true
  example_selection:
    min_agreement_threshold: 0.8
  llm_labeling:
    max_total_labels: 50  # Start small
  verification:
    enabled: true
    sample_rate: 0.3  # Verify 30% initially
```

### 2. Collect Human Annotations

Have annotators label data normally. As they reach consensus (80%+ agreement), those instances become available as examples.

### 3. Monitor Progress

```bash
curl http://localhost:8000/admin/api/icl/status
```

### 4. Review Accuracy

```bash
curl http://localhost:8000/admin/api/icl/accuracy
```

### 5. Iterate

Based on accuracy:
- If accuracy is high (>80%), increase `max_total_labels`
- If accuracy is low, add more human examples before continuing

## Best Practices

1. **Start Small**: Begin with conservative limits (`max_total_labels: 50`) to assess accuracy before scaling

2. **Verify Early**: Use a higher `sample_rate` initially (0.3-0.5) to get reliable accuracy estimates

3. **Monitor Actively**: Check accuracy metrics regularly through the admin API

4. **Adjust Thresholds**: If LLM accuracy is low:
   - Increase `min_agreement_threshold` for cleaner examples
   - Increase `trigger_threshold` for more examples before labeling
   - Raise `confidence_threshold` to reject uncertain predictions

5. **Use Selection Strategies**:
   - `low_confidence`: Best for identifying problematic categories
   - `random`: Best for unbiased accuracy estimates
   - `mixed`: Balanced approach
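
To see why a higher `sample_rate` matters early on, it can help to put an uncertainty range around the reported accuracy. The sketch below computes a Wilson score interval, a standard statistical technique that is not part of Potato; it assumes verified instances are a random sample (as with the `random` strategy), and the function name is hypothetical.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Approximate 95% Wilson score interval for accuracy = correct / total."""
    if total == 0:
        return (0.0, 1.0)  # no verifications yet: accuracy is unknown
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

# The same 80% observed accuracy is far less certain with 10 verifications than with 100.
print(wilson_interval(8, 10))
print(wilson_interval(80, 100))
```

With only 10 verified instances, an observed 80% accuracy is compatible with a true accuracy anywhere from roughly 49% to 94%, which is why small early samples should not drive threshold changes on their own.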

## Troubleshooting

### LLM Not Labeling

1. Check if `ai_support` is properly configured
2. Verify enough high-confidence examples exist
3. Check if labeling is paused due to limits or low accuracy

### Low Accuracy

1. Increase `min_agreement_threshold` for cleaner examples
2. Add more annotation guidelines/instructions
3. Review examples being used (`/admin/api/icl/examples`)

### Verification Tasks Not Appearing

1. Verify that `verification.enabled` is `true`
2. Check that `mix_with_regular_assignments` is `true`
3. Verify that there are pending verifications in the queue

## Further Reading

- [AI Support](/docs/features/ai-support) - General AI endpoint configuration
- [Active Learning](/docs/features/active-learning) - Related AI-assisted features
- [Quality Control](/docs/features/quality-control) - Accuracy tracking

For implementation details, see the [source documentation](https://github.com/davidjurgens/potato/blob/main/docs/icl_labeling.md).
