# Semantic Curation (Catalog)

Source: https://www.potatoannotator.com/docs/agent-evaluation/semantic-curation

**Semantic Curation helps you find *what to review* by similarity, not just rules or uncertainty.** An embedding index over your items powers similarity search ("find traces like this failure") and **dynamic slices**: saved filters that combine a semantic query with metadata conditions, automatically include new matching traces, and can be curated into [datasets](/docs/agent-evaluation/datasets-and-experiments). It complements rule-based [triage](/docs/agent-evaluation/triage-queue) and model-uncertainty active learning.

## Enabling

```yaml
curation:
  enabled: true
  model_name: all-MiniLM-L6-v2   # any sentence-transformers model
  embed_on_ingest: false          # set to true to index runtime-ingested traces on arrival
  text_key: task_description      # which field to embed
```

Embedding is **lazy**: `sentence-transformers` is imported only when you build the index, never at startup, so boot stays fast. Install it with `pip install sentence-transformers`, or wire in a custom embedder. When curation is enabled, the admin dashboard shows a **Catalog** link.
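
To illustrate the lazy-import idea, here is a minimal sketch of a wrapper that defers loading `sentence-transformers` until the first encode call. The class name `LazyEmbedder` and its methods are hypothetical and are not taken from Potato's codebase; only `SentenceTransformer(...)` and its `encode` method come from the real library.

```python
import importlib


class LazyEmbedder:
    """Hypothetical wrapper: defers the heavy import until first use."""

    def __init__(self, model_name="all-MiniLM-L6-v2"):
        self.model_name = model_name
        self._model = None  # nothing is imported or loaded at construction

    def _load(self):
        # Import on the first call only, so constructing the object stays cheap.
        if self._model is None:
            st = importlib.import_module("sentence_transformers")
            self._model = st.SentenceTransformer(self.model_name)
        return self._model

    def encode(self, texts):
        # Returns one embedding vector per input string.
        return self._load().encode(list(texts))


embedder = LazyEmbedder()  # fast: no model has been loaded yet
```

Calling `embedder.encode([...])` would trigger the import and model load, which is roughly when the cost is paid in a build-on-demand setup like the one described above.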

## Build, search, slice

```bash
# Build the index over current items
curl -X POST localhost:8000/admin/catalog/api/build -H "X-API-Key: <key>"

# Search by text query (or by an anchor instance to find neighbours)
curl -X POST localhost:8000/admin/catalog/api/search -H "X-API-Key: <key>" \
  -H "Content-Type: application/json" -d '{"query": "tool call failed", "top_k": 10, "threshold": 0.3}'
```
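
As a rough mental model for `top_k` and `threshold`, the sketch below ranks items by cosine similarity, drops anything under the threshold, and keeps the best `top_k`. It assumes cosine similarity as the score and a "minimum similarity" reading of `threshold`; the page does not confirm either, and the function names are illustrative only.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query_vec, top_k=10, threshold=0.3):
    """Return up to top_k (instance_id, score) pairs with score >= threshold."""
    scored = [(iid, cosine(vec, query_vec)) for iid, vec in index.items()]
    hits = [(iid, s) for iid, s in scored if s >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:top_k]


# Toy 2-D "embeddings"; real ones would come from the configured model.
index = {"t1": [1.0, 0.0], "t2": [0.8, 0.6], "t3": [0.0, 1.0]}
results = search(index, [1.0, 0.0], top_k=2, threshold=0.3)
# t1 and t2 clear the threshold; t3 is orthogonal to the query and is dropped.
```

Searching by an anchor instance fits the same picture: the anchor's own vector takes the place of `query_vec`.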

A **slice** is a saved filter resolved *on demand* against the current index, so traces ingested after you saved it are automatically included if they match. It combines an optional semantic neighborhood with a metadata filter:

```bash
curl -X POST localhost:8000/admin/catalog/api/slices -H "X-API-Key: <key>" \
  -H "Content-Type: application/json" \
  -d '{"name": "tool-errors", "query": "tool call failed", "threshold": 0.3,
       "metadata_filter": [{"field": "metadata.outcome", "equals": "error"}]}'

# Curate the resolved instances straight into a dataset
curl -X POST localhost:8000/admin/catalog/api/slices/tool-errors/to_dataset \
  -H "X-API-Key: <key>" -H "Content-Type: application/json" \
  -d '{"dataset": "tool-errors-to-fix"}'
```
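
The on-demand behaviour of slices can be sketched as follows. This is an illustration under assumptions, not Potato's implementation: each item carries a precomputed `score` (its similarity to the slice's query), all conditions are combined with AND, and `get_path` and `resolve_slice` are hypothetical helpers.

```python
def get_path(item, dotted):
    """Follow a dotted path such as 'metadata.outcome' through nested dicts."""
    value = item
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def resolve_slice(slice_def, items):
    """Evaluate a saved slice against whatever items exist right now."""
    matched = []
    for item in items:
        # Semantic part (optional): skip items below the similarity threshold.
        if "threshold" in slice_def and item["score"] < slice_def["threshold"]:
            continue
        # Metadata part: every condition must hold (assumed AND semantics).
        if all(get_path(item, c["field"]) == c["equals"]
               for c in slice_def.get("metadata_filter", [])):
            matched.append(item["id"])
    return matched


tool_errors = {"name": "tool-errors", "threshold": 0.3,
               "metadata_filter": [{"field": "metadata.outcome", "equals": "error"}]}

items = [
    {"id": "a", "score": 0.7, "metadata": {"outcome": "error"}},
    {"id": "b", "score": 0.9, "metadata": {"outcome": "success"}},
]
before = resolve_slice(tool_errors, items)  # ["a"]

# A trace ingested after the slice was saved is picked up on the next resolve.
items.append({"id": "c", "score": 0.5, "metadata": {"outcome": "error"}})
after = resolve_slice(tool_errors, items)  # ["a", "c"]
```

Because only the filter definition is stored, not a list of instance IDs, nothing needs to be refreshed when new traces arrive.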

## Related

- [Full reference on Read the Docs](https://potatoannotator.readthedocs.io/en/latest/agent-evaluation/semantic_curation/) — full slice/embedding API, version-matched
- [Datasets & Experiments](/docs/agent-evaluation/datasets-and-experiments) — slice curation target
- [Automation Rules](/docs/agent-evaluation/automation-rules) — rule-based routing (shares the condition grammar)
- [Triage Queue](/docs/agent-evaluation/triage-queue) — signal-based prioritization
