Detecting Hallucinations with Span Annotation

How to find and label hallucinations and factual errors in model output using span annotation and MQM-style error marking in Potato.

A hallucination is a confident statement a model makes that isn't supported by its input or by fact. The most useful way to capture one is to highlight the exact words and label what's wrong with them: a span annotation task over model output. Span-level labels are far more actionable than a single "this answer is wrong" flag.

See hallucination (artificial intelligence) for background.

Why mark spans, not whole answers

A whole-answer "unfaithful" label tells you that something is wrong; a span tells you what and where. Span data lets you measure error rates per type, find patterns, and build targeted training data. It mirrors MQM (Multidimensional Quality Metrics), the standard error-span framework from machine-translation evaluation.
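As a rough illustration of measuring error rates per type, the sketch below counts how many outputs contain at least one span of each label. The record layout (`id`, `spans`, `start`, `end`, `label`) is a hypothetical export format assumed for this example, not Potato's actual output schema.

```python
from collections import Counter

# Hypothetical export: one record per annotated output, each with a list of
# labeled spans. Field names are assumptions, not Potato's export schema.
annotations = [
    {"id": "a1", "spans": [{"start": 10, "end": 24, "label": "factual_error"}]},
    {"id": "a2", "spans": []},
    {"id": "a3", "spans": [
        {"start": 0, "end": 12, "label": "unsupported_claim"},
        {"start": 40, "end": 58, "label": "factual_error"},
    ]},
]


def error_rates(records):
    """Return, per label, the fraction of outputs with at least one such span."""
    if not records:
        return {}
    outputs_with_label = Counter()
    for record in records:
        # Use a set so each label counts at most once per output.
        for label in {span["label"] for span in record["spans"]}:
            outputs_with_label[label] += 1
    total = len(records)
    return {label: count / total for label, count in outputs_with_label.items()}


rates = error_rates(annotations)
```

A whole-answer flag could only report that two of the three outputs are faulty; the span labels show which error type is responsible in each.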

Setting up error-span annotation

```yaml
annotation_schemes:
  - annotation_type: span
    name: errors
    description: "Highlight each problematic span and label the error type."
    labels: [unsupported_claim, factual_error, contradiction, fabricated_citation]
    label_colors:
      unsupported_claim: "#f59e0b"
      factual_error: "#ef4444"
      contradiction: "#8b5cf6"
      fabricated_citation: "#ec4899"
  - annotation_type: radio
    name: severity
    description: "How serious is the worst error?"
    labels: [Minor, Major, Critical]
```

Add a severity judgment so you can weight a trivial slip differently from a dangerous fabrication, the way MQM does.
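One way to use the severity answers is to turn them into a weighted penalty per item. The sketch below assumes one severity label per annotated output (or `None` when no error was marked); the weights are illustrative placeholders, not values prescribed by MQM or Potato.

```python
# Illustrative weights only (an assumption); pick values that fit your project.
SEVERITY_WEIGHTS = {"Minor": 1, "Major": 5, "Critical": 10}


def weighted_penalty(severities):
    """Average severity weight per item; error-free items are passed as None."""
    if not severities:
        return 0.0
    total = sum(SEVERITY_WEIGHTS[s] for s in severities if s is not None)
    return total / len(severities)


# Four outputs: one minor slip, one clean, one critical, one major.
penalty = weighted_penalty(["Minor", None, "Critical", "Major"])
```

With these weights, a single critical fabrication costs as much as ten minor slips, which is the kind of distinction a flat error count hides.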

Defining the error types

  • Unsupported claim: not backed by the source (the RAG case).
  • Factual error: contradicts established fact.
  • Contradiction: conflicts with something earlier in the same output.
  • Fabricated citation: a reference that doesn't exist or doesn't say what's claimed.

Keep the set small, and give each type a one-line definition with an example, as recommended in Writing Annotation Guidelines.

Quality considerations

  • Give annotators the source material; "unsupported" is undefinable without it.
  • Boundary rules matter: does the span cover the whole sentence or just the false clause? Decide once and document the rule.
  • Faithfulness is subjective at the edges; have multiple annotators label the same items and track their agreement.
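Agreement on spans can be tracked in several ways. The sketch below uses one simple option, a character-level F1 between two annotators' highlights on the same output. It ignores error labels and assumes spans are given as `(start, end)` character offsets with an exclusive end; both choices are assumptions made for illustration.

```python
def covered_chars(spans):
    """Expand (start, end) spans, end exclusive, into a set of character offsets."""
    chars = set()
    for start, end in spans:
        chars.update(range(start, end))
    return chars


def span_f1(spans_a, spans_b):
    """Character-level F1 between two annotators' spans on the same text."""
    a, b = covered_chars(spans_a), covered_chars(spans_b)
    if not a and not b:
        # Both annotators marked nothing: treat as full agreement.
        return 1.0
    overlap = len(a & b)
    return 2 * overlap / (len(a) + len(b))


# Annotator 1 highlights a whole sentence; annotator 2 only the false clause.
annotator_1 = [(10, 30)]
annotator_2 = [(20, 30)]
score = span_f1(annotator_1, annotator_2)
```

Here both annotators found the same error, yet the score is well below 1 because they disagree about the boundary, which is why a fixed boundary rule matters before agreement numbers are interpreted.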

Further reading