
Coreference Resolution

What coreference annotation is, how to group mentions into entity chains, and how to set up a coreference task in Potato.

Coreference resolution is the task of grouping all the mentions in a text that refer to the same thing. "Marie Curie … she … the physicist" is one chain pointing at one person. It turns scattered mentions into entities, which is essential for summarization, question answering, and knowledge extraction.

See Coreference for background.

What annotators do

  1. Mark each mention (a name, a pronoun, or a noun phrase) as a span.
  2. Group mentions that refer to the same entity into a chain.
  3. Repeat for every distinct entity in the passage.

The output is a set of chains, each a list of spans that co-refer. Chains can cross sentence boundaries, which is what makes the task harder than plain span annotation.
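As a rough illustration of that output, the sketch below stores each chain as a list of character-offset spans. The `Span` class, the `find_span` helper, the chain names, and the sample sentence are assumptions made for this example; they are not Potato's export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def find_span(text: str, phrase: str) -> Span:
    """Locate the first occurrence of `phrase` and return it as a Span."""
    start = text.index(phrase)
    return Span(start, start + len(phrase))


def mention_text(text: str, span: Span) -> str:
    """Return the surface string a span covers."""
    return text[span.start:span.end]


text = "Marie Curie won two Nobel Prizes. She was a physicist."

# One chain per entity; the chain ids are hypothetical labels.
chains = {
    "curie": [find_span(text, "Marie Curie"), find_span(text, "She")],
    "prizes": [find_span(text, "two Nobel Prizes")],
}

curie_mentions = [mention_text(text, s) for s in chains["curie"]]

# The second mention starts after the first sentence ends,
# so this chain crosses a sentence boundary.
first_sentence_end = text.index(".")
crosses_sentence = chains["curie"][1].start > first_sentence_end
```

Character offsets keep the chains independent of tokenization, which tends to make it easier to compare annotators later.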

Setting it up in Potato

Potato has a coreference annotation type that lets annotators mark mentions and link them into chains. The coreference showcase is a ready-to-run example.

yaml
annotation_schemes:
  - annotation_type: span
    name: mentions
    description: "Mark every mention (names, pronouns, noun phrases), then group mentions that refer to the same entity into a chain."
    labels: [Entity]
    allow_overlapping: true

Set allow_overlapping: true, because mentions frequently nest: in "[[his] mother]", the pronoun "his" is a mention inside the larger mention "his mother".
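When post-processing annotations, it can help to tell properly nested mentions apart from spans that merely cross each other, since the latter are often marking mistakes. A minimal sketch, assuming spans are `(start, end)` character offsets with an exclusive end; the function names are hypothetical and not part of Potato:

```python
def contains(outer, inner):
    """True if `inner` lies entirely inside `outer` and the two differ."""
    return outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]


def crosses(a, b):
    """True if the spans overlap but neither one contains the other."""
    overlap = a[0] < b[1] and b[0] < a[1]
    return overlap and a != b and not contains(a, b) and not contains(b, a)


text = "his mother called"
outer = (0, 10)  # "his mother"
inner = (0, 3)   # "his"

nested = contains(outer, inner)         # a legitimate nested mention
crossing = crosses((0, 5), (4, 10))     # partial overlap, worth reviewing
```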

Common pitfalls

  • Singletons. Decide whether to mark entities that are mentioned only once. The choice changes your chain counts and your agreement scores.
  • Generic vs. specific. In "Doctors recommend rest," is "doctors" an entity to track? Write a rule.
  • Split antecedents. "Alice and Bob … they" refers to both; decide how to represent it.
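Two of these decisions can be made concrete in a few lines. The sketch below is one possible representation, not a Potato feature: chains are lists of `(start, end)` offsets, a split antecedent is recorded as a plural chain that points at the chains it combines, and a hypothetical `drop_singletons` helper applies a "no singletons" policy while keeping chains that take part in a split antecedent.

```python
text = "Alice met Bob in Paris. They talked, and Alice left."

# Chain ids and the layout of these dictionaries are assumptions.
chains = {
    "alice": [(0, 5), (41, 46)],  # "Alice", "Alice"
    "bob": [(10, 13)],            # "Bob"
    "paris": [(17, 22)],          # "Paris"
    "they": [(24, 28)],           # "They"
}

# "They" refers to Alice and Bob jointly, so link it to both chains.
split_antecedents = {"they": ["alice", "bob"]}


def drop_singletons(chains, split_antecedents):
    """Remove one-mention chains unless a split antecedent involves them."""
    protected = set(split_antecedents)
    for parts in split_antecedents.values():
        protected.update(parts)
    return {
        cid: mentions
        for cid, mentions in chains.items()
        if len(mentions) > 1 or cid in protected
    }


kept = drop_singletons(chains, split_antecedents)
```

Under this policy only the "paris" chain is dropped; without the protection rule, "bob" and "they" would disappear as well, which shows how much the singleton decision can change the counts.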

Because chains are structured, agreement is harder to measure than for flat labels. See Inter-Annotator Agreement, and adjudicate disagreements with care.
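One simple way to get a first agreement number is to compare the coreference links each annotator created. The sketch below computes a pairwise link F1 between two annotators. It is a deliberately simple stand-in rather than one of the standard coreference metrics (MUC, B-cubed, CEAF), and it assumes both annotators marked mentions with identical `(start, end)` offsets.

```python
from itertools import combinations


def coreferent_pairs(chains):
    """Return every unordered pair of mentions that share a chain."""
    pairs = set()
    for chain in chains:
        for a, b in combinations(sorted(chain), 2):
            pairs.add((a, b))
    return pairs


def pairwise_link_f1(chains_a, chains_b):
    """F1 over coreference links, treating annotator A as the reference."""
    pairs_a = coreferent_pairs(chains_a)
    pairs_b = coreferent_pairs(chains_b)
    if not pairs_a and not pairs_b:
        return 1.0  # neither annotator linked anything
    shared = len(pairs_a & pairs_b)
    if shared == 0:
        return 0.0
    precision = shared / len(pairs_b)
    recall = shared / len(pairs_a)
    return 2 * precision * recall / (precision + recall)


# Annotator A puts three mentions in one chain.
annotator_a = [[(0, 11), (34, 37), (44, 53)]]
# Annotator B links the first two and leaves the third as a singleton.
annotator_b = [[(0, 11), (34, 37)], [(44, 53)]]

score = pairwise_link_f1(annotator_a, annotator_b)
```

Singleton chains contribute no links, so this measure ignores them entirely; that is one more reason to settle the singleton policy before comparing annotators.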

Further reading