CoNLL-2003 NER with Triage
Named entity recognition with a triage pre-annotation step, based on the CoNLL-2003 Shared Task (Tjong Kim Sang & De Meulder, CoNLL 2003). Annotators first flag whether a sentence contains entities worth annotating, then mark spans for Person, Organization, Location, and Miscellaneous entities.
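The span stage produces character-level entity spans, whereas CoNLL-2003-style data is tagged per token. The following is a minimal sketch of projecting spans onto BIO (IOB2) tags. It assumes whitespace tokenization, which is simpler than the original corpus tokenization, and a hypothetical span record with `start`, `end` (exclusive), and `label` keys; the tool's actual output schema may differ.

```python
import re

LABELS = {"PER", "ORG", "LOC", "MISC"}

def spans_to_bio(text, spans):
    """Project character-offset spans onto whitespace tokens as BIO tags.

    `spans` is a hypothetical list of dicts with `start`, `end` (exclusive)
    and `label` keys; this is an illustrative schema, not the tool's own.
    """
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)
    for span in spans:
        if span["label"] not in LABELS:
            raise ValueError(f"unknown label: {span['label']}")
        first = True
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            # A token belongs to the span if the two character ranges overlap.
            if tok_start < span["end"] and tok_end > span["start"]:
                tags[i] = ("B-" if first else "I-") + span["label"]
                first = False
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]

text = "Bayern Munich defeated Real Madrid"
spans = [
    {"start": 0, "end": 13, "label": "ORG"},
    {"start": 23, "end": 34, "label": "ORG"},
]
tagged = spans_to_bio(text, spans)
```

Because tokens are split on whitespace only, punctuation attached to an entity (for example a trailing comma) would be tagged along with it; a real pipeline would likely use the corpus tokenizer instead.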
Configuration file: config.yaml
# CoNLL-2003 NER with Triage
# Based on Tjong Kim Sang & De Meulder, CoNLL 2003
# Paper: https://aclanthology.org/W03-0419/
# Dataset: https://www.clips.uantwerpen.be/conll2003/ner/
#
# Two-stage annotation process:
# 1. Triage: Quickly flag whether the text contains named entities
# 2. Span annotation: Mark entity boundaries and assign types
#
# Entity types (CoNLL-2003 standard):
# - PER: Person names (e.g., "John Smith", "Dr. Johnson")
# - ORG: Organization names (e.g., "Microsoft", "United Nations")
# - LOC: Location names (e.g., "Paris", "Mount Everest")
# - MISC: Miscellaneous entities (e.g., nationalities, events, works of art)
#
# Guidelines:
# - Mark the full extent of the entity mention
# - Include titles only if they are part of the name
# - Nested entities: annotate the outermost entity
annotation_task_name: "CoNLL-2003 NER with Triage"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  - annotation_type: triage
    name: entity_triage
    description: "Flag whether this text contains named entities worth annotating"
  - annotation_type: span
    name: named_entities
    description: "Highlight and label all named entities in the text"
    labels:
      - "PER"
      - "ORG"
      - "LOC"
      - "MISC"
    keyboard_shortcuts:
      "PER": "1"
      "ORG": "2"
      "LOC": "3"
      "MISC": "4"
    tooltips:
      "PER": "Person names including first, last, or full names"
      "ORG": "Organization names: companies, agencies, institutions"
      "LOC": "Location names: cities, countries, geographic features"
      "MISC": "Miscellaneous: nationalities, events, languages, works of art"
annotation_instructions: |
  Annotate named entities in news text:
  1. First, use the triage tool to indicate whether the text contains entities.
  2. If entities are present, highlight each entity span and assign a type.
  3. Entity types: PER (person), ORG (organization), LOC (location), MISC (other).
  4. Mark the full span of each entity mention.
html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Text:</strong>
      <p style="font-size: 16px; line-height: 1.8; margin: 8px 0 0 0;">{{text}}</p>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 200
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
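With `annotation_per_instance: 2`, each text receives span annotations from two annotators, so some measure of agreement is usually wanted. One common choice for NER is exact-match F1 between the two span sets. The sketch below assumes a hypothetical span record with `start`, `end`, and `label` keys; it does not read the tool's output files, whose schema is not shown here.

```python
def span_set(spans):
    """Reduce span records to hashable (start, end, label) triples."""
    return {(s["start"], s["end"], s["label"]) for s in spans}

def exact_match_f1(spans_a, spans_b):
    """F1 between two annotators; a span counts only if offsets and label match."""
    a, b = span_set(spans_a), span_set(spans_b)
    if not a and not b:
        # Both annotators marked nothing: treat as full agreement.
        return 1.0
    overlap = len(a & b)
    return 2 * overlap / (len(a) + len(b))

# Hypothetical annotations of one sentence by two annotators.
annotator_1 = [
    {"start": 18, "end": 33, "label": "PER"},
    {"start": 50, "end": 63, "label": "ORG"},
]
annotator_2 = [
    {"start": 18, "end": 33, "label": "PER"},
    {"start": 50, "end": 63, "label": "LOC"},  # same span, different type
]
score = exact_match_f1(annotator_1, annotator_2)
```

Exact matching is strict: a boundary that is off by one character or a differing type counts as a full disagreement, which is why the second span above contributes nothing to the score.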
Sample data: sample-data.json
[
{
"id": "conll_001",
"text": "German midfielder Michael Ballack scored twice as Bayern Munich defeated Real Madrid 3-1 in the Champions League quarter-final at the Allianz Arena on Tuesday."
},
{
"id": "conll_002",
"text": "The United Nations Security Council voted unanimously to impose new sanctions on North Korea following its latest missile test over the Sea of Japan."
}
]
// ... and 8 more items
Get this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/named-entity-recognition/conll2003-ner-triage
potato start config.yaml
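Before starting the server on your own data, it can be worth checking that every item carries the keys named under `item_properties` (`id` and `text` in this design). The following is a minimal sketch of such a check; `check_items` is a hypothetical helper written for illustration, not part of the tool.

```python
import json

def check_items(items, id_key="id", text_key="text"):
    """Return a list of human-readable problems found in the item list."""
    problems = []
    seen = set()
    for index, item in enumerate(items):
        item_id = item.get(id_key)
        if item_id is None:
            problems.append(f"item {index}: missing '{id_key}'")
        elif item_id in seen:
            problems.append(f"item {index}: duplicate id {item_id!r}")
        else:
            seen.add(item_id)
        text = item.get(text_key)
        if not isinstance(text, str) or not text.strip():
            problems.append(f"item {index}: missing or empty '{text_key}'")
    return problems

# Inline stand-in for the data file; with a real file, use
# `with open("sample-data.json") as f: items = json.load(f)` instead.
items = json.loads("""
[
  {"id": "conll_001", "text": "German midfielder Michael Ballack scored twice."},
  {"id": "conll_002", "text": "The United Nations Security Council voted unanimously."}
]
""")
problems = check_items(items)
```

An empty `problems` list means every item has a unique id and non-empty text; anything else is reported with the index of the offending item.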
Related designs
Aspect-Based Sentiment Analysis
Identification of aspect terms in review text with sentiment polarity classification for each aspect. Based on SemEval-2016 Task 5 (ABSA).
BioNLP 2011 - Gene Regulation Event Extraction
Biomedical event extraction for gene regulation, based on the BioNLP 2011 Shared Task (Kim et al., ACL Workshop 2011). Annotators identify biological entities and mark regulatory events such as gene expression, transcription, and protein catabolism in scientific abstracts.
Causal Medical Claim Detection and PICO Extraction
Detection of causal claims in medical texts and extraction of PICO (Population, Intervention, Comparator, Outcome) elements. Based on SemEval-2023 Task 8 (Khetan et al.).