Biomedical Named Entity Recognition (JNLPBA)
Named entity recognition for biomedical text, based on the JNLPBA shared task. Annotate biomedical and clinical entities, including diseases, symptoms, medications, procedures, anatomy, genes, proteins, chemicals, organisms, lab values, and dosages, following BioNLP community standards.
text annotation
Configuration File
config.yaml
# Biomedical Named Entity Recognition
# Based on standard biomedical NER annotation guidelines
# (JNLPBA, BC5CDR, NCBI Disease, i2b2, etc.)
#
# Entity Types:
# - Disease: Diseases, disorders, conditions, syndromes
# - Symptom: Signs, symptoms, clinical findings
# - Medication: Medications, drugs, therapeutic agents
# - Procedure: Medical procedures, surgeries, treatments
# - Anatomy: Body parts, organs, tissues, cells
# - Gene: Gene names and symbols
# - Protein: Proteins, gene products
# - Chemical: Chemical compounds (non-drug)
# - Organism: Species, pathogens, microorganisms
# - Lab_Value: Laboratory measurements and values
# - Dosage: Drug dosages and frequencies
#
# Annotation Guidelines:
# 1. Annotate the full noun phrase, including modifiers
# - "severe acute respiratory syndrome" not just "syndrome"
# 2. Include abbreviations when they stand alone
# - "COVID-19", "HIV", "NSAID"
# 3. Do NOT include articles (a, the) in the span
# 4. For nested entities, annotate the outermost mention
# 5. Annotate each mention, even if repeated
# 6. When uncertain, prefer broader entity boundaries
#
# Difficult Cases:
# - "Heart attack" → Disease (entire phrase)
# - "Blood pressure" → Anatomy or Symptom (clinical finding), depending on context
# - "Insulin treatment" → Medication (insulin) + Procedure (treatment)
port: 8000
server_name: localhost
task_name: "Biomedical Named Entity Recognition"
data_files:
  - sample-data.json
id_key: id
text_key: text
output_file: annotations.json
annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight all biomedical entities in the text"
    labels:
      # Core clinical entities
      - "Disease"
      - "Symptom"
      - "Medication"
      - "Procedure"
      - "Anatomy"
      # Molecular/biological entities
      - "Gene"
      - "Protein"
      - "Chemical"
      - "Organism"
      # Clinical values
      - "Lab_Value"
      - "Dosage"
    label_colors:
      "Disease": "#ef4444"
      "Symptom": "#f97316"
      "Medication": "#3b82f6"
      "Procedure": "#8b5cf6"
      "Anatomy": "#22c55e"
      "Gene": "#06b6d4"
      "Protein": "#0891b2"
      "Chemical": "#eab308"
      "Organism": "#84cc16"
      "Lab_Value": "#a855f7"
      "Dosage": "#ec4899"
    tooltips:
      "Disease": "Diseases, disorders, conditions, syndromes (e.g., 'diabetes', 'hypertension', 'COVID-19')"
      "Symptom": "Signs, symptoms, clinical findings (e.g., 'fever', 'chest pain', 'fatigue')"
      "Medication": "Drugs, medications, therapeutic agents (e.g., 'aspirin', 'metformin', 'insulin')"
      "Procedure": "Medical procedures, surgeries, treatments (e.g., 'biopsy', 'MRI', 'chemotherapy')"
      "Anatomy": "Body parts, organs, tissues, cells (e.g., 'liver', 'left ventricle', 'neurons')"
      "Gene": "Gene names and symbols (e.g., 'BRCA1', 'TP53', 'insulin gene')"
      "Protein": "Protein names (e.g., 'hemoglobin', 'albumin', 'cytokines')"
      "Chemical": "Chemical compounds, non-drug chemicals (e.g., 'glucose', 'sodium chloride')"
      "Organism": "Species, pathogens, microorganisms (e.g., 'E. coli', 'SARS-CoV-2', 'Staphylococcus')"
      "Lab_Value": "Laboratory measurements and values (e.g., 'blood glucose 120 mg/dL', 'WBC count')"
      "Dosage": "Drug dosages and frequencies (e.g., '500mg twice daily', '10mg/kg')"
    allow_overlapping: false
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
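The annotation guidelines and the `allow_overlapping: false` setting imply a few mechanical checks that can be run on finished annotations. The sketch below is a stand-alone illustration, not part of Potato: it assumes spans are stored as hypothetical `(start, end, label)` character-offset tuples (Potato's actual output format may differ), and the helper names `validate_spans` and `span` are likewise hypothetical. It checks that labels come from the configured list, that spans have no stray whitespace, that they do not begin with an article (guideline 3, via a rough prefix heuristic), and that no two spans overlap.

```python
from typing import List, Tuple

# Label set copied from the `labels` list in config.yaml.
LABELS = {
    "Disease", "Symptom", "Medication", "Procedure", "Anatomy",
    "Gene", "Protein", "Chemical", "Organism", "Lab_Value", "Dosage",
}

# Guideline 3: spans should not begin with an article. This is a rough
# prefix heuristic; it would also flag an entity written like "A fib".
ARTICLES = ("a ", "an ", "the ")

# Hypothetical span format: (start, end, label), character offsets, end exclusive.
Span = Tuple[int, int, str]


def validate_spans(text: str, spans: List[Span]) -> List[str]:
    """Return a list of problems; an empty list means every span passed."""
    problems = []
    for start, end, label in spans:
        if not 0 <= start < end <= len(text):
            problems.append(f"bad offsets ({start}, {end})")
            continue
        surface = text[start:end]
        if label not in LABELS:
            problems.append(f"unknown label {label!r} on {surface!r}")
        if surface != surface.strip():
            problems.append(f"leading/trailing whitespace in {surface!r}")
        if surface.lower().startswith(ARTICLES):
            problems.append(f"span starts with an article: {surface!r}")
    # allow_overlapping: false -> after sorting by start offset, each span
    # must end at or before the next one starts. Checking adjacent pairs is
    # enough to detect that some overlap exists, though not every pair is listed.
    ordered = sorted(spans)
    for (s1, e1, _), (s2, e2, _) in zip(ordered, ordered[1:]):
        if s2 < e1:
            problems.append(f"overlap: {text[s1:e1]!r} and {text[s2:e2]!r}")
    return problems


def span(text: str, phrase: str, label: str) -> Span:
    """Hypothetical helper: build a span from the first occurrence of `phrase`."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


text = ("The patient was diagnosed with type 2 diabetes mellitus "
        "and started on metformin 500mg twice daily.")
good = [
    span(text, "type 2 diabetes mellitus", "Disease"),
    span(text, "metformin", "Medication"),
    span(text, "500mg twice daily", "Dosage"),
]
bad = [
    span(text, "type 2 diabetes", "Disease"),
    span(text, "diabetes mellitus", "Disease"),
]
print(validate_spans(text, good))  # []
print(validate_spans(text, bad))   # ["overlap: 'type 2 diabetes' and 'diabetes mellitus'"]
```

The `bad` example shows why guideline 4 matters when overlaps are disallowed: annotating both an inner and an outer mention produces an overlap, so only the outermost phrase should be kept.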
Sample Data
sample-data.json
[
  {
    "id": "bio_001",
    "text": "The patient was diagnosed with type 2 diabetes mellitus and started on metformin 500mg twice daily. Blood glucose levels improved significantly after 3 months of treatment."
  },
  {
    "id": "bio_002",
    "text": "MRI of the brain revealed a 2cm lesion in the left temporal lobe, suspicious for glioblastoma. The patient was referred to neurosurgery for biopsy."
  }
]
// ... and 8 more items
Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/biomedical-ner
potato start config.yaml
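Before running `potato start`, it can help to sanity-check the data file against the `id_key` and `text_key` settings in config.yaml. The sketch below uses only the Python standard library and is not a Potato command; `check_items` and `check_file` are hypothetical helper names, and the code assumes the data file is a JSON array of objects like the sample-data.json excerpt.

```python
import json
from collections import Counter
from typing import Any, List


def check_items(items: Any, id_key: str = "id", text_key: str = "text") -> List[str]:
    """Return problems found in parsed data items (an empty list means OK)."""
    if not isinstance(items, list):
        return ["top-level JSON value must be an array"]
    problems = []
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {i} is not an object")
            continue
        if id_key not in item:
            problems.append(f"item {i} has no {id_key!r}")
        text = item.get(text_key)
        if not isinstance(text, str) or not text.strip():
            problems.append(f"item {i} has empty or missing {text_key!r}")
    # Each instance is assigned by id, so ids should be unique (assumes hashable ids).
    counts = Counter(item[id_key] for item in items
                     if isinstance(item, dict) and id_key in item)
    problems += [f"duplicate id {k!r}" for k, n in counts.items() if n > 1]
    return problems


def check_file(path: str, id_key: str = "id", text_key: str = "text") -> List[str]:
    """Load a JSON data file and run check_items on it."""
    with open(path, encoding="utf-8") as f:
        return check_items(json.load(f), id_key, text_key)


# Inline example with two deliberate mistakes: a repeated id and an empty text.
sample = [
    {"id": "bio_001", "text": "The patient was diagnosed with type 2 diabetes mellitus."},
    {"id": "bio_001", "text": ""},
]
print(check_items(sample))
```

On a real checkout, `check_file("sample-data.json")` would run the same checks on the file that `data_files` points to.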
Related Designs
Chemical-Disease Relation Extraction (BC5CDR)
Extract chemical-disease relations from biomedical literature. Based on BioCreative V CDR task. Identify chemical and disease entities, then annotate causal relationships between them (chemical induces disease).
Social Determinants of Health (SDOH) Extraction
Event-based extraction of social determinants of health from clinical notes based on the n2c2 2022 Track 2 shared task and SHAC corpus. Annotates substance use (alcohol, drug, tobacco), employment, and living status with temporal and status attributes.
Adverse Drug Event Extraction (CADEC)
Named entity recognition for adverse drug events from patient-reported experiences, based on the CADEC corpus (Karimi et al., 2015). Annotates drugs, adverse effects, symptoms, diseases, and findings from colloquial health forum posts with mapping to medical vocabularies (SNOMED-CT, MedDRA).