advanced, text

Biomedical Named Entity Recognition (JNLPBA)

Named entity recognition for biomedical text based on the JNLPBA shared task. Annotate entities including proteins, DNA, RNA, cell lines, and cell types following BioNLP community standards.


Configuration file: config.yaml

# Biomedical Named Entity Recognition
# Based on standard biomedical NER annotation guidelines
# (JNLPBA, BC5CDR, NCBI Disease, i2b2, etc.)
#
# Entity Types (matching the labels defined below):
# - Disease: Diseases, disorders, conditions, syndromes
# - Symptom: Signs, symptoms, clinical findings
# - Medication: Drugs, medications, therapeutic agents
# - Procedure: Medical procedures, surgeries, treatments
# - Anatomy: Body parts, organs, tissues, cells
# - Gene: Gene names and symbols
# - Protein: Protein names
# - Chemical: Chemical compounds (non-drug)
# - Organism: Species, pathogens, microorganisms
# - Lab_Value: Laboratory measurements and values
# - Dosage: Drug dosages and frequencies
#
# Annotation Guidelines:
# 1. Annotate the full noun phrase, including modifiers
#    - "severe acute respiratory syndrome" not just "syndrome"
# 2. Include abbreviations when they stand alone
#    - "COVID-19", "HIV", "NSAID"
# 3. Do NOT include articles (a, the) in the span
# 4. For nested entities, annotate the outermost mention
# 5. Annotate each mention, even if repeated
# 6. When uncertain, prefer broader entity boundaries
#
# Difficult Cases:
# - "Heart attack" → Disease (entire phrase)
# - "Blood pressure" → can be Anatomy or finding depending on context
# - "Insulin treatment" → Medication (insulin) + Procedure (treatment)

annotation_task_name: "Biomedical Named Entity Recognition"
task_dir: "."

data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

annotation_schemes:
  - annotation_type: span
    name: entities
    description: "Highlight all biomedical entities in the text"
    labels:
      # Core clinical entities
      - "Disease"
      - "Symptom"
      - "Medication"
      - "Procedure"
      - "Anatomy"
      # Molecular/biological entities
      - "Gene"
      - "Protein"
      - "Chemical"
      - "Organism"
      # Clinical values
      - "Lab_Value"
      - "Dosage"
    label_colors:
      "Disease": "#ef4444"
      "Symptom": "#f97316"
      "Medication": "#3b82f6"
      "Procedure": "#8b5cf6"
      "Anatomy": "#22c55e"
      "Gene": "#06b6d4"
      "Protein": "#0891b2"
      "Chemical": "#eab308"
      "Organism": "#84cc16"
      "Lab_Value": "#a855f7"
      "Dosage": "#ec4899"
    tooltips:
      "Disease": "Diseases, disorders, conditions, syndromes (e.g., 'diabetes', 'hypertension', 'COVID-19')"
      "Symptom": "Signs, symptoms, clinical findings (e.g., 'fever', 'chest pain', 'fatigue')"
      "Medication": "Drugs, medications, therapeutic agents (e.g., 'aspirin', 'metformin', 'insulin')"
      "Procedure": "Medical procedures, surgeries, treatments (e.g., 'biopsy', 'MRI', 'chemotherapy')"
      "Anatomy": "Body parts, organs, tissues, cells (e.g., 'liver', 'left ventricle', 'neurons')"
      "Gene": "Gene names and symbols (e.g., 'BRCA1', 'TP53', 'insulin gene')"
      "Protein": "Protein names (e.g., 'hemoglobin', 'albumin', 'cytokines')"
      "Chemical": "Chemical compounds, non-drug chemicals (e.g., 'glucose', 'sodium chloride')"
      "Organism": "Species, pathogens, microorganisms (e.g., 'E. coli', 'SARS-CoV-2', 'Staphylococcus')"
      "Lab_Value": "Laboratory measurements and values (e.g., 'blood glucose 120 mg/dL', 'WBC count')"
      "Dosage": "Drug dosages and frequencies (e.g., '500mg twice daily', '10mg/kg')"
    allow_overlapping: false

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
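
Guideline 3 (no articles inside a span) and `allow_overlapping: false` are mechanical rules, so they can be checked automatically once annotations exist. The following is a minimal Python sketch and not part of Potato: `Span`, `span_of`, and `check_spans` are hypothetical names, and it assumes spans are available as character offsets with an exclusive end. How you load spans from your annotation output is left out, since that depends on the output format you actually get.

```python
from typing import List, NamedTuple, Optional

ARTICLES = ("a ", "an ", "the ")


class Span(NamedTuple):
    """Hypothetical span record: character offsets [start, end) plus a label."""
    start: int
    end: int
    label: str


def span_of(text: str, phrase: str, label: str) -> Span:
    """Build a Span for the first occurrence of phrase in text."""
    start = text.index(phrase)
    return Span(start, start + len(phrase), label)


def check_spans(text: str, spans: List[Span]) -> List[str]:
    """Return human-readable problems found in a list of spans."""
    problems = []
    furthest: Optional[Span] = None  # earlier span that reaches furthest right
    for span in sorted(spans, key=lambda s: (s.start, s.end)):
        surface = text[span.start:span.end]
        # Guideline 3: articles do not belong inside the span.
        if surface.lower().startswith(ARTICLES):
            problems.append(f"leading article in {surface!r}")
        # allow_overlapping: false -> a span may not start inside an earlier one.
        if furthest is not None and span.start < furthest.end:
            earlier = text[furthest.start:furthest.end]
            problems.append(f"overlap between {earlier!r} and {surface!r}")
        if furthest is None or span.end > furthest.end:
            furthest = span
    return problems


text = "MRI of the brain revealed a 2cm lesion in the left temporal lobe"
good = [
    span_of(text, "MRI", "Procedure"),
    span_of(text, "brain", "Anatomy"),
    span_of(text, "left temporal lobe", "Anatomy"),
]
bad = [
    span_of(text, "the brain", "Anatomy"),           # includes an article
    span_of(text, "left temporal lobe", "Anatomy"),  # outermost mention
    span_of(text, "temporal lobe", "Anatomy"),       # nested inside the above
]

for problem in check_spans(text, bad):
    print(problem)
```

The nested "temporal lobe" span is flagged because guideline 4 asks for the outermost mention only, which is also what a non-overlapping span scheme requires.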

Sample data: sample-data.json

[
  {
    "id": "bio_001",
    "text": "The patient was diagnosed with type 2 diabetes mellitus and started on metformin 500mg twice daily. Blood glucose levels improved significantly after 3 months of treatment."
  },
  {
    "id": "bio_002",
    "text": "MRI of the brain revealed a 2cm lesion in the left temporal lobe, suspicious for glioblastoma. The patient was referred to neurosurgery for biopsy."
  }
]

// ... and 8 more items
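
Before starting the server, it can help to confirm that every item carries the keys named in `item_properties` (`id_key: "id"`, `text_key: "text"`). The sketch below is one way to do that with the Python standard library; `validate_items` is a hypothetical helper, not a Potato function, and the inline `SAMPLE` string is an abbreviated copy of the two items shown above rather than the full file.

```python
import json


def validate_items(items, id_key="id", text_key="text"):
    """Return a list of problems; an empty list means the items look usable."""
    errors = []
    seen_ids = set()
    for pos, item in enumerate(items):
        item_id = item.get(id_key)
        if item_id is None or item_id == "":
            errors.append(f"item {pos}: missing {id_key!r}")
        elif item_id in seen_ids:
            errors.append(f"item {pos}: duplicate id {item_id!r}")
        else:
            seen_ids.add(item_id)
        text = item.get(text_key)
        if not isinstance(text, str) or not text.strip():
            errors.append(f"item {pos}: missing or empty {text_key!r}")
    return errors


# Abbreviated inline copy of the first two sample items (first sentence only).
SAMPLE = """
[
  {"id": "bio_001", "text": "The patient was diagnosed with type 2 diabetes mellitus and started on metformin 500mg twice daily."},
  {"id": "bio_002", "text": "MRI of the brain revealed a 2cm lesion in the left temporal lobe, suspicious for glioblastoma."}
]
"""

items = json.loads(SAMPLE)
errors = validate_items(items)
print(errors or "all items OK")
```

To check the real file instead, replace the `json.loads(SAMPLE)` call with `json.load(open("sample-data.json", encoding="utf-8"))`.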

Get this design

View on GitHub

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/named-entity-recognition/biomedical-ner
potato start config.yaml

Details

Annotation type

span

Domain

Biomedical, Clinical NLP, Healthcare

Use cases

Named Entity Recognition, Information Extraction, Clinical Text Mining

Tags

biomedical, ner, clinical, healthcare, entities, jnlpba, bionlp

Found a problem or want to improve this design?

Open an issue