SecureNLP - Malware and Security Entity Recognition

Named entity recognition for cybersecurity text: annotators highlight malware, attack patterns, indicators, tools, vulnerabilities, and organizations, and also classify the document type. Based on SemEval-2018 Task 8.

Configuration file: config.yaml

# SecureNLP - Malware and Security Entity Recognition
# Based on Phandi et al., SemEval 2018
# Paper: https://aclanthology.org/S18-1113/
# Dataset: https://competitions.codalab.org/competitions/17262
#
# This task asks annotators to identify cybersecurity-related entities
# in text and classify the document type.
#
# Entity Span Labels:
# - Malware: Names of malware, viruses, trojans, ransomware
# - Attack Pattern: Descriptions of attack techniques or methods
# - Indicator: IoCs such as IP addresses, hashes, domains
# - Tool: Security or hacking tools mentioned
# - Vulnerability: CVE identifiers or vulnerability descriptions
# - Organization: Threat actor groups or affected organizations
#
# Document Types:
# - Threat Report, Advisory, Analysis, News

annotation_task_name: "SecureNLP - Cybersecurity Entity Recognition"
task_dir: "."

data_files:
  - sample-data.json

item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

port: 8000
server_name: localhost

annotation_schemes:
  - annotation_type: span
    name: security_entities
    description: "Highlight all cybersecurity-related entities in the text."
    labels:
      - "Malware"
      - "Attack Pattern"
      - "Indicator"
      - "Tool"
      - "Vulnerability"
      - "Organization"

  - annotation_type: radio
    name: document_type
    description: "What type of cybersecurity document is this?"
    labels:
      - "Threat Report"
      - "Advisory"
      - "Analysis"
      - "News"
    keyboard_shortcuts:
      "Threat Report": "1"
      "Advisory": "2"
      "Analysis": "3"
      "News": "4"
    tooltips:
      "Threat Report": "Detailed report on a specific threat or campaign"
      "Advisory": "Security advisory or bulletin with recommendations"
      "Analysis": "Technical analysis of malware, vulnerabilities, or attacks"
      "News": "News article covering cybersecurity events"

annotation_instructions: |
  You will be shown a cybersecurity text. Your task is to:
  1. Highlight all cybersecurity entities (malware names, attack patterns, indicators,
     tools, vulnerabilities, and organizations).
  2. Classify the document type (threat report, advisory, analysis, or news).

html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #fefce8; border: 1px solid #fde68a; border-radius: 8px; padding: 12px; margin-bottom: 12px;">
      <strong style="color: #a16207;">Source Type:</strong>
      <span style="font-size: 15px;">{{source_type}}</span>
    </div>
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Text:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
  </div>

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
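Because annotation_per_instance is 2, each item receives two document_type judgments, and agreement on that radio question can be summarized with Cohen's kappa. The sketch below assumes you have already pulled the two annotators' labels into parallel lists; reading them out of annotation_output/ is not shown because the output file layout is not described here, and the label lists in the example are made up. Cohen's kappa also assumes the same two annotators labelled every item; if different pairs cover different items (likely with allow_all_users: true), Fleiss' kappa or Krippendorff's alpha is the more appropriate statistic.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators picked the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label everywhere, so kappa is undefined;
        # this sketch simply reports 1.0 in that case.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical document_type labels from two annotators on the same four items.
annotator_1 = ["Threat Report", "Advisory", "Analysis", "News"]
annotator_2 = ["Threat Report", "Advisory", "News", "News"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.667
```

Here the annotators agree on 3 of 4 items (p_o = 0.75), chance agreement from their label proportions is 0.25, and kappa is (0.75 - 0.25) / 0.75, about 0.667.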

Sample data: sample-data.json

[
  {
    "id": "cyberner_001",
    "text": "The WannaCry ransomware exploited the EternalBlue vulnerability (CVE-2017-0144) in Windows SMB protocol. The attack affected over 200,000 computers in 150 countries, with Lazarus Group identified as the likely threat actor.",
    "source_type": "Threat Report"
  },
  {
    "id": "cyberner_002",
    "text": "A critical buffer overflow vulnerability (CVE-2018-4878) has been discovered in Adobe Flash Player. Users are advised to update to version 28.0.0.161 or later immediately. The vulnerability is being actively exploited in the wild.",
    "source_type": "Advisory"
  }
]

// ... and 8 more items
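Span annotations are anchored to character offsets in the item text. As an illustration of what a Vulnerability span looks like, and of one possible way to pre-highlight the easy cases, the sketch below finds CVE identifiers (CVE-YYYY-NNNN, with four or more digits in the final part) in the two sample items. The span dictionary format and the function name cve_candidates are hypothetical and are not Potato's output format; the regex only catches CVE IDs, so vulnerability descriptions such as "buffer overflow" still need human annotation.

```python
import re

# CVE identifiers look like CVE-YYYY-NNNN, with four or more digits in the final part.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b")

# The two items shown in sample-data.json above.
SAMPLE_ITEMS = [
    {
        "id": "cyberner_001",
        "text": "The WannaCry ransomware exploited the EternalBlue vulnerability (CVE-2017-0144) in Windows SMB protocol. The attack affected over 200,000 computers in 150 countries, with Lazarus Group identified as the likely threat actor.",
        "source_type": "Threat Report",
    },
    {
        "id": "cyberner_002",
        "text": "A critical buffer overflow vulnerability (CVE-2018-4878) has been discovered in Adobe Flash Player. Users are advised to update to version 28.0.0.161 or later immediately. The vulnerability is being actively exploited in the wild.",
        "source_type": "Advisory",
    },
]


def cve_candidates(item):
    """Return candidate Vulnerability spans (hypothetical format) for one data item."""
    return [
        {"start": m.start(), "end": m.end(), "label": "Vulnerability", "surface": m.group()}
        for m in CVE_PATTERN.finditer(item["text"])
    ]


for item in SAMPLE_ITEMS:
    for span in cve_candidates(item):
        # text[start:end] reproduces the highlighted surface string.
        print(item["id"], span["start"], span["end"], span["surface"])
```

Storing offsets rather than only the surface string keeps repeated mentions distinguishable, since the same CVE ID can appear more than once in a document.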

Get this design


Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2018/task08-cybersecurity-ner
potato start config.yaml
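The html_layout in config.yaml fills {{source_type}} and {{text}} from each item, so an item missing one of those keys may not display as intended. A quick stdlib-only check before running potato start could look like the sketch below. The function name missing_fields and the placeholder regex are hypothetical, and the layout string is abbreviated from config.yaml rather than parsed from it (parsing the YAML would need a third-party package such as PyYAML).

```python
import re

# Matches {{field}} placeholders like those used in html_layout.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def missing_fields(html_layout, items, id_key="id"):
    """Map each item's id to the layout placeholders that item does not supply."""
    fields = set(PLACEHOLDER.findall(html_layout))
    problems = {}
    for item in items:
        missing = sorted(field for field in fields if field not in item)
        if missing:
            problems[item.get(id_key, "<no id>")] = missing
    return problems


# Abbreviated from html_layout in config.yaml; only the placeholders matter here.
LAYOUT = "<span>{{source_type}}</span><p>{{text}}</p>"

# Hypothetical items: the second one lacks source_type.
items = [
    {"id": "example_ok", "text": "Placeholder text.", "source_type": "News"},
    {"id": "example_missing", "text": "An item without a source_type field."},
]
print(missing_fields(LAYOUT, items))  # {'example_missing': ['source_type']}
```

To check the real file, load it with json.load and pass the resulting list as items; an empty result means every item supplies every placeholder.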

Details

Annotation types

span, radio

Domain

SemEval, NLP, Cybersecurity, Named Entity Recognition

Use cases

Cybersecurity NER, Threat Intelligence, Information Extraction

Related designs

Aspect-Based Sentiment Analysis

Causal Medical Claim Detection and PICO Extraction

Character Identification on Multiparty Dialogues