Clickbait Detection (Webis Clickbait Corpus)

Classify headlines and social media posts as clickbait or non-clickbait based on the Webis Clickbait Corpus. Identify manipulative content designed to attract clicks through sensationalism, curiosity gaps, or misleading framing.

Configuration File: config.yaml

# Clickbait Detection
# Based on clickbait detection research and shared tasks
#
# Clickbait Definition:
# Content designed to attract attention and encourage visitors to click
# on a link, often using sensationalized, misleading, or vague headlines.
#
# Clickbait Characteristics:
# - Curiosity gap: Withholding information to create curiosity
#   ("You won't believe what happened next...")
# - Sensationalism: Exaggerated claims or emotional language
#   ("SHOCKING revelation!", "This will BLOW your mind")
# - Listicles with enticing numbers: "10 things that will change your life"
# - Forward referencing: Using "this" without context
#   ("This one trick...", "This is why...")
# - Misleading framing: Title doesn't match content
# - Emotional manipulation: Triggering outrage, fear, or excitement
#
# Non-Clickbait Characteristics:
# - Informative and specific headlines
# - Accurate representation of content
# - Professional/journalistic tone
# - Complete information provided upfront
#
# Annotation Guidelines:
# 1. Judge the headline/post in isolation (don't consider what the article might say)
# 2. Focus on whether it uses manipulative techniques
# 3. A headline can be interesting without being clickbait
# 4. Sponsored content is not automatically clickbait

annotation_task_name: "Clickbait Detection"
task_dir: "."

data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "headline"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

annotation_schemes:
  # Step 1: Clickbait classification (Clickbait / Not Clickbait, plus a Borderline option)
  - annotation_type: radio
    name: is_clickbait
    description: "Is this headline clickbait?"
    labels:
      - "Clickbait"
      - "Not Clickbait"
      - "Borderline"
    tooltips:
      "Clickbait": "Uses manipulative techniques to bait clicks (curiosity gap, sensationalism, misleading)"
      "Not Clickbait": "Informative, accurate, and straightforward headline"
      "Borderline": "Has some clickbait elements but not clearly manipulative"

  # Step 2: Identify clickbait techniques (if clickbait)
  - annotation_type: multiselect
    name: techniques
    description: "Which clickbait techniques are used? (Select all that apply)"
    labels:
      - "Curiosity Gap"
      - "Sensationalism"
      - "Listicle Bait"
      - "Forward Reference"
      - "Emotional Appeal"
      - "Exaggeration"
      - "Vague/Ambiguous"
      - "Question Bait"
    label_colors:
      "Curiosity Gap": "#ef4444"
      "Sensationalism": "#f97316"
      "Listicle Bait": "#eab308"
      "Forward Reference": "#22c55e"
      "Emotional Appeal": "#3b82f6"
      "Exaggeration": "#8b5cf6"
      "Vague/Ambiguous": "#06b6d4"
      "Question Bait": "#ec4899"
    tooltips:
      "Curiosity Gap": "Withholds key information to create curiosity ('You won't believe...', 'What happened next...')"
      "Sensationalism": "Uses shocking or extreme language ('EXPLOSIVE', 'DEVASTATING', 'UNBELIEVABLE')"
      "Listicle Bait": "Uses numbered lists to promise valuable content ('10 ways to...', '5 things you must know')"
      "Forward Reference": "Uses vague references like 'this', 'here's why', 'the reason' without context"
      "Emotional Appeal": "Tries to trigger strong emotions (fear, outrage, excitement) rather than inform"
      "Exaggeration": "Overstates importance or impact ('will change your life', 'everyone is talking about')"
      "Vague/Ambiguous": "Intentionally unclear to force clicking for basic information"
      "Question Bait": "Asks provocative questions designed to make readers want the answer"
    min_selections: 0
    max_selections: 8

  # Step 3: Confidence rating
  - annotation_type: likert
    name: confidence
    description: "How confident are you in your classification?"
    min_value: 1
    max_value: 5
    labels:
      1: "Not confident"
      2: "Slightly confident"
      3: "Moderately confident"
      4: "Confident"
      5: "Very confident"

allow_all_users: true
instances_per_annotator: 100
annotation_per_instance: 3
allow_skip: true
skip_reason_required: false
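With annotation_per_instance set to 3, each headline ends up with three is_clickbait judgments that usually need to be merged into one label. The following is a minimal sketch of strict-majority voting. The record layout (one dict per item and annotator, with "id", "annotator", and "is_clickbait" fields) and the function name majority_labels are assumptions made for illustration; the actual layout of the files in annotation_output/ is not shown here, so the field names will likely need adapting.

```python
from collections import Counter

# Hypothetical record shape: one dict per (item, annotator) judgment.
# The real output layout is not shown in this document; adapt field names.
judgments = [
    {"id": "click_001", "annotator": "a1", "is_clickbait": "Clickbait"},
    {"id": "click_001", "annotator": "a2", "is_clickbait": "Clickbait"},
    {"id": "click_001", "annotator": "a3", "is_clickbait": "Borderline"},
    {"id": "click_002", "annotator": "a1", "is_clickbait": "Not Clickbait"},
    {"id": "click_002", "annotator": "a2", "is_clickbait": "Clickbait"},
    {"id": "click_002", "annotator": "a3", "is_clickbait": "Borderline"},
]


def majority_labels(records, field="is_clickbait"):
    """Return {item_id: label}, or None where no label has a strict majority."""
    votes = {}
    for rec in records:
        votes.setdefault(rec["id"], []).append(rec[field])

    result = {}
    for item_id, labels in votes.items():
        label, count = Counter(labels).most_common(1)[0]
        # Require more than half of the votes; otherwise leave unresolved.
        result[item_id] = label if count * 2 > len(labels) else None
    return result


consensus = majority_labels(judgments)
```

Items mapped to None had no strict majority (for example, three different labels) and are candidates for adjudication. The confidence ratings from Step 3 could be used as vote weights instead, but that is a design choice this sketch does not make.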

Sample Data: sample-data.json

[
  {
    "id": "click_001",
    "headline": "You Won't Believe What This Celebrity Did at the Airport - Security Was Called!"
  },
  {
    "id": "click_002",
    "headline": "Federal Reserve Raises Interest Rates by 0.25% to Combat Inflation"
  }
]

// ... and 10 more items
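Before starting the annotation server, it can help to confirm that every item carries the fields named under item_properties in config.yaml (id_key "id", text_key "headline"). The sketch below runs that check on the two items listed above; the helper name check_items is hypothetical, and the elided items are not reproduced.

```python
import json

# Keys taken from item_properties in config.yaml.
ID_KEY = "id"
TEXT_KEY = "headline"

# Only the two items shown in the listing; the rest are elided there.
raw = """
[
  {"id": "click_001", "headline": "You Won't Believe What This Celebrity Did at the Airport - Security Was Called!"},
  {"id": "click_002", "headline": "Federal Reserve Raises Interest Rates by 0.25% to Combat Inflation"}
]
"""


def check_items(items, id_key=ID_KEY, text_key=TEXT_KEY):
    """Return a list of problems; an empty list means the data looks fine."""
    problems = []
    seen = set()
    for index, item in enumerate(items):
        item_id = item.get(id_key)
        if not item_id:
            problems.append(f"item {index}: missing '{id_key}'")
        elif item_id in seen:
            problems.append(f"item {index}: duplicate id '{item_id}'")
        else:
            seen.add(item_id)
        text = item.get(text_key)
        if not isinstance(text, str) or not text.strip():
            problems.append(f"item {index}: missing or empty '{text_key}'")
    return problems


items = json.loads(raw)
problems = check_items(items)
```

To check the full file, replace the inline string with the contents of sample-data.json, for example via json.load on an open file handle.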

Get This Design

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/fact-verification/clickbait-detection
potato start config.yaml

Details

Annotation Types

likert, multiselect, radio

Domain

NLP, Social Media, Misinformation

Use Cases

Content Moderation, Media Quality, Misinformation Detection

Related Designs

Deceptive Review Detection

Distinguish between truthful and deceptive (fake) reviews. Based on Ott et al., ACL 2011. Identify fake reviews written to deceive vs genuine customer experiences.

likert, multiselect

Dynamic Hate Speech Detection

Hate speech classification with fine-grained type labels based on the Dynamically Generated Hate Speech Dataset (Vidgen et al., ACL 2021). Classify content as hateful or not, then identify hate type (animosity, derogation, dehumanization, threatening, support for hateful entities) and target group.