Clickbait Detection (Webis Clickbait Corpus)
Classify headlines and social media posts as clickbait or non-clickbait based on the Webis Clickbait Corpus. Identify manipulative content designed to attract clicks through sensationalism, curiosity gaps, or misleading framing.
evaluation annotation
Configuration File: config.yaml
# Clickbait Detection
# Based on clickbait detection research and shared tasks
#
# Clickbait Definition:
# Content designed to attract attention and encourage visitors to click
# on a link, often using sensationalized, misleading, or vague headlines.
#
# Clickbait Characteristics:
# - Curiosity gap: Withholding information to create curiosity
# ("You won't believe what happened next...")
# - Sensationalism: Exaggerated claims or emotional language
# ("SHOCKING revelation!", "This will BLOW your mind")
# - Listicles with enticing numbers: "10 things that will change your life"
# - Forward referencing: Using "this" without context
# ("This one trick...", "This is why...")
# - Misleading framing: Title doesn't match content
# - Emotional manipulation: Triggering outrage, fear, or excitement
#
# Non-Clickbait Characteristics:
# - Informative and specific headlines
# - Accurate representation of content
# - Professional/journalistic tone
# - Complete information provided upfront
#
# Annotation Guidelines:
# 1. Judge the headline/post in isolation (don't consider what the article might say)
# 2. Focus on whether it uses manipulative techniques
# 3. A headline can be interesting without being clickbait
# 4. Sponsored content is not automatically clickbait
port: 8000
server_name: localhost
task_name: "Clickbait Detection"
data_files:
  - sample-data.json
id_key: id
text_key: headline
output_file: annotations.json
annotation_schemes:
  # Step 1: Binary clickbait classification
  - annotation_type: radio
    name: is_clickbait
    description: "Is this headline clickbait?"
    labels:
      - "Clickbait"
      - "Not Clickbait"
      - "Borderline"
    tooltips:
      "Clickbait": "Uses manipulative techniques to bait clicks (curiosity gap, sensationalism, misleading)"
      "Not Clickbait": "Informative, accurate, and straightforward headline"
      "Borderline": "Has some clickbait elements but not clearly manipulative"
  # Step 2: Identify clickbait techniques (if clickbait)
  - annotation_type: multiselect
    name: techniques
    description: "Which clickbait techniques are used? (Select all that apply)"
    labels:
      - "Curiosity Gap"
      - "Sensationalism"
      - "Listicle Bait"
      - "Forward Reference"
      - "Emotional Appeal"
      - "Exaggeration"
      - "Vague/Ambiguous"
      - "Question Bait"
    label_colors:
      "Curiosity Gap": "#ef4444"
      "Sensationalism": "#f97316"
      "Listicle Bait": "#eab308"
      "Forward Reference": "#22c55e"
      "Emotional Appeal": "#3b82f6"
      "Exaggeration": "#8b5cf6"
      "Vague/Ambiguous": "#06b6d4"
      "Question Bait": "#ec4899"
    tooltips:
      "Curiosity Gap": "Withholds key information to create curiosity ('You won't believe...', 'What happened next...')"
      "Sensationalism": "Uses shocking or extreme language ('EXPLOSIVE', 'DEVASTATING', 'UNBELIEVABLE')"
      "Listicle Bait": "Uses numbered lists to promise valuable content ('10 ways to...', '5 things you must know')"
      "Forward Reference": "Uses vague references like 'this', 'here's why', 'the reason' without context"
      "Emotional Appeal": "Tries to trigger strong emotions (fear, outrage, excitement) rather than inform"
      "Exaggeration": "Overstates importance or impact ('will change your life', 'everyone is talking about')"
      "Vague/Ambiguous": "Intentionally unclear to force clicking for basic information"
      "Question Bait": "Asks provocative questions designed to make readers want the answer"
    min_selections: 0
    max_selections: 8
  # Step 3: Confidence rating
  - annotation_type: likert
    name: confidence
    description: "How confident are you in your classification?"
    min_value: 1
    max_value: 5
    labels:
      1: "Not confident"
      2: "Slightly confident"
      3: "Moderately confident"
      4: "Confident"
      5: "Very confident"
allow_all_users: true
instances_per_annotator: 100
annotation_per_instance: 3
allow_skip: true
skip_reason_required: false
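With `annotation_per_instance: 3`, each headline ends up with three `is_clickbait` labels that usually need to be merged into one. The following is a minimal sketch of majority voting, assuming the labels for each instance have already been extracted into a plain list (the layout of `annotations.json` is not shown here, so no parsing is attempted). The function name `majority_label`, the tie-breaking rule of falling back to "Borderline", and the id `click_003` are illustrative assumptions, not part of the design.

```python
from collections import Counter


def majority_label(labels, fallback="Borderline"):
    """Return the most common label, or `fallback` if there is no clear winner."""
    counts = Counter(labels).most_common()
    if not counts:
        return fallback
    # A tie between the two most frequent labels means no majority exists.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return fallback
    return counts[0][0]


# Hypothetical votes from three annotators per headline (assumed data).
votes = {
    "click_001": ["Clickbait", "Clickbait", "Borderline"],
    "click_002": ["Not Clickbait", "Not Clickbait", "Not Clickbait"],
    "click_003": ["Clickbait", "Not Clickbait", "Borderline"],
}

resolved = {item_id: majority_label(v) for item_id, v in votes.items()}
print(resolved)
```

Falling back to "Borderline" on a three-way split is only one possible choice; sending such items to an additional annotator may be preferable.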
Sample Data: sample-data.json
[
  {
    "id": "click_001",
    "headline": "You Won't Believe What This Celebrity Did at the Airport - Security Was Called!"
  },
  {
    "id": "click_002",
    "headline": "Federal Reserve Raises Interest Rates by 0.25% to Combat Inflation"
  }
]
// ... and 10 more items
Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/clickbait-detection
potato start config.yaml
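Before starting the server, it can help to confirm that the data file matches the `id_key: id` and `text_key: headline` settings in the config. The sketch below is a standalone check, not part of Potato; the helper name `check_items` is hypothetical, and the inline JSON simply repeats the two sample items shown on this page.

```python
import json


def check_items(items, id_key="id", text_key="headline"):
    """Return a list of problems; an empty list means the items look usable."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        # Both configured keys must be present and hold non-empty strings.
        for key in (id_key, text_key):
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: missing or empty '{key}'")
        # Ids must be unique so annotations can be matched back to items.
        item_id = item.get(id_key)
        if isinstance(item_id, str):
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id '{item_id}'")
            seen_ids.add(item_id)
    return problems


# The two sample items from sample-data.json, inlined for illustration.
sample = json.loads("""
[
  {"id": "click_001",
   "headline": "You Won't Believe What This Celebrity Did at the Airport - Security Was Called!"},
  {"id": "click_002",
   "headline": "Federal Reserve Raises Interest Rates by 0.25% to Combat Inflation"}
]
""")

print(check_items(sample))
```

To check the real file instead, the inline string could be swapped for `json.load` on an open handle to `sample-data.json`.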
Related Designs
Deceptive Review Detection
Distinguish between truthful and deceptive (fake) reviews. Based on Ott et al., ACL 2011. Identify fake reviews written to deceive vs genuine customer experiences.
Dynamic Hate Speech Detection
Hate speech classification with fine-grained type labels based on the Dynamically Generated Hate Speech Dataset (Vidgen et al., ACL 2021). Classify content as hateful or not, then identify hate type (animosity, derogation, dehumanization, threatening, support for hateful entities) and target group.
GoEmotions - Fine-Grained Emotion Classification
Multi-label emotion classification with 27 emotion categories plus neutral, based on the Google Research GoEmotions dataset (Demszky et al., ACL 2020). Taxonomy covers 12 positive, 11 negative, and 4 ambiguous emotions designed for Reddit comment analysis.