Clickbait Detection (Webis Clickbait Corpus)
Classify headlines and social media posts as clickbait or non-clickbait based on the Webis Clickbait Corpus. Identify manipulative content designed to attract clicks through sensationalism, curiosity gaps, or misleading framing.
Configuration file: config.yaml
# Clickbait Detection
# Based on clickbait detection research and shared tasks
#
# Clickbait Definition:
# Content designed to attract attention and encourage visitors to click
# on a link, often using sensationalized, misleading, or vague headlines.
#
# Clickbait Characteristics:
# - Curiosity gap: Withholding information to create curiosity
# ("You won't believe what happened next...")
# - Sensationalism: Exaggerated claims or emotional language
# ("SHOCKING revelation!", "This will BLOW your mind")
# - Listicles with enticing numbers: "10 things that will change your life"
# - Forward referencing: Using "this" without context
# ("This one trick...", "This is why...")
# - Misleading framing: Title doesn't match content
# - Emotional manipulation: Triggering outrage, fear, or excitement
#
# Non-Clickbait Characteristics:
# - Informative and specific headlines
# - Accurate representation of content
# - Professional/journalistic tone
# - Complete information provided upfront
#
# Annotation Guidelines:
# 1. Judge the headline/post in isolation (don't consider what article might say)
# 2. Focus on whether it uses manipulative techniques
# 3. A headline can be interesting without being clickbait
# 4. Sponsored content is not automatically clickbait
annotation_task_name: "Clickbait Detection"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "headline"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
annotation_schemes:
  # Step 1: Binary clickbait classification
  - annotation_type: radio
    name: is_clickbait
    description: "Is this headline clickbait?"
    labels:
      - "Clickbait"
      - "Not Clickbait"
      - "Borderline"
    tooltips:
      "Clickbait": "Uses manipulative techniques to bait clicks (curiosity gap, sensationalism, misleading)"
      "Not Clickbait": "Informative, accurate, and straightforward headline"
      "Borderline": "Has some clickbait elements but not clearly manipulative"
  # Step 2: Identify clickbait techniques (if clickbait)
  - annotation_type: multiselect
    name: techniques
    description: "Which clickbait techniques are used? (Select all that apply)"
    labels:
      - "Curiosity Gap"
      - "Sensationalism"
      - "Listicle Bait"
      - "Forward Reference"
      - "Emotional Appeal"
      - "Exaggeration"
      - "Vague/Ambiguous"
      - "Question Bait"
    label_colors:
      "Curiosity Gap": "#ef4444"
      "Sensationalism": "#f97316"
      "Listicle Bait": "#eab308"
      "Forward Reference": "#22c55e"
      "Emotional Appeal": "#3b82f6"
      "Exaggeration": "#8b5cf6"
      "Vague/Ambiguous": "#06b6d4"
      "Question Bait": "#ec4899"
    tooltips:
      "Curiosity Gap": "Withholds key information to create curiosity ('You won't believe...', 'What happened next...')"
      "Sensationalism": "Uses shocking or extreme language ('EXPLOSIVE', 'DEVASTATING', 'UNBELIEVABLE')"
      "Listicle Bait": "Uses numbered lists to promise valuable content ('10 ways to...', '5 things you must know')"
      "Forward Reference": "Uses vague references like 'this', 'here's why', 'the reason' without context"
      "Emotional Appeal": "Tries to trigger strong emotions (fear, outrage, excitement) rather than inform"
      "Exaggeration": "Overstates importance or impact ('will change your life', 'everyone is talking about')"
      "Vague/Ambiguous": "Intentionally unclear to force clicking for basic information"
      "Question Bait": "Asks provocative questions designed to make readers want the answer"
    min_selections: 0
    max_selections: 8
  # Step 3: Confidence rating
  - annotation_type: likert
    name: confidence
    description: "How confident are you in your classification?"
    min_value: 1
    max_value: 5
    labels:
      1: "Not confident"
      2: "Slightly confident"
      3: "Moderately confident"
      4: "Confident"
      5: "Very confident"
allow_all_users: true
instances_per_annotator: 100
annotation_per_instance: 3
allow_skip: true
skip_reason_required: false
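Because the config requests three annotations per instance (annotation_per_instance: 3), the is_clickbait votes eventually need to be merged into one label per headline. The sketch below shows one possible approach, a strict majority vote. The input dictionary is a hypothetical, simplified structure (the actual layout of the files written to annotation_output/ is not shown here and may differ), and the helper name majority_label and the id click_003 are made up for illustration.

```python
from collections import Counter

# Labels copied from the is_clickbait scheme in config.yaml.
LABELS = ("Clickbait", "Not Clickbait", "Borderline")


def majority_label(votes):
    """Return the label picked by more than half of the votes, else None."""
    unknown = [v for v in votes if v not in LABELS]
    if unknown:
        raise ValueError(f"unexpected labels: {unknown}")
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # Require a strict majority so three-way splits stay unresolved.
    return label if count > len(votes) / 2 else None


# Hypothetical input: instance id -> list of is_clickbait votes.
# Assumption: the annotations were already extracted into this shape.
votes_by_item = {
    "click_001": ["Clickbait", "Clickbait", "Borderline"],
    "click_002": ["Not Clickbait", "Not Clickbait", "Not Clickbait"],
    "click_003": ["Clickbait", "Not Clickbait", "Borderline"],
}

gold = {item: majority_label(votes) for item, votes in votes_by_item.items()}
```

Items that come back as None (such as the three-way split above) could be routed to adjudication rather than forced into a class; the confidence ratings from Step 3 are another possible tie-breaker.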
Sample data: sample-data.json
[
  {
    "id": "click_001",
    "headline": "You Won't Believe What This Celebrity Did at the Airport - Security Was Called!"
  },
  {
    "id": "click_002",
    "headline": "Federal Reserve Raises Interest Rates by 0.25% to Combat Inflation"
  }
]
// ... and 10 more items
Download this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/fact-verification/clickbait-detection
potato start config.yaml
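Before starting the server with your own headlines, it may help to confirm that every item in the data file carries the two keys named under item_properties (id_key: "id", text_key: "headline"). The following is a minimal sketch of such a check. The function names check_items and check_file are hypothetical, and the specific rules (string ids, unique ids, non-empty text) are assumptions chosen for illustration, not requirements documented by Potato.

```python
import json


def check_items(items, id_key="id", text_key="headline"):
    """Return a list of problem descriptions; an empty list means no issues found."""
    problems = []
    seen_ids = set()
    for pos, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {pos}: not a JSON object")
            continue
        item_id = item.get(id_key)
        if not isinstance(item_id, str) or not item_id:
            problems.append(f"item {pos}: missing or empty '{id_key}'")
        elif item_id in seen_ids:
            problems.append(f"item {pos}: duplicate id '{item_id}'")
        else:
            seen_ids.add(item_id)
        text = item.get(text_key)
        if not isinstance(text, str) or not text.strip():
            problems.append(f"item {pos}: missing or empty '{text_key}'")
    return problems


def check_file(path="sample-data.json"):
    """Load a JSON array from disk and run check_items on it."""
    with open(path, encoding="utf-8") as handle:
        return check_items(json.load(handle))


# Usage with inline data shaped like the sample items above.
sample = [
    {"id": "click_001", "headline": "You Won't Believe What This Celebrity Did at the Airport - Security Was Called!"},
    {"id": "click_002", "headline": "Federal Reserve Raises Interest Rates by 0.25% to Combat Inflation"},
]
issues = check_items(sample)
```

Calling check_file() from the clickbait-detection directory would apply the same checks to the downloaded sample-data.json.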
Related designs
Deceptive Review Detection
Distinguish between truthful and deceptive (fake) reviews. Based on Ott et al., ACL 2011. Identify fake reviews written to deceive vs genuine customer experiences.
Dynamic Hate Speech Detection
Hate speech classification with fine-grained type labels based on the Dynamically Generated Hate Speech Dataset (Vidgen et al., ACL 2021). Classify content as hateful or not, then identify hate type (animosity, derogation, dehumanization, threatening, support for hateful entities) and target group.
Rumor Stance Detection (PHEME)
Classify stance toward rumors in social media threads. Based on PHEME (Zubiaga et al.). Label replies as supporting, denying, querying, or commenting on rumorous claims.