Annotation Showcase
Browse 378+ ready-to-use annotation configurations. Download configs and start annotating immediately.
Text Annotation (172)
Adverse Drug Event Extraction (CADEC)
[intermediate] Named entity recognition for adverse drug events from patient-reported experiences, based on the CADEC corpus (Karimi et al., 2015). Annotates drugs, adverse effects, symptoms, diseases, and findings from colloquial health forum posts with mapping to medical vocabularies (SNOMED-CT, MedDRA).
AMI Meeting Multi-Tier Annotation
[advanced] Multi-tier ELAN-style annotation of multi-party meeting recordings. Annotators segment speaker turns, head gestures, and focus of attention on parallel timeline tiers, then classify dialogue acts and topic segments. Based on the AMI Meeting Corpus.
Analysis of Clinical Text: Disorder Identification and Normalization
[advanced] Identify disorder mentions and their attributes in clinical discharge summaries, based on SemEval-2015 Task 14 (Elhadad et al.). Annotators mark disorder spans, body locations, severity indicators, and classify the assertion status of each disorder.
Aspect-Based Sentiment Analysis
[intermediate] Identification of aspect terms in review text with sentiment polarity classification for each aspect. Based on SemEval-2016 Task 5 (ABSA).
Aspect-Based Sentiment Analysis (Original ABSA)
[intermediate] Identify aspect terms in review text and classify their sentiment polarity, based on SemEval-2014 Task 4 (Pontiki et al.). Annotators highlight aspect terms and assign sentiment labels across restaurant and laptop review domains.
Biomedical Entity Linking (MedMentions)
[advanced] Entity mention detection and UMLS concept linking for biomedical text based on MedMentions. Annotators identify biomedical entity mentions in PubMed abstracts and link them to UMLS Concept Unique Identifiers (CUIs), supporting large-scale biomedical knowledge base construction and clinical NLP.
Biomedical Named Entity Recognition (JNLPBA)
[advanced] Named entity recognition for biomedical text based on the JNLPBA shared task. Annotate entities including proteins, DNA, RNA, cell lines, and cell types following BioNLP community standards.
BioNLP 2011 - Gene Regulation Event Extraction
[advanced] Biomedical event extraction for gene regulation, based on the BioNLP 2011 Shared Task (Kim et al., ACL Workshop 2011). Annotators identify biological entities and mark regulatory events such as gene expression, transcription, and protein catabolism in scientific abstracts.
Audio Annotation (30)
DISPLACE 2024 - Speaker and Language Diarization
[advanced] Speaker and language diarization in multilingual conversational audio. Annotators mark speaker turn boundaries, identify speakers, and label the language of each segment in conversational environments (Kundu et al., INTERSPEECH 2024).
Sound Event Detection
[advanced] Temporal sound event annotation with strong labels following DCASE Challenge protocols.
Speaker Diarization
[intermediate] Identify and label different speakers in audio recordings with timestamp-based segment annotation.
ToBI Prosodic Annotation
[advanced] Multi-tier prosodic annotation following the Tones and Break Indices (ToBI) framework. Annotators label pitch accents, phrase accents, boundary tones, and break indices on speech utterances, producing a layered prosodic transcription aligned to the audio timeline (Silverman et al., Speech Communication 1992).
Acoustic Scene Classification
[beginner] Classify audio recordings by acoustic environment following the TUT/DCASE dataset format.
Audio Transcription Review
[intermediate] Review and correct automatic speech recognition transcriptions with waveform visualization.
Audio-Visual Sentiment Analysis
[intermediate] Rate sentiment in speech segments following CMU-MOSI and CMU-MOSEI multimodal annotation protocols.
AudioHate - Audio Hate Speech Detection
[intermediate] Audio hate speech detection with explanations. Annotators classify audio clips for hate speech presence, identify target groups, and note acoustic indicators such as tone, emphasis, and prosody (Guo et al., SIGDIAL 2024).
Image Annotation (40)
Breakfast Actions Segmentation
[advanced] Fine-grained temporal action segmentation of breakfast preparation activities. Annotators label sequences of cooking actions like 'take cup', 'pour milk', 'stir'.
EPIC-KITCHENS Egocentric Action Annotation
[advanced] Annotate fine-grained actions in egocentric kitchen videos with verb-noun pairs. Identify cooking actions from a first-person perspective.
FineGym Action Segmentation
[advanced] Annotate fine-grained gymnastic actions with hierarchical labels. Identify specific elements, sub-actions, and routines in competition videos.
FineSports Fine-grained Action Recognition
[advanced] Fine-grained sports action annotation with hierarchical labels and person tracking. Annotators draw bounding boxes around athletes and label fine-grained actions within a sports action hierarchy.
Harmony4D Human Interaction Tracking
[advanced] Close-range human interaction tracking and annotation. Annotators track multiple people during close physical interactions (dancing, martial arts, collaborative tasks) with bounding boxes and interaction labels.
How2Sign Sign Language Multi-Tier Annotation
[advanced] Multi-tier ELAN-style annotation of continuous American Sign Language videos. Annotators segment sign glosses, mark mouthing patterns, classify sign handedness, and provide English translations aligned to video timelines. Based on the How2Sign large-scale multimodal ASL dataset.
MSAD Multi-Scenario Anomaly Detection
[intermediate] Video anomaly detection across multiple scenarios. Annotators watch surveillance-style videos and mark temporal segments containing anomalous events, classifying the anomaly type.
ADE20K Semantic Segmentation
[advanced] Comprehensive scene parsing with 150 semantic categories (Zhou et al., CVPR 2017). Annotate indoor and outdoor scenes with pixel-level labels covering objects, parts, and stuff classes.
Video Annotation (28)
ActivityNet Captions Dense Annotation
[advanced] Dense temporal annotation with natural language descriptions. Annotators segment videos into events and write descriptive captions for each temporal segment.
ActivityNet Temporal Localization
[intermediate] Temporal activity localization in untrimmed videos. Annotators identify activity instances by marking precise start and end timestamps across 200 activity classes.
AVA Atomic Visual Actions
[advanced] Spatio-temporal action annotation in movie clips. Annotators localize people with bounding boxes and label their atomic actions (pose, person-object, person-person interactions) in 1-second intervals.
Charades Indoor Activity Segmentation
[intermediate] Multi-label temporal activity segmentation in indoor home videos. Annotators identify action instances using compositional verb-object labels (e.g., 'opening door', 'sitting on chair') with precise temporal boundaries.
Charades-STA Temporal Grounding
[intermediate] Ground natural language descriptions to video segments. Given a sentence describing an action, identify the exact temporal boundaries where that action occurs.
Clinical TempEval - Temporal Information Extraction from Clinical Notes
[advanced] Extraction of temporal information from clinical text, identifying time expressions, event mentions, and their temporal relations. Based on SemEval-2016 Task 12 (Clinical TempEval).
DiDeMo Moment Retrieval
[intermediate] Localizing natural language descriptions to specific video moments. Given a text query, annotators identify the corresponding temporal segment in the video.
Ego4D: Egocentric Video Episodic Memory Annotation
[advanced] Annotate egocentric (first-person) video for episodic memory tasks including activity segmentation, hand state tracking, natural language query generation, and scene narration. Supports temporal segment annotation with multiple label tiers for the Ego4D benchmark.
Comparison Tasks (2)
Preference Learning (25)
Interpretable Semantic Textual Similarity
[advanced] Fine-grained semantic similarity assessment between sentence pairs with span alignment, combining chunk-level annotation with graded similarity scoring. Based on SemEval-2016 Task 2.
SaGA Gesture-Speech Alignment Multi-Tier Annotation
[advanced] Multi-tier ELAN-style annotation of co-speech gestures and their alignment with spoken language. Annotators segment gesture phases and types on parallel timeline tiers, classify handedness and spatial reference frames, and transcribe concurrent speech. Based on the SaGA corpus.
AlpacaEval: Instruction-Following Preference Evaluation
[intermediate] Pairwise preference annotation for instruction-following language models. Annotators compare two model responses side by side, select their preferred response, indicate preference strength, and rate individual response quality across diverse instruction categories.
AlpacaFarm Preference Simulation
[intermediate] Simulate human preferences for instruction-following responses. Create preference data for efficient RLHF research and LLM evaluation.
Arena Hard Auto - LLM Pairwise Evaluation
[intermediate] Pairwise evaluation of LLM responses on challenging prompts from the Arena Hard benchmark (Li et al., arXiv 2024). Annotators compare two responses on a continuous scale and rate question difficulty.
BeaverTails Safety Preference
[advanced] Annotate AI responses for safety across multiple harm categories. Identify unsafe content and rate response quality for building safer AI systems.
Chatbot Arena - Pairwise Comparison with Best-Worst Scaling
[intermediate] Pairwise comparison and best-worst scaling of chatbot responses, based on the Chatbot Arena framework (Zheng et al., ICML 2024). Annotators compare pairs of LLM-generated responses and rank sets of responses using best-worst scaling methodology.
Constitutional AI Harmlessness Evaluation
[intermediate] Evaluate AI assistant responses for harmlessness and helpfulness based on the Constitutional AI framework by Anthropic. Annotators rate responses on a harmfulness scale, assess helpfulness, and provide explanations for their judgments.
Surveys (51)
ESA: Error Span Annotation for Machine Translation
[advanced] Error span annotation for machine translation output. Annotators identify error spans in translations, classify error types (accuracy, fluency, terminology, style), and rate severity.
LongEval: Faithfulness Evaluation for Long-form Summarization
[advanced] Faithfulness evaluation of long-form summaries. Annotators identify atomic content units in summaries, check each against source documents for faithfulness, and rate overall summary quality.
News Headline Emotion Roles (GoodNewsEveryone)
[advanced] Annotate emotions in news headlines with semantic roles. Based on Bostan et al., LREC 2020. Identify emotion, experiencer, cause, target, and textual cue.
NLI with Explanations (e-SNLI)
[intermediate] Natural language inference with human explanations. Based on e-SNLI (Camburu et al., NeurIPS 2018). Classify entailment/contradiction/neutral and provide natural language justifications.
RT-2 - Robotic Action Annotation
[advanced] Robotic manipulation task evaluation and action segmentation based on RT-2 (Brohan et al., CoRL 2023). Annotators evaluate task success, describe actions, rate execution quality, and segment video into action phases.
Scientific Claim Verification (SciFact)
[advanced] Verify scientific claims against evidence from research abstracts. Based on SciFact (Wadden et al., EMNLP 2020). Classify claims as supported, refuted, or having insufficient evidence, and identify rationale sentences.
AnnoMI Counselling Dialogue Annotation
[advanced] Annotation of motivational interviewing counselling dialogues based on the AnnoMI dataset. Annotators label therapist and client utterances for MI techniques (open questions, reflections, affirmations) and client change talk (sustain talk, change talk), with quality ratings for therapeutic interactions.
Argument Reasoning Comprehension (ARCT)
[advanced] Identify implicit warrants in arguments. Based on Habernal et al., NAACL 2018 / SemEval 2018 Task 12. Given a claim and premise, choose the correct warrant that connects them.
Evaluation Tasks (30)
Code Review Annotation (CodeReviewer)
[advanced] Annotation of code review activities based on the CodeReviewer benchmark. Annotators identify issues in code diffs, classify defect types, assign severity levels, make review decisions, and provide natural language review comments, supporting research in automated code review and software engineering.
EA-MT - Entity-Aware Machine Translation
[advanced] Entity-aware machine translation evaluation requiring annotators to identify entity spans, classify translation errors, and provide corrected translations. Based on SemEval-2025 Task 2.
FAVA: Fine-grained Hallucination Annotations for Faithful Generation
[advanced] Fine-grained hallucination span annotation. Annotators identify hallucinated spans in LLM output and classify hallucination types (entity error, relation error, contradicted, invented, subjective, unverifiable). Based on the FAVA framework for fine-grained faithfulness evaluation.
MathDial - Tutoring Dialogue Quality Annotation
[intermediate] Annotate math tutoring dialogues for guidance correctness, tutoring strategies, and key concepts, based on the MathDial dataset (Macina et al., Findings ACL 2023). Supports evaluation of AI-generated tutoring interactions for K-12 math problems.
#HashtagWars - Learning a Sense of Humor
[beginner] Humor ranking of tweets submitted to Comedy Central's @midnight #HashtagWars, classifying comedic quality. Based on SemEval-2017 Task 6.
ArgSciChat Scientific Argumentation Dialogue
[intermediate] Annotation of argumentative dialogues about scientific papers based on the ArgSciChat dataset. Annotators label dialogue turns for argument components (claim, evidence, rebuttal) and assess argument quality dimensions such as clarity, relevance, and persuasiveness.
Argument Quality Assessment
[intermediate] Multi-dimensional argument quality annotation based on the Wachsmuth et al. (2017) taxonomy. Rates arguments on three dimensions: Cogency (logical validity), Effectiveness (persuasive power), and Reasonableness (contribution to resolution). Used in Dagstuhl-ArgQuality and GAQCorpus datasets.
Bias Benchmark for QA (BBQ)
[intermediate] Annotate question-answering examples designed to probe social biases. Based on BBQ (Parrish et al., Findings of ACL 2022). Annotators select the correct answer given a context, assess the direction of bias in the question, categorize the type of bias, and explain their reasoning.