V-WSD - Visual Word Sense Disambiguation
Visual word sense disambiguation task where annotators select the most appropriate image for a target word given its textual context. Based on SemEval-2023 Task 1 (Raganato et al.).
Configuration file: config.yaml
# V-WSD - Visual Word Sense Disambiguation
# Based on Raganato et al., SemEval 2023
# Paper: https://aclanthology.org/2023.semeval-1.1/
# Dataset: https://raganato.github.io/vwsd/
#
# Given a target word and a context sentence, annotators select the image
# (from a set of 10 candidates) that best represents the intended meaning
# of the target word in that context. This tests visual grounding of
# word senses across modalities.
#
# Annotation Guidelines:
# 1. Read the context sentence and identify the target word
# 2. Consider the meaning of the target word in the given context
# 3. Review the descriptions of all 10 candidate images
# 4. Select the image that best matches the intended word sense
# 5. If multiple images seem relevant, choose the most specific match
annotation_task_name: "V-WSD - Visual Word Sense Disambiguation"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  - annotation_type: radio
    name: image_selection
    description: "Which image best represents the meaning of the target word in the given context?"
    labels:
      - "Image 1"
      - "Image 2"
      - "Image 3"
      - "Image 4"
      - "Image 5"
      - "Image 6"
      - "Image 7"
      - "Image 8"
      - "Image 9"
      - "Image 10"
    keyboard_shortcuts:
      "Image 1": "1"
      "Image 2": "2"
      "Image 3": "3"
      "Image 4": "4"
      "Image 5": "5"
      "Image 6": "6"
      "Image 7": "7"
      "Image 8": "8"
      "Image 9": "9"
      "Image 10": "0"
    tooltips:
      "Image 1": "Select if Image 1 best represents the target word meaning"
      "Image 2": "Select if Image 2 best represents the target word meaning"
      "Image 3": "Select if Image 3 best represents the target word meaning"
      "Image 4": "Select if Image 4 best represents the target word meaning"
      "Image 5": "Select if Image 5 best represents the target word meaning"
      "Image 6": "Select if Image 6 best represents the target word meaning"
      "Image 7": "Select if Image 7 best represents the target word meaning"
      "Image 8": "Select if Image 8 best represents the target word meaning"
      "Image 9": "Select if Image 9 best represents the target word meaning"
      "Image 10": "Select if Image 10 best represents the target word meaning"
annotation_instructions: |
  You will see a context sentence with a highlighted target word, along with descriptions of 10 candidate images.
  Your task is to select the image that best represents the meaning of the target word as used in the given context.
  Pay attention to the specific sense of the word -- many words have multiple meanings, and the context determines which sense is intended.
html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #fefce8; border: 1px solid #fde68a; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #a16207;">Target Word:</strong>
      <span style="font-size: 18px; font-weight: bold; color: #854d0e;">{{target_word}}</span>
    </div>
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Context Sentence:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
    <div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 16px;">
      <strong style="color: #475569;">Candidate Image Descriptions:</strong>
      <p style="font-size: 14px; line-height: 1.8; margin: 8px 0 0 0; white-space: pre-line;">{{image_descriptions}}</p>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
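The radio scheme in the configuration lists the same ten labels three times: under labels, keyboard_shortcuts, and tooltips. The following is a minimal Python sketch of a consistency check over those three fields. It works on a plain dict that mirrors the scheme rather than on the YAML file itself, and the function name check_radio_scheme is hypothetical, not part of Potato.

```python
def check_radio_scheme(scheme: dict) -> list[str]:
    """Return a list of inconsistencies in one radio scheme (empty if none)."""
    problems = []
    labels = scheme.get("labels", [])
    if len(set(labels)) != len(labels):
        problems.append("duplicate labels")
    # Every label should have exactly one shortcut and one tooltip entry.
    for field in ("keyboard_shortcuts", "tooltips"):
        entries = set(scheme.get(field, {}))
        extra = entries - set(labels)
        missing = set(labels) - entries
        if extra:
            problems.append(f"{field}: unknown labels {sorted(extra)}")
        if missing:
            problems.append(f"{field}: no entry for {sorted(missing)}")
    # Two labels must not share the same key.
    keys = list(scheme.get("keyboard_shortcuts", {}).values())
    if len(set(keys)) != len(keys):
        problems.append("keyboard_shortcuts: a key is bound to more than one label")
    return problems


# A dict mirroring the image_selection scheme from config.yaml.
labels = [f"Image {i}" for i in range(1, 11)]
scheme = {
    "annotation_type": "radio",
    "name": "image_selection",
    "labels": labels,
    # Digits 1-9 select Images 1-9; "0" selects Image 10, as in config.yaml.
    "keyboard_shortcuts": {
        label: str(i % 10) for i, label in enumerate(labels, start=1)
    },
    "tooltips": {
        label: f"Select if {label} best represents the target word meaning"
        for label in labels
    },
}
print(check_radio_scheme(scheme))  # an empty list means no problems were found
```

A check like this is mainly useful when the number of candidates changes, since a label added in one place but not the others would otherwise go unnoticed until annotation starts.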
Sample data: sample-data.json
[
{
"id": "vwsd_001",
"text": "The bank was eroding rapidly due to the heavy rains last spring.",
"target_word": "bank",
"image_descriptions": "Image 1: A large commercial bank building with glass windows\nImage 2: A muddy riverbank with exposed tree roots and flowing water\nImage 3: A piggy bank sitting on a wooden table\nImage 4: A row of ATM machines outside a building\nImage 5: An eroded hillside near a river with sediment flowing downstream\nImage 6: A bank vault with a heavy steel door\nImage 7: A sandy beach along a lakeshore\nImage 8: A person depositing money at a bank counter\nImage 9: A steep cliff along a stream with visible erosion marks\nImage 10: An online banking application on a smartphone screen"
},
{
"id": "vwsd_002",
"text": "She played a beautiful piece on the organ at the church service.",
"target_word": "organ",
"image_descriptions": "Image 1: A detailed anatomical diagram of a human heart\nImage 2: A large pipe organ inside a Gothic cathedral\nImage 3: A kidney shown in a medical textbook illustration\nImage 4: A small electronic keyboard on a stand\nImage 5: A church interior with wooden pews and stained glass\nImage 6: A grand pipe organ with ornate gold pipes in a concert hall\nImage 7: A diagram of the human digestive system\nImage 8: A musician playing a Hammond organ on stage\nImage 9: An organ donor card next to a stethoscope\nImage 10: A street musician with a barrel organ"
}
]
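Each item packs its ten candidates into a single image_descriptions string, one "Image N: description" entry per line, and the radio labels in config.yaml refer to those same names. The Python sketch below splits that string into a label-to-description mapping and runs a few sanity checks on an item. The helper names (parse_candidates, check_item) are hypothetical, the descriptions in the example item are placeholders, and the check that the target word occurs in the context sentence is an assumption based on the two items shown.

```python
import re

# Pattern for one candidate line, e.g. "Image 3: A piggy bank on a table".
CANDIDATE_RE = re.compile(r"^Image (\d+): (.+)$")

REQUIRED_KEYS = ("id", "text", "target_word", "image_descriptions")


def parse_candidates(image_descriptions: str) -> dict[str, str]:
    """Map each label ("Image 1", "Image 2", ...) to its description text."""
    candidates = {}
    for line in image_descriptions.split("\n"):
        match = CANDIDATE_RE.match(line.strip())
        if match is None:
            raise ValueError(f"Unrecognized candidate line: {line!r}")
        candidates[f"Image {match.group(1)}"] = match.group(2)
    return candidates


def check_item(item: dict, expected: int = 10) -> list[str]:
    """Return a list of problems found in one data item (empty if none)."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in item]
    if problems:
        return problems
    labels = list(parse_candidates(item["image_descriptions"]))
    wanted = [f"Image {i}" for i in range(1, expected + 1)]
    if labels != wanted:
        problems.append(f"expected labels {wanted}, got {labels}")
    # Assumption: the target word appears verbatim in the context sentence.
    if item["target_word"].lower() not in item["text"].lower():
        problems.append("target word does not occur in the context sentence")
    return problems


# Example item; the descriptions are placeholders, not the real sample data.
item = {
    "id": "vwsd_001",
    "text": "The bank was eroding rapidly due to the heavy rains last spring.",
    "target_word": "bank",
    "image_descriptions": "\n".join(
        f"Image {i}: placeholder description {i}" for i in range(1, 11)
    ),
}
print(check_item(item))  # an empty list means no problems were found
```

The mapping returned by parse_candidates can also be used after annotation to look up the description behind a selected label such as "Image 2".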
// ... and 8 more items
Get this design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2023/task01-visual-wsd
potato start config.yaml
Related designs
ADMIRE - Multimodal Idiomaticity Recognition
Multimodal idiomaticity detection task requiring annotators to identify whether expressions are used idiomatically or literally, with supporting cue analysis. Based on SemEval-2025 Task 1 (ADMIRE).
MAMI - Multimedia Automatic Misogyny Identification
Detection and fine-grained classification of misogynistic content in memes, combining text and image description analysis. Sub-types include stereotyping, shaming, objectification, and violence. Based on SemEval-2022 Task 5 (Fersini et al.).
AfriSenti - African Language Sentiment
Sentiment analysis for tweets in African languages, classifying text as positive, negative, or neutral. Covers 14 African languages including Amharic, Hausa, Igbo, Yoruba, and Swahili. Based on SemEval-2023 Task 12 (Muhammad et al.).