Conversation Quality Attributes
Dialogue quality assessment based on controllable dialogue generation research (See et al., NAACL 2019). Annotators classify each conversation turn's engagement level, rate its overall quality on a 5-point scale, and select the dialogue attributes that apply.
Configuration File: config.yaml
# Conversation Quality Attributes
# Based on See et al., NAACL 2019
# Paper: https://aclanthology.org/N19-1170/
# Dataset: https://parl.ai/projects/controllable_dialogue/
#
# This task evaluates the quality of dialogue turns by assessing engagement,
# overall quality, and specific conversation attributes. Based on research
# into controllable dialogue generation and what makes conversations engaging.
#
# Engagement Labels:
# - Engaging: The conversation holds attention and invites further interaction
# - Boring: The conversation is dull, repetitive, or uninteresting
# - Confusing: The conversation is difficult to follow or understand
# - Offensive: The conversation contains inappropriate or harmful content
#
# Conversation Attributes:
# - Interesting: Contains novel or intriguing content
# - Repetitive: Repeats the same ideas or phrases
# - Coherent: Logically connected and easy to follow
# - Informative: Provides useful information or knowledge
# - Empathetic: Shows understanding of emotions and feelings
# - Generic: Uses vague, non-specific, or boilerplate responses
#
# Annotation Guidelines:
# 1. Read the dialogue turn carefully
# 2. Note the speaker context
# 3. Classify the engagement level
# 4. Rate overall conversation quality
# 5. Select all applicable conversation attributes
annotation_task_name: "Conversation Quality Attributes"
task_dir: "."
data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"
output_annotation_dir: "annotation_output/"
output_annotation_format: "json"
port: 8000
server_name: localhost
annotation_schemes:
  - annotation_type: radio
    name: engagement
    description: "How engaging is this conversation turn?"
    labels:
      - "Engaging"
      - "Boring"
      - "Confusing"
      - "Offensive"
    keyboard_shortcuts:
      "Engaging": "1"
      "Boring": "2"
      "Confusing": "3"
      "Offensive": "4"
    tooltips:
      "Engaging": "The conversation holds attention and invites further interaction"
      "Boring": "The conversation is dull, repetitive, or uninteresting"
      "Confusing": "The conversation is difficult to follow or understand"
      "Offensive": "The conversation contains inappropriate or harmful content"
  - annotation_type: likert
    name: overall_quality
    description: "Rate the overall quality of this conversation turn"
    min_label: "Very Poor"
    max_label: "Excellent"
    size: 5
  - annotation_type: multiselect
    name: conversation_attributes
    description: "Which attributes describe this conversation turn?"
    labels:
      - "Interesting"
      - "Repetitive"
      - "Coherent"
      - "Informative"
      - "Empathetic"
      - "Generic"
    tooltips:
      "Interesting": "Contains novel, intriguing, or thought-provoking content"
      "Repetitive": "Repeats the same ideas, phrases, or patterns"
      "Coherent": "Logically connected and easy to follow in the conversation context"
      "Informative": "Provides useful information, facts, or knowledge"
      "Empathetic": "Shows understanding of or sensitivity to emotions and feelings"
      "Generic": "Uses vague, non-specific, or boilerplate responses"
annotation_instructions: |
  You will be shown a dialogue turn from a conversation, along with the speaker.
  1. Read the dialogue turn carefully.
  2. Classify the engagement level: Engaging, Boring, Confusing, or Offensive.
  3. Rate the overall quality on a 5-point scale from Very Poor to Excellent.
  4. Select all attributes that describe this conversation turn.
html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #fef3c7; border: 1px solid #fde68a; border-radius: 8px; padding: 8px 12px; margin-bottom: 12px; display: inline-block;">
      <strong style="color: #92400e;">Speaker:</strong>
      <span style="color: #78350f;">{{speaker}}</span>
    </div>
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Dialogue:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0; white-space: pre-line;">{{text}}</p>
    </div>
  </div>
allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
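Because annotation_per_instance is 2, every item is labeled by two annotators, so agreement on the radio engagement scheme can be summarized with Cohen's kappa. A minimal stdlib sketch follows; the two label lists are illustrative placeholders, not real annotation output:

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items where the two labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label marginals.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[l] * freq_b[l]
                   for l in set(labels_a) | set(labels_b)) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical engagement labels from two annotators on five shared items.
ann1 = ["Engaging", "Boring", "Engaging", "Confusing", "Engaging"]
ann2 = ["Engaging", "Boring", "Engaging", "Engaging", "Engaging"]
print(round(cohen_kappa(ann1, ann2), 3))  # → 0.583
```

Values above roughly 0.6 are conventionally read as substantial agreement; lower values suggest the engagement guidelines need tightening before scaling up.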
Sample Data: sample-data.json
[
  {
    "id": "convqa_001",
    "text": "User: I just got back from a trip to Japan. It was amazing!\nBot: That sounds wonderful! What was the highlight of your trip? I have always been fascinated by Japanese culture, especially the blend of ancient traditions and modern technology.",
    "speaker": "Chatbot"
  },
  {
    "id": "convqa_002",
    "text": "User: I am feeling really stressed about my exams next week.\nBot: I understand that exams can be stressful. Have you tried breaking your study sessions into shorter blocks with breaks in between? Many students find the Pomodoro technique helpful for managing study anxiety.",
    "speaker": "Chatbot"
  }
]
// ... and 8 more items

Get This Design
Clone or download from the repository
Quick start:
git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/dialogue/conversation-quality-attributes
potato start config.yaml
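After cloning, it can help to sanity-check a data file against the key names declared in item_properties, plus the "speaker" field that the html_layout's {{speaker}} placeholder expects. A minimal sketch; the validate_items helper is hypothetical, not part of Potato, and the inline items stand in for json.load on the real sample-data.json:

```python
def validate_items(items, id_key="id", text_key="text"):
    """Return a list of problems: missing required keys or duplicate ids."""
    problems = []
    seen = set()
    for i, item in enumerate(items):
        # id_key/text_key come from item_properties; "speaker" is used by html_layout.
        for key in (id_key, text_key, "speaker"):
            if key not in item:
                problems.append(f"item {i}: missing '{key}'")
        item_id = item.get(id_key)
        if item_id in seen:
            problems.append(f"item {i}: duplicate id '{item_id}'")
        seen.add(item_id)
    return problems

# Inline stand-ins; in practice: items = json.load(open("sample-data.json"))
items = [
    {"id": "convqa_001", "text": "User: hi", "speaker": "Chatbot"},
    {"id": "convqa_001", "speaker": "Chatbot"},  # duplicate id, no text
]
print(validate_items(items))
```

An empty list means the file is safe to serve; anything else should be fixed before annotators see broken or duplicated instances.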
Found an issue or want to improve this design?
Open an Issue

Related Designs
AnnoMI Counselling Dialogue Annotation
Annotation of motivational interviewing counselling dialogues based on the AnnoMI dataset. Annotators label therapist and client utterances for MI techniques (open questions, reflections, affirmations) and client change talk (sustain talk, change talk), with quality ratings for therapeutic interactions.
Clickbait Detection (Webis Clickbait Corpus)
Classify headlines and social media posts as clickbait or non-clickbait based on the Webis Clickbait Corpus. Identify manipulative content designed to attract clicks through sensationalism, curiosity gaps, or misleading framing.
Deceptive Review Detection
Distinguish truthful from deceptive (fake) reviews, based on Ott et al., ACL 2011. Identify reviews written to deceive vs. genuine customer experiences.