Argument Quality Assessment

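Multi-dimensional argument quality annotation based on the Wachsmuth et al. (2017) taxonomy. Annotators rate each argument on a 5-point scale for overall quality and for three dimensions: Cogency (logical validity), Effectiveness (persuasive power), and Reasonableness (contribution to resolving the issue). They can also flag specific quality issues such as logical fallacies or missing evidence. The taxonomy is used in the Dagstuhl-ArgQuality and GAQCorpus datasets.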

Configuration File: config.yaml

# Argument Quality Assessment
# Based on Wachsmuth et al., 2017 taxonomy
# Paper: https://aclanthology.org/2020.coling-main.402/
#
# Three main quality dimensions:
#
# 1. COGENCY (Logic) - Is the reasoning valid?
#    - Local Acceptability: Are premises believable?
#    - Local Relevance: Do premises support the conclusion?
#    - Local Sufficiency: Is there enough support?
#
# 2. EFFECTIVENESS (Rhetoric) - Is it persuasive?
#    - Credibility: Does the author seem trustworthy?
#    - Emotional Appeal: Does it engage emotions appropriately?
#    - Clarity: Is the argument easy to understand?
#
# 3. REASONABLENESS (Dialectic) - Does it contribute to resolution?
#    - Global Acceptability: Are claims defensible?
#    - Global Relevance: Does it address the issue?
#    - Global Sufficiency: Does it adequately resolve the issue?
#
# Note: Correlations with overall quality - Cogency (.84), Effectiveness (.81), Reasonableness (.86)

annotation_task_name: "Argument Quality Assessment"
task_dir: "."

data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "argument"

context_key: topic

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

annotation_schemes:
  # Overall Quality Rating
  - annotation_type: radio
    name: overall_quality
    description: "Rate the OVERALL quality of this argument (1=very poor, 5=excellent)"
    labels:
      - "1 - Very Poor"
      - "2 - Poor"
      - "3 - Average"
      - "4 - Good"
      - "5 - Excellent"
    keyboard_shortcuts:
      "1 - Very Poor": "1"
      "2 - Poor": "2"
      "3 - Average": "3"
      "4 - Good": "4"
      "5 - Excellent": "5"
    tooltips:
      "1 - Very Poor": "Fails on most quality dimensions; not convincing at all"
      "2 - Poor": "Significant weaknesses in logic, persuasion, or relevance"
      "3 - Average": "Adequate argument with some strengths and weaknesses"
      "4 - Good": "Strong argument with minor issues"
      "5 - Excellent": "Highly convincing, well-reasoned, and relevant"

  # Dimension 1: Cogency (Logic)
  - annotation_type: radio
    name: cogency
    description: "COGENCY: Is the argument logically valid? Are the premises acceptable, relevant, and sufficient?"
    labels:
      - "1 - Not cogent"
      - "2 - Slightly cogent"
      - "3 - Moderately cogent"
      - "4 - Quite cogent"
      - "5 - Very cogent"
    keyboard_shortcuts:
      "1 - Not cogent": "q"
      "2 - Slightly cogent": "w"
      "3 - Moderately cogent": "e"
      "4 - Quite cogent": "r"
      "5 - Very cogent": "t"
    tooltips:
      "1 - Not cogent": "Premises are false, irrelevant, or completely insufficient"
      "2 - Slightly cogent": "Major logical flaws; premises barely support conclusion"
      "3 - Moderately cogent": "Some logical issues but reasoning is followable"
      "4 - Quite cogent": "Sound reasoning with minor gaps"
      "5 - Very cogent": "Excellent logic; premises clearly and fully support conclusion"

  # Dimension 2: Effectiveness (Rhetoric)
  - annotation_type: radio
    name: effectiveness
    description: "EFFECTIVENESS: Is the argument persuasive? Does it establish credibility and engage the audience?"
    labels:
      - "1 - Not effective"
      - "2 - Slightly effective"
      - "3 - Moderately effective"
      - "4 - Quite effective"
      - "5 - Very effective"
    keyboard_shortcuts:
      "1 - Not effective": "a"
      "2 - Slightly effective": "s"
      "3 - Moderately effective": "d"
      "4 - Quite effective": "f"
      "5 - Very effective": "g"
    tooltips:
      "1 - Not effective": "Unpersuasive; poor clarity, no credibility, inappropriate tone"
      "2 - Slightly effective": "Weak persuasive appeal; hard to follow or unconvincing"
      "3 - Moderately effective": "Somewhat persuasive but could be more compelling"
      "4 - Quite effective": "Persuasive with good clarity and appropriate appeal"
      "5 - Very effective": "Highly persuasive; clear, credible, and engaging"

  # Dimension 3: Reasonableness (Dialectic)
  - annotation_type: radio
    name: reasonableness
    description: "REASONABLENESS: Does the argument contribute to resolving the issue? Is it globally relevant and acceptable?"
    labels:
      - "1 - Not reasonable"
      - "2 - Slightly reasonable"
      - "3 - Moderately reasonable"
      - "4 - Quite reasonable"
      - "5 - Very reasonable"
    keyboard_shortcuts:
      "1 - Not reasonable": "z"
      "2 - Slightly reasonable": "x"
      "3 - Moderately reasonable": "c"
      "4 - Quite reasonable": "v"
      "5 - Very reasonable": "b"
    tooltips:
      "1 - Not reasonable": "Does not address the issue; claims are indefensible"
      "2 - Slightly reasonable": "Tangential to the issue; weak contribution"
      "3 - Moderately reasonable": "Addresses the issue but doesn't fully resolve it"
      "4 - Quite reasonable": "Relevant contribution that advances the discussion"
      "5 - Very reasonable": "Directly addresses and substantially resolves the issue"

  # Specific quality issues (optional detailed feedback)
  - annotation_type: multiselect
    name: quality_issues
    description: "Select any specific quality issues present in this argument"
    labels:
      - "Factual errors"
      - "Logical fallacy"
      - "Missing evidence"
      - "Unclear reasoning"
      - "Off-topic"
      - "Ad hominem attack"
      - "Emotional manipulation"
      - "Overgeneralization"
      - "No issues detected"
    tooltips:
      "Factual errors": "Contains false or unverifiable claims"
      "Logical fallacy": "Contains identifiable reasoning error (e.g., straw man, false dichotomy)"
      "Missing evidence": "Makes claims without supporting evidence"
      "Unclear reasoning": "Hard to follow the logical flow"
      "Off-topic": "Does not address the actual topic/question"
      "Ad hominem attack": "Attacks person rather than argument"
      "Emotional manipulation": "Uses fear, anger, or other emotions inappropriately"
      "Overgeneralization": "Makes sweeping claims from limited evidence"
      "No issues detected": "No significant quality issues identified"

allow_all_users: true
instances_per_annotator: 100
annotation_per_instance: 3
allow_skip: true
skip_reason_required: false
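
Because every rating label starts with its numeric score (for example "4 - Good") and the config collects three annotations per instance, the ratings can be turned into numbers and averaged per argument. The following is a minimal sketch of that idea, not part of the design itself. The record layout (`id`, `annotator`, `ratings`) is a hypothetical structure chosen for illustration; the actual files written to annotation_output/ may be organized differently and would need to be mapped into this shape first.

```python
from collections import defaultdict
from statistics import mean


def label_to_score(label: str) -> int:
    """Parse the leading integer from a label such as '4 - Good'."""
    return int(label.split(" - ", 1)[0])


def aggregate(records):
    """Average each rating scheme per item across all annotators."""
    by_item = defaultdict(lambda: defaultdict(list))
    for record in records:
        for scheme, label in record["ratings"].items():
            by_item[record["id"]][scheme].append(label_to_score(label))
    return {
        item_id: {scheme: mean(scores) for scheme, scores in schemes.items()}
        for item_id, schemes in by_item.items()
    }


# Hypothetical records: three annotators rating the same argument.
# The scheme names and labels match config.yaml; the layout is assumed.
records = [
    {"id": "arg_002", "annotator": "a1",
     "ratings": {"overall_quality": "2 - Poor", "cogency": "2 - Slightly cogent"}},
    {"id": "arg_002", "annotator": "a2",
     "ratings": {"overall_quality": "1 - Very Poor", "cogency": "1 - Not cogent"}},
    {"id": "arg_002", "annotator": "a3",
     "ratings": {"overall_quality": "3 - Average", "cogency": "3 - Moderately cogent"}},
]

summary = aggregate(records)
print(summary)
```

A plain mean treats the 5-point scale as interval data. If that assumption is too strong for a given study, the same structure works with a median or a majority vote in place of `mean`.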

Sample Data: sample-data.json

[
  {
    "id": "arg_001",
    "topic": "Should college education be free for all students?",
    "argument": "College education should be free because education is a right, not a privilege. Countries like Germany and Norway offer free university education and have thriving economies. Making college free would reduce student debt, increase social mobility, and create a more educated workforce that benefits everyone."
  },
  {
    "id": "arg_002",
    "topic": "Should college education be free for all students?",
    "argument": "Free college is a terrible idea. Nothing in life is free - someone has to pay for it. Why should hardworking taxpayers subsidize people who want to study useless degrees? These students are just lazy and want handouts."
  }
]

// ... and 8 more items
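
The config reads each item's identifier, text, and context from the `id`, `argument`, and `topic` keys. Before starting an annotation run, it can help to confirm that every item carries all three and that identifiers are unique. The sketch below is one way to do that; the helper name `find_problems` is hypothetical, and the inline items are shortened stand-ins for the real file contents.

```python
import json

# Keys named in config.yaml: id_key, text_key, and context_key.
REQUIRED_KEYS = ("id", "argument", "topic")


def find_problems(items):
    """Return a list of human-readable problems found in the data items."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        for key in REQUIRED_KEYS:
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: missing or empty '{key}'")
        item_id = item.get("id")
        if item_id is not None:
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id '{item_id}'")
            seen_ids.add(item_id)
    return problems


# Shortened stand-ins for the entries in sample-data.json.
sample = json.loads("""
[
  {"id": "arg_001",
   "topic": "Should college education be free for all students?",
   "argument": "College education should be free because education is a right."},
  {"id": "arg_002",
   "topic": "Should college education be free for all students?",
   "argument": "Free college is a terrible idea."}
]
""")

for problem in find_problems(sample):
    print(problem)
```

To check the real file, replace the inline string with `json.load(open("sample-data.json"))`; an empty result means every item has the fields the config expects.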

Get This Design

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/text/argumentation-stance/argument-quality
potato start config.yaml

Details

Annotation Types

multiselect, radio

Domain

NLP, Argumentation

Use Cases

Argument Mining, Debate Analysis, Writing Evaluation

Related Designs

ArgSciChat Scientific Argumentation Dialogue

Annotation of argumentative dialogues about scientific papers based on the ArgSciChat dataset. Annotators label dialogue turns for argument components (claim, evidence, rebuttal) and assess argument quality dimensions such as clarity, relevance, and persuasiveness.

multiselect, radio

ValueEval - Human Values behind Arguments

Identification of human values expressed in arguments, classifying which of the Schwartz basic values an argument appeals to and whether it attains or constrains those values. Based on SemEval-2023 Task 4 (Kiesel et al.).

multiselect, radio

DiaSafety Dialogue Safety Annotation

Safety taxonomy annotation for dialogue systems based on the DiaSafety framework. Annotators classify dialogue turns for safety issues across 6 categories: offending user, risk ignorance, unauthorized expertise, toxicity generation, bias, and privacy violations.

radio, multiselect
