Machine-Generated Text Detection

A machine-generated text detection task in which annotators classify a given text as written by a human, generated by a machine, of mixed authorship, or of uncertain origin. Based on SemEval-2024 Task 8.

Configuration file: config.yaml

# Machine-Generated Text Detection
# Based on Wang et al., SemEval 2024
# Paper: https://aclanthology.org/volumes/2024.semeval-1/
# Dataset: https://github.com/SemEval/semeval-2024-task8
#
# This task asks annotators to determine whether a text was written by
# a human, generated by a machine (LLM), contains mixed human-machine
# authorship, or if the origin is uncertain.

annotation_task_name: "Machine-Generated Text Detection"
task_dir: "."

data_files:
  - sample-data.json

item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

port: 8000
server_name: localhost

annotation_schemes:
  - annotation_type: radio
    name: authorship_judgment
    description: "Who wrote this text?"
    labels:
      - "Human-Written"
      - "Machine-Generated"
      - "Mixed"
      - "Uncertain"
    keyboard_shortcuts:
      "Human-Written": "1"
      "Machine-Generated": "2"
      "Mixed": "3"
      "Uncertain": "4"
    tooltips:
      "Human-Written": "The text appears to be entirely written by a human author"
      "Machine-Generated": "The text appears to be entirely generated by an AI/LLM"
      "Mixed": "The text appears to contain both human-written and machine-generated portions"
      "Uncertain": "You cannot determine the authorship with reasonable confidence"

annotation_instructions: |
  You will be shown a text passage. Your task is to determine whether it was:
  1. Written entirely by a human
  2. Generated entirely by an AI language model
  3. A mixture of human and machine writing
  4. Uncertain (you cannot determine the origin)
  Consider factors like fluency, coherence, specificity, style, and any telltale patterns.

html_layout: |
  <div style="padding: 15px; max-width: 800px; margin: auto;">
    <div style="background: #f0f9ff; border: 1px solid #bae6fd; border-radius: 8px; padding: 16px; margin-bottom: 16px;">
      <strong style="color: #0369a1;">Text:</strong>
      <p style="font-size: 16px; line-height: 1.7; margin: 8px 0 0 0;">{{text}}</p>
    </div>
    <div style="background: #f0fdf4; border: 1px solid #bbf7d0; border-radius: 8px; padding: 12px;">
      <strong style="color: #166534;">Domain:</strong> <span>{{domain}}</span>
    </div>
  </div>

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
allow_skip: true
skip_reason_required: false
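
Because the configuration requests two annotations per instance over four labels, one natural follow-up is to measure how often the two annotators agree. The following is a minimal sketch of Cohen's kappa in Python. It assumes the labels have already been extracted into two parallel lists (one per annotator, aligned by item); the function name `cohen_kappa` and the example labels are illustrative only, and the layout of the tool's output files is not assumed here.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(first)
    # Observed agreement: share of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(first), Counter(second)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical judgments from two annotators on the same four items.
annotator_a = ["Human-Written", "Machine-Generated", "Mixed", "Human-Written"]
annotator_b = ["Human-Written", "Machine-Generated", "Uncertain", "Machine-Generated"]
kappa = cohen_kappa(annotator_a, annotator_b)
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because "Uncertain" and "Mixed" are likely to be chosen far less often than the other two labels.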

Sample data: sample-data.json

[
  {
    "id": "mgt_001",
    "text": "The old barn stood at the edge of the property, its red paint peeling in long strips that curled like dried leaves. My grandfather built it in 1952 with lumber from the Hendersons' sawmill, back when a handshake still meant something. Every summer, I'd climb to the hayloft and read comics until my mother hollered that supper was ready.",
    "domain": "Creative Writing"
  },
  {
    "id": "mgt_002",
    "text": "Photosynthesis is a fundamental biological process through which plants, algae, and certain bacteria convert light energy into chemical energy. This process involves the absorption of carbon dioxide and water, which are then transformed into glucose and oxygen through a series of complex biochemical reactions. The light-dependent reactions occur in the thylakoid membranes, while the Calvin cycle takes place in the stroma of the chloroplast.",
    "domain": "Science"
  }
]

// ... and 8 more items
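
The configuration reads `id` and `text` through `item_properties`, and the HTML layout also fills in `{{domain}}`, so every item needs all three fields. The sketch below shows one way to check a data file before starting the server. The helper name `check_items` and the two placeholder items are hypothetical and are not part of the showcase repository.

```python
import json


def check_items(items, id_key="id", text_key="text", extra_keys=("domain",)):
    """Return a list of problems found in the items (empty if none)."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        # Every required field must be a non-empty string.
        for key in (id_key, text_key, *extra_keys):
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: missing or empty '{key}'")
        # IDs must be unique so annotations can be matched back to items.
        item_id = item.get(id_key)
        if item_id is not None:
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id '{item_id}'")
            seen_ids.add(item_id)
    return problems


# Placeholder items with the same shape as sample-data.json (texts shortened).
sample = json.loads("""
[
  {"id": "mgt_001", "text": "placeholder text one", "domain": "Creative Writing"},
  {"id": "mgt_002", "text": "placeholder text two", "domain": "Science"}
]
""")
print(check_items(sample))
```

To check the real file, the `json.loads` call could be replaced with `json.load` on an open handle to `sample-data.json`.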

Get this design

View on GitHub

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/semeval/2024/task08-machine-generated-text
potato start config.yaml

Details

Annotation type

radio

Domain

SemEval, NLP, AI Detection, Text Classification

Use cases

AI Text Detection, Authorship Attribution, Content Authenticity

Related designs

LLMs4Subjects - Automated Subject Tagging

Suggestion Mining from Online Reviews and Forums

ADMIRE - Multimodal Idiomaticity Recognition