Error Span

Build MQM-style error annotation interfaces in Potato for translation quality evaluation, text error marking, and typed error span annotation with severity scores.

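The error span annotation schema provides an MQM-style (Multidimensional Quality Metrics) interface for marking errors in text with typed categories and severity levels. It is suited to translation quality evaluation, copy-editing review, content quality assessment, and any task that requires fine-grained error annotation.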

Figure: Potato's error span interface, marking translation errors with severity scores.

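Overview

The error span schema provides:

Typed error categories with optional subtypes for detailed classification
Severity levels with configurable score penalties
A real-time quality score that decreases as errors are marked
Color-coded spans that visually distinguish error type and severity

Annotators select a text span and assign an error type and severity; the system then computes the quality score automatically.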

Quick Start

yaml

annotation_schemes:
  - annotation_type: error_span
    name: translation_errors
    description: Mark all errors in the translation below.
    error_types:
      - name: Accuracy
      - name: Fluency
      - name: Terminology
    show_score: true
    max_score: 100

Configuration Options

Field	Type	Default	Description
`annotation_type`	string	required	Must be `"error_span"`
`name`	string	required	Unique identifier for this schema
`description`	string	required	Instructions shown to annotators
`error_types`	array	required	List of error type objects, each with a `name` and an optional `subtypes` array
`severities`	array	`[{name: "Minor", weight: -1}, {name: "Major", weight: -5}, {name: "Critical", weight: -10}]`	List of severity levels, each with a `name` and a `weight` (score penalty)
`show_score`	boolean	`true`	Whether to display the real-time quality score
`max_score`	integer	`100`	Starting quality score before penalties are applied
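To make the required fields and defaults in the table concrete, here is a minimal Python sketch that checks a scheme and fills in omitted options. The helper `normalize_error_span_scheme` is hypothetical and is not part of Potato's API; it only mirrors the table above, and Potato's own validation may behave differently.

```python
from copy import deepcopy

# Defaults copied from the configuration table above.
DEFAULT_SEVERITIES = [
    {"name": "Minor", "weight": -1},
    {"name": "Major", "weight": -5},
    {"name": "Critical", "weight": -10},
]
REQUIRED_FIELDS = ("annotation_type", "name", "description", "error_types")


def normalize_error_span_scheme(scheme: dict) -> dict:
    """Hypothetical helper: validate required fields and fill in defaults."""
    missing = [field for field in REQUIRED_FIELDS if field not in scheme]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if scheme["annotation_type"] != "error_span":
        raise ValueError('annotation_type must be "error_span"')

    normalized = deepcopy(scheme)  # leave the caller's dict untouched
    normalized.setdefault("severities", deepcopy(DEFAULT_SEVERITIES))
    normalized.setdefault("show_score", True)
    normalized.setdefault("max_score", 100)
    return normalized


# The Quick Start scheme, with the optional fields left out.
quick_start = {
    "annotation_type": "error_span",
    "name": "translation_errors",
    "description": "Mark all errors in the translation below.",
    "error_types": [{"name": "Accuracy"}, {"name": "Fluency"}, {"name": "Terminology"}],
}
resolved = normalize_error_span_scheme(quick_start)
print([s["name"] for s in resolved["severities"]])  # ['Minor', 'Major', 'Critical']
```

Copying the input before applying defaults keeps the original scheme unchanged, which makes it easier to compare what was written against what was resolved.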

Examples

Translation Quality (MQM)

yaml

annotation_schemes:
  - annotation_type: error_span
    name: mqm_errors
    description: >
      Mark all errors in the machine translation.
      Select the error span, choose a category and severity.
    error_types:
      - name: Accuracy
        subtypes:
          - Mistranslation
          - Addition
          - Omission
          - Untranslated
      - name: Fluency
        subtypes:
          - Grammar
          - Spelling
          - Punctuation
          - Register
      - name: Terminology
        subtypes:
          - Inconsistent
          - Wrong Term
      - name: Style
    severities:
      - name: Minor
        weight: -1
      - name: Major
        weight: -5
      - name: Critical
        weight: -10
    show_score: true
    max_score: 100

Content Editing Review

yaml

annotation_schemes:
  - annotation_type: error_span
    name: editing_errors
    description: Mark all issues that need editing in this article.
    error_types:
      - name: Factual Error
      - name: Grammar
        subtypes:
          - Subject-Verb Agreement
          - Tense
          - Pronoun Reference
      - name: Style
        subtypes:
          - Wordiness
          - Passive Voice
          - Jargon
      - name: Formatting
    severities:
      - name: Suggestion
        weight: -1
      - name: Required Fix
        weight: -5
    show_score: false

Code Review Annotation

yaml

annotation_schemes:
  - annotation_type: error_span
    name: code_errors
    description: Mark issues in this code snippet.
    error_types:
      - name: Bug
        subtypes:
          - Logic Error
          - Off-by-One
          - Null Reference
      - name: Style
        subtypes:
          - Naming
          - Formatting
      - name: Security
        subtypes:
          - Injection
          - Exposure
      - name: Performance
    severities:
      - name: Nitpick
        weight: -1
      - name: Warning
        weight: -3
      - name: Blocker
        weight: -10
    max_score: 100
    show_score: true

Output Format

json

{
  "translation_errors": {
    "labels": {
      "errors": [
        {
          "start": 12,
          "end": 25,
          "text": "incorrectly translated",
          "error_type": "Accuracy",
          "subtype": "Mistranslation",
          "severity": "Major"
        },
        {
          "start": 45,
          "end": 52,
          "text": "the the",
          "error_type": "Fluency",
          "subtype": "Grammar",
          "severity": "Minor"
        }
      ],
      "score": 94
    }
  }
}
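Output in this shape can be read back with standard JSON tooling. The sketch below tallies errors by type and severity; `tally_errors` is a hypothetical helper, and it assumes the output keeps the layout shown above (scheme name, then `labels`, then `errors`).

```python
import json
from collections import Counter

# Sample output copied from the JSON above.
SAMPLE_OUTPUT = """
{
  "translation_errors": {
    "labels": {
      "errors": [
        {"start": 12, "end": 25, "text": "incorrectly translated",
         "error_type": "Accuracy", "subtype": "Mistranslation", "severity": "Major"},
        {"start": 45, "end": 52, "text": "the the",
         "error_type": "Fluency", "subtype": "Grammar", "severity": "Minor"}
      ],
      "score": 94
    }
  }
}
"""


def tally_errors(output: dict, scheme_name: str) -> Counter:
    """Hypothetical helper: count errors per (error_type, severity) pair."""
    errors = output[scheme_name]["labels"]["errors"]
    return Counter((e["error_type"], e["severity"]) for e in errors)


counts = tally_errors(json.loads(SAMPLE_OUTPUT), "translation_errors")
for (error_type, severity), n in sorted(counts.items()):
    print(f"{error_type:<10} {severity:<8} {n}")
```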

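The score is computed by adding the sum of all severity weights to max_score. Because the weights are negative, each marked error lowers the score; in the example above, 100 + (-5) + (-1) = 94.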
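The scoring rule can be sketched in a few lines of Python. The function `quality_score` is hypothetical, not Potato's implementation. It assumes that every error's `severity` matches a configured severity name and that the score is not clamped at zero, since the text above does not say either way.

```python
def quality_score(errors, severities, max_score=100):
    """Hypothetical sketch: max_score plus the sum of the (negative) severity weights."""
    weight_by_name = {s["name"]: s["weight"] for s in severities}
    # Assumption: no lower bound is applied, so many errors can push the score below 0.
    return max_score + sum(weight_by_name[e["severity"]] for e in errors)


# Default severities from the configuration table.
severities = [
    {"name": "Minor", "weight": -1},
    {"name": "Major", "weight": -5},
    {"name": "Critical", "weight": -10},
]
# The two errors from the sample output: one Major, one Minor.
errors = [{"severity": "Major"}, {"severity": "Minor"}]

score = quality_score(errors, severities)
print(score)  # 94
```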

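Best Practices

Define clear boundaries between error types - annotators should not have to deliberate over which of two error types applies; provide examples in the description
Use subtypes for granularity - top-level types keep the interface simple, while subtypes enable detailed analysis when needed
Calibrate severity weights carefully - the weight ratios should reflect real-world impact; a critical error should cost meaningfully more than a minor one
Set max_score according to text length - for short texts, a lower max_score keeps a single error from having an outsized effect
Provide annotation guidelines - MQM-style annotation benefits greatly from detailed guidelines with examples for each error type and severity

Further Reading

Span Annotation - general span labeling
Span Linking - connecting related spans
Quality Control - attention checks and gold standards

For implementation details, see the source documentation.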