Solo Mode

Label entire datasets with a single annotator who collaborates with an LLM through an intelligent 12-phase workflow.

New in v2.3.0

Traditional annotation projects require multiple annotators, inter-annotator agreement computation, adjudication rounds, and significant coordination overhead. For many research teams, this is the primary bottleneck: not the annotation interface, but the logistics of hiring, training, and managing a team.

Solo Mode replaces the multi-annotator paradigm with a single human expert collaborating with an LLM. The human provides high-quality labels on a small, strategically selected subset. The LLM learns from those labels and proposes labels for the rest, and the human reviews only the cases where the LLM is uncertain or likely to be wrong. A 12-phase workflow orchestrates this process automatically.

In internal benchmarks, Solo Mode achieved 95%+ agreement with full multi-annotator pipelines while requiring only 10-15% of the total human labels.

12-Phase Workflow

Solo Mode progresses through 12 phases. The system advances automatically based on configurable thresholds, although you can also trigger transitions manually from the admin dashboard.

Phase 1: Seed Annotation

The human annotator labels an initial seed set. Potato selects diverse, representative instances using embedding-based clustering to maximize coverage of the data distribution.

Default seed size: 50 instances (configurable via seed_count)
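
The idea of picking a diverse seed set can be illustrated with a minimal sketch. It assumes each instance already has an embedding vector, and it uses greedy farthest-point selection as a simple stand-in for clustering; the method Potato actually uses is not specified here, and `select_diverse_seeds` is a hypothetical helper, not part of Potato's API.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_diverse_seeds(embeddings, seed_count):
    """Greedy farthest-point selection (hypothetical helper, not Potato's API).

    Starts from index 0, then repeatedly adds the unselected point whose
    distance to its nearest already-selected point is largest.
    """
    n = len(embeddings)
    if n == 0 or seed_count <= 0:
        return []
    selected = [0]
    # Distance from every point to its nearest selected point so far.
    nearest = [euclidean(e, embeddings[0]) for e in embeddings]
    while len(selected) < min(seed_count, n):
        chosen = set(selected)
        candidate = max((i for i in range(n) if i not in chosen),
                        key=lambda i: nearest[i])
        selected.append(candidate)
        for i, e in enumerate(embeddings):
            nearest[i] = min(nearest[i], euclidean(e, embeddings[candidate]))
    return selected

# Toy 2-D "embeddings": two tight groups and one outlier.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
seeds = select_diverse_seeds(points, 3)
print(seeds)  # -> [0, 4, 2]
```

The selection skips near-duplicates such as index 1 and index 3, which is the behavior a diversity-based seed strategy is meant to provide.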

Phase 2: Initial LLM Calibration

The LLM receives the seed annotations as few-shot examples and labels a calibration batch. Potato compares the LLM predictions against held-out seed labels to establish a baseline accuracy.
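
A rough sketch of the held-out comparison, assuming the seed set is split once into few-shot examples and a held-out portion. The function names and the `predict` callback (a stand-in for an LLM call) are hypothetical; the 0.2 default mirrors the `holdout_fraction` value shown in the configuration reference.

```python
import random

def split_seed(seed_items, holdout_fraction=0.2, rng_seed=0):
    """Shuffle labeled seed items and split into (few_shot, holdout)."""
    items = list(seed_items)
    random.Random(rng_seed).shuffle(items)
    n_holdout = max(1, round(len(items) * holdout_fraction)) if items else 0
    return items[n_holdout:], items[:n_holdout]

def baseline_accuracy(holdout, predict):
    """Fraction of held-out items where predict(text) equals the human label."""
    if not holdout:
        return 0.0
    return sum(predict(text) == label for text, label in holdout) / len(holdout)

# 50 toy seed annotations as (text, label) pairs.
seed = [(f"review {i}", "Positive" if i % 2 else "Negative") for i in range(50)]
few_shot, holdout = split_seed(seed)

# Stand-in for an LLM call: always predicts "Positive".
accuracy = baseline_accuracy(holdout, lambda text: "Positive")
print(len(few_shot), len(holdout))  # -> 40 10
```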

Phase 3: Confusion Analysis

Potato identifies systematic disagreement patterns between the human and the LLM. It builds a confusion matrix and surfaces the most common error types (e.g., "the LLM labels neutral as positive 40% of the time").
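
As an illustration of how confusion pairs can be surfaced, the sketch below counts (human label, LLM label) disagreements over paired label lists. It is a simplified example with made-up data, and `top_confusion_pairs` is a hypothetical name rather than a Potato function.

```python
from collections import Counter

def top_confusion_pairs(human_labels, llm_labels, k=3):
    """Count (human label -> LLM label) disagreements, most frequent first.

    Returns tuples of (human_label, llm_label, count, share_of_all_items).
    """
    if len(human_labels) != len(llm_labels):
        raise ValueError("label lists must be the same length")
    pairs = Counter(
        (h, m) for h, m in zip(human_labels, llm_labels) if h != m
    )
    total = len(human_labels)
    return [(h, m, n, n / total) for (h, m), n in pairs.most_common(k)]

human = ["neutral", "neutral", "positive", "negative", "neutral", "negative"]
llm = ["positive", "positive", "positive", "neutral", "neutral", "negative"]

for h, m, n, share in top_confusion_pairs(human, llm):
    print(f"{h} -> {m}: {n} instances ({share:.1%})")
```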

Phase 4: Guideline Refinement

Based on the confusion analysis, Potato generates refined annotation guidelines for the LLM. The human reviews and edits these guidelines before they are applied. This is an interactive step in which the annotator can add examples, clarify edge cases, and adjust label definitions.

Phase 5: Labeling Function Generation

Inspired by the ALCHEmist framework, Potato generates programmatic labeling functions from the existing annotations. These are simple pattern-based rules (e.g., "if the text contains 'excellent' and no negation, label it positive") that can label easy instances with high precision, reserving human and LLM effort for the harder cases.

Phase 6: Active Labeling

The human labels additional instances selected by active learning. Potato prioritizes instances where the LLM is most uncertain, where labeling functions disagree, or where the instance lies far from the existing training examples in embedding space.
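
One common way to rank instances by uncertainty is the entropy of the predicted class distribution. The sketch below assumes the LLM yields a probability per class, which is an assumption for illustration; how Potato scores uncertainty is not described here, and `most_uncertain` is a hypothetical helper.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a class probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def most_uncertain(predictions, batch_size):
    """Return the ids with the highest predictive entropy.

    predictions: dict mapping instance id -> list of class probabilities.
    """
    ranked = sorted(predictions, key=lambda i: entropy(predictions[i]),
                    reverse=True)
    return ranked[:batch_size]

predictions = {
    "a": [0.98, 0.01, 0.01],  # confident
    "b": [0.40, 0.35, 0.25],  # nearly uniform, most uncertain
    "c": [0.70, 0.20, 0.10],
}
batch = most_uncertain(predictions, 2)
print(batch)  # -> ['b', 'c']
```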

Phase 7: Automated Refinement Loop

The LLM re-labels the entire dataset with the updated guidelines and few-shot examples. Potato compares the results against all human labels and, if accuracy is below the threshold, triggers another cycle of confusion analysis and guideline refinement.

Phase 8: Disagreement Exploration

The human reviews all instances where the LLM and the labeling functions disagree. These are typically the most informative and most difficult examples, so the human's labels on them provide the highest marginal value.

Phase 9: Edge Case Synthesis

Potato uses the LLM to generate synthetic edge cases based on the identified confusion patterns. The human labels these synthetic examples, which are then added to the LLM's training context to improve performance on the hardest cases.

Phase 10: Cascaded Confidence Escalation

The LLM assigns a confidence score to every remaining unlabeled instance. Instances are escalated to the human in descending order of difficulty (that is, ascending confidence). The human keeps labeling until the quality metrics stabilize.

Phase 11: Prompt Optimization

Inspired by DSPy, Potato runs automated prompt optimization using the accumulated human labels as a validation set. It tries multiple prompt variations (instruction phrasing, example ordering, chain-of-thought vs. direct) and selects the best-performing prompt.

Phase 12: Final Validation

The human performs a final review of a random sample of LLM-labeled instances. If accuracy meets the threshold, the dataset is complete. If not, the system returns to Phase 6.

Configuration

Quick Start

A minimal Solo Mode configuration:

yaml

task_name: "Sentiment Classification"
task_dir: "."
 
data_files:
  - "data/reviews.jsonl"
 
item_properties:
  id_key: id
  text_key: text
 
solo_mode:
  enabled: true
 
  # LLM provider
  llm:
    endpoint_type: openai
    model: "gpt-4o"
    api_key: ${OPENAI_API_KEY}
 
  # Basic thresholds
  seed_count: 50
  accuracy_threshold: 0.92
  confidence_threshold: 0.85
 
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    labels:
      - Positive
      - Neutral
      - Negative
 
output_annotation_dir: "output/"
output_annotation_format: "jsonl"

Full Configuration Reference

yaml

solo_mode:
  enabled: true
 
  # LLM configuration
  llm:
    endpoint_type: openai        # openai, anthropic, ollama, vllm
    model: "gpt-4o"
    api_key: ${OPENAI_API_KEY}
    temperature: 0.1             # low temperature for consistency
    max_tokens: 256
 
  # Phase control
  phases:
    seed:
      count: 50                  # number of seed instances
      selection: diversity        # diversity, random, or stratified
      embedding_model: "all-MiniLM-L6-v2"
 
    calibration:
      batch_size: 100
      holdout_fraction: 0.2      # fraction of seed used for validation
 
    confusion_analysis:
      min_samples: 30
      significance_threshold: 0.05
 
    guideline_refinement:
      auto_suggest: true         # LLM suggests guideline edits
      require_approval: true     # human must approve changes
 
    labeling_functions:
      enabled: true
      max_functions: 20
      min_precision: 0.90        # only keep high-precision rules
      min_coverage: 0.01         # must cover at least 1% of data
 
    active_labeling:
      batch_size: 25
      strategy: uncertainty       # uncertainty, diversity, or hybrid
      max_batches: 10
 
    refinement_loop:
      max_iterations: 3
      improvement_threshold: 0.02
 
    disagreement_exploration:
      max_instances: 200
      sort_by: confidence_gap
 
    edge_case_synthesis:
      enabled: true
      count: 50
      diversity_weight: 0.3
 
    confidence_escalation:
      escalation_budget: 200     # max instances to escalate
      batch_size: 25
      stop_when_stable: true     # stop if last batch accuracy is 100%
 
    prompt_optimization:
      enabled: true
      candidates: 10             # number of prompt variants to try
      metric: f1_macro
      search_strategy: bayesian  # bayesian, grid, or random
 
    final_validation:
      sample_size: 100
      min_accuracy: 0.92
      fallback_phase: 6          # go back to Phase 6 if validation fails
 
  # Instance prioritization across phases
  prioritization:
    pools:
      - name: uncertain
        weight: 0.30
        description: "LLM confidence below threshold"
      - name: disagreement
        weight: 0.25
        description: "LLM and labeling functions disagree"
      - name: boundary
        weight: 0.20
        description: "Near decision boundary in embedding space"
      - name: novel
        weight: 0.10
        description: "Far from all existing labeled examples"
      - name: error_pattern
        weight: 0.10
        description: "Matches known confusion patterns"
      - name: random
        weight: 0.05
        description: "Random sample for calibration"

Key Capabilities

Confusion Analysis

After each labeling round, Potato builds a confusion matrix between the human and LLM labels. The admin dashboard shows:

Per-class precision, recall, and F1 from the LLM's perspective
The most common confusion pairs (e.g., "neutral misclassified as positive: 23 instances")
Example instances for each confusion pair
Trend charts showing improvement across refinement rounds

Access the confusion analysis programmatically:

bash

python -m potato.solo confusion --config config.yaml

Output:

text

Confusion Analysis (Round 2)
============================
Overall Accuracy: 0.87 (target: 0.92)

Top Confusion Pairs:
  neutral -> positive:  23 instances (15.3%)
  negative -> neutral:  11 instances (7.3%)
  positive -> neutral:   5 instances (3.3%)

Per-Class Performance:
  Positive:  P=0.91  R=0.94  F1=0.92
  Neutral:   P=0.78  R=0.71  F1=0.74
  Negative:  P=0.93  R=0.88  F1=0.90
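
For readers who want to reproduce per-class figures like these outside the dashboard, here is a small sketch that treats the human labels as ground truth. It uses made-up data and a hypothetical function name; it is not how Potato computes its report.

```python
def per_class_metrics(human_labels, llm_labels):
    """Precision, recall, and F1 per class, with human labels as ground truth."""
    pairs = list(zip(human_labels, llm_labels))
    metrics = {}
    for c in sorted(set(human_labels) | set(llm_labels)):
        tp = sum(1 for h, m in pairs if h == c and m == c)
        fp = sum(1 for h, m in pairs if h != c and m == c)
        fn = sum(1 for h, m in pairs if h == c and m != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics

human = ["Positive", "Positive", "Neutral", "Neutral", "Negative"]
llm = ["Positive", "Positive", "Positive", "Neutral", "Negative"]

report = per_class_metrics(human, llm)
for name, m in report.items():
    print(f"{name}: P={m['precision']:.2f} R={m['recall']:.2f} F1={m['f1']:.2f}")
```

Averaging the per-class F1 values with equal weight gives the macro-F1 that the `f1_macro` metric name refers to.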

Automated Refinement Loop

The refinement loop iterates between LLM labeling, confusion analysis, and guideline updates. In each iteration:

The LLM labels the entire dataset with the current guidelines
Potato compares the results against all available human labels
If accuracy is below the threshold, confusion analysis runs
The LLM proposes guideline edits based on the error patterns
The human reviews and approves the edits
The cycle repeats (up to max_iterations)

yaml

solo_mode:
  llm:
    endpoint_type: anthropic
    model: "claude-sonnet-4-20250514"
    api_key: ${ANTHROPIC_API_KEY}
 
  phases:
    refinement_loop:
      max_iterations: 3
      improvement_threshold: 0.02    # stop if improvement is less than 2%
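
The stopping logic implied by `max_iterations` and `improvement_threshold` can be sketched as a plain loop. This is an illustration under assumptions: both callbacks are hypothetical stand-ins (one for "LLM labels the data and is scored against human labels", one for "human-approved guideline edits"), and the threshold is read as an absolute accuracy gain.

```python
def refinement_loop(label_and_score, refine_guidelines, guidelines,
                    accuracy_threshold=0.92, max_iterations=3,
                    improvement_threshold=0.02):
    """Iterate until accuracy meets the target, stalls, or iterations run out.

    label_and_score(guidelines) -> accuracy against human labels
    refine_guidelines(guidelines) -> revised, human-approved guidelines
    Both callables are hypothetical stand-ins, not Potato APIs.
    """
    history = [label_and_score(guidelines)]
    for _ in range(max_iterations):
        if history[-1] >= accuracy_threshold:
            break  # target reached
        guidelines = refine_guidelines(guidelines)
        history.append(label_and_score(guidelines))
        if history[-1] - history[-2] < improvement_threshold:
            break  # gain too small to justify another round
    return guidelines, history

# Simulated accuracies for successive rounds (made-up numbers).
simulated = iter([0.85, 0.89, 0.90, 0.95])
final_guidelines, history = refinement_loop(
    label_and_score=lambda g: next(simulated),
    refine_guidelines=lambda g: g + ["new rule"],
    guidelines=[],
)
print(history)  # -> [0.85, 0.89, 0.9]
```

In this run the loop stops after the second refinement because the gain from 0.89 to 0.90 is below the 0.02 threshold, even though the 0.92 target has not been reached.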

Labeling Functions (ALCHEmist-Inspired)

Potato generates lightweight labeling functions from patterns observed in the human annotations. These are not LLM calls; they are fast, deterministic rules.

Example generated labeling functions:

python

# Auto-generated labeling function 1
# Precision: 0.96, Coverage: 0.08
def lf_strong_positive_words(text):
    positive = {"excellent", "amazing", "fantastic", "outstanding", "perfect"}
    if any(w in text.lower() for w in positive):
        if not any(neg in text.lower() for neg in {"not", "never", "no"}):
            return "Positive"
    return None  # abstain
 
# Auto-generated labeling function 2
# Precision: 0.93, Coverage: 0.05
def lf_explicit_negative(text):
    negative = {"terrible", "awful", "horrible", "worst", "disgusting"}
    if any(w in text.lower() for w in negative):
        return "Negative"
    return None

Configure labeling function behavior:

yaml

solo_mode:
  phases:
    labeling_functions:
      enabled: true
      max_functions: 20
      min_precision: 0.90
      min_coverage: 0.01
      types:
        - keyword_match
        - regex_pattern
        - length_threshold
        - embedding_cluster
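
To make the `min_precision` and `min_coverage` filters concrete, the sketch below scores one labeling function against human-labeled examples. It is a simplified illustration: coverage is measured on the labeled examples only, and `evaluate_lf` and `keep_lf` are hypothetical names, not Potato functions.

```python
def evaluate_lf(lf, labeled_examples):
    """Return (precision, coverage) of a labeling function.

    Coverage: share of examples where the function does not abstain.
    Precision: share of its non-abstaining votes that match the human label.
    """
    votes = [(lf(text), label) for text, label in labeled_examples]
    fired = [(vote, label) for vote, label in votes if vote is not None]
    coverage = len(fired) / len(votes) if votes else 0.0
    precision = (sum(vote == label for vote, label in fired) / len(fired)
                 if fired else 0.0)
    return precision, coverage

def keep_lf(lf, labeled_examples, min_precision=0.90, min_coverage=0.01):
    """Apply the two thresholds from the configuration."""
    precision, coverage = evaluate_lf(lf, labeled_examples)
    return precision >= min_precision and coverage >= min_coverage

def lf_explicit_negative(text):
    negative = {"terrible", "awful", "horrible", "worst", "disgusting"}
    return "Negative" if any(w in text.lower() for w in negative) else None

examples = [
    ("Terrible service, never again", "Negative"),
    ("The worst pizza in town", "Negative"),
    ("Fine, nothing special", "Neutral"),
    ("Loved it", "Positive"),
]
precision, coverage = evaluate_lf(lf_explicit_negative, examples)
print(precision, coverage)  # -> 1.0 0.5
```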

Disagreement Explorer

The disagreement explorer presents instances where different signals conflict. For each instance, the annotator sees:

The LLM's predicted label and confidence
Labeling function votes (if any)
Nearest labeled neighbors in embedding space
The raw text/content

This is the highest-value annotation activity: each label resolves a genuine ambiguity.

yaml

solo_mode:
  phases:
    disagreement_exploration:
      max_instances: 200
      sort_by: confidence_gap     # or "lf_disagreement" or "random"
      show_llm_reasoning: true    # display LLM's chain-of-thought
      show_nearest_neighbors: 3   # show 3 nearest labeled examples

Cascaded Confidence Escalation

After the LLM has labeled the bulk of the dataset, Potato ranks all LLM-labeled instances by confidence and escalates the least confident ones to the human. This continues in batches until quality stabilizes.

yaml

solo_mode:
  phases:
    confidence_escalation:
      escalation_budget: 200
      batch_size: 25
      stop_when_stable: true
      stability_window: 3        # stop if last 3 batches are all correct
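
The escalation loop described above can be sketched as follows. The parameter names mirror the configuration keys, but the function itself and the `human_label` callback are hypothetical, and "stable" is interpreted here as a run of consecutive batches in which the human agreed with every LLM label.

```python
def escalate(instances, human_label, batch_size=25, escalation_budget=200,
             stability_window=3):
    """Send the lowest-confidence LLM labels to the human, batch by batch.

    instances: list of dicts with "id", "llm_label", and "confidence".
    human_label(instance) -> the annotator's label (hypothetical callback).
    Stops when the budget is spent, the queue is empty, or the last
    `stability_window` batches contained no LLM errors.
    """
    queue = sorted(instances, key=lambda x: x["confidence"])  # ascending
    corrections, clean_streak, reviewed = {}, 0, 0
    while queue and reviewed < escalation_budget:
        take = min(batch_size, escalation_budget - reviewed)
        batch, queue = queue[:take], queue[take:]
        errors = 0
        for inst in batch:
            label = human_label(inst)
            if label != inst["llm_label"]:
                corrections[inst["id"]] = label
                errors += 1
        reviewed += len(batch)
        clean_streak = clean_streak + 1 if errors == 0 else 0
        if clean_streak >= stability_window:
            break
    return corrections, reviewed

# Toy data: the LLM is wrong only on the 10 least confident instances.
instances = [{"id": i, "llm_label": "Positive", "confidence": i / 100}
             for i in range(100)]
truth = lambda inst: "Negative" if inst["id"] < 10 else "Positive"

corrections, reviewed = escalate(instances, truth, batch_size=5)
print(len(corrections), reviewed)  # -> 10 25
```

With a batch size of 5, the first two batches contain errors and the next three are clean, so the loop stops after 25 reviews rather than spending the full budget.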

Multi-Signal Instance Prioritization

In every phase that involves human labeling, Potato uses a weighted pool system to select the most informative instances. Six pools feed into a unified priority queue:

yaml

solo_mode:
  prioritization:
    pools:
      - name: uncertain
        weight: 0.30
      - name: disagreement
        weight: 0.25
      - name: boundary
        weight: 0.20
      - name: novel
        weight: 0.10
      - name: error_pattern
        weight: 0.10
      - name: random
        weight: 0.05

uncertain: Instances where the LLM's confidence is below confidence_threshold
disagreement: Instances where the LLM and the labeling functions give different labels
boundary: Instances near the decision boundary in embedding space
novel: Instances far from any existing labeled example
error_pattern: Instances matching known confusion patterns from previous rounds
random: A small random sample to maintain calibration and catch blind spots
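
One plausible way to merge weighted pools into a single queue is to sample a pool in proportion to its weight for each slot in the batch. How Potato actually combines the pools is not described here, so treat the sketch below as an assumption; `build_queue` is a hypothetical helper.

```python
import random

POOL_WEIGHTS = {
    "uncertain": 0.30, "disagreement": 0.25, "boundary": 0.20,
    "novel": 0.10, "error_pattern": 0.10, "random": 0.05,
}

def build_queue(pools, batch_size, weights=POOL_WEIGHTS, rng_seed=0):
    """Fill a batch by repeatedly sampling a pool in proportion to its weight.

    pools: dict mapping pool name -> list of instance ids, best first.
    An instance that appears in several pools is queued only once.
    """
    rng = random.Random(rng_seed)
    cursors = {name: 0 for name in pools}
    queue, seen = [], set()
    while len(queue) < batch_size:
        live = [n for n in pools if cursors[n] < len(pools[n])]
        if not live:
            break  # every pool is exhausted
        name = rng.choices(live, weights=[weights[n] for n in live])[0]
        item = pools[name][cursors[name]]
        cursors[name] += 1
        if item not in seen:
            seen.add(item)
            queue.append(item)
    return queue

pools = {
    "uncertain": [1, 2, 3],
    "disagreement": [2, 4],   # instance 2 is also in "uncertain"
    "random": [5],
}
queue = build_queue(pools, batch_size=4)
```

Sampling per slot, rather than giving each pool a fixed quota, keeps the batch full when a small pool runs out of candidates.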

Edge Case Synthesis

Potato uses the LLM to generate synthetic examples that target known weaknesses:

yaml

solo_mode:
  phases:
    edge_case_synthesis:
      enabled: true
      count: 50
      diversity_weight: 0.3
      confusion_pairs:            # focus on these error types
        - ["neutral", "positive"]
        - ["negative", "neutral"]

The LLM generates examples that are ambiguous between the specified label pairs. The human labels them, and these labels are added to the few-shot context for subsequent LLM labeling rounds.

Prompt Optimization (DSPy-Inspired)

In Phase 11, Potato runs automated prompt optimization to find the best instruction format for the LLM:

yaml

solo_mode:
  phases:
    prompt_optimization:
      enabled: true
      candidates: 10
      metric: f1_macro
      search_strategy: bayesian
      variations:
        - instruction_style      # formal vs. conversational
        - example_ordering       # random, by-class, by-difficulty
        - reasoning_mode         # direct, chain-of-thought, self-consistency
        - example_count          # 3, 5, 10, 15 few-shot examples
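
To show what searching over these variations involves, the sketch below enumerates the combinations listed in the comments and scores a random subset. It uses random search, not the Bayesian strategy shown in the configuration, and the `score` callback is a hypothetical stand-in for running the LLM on the human-labeled validation set.

```python
import itertools
import random

# Option values taken from the comments in the configuration above.
VARIATIONS = {
    "instruction_style": ["formal", "conversational"],
    "example_ordering": ["random", "by-class", "by-difficulty"],
    "reasoning_mode": ["direct", "chain-of-thought", "self-consistency"],
    "example_count": [3, 5, 10, 15],
}

def optimize_prompt(score, candidates=10, rng_seed=0):
    """Random search: score `candidates` configurations and return the best.

    score(config) -> validation metric such as macro-F1 (hypothetical callback).
    """
    keys = list(VARIATIONS)
    grid = [dict(zip(keys, values))
            for values in itertools.product(*(VARIATIONS[k] for k in keys))]
    sample = random.Random(rng_seed).sample(grid, min(candidates, len(grid)))
    return max(sample, key=score)

def toy_score(config):
    """Made-up scoring rule for demonstration only."""
    score = 0.80
    score += 0.05 if config["reasoning_mode"] == "chain-of-thought" else 0.0
    score += 0.001 * config["example_count"]
    return score

best = optimize_prompt(toy_score, candidates=10)
```

The full grid here has 72 combinations, so with `candidates: 10` any search strategy evaluates only a fraction of it; that budget is the main cost lever, since each candidate requires a pass over the validation set.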

Monitoring Progress

The admin dashboard shows Solo Mode progress in real time:

Current phase and progress within each phase
Human labels completed vs. total budget
LLM accuracy over time (per round)
Labeling function coverage and precision
Confidence distribution histogram
Estimated time to completion

Access it from the command line:

bash

python -m potato.solo status --config config.yaml

text

Solo Mode Status
================
Current Phase: 6 (Active Labeling) - Batch 3/10
Human Labels: 142 / ~300 estimated total
LLM Accuracy: 0.89 (target: 0.92)
LF Coverage: 0.23 (labeling functions cover 23% of data)
Dataset Size: 10,000 instances
  - Human labeled: 142
  - LF labeled: 2,300
  - LLM labeled: 7,558
  - Unlabeled: 0

When to Use Solo Mode vs. Traditional Multi-Annotator

Use Solo Mode when:

You have a domain expert who can provide high-quality labels
Budget or logistics prevent hiring multiple annotators
The task has clear, well-defined categories
You need to label a large dataset (1,000+ instances)
Speed matters more than measuring inter-annotator agreement

Use traditional multi-annotator when:

You need inter-annotator agreement statistics for publication
The task is highly subjective (e.g., offensiveness, humor)
You need to study annotator disagreement patterns
Regulatory requirements mandate multiple independent annotators
The label space is complex or evolving (annotation guidelines are still being developed)

Hybrid approach: Use Solo Mode for the initial bulk labeling, then assign a second annotator to a random 10-20% sample to compute agreement statistics. This gives you Solo Mode's efficiency together with the quality assurance of multi-annotator verification.

yaml

solo_mode:
  enabled: true
  # ... solo mode config ...
 
  # Hybrid: assign verification sample to second annotator
  verification:
    enabled: true
    sample_fraction: 0.15
    annotator: "reviewer_1"
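
Once the second annotator has labeled the verification sample, a standard agreement statistic is Cohen's kappa. The sketch below computes it from two parallel label lists using made-up data; it is independent of Potato and is shown only to illustrate the kind of statistic the hybrid approach makes available.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each annotator's label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

solo_labels = ["Positive", "Positive", "Negative", "Negative"]
reviewer_labels = ["Positive", "Positive", "Negative", "Positive"]

kappa = cohens_kappa(solo_labels, reviewer_labels)
print(kappa)  # -> 0.5
```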

Further Reading

Solo Mode Tutorial: Labeling 10,000 Examples -- step-by-step walkthrough
Active Learning -- underlying active learning system
AI Support -- LLM integration configuration
Quality Control -- quality assurance for annotations
MACE -- competence estimation (useful for hybrid mode verification)

For implementation details, see the source documentation.