MTurk Integration

Deploy annotation tasks on Amazon Mechanical Turk.

Amazon Mechanical Turk Integration

This guide explains how to deploy Potato annotation tasks on Amazon Mechanical Turk (MTurk).

Overview

Potato integrates with MTurk through the External Question HIT type:

You create an External Question HIT on MTurk that points to your Potato server
Workers click your HIT and are sent to your Potato server
Potato extracts the worker ID and other parameters from the URL
Workers complete the annotation task
When finished, workers click "Submit HIT to MTurk"

URL Parameters

MTurk passes four parameters to your External Question URL:

Parameter	Description
`workerId`	The worker's unique MTurk identifier
`assignmentId`	Unique ID for this worker-HIT pair
`hitId`	The HIT identifier
`turkSubmitTo`	URL the completion form must POST to
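
As a rough illustration of how these four parameters arrive, the sketch below parses them from a request URL using only the Python standard library. `MTurkParams` and `parse_mturk_params` are hypothetical names for this example and are not part of Potato. The `ASSIGNMENT_ID_NOT_AVAILABLE` value is what MTurk sends as `assignmentId` while a worker is only previewing the HIT.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Value MTurk uses for assignmentId while the HIT is only being previewed.
PREVIEW_ASSIGNMENT_ID = "ASSIGNMENT_ID_NOT_AVAILABLE"


@dataclass
class MTurkParams:
    """Hypothetical container for the four External Question parameters."""
    worker_id: Optional[str]
    assignment_id: Optional[str]
    hit_id: Optional[str]
    turk_submit_to: Optional[str]

    @property
    def is_preview(self) -> bool:
        # True when the worker has not accepted the HIT yet.
        return self.assignment_id == PREVIEW_ASSIGNMENT_ID


def parse_mturk_params(url: str) -> MTurkParams:
    """Extract the MTurk parameters from a URL; missing ones become None."""
    query = parse_qs(urlsplit(url).query)

    def first(name: str) -> Optional[str]:
        values = query.get(name)
        return values[0] if values else None

    return MTurkParams(
        worker_id=first("workerId"),
        assignment_id=first("assignmentId"),
        hit_id=first("hitId"),
        turk_submit_to=first("turkSubmitTo"),
    )
```

A server-side handler could use `is_preview` to decide whether to show a read-only description instead of the real task.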

Prerequisites

Server Requirements

Publicly accessible server with:
- An open port (typically 8080 or 443)
- HTTPS recommended (some browsers require it)
- A stable internet connection
Python environment with Potato installed

MTurk Requirements

MTurk Requester account: Register at requester.mturk.com
Funded account: Add funds for production (the sandbox is free)

Quick Start

Step 1: Create your Potato configuration

yaml

# mturk_task.yaml
annotation_task_name: "Sentiment Classification"
task_description: "Classify the sentiment of short text snippets."
 
# MTurk login configuration
login:
  type: url_direct
  url_argument: workerId
 
# Optional completion code
completion_code: "TASK_COMPLETE"
 
# Crowdsourcing settings
hide_navbar: true
jumping_to_id_disabled: true
assignment_strategy: random
max_annotations_per_user: 10
max_annotations_per_item: 3
 
# Data files
data_files:
  - data/items.json
 
# Annotation scheme
annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment of this text?"
    labels:
      - positive
      - neutral
      - negative

Step 2: Start the server

bash

# Start the server
potato start mturk_task.yaml -p 8080
 
# Or with HTTPS (recommended)
potato start mturk_task.yaml -p 443 --ssl-cert cert.pem --ssl-key key.pem

Step 3: Create your HIT on MTurk

Create an External Question HIT using this XML template:

xml

<?xml version="1.0" encoding="UTF-8"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-server.com:8080/?workerId=${workerId}&amp;assignmentId=${assignmentId}&amp;hitId=${hitId}&amp;turkSubmitTo=${turkSubmitTo}</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>

Important: Inside the XML, write each `&` in the URL as `&amp;`.
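
One way to avoid hand-escaping mistakes is to generate the XML from a plain URL. The sketch below uses `xml.sax.saxutils.escape` from the Python standard library, which turns `&` into `&amp;`. The function name `build_external_question` is hypothetical, and the URL in the usage note is the placeholder from the template above.

```python
from xml.sax.saxutils import escape

# Schema namespace taken from the XML template in this guide.
EXTERNAL_QUESTION_XMLNS = (
    "http://mechanicalturk.amazonaws.com/"
    "AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd"
)


def build_external_question(url: str, frame_height: int = 800) -> str:
    """Return ExternalQuestion XML for `url`, escaping XML special characters."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<ExternalQuestion xmlns="{EXTERNAL_QUESTION_XMLNS}">\n'
        f"  <ExternalURL>{escape(url)}</ExternalURL>\n"
        f"  <FrameHeight>{frame_height}</FrameHeight>\n"
        "</ExternalQuestion>"
    )
```

Calling `build_external_question("https://your-server.com:8080/?workerId=${workerId}&assignmentId=${assignmentId}")` yields a document in which the separator appears as `&amp;`, so the URL can be written naturally in code.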

Configuration Reference

Required Settings

yaml

login:
  type: url_direct      # Required: enables URL-based authentication
  url_argument: workerId  # Required: MTurk uses 'workerId' parameter

Recommended Settings

yaml

hide_navbar: true           # Prevent workers from skipping
jumping_to_id_disabled: true
assignment_strategy: random
max_annotations_per_user: 10
max_annotations_per_item: 3
task_description: "Brief description for the preview page."
completion_code: "YOUR_CODE"

Testing in the Sandbox

Always test in the MTurk Sandbox before going to production.

Sandbox URLs

Service	URL
Requester	https://requestersandbox.mturk.com
Worker	https://workersandbox.mturk.com
API endpoint	https://mturk-requester-sandbox.us-east-1.amazonaws.com
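
When scripting against the API, it can help to choose the endpoint from a single flag so that a sandbox run is never pointed at production by accident. This is a minimal sketch: `mturk_endpoint` is a hypothetical helper, and the production URL is the standard MTurk requester endpoint, which is not listed in this guide.

```python
# Sandbox endpoint as listed in this guide.
SANDBOX_ENDPOINT = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"
# Production endpoint (assumption: not given in this guide).
PRODUCTION_ENDPOINT = "https://mturk-requester.us-east-1.amazonaws.com"


def mturk_endpoint(sandbox: bool = True) -> str:
    """Return the MTurk API endpoint, defaulting to the sandbox for safety."""
    return SANDBOX_ENDPOINT if sandbox else PRODUCTION_ENDPOINT
```

Defaulting to `sandbox=True` means production has to be requested explicitly.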

Local Testing

Test the MTurk URL parameters locally:

bash

# Test normal workflow
curl "http://localhost:8080/?workerId=TEST_WORKER&assignmentId=TEST_ASSIGNMENT&hitId=TEST_HIT"
 
# Test preview mode
curl "http://localhost:8080/?workerId=TEST_WORKER&assignmentId=ASSIGNMENT_ID_NOT_AVAILABLE&hitId=TEST_HIT"

MTurk API Integration (Optional)

For advanced features, enable the MTurk API integration:

bash

pip install boto3

Create configs/mturk_config.yaml:

yaml

aws_access_key_id: "YOUR_ACCESS_KEY"
aws_secret_access_key: "YOUR_SECRET_KEY"
sandbox: true  # Set to false for production
hit_id: "YOUR_HIT_ID"

Enable it in your main configuration:

yaml

mturk:
  enabled: true
  config_file_path: configs/mturk_config.yaml

Creating HITs Programmatically

python

import boto3

# Connect to the MTurk sandbox; boto3 resolves AWS credentials through its
# default chain (environment variables, shared credentials file, etc.).
mturk = boto3.client(
    'mturk',
    region_name='us-east-1',
    endpoint_url='https://mturk-requester-sandbox.us-east-1.amazonaws.com'
)

# External Question pointing at the Potato server; '&' is written as '&amp;' in XML.
question_xml = '''<?xml version="1.0" encoding="UTF-8"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-server.com:8080/?workerId=${workerId}&amp;assignmentId=${assignmentId}&amp;hitId=${hitId}&amp;turkSubmitTo=${turkSubmitTo}</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>'''

# Create the HIT. Reward is a string in USD; all durations are in seconds.
response = mturk.create_hit(
    Title='Sentiment Classification Task',
    Description='Classify the sentiment of short text snippets.',
    Keywords='sentiment, classification, text',
    Reward='0.50',
    MaxAssignments=100,
    LifetimeInSeconds=86400,            # HIT stays available for 1 day
    AssignmentDurationInSeconds=3600,   # 1 hour to finish once accepted
    AutoApprovalDelayInSeconds=604800,  # auto-approve after 7 days
    Question=question_xml
)

print(f"Created HIT: {response['HIT']['HITId']}")
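
The three timing arguments to `create_hit` are plain second counts, which are easy to mistype. As a small sketch, they can be derived from `datetime.timedelta` so the intent stays readable. The `seconds` helper and the `hit_timing` dictionary are hypothetical names for this example.

```python
from datetime import timedelta


def seconds(**duration) -> int:
    """Convert timedelta keyword arguments (days=, hours=, ...) to whole seconds."""
    return int(timedelta(**duration).total_seconds())


# Same values as the create_hit call in this guide, written as durations.
hit_timing = {
    "LifetimeInSeconds": seconds(days=1),            # 86400
    "AssignmentDurationInSeconds": seconds(hours=1),  # 3600
    "AutoApprovalDelayInSeconds": seconds(days=7),    # 604800
}
```

The dictionary can then be unpacked into the call, for example `mturk.create_hit(..., **hit_timing)`.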

Best Practices

Task Design

Clear instructions: Provide detailed examples
Reasonable time: Do not rush workers
Fair pay: At least the equivalent of minimum wage ($12-15/hour)
Manageable length: 5-15 minutes per HIT is ideal
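
To check a reward against the hourly target, the arithmetic can be sketched as follows. Both function names are hypothetical, and the time per HIT is an estimate you would need to measure yourself, for example in a sandbox pilot.

```python
def hourly_rate(reward_usd: float, minutes_per_hit: float) -> float:
    """Effective hourly pay implied by a per-HIT reward and completion time."""
    if minutes_per_hit <= 0:
        raise ValueError("minutes_per_hit must be positive")
    return reward_usd * 60 / minutes_per_hit


def min_reward(target_hourly_usd: float, minutes_per_hit: float) -> float:
    """Smallest per-HIT reward (rounded to cents) that meets an hourly target."""
    if minutes_per_hit <= 0:
        raise ValueError("minutes_per_hit must be positive")
    return round(target_hourly_usd * minutes_per_hit / 60, 2)


print(hourly_rate(0.50, 5))  # 6.0
print(min_reward(12, 5))     # 1.0
```

Under these numbers, a $0.50 reward for a 5-minute HIT works out to $6/hour, so a $12/hour target would need about $1.00 per HIT.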

Quality Control

Qualification tests: Screen workers in advance
Attention checks: Include verification questions
Redundancy: Multiple workers per item (3+ recommended)
Review samples: Manually check a subset
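
Redundant labels are only useful once they are combined. A common and simple approach is a majority vote, sketched below. `majority_label` is a hypothetical helper, not a Potato function, and returning `None` on ties is a design choice made for this example so that ambiguous items can be routed to manual review.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(labels: Sequence[str]) -> Optional[str]:
    """Return the most frequent label, or None if there is no clear winner."""
    if not labels:
        return None
    ranked = Counter(labels).most_common()
    top_label, top_count = ranked[0]
    # A tie for first place means the annotators did not reach a majority.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_label


print(majority_label(["positive", "positive", "neutral"]))  # positive
print(majority_label(["positive", "neutral", "negative"]))  # None
```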

Technical

Handle edge cases: Workers may reload the page or navigate back
Save progress: Auto-save where possible
Graceful errors: Show helpful error messages

Troubleshooting

Workers see the preview page after accepting

Check that the `assignmentId` parameter is passed correctly
The preview page refreshes automatically; ask workers to wait

Submit button does not work

Check the browser console for errors
Check that the `turkSubmitTo` parameter is present
Check for CORS or mixed-content problems
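
When debugging submission, it can help to know what the completion form is expected to send. For External Question HITs, MTurk expects a POST to the `/mturk/externalSubmit` path under `turkSubmitTo`, including the `assignmentId` and at least one other field. The sketch below builds both pieces. The function names and the `completion_code` field name are hypothetical, and how Potato assembles its own form is not shown in this guide.

```python
def external_submit_url(turk_submit_to: str) -> str:
    """Return the URL the completion form should POST to."""
    if not turk_submit_to:
        raise ValueError("turkSubmitTo is missing; the HIT cannot be submitted")
    return turk_submit_to.rstrip("/") + "/mturk/externalSubmit"


def submit_fields(assignment_id: str, completion_code: str) -> dict:
    """Form fields for the POST: assignmentId plus one extra field."""
    if not assignment_id or assignment_id == "ASSIGNMENT_ID_NOT_AVAILABLE":
        raise ValueError("cannot submit without an accepted assignment")
    return {"assignmentId": assignment_id, "completion_code": completion_code}
```

Comparing these values with the form action and fields shown in the browser's developer tools is a quick way to spot a missing parameter.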

Workers cannot log in

Check that `login.url_argument` is set to `workerId`
Make sure `login.type` is `url_direct`
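
These two checks can be automated before launching a HIT. The sketch below assumes the YAML configuration has already been loaded into a Python dictionary (for example with a YAML parser of your choice). `check_login_config` is a hypothetical helper, not part of Potato.

```python
from typing import List


def check_login_config(config: dict) -> List[str]:
    """Return a list of problems with the MTurk login settings (empty if OK)."""
    problems = []
    login = config.get("login") or {}
    if login.get("type") != "url_direct":
        problems.append("login.type should be 'url_direct'")
    if login.get("url_argument") != "workerId":
        problems.append("login.url_argument should be 'workerId'")
    return problems


config = {"login": {"type": "url_direct", "url_argument": "workerId"}}
print(check_login_config(config))  # []
```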

Further Reading

Crowdsourcing Integration - General crowdsourcing setup
Quality Control - Attention checks and gold standards
Task Assignment - Assignment strategies

For implementation details, see the source documentation.