Skip to content

Trajectory Editing for SFT/DPO

Annotators rewrite the steps of an agent trace to fix a wrong reasoning step, correct a tool call, or strengthen the final answer, and Potato exports each original/corrected pair as supervised fine-tuning targets and DPO preference pairs.

The trajectory_edit schema lets annotators rewrite the steps of an agent trace, and exports each correction as training data. Fix a wrong reasoning step, correct a typo'd tool call, or strengthen the final answer, and Potato saves the corrected trajectory next to the original. The trajectory_correction exporter then turns each (original, corrected) pair into supervised fine-tuning (SFT) targets and direct preference optimization (DPO) preference pairs.

This makes Potato a training-data production tool, not only an evaluation tool. It is the editing counterpart to step-level scoring: instead of rating a trajectory, annotators repair it, and the repair becomes a learning signal.

Trajectory correction editor with live diffAn agent step shown with a read-only original and an editable corrected box with a word-level diff

Quick start

Run the included example from the repository root:

bash
python potato/flask_server.py start examples/agent-traces/trajectory-correction/config.yaml -p 8000

How it works

Each agent step renders as a card showing the original text (read-only) and an editable corrected box pre-filled with the original. As the annotator types:

  • a live word-level diff highlights insertions (green) and deletions (red strikethrough),
  • the words and characters changed are counted, and
  • an "edited" flag appears on changed fields.

A "Reset" button restores the original per field. With edit_final_answer: true the final answer gets its own editor. Nothing is required: an unedited trace simply produces no training pair.

Configuration

yaml
annotation_schemes:
  - annotation_type: trajectory_edit
    name: corrected_trajectory
    description: "Fix any wrong steps, then correct the final answer"
    steps_key: steps          # instance field holding the step list
    step_text_key: action     # the default per-step editable field
    editable_fields:          # which fields get an editor
      - action
      # - thought             # add to also edit reasoning
    show_diff: true
    show_edit_distance: true
    allow_reset: true
    require_reason_on_edit: false   # add a per-field "reason" input
    edit_final_answer: true
    final_answer_key: final_answer
OptionDefaultDescription
steps_keystepsInstance field holding the step list.
step_text_keyactionDefault editable field per step.
editable_fields[step_text_key]Which step fields get an editor, e.g. [action, thought].
show_difftrueShow the live word-level diff.
show_edit_distancetrueShow words and characters changed.
allow_resettruePer-field "Reset to original" button.
require_reason_on_editfalsePer-field "reason for edit" input.
edit_final_answerfalseAdd an editor for the final answer.
final_answer_keyfinal_answerInstance field holding the final answer.

Data format

The schema reads steps from the instance under steps_key. Each step is an object whose fields (action, thought, and so on) can be edited; bare-string steps are edited as the step_text_key field.

json
{
  "id": "traj_001",
  "task_description": "Find the weather in San Francisco.",
  "steps": [
    {"thought": "Look it up.", "action": "web_search(queyr='SF weather')"},
    {"thought": "Open it.",    "action": "open_url(results[0])"}
  ],
  "final_answer": "It is sunny."
}

Export

Run the trajectory_correction exporter. It writes three files:

  • trajectory_corrections.json — every record: the original_trace, the reconstructed corrected_trace, and per-field edits with edit distances and reasons.
  • trajectory_sft.jsonl — one line per edited trace: {"prompt": <task>, "completion": <corrected_trace>}.
  • trajectory_dpo.jsonl — one line per edited trace: {"prompt": <task>, "chosen": <corrected_trace>, "rejected": <original_trace>}.

Unedited traces are counted but excluded from SFT/DPO, since training on an unchanged trajectory adds nothing; the skipped count appears in the export stats. With multiple annotators, each annotator who edited a trace yields one SFT/DPO record.

Notes and limitations

  • The diff is word-level. For code-like tool calls with no spaces, a single token may show as wholly changed even for a one-character fix; the character-distance counter is the precise signal.
  • Pair with step-level scoring if you also want per-step correctness or an error taxonomy on the same trace.

For implementation details, see the source documentation.