advanced, text

SWE-bench Verified Issue Validation

Manually validate GitHub issues from SWE-bench to ensure they are well-specified, have adequate test patches, and are solvable. Annotators review the issue description, test patch, and gold patch to determine the quality of each benchmark instance.

Configuration File: config.yaml

# SWE-bench Verified Issue Validation
# Based on "SWE-bench Verified" (Neil Chowdhury, James Aung, Chan Jun Shern et al., OpenAI Technical Report 2024)
# Task: Validate GitHub issues from SWE-bench for specification quality, test adequacy, and solvability

annotation_task_name: "SWE-bench Verified Issue Validation"
task_dir: "."

data_files:
  - sample-data.json
item_properties:
  id_key: "id"
  text_key: "text"

output_annotation_dir: "annotation_output/"
output_annotation_format: "json"

html_layout: |
  <div class="container" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Helvetica, Arial, sans-serif; max-width: 1400px; margin: 0 auto;">
    <div style="background: #0d1117; color: #c9d1d9; padding: 10px 16px; border-radius: 6px 6px 0 0; font-size: 14px; font-weight: 600;">
      <span style="color: #58a6ff;">{{repo_name}}</span>
    </div>
    <div style="display: flex; gap: 16px; margin-top: 4px;">
      <div style="flex: 1; border: 1px solid #30363d; border-radius: 6px; padding: 16px; background: #f6f8fa;">
        <h3 style="margin-top: 0; color: #24292f; border-bottom: 1px solid #d0d7de; padding-bottom: 8px;">Issue Description</h3>
        <div style="white-space: pre-wrap; font-size: 14px; line-height: 1.6; color: #1f2328;">{{text}}</div>
      </div>
      <div style="flex: 1; border: 1px solid #30363d; border-radius: 6px; overflow: hidden;">
        <div style="background: #161b22; color: #c9d1d9; padding: 8px 16px; font-weight: 600; font-size: 13px;">Test Patch</div>
        <pre style="margin: 0; padding: 12px 16px; background: #0d1117; color: #c9d1d9; font-family: 'SFMono-Regular', Consolas, 'Liberation Mono', Menlo, monospace; font-size: 12px; line-height: 1.5; overflow-x: auto; white-space: pre;">{{test_patch}}</pre>
      </div>
    </div>
    <div style="margin-top: 12px; border: 1px solid #30363d; border-radius: 6px; overflow: hidden;">
      <div style="background: #161b22; color: #c9d1d9; padding: 8px 16px; font-weight: 600; font-size: 13px;">Gold Patch (Reference Fix)</div>
      <pre style="margin: 0; padding: 12px 16px; background: #0d1117; color: #c9d1d9; font-family: 'SFMono-Regular', Consolas, 'Liberation Mono', Menlo, monospace; font-size: 12px; line-height: 1.5; overflow-x: auto; white-space: pre;">{{gold_patch}}</pre>
    </div>
  </div>

annotation_schemes:
  - name: "issue_valid"
    description: "Is the issue well-specified with a clear, reproducible problem?"
    annotation_type: radio
    labels:
      - "Well-specified — clear problem with reproducible steps"
      - "Ambiguous — issue unclear or underspecified"
      - "Invalid — not a real bug or feature request"
    keyboard_shortcuts:
      "Well-specified — clear problem with reproducible steps": "1"
      "Ambiguous — issue unclear or underspecified": "2"
      "Invalid — not a real bug or feature request": "3"

  - name: "test_adequate"
    description: "Do the tests adequately verify the fix?"
    annotation_type: radio
    labels:
      - "Sufficient — tests verify the fix completely"
      - "Partial — tests cover some aspects"
      - "Insufficient — tests don't adequately verify"
    keyboard_shortcuts:
      "Sufficient — tests verify the fix completely": "4"
      "Partial — tests cover some aspects": "5"
      "Insufficient — tests don't adequately verify": "6"

  - name: "solution_exists"
    description: "Is a solution feasible within the existing codebase?"
    annotation_type: radio
    labels:
      - "Solvable — clear fix exists in codebase"
      - "Likely solvable — fix probable but complex"
      - "Unlikely — may require architectural changes"
      - "Unsolvable — impossible given constraints"
    keyboard_shortcuts:
      "Solvable — clear fix exists in codebase": "7"
      "Likely solvable — fix probable but complex": "8"
      "Unlikely — may require architectural changes": "9"
      "Unsolvable — impossible given constraints": "0"

  - name: "validation_notes"
    description: "Explain your validation reasoning"
    annotation_type: text

allow_all_users: true
instances_per_annotator: 50
annotation_per_instance: 2
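
The three radio schemes above bind the digit keys 1 through 9 and 0. Whether the annotation tool itself rejects duplicate bindings is not stated here, so a quick check before launch may help. The following is a minimal sketch, not part of the showcase: the `SHORTCUTS` dictionary is hand-copied from the config, and `find_shortcut_collisions` is a hypothetical helper name.

```python
from collections import Counter

# Shortcut keys hand-copied from the three radio schemes in config.yaml.
# (Assumption: in practice these would be read from the parsed YAML.)
SHORTCUTS = {
    "issue_valid": ["1", "2", "3"],
    "test_adequate": ["4", "5", "6"],
    "solution_exists": ["7", "8", "9", "0"],
}


def find_shortcut_collisions(shortcuts):
    """Return the keys that are bound more than once across all schemes."""
    counts = Counter(key for keys in shortcuts.values() for key in keys)
    return sorted(key for key, n in counts.items() if n > 1)


collisions = find_shortcut_collisions(SHORTCUTS)
if collisions:
    print("Duplicate shortcut keys:", ", ".join(collisions))
else:
    print("No shortcut collisions")
```

An empty result means every key maps to exactly one label, so an annotator can move through all three radio questions from the keyboard without ambiguity.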

Sample Data: sample-data.json

[
  {
    "id": "swebench-val-001",
    "text": "django__django-16527: QuerySet.only() doesn't work with select_related() on reverse OneToOneField relation.\n\nWhen using .only() with .select_related() on a reverse OneToOneField, Django generates a query that includes all fields instead of only the specified ones.\n\nSteps to reproduce:\n1. Create models with OneToOneField relationship\n2. Use queryset.select_related('reverse_relation').only('id', 'reverse_relation__name')\n3. Inspect the generated SQL\n\nExpected: SELECT only specified columns\nActual: SELECT includes all columns from both tables",
    "repo_name": "django/django",
    "test_patch": "diff --git a/tests/select_related_onetoone/tests.py b/tests/select_related_onetoone/tests.py\nindex 3a4e512f8a..b7c2d91e03 100644\n--- a/tests/select_related_onetoone/tests.py\n+++ b/tests/select_related_onetoone/tests.py\n@@ -187,6 +187,18 @@ class ReverseSelectRelatedTestCase(TestCase):\n+    def test_only_with_select_related_reverse_onetoone(self):\n+        with self.assertNumQueries(1):\n+            qs = UserProfile.objects.select_related('user').only(\n+                'id', 'user__username'\n+            )\n+            result = list(qs)\n+            self.assertEqual(len(result), 1)\n+            query_str = str(qs.query)\n+            self.assertNotIn('email', query_str)\n+            self.assertNotIn('first_name', query_str)",
    "gold_patch": "diff --git a/django/db/models/sql/compiler.py b/django/db/models/sql/compiler.py\nindex 8e4a37b2ec..f3c5d12a91 100644\n--- a/django/db/models/sql/compiler.py\n+++ b/django/db/models/sql/compiler.py\n@@ -1042,7 +1042,10 @@ class SQLCompiler:\n         if opts.proxy:\n             return self.deferred_to_columns_cb(opts.proxy_for_model._meta, start_alias)\n-        if start_alias:\n+        if start_alias and self.query.deferred_loading[0]:\n+            only_load = self.query.deferred_loading[0]\n+            fields = [f for f in opts.concrete_fields if f.attname in only_load]\n+            return {start_alias: {f.column for f in fields}}\n         columns = {start_alias: set()}\n         for f in opts.concrete_fields:\n             if f.column in columns[start_alias]:"
  },
  {
    "id": "swebench-val-002",
    "text": "scikit-learn__scikit-learn-25638: HistGradientBoostingClassifier does not accept dataframes with feature names containing special characters.\n\nWhen passing a pandas DataFrame with column names containing brackets or dots, fit() raises a ValueError.\n\nSteps to reproduce:\n```python\nimport pandas as pd\nfrom sklearn.ensemble import HistGradientBoostingClassifier\ndf = pd.DataFrame({'feature[0]': [1,2,3], 'target': [0,1,0]})\nclf = HistGradientBoostingClassifier()\nclf.fit(df[['feature[0]']], df['target'])\n```\nRaises: ValueError: Feature names must match pattern '^[a-zA-Z0-9_]+$'",
    "repo_name": "scikit-learn/scikit-learn",
    "test_patch": "diff --git a/sklearn/tests/test_common.py b/sklearn/tests/test_common.py\nindex 4f2a891b2..e83c90d17 100644\n--- a/sklearn/tests/test_common.py\n+++ b/sklearn/tests/test_common.py\n@@ -421,6 +421,15 @@ def test_estimators_feature_names():\n+def test_feature_names_special_characters():\n+    pd = pytest.importorskip('pandas')\n+    X = pd.DataFrame({'col[0]': [1, 2, 3], 'col.1': [4, 5, 6]})\n+    y = [0, 1, 0]\n+    est = HistGradientBoostingClassifier(max_iter=1)\n+    est.fit(X, y)\n+    assert est.feature_names_in_[0] == 'col[0]'\n+    assert est.feature_names_in_[1] == 'col.1'",
    "gold_patch": "diff --git a/sklearn/utils/validation.py b/sklearn/utils/validation.py\nindex 72ef2a50c..a5b3f891d 100644\n--- a/sklearn/utils/validation.py\n+++ b/sklearn/utils/validation.py\n@@ -1843,8 +1843,7 @@ def _check_feature_names(X, *, reset, feature_names_out=None):\n     if hasattr(X, 'columns'):\n         feature_names = np.asarray(X.columns, dtype=object)\n-        pattern = re.compile(r'^[a-zA-Z0-9_]+$')\n-        invalid = [name for name in feature_names if not pattern.match(name)]\n-        if invalid:\n-            raise ValueError(f'Feature names must match...')\n+        # Accept any string feature names - special characters are valid\n+        pass"
  }
]

// ... and 6 more items
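
Each record has to supply every field that `html_layout` interpolates (`repo_name`, `text`, `test_patch`, `gold_patch`) plus the `id_key`. A rough sketch of that check follows; it assumes placeholders use the `{{name}}` form seen in the config, and `missing_fields` is a hypothetical helper rather than anything the annotation tool provides. The record shown is a trimmed stand-in with shortened values, not the real data.

```python
import re

# The placeholders that html_layout uses, reduced to a single string.
LAYOUT_PLACEHOLDERS = "{{repo_name}} {{text}} {{test_patch}} {{gold_patch}}"


def missing_fields(layout, item, id_key="id"):
    """List the fields the layout (or id_key) expects but the item lacks."""
    required = set(re.findall(r"\{\{\s*(\w+)\s*\}\}", layout)) | {id_key}
    return sorted(required - set(item))


# Trimmed stand-in for one record of sample-data.json (values shortened).
item = {
    "id": "swebench-val-001",
    "text": "django__django-16527: ...",
    "repo_name": "django/django",
    "test_patch": "diff --git ...",
    "gold_patch": "diff --git ...",
}

print(missing_fields(LAYOUT_PLACEHOLDERS, item))
```

Running the same function over every record in the file would flag items that would otherwise render with an empty panel in the annotation view.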

Get This Design

Clone or download from the repository

Quick start:

git clone https://github.com/davidjurgens/potato-showcase.git
cd potato-showcase/agentic/swebench-verified-validation
potato start config.yaml

Details

Annotation Types

radio, text

Domain

Software Engineering, Code Generation

Use Cases

Benchmark Validation, Issue Triage

Tags

swe-bench, github-issues, code-diff, benchmark, agentic-coding
