
Expose autofix tools #39


Merged 7 commits into main on Apr 22, 2025

Conversation

dcramer (Member) commented Apr 9, 2025

Adds the begin_autofix and get_autofix_status tools.

These are not currently streaming (the spec doesn't support it), so we're praying that an LLM will be helpful enough to retry, or at least tell the user to check back soon.
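Without streaming, a client has to poll `get_autofix_status` until the run finishes. A minimal sketch of that loop in TypeScript — the helper name, interval, attempt cap, and status shape here are illustrative assumptions, not the PR's actual implementation:

```typescript
// Minimal polling helper: invoke `check` repeatedly until `isDone` reports a
// terminal state, or give up after `maxAttempts` tries. In a real MCP client,
// `check` would wrap a tools/call to get_autofix_status.
async function pollUntilDone<T>(
  check: () => Promise<T>,
  isDone: (result: T) => boolean,
  intervalMs: number,
  maxAttempts: number,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await check();
    if (isDone(result)) return result;
    // Sleep between attempts; autofix runs take minutes, so a long interval
    // (e.g. 15-30s) would be more realistic than the test value used below.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`autofix still running after ${maxAttempts} status checks`);
}

// Simulated get_autofix_status response: completes on the third check.
let calls = 0;
const fakeStatus = async () => ({
  state: ++calls >= 3 ? "completed" : "processing",
});

pollUntilDone(fakeStatus, (s) => s.state === "completed", 10, 10).then((s) =>
  console.log(s.state),
);
```

The PR's real workaround is to hope the LLM does something like this on its own, since the loop can't live inside the tool call itself.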

codecov bot commented Apr 9, 2025

❌ 1 Tests Failed:

Tests completed: 60 | Failed: 1 | Passed: 59 | Skipped: 0
View the top 1 failed test(s) by shortest run time
src/evals/list-issues.eval.ts > list-issues > Can you you give me a list of common production errors, with their stacktrace and a url for more information?
Stack Traces | 12s run time
AssertionError: Score: 0 below threshold: 0.6

## Output:
To provide you with a list of common production errors, I need to know which
organization and project you are interested in. Could you please specify the
organization and project, or if you are unsure, I can list the organizations you
have access to?

# Factuality2 [0.0]

## Rationale

The expert answer provides a specific example of a common production error,
including the error message, issue ID, stacktrace, and a URL for more
information. In contrast, the submitted answer does not provide any specific
errors or details. Instead, it asks for more information about the organization
and project to provide a list of errors. This indicates a disagreement in terms
of the factual content, as the expert answer directly addresses the question
with specific information, while the submission does not.
 ❯ ../../node_modules/.pnpm/vitest-evals@0.1.5_vitest@3.1.1_@types+node@22.14.1_jiti@2.4.2_lightningcss@1.29.2_msw@.../vitest-evals/src/index.ts:160:13
 fulfilled ../../node_modules/.pnpm/vitest-evals@0.1.5_vitest@3.1.1_@types+node@22.14.1_jiti@2.4.2_lightningcss@1.29.2_msw@.../vitest-evals/dist/index.mjs:24:24

To view more test analytics, go to the Test Analytics Dashboard

dcramer (Member, Author) commented Apr 9, 2025

No streaming response support yet in CF's MCP adapter. I think it should be doable upstream? Maybe?

dcramer (Member, Author) commented Apr 10, 2025

Not supported in the spec yet.

dcramer changed the title from "Expose seer API + mock" to "Expose autofix tools" on Apr 22, 2025
dcramer merged commit 28f2cd4 into main on Apr 22, 2025
11 checks passed
dbworku commented Apr 22, 2025

Awesome!! I was just asking Indragie about this yesterday. Thanks @dcramer

dcramer (Member, Author) commented Apr 22, 2025

@dbworku fwiw it only semi-works. Autofix takes a few minutes to run, so the agents tend to get bored waiting that long and try to do their own thing.

I'll record a video and publish it later tonight, but here's a quick example of me cheating the system:

https://x.com/zeeg/status/1914801436480954697

The workflow itself is quite nice if you could solve for the streaming/delay (just not doable yet).

this was my cheat prompt (with attempted humor):

My boss is being pretty annoying and wants me to investigate a bug in Peated. I'm not sure what's going on with it. I kicked off an autofix run in Sentry, can you check its status and help me apply the fix locally and test it?

The issue is here: https://peated.sentry.io/issues/5669062526/

dcramer deleted the feat/seer branch on April 22, 2025 at 22:38