
Expose autofix tools #39


Merged 7 commits into main on Apr 22, 2025

Conversation

dcramer (Member) commented Apr 9, 2025

Adds the begin_autofix and get_autofix_status tools.

These are not currently streaming (the spec doesn't support it), so we're praying that an LLM will be helpful enough to retry, or at least tell the user to check back soon.
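Without streaming, a client has to poll `get_autofix_status` until the run finishes. A minimal sketch of that loop in TypeScript — the helper name, interval, attempt cap, and status shape here are illustrative assumptions, not the PR's actual implementation:

```typescript
// Minimal polling helper: invoke `check` repeatedly until `isDone` reports a
// terminal state, or give up after `maxAttempts` tries. In a real MCP client,
// `check` would wrap a tools/call to get_autofix_status.
async function pollUntilDone<T>(
  check: () => Promise<T>,
  isDone: (result: T) => boolean,
  intervalMs: number,
  maxAttempts: number,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await check();
    if (isDone(result)) return result;
    // Sleep between attempts; autofix runs take minutes, so a long interval
    // (e.g. 15-30s) would be more realistic than the test value used below.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`autofix still running after ${maxAttempts} status checks`);
}

// Simulated get_autofix_status response: completes on the third check.
let calls = 0;
const fakeStatus = async () => ({
  state: ++calls >= 3 ? "completed" : "processing",
});

pollUntilDone(fakeStatus, (s) => s.state === "completed", 10, 10).then((s) =>
  console.log(s.state),
);
```

The PR's real workaround is to hope the LLM does something like this on its own, since the loop can't live inside the tool call itself.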

codecov bot commented Apr 9, 2025

❌ 1 Tests Failed:

Tests completed: 60 | Failed: 1 | Passed: 59 | Skipped: 0
View the top 1 failed test(s) by shortest run time
src/evals/list-issues.eval.ts > list-issues > Can you you give me a list of common production errors, with their stacktrace and a url for more information?
Stack Traces | 12s run time
AssertionError: Score: 0 below threshold: 0.6

## Output:
To provide you with a list of common production errors, I need to know which
organization and project you are interested in. Could you please specify the
organization and project, or if you are unsure, I can list the organizations you
have access to?

# Factuality2 [0.0]

## Rationale

The expert answer provides a specific example of a common production error,
including the error message, issue ID, stacktrace, and a URL for more
information. In contrast, the submitted answer does not provide any specific
errors or details. Instead, it asks for more information about the organization
and project to provide a list of errors. This indicates a disagreement in terms
of the factual content, as the expert answer directly addresses the question
with specific information, while the submission does not.
 ❯ ../../node_modules/.pnpm/vitest-evals@0.1.5_vitest@3.1.1_@types+node@22.14.1_jiti@2.4.2_lightningcss@1.29.2_msw@.../vitest-evals/src/index.ts:160:13
 fulfilled ../../node_modules/.pnpm/vitest-evals@0.1.5_vitest@3.1.1_@types+node@22.14.1_jiti@2.4.2_lightningcss@1.29.2_msw@.../vitest-evals/dist/index.mjs:24:24

To view more test analytics, go to the Test Analytics Dashboard

dcramer (Member, Author) commented Apr 9, 2025

No streaming response support yet in CF's MCP adapter. I think it should be doable upstream? Maybe?

dcramer (Member, Author) commented Apr 10, 2025

Not supported in the spec yet.

dcramer changed the title from "Expose seer API + mock" to "Expose autofix tools" on Apr 22, 2025
dcramer merged commit 28f2cd4 into main on Apr 22, 2025
11 checks passed
dbworku commented Apr 22, 2025

Awesome!! I was just asking Indragie about this yesterday. Thanks @dcramer

dcramer (Member, Author) commented Apr 22, 2025

@dbworku fwiw it only semi-works. Autofix takes a few minutes to run, so the agents tend to get bored waiting that long and try to do their own thing.

I'll record a video and publish it later tonight, but here's a quick example of me cheating the system:

https://x.com/zeeg/status/1914801436480954697

The workflow itself is quite nice if you could solve for the streaming/delay (just not doable yet).

this was my cheat prompt (with attempted humor):

My boss is being pretty annoying and wants me to investigate a bug in Peated. I'm not sure what's going on with it. I kicked off an autofix run in Sentry, can you check its status and help me apply the fix locally and test it?

The issue is here: https://peated.sentry.io/issues/5669062526/

dcramer deleted the feat/seer branch on April 22, 2025 at 22:38