add action message #79

Merged
pkiv merged 15 commits into main from fm/return-no-action
Oct 4, 2024

filip-michalsky
Collaborator

why

This is the second, reworked attempt.

Paul mentioned a vision in which stagehand is a tool we can pass to an LLM to perform actions on the web. In that sense, we DO want to return a success/failure signal, plus a reason for any failure, so that the agent or developer can heal their pipelines (either manually or agentically).

what changed

We now return a message from the finished act() method. The evals do not use this message yet, so we need to add evals to test it.
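
As a rough illustration of what a caller might do with such a message, here is a minimal sketch. It assumes a hypothetical result shape with a `success` flag and a `message` string; the PR text only says that a message is returned, so the `ActResult` field names and the `describeActResult` helper are illustrative assumptions, not stagehand's actual API.

```typescript
// Hypothetical result shape; the field names are assumptions for illustration.
interface ActResult {
  success: boolean; // whether the action completed
  message: string;  // human- or LLM-readable reason, mainly useful on failure
}

// Turn a result into a single line suitable for logs or for feeding back to an agent.
function describeActResult(action: string, result: ActResult): string {
  const status = result.success ? "succeeded" : "failed";
  return `Action "${action}" ${status}: ${result.message}`;
}

// Example with a made-up failure message.
const example: ActResult = { success: false, message: "No matching element found" };
console.log(describeActResult("click the search button", example));
```

Keeping the reason as plain text means the same value can be logged for a developer or passed back to an LLM unchanged.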

test plan

Need to add evals for this.

@filip-michalsky
Collaborator Author

My thinking behind this updated API design:

By returning a message from act(), we do not break anything in the existing evals. It is simply an optional return value that a developer can use to retry actions, perhaps changing the prompt until the action gets done. It can also help with the prompt optimizer, where another LLM can change the action prompt if the previous action failed, etc.
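
The retry idea described above could be sketched roughly as follows. Everything here is hypothetical: `ActResult`, `ActFn`, `ReviseFn`, and `actWithRetry` are illustrative names, and the `act` function is passed in as a parameter rather than taken from stagehand, since the exact return type of act() is not spelled out in this thread.

```typescript
// Hypothetical stand-ins for whatever act() accepts and returns.
interface ActResult {
  success: boolean;
  message: string;
}
type ActFn = (action: string) => Promise<ActResult>;
// Produces a new prompt from the previous prompt and the failure message.
// This could be a hand-written rule or a call to another LLM.
type ReviseFn = (action: string, failureMessage: string) => Promise<string> | string;

async function actWithRetry(
  act: ActFn,
  action: string,
  revise: ReviseFn,
  maxAttempts = 3,
): Promise<{ result: ActResult; attempts: number; finalAction: string }> {
  let current = action;
  let attempts = 0;
  let result: ActResult = { success: false, message: "not attempted" };

  while (attempts < maxAttempts) {
    attempts++;
    result = await act(current);
    if (result.success) break;
    // Only rewrite the prompt if another attempt will actually be made.
    if (attempts < maxAttempts) {
      current = await revise(current, result.message);
    }
  }
  return { result, attempts, finalAction: current };
}
```

Because the message is optional to consume, callers that ignore the return value keep working, while callers like this one can use the failure reason to decide how to rephrase the next attempt.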

@filip-michalsky
Collaborator Author

This needs to get fixed: I resolved the merge conflicts, but some banalyzer dependencies are now missing. Also, the homedepot eval is still not passing.

@@ -16,7 +16,7 @@ async function processElements(chunk: number) {
   const chunkHeight = viewportHeight * chunk;
   const offsetTop = chunkHeight;
 
-  window.scrollTo(0, offsetTop);
+  window.scrollTo({ top: offsetTop, left: 0, behavior: 'smooth' });
Collaborator Author

I added behavior: 'smooth' to the scroll call. The hypothesis is that it reduces hard reloads on slow sites, but it's based on just one eval (homedepot).

@pkiv merged commit 7edb817 into main on Oct 4, 2024
1 check passed
@filip-michalsky deleted the fm/return-no-action branch on October 6, 2024 23:48
3 participants