
Clarifications on constructing user instructions #8

Open
BillyZhang24kobe opened this issue Oct 14, 2024 · 0 comments


@BillyZhang24kobe

Hi,

Thanks a lot for building such an amazing benchmark. I have a question about the process you used to create the user instruction for each task. In your paper, specifically Stage III of Section 4 (Benchmark Construction), you mention that "we write an initial user instruction, run a trial with gpt-4-turbo function calling agent, polish the user instruction by examining the trajectory, and do this iteratively until we are certain no ambiguities exist". I'm wondering whether you followed any particular attribute format when constructing these user instructions.

Take this instruction as an example: "You are aarav-garcia-1177. For your upcoming trip from ATL to PHL, you want to change for the cheapest economy flight and for the day after the original reservation. You are happy with original payment for refund." It seems that the user instructions always follow this attribute format: user identity ("aarav-garcia-1177"), goal ("For your upcoming trip from ATL to PHL, you want to change for the cheapest economy flight and for the day after the original reservation."), and preferences ("You are happy with original payment for refund."). Sometimes you also add a language style to the instruction (e.g., "You are reactive to the agent and will not say anything that is not asked.").

Could you please clarify whether you considered any attribute formats when creating the user instructions for the tasks in tau-bench? If so, would you mind sharing more details about the formats you considered during benchmark construction?
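To make the question concrete, here is a rough sketch of the format I think I'm seeing. The class and field names (`UserInstruction`, `identity`, `goal`, `preferences`, `style`) are my own guesses, not anything from the tau-bench codebase; the sketch simply reassembles the example instruction above from its four inferred attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserInstruction:
    """Hypothetical attribute format inferred from one tau-bench task (not confirmed by the authors)."""
    identity: str                 # user ID the simulated user identifies as
    goal: str                     # what the user wants the agent to accomplish
    preferences: str              # constraints, e.g. how a refund should be handled
    style: Optional[str] = None   # optional language-style directive

    def render(self) -> str:
        # Join the attributes in the order they appear in the example task.
        parts = [f"You are {self.identity}.", self.goal, self.preferences]
        if self.style:
            parts.append(self.style)
        return " ".join(parts)


example = UserInstruction(
    identity="aarav-garcia-1177",
    goal=("For your upcoming trip from ATL to PHL, you want to change for the cheapest "
          "economy flight and for the day after the original reservation."),
    preferences="You are happy with original payment for refund.",
)
print(example.render())
```

If this guess is right, the optional `style` field would cover instructions such as "You are reactive to the agent and will not say anything that is not asked."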

Thank you so much in advance!
