Thanks a lot for building such an amazing benchmark. I have a question about how you created the user instructions for each task. In Section 4 of your paper (Benchmark Construction), Stage III, you write: "we write an initial user instruction, run a trial with gpt-4-turbo function calling agent, polish the user instruction by examining the trajectory, and do this iteratively until we are certain no ambiguities exist". Did you follow any attribute format when constructing these user instructions?
Take this instruction as an example: "You are aarav-garcia-1177. For your upcoming trip from ATL to PHL, you want to change for the cheapest economy flight and for the day after the original reservation. You are happy with original payment for refund." The instructions seem to follow a consistent attribute format: user identity ("aarav-garcia-1177"), goal ("For your upcoming trip from ATL to PHL, you want to change for the cheapest economy flight and for the day after the original reservation"), and preferences ("You are happy with original payment for refund."). Some instructions also specify a language style (e.g. "You are reactive to the agent and will not say anything that is not asked."). Could you clarify whether you had such an attribute format in mind when writing the user instructions for the tasks in tau-bench? If so, would you mind sharing more details about the attribute formats considered during benchmark construction?
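To make the question concrete, here is a minimal sketch of the structure I have in mind. The class name `UserInstruction` and its field names are my own assumptions for illustration; they are not taken from the tau-bench code or the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserInstruction:
    """Hypothetical attribute format; not part of tau-bench itself."""

    user_id: str                          # user identity
    goal: str                             # what the user wants to achieve
    preferences: str = ""                 # constraints such as payment choices
    language_style: Optional[str] = None  # optional behavior or tone hint

    def render(self) -> str:
        # Join the attributes that are present into one instruction string.
        parts = [
            f"You are {self.user_id}.",
            self.goal,
            self.preferences,
            self.language_style,
        ]
        return " ".join(p for p in parts if p)


instruction = UserInstruction(
    user_id="aarav-garcia-1177",
    goal=(
        "For your upcoming trip from ATL to PHL, you want to change for the "
        "cheapest economy flight and for the day after the original reservation."
    ),
    preferences="You are happy with original payment for refund.",
)
print(instruction.render())
```

Rendering these three attributes reproduces the example instruction above, which is why I suspect a shared template of this kind.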
Thank you so much in advance!