-
Hey @rawwerks -- thanks for giving TextGrad a try!! For code improvement, we did not want to give ourselves an unfair advantage by using the actual runtime in the objective: since the baseline, Reflexion, does not use it, we wanted a fair comparison. Just like in PyTorch, TextGrad requires one to define how to backprop through a given function. We can abstract this away by extending String Based Functions, or by implementing more general functions that handle any 0-1 metric (e.g. exact match) you may like. The tradeoff is that by implementing more specific ways to backprop through your metric, you can improve optimization performance. This is a good lesson for us -- we should make it more explicit and easier for users to see and use such metrics, and we'll definitely do that. Thank you so much for this question!
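To make the idea above concrete, here is a minimal sketch (plain Python, deliberately *not* the actual TextGrad API -- the function names here are hypothetical) of what "backprop through a 0-1 metric" could look like: the deterministic metric produces a score, and a wrapper renders that score as textual feedback that a text-based optimizer can consume as a "gradient".

```python
def exact_match_metric(prediction: str, reference: str) -> float:
    """Deterministic 0-1 metric: 1.0 on exact match, else 0.0."""
    return 1.0 if prediction.strip() == reference.strip() else 0.0


def metric_to_feedback(score: float, prediction: str, reference: str) -> str:
    """Render the numeric score as textual feedback for an LLM-based optimizer.

    This is the hand-written 'backward' the reply refers to: the numeric
    result is translated into a natural-language critique.
    """
    if score >= 1.0:
        return "The prediction matches the reference exactly; no change needed."
    return (
        f"The prediction scored {score:.1f}/1.0 on exact match. "
        f"It produced {prediction!r} but should produce {reference!r}; "
        "revise the prompt so the output matches the reference."
    )


score = exact_match_metric("42", "41")
feedback = metric_to_feedback(score, "42", "41")
```

The same pattern extends to any scalar metric: the more specific the rendered feedback (e.g. pointing at *which* part of the output is wrong), the more useful the "gradient" becomes, which is the tradeoff the reply describes.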
-
Having hybrid LLM + numerical evals is essential. I have seen string equality proposed as a workaround, but that's not a real alternative. Did you work on a hybrid autograd + textgrad approach?
-
I'm curious whether the TextGrad team has any examples that use a function as the evaluation, either in addition to or instead of the LLM engine's text evaluation.
I am coming from DSPy, so I am used to setting a metric function (0-1) as the evaluation. As a specific example, I was surprised that the code improvement example used only the LLM's evaluation of a string as the evaluation function. (Specifically, instead of minimizing run time as a numerical value, the example has the LLM read the numerical value and decide that it should minimize it.)
Text-only evaluation is super cool and a very creative approach! But I'm worried that it relies unnecessarily on the LLM as the sole evaluator. (Frankly, I have found LLMs to be very poor judges of quality -- for example, it is much harder to make a good LLM editor than a good LLM writer.)
Ideally, I would like to be able to mix quantitative/algorithmic/deterministic evaluations with the text-based (and inherently non-deterministic) evaluations of LLMs. Is this possible with textgrad today?
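A minimal sketch of the hybrid evaluation being asked for (plain Python; the function names are hypothetical and the LLM judgment is stubbed as a caller-supplied string rather than an actual model call): a deterministic component measures runtime, and the result is merged with qualitative text feedback into a single loss string.

```python
import time


def measured_runtime(fn, *args) -> float:
    """Deterministic component: wall-clock runtime of the candidate code."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def hybrid_loss_text(runtime_s: float, llm_judgment: str, budget_s: float = 1.0) -> str:
    """Combine the measured runtime with text feedback into one loss string.

    `llm_judgment` stands in for the LLM's text evaluation; in a real
    pipeline it would come from a model call.
    """
    numeric_part = (
        f"Measured runtime: {runtime_s:.4f}s "
        f"({'within' if runtime_s <= budget_s else 'over'} the {budget_s:.1f}s budget)."
    )
    return numeric_part + " Qualitative review: " + llm_judgment


def slow_sum(n):
    # A deliberately naive candidate implementation to time.
    return sum(range(n))


rt = measured_runtime(slow_sum, 100_000)
loss = hybrid_loss_text(rt, "The code is correct but uses a Python-level loop.")
```

This keeps the objective grounded in a number the LLM cannot misjudge, while still letting text feedback carry the qualitative signal.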