-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added new extract answer feature #148
Conversation
"If the proposed answer is empty, invalid, or ambiguous, " | ||
"return an empty string." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you upstream some of the tests from https://github.com/Future-House/paper-qa/blob/v5.8.0/tests/test_litqa.py#L117 to here? I think we should also have "multiple options are matched" mentioned somewhere
I would be nice if paper-qa
can just import this function and use it, instead of having its own evaluation LLM prompt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup - will do
try: | ||
from litellm import acompletion | ||
except ImportError as e: | ||
raise ImportError( | ||
"eval_answer requires the 'llm' extra for 'litellm'. Please:" | ||
" `pip install aviary[llm]`." | ||
) from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you look at how ToolSelector.__init__
takes acompletion: "Callable[..., Awaitable[ModelResponse]] | None" = None
, and apply it as an input arg here?
What this does is let people use local models.
Also, feel free to YAGNI on this one
@@ -22,6 +22,21 @@ | |||
"temperature": 0, | |||
} | |||
|
|||
LLM_EXTRACT_CONFIG = { | |||
"prompt": ( | |||
"You are evaluating answers for a test which has fixed options. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some statement that focuses the LLM on the message history?
Otherwise, in paper-qa
, we witnessed the LLM using its innate knowledge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no message history here (?) Not sure what you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this function is responsible for both (1) extracting a letter that (2) ensuring it matches a multiple choice option.
What we saw in paper-qa
was in the case of an empty string answer, the LLM would pull on its innate knowledge and could select the correct multiple choice option.
So for this:
Can you add some statement that focuses the LLM on the message history?
I guess what I should of said was can you add a statement that focuses the LLM on just the proposed: str
, and tries to avoid pulling on any innate knowledge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea - I was smart and didn't put the question into these, so there's no way it could get confused and try to answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we simultaneously posted
There is no question. So I don't see how it would be possible for it to attempt to answer. I don't know what else I could write to make it more clear in the prompt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh I see, very clever! I follow and you're right
Do you mind documenting that rationale somewhere in the code? Maybe a docstring in extract_answer_llm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving, there are outstanding comments still
@@ -22,6 +22,21 @@ | |||
"temperature": 0, | |||
} | |||
|
|||
LLM_EXTRACT_CONFIG = { | |||
"prompt": ( | |||
"You are evaluating answers for a test which has fixed options. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh I see, very clever! I follow and you're right
Do you mind documenting that rationale somewhere in the code? Maybe a docstring in extract_answer_llm
Hey @jamesbraza - I'm going to close this, because at this point YAGNI. Unless you find a use for it in paper-qa eval stuff |
Oh I forgot about this one, I will combine it atop #157 is merged |
Standardizing extraction of answers