[Review] #27
Comments
Great points, Druce!
We will incorporate your feedback shortly.
Best Regards,
…--
Thársis
<http://linkedin.com/in/tharsissouza>
On Mon, Jan 27, 2025 at 12:20 PM Druce Vertes wrote:
Format
What's the book format where you found this issue?
[ ] pdf
[x] web
[ ] ipynb
Chapter
evals
Issue Description
Amazeballs, a few small thoughts:
Glider: what is a "3B evaluator LLM"? The paper says 3.8B parameters (it also says "3B evaluator", but I couldn't figure out whether that means something else or is just a typo).
Maybe add log loss here: For discriminative tasks, LLM-based applications may produce log-probabilities or discrete predictions; traditional machine learning metrics like log loss, accuracy, precision, recall, and F1 score can be applied.
Maybe Artificial Analysis <https://artificialanalysis.ai/> is worth a mention in the review of benchmarks, maybe not. I like that it does performance analysis (latency/throughput) and has a nice dashboard where you can look up e.g. R1 and see how it stacks up, plus API providers and meta-analysis: https://artificialanalysis.ai/models/deepseek-r1
In the table comparing LangSmith, Promptfoo, and LightEval, it seems noteworthy that LangSmith needs an API key and collects all your prompts and traces, even though the "got a question" section of the website says they never look at them: https://www.langchain.com/langsmith
Just my midwit observations.
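To make the log loss suggestion concrete, here is a minimal sketch of how these metrics could be computed for a binary discriminative task. The function names, the sample labels, and the 0.5 threshold are illustrative assumptions, not taken from the book; if the model returns a log-probability for the positive label, it can be converted to a probability with `math.exp` first.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; p_pred holds predicted P(label == 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def classification_metrics(y_true, p_pred, threshold=0.5):
    """Accuracy, precision, recall, and F1 after thresholding probabilities."""
    y_hat = [1 if p >= threshold else 0 for p in p_pred]
    tp = sum(1 for y, h in zip(y_true, y_hat) if y == 1 and h == 1)
    fp = sum(1 for y, h in zip(y_true, y_hat) if y == 0 and h == 1)
    fn = sum(1 for y, h in zip(y_true, y_hat) if y == 1 and h == 0)
    tn = sum(1 for y, h in zip(y_true, y_hat) if y == 0 and h == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical gold labels and model probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0]
p_pred = [0.9, 0.2, 0.4, 0.8, 0.6]

print("log loss:", round(log_loss(y_true, p_pred), 4))
print(classification_metrics(y_true, p_pred))
```

Log loss uses the probabilities directly, so it penalizes confident wrong answers that the thresholded metrics treat the same as borderline ones.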