
[Review] #27

Open
druce opened this issue Jan 27, 2025 · 1 comment

Comments

@druce

druce commented Jan 27, 2025

Format

What's the book format where you found this issue?
[ ] pdf
[x] web
[ ] ipynb

Chapter

evals

Issue Description

amazeballs, a few small thoughts

GLIDER: what is a "3B evaluator LLM"? The paper says 3.8B parameters (it also says "3B evaluator," but I couldn't figure out what that means, or whether it's just a typo).

Maybe add log loss here: for discriminative tasks, LLM-based applications may produce log-probabilities or discrete predictions, so traditional machine learning metrics like log loss, accuracy, precision, recall, and F1 score can be applied.
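A minimal sketch of what I mean, using made-up labels and judge probabilities (nothing here comes from the chapter):

```python
import math

# Hypothetical example: ground-truth labels and the probability an
# LLM judge assigns to the positive class (e.g. exp(logprob of "yes")).
y_true = [1, 0, 1, 1, 0]
p_pos = [0.9, 0.2, 0.6, 0.4, 0.1]

# Log loss (binary cross-entropy) uses the probabilistic outputs directly.
log_loss = -sum(
    y * math.log(p) + (1 - y) * math.log(1 - p)
    for y, p in zip(y_true, p_pos)
) / len(y_true)

# Threshold the probabilities to get discrete predictions for the
# classification-style metrics.
y_pred = [1 if p >= 0.5 else 0 for p in p_pos]

tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
fp = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 1)
fn = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 0)

accuracy = sum(1 for y, yh in zip(y_true, y_pred) if y == yh) / len(y_true)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

The point being that log loss rewards calibrated probabilities, which accuracy/precision/recall/F1 throw away at the thresholding step.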

Maybe Artificial Analysis is worth a mention in the review of benchmarks (maybe not). I like that it does performance analysis (latency/throughput) and has a nice dashboard to look up, e.g., R1 and see how it stacks up across API providers, plus meta-analysis: https://artificialanalysis.ai/models/deepseek-r1

In the table comparing LangSmith, promptfoo, and LightEval, it seems noteworthy that LangSmith needs an API key and collects all your prompts and traces, even though in the "got a question" section of the website they say they never look at it: https://www.langchain.com/langsmith

just my midwit observations

@souzatharsis
Owner

souzatharsis commented Jan 27, 2025 via email
