F1 score in 01_fine-tuning-titan-lite.ipynb #242

Open
jicowan opened this issue Apr 25, 2024 · 2 comments

jicowan commented Apr 25, 2024

I had to run the following code block twice before it would output the scores; the first time I ran it, the output was blank:

from bert_score import score
reference_summary = [reference_summary]
fine_tuned_model_P, fine_tuned_R, fine_tuned_F1 = score(fine_tuned_generated_response, reference_summary, lang="en")
base_model_P, base_model_R, base_model_F1 = score(base_model_generated_response, reference_summary, lang="en")
print("F1 score: base model ", base_model_F1)
print("F1 score: fine-tuned model", fine_tuned_F1)

Final output:

F1 score: base model  tensor([0.8868])
F1 score: fine-tuned model tensor([0.8532])
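
A possible explanation is that the first invocation spends its time downloading the scorer model and prints nothing by default. Here is a sketch of a more defensive variant I'd try (my own version, not the notebook's code; it assumes the generated responses and the reference summary start out as plain strings):

from bert_score import score

# bert_score expects equal-length lists of candidate and reference strings,
# so wrap plain strings before scoring.
candidates_ft = [fine_tuned_generated_response]
candidates_base = [base_model_generated_response]
references = [reference_summary]

# verbose=True prints model download / encoding progress, so the first run
# doesn't appear to finish with blank output while the scorer model loads.
_, _, fine_tuned_F1 = score(candidates_ft, references, lang="en", verbose=True)
_, _, base_model_F1 = score(candidates_base, references, lang="en", verbose=True)

print("F1 score: base model ", base_model_F1.item())
print("F1 score: fine-tuned model", fine_tuned_F1.item())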

jicowan commented Apr 25, 2024

You might want to consider using the Model Evaluation feature within Bedrock to compare the models rather than using the score function.
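
A rough sketch of what that could look like with boto3's create_evaluation_job (the job name, role ARN, S3 URIs, dataset, metric names, and model identifier below are all placeholders, and the request shape should be double-checked against the current Bedrock Model Evaluation documentation):

import boto3

bedrock = boto3.client("bedrock")

# Placeholder values -- substitute a real IAM role, prompt dataset, output bucket,
# and model identifier. Run one job per model (base vs. fine-tuned) and compare
# the generated evaluation reports.
response = bedrock.create_evaluation_job(
    jobName="titan-lite-base-summarization-eval",
    roleArn="arn:aws:iam::111122223333:role/BedrockEvalRole",
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "Summarization",
                    "dataset": {
                        "name": "summarization-prompts",
                        "datasetLocation": {"s3Uri": "s3://my-bucket/eval/prompts.jsonl"},
                    },
                    # Assumed built-in metric names; check the console/docs for the exact set.
                    "metricNames": ["Builtin.Accuracy", "Builtin.Robustness", "Builtin.Toxicity"],
                }
            ]
        }
    },
    inferenceConfig={
        "models": [
            {"bedrockModel": {"modelIdentifier": "amazon.titan-text-lite-v1"}}
        ]
    },
    outputDataConfig={"s3Uri": "s3://my-bucket/eval/results/"},
)
print(response["jobArn"])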


w601sxs commented May 15, 2024

Thanks @jicowan - we will get back to this; prioritizing other bugs for now.
