I had to run the following code block twice before it would output the scores; the first time I ran it, the output was blank:
```python
from bert_score import score

reference_summary = [reference_summary]

fine_tuned_model_P, fine_tuned_R, fine_tuned_F1 = score(fine_tuned_generated_response, reference_summary, lang="en")
base_model_P, base_model_R, base_model_F1 = score(base_model_generated_response, reference_summary, lang="en")

print("F1 score: base model ", base_model_F1)
print("F1 score: fine-tuned model", fine_tuned_F1)
```
Final output:
```
F1 score: base model  tensor([0.8868])
F1 score: fine-tuned model tensor([0.8532])
```
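My guess (not verified) is that the blank first run is just the time spent downloading and initializing the underlying model the first time `score()` is called, so nothing prints until a later run where the model is already cached. A small sketch of a workaround, reusing the notebook's existing variables (`fine_tuned_generated_response`, `base_model_generated_response`, `reference_summary`) and assuming only the public `bert_score` API:

```python
from bert_score import BERTScorer

# Build the scorer once up front so the model download/initialization
# happens here, before any scoring call.
scorer = BERTScorer(lang="en")

# fine_tuned_generated_response, base_model_generated_response, and
# reference_summary are the notebook's variables (lists of strings).
# verbose=True makes the scoring progress visible instead of silent.
fine_tuned_P, fine_tuned_R, fine_tuned_F1 = scorer.score(
    fine_tuned_generated_response, reference_summary, verbose=True
)
base_model_P, base_model_R, base_model_F1 = scorer.score(
    base_model_generated_response, reference_summary, verbose=True
)

print("F1 score: base model      ", base_model_F1)
print("F1 score: fine-tuned model", fine_tuned_F1)
```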
You might want to consider using the Model Evaluation feature within Bedrock to compare the models rather than using the `score` function.
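In case it helps, here is a rough sketch of what kicking off an automated evaluation job could look like via boto3's `create_evaluation_job`. The job name, role ARN, S3 URIs, dataset name, and model ID below are placeholders, and the exact shape of `evaluationConfig`/`inferenceConfig` should be checked against the current boto3/Bedrock documentation for your SDK version:

```python
import boto3

# Control-plane Bedrock client (not bedrock-runtime).
bedrock = boto3.client("bedrock")

# Sketch only: all names, ARNs, and S3 URIs are placeholders, and the
# nested config structure should be verified against the boto3 docs.
response = bedrock.create_evaluation_job(
    jobName="summarization-model-comparison",                    # placeholder
    roleArn="arn:aws:iam::123456789012:role/BedrockEvalRole",    # placeholder
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "Summarization",
                    "dataset": {
                        "name": "my-eval-dataset",               # placeholder
                        "datasetLocation": {
                            "s3Uri": "s3://my-bucket/eval/dataset.jsonl"  # placeholder
                        },
                    },
                    "metricNames": ["Builtin.Accuracy"],
                }
            ]
        }
    },
    inferenceConfig={
        "models": [
            {"bedrockModel": {"modelIdentifier": "amazon.titan-text-express-v1"}}  # placeholder
        ]
    },
    outputDataConfig={"s3Uri": "s3://my-bucket/eval/output/"},   # placeholder
)
print(response["jobArn"])
```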
Thanks @jicowan - we will get back to this; prioritizing other bugs for now.