refactor(benchmarks) Update evaluation accuracy for general NLP challenge (#4286)
adap · Oct 6, 2024 · commit 7a7d912 · 1 parent 849ab1d
Showing 1 changed file with 3 additions and 3 deletions.
diff --git a/benchmarks/flowertune-llm/evaluation/README.md b/benchmarks/flowertune-llm/evaluation/README.md
@@ -17,9 +17,9 @@ The default template generated by `flwr new` (see the [Project Creation Instruct
 
 ### General NLP
 
-|          | MT-1 | MT-2 | MT-Avg |  
-|:--------:|:----:|:----:|:------:|
-| MT Score | 5.54 | 5.52 |  5.53  |
+|         | STEM  |  SS   | Humanities |  Avg  |
+|:-------:|:-----:|:-----:|:----------:|:-----:|
+| Acc (%) | 12.37 | 13.49 |   12.60    | 12.82 |
 
 ### Finance
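
As a quick arithmetic check on the updated General NLP table, the sketch below recomputes the Avg column from the three category accuracies in the added rows. It assumes that Avg is the unweighted mean of STEM, SS (presumably social sciences) and Humanities, rounded to two decimals. The diff does not say how Avg is computed, and the dictionary name is illustrative only.

```python
# Per-category accuracies (%) copied from the rows added in this commit.
category_acc = {"STEM": 12.37, "SS": 13.49, "Humanities": 12.60}

# Assumption: "Avg" is the unweighted mean of the category accuracies,
# rounded to two decimal places (not confirmed by the diff itself).
avg = round(sum(category_acc.values()) / len(category_acc), 2)

print(f"Avg (%): {avg:.2f}")
```

Under that assumption the result matches the reported 12.82. The removed MT-Bench row follows the same pattern: (5.54 + 5.52) / 2 = 5.53.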