From 84b4fd7c8da19a1ef042c4b9a0aea85cfba19ea1 Mon Sep 17 00:00:00 2001
From: Yan Gao
Date: Sat, 7 Sep 2024 09:46:27 +0100
Subject: [PATCH 1/2] fix(benchmarks:skip) Fix a git clone depth issue for generalNLP challenge (#4155)

---
 benchmarks/flowertune-llm/evaluation/general-nlp/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/benchmarks/flowertune-llm/evaluation/general-nlp/README.md b/benchmarks/flowertune-llm/evaluation/general-nlp/README.md
index 51c801494f6d..18666968108d 100644
--- a/benchmarks/flowertune-llm/evaluation/general-nlp/README.md
+++ b/benchmarks/flowertune-llm/evaluation/general-nlp/README.md
@@ -23,7 +23,7 @@ huggingface-cli login
 Download data from [FastChat](https://github.com/lm-sys/FastChat):
 
 ```shell
-git clone --depth=1 https://github.com/lm-sys/FastChat.git && cd FastChat && git checkout d561f87b24de197e25e3ddf7e09af93ced8dfe36 && mv fastchat/llm_judge/data ../data && cd .. && rm -rf FastChat
+git clone https://github.com/lm-sys/FastChat.git && cd FastChat && git checkout d561f87b24de197e25e3ddf7e09af93ced8dfe36 && mv fastchat/llm_judge/data ../data && cd .. && rm -rf FastChat
 ```

From 0e7c1b06c32ab90e0d3cf64825ed51eedd715509 Mon Sep 17 00:00:00 2001
From: Yan Gao
Date: Sat, 7 Sep 2024 09:53:22 +0100
Subject: [PATCH 2/2] fix(benchmarks:skip) Update accuracy values for code challenge (#4157)

---
 benchmarks/flowertune-llm/evaluation/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/benchmarks/flowertune-llm/evaluation/README.md b/benchmarks/flowertune-llm/evaluation/README.md
index 1b6383df296a..d7216c089d8a 100644
--- a/benchmarks/flowertune-llm/evaluation/README.md
+++ b/benchmarks/flowertune-llm/evaluation/README.md
@@ -37,7 +37,7 @@ The default template generated by `flwr new` (see the [Project Creation Instruct
 |            | MBPP  | HumanEval | MultiPL-E (JS) | MultiPL-E (C++) |  Avg  |
 |:----------:|:-----:|:---------:|:--------------:|:---------------:|:-----:|
-| Pass@1 (%) | 32.60 |   26.83   |     29.81      |      24.22      | 28.37 |
+| Pass@1 (%) | 31.60 |   23.78   |     28.57      |      25.47      | 27.36 |
 
 ## Make submission on FlowerTune LLM Leaderboard