@XiangLi1999 Thanks for your great work!
I am trying to reproduce the results on wikitext but have run into some problems.
I ran your script:
python run_generation.py --model_name_or_path gpt2-xl --model_type gpt2 --length 256 --prompt_file wikitext --student_name_or_path gpt2 --st_coef 1.0 --student_temperature 0.5 --outfile outputs/temp_out.jsonl --ignore_prefix no
And then evaluate the output file by:
python eval_script.py ./outputs/temp_out.jsonl
The output is
{'name': './outputs/temp_out.jsonl', 'rep-2': 9.5, 'rep-3': 1.87, 'rep-4': 0.4, 'diversity': 0.8845241939999999, 'mauve': 0.8812567264373257, 'coherence': 0.5913593170305366} (I disabled the other metrics)
This differs from the results reported in the paper (coherence: 0.59 here vs. 0.69 in the paper).
I found that ./outputs_ignorePrefix_ccnews_256/wikitext_results/wikitext_gpt2-1.0-t0.5_gpt2-xl_256.jsonl
produces the correct metric values. May I ask two questions:
- What is the generation script used to produce the correct outputs?
- What do the values in
wikitext_gpt2-1.0-t0.5_gpt2-xl_256.jsonl
mean? For example, 256 seems to be the output length and 0.5 the student temperature. What does 1.0 indicate?
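To make the second question concrete, here is how I am currently reading the file name. This is only a minimal sketch: the helper `split_result_filename` and all field names (including `unknown`) are my own guesses for illustration, not the repository's actual naming scheme.

```python
import re

# Hypothetical pattern: the group names are guesses, not taken from the repo.
FILENAME_PATTERN = re.compile(
    r"^(?P<prompt_set>[^_]+)"            # e.g. "wikitext"
    r"_(?P<student>.+)"                  # e.g. "gpt2" (guess: student model)
    r"-(?P<unknown>[\d.]+)"              # e.g. "1.0" (the value I am asking about)
    r"-t(?P<student_temperature>[\d.]+)" # e.g. "0.5" (guess: student temperature)
    r"_(?P<model>.+)"                    # e.g. "gpt2-xl" (guess: main model)
    r"_(?P<length>\d+)\.jsonl$"          # e.g. "256" (guess: output length)
)


def split_result_filename(name):
    """Split a result file name into its guessed fields."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected file name format: {name}")
    return match.groupdict()


fields = split_result_filename("wikitext_gpt2-1.0-t0.5_gpt2-xl_256.jsonl")
for key, value in fields.items():
    print(f"{key}: {value}")
```

The `unknown` field is the 1.0 whose meaning I would like to confirm.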