Some questions about the code in the Frozen Lake experiment #1

hlhang9527 · 2021-03-23T12:07:53Z

Hello Professor,
I've recently been studying your paper and reproducing your code, and I have a few questions:

1. In the `evaluate_single_switch_policy(policy, teacher_env, student_final_env, timesteps=10000)` function in `teacher_learning`, does the parameter `timesteps` correspond to the N_s interaction units in the paper?
2. In the `evaluate_single_switch_policy()` function in `frozen_single_switch_utils`, do lines 52-58 implement `get_student()` from Algorithm 1 (CISR)?
3. The threshold is predefined in the `teacher_learning` function. How do we set its domain range to (-0.5, 5.5) if we don't know the final reward in advance?

I hope you can give me some advice, thank you very much!
