Hello Professor,
Recently, I've been studying your paper and reproducing your code, and I have some questions:
1. In the evaluate_single_switch_policy(policy, teacher_env, student_final_env, timesteps=10000) function of teacher_learning, does the timesteps parameter correspond to the N_s interaction units in the paper?
2. In the evaluate_single_switch_policy() function of frozen_single_switch_utils, do lines 52-58 implement the get_student() of Algorithm 1 (CISR)?
3. The threshold is predefined in the teacher_learning function. How should we choose the domain range (-0.5, 5.5) if we don't know the final reward in advance?
I hope you can give me some advice. Thank you very much!