
[Feature Request] same random seed for every env in AsyncEval #253

Open
1-Bart-1 opened this issue Jul 29, 2024 · 1 comment
Labels: check the checklist, enhancement

1-Bart-1 commented Jul 29, 2024

🚀 Feature

When training with ARS in combination with AsyncEval, multiple environments are run at the same time. When these environments are seeded, each one currently gets a different seed. There should be an option to seed all the environments with the same seed each time ARS.evaluate_candidates() is run.

The relevant functions are ARS.evaluate_candidates and AsyncEval.seed:

    def evaluate_candidates(

    def seed(self, seed: Optional[int] = None) -> List[Union[None, int]]:

Motivation

Some environments generate random values in their reset function, for instance external factors that are random. When running evaluate_candidates, these random values can affect the returned rewards, so some good parameter sets receive bad rewards while bad parameter sets receive good rewards. This slows down training. To mitigate this, while still generating different random external values across evaluations, all environments in AsyncEval should be seeded with the same random seed at the start of evaluate_candidates, or this should at least be an option.
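To make the problem concrete, here is a minimal sketch (hypothetical env, not from the issue) of a reset that draws a random external factor which shifts the achievable reward:

    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces


    class WindyEnv(gym.Env):
        """Hypothetical env: the reward depends on a random 'wind' drawn at reset."""

        def __init__(self):
            self.observation_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
            self.action_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
            self.wind = 0.0

        def reset(self, *, seed=None, options=None):
            super().reset(seed=seed)
            # External factor drawn at reset: candidates evaluated with different
            # seeds face different winds, so their rewards are not comparable.
            self.wind = float(self.np_random.uniform(-1.0, 1.0))
            return np.zeros(1, dtype=np.float32), {}

        def step(self, action):
            # One-step episode: the reward is best when the action matches the wind.
            reward = -abs(float(action[0]) - self.wind)
            return np.zeros(1, dtype=np.float32), reward, True, False, {}

With a shared seed, every candidate in a round faces the same wind, so reward differences reflect the parameters rather than the draw.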

Pitch

Add a call to async_eval.seed in the async branch of ARS.evaluate_candidates:

        if async_eval is not None:
            # Multiprocess asynchronous version
            async_eval.send_jobs(candidate_weights, self.pop_size)
            results = async_eval.get_results()
            # New: reseed all envs with one shared seed (takes effect at the next reset)
            async_eval.seed(self.seed)

Add the following lines to AsyncEval.seed (drawing the random seed once, before the loop, so every env really gets the same one):

        if seed is None or seed == 0:
            # Seed all envs with the same random seed
            shared_seed = np.random.randint(2**32 - 1, dtype="int64").item()
            for remote in self.remotes:
                remote.send(("seed", shared_seed))
        else:
            for idx, remote in enumerate(self.remotes):
                remote.send(("seed", seed + idx))
        return None

And change the worker so that it doesn't send back a value after seeding:

    elif cmd == "seed":
        # Note: the seed will only be effective at the next reset
        vec_env.seed(seed=data)
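This matters because the seed method above never calls remote.recv(): if the worker kept sending back the result of vec_env.seed, those unread replies would sit in the pipes and could be picked up by a later get_results call.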

Alternatives

None.

Additional context

At least in my specific environment, this method leads to large improvements in training.

Checklist

  • I have checked that there is no similar issue in the repo
  • If I'm requesting a new feature, I have proposed alternatives
1-Bart-1 added the enhancement label Jul 29, 2024
araffin added the check the checklist label Jul 31, 2024
araffin (Member) commented Jul 31, 2024

Hello,
I think you are missing an important alternative, which is also recommended: evaluating each candidate for multiple episodes to remove noise due to env stochasticity.

Also, even if you seed each env identically the first time, they will end up in different states after each trial because of the different candidates (and you also don't want to optimize for a specific seed of your env).
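As a sketch of that recommended alternative (hypothetical helper, not sb3-contrib API): average each candidate's return over several episodes, so that a lucky or unlucky reset draw washes out:

    import numpy as np


    def evaluate_candidate(env, policy, n_eval_episodes: int = 5) -> float:
        """Average the episodic return over several episodes to reduce
        the noise introduced by randomness in env.reset()."""
        returns = []
        for _ in range(n_eval_episodes):
            obs, _ = env.reset()
            done, episode_return = False, 0.0
            while not done:
                obs, reward, terminated, truncated, _ = env.step(policy(obs))
                episode_return += reward
                done = terminated or truncated
            returns.append(episode_return)
        return float(np.mean(returns))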

See issues in RL Zoo:
