
[Bug]: producing NAN values during training in MaskablePPO #221

Open
4 tasks done
vahidqo opened this issue Dec 14, 2023 · 5 comments
Labels
bug (Something isn't working) · custom gym env (Issue related to Custom Gym Env) · more information needed (Please fill the issue template completely) · No tech support (We do not do tech support)

Comments

vahidqo commented Dec 14, 2023

🐛 Bug

During training, the algorithm produces NaN values inside the neural network. I tried several fixes proposed in other issues, but the error persists: changing np.float64 to np.float32 didn't help; use_expln=True is not available for MaskablePPO; changing model parameters such as gamma made no difference; and decreasing the learning rate also led to the same error.
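A common first diagnostic (not mentioned in the report, and an assumption on my part) is to verify that the environment itself never emits NaN or inf observations or rewards before they reach the policy network; SB3 also provides a VecCheckNan wrapper for this purpose. A minimal stdlib-only sketch of the underlying check:

```python
# Sketch: guard an env's outputs for NaN/inf before they reach the policy.
# This mirrors the kind of check SB3's VecCheckNan wrapper performs; the
# observation below is a made-up example, not data from the issue.
import math

def contains_invalid(values):
    """Return True if any value is NaN or infinite."""
    return any(math.isnan(v) or math.isinf(v) for v in values)

# Typical usage inside a step loop:
obs = [0.0, 1.5, float("nan")]   # example observation with a bad entry
reward = 1.0
if contains_invalid(obs) or contains_invalid([reward]):
    print("invalid value detected in env output")
```

If this check ever fires, the NaN originates in the environment rather than in the optimizer.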

To Reproduce

# imports added for completeness (standard for sb3_contrib's MaskablePPO)
from typing import List

import gymnasium as gym
from sb3_contrib import MaskablePPO
from sb3_contrib.common.maskable.policies import MaskableActorCriticPolicy
from sb3_contrib.common.wrappers import ActionMasker
from stable_baselines3.common.callbacks import CheckpointCallback

class custom(gym.Env):
    ...  # environment implementation omitted in the report

env = custom()

def mask_fn(env: gym.Env) -> List[bool]:
    return env.valid_action_mask()

env = ActionMasker(env, mask_fn)
model = MaskablePPO(MaskableActorCriticPolicy, env, gamma=0.001, verbose=0)
checkpoint_callback = CheckpointCallback(save_freq=10000, save_path='logs',
                                         name_prefix='rl_model')
model.learn(500000, callback=checkpoint_callback)
model.save("JOM")
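One detail worth flagging in the snippet above: gamma=0.001 is an unusually low discount factor, which makes the return almost ignore all future reward. A small plain-Python illustration with a hypothetical reward sequence:

```python
# Illustration (made-up rewards, not from the issue): with gamma=0.001 the
# discounted return is dominated entirely by the immediate reward.
def discounted_return(rewards, gamma):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1.0, 1.0, 1.0], gamma=0.001))  # ~1.001, nearly myopic
print(discounted_return([1.0, 1.0, 1.0], gamma=0.99))   # ~2.97, far-sighted
```

A near-zero gamma is not itself a NaN source, but it is the kind of hyperparameter choice worth double-checking when training misbehaves.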

Relevant log output / Error message

ValueError                                Traceback (most recent call last)
<ipython-input-9-abee064644f3> in <cell line: 3>()
      1 checkpoint_callback = CheckpointCallback(save_freq=10000, save_path='logs',
      2                                          name_prefix='rl_model')
----> 3 model.learn(500000, callback=checkpoint_callback)
      4 model.save("JOM")

8 frames
/usr/local/lib/python3.10/dist-packages/sb3_contrib/ppo_mask/ppo_mask.py in learn(self, total_timesteps, callback, log_interval, tb_log_name, reset_num_timesteps, use_masking, progress_bar)
    545                 self.logger.dump(step=self.num_timesteps)
    546 
--> 547             self.train()
    548 
    549         callback.on_training_end()

/usr/local/lib/python3.10/dist-packages/sb3_contrib/ppo_mask/ppo_mask.py in train(self)
    410                     actions = rollout_data.actions.long().flatten()
    411 
--> 412                 values, log_prob, entropy = self.policy.evaluate_actions(
    413                     rollout_data.observations,
    414                     actions,

/usr/local/lib/python3.10/dist-packages/sb3_contrib/common/maskable/policies.py in evaluate_actions(self, obs, actions, action_masks)
    331             latent_vf = self.mlp_extractor.forward_critic(vf_features)
    332 
--> 333         distribution = self._get_action_dist_from_latent(latent_pi)
    334         if action_masks is not None:
    335             distribution.apply_masking(action_masks)

/usr/local/lib/python3.10/dist-packages/sb3_contrib/common/maskable/policies.py in _get_action_dist_from_latent(self, latent_pi)
    244         """
    245         action_logits = self.action_net(latent_pi)
--> 246         return self.action_dist.proba_distribution(action_logits=action_logits)
    247 
    248     def _predict(

/usr/local/lib/python3.10/dist-packages/sb3_contrib/common/maskable/distributions.py in proba_distribution(self, action_logits)
    192         reshaped_logits = action_logits.view(-1, sum(self.action_dims))
    193 
--> 194         self.distributions = [
    195             MaskableCategorical(logits=split) for split in th.split(reshaped_logits, tuple(self.action_dims), dim=1)
    196         ]

/usr/local/lib/python3.10/dist-packages/sb3_contrib/common/maskable/distributions.py in <listcomp>(.0)
    193 
    194         self.distributions = [
--> 195             MaskableCategorical(logits=split) for split in th.split(reshaped_logits, tuple(self.action_dims), dim=1)
    196         ]
    197         return self

/usr/local/lib/python3.10/dist-packages/sb3_contrib/common/maskable/distributions.py in __init__(self, probs, logits, validate_args, masks)
     40     ):
     41         self.masks: Optional[th.Tensor] = None
---> 42         super().__init__(probs, logits, validate_args)
     43         self._original_logits = self.logits
     44         self.apply_masking(masks)

/usr/local/lib/python3.10/dist-packages/torch/distributions/categorical.py in __init__(self, probs, logits, validate_args)
     68             self._param.size()[:-1] if self._param.ndimension() > 1 else torch.Size()
     69         )
---> 70         super().__init__(batch_shape, validate_args=validate_args)
     71 
     72     def expand(self, batch_shape, _instance=None):

/usr/local/lib/python3.10/dist-packages/torch/distributions/distribution.py in __init__(self, batch_shape, event_shape, validate_args)
     66                 valid = constraint.check(value)
     67                 if not valid.all():
---> 68                     raise ValueError(
     69                         f"Expected parameter {param} "
     70                         f"({type(value).__name__} of shape {tuple(value.shape)}) "

ValueError: Expected parameter logits (Tensor of shape (64, 2)) of distribution MaskableCategorical(logits: torch.Size([64, 2])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan],
        [nan, nan],
        ... (61 identical rows of [nan, nan] omitted) ...
        [nan, nan]], grad_fn=<SubBackward0>)

System Info

No response

@vahidqo added the bug label on Dec 14, 2023

vahidqo commented Jan 9, 2024

Hi,

Could you please let me know whether this is a problem with my code or with the package? @araffin

Thank you

@araffin added the labels more information needed, custom gym env, and No tech support on Jan 10, 2024

vahidqo commented Jan 11, 2024

@araffin Thank you for your response.
Could you please explain what you mean by "more information"? Should I post all the environment code?


vahidqo commented Jan 13, 2024

@araffin Here is the detailed error:

An error occurred during training: Function 'MseLossBackward0' returned nan values in its 1th output.
C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\torch\autograd\__init__.py:200: UserWarning: Error detected in MseLossBackward0. Traceback of forward call that caused the error:
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\ipykernel_launcher.py", line 17, in <module>
    app.launch_new_instance()
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\traitlets\config\application.py", line 1046, in launch_instance
    app.start()
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\ipykernel\kernelapp.py", line 736, in start
    self.io_loop.start()
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\tornado\platform\asyncio.py", line 195, in start
    self.asyncio_loop.run_forever()
  File "C:\Program Files\Python311\Lib\asyncio\base_events.py", line 607, in run_forever
    self._run_once()
  File "C:\Program Files\Python311\Lib\asyncio\base_events.py", line 1922, in _run_once
    handle._run()
  File "C:\Program Files\Python311\Lib\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\ipykernel\kernelbase.py", line 516, in dispatch_queue
    await self.process_one()
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\ipykernel\kernelbase.py", line 505, in process_one
    await dispatch(*args)
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\ipykernel\kernelbase.py", line 412, in dispatch_shell
    await result
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\ipykernel\kernelbase.py", line 740, in execute_request
    reply_content = await reply_content
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\ipykernel\ipkernel.py", line 422, in do_execute
    res = shell.run_cell(
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\ipykernel\zmqshell.py", line 546, in run_cell
    return super().run_cell(*args, **kwargs)
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\IPython\core\interactiveshell.py", line 3024, in run_cell
    result = self._run_cell(
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\IPython\core\interactiveshell.py", line 3079, in _run_cell
    result = runner(coro)
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\IPython\core\async_helpers.py", line 129, in _pseudo_sync_runner
    coro.send(None)
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\IPython\core\interactiveshell.py", line 3284, in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\IPython\core\interactiveshell.py", line 3466, in run_ast_nodes
    if await self.run_code(code, result, async_=asy):
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\IPython\core\interactiveshell.py", line 3526, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "C:\Users\12368\AppData\Local\Temp\ipykernel_23684\999724894.py", line 2, in <module>
    model.learn(1000000, callback=checkpoint_callback)
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\sb3_contrib\ppo_mask\ppo_mask.py", line 547, in learn
    self.train()
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\sb3_contrib\ppo_mask\ppo_mask.py", line 447, in train
    value_loss = F.mse_loss(rollout_data.returns, values_pred)
  File "C:\Users\12368\AppData\Roaming\Python\Python311\site-packages\torch\nn\functional.py", line 3295, in mse_loss
    return torch._C._nn.mse_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
 (Triggered internally at ..\torch\csrc\autograd\python_anomaly_mode.cpp:119.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass


vahidqo commented Jan 15, 2024

@araffin More info if that helps:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[72], line 2
      1 try:
----> 2     model.learn(1000000)
      3 except (AssertionError, ValueError) as e:
      4     print("An error occurred during training:", e)

File ~\AppData\Roaming\Python\Python311\site-packages\sb3_contrib\ppo_mask\ppo_mask.py:547, in MaskablePPO.learn(self, total_timesteps, callback, log_interval, tb_log_name, reset_num_timesteps, use_masking, progress_bar)
    544         self.logger.record("time/total_timesteps", self.num_timesteps, exclude="tensorboard")
    545         self.logger.dump(step=self.num_timesteps)
--> 547     self.train()
    549 callback.on_training_end()
    551 return self

File ~\AppData\Roaming\Python\Python311\site-packages\sb3_contrib\ppo_mask\ppo_mask.py:478, in MaskablePPO.train(self)
    476 # Optimization step
    477 self.policy.optimizer.zero_grad()
--> 478 loss.backward()
    479 # Clip grad norm
    480 th.nn.utils.clip_grad_norm_(self.policy.parameters(), self.max_grad_norm)

File ~\AppData\Roaming\Python\Python311\site-packages\torch\_tensor.py:487, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    477 if has_torch_function_unary(self):
    478     return handle_torch_function(
    479         Tensor.backward,
    480         (self,),
   (...)
    485         inputs=inputs,
    486     )
--> 487 torch.autograd.backward(
    488     self, gradient, retain_graph, create_graph, inputs=inputs
    489 )

File ~\AppData\Roaming\Python\Python311\site-packages\torch\autograd\__init__.py:200, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    195     retain_graph = create_graph
    197 # The reason we repeat same the comment below is that
    198 # some Python versions print out the first line of a multi-line function
    199 # calls in the traceback and some print out the last line
--> 200 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    201     tensors, grad_tensors_, retain_graph, create_graph, inputs,
    202     allow_unreachable=True, accumulate_grad=True)

RuntimeError: Function 'MseLossBackward0' returned nan values in its 1th output.


araffin commented Jan 16, 2024

Might be a duplicate of #81 or #195
Probably a combination of your env and hyperparameters.

Please note that we do not offer tech support, see #81 (comment)
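For context on how NaN logits can arise specifically with action masking (a hypothetical cause, not confirmed for this issue): if mask_fn ever returns a mask with no valid action, every logit is effectively set to -inf and the softmax becomes 0/0, i.e. NaN, which then poisons the gradients. A stdlib sketch of that failure mode:

```python
# Sketch of an all-masked softmax producing NaN. In float tensor libraries
# 0/0 yields NaN silently; plain Python would raise, so we return NaN
# explicitly to mimic that behavior.
import math

def masked_softmax(logits, mask):
    masked = [l if m else float("-inf") for l, m in zip(logits, mask)]
    exps = [math.exp(x) for x in masked]   # exp(-inf) == 0.0
    total = sum(exps)
    if total == 0.0:                       # every action masked out
        return [float("nan")] * len(logits)
    return [e / total for e in exps]

print(masked_softmax([0.2, 1.3], [True, True]))    # valid probabilities
print(masked_softmax([0.2, 1.3], [False, False]))  # all masked -> NaN
```

Asserting that env.valid_action_mask() always contains at least one True entry rules this cause out.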
