refactor(lyd): refactor dt_policy in new pipeline #687

AltmanD · 2023-07-18T07:53:11Z

Description

Under the new pipeline, DT performs worse than it does under the old pipeline.

PaParaZz1 · 2023-07-18T11:29:35Z

ding/envs/env_wrappers/env_wrappers.py

@@ -1174,6 +1174,23 @@ def reset(self):
            return self.env.reset()


+class AllinObsWrapper(gym.Wrapper):


add overview description for this wrapper

PaParaZz1 · 2023-07-18T11:30:17Z

ding/policy/command_mode_policy_instance.py

@@ -42,7 +42,8 @@

 from .d4pg import D4PGPolicy
 from .cql import CQLPolicy, CQLDiscretePolicy
-from .decision_transformer import DTPolicy
+# from .decision_transformer import DTPolicy


remove the commented code

PaParaZz1 · 2023-07-18T11:31:12Z

ding/policy/dt.py

+class DTPolicy(Policy):
+    r"""
+    Overview:
+        Policy class of DT algorithm in discrete environments.


add the full name of DT and the paper link
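
One way the requested docstring could look is sketched below. It assumes DT stands for Decision Transformer and that the intended paper is "Decision Transformer: Reinforcement Learning via Sequence Modeling" (arXiv:2106.01345). The `Policy` base class is left out so the snippet runs on its own; the real class in the PR inherits from it.

```python
class DTPolicy:  # the PR's class inherits from Policy; omitted here to stay self-contained
    """
    Overview:
        Policy class of the Decision Transformer (DT) algorithm in discrete environments.
        Paper link: https://arxiv.org/abs/2106.01345
    """


# Show the docstring that tools such as help() would display.
print(DTPolicy.__doc__)
```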

PaParaZz1 · 2023-07-18T11:31:36Z

ding/policy/dt.py

+
+    def _init_learn(self) -> None:
+        r"""
+            Overview:


polish indents

PaParaZz1 · 2023-07-18T11:32:17Z

dizoo/box2d/lunarlander/config/lunarlander_decision_transformer.py

@@ -27,7 +27,7 @@
        embed_dim=128,
        n_heads=1,
        dropout_p=0.1,
-        log_dir='/home/puyuan/DI-engine/dizoo/box2d/lunarlander/dt_log_1000eps',
+        log_dir='/mnt/nfs/luyd/DI-engine/dizoo/box2d/lunarlander/dt_log_1000eps',        


don't upload the absolute path
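
A minimal sketch of a path that does not depend on any one machine, assuming the config is run from the repository root. The helper name `make_log_dir` is hypothetical and not part of DI-engine.

```python
import os


def make_log_dir(exp_name: str, root: str = ".") -> str:
    """Join a repo-relative root and an experiment name into a log directory path."""
    return os.path.join(root, exp_name)


# Relative to the working directory, so no user-specific prefix is committed.
log_dir = make_log_dir("dt_log_1000eps", root="dizoo/box2d/lunarlander")
print(log_dir)
```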

PaParaZz1 · 2023-07-18T11:32:40Z

dizoo/box2d/lunarlander/dt_log_1000eps/dt_LunarLander-v2_log_23-07-13-08-57-28.csv

@@ -0,0 +1,2 @@
+duration,num_updates,eval_avg_reward,eval_avg_ep_len,eval_d4rl_score


remove unnecessary log files

PaParaZz1 · 2023-07-18T11:36:38Z

ding/utils/data/dataset.py

-
-        self.context_len = context_len
+    def __init__(self, cfg: dict) -> None:
+        dataset_path = cfg.policy.collect.get('data_path', None)


don't use xxx.get here; all config fields must already be fixed after compile_config
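
The concern can be illustrated with a small sketch. It uses a plain nested `dict` as a stand-in for the compiled config, and both function names are hypothetical. A `.get` with a default silently accepts a missing field, while direct access fails immediately if the config is incomplete.

```python
def data_path_lenient(cfg: dict):
    # Hides a missing key: returns None instead of reporting the broken config.
    return cfg["policy"]["collect"].get("data_path", None)


def data_path_strict(cfg: dict) -> str:
    # Assumes the compiled config always carries the field; raises KeyError otherwise.
    return cfg["policy"]["collect"]["data_path"]


complete_cfg = {"policy": {"collect": {"data_path": "./data/expert.hdf5"}}}
incomplete_cfg = {"policy": {"collect": {}}}

print(data_path_strict(complete_cfg))
print(data_path_lenient(incomplete_cfg))  # the missing field goes unnoticed
```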

PaParaZz1 · 2023-07-18T11:51:26Z

ding/policy/dt.py

+
+        self.running_rtg = [self.rtg_target / self.rtg_scale] * self.eval_batch_size
+        self.t = [0] * self.eval_batch_size
+        self.timesteps = torch.arange(start=0, end=self.max_eval_ep_len, step=1).repeat(self.eval_batch_size, 1).to(self.device)


specify the device when calling torch.arange
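
A sketch of the suggested change, assuming PyTorch is installed. The sizes are illustrative stand-ins for `self.eval_batch_size` and `self.max_eval_ep_len`. Passing `device=` to `torch.arange` allocates the tensor on the target device directly, instead of building it on the CPU and copying it with `.to(...)`.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
eval_batch_size = 4   # illustrative value
max_eval_ep_len = 10  # illustrative value

# Original pattern: create on CPU, repeat, then copy to the device.
timesteps_copied = torch.arange(start=0, end=max_eval_ep_len, step=1).repeat(eval_batch_size, 1).to(device)

# Suggested pattern: create directly on the device.
timesteps = torch.arange(start=0, end=max_eval_ep_len, step=1, device=device).repeat(eval_batch_size, 1)

print(timesteps.shape, timesteps.device)
```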

AltmanD added 3 commits July 14, 2023 10:18

Revise old version dt pipeline

12c9dd5

Add new dt pipeline

887b587

Add DT in new pipeline

737b7b6

AltmanD closed this Jul 18, 2023

AltmanD reopened this Jul 18, 2023

AltmanD marked this pull request as ready for review July 18, 2023 07:54

PaParaZz1 added the algo (Add new algorithm or improve old one) and refactor (refactor module or component) labels Jul 18, 2023

PaParaZz1 requested changes Jul 18, 2023

AltmanD changed the title ~~Dev dt in new pipeline~~ refactor(lyd): refactor dt_policy in new pipeline Jul 18, 2023

AltmanD added 2 commits July 25, 2023 17:09

Add img input to support atari

fce01fc

Fix according to comment

8b330a6

AltmanD closed this Jul 25, 2023
