Commit abcb6d2

Add contributing guide file.

1 parent abfd595

5 files changed (+14, -27 lines)

CONTRIBUTING.md

Lines changed: 7 additions & 0 deletions

@@ -0,0 +1,7 @@
+[Git Guide](https://di-engine-docs.readthedocs.io/en/latest/24_cooperation/git_guide.html)
+
+[GitHub Cooperation Guide](https://di-engine-docs.readthedocs.io/en/latest/24_cooperation/issue_pr.html)
+
+- [Code Style](https://di-engine-docs.readthedocs.io/en/latest/21_code_style/index.html)
+- [Unit Test](https://di-engine-docs.readthedocs.io/en/latest/22_test/index.html)
+- [Code Review](https://di-engine-docs.readthedocs.io/en/latest/24_cooperation/issue_pr.html#pr-s-code-review)

README.md

Lines changed: 3 additions & 3 deletions

@@ -2,7 +2,7 @@
 
 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
 
-English | [简体中文(Simplified Chinese)](https://github.com/opendilab/GenerativeRL_Preview/blob/main/README.zh.md)
+English | [简体中文(Simplified Chinese)](https://github.com/opendilab/GenerativeRL/blob/main/README.zh.md)
 
 **GenerativeRL**, short for Generative Reinforcement Learning, is a Python library for solving reinforcement learning (RL) problems using generative models, such as diffusion models and flow models. This library aims to provide a framework for combining the power of generative models with the decision-making capabilities of reinforcement learning algorithms.
@@ -62,8 +62,8 @@ pip install grl
 Or, if you want to install from source:
 
 ```bash
-git clone https://github.com/opendilab/GenerativeRL_Preview.git
-cd GenerativeRL_Preview
+git clone https://github.com/opendilab/GenerativeRL.git
+cd GenerativeRL
 pip install -e .
 ```
 

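After cloning, `pip install -e .` installs the package in editable mode. A minimal smoke test can confirm the install resolved correctly; this sketch assumes only that the library imports as `grl`, which the `pip install grl` line above implies:

```python
# Minimal post-install smoke test (a sketch; assumes the library imports
# as `grl`, matching the `pip install grl` command shown in the README).
import grl

# For an editable install ("pip install -e ."), the module file should
# live inside the cloned GenerativeRL checkout, not in site-packages.
print(grl.__file__)
```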
README.zh.md

Lines changed: 3 additions & 3 deletions

@@ -2,7 +2,7 @@
 
 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
 
-[英语 (English)](https://github.com/opendilab/GenerativeRL_Preview/blob/main/README.md) | 简体中文
+[英语 (English)](https://github.com/opendilab/GenerativeRL/blob/main/README.md) | 简体中文
 
 **GenerativeRL** 是一个使用生成式模型解决强化学习问题的算法库,支持扩散模型和流模型等不同类型的生成式模型。这个库旨在提供一个框架，将生成式模型的能力与强化学习算法的决策能力相结合。
@@ -59,8 +59,8 @@ pip install grl
 或者，如果你想从源码安装：
 
 ```bash
-git clone https://github.com/opendilab/GenerativeRL_Preview.git
-cd GenerativeRL_Preview
+git clone https://github.com/opendilab/GenerativeRL.git
+cd GenerativeRL
 pip install -e .
 ```
 

docs/source/tutorials/installation/index.rst

Lines changed: 1 addition & 1 deletion

@@ -17,4 +17,4 @@ If you want to try a preview of the latest features, you can install the latest
 
 .. code-block:: console
 
-    $ pip install git+https://github.com/opendilab/GenerativeRL_Preview.git
+    $ pip install git+https://github.com/opendilab/GenerativeRL.git

grl/algorithms/srpo.py

Lines changed: 0 additions & 20 deletions

@@ -385,33 +385,18 @@ def policy(obs: np.ndarray) -> np.ndarray:
             lr=config.parameter.behaviour_policy.learning_rate,
         )
 
-        # checkpoint = torch.load(
-        #     "/root/github/GenerativeRL_Preview/grl_pipelines/d4rl-halfcheetah-srpo/2024-04-17 06:22:21/checkpoint_diffusion_600000.pt"
-        # )
-        # self.model["SRPOPolicy"].sro.diffusion_model.model.load_state_dict(
-        #     checkpoint["diffusion_model"]
-        # )
-        # behaviour_model_optimizer.load_state_dict(
-        #     checkpoint["behaviour_model_optimizer"]
-        # )
-
         for train_diffusion_iter in track(
             range(config.parameter.behaviour_policy.iterations),
             description="Behaviour policy training",
         ):
             data = next(data_generator)
-            # data["s"].shape torch.Size([2048, 17]) data["a"].shape torch.Size([2048, 6]) data["r"].shape torch.Size([2048, 1])
             behaviour_model_training_loss = self.model[
                 "SRPOPolicy"
             ].behaviour_policy_loss(data["a"], data["s"])
             behaviour_model_optimizer.zero_grad()
             behaviour_model_training_loss.backward()
             behaviour_model_optimizer.step()
 
-            # if train_iter == 0 or (train_iter + 1) % config.parameter.evaluation.evaluation_interval == 0:
-            #     evaluation_results = evaluate(self.model["SRPOPolicy"], train_iter=train_iter)
-            #     wandb_run.log(data=evaluation_results, commit=False)
-
             wandb_run.log(
                 data=dict(
                     train_diffusion_iter=train_diffusion_iter,
@@ -444,11 +429,6 @@ def policy(obs: np.ndarray) -> np.ndarray:
             lr=config.parameter.critic.learning_rate,
         )
 
-        # checkpoint = torch.load(
-        #     "/root/github/GenerativeRL_Preview/grl_pipelines/d4rl-halfcheetah-srpo/2024-04-17 06:22:21/checkpoint_critic_600000.pt"
-        # )
-        # self.model["SRPOPolicy"].critic.q0.load_state_dict(checkpoint["q_model"])
-        # self.model["SRPOPolicy"].critic.vf.load_state_dict(checkpoint["v_model"])
         data_generator = get_train_data(
             DataLoader(
                 self.dataset,

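The removed lines were local debugging leftovers: checkpoint restores hard-wired to an absolute path under /root/github/GenerativeRL_Preview, a tensor-shape note, and a disabled evaluation hook. If checkpoint resuming is wanted later, a config-driven restore keeps machine-specific paths out of the code. A hedged sketch of what could slot in where the comments were; `config.parameter.checkpoint_path` is a hypothetical field, not part of the actual SRPO config:

```python
# Hypothetical sketch: config-driven restore replacing the hard-coded path.
# `config.parameter.checkpoint_path` is an assumed field, not a real one;
# `config`, `self`, and `behaviour_model_optimizer` come from the
# surrounding training setup shown in the diff above.
import os

import torch

checkpoint_path = getattr(config.parameter, "checkpoint_path", None)
if checkpoint_path and os.path.isfile(checkpoint_path):
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    # Same state-dict keys the removed debugging code used.
    self.model["SRPOPolicy"].sro.diffusion_model.model.load_state_dict(
        checkpoint["diffusion_model"]
    )
    behaviour_model_optimizer.load_state_dict(
        checkpoint["behaviour_model_optimizer"]
    )
```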