RLlib kestrel update #597

Open · wants to merge 17 commits into base: master
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,2 +1,4 @@
.*
!/.gitignore
**.pyc
**/__pycache__
110 changes: 110 additions & 0 deletions languages/python/RLlib/README.md
@@ -0,0 +1,110 @@
## Use this tutorial

This tutorial provides examples of how to use [RLlib](https://docs.ray.io/en/master/rllib/) for reinforcement learning, with an emphasis on building customized environments for your own optimal control problems. The tutorial is written for the NREL HPC system Kestrel, but it can easily be adapted to run on a local computer.

We suggest working through this tutorial in the following order:

1. Understand how to build a custom environment for your problem. Detailed guidelines are provided [here](custom_gym_env/README.md).

2. Train the RL agent/policy/controller by following [this guideline](train/README.md).

3. Test the trained RL agent as explained [here](test/README.md).

Before starting, please follow the instructions below to set up a Python conda environment.

## Create Anaconda environment

Follow the steps below to create an Anaconda environment for this tutorial:

### 1st step: Log in to Kestrel (skip this step if working on a local computer)

Log in to Kestrel with:
```
ssh kestrel
```
if you have the hostname configured in your SSH config, or
```
ssh <username>@kestrel.hpc.nrel.gov
```

### 2nd step: Set up Anaconda environment

To use `conda` on Kestrel (unlike on Eagle), the Anaconda module needs to be loaded first.
```
module purge
module load anaconda3
```

We suggest creating a conda environment under `/projects` rather than `/home` or `/scratch`.

***Example:***

Use the following script to create a conda environment:
```
mkdir -p /projects/$HPC_HANDLE/$USER/conda_envs
conda create --prefix=/projects/$HPC_HANDLE/$USER/conda_envs/rl_hpc python=3.10
```

Here, `$HPC_HANDLE` is the project handle and `$USER` is your HPC user name.

Activate the conda environment and install packages:

```
conda activate /projects/$HPC_HANDLE/$USER/conda_envs/rl_hpc

pip install -r requirements.txt
```
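
For reference, the snippet below sketches the kind of dependencies `requirements.txt` is expected to pull in for this tutorial. It is illustrative only (unpinned and possibly incomplete); the `requirements.txt` file in the repository remains the authoritative list.
```
# Illustrative only -- install from the repository's requirements.txt instead.
ray[rllib]
torch
gymnasium
```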

### 3rd step: Test OpenAI Gym API

After the installation completes, make sure everything works correctly by running a small example with one of the standard Gym environments (e.g., `CartPole-v1`).

Activate the environment and start a Python session:
```
module purge
module load anaconda3
conda activate /projects/$HPC_HANDLE/$USER/conda_envs/rl_hpc
python
```
On Kestrel, it is best to do this inside an interactive session rather than on a login node. In the Python session, run the following:
```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()

done = False

while not done:
action = env.action_space.sample()
obs, rew, terminated, truncated, info = env.step(action)
done = (terminated or truncated)
print(action, obs, rew, done)
```
If everything works correctly, you will see an output similar to:
```
0 [-0.04506794 -0.22440939 -0.00831435 0.26149667] 1.0 False
1 [-0.04955613 -0.02916975 -0.00308441 -0.03379707] 1.0 False
0 [-0.05013952 -0.22424733 -0.00376036 0.2579111 ] 1.0 False
0 [-0.05462447 -0.4193154 0.00139787 0.54940559] 1.0 False
0 [-0.06301078 -0.61445696 0.01238598 0.84252861] 1.0 False
1 [-0.07529992 -0.41950623 0.02923655 0.55376634] 1.0 False
0 [-0.08369004 -0.61502627 0.04031188 0.85551538] 1.0 False
0 [-0.09599057 -0.8106737 0.05742218 1.16059658] 1.0 False
0 [-0.11220404 -1.00649474 0.08063412 1.47071687] 1.0 False
1 [-0.13233393 -0.81244634 0.11004845 1.20427076] 1.0 False
1 [-0.14858286 -0.61890536 0.13413387 0.94800442] 1.0 False
0 [-0.16096097 -0.8155534 0.15309396 1.27964413] 1.0 False
1 [-0.17727204 -0.62267747 0.17868684 1.03854806] 1.0 False
0 [-0.18972559 -0.81966549 0.1994578 1.38158021] 1.0 False
0 [-0.2061189 -1.0166379 0.22708941 1.72943365] 1.0 True
```

### 4th step: Test other libraries
The following libraries should also import without errors.

```python
import ray
import torch
```
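
As an additional sanity check, you can print the installed versions (the exact numbers depend on what `requirements.txt` installed) and see whether PyTorch detects a GPU:
```python
import ray
import torch

# Print installed versions as a quick sanity check.
print("ray:", ray.__version__)
print("torch:", torch.__version__)

# Reports True only on a GPU node with a CUDA-enabled PyTorch build.
print("CUDA available:", torch.cuda.is_available())
```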

100 changes: 100 additions & 0 deletions languages/python/RLlib/custom_gym_env/README.md
@@ -0,0 +1,100 @@
# Create a customized Gym environment

This section demonstrates how to create your own Gym environment, carefully tailored to your needs. At NREL, this could be an optimal control problem in grid operation, building energy management, or traffic control.

## High-level overview

To facilitate deep RL implementations and tests of new algorithms, OpenAI Gym has become the standard interface connecting RL agents/algorithms with problems. Given such a standard interface, RL training and experiments can be done in a plug-and-play manner, as shown in the figure below. We can use any RL agent implementation (e.g., RLlib or Stable-Baselines3) with different RL algorithms (e.g., PPO, SAC, A3C, and DDPG) to learn a policy for different problems (e.g., cart-pole, lunar landing, or the problem of your interest) via the standard interface.

<p align="center">
<img src="../tutorial_img/gym_and_agent.png" alt="Interaction between the Gym environment and the RL agent" width="70%"><br>
<em>The interaction between Gym environment and RL agent.</em>
</p>

To use an RL training framework such as RLlib (as in this tutorial) to train an optimal policy for the problem of interest, the customized environment should follow the Gym API standard.

## Gym environment API structure

After the latest release (0.26.2) of the [OpenAI Gym repo](https://github.com/openai/gym), it was [announced](https://github.com/openai/gym?tab=readme-ov-file#important-notice) that all future maintenance and development has moved to [Gymnasium](https://github.com/Farama-Foundation/Gymnasium). This tutorial therefore follows the Gymnasium API guidelines, but we will still refer to the environment as a "gym" environment for simplicity.

This tutorial also focuses on __episodic__ environments, meaning the optimal control is implemented over a finite control horizon. The episode ends either when a fixed number of steps is reached or when a terminal state is reached.

As the first step, import gymnasium
```python
import gymnasium as gym
```

The custom-made gym environment should follow the structure below, with three core functions:

```python
class CustomEnv(gym.Env):

    def __init__(self):
        # Initialize the environment.
        # Called only once, when this class is instantiated.
        ...

    def reset(self, seed=None, options={}):
        # Reset the environment to the beginning of a control episode.
        # Called at the start of every episode, i.e., once the previous episode is complete.
        ...
        return obs, info

    def step(self, action):
        # Implement the control using the provided action.
        # Called at each step/control interval within the episode.
        ...
        return obs, reward, terminated, truncated, info
```

Next, we explain each function in more detail; a minimal end-to-end sketch follows this list.

Typically, the environment should inherit from the `gym.Env` class by defining `class CustomEnv(gym.Env)`. The three core gym functions are:

* `def __init__(self)`: Initializes the environment. More specifically, it generally involves the following three tasks:
- Defining necessary variables/hyperparameters.
- Defining the dimensionality of the observation and action spaces for the problem, specified via the attributes `self.observation_space` and `self.action_space`, respectively. Depending on their nature, they can take discrete values, continuous values, or a combination of both. See [this link](https://gymnasium.farama.org/api/spaces/) for more details.
- [Optional] Initializing the external simulator if you need one.
* `def reset(self, seed=None, options={})`: RL training requires repeated trial and error. Once an episode ends, whether terminated or truncated, the `reset` function allows the agent to reset the environment back to the initial state, start over, and try again.
Specific tasks could include:
- Resetting the state of the environment
- Resetting other utilities, e.g., setting the step counter back to zero: `self.step_count = 0`.
- Resetting the simulator if you have one.

The argument `seed` is used to set the random seed, and `options` can pass in a desired configuration for resetting, e.g., resetting to a specific initial state that would otherwise be generated randomly.

The outputs of the reset function are `obs` and `info`: `obs` is the observation that the RL agent uses as the policy input, while `info` is a dictionary of auxiliary information (not necessary for learning, but useful for other purposes such as debugging).

* `def step(self, action)`: Defines the inner mechanics of the environment and moves the simulation one step ahead using the provided `action`. Specific tasks inside this function fall into the following four categories:
- Advance the system by one step through the state transition function $s_{t+1} = f(s_t, a_t)$ defined by the Markov decision process. If you have an external simulator, take one step in the simulator as well. Obtain the new observation, and remember to increment the step counter if necessary: `self.step_count += 1`.
- Calculate the reward based on this step's control.
- Determine whether the current episode should end, either terminated or truncated. For example, if `self.step_count >= MAX_STEP_ALLOWED`, set `truncated=True`.
- [Optional] Prepare additional information and put it in the `info` dictionary.

Outputs of this function are:
- `obs`: Usually a NumPy array containing the new observation after this step's control.
- `reward`: A scalar reward reflecting the performance of this step's control.
- `terminated`: A Boolean reflecting whether a terminal state has been reached, e.g., the goal area is reached.
- `truncated`: A Boolean reflecting whether the time limit has been reached or another condition requires stopping the simulation.
- `info`: A Python dictionary containing auxiliary information. If none, use `info={}`.
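
For concreteness, the following is a minimal end-to-end sketch of an environment that follows this structure. It is not the CarPass environment discussed below; the dynamics, spaces, and reward (a 1-D point driven toward a target position) are invented purely for illustration.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class PointTargetEnv(gym.Env):
    """Minimal illustrative environment: drive a 1-D point to a target position."""

    MAX_STEPS = 50

    def __init__(self):
        # Observation: current position in [-10, 10]; action: velocity in [-1, 1].
        self.observation_space = spaces.Box(low=-10.0, high=10.0, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        self.target = 5.0
        self.position = None
        self.step_count = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        # Start from a random position unless options specify one.
        if options and "start" in options:
            self.position = float(options["start"])
        else:
            self.position = float(self.np_random.uniform(-10.0, 10.0))
        self.step_count = 0
        obs = np.array([self.position], dtype=np.float32)
        return obs, {}

    def step(self, action):
        # 1. Advance the system one step: s_{t+1} = s_t + a_t.
        self.position = float(np.clip(self.position + float(action[0]), -10.0, 10.0))
        self.step_count += 1
        # 2. Reward: negative distance to the target (closer is better).
        reward = -abs(self.position - self.target)
        # 3. Episode-end conditions.
        terminated = abs(self.position - self.target) < 0.1  # goal reached
        truncated = self.step_count >= self.MAX_STEPS         # time limit hit
        # 4. Auxiliary information for debugging.
        info = {"step": self.step_count}
        obs = np.array([self.position], dtype=np.float32)
        return obs, reward, terminated, truncated, info
```

An environment like this can be exercised with the same kind of random-action loop used to test `CartPole-v1` in the main README of this tutorial.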

## Example: create a customized environment

This tutorial provides an example of a customized environment called "CarPass", in which an RL agent learns to control a moving car to reach a target waypoint as fast as possible while avoiding collisions with a parked (stationary) car and staying within bounds. The figure below shows an illustration.

<p align="center">
<img src="../tutorial_img/custom_env_illustration.png" alt="Illustration of the CarPass environment" width="40%"><br>
<em>Illustration of the car pass environment.</em>
</p>

See the [full implementation](custom_env.py) for details; the comments and docstrings provide further explanation.

Note that the customized environment also includes a `render()` function, which renders the 2D image shown above for visualization purposes. It is optional, so there is no need to implement `render()` unless you want visualization.
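
To quickly smoke-test the customized environment outside of RLlib, a random-action rollout like the one below can be used. Note that the module and class names (`custom_env`, `CarPassEnv`) are assumptions here; check [custom_env.py](custom_env.py) for the names actually used in the implementation.

```python
# Hypothetical smoke test -- the module/class names are assumptions;
# adjust them to match what custom_env.py actually defines.
from custom_env import CarPassEnv  # assumed class name

env = CarPassEnv()
obs, info = env.reset()

done = False
while not done:
    action = env.action_space.sample()  # random action, just to exercise the API
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

# env.render()  # optional: only if render() is implemented
```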

## More examples

Grid:
- Distribution System critical load restoration environment, see [this repo](https://github.com/NREL/rlc4clr/tree/main/rlc4clr/clr_envs/envs).

Building Control:
- Five-zone building HVAC control; see [this file](https://github.com/NREL/learning-building-control/blob/main/lbc/building_env.py). Note that this example still uses the OpenAI Gym API rather than the Gymnasium API.