
Commit b00d9bd

Update report (#518)
* Update report
* Update headers
1 parent dc8cc1e commit b00d9bd

File tree

1 file changed: +41 -35 lines

  • docs/homepage/blog/ospp_final_term_report_210370741


docs/homepage/blog/ospp_final_term_report_210370741/index.md

Lines changed: 41 additions & 35 deletions
@@ -1,6 +1,7 @@
 @def title = "General Pipeline for Offline Reinforcement Learning Evaluation Report"
 @def description = """
-This is a technical report of the Summer OSPP project [Establish a General Pipeline for Offline Reinforcement Learning Evaluation](https://summer.iscas.ac.cn/#/org/prodetail/210370741?lang=en) used for final term evaluation. It provides an overview of the work done during mid-term and during final evaluation phases.
+This is a technical report of the Summer OSPP project [Establish a General Pipeline for Offline Reinforcement Learning Evaluation](https://summer.iscas.ac.cn/#/org/prodetail/210370741?lang=en) used for final term evaluation. It provides an overview of the work done during mid-term and the final evaluation phases.
+"""
 @def is_enable_toc = true
 @def has_code = true
 @def has_math = true
@@ -15,27 +16,31 @@
 "affiliationURL":"https://www.nitt.edu/"
 }
 ],
-"publishedDate":"2021-08-15",
+"publishedDate":"2021-09-30",
 "citationText":"Prasidh Srikumar, 2021"
 }"""

-@def bibliography = "bibliography.bib"
+@def appendix = """
+### Corrections
+If you see mistakes or want to suggest changes, please [create an issue](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/issues) in the source repository.
+"""

-## 1. Introduction
+@def bibliography = "bibliography.bib"

-### Project Name
+# Introduction

+## Project Name
 Establish a General Pipeline for Offline Reinforcement Learning Evaluation

-### Background
+## Background

 In recent years, there have been several breakthroughs in the field of Reinforcement Learning, with numerous practical applications where RL bots have achieved superhuman performance. This is also reflected in the industry, where several cutting-edge solutions based on RL have been developed ([Tesla Motors](https://www.tesla.com/), [AutoML](https://cloud.google.com/automl) and [DeepMind data center cooling solutions](https://deepmind.com/blog/article/deepmind-ai-reduces-google-data-centre-cooling-bill-40), just to name a few).

 One of the most prominent challenges in RL is the lack of reliable environments for training RL agents. **Offline RL** has played a pivotal role in solving this problem by removing the need for the agent to interact with the environment to improve its policy over time. This brings forth the problem of not having reliable tests to verify the performance of RL algorithms. Such tests are facilitated by standard datasets ([RL Unplugged](https://arxiv.org/abs/2006.13888)\dcite{DBLP:journals/corr/abs-2006-13888}, [D4RL](https://arxiv.org/abs/2004.07219)\dcite{DBLP:journals/corr/abs-2004-07219} and [An Optimistic Perspective on Offline Reinforcement Learning](https://arxiv.org/abs/1907.04543)\dcite{agarwal2020optimistic}) that are used to train Offline RL agents and benchmark them against other algorithms and implementations. [ReinforcementLearningDatasets.jl](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/tree/master/src/ReinforcementLearningDatasets) provides a simple way to access the various standard datasets available for Offline RL benchmarking across a variety of tasks.

 Another problem in Offline RL is offline model selection. For this, numerous policies are available in [Benchmarks for Deep Off-Policy Evaluation](https://openreview.net/forum?id=kWSeGEeHvF8)\dcite{DBLP:journals/corr/abs-2103-16596}. ReinforcementLearningDatasets.jl will also help in loading policies that aid model selection in the ReinforcementLearning.jl package.

-## 2. Project Overview
+## Project Overview

 ### Objectives

@@ -80,7 +85,7 @@ Refer the following [discussion](https://github.com/JuliaReinforcementLearning/R

 There are some changes to the original timeline because of a few time constraints, but the basic objectives of the project have been accomplished.

-## 3. Implementation datasets
+## Datasets

 ### Documentation

@@ -177,7 +182,7 @@ ds = ds = rl_unplugged_atari_dataset(

 The type that is returned is a `Channel{AtariRLTransition}`, which returns batches with the given specifications from the buffer when `take!` is used. Note that it takes only seconds to load the datasets into the `Channel`, and the loading is highly customizable.

-```
+```julia
 julia> ds = ds = rl_unplugged_atari_dataset(
     "Pong",
     1,
@@ -210,7 +215,7 @@ julia> size(batch.reward)
 ```

-### Relevant commits, discussions and PRs
+## Relevant commits, discussions and PRs

 - [Updated RLDatasets.jl #403](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/pull/403)
 - [Expand to d4rl-pybullet #416](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/pull/416)
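
Editor's aside (not part of the commit): a minimal sketch of the `take!`-based batch access described above, assuming a `ds` channel constructed with `rl_unplugged_atari_dataset` as in the snippet shown in this diff. Only `batch.reward` is named in the report; the loop is purely illustrative.

```julia
# Minimal consumption sketch: `ds` is the Channel{AtariRLTransition} returned by
# rl_unplugged_atari_dataset (constructed as shown above).
batch = take!(ds)        # blocks until a pre-loaded batch is available
size(batch.reward)       # batch dimensions, as queried in the report

# `take!` can be called repeatedly in a training loop while the loader keeps
# refilling the buffer on background threads.
for step in 1:100
    batch = take!(ds)
    # update the offline agent with `batch` here
end
```
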
@@ -219,7 +224,7 @@ julia> size(batch.reward)
 - [Features for Offline Reinforcement Learning Pipeline #359](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/discussions/359)
 - [Fix record_type issue #24](https://github.com/JuliaReinforcementLearning/TFRecord.jl/pull/24)

-## 4. Implementation Details and Challenges Faced
+## Implementation Details and Challenges Faced

 The challenge faced during the first week was to chart out a direction for RLDatasets.jl. I researched the pipeline implementations in [d3rlpy](https://github.com/takuseno/d3rlpy), [TF.data.Dataset](https://www.tensorflow.org/datasets), etc., and then narrowed down some inspiring ideas in the [discussion](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/discussions/359).

@@ -384,27 +389,19 @@ res = Channel{AtariRLTransition}(n_preallocations; taskref=taskref, spawn=true)
 end
 ```

-## 5. Implications
-
-Equipping RL.jl with RLDatasets.jl is a key step in making the package more industry relevant because different offline algorithms can be compared with respect to a variety of standard offline dataset benchmarks. It is also meant to improve the implementations of existing offline algorithms and make it on par with the SOTA implementations. This package provides a seamless way of downloading and accessing existing datasets and also supports loading datasets into memory with ease, which if implemented separately, would be tedious for the user.
-
-After the implementation of [Benchmarks for Deep Off-Policy Evaluation](https://github.com/google-research/deep_ope), testing and comparing algorithms would be much easier than before. This package would also make SOTA offline RL more accessible and reliable than ever before in ReinforcementLearning.jl.
-
 # Technical report (final term evaluation)
-The following is the final term evaluation report of "General Pipeline for Offline Reinforcement Learning Evaluation Report" in OSPP. Details of all the work that has been done after the mid-term evaluation and some explanation on the current status of the Package are given. Some exciting work that is possible based on this project is also given.
-
-## Completed Work
-The following work has been done post mid-term evaluation.
+The following is the final term evaluation report of the "General Pipeline for Offline Reinforcement Learning Evaluation" project in OSPP. It details the work done after the mid-term evaluation, explains the current status of the package, and outlines some exciting work that this project makes possible.

-### Summary
-The following is the summary of the project work.
+## Summary

 - Polished and finalized the structure of the package. Improved usability by updating the [docs](https://juliareinforcementlearning.org/docs/rldatasets/) accordingly.
 - Fixed the `run` error that was shown on Windows.
 - Added `Bsuite` and all `DM` environments, including the [`DeepMind Control Suite Dataset`](https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged#deepmind-control-suite-dataset), [`DeepMind Lab Dataset`](https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged#deepmind-lab-dataset) and [`DeepMind Locomotion Dataset`](https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged#deepmind-locomotion-dataset_), in RL Unplugged Datasets\dcite{DBLP:journals/corr/abs-2006-13888}.
 - Added [Deep OPE](https://github.com/google-research/deep_ope)\dcite{DBLP:journals/corr/abs-2103-16596} models for D4RL datasets.
-- Researched and implemented FQE\dcite{DBLP:journals/corr/abs-2007-09055} for which the basic implementation works. There are some flaws that need to be fixed.
+- Researched and implemented FQE\dcite{DBLP:journals/corr/abs-2007-09055}, for which the basic implementation works, but there are some flaws that need to be fixed.

+## Completed Work
+The following work has been done after the mid-term evaluation.

 ### Bsuite Datasets
 This involved work similar to the RL Unplugged Atari datasets, which use multi-threaded dataloading. It is implemented using a [`Ring Buffer`](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/f1837a93c4c061925d92167c3480a423007dae5c/src/ReinforcementLearningDatasets/src/rl_unplugged/util.jl#L89) for storing and loading batches of data, as sketched below.
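
To illustrate the buffered, multi-threaded loading pattern described above, here is an editor's sketch (not the package's `RingBuffer` implementation): a producer task keeps a bounded `Channel` filled with ready batches, mirroring the `Channel{AtariRLTransition}(n_preallocations; taskref=taskref, spawn=true)` call shown in the hunk header above. The `batch_loader` name and the dummy batch are assumptions made for the example.

```julia
# Editor's sketch of the general pattern only; the package's actual RingBuffer
# lives in rl_unplugged/util.jl (linked above).
function batch_loader(load_batch; n_preallocations = 12)
    # A bounded Channel acts as the buffer; a background task pre-loads batches
    # so that consumers rarely wait on disk reads or decoding.
    Channel{NamedTuple}(n_preallocations; spawn = true) do ch
        while true
            put!(ch, load_batch())
        end
    end
end

# Hypothetical usage: each `take!` pops one pre-loaded batch.
ds = batch_loader(() -> (state = rand(Float32, 84, 84, 4, 256), reward = rand(Float32, 256)))
batch = take!(ds)
size(batch.reward)   # (256,)
```
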
@@ -561,7 +558,7 @@ RingBuffer{NamedTuple{(:reward, :episodic_reward, :discount, :state, :next_state
 ```
 ### Deep OPE

-Support is given for D4RL policies provided in [Deep OPE](https://github.com/google-research/deep_ope).
+Support is given for D4RL policies provided in [Deep OPE](https://github.com/google-research/deep_ope)\dcite{DBLP:journals/corr/abs-2103-16596}.

 #### Implementation
 The policies that are given [here](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/master/src/ReinforcementLearningDatasets/src/deep_ope/d4rl/d4rl_policies.jl) are loaded using the `d4rl_policy` function.
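
A hedged usage sketch (editor's addition): the exact signature of `d4rl_policy` is the one defined in the file linked above, and the arguments below are assumptions for illustration only, not the confirmed API.

```julia
# Hypothetical call: the dataset name and policy index are assumed arguments,
# not the confirmed interface of `d4rl_policy`.
using ReinforcementLearningDatasets

policy = d4rl_policy("halfcheetah-medium-v0", 1)

# A loaded policy is a mapping from states to actions, which is exactly what
# off-policy evaluation methods such as FQE consume.
# action = policy(state)
```
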
@@ -674,7 +671,7 @@ The implementation in RLZoo is based on [Hyperparameter Selection for Offline Re

 \dfig{body;OPE_and_Online_Hyperparameter_Selection.png}

-The average of values chosen by the policies based on initial states can be taken as the reward that the policy would gain from the environment. So, the same can be used for online hyper parameter selection.
+The average of the values calculated by FQE for the initial states can be taken as the reward that the policy would gain from the environment, so the same can be used for online hyperparameter selection.

 The pseudocode for the implementation and the objective function are as follows.

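
The pseudocode and objective themselves fall in a portion of the file not shown in this diff. As an editor's sketch of the averaging step described above, a policy's value can be estimated as the mean fitted Q-value over sampled initial states; the network shape follows the "Parameter Values" list further down (two 64-neuron layers on an `n_s + n_a` input), and everything else here is an assumption.

```julia
# Editor's sketch, not the report's pseudocode: estimate a policy's value as the
# mean FQE Q-value over initial states.
using Flux, Statistics

n_s, n_a = 3, 1                      # PendulumEnv-like dimensions (assumption)
q_network = Chain(                   # two 64-neuron layers on the (s, a) input
    Dense(n_s + n_a, 64, relu),
    Dense(64, 64, relu),
    Dense(64, 1),
)

# Mean of the fitted Q-values at the action the target policy picks in each
# sampled initial state.
fqe_value(q, policy, initial_states) =
    mean(first(q(vcat(s, policy(s)))) for s in initial_states)

# Hypothetical usage with 100 sampled initial states and a stand-in policy:
# v̂ = fqe_value(q_network, s -> clamp.(randn(Float32, n_a), -2f0, 2f0),
#               [rand(Float32, n_s) for _ in 1:100])
```
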
@@ -743,45 +740,54 @@ end
 #### Results
 The [implementation](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/pull/515) is still a work in progress because of a sampling error, but the algorithm that I implemented outside the RL.jl framework works as expected.

-##### Parameter Values:
+##### Parameter Values
 - Policy => CRR Policy
 - Env => PendulumEnv
 - q_networks => Two 64-neuron layers with `n_s+n_a` input neurons and `1` output neuron.
 - optimizer => ADAM(0.005)
 - loss => Flux.Losses.mse
 - γ => 0.99
-- batch_size => 256
-- update_freq, update_step => 1
-- tar_update_freq => 256
+- batch\_size => 256
+- update\_freq, update\_step => 1
+- tar\_update\_freq => 256
+- number of training steps => 40_000

 ##### Evaluation Results

-/dfig{body;FQE_Evaluation_Result.png}
+The values evaluated by FQE for 100 initial states.
+
+\dfig{body;FQE_Evaluation_Result.png}

 mean=-243.0258f0

 ##### Actual Values

-/dfig{body;Actual_Evaluation_Result.png}
+The values obtained by running the agent in the environment for 100 iterations.
+
+\dfig{body;Actual_Evaluation_Result.png}

 mean=-265.7068139137983

-### Relevant Commits and PRs
+## Relevant Commits and PRs

 - [Fix RLDatasets.jl documentation (#467)](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/commit/b29c9f01240d6aae9e6f7acc28a0a1e95cf29f76#diff-d7a7b3de8d5eedecb629c4d80b6b249d68d15d6f66a7ef768bf4eb937fd5a5d7)
 - [Add bsuite datasets (#482)](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/commit/4326df59296a6edc488b77f29c4968853280db85#diff-d7a7b3de8d5eedecb629c4d80b6b249d68d15d6f66a7ef768bf4eb937fd5a5d7)
 - [Add dm datasets (#495)](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/commit/9185c8548197dd4a6ef0cd7c84c3531c491e6447#diff-d7a7b3de8d5eedecb629c4d80b6b249d68d15d6f66a7ef768bf4eb937fd5a5d7)
 - [Add support for deep ope in RLDatasets.jl (#500)](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/commit/1a00766e9df3edc19cd7377a595b4563261a0356#diff-d7a7b3de8d5eedecb629c4d80b6b249d68d15d6f66a7ef768bf4eb937fd5a5d7)
 - [WIP to implement FQE #515](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/pull/515)

-### Conclusion
+## Conclusion
 The foundations of the RLDatasets.jl package have been laid during the course of the project. The basic datasets, except for the Real World Datasets from RL Unplugged, have been supported. Furthermore, D4RL policies have been successfully loaded and tested. The algorithm for FQE has been tried out, with a minor implementation detail pending.

 With the completion of FQE, the four requirements of OPE as laid out by [Deep OPE](https://github.com/google-research/deep_ope)\dcite{DBLP:journals/corr/abs-2103-16596} will be met for D4RL.

 \dfig{body;OPE_Requirements.png}

-#### Future Scope
+### Implications
+
+Equipping RL.jl with RLDatasets.jl is a key step in making the package more industry relevant, because different offline algorithms can be compared on a variety of standard offline dataset benchmarks. It is also meant to improve the implementations of existing offline algorithms and bring them on par with SOTA implementations. This package provides a seamless way of downloading and accessing existing datasets and also supports loading datasets into memory with ease, which, if implemented separately, would be tedious for the user. It also incorporates policies that can be useful for testing Off-Policy Evaluation methods.
+
+### Future Scope
 Several exciting lines of work are possible from this point.

 - Testing and improvement of already existing Offline Algorithms in RLZoo.jl.
