@def title = "General Pipeline for Offline Reinforcement Learning Evaluation Report"
@def description = """
This is a technical report of the Summer OSPP project [Establish a General Pipeline for Offline Reinforcement Learning Evaluation](https://summer.iscas.ac.cn/#/org/prodetail/210370741?lang=en) used for the final term evaluation. It provides an overview of the work done during the mid-term and final evaluation phases.
"""

@def is_enable_toc = true
@def has_code = true
@def has_math = true
"affiliationURL":"https://www.nitt.edu/"
}
],
"publishedDate":"2021-09-30",
"citationText":"Prasidh Srikumar, 2021"
}"""

@def appendix = """
### Corrections
If you see mistakes or want to suggest changes, please [create an issue](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/issues) in the source repository.
"""

@def bibliography = "bibliography.bib"

# Introduction
## Project Name
Establish a General Pipeline for Offline Reinforcement Learning Evaluation
## Background
In recent years, there have been several breakthroughs in the field of Reinforcement Learning, with numerous practical applications where RL agents have achieved superhuman performance. This is also reflected in industry, where several cutting-edge solutions have been built on RL ([Tesla Motors](https://www.tesla.com/), [AutoML](https://cloud.google.com/automl), and [DeepMind data center cooling solutions](https://deepmind.com/blog/article/deepmind-ai-reduces-google-data-centre-cooling-bill-40), to name a few).
One of the most prominent challenges in RL is the lack of reliable environments for training RL agents. **Offline RL** plays a pivotal role in addressing this problem by removing the need for the agent to interact with the environment to improve its policy over time. This, in turn, raises the problem of not having reliable tests to verify the performance of Offline RL algorithms. Such tests are facilitated by standard datasets ([RL Unplugged](https://arxiv.org/abs/2006.13888)\dcite{DBLP:journals/corr/abs-2006-13888}, [D4RL](https://arxiv.org/abs/2004.07219)\dcite{DBLP:journals/corr/abs-2004-07219} and [An Optimistic Perspective on Offline Reinforcement Learning](https://arxiv.org/abs/1907.04543)\dcite{agarwal2020optimistic}) that are used to train Offline RL agents and benchmark them against other algorithms and implementations. [ReinforcementLearningDatasets.jl](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/tree/master/src/ReinforcementLearningDatasets) provides a simple way to access the standard datasets that are available for Offline RL benchmarking across a variety of tasks.
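To give a feel for the intended workflow, here is a minimal sketch; the dataset name, the `repo` keyword and the `state` field are assumptions for illustration, and the package [docs](https://juliareinforcementlearning.org/docs/rldatasets/) list the exact options.

```julia
using ReinforcementLearningDatasets

# Download (and cache) one of the registered D4RL datasets,
# then inspect a sampled batch of transitions.
ds = dataset("hopper-medium-v0"; repo = "d4rl")   # assumed dataset name / keyword

for batch in ds
    println(size(batch.state))   # look at one sampled batch, then stop
    break
end
```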
Another problem in Offline RL is offline model selection. For this, there are numerous policies available in [Benchmarks for Deep Off-Policy Evaluation](https://openreview.net/forum?id=kWSeGEeHvF8)\dcite{DBLP:journals/corr/abs-2103-16596}. ReinforcementLearningDatasets.jl will also help in loading these policies, which will aid model selection in the ReinforcementLearning.jl package.
## Project Overview
### Objectives
There are some changes to the original timeline owing to a few time constraints, but the basic objectives of the project have been accomplished.
The type that is returned is a `Channel{AtariRLTransition}`, which returns batches with the given specifications from the buffer when `take!` is used. The point to note here is that it takes only seconds to load the datasets into the `Channel`, and the loading is highly customizable.
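To make the consumption pattern concrete, here is a rough sketch; the loader name and its arguments are assumptions, and only the `Channel{AtariRLTransition}` plus `take!` interaction reflects the interface described above.

```julia
using ReinforcementLearningDatasets

# Hypothetical loader call: stream RL Unplugged Atari shards as batches through a Channel.
ds = rl_unplugged_atari_dataset("pong", 1, [1, 2]; batch_size = 256)   # assumed signature

batch = take!(ds)   # blocks until a preloaded AtariRLTransition batch is available
# `batch` holds the state, action, reward, terminal and next-state arrays of the batch.
```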
The challenge faced during the first week was to chart out a direction for RLDatasets.jl. I researched the implementations of the pipeline in [d3rlpy](https://github.com/takuseno/d3rlpy), [TF.data.Dataset](https://www.tensorflow.org/datasets), etc., and then narrowed down some inspiring ideas in the [discussion](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/discussions/359).
In the implementation, the returned channel is created as follows (excerpt):

```julia
res = Channel{AtariRLTransition}(n_preallocations; taskref=taskref, spawn=true)
```
# Technical report (final term evaluation)
The following is the final term evaluation report of "General Pipeline for Offline Reinforcement Learning Evaluation Report" in OSPP. It details all the work done after the mid-term evaluation and explains the current status of the package. Some exciting work that is possible based on this project is also outlined.
## Summary
- Polished and finalized the structure of the package. Improved usability by updating the [docs](https://juliareinforcementlearning.org/docs/rldatasets/) accordingly.
- Fixed the `run` error that was shown on Windows.
- Added `Bsuite` and all `DM` environment datasets from RL Unplugged\dcite{DBLP:journals/corr/abs-2006-13888}, including the [`DeepMind Control Suite Dataset`](https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged#deepmind-control-suite-dataset), [`DeepMind Lab Dataset`](https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged#deepmind-lab-dataset) and [`DeepMind Locomotion Dataset`](https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged#deepmind-locomotion-dataset_).
- Added [Deep OPE](https://github.com/google-research/deep_ope)\dcite{DBLP:journals/corr/abs-2103-16596} models for D4RL datasets.
- Researched and implemented FQE\dcite{DBLP:journals/corr/abs-2007-09055}; the basic implementation works, but there are some flaws that need to be fixed.
## Completed Work
The following work has been done post mid-term evaluation.
### Bsuite Datasets
This involved work similar to the RL Unplugged Atari Datasets, which use multi-threaded data loading. It is implemented using a [`Ring Buffer`](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/f1837a93c4c061925d92167c3480a423007dae5c/src/ReinforcementLearningDatasets/src/rl_unplugged/util.jl#L89) for storing and loading batches of data.
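To convey the idea, here is a minimal, self-contained sketch of such a buffer using a bounded `Channel`; the real `Ring Buffer` linked above differs in details such as preallocation and the stored element type.

```julia
# A bounded Channel behaves like the ring buffer described above:
# producer tasks keep it filled with batches while the consumer takes them out.
buffer = Channel{Vector{Float32}}(10)      # capacity of 10 pending batches

producer = Threads.@spawn for _ in 1:100
    put!(buffer, rand(Float32, 256))       # push a (dummy) batch; blocks when the buffer is full
end

batch = take!(buffer)                      # pop the oldest batch; blocks when the buffer is empty
```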
Support is given for D4RL policies provided in [Deep OPE](https://github.com/google-research/deep_ope)\dcite{DBLP:journals/corr/abs-2103-16596}.
#### Implementation
The policies that are given [here](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/master/src/ReinforcementLearningDatasets/src/deep_ope/d4rl/d4rl_policies.jl) are loaded using the `d4rl_policy` function.
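A rough usage sketch is shown below; only the `d4rl_policy` name comes from the implementation above, while the arguments and the way the returned policy is called are assumptions.

```julia
using ReinforcementLearningDatasets

# Hypothetical call: fetch one of the pretrained Deep OPE policies for a D4RL task.
policy = d4rl_policy("hopper", "online", 10)   # assumed arguments: environment, policy class, index

# The loaded policy is expected to map an observation to an action, e.g.
# action = policy(state)
```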
The average of the values calculated by FQE on initial states can be taken as the return that the policy would gain from the environment, so the same can be used for hyperparameter selection.
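In code, the selection criterion described above boils down to something like this sketch; the names `q_fqe`, `policy` and `initial_states` are illustrative, and it mirrors only the averaging step, not the FQE training itself.

```julia
using Statistics

# Score a candidate policy by the mean FQE value of its own action on sampled initial states.
# `q_fqe(s, a)` is the state-action value estimated by the fitted Q-network.
fqe_score(q_fqe, policy, initial_states) =
    mean(q_fqe(s, policy(s)) for s in initial_states)
```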
The pseudocode for the implementation and the objective function are as follows.
#### Results
The [implementation](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/pull/515) is still a work in progress because of a sampling error, but the algorithm that I implemented outside the RL.jl framework works as expected.
##### Parameter Values
- Policy => CRR Policy
- Env => PendulumEnv
- q_networks => Two 64-neuron layers with `n_s+n_a` input neurons and `1` output neuron (see the sketch after this list).
- optimizer => ADAM(0.005)
- loss => Flux.Losses.mse
- γ => 0.99
- batch\_size => 256
- update\_freq, update\_step => 1
- tar\_update\_freq => 256
- number of training steps => 40_000
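A minimal sketch of the q_network configuration listed above; the PendulumEnv state/action dimensions and the `relu` activations are assumptions, and the actual networks live in the FQE work-in-progress pull request referenced in this report.

```julia
using Flux

n_s, n_a = 3, 1    # assumed PendulumEnv dimensions: 3 state values, 1 action value

# Two 64-neuron hidden layers, `n_s + n_a` inputs (state concatenated with action), one output value.
q_network = Chain(
    Dense(n_s + n_a, 64, relu),
    Dense(64, 64, relu),
    Dense(64, 1),
)

optimizer = ADAM(0.005)
loss(x, y) = Flux.Losses.mse(q_network(x), y)
```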
##### Evaluation Results
The values evaluated by FQE for 100 initial states.
\dfig{body;FQE_Evaluation_Result.png}
`mean = -243.0258f0`
##### Actual Values
The values obtained by running the agent in the environment for 100 iterations.

\dfig{body;Actual_Evaluation_Result.png}
- [Add support for deep ope in RLDatasets.jl (#500)](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/commit/1a00766e9df3edc19cd7377a595b4563261a0356#diff-d7a7b3de8d5eedecb629c4d80b6b249d68d15d6f66a7ef768bf4eb937fd5a5d7)
- [WIP to implement FQE #515](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/pull/515)
## Conclusion
The foundations of the RLDatasets.jl package have been laid during the course of the project. The basic datasets, except for the Real World Datasets from RL Unplugged, have been supported. Furthermore, D4RL policies have been successfully loaded and tested. The algorithm for FQE has been tried out, with a minor implementation detail pending.
With the completion of FQE, the four requirements of OPE as laid out by [Deep OPE](https://github.com/google-research/deep_ope)\dcite{DBLP:journals/corr/abs-2103-16596} will be fulfilled for D4RL.
\dfig{body;OPE_Requirements.png}
### Implications
Equipping RL.jl with RLDatasets.jl is a key step in making the package more industry-relevant, because different offline algorithms can be compared against a variety of standard offline dataset benchmarks. It is also meant to improve the implementations of existing offline algorithms and bring them on par with the SOTA implementations. The package provides a seamless way of downloading and accessing existing datasets, and it also supports loading datasets into memory with ease, which, if implemented separately, would be tedious for the user. It also incorporates policies that are useful for testing Off-Policy Evaluation methods.
### Future Scope
There are several exciting pieces of work that are possible from this point.
- Testing and improvement of already existing Offline Algorithms in RLZoo.jl.