Commit cbbb291 (1 parent: 4844984). Showing 1 changed file with 51 additions and 13 deletions.
@@ -1,13 +1,51 @@
[Review Assignment Due Date](https://classroom.github.com/a/6jR5oQmn)
# Project 2: Problem Solving with Markov Decision Processes
Pathfinding in an environment by following a policy in an MDP ...

# Git and GitHub Learning Resources
- [Git, GitHub, and GitLab tutorial - Faradars (Jadi Mirmirani)](https://faradars.org/courses/fvgit9609-git-github-gitlab)
- [Top 20 Git commands with examples](https://dzone.com/articles/top-20-git-commands-with-examples)
- [Quera Git cheat sheet](https://quera.org/college/cheatsheet/git)

# Important Notes
- Using Git and GitHub for the project is **mandatory**.
- The date of the oral presentation will be announced later.
- The project submission deadline **is stated on the Quera platform**.
# Reinforcement Learning Project: CliffWalking
This project implements a reinforcement learning environment called "CliffWalking", a variation of the classic Cliff Walking problem. The environment is a subclass of `CliffWalkingEnv` from the Gym library. The project also provides policy evaluation and policy iteration within the Markov Decision Process (MDP) framework.

## MDP
A Markov Decision Process (MDP) is a mathematical framework for modeling decision-making problems in which outcomes are partly random and partly under the control of a decision-maker.

An MDP is represented as a tuple (S, A, P, R), where:
- S is the set of possible states of the environment.
- A is the set of actions available to the decision-maker.
- P is the state transition probability function, which gives the probability of moving from one state to another when a particular action is taken.
- R is the reward function, which assigns a numerical reward to each state-action pair.

The goal is to find an optimal policy, one that maximizes the expected cumulative reward over time.
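
As a rough illustration of the tuple (S, A, P, R), the sketch below stores a tiny, hypothetical two-state MDP in the nested-dictionary layout `P[state][action] -> [(probability, next_state, reward, terminated), ...]`, which resembles the `P` attribute of Gym's toy-text environments. The names `TOY_P`, `expected_reward`, and `discounted_return` are made up for this example and are not part of the project. The discount factor `gamma` is also an assumption, since the tuple above does not list one.

```python
# Hypothetical toy MDP for illustration only; it is NOT the project's environment.
# Layout: P[state][action] -> list of (probability, next_state, reward, terminated)
TOY_P = {
    0: {0: [(1.0, 0, -1.0, False)],
        # Action 1 usually advances, but slips back with a large penalty 20% of the time.
        1: [(0.8, 1, -1.0, False), (0.2, 0, -100.0, False)]},
    1: {0: [(1.0, 0, -1.0, False)],
        1: [(1.0, 1, 0.0, True)]},
}

S = sorted(TOY_P)      # state set S
A = sorted(TOY_P[0])   # action set A (same actions in every state here)


def expected_reward(P, state, action):
    """R(s, a): reward averaged over the possible outcomes of taking `action` in `state`."""
    return sum(prob * reward for prob, _next_state, reward, _done in P[state][action])


def discounted_return(rewards, gamma=0.9):
    """Cumulative reward of one trajectory, discounting step t by gamma**t."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

An optimal policy is then one that maximizes the expected value of `discounted_return` over the trajectories it produces.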

## Policy Evaluation and Policy Iteration
The project implements policy evaluation and policy iteration for solving the CliffWalking environment. Policy evaluation estimates the value function of a given policy. Policy iteration alternates between policy evaluation and policy improvement until the policy stops changing, at which point it is optimal for the MDP.
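
The two algorithms can be sketched as follows. This is a minimal, generic tabular version and not necessarily how the project's own code is written. It assumes a transition model in the layout `P[state][action] -> [(probability, next_state, reward, terminated), ...]`, a discount factor `gamma`, and a convergence threshold `theta`. The chain MDP `CHAIN_P` and all function names are hypothetical.

```python
def q_value(P, V, state, action, gamma):
    """Expected return of taking `action` in `state`, then following the values in V."""
    return sum(prob * (reward + (0.0 if done else gamma * V[nxt]))
               for prob, nxt, reward, done in P[state][action])


def policy_evaluation(P, policy, gamma=0.9, theta=1e-8):
    """Sweep the states until the value estimates change by less than theta."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V


def policy_iteration(P, n_actions, gamma=0.9):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = [0] * len(P)
    while True:
        V = policy_evaluation(P, policy, gamma)
        stable = True
        for s in range(len(P)):
            q = [q_value(P, V, s, a, gamma) for a in range(n_actions)]
            best = max(range(n_actions), key=q.__getitem__)
            # Switch only on a clear improvement, so ties cannot cause cycling.
            if q[best] > q[policy[s]] + 1e-6:
                policy[s], stable = best, False
        if stable:
            return policy, V


# Hypothetical 3-state chain: action 0 moves left, action 1 moves right,
# every move costs -1, and state 2 is the terminal goal.
CHAIN_P = {
    0: {0: [(1.0, 0, -1.0, False)], 1: [(1.0, 1, -1.0, False)]},
    1: {0: [(1.0, 0, -1.0, False)], 1: [(1.0, 2, -1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}

policy, values = policy_iteration(CHAIN_P, n_actions=2)
```

On this toy chain the resulting policy moves right in states 0 and 1. The action chosen in the terminal state is arbitrary because every action there has the same value.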

## Environment: CliffWalking
The environment implemented in this project, "CliffWalking", is a variation of the classic Cliff Walking problem. It is implemented as a subclass of `CliffWalkingEnv` from the gym library.
### Attributes
- `UP`, `RIGHT`, `DOWN`, `LEFT`: Constants representing the possible actions.
### Methods
- `__init__(self, is_hardmode=True, num_cliffs=10, *args, **kwargs)`: Constructor that initializes the environment.
- `_calculate_transition_prob(self, current, delta)`: Helper method for calculating transition probabilities.
- `is_valid(self)`: Depth-first search (DFS) method that checks whether a valid path exists.
- `step(self, action)`: Overrides the `step` method to take an action and return the state, reward, and termination status.
- `_render_gui(self, mode)`: Renders the environment using the pygame library.
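
To illustrate the kind of check `is_valid` describes, here is a standalone depth-first search over a grid. It is only a sketch under assumptions: the function name `has_valid_path`, its parameters, and the representation of cliffs as a set of `(row, col)` cells are hypothetical and may differ from the project's method. The example grid uses the classic 4x12 Cliff Walking layout.

```python
def has_valid_path(shape, cliffs, start, goal):
    """Return True if `goal` is reachable from `start` without stepping on a cliff cell.

    Iterative depth-first search over 4-connected grid cells.
    """
    rows, cols = shape
    stack = [start]
    seen = {start}
    while stack:
        row, col = stack.pop()
        if (row, col) == goal:
            return True
        # Neighbours in the order UP, RIGHT, DOWN, LEFT.
        for d_row, d_col in ((-1, 0), (0, 1), (1, 0), (0, -1)):
            nxt = (row + d_row, col + d_col)
            inside = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if inside and nxt not in cliffs and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# Classic layout: start bottom-left, goal bottom-right, cliff cells in between.
classic_cliffs = {(3, col) for col in range(1, 11)}
reachable = has_valid_path((4, 12), classic_cliffs, start=(3, 0), goal=(3, 11))

# A full column of cliffs cuts the grid in two, so no path exists.
wall = {(row, 5) for row in range(4)}
blocked = has_valid_path((4, 12), wall, start=(3, 0), goal=(3, 11))
```

A check like this matters when cliff positions are randomized (the constructor takes `num_cliffs`), because a random layout can otherwise cut the start off from the goal.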

## How to Run
1. Clone the repository:
   ```bash
   git clone https://github.com/SheidaAbedpour/MDP-CliffWalking.git
   ```
2. Install the dependencies:
   ```bash
   pip install -r requirement.txt
   ```
3. Run the project:
   ```bash
   python main.py
   ```
4. View the results, including the optimal policy and the corresponding values.

## Acknowledgments
This project is based on the [CliffWalking](https://gymnasium.farama.org/environments/toy_text/cliff_walking/) environment from the Gym library. The project structure and documentation follow best practices and guidelines.