Update README.md

SheidaAbedpour · Jan 28, 2024 · cbbb291 · cbbb291
1 parent 4844984
commit cbbb291
Showing 1 changed file with 51 additions and 13 deletions.
diff --git a/README.md b/README.md
@@ -1,13 +1,51 @@
-[![Review Assignment Due Date](https://classroom.github.com/assets/deadline-readme-button-24ddc0f5d75046c5622901739e7c5dd533143b0c8e959d652212380cedb1ea36.svg)](https://classroom.github.com/a/6jR5oQmn)
-# پروژه دوم: حل مسأله با فرآیند تصمیم مارکوف
-مسیریابی در محیط با دنبال کردن یک سیاست در MDP ...
-
-# منابع آموزشی گیت و گیت‌هاب
-- [آموزش گیت (Git)، گیت هاب و گیت لب - فرادرس (جادی میرمیرانی)](https://faradars.org/courses/fvgit9609-git-github-gitlab)
-- [۲۰ دستور پراستفاده در گیت به همراه مثال](https://dzone.com/articles/top-20-git-commands-with-examples)
-- [چیت‌شیت گیت کوئرا](https://quera.org/college/cheatsheet/git)
-
-# نکات مهم
-- استفاده از گیت و گیت‌هاب در انجام پروژه **اجباری** است.
-- تاریخ ارائه شفاهی، متعاقباً اطلاع‌رسانی می‌شود.
-- مهلت ارسال پروژه **در سامانه کوئرا ذکر شده است**.
+# Reinforcement Learning Project: CliffWalking
+This project implements a reinforcement learning environment called "CliffWalking," which is a variation of the classic Cliff Walking problem. The environment is designed as a subclass of CliffWalkingEnv from the Gym library. The project includes functionalities for policy evaluation and policy iteration within the Markov Decision Process (MDP) framework.
+
+## MDP
+MDP stands for Markov Decision Process. It is a mathematical framework used to model decision-making problems in situations where outcomes are partly random and partly under the control of a decision-maker.
+
+In an MDP, the decision-making problem is represented as a tuple (S, A, P, R), 
+where: 
+- S is the set of possible states in the environment. 
+- A is the set of possible actions that the decision-maker can take. 
+- P is the state transition probability matrix, which defines the probability of transitioning from one state to another when a particular action is taken. 
+- R is the reward function, which assigns a numerical reward to each state-action pair. 
+
+The goal is to find an optimal policy that maximizes the expected cumulative reward over time.
+
+## Policy Evaluation and Policy Iteration
+The project implements policy evaluation and policy iteration algorithms for solving the CliffWalking environment. Policy evaluation estimates the value function for a given policy, while policy iteration alternates between policy evaluation and improvement to find the optimal policy in an MDP.
+
+## Environment: CliffWalking
+The implemented environment in this project called "CliffWalking" is a variation of the classic Cliff Walking problem. The environment is implemented as a subclass of CliffWalkingEnv from the gym library.
+
+
+![](https://gymnasium.farama.org/_images/cliff_walking.gif)
+
+
+### Attributes
+- UP, RIGHT, DOWN, LEFT: Constants representing possible actions.
+### Methods
+  - __init__(self, is_hardmode=True, num_cliffs=10, *args, **kwargs): Constructor method initializing the environment.
+  - _calculate_transition_prob(self, current, delta): Helper method for calculating transition probabilities.
+  - is_valid(self): Depth-first search (DFS) method to check for a valid path.
+  - step(self, action): Overrides the step method for taking actions and returning state, reward, and termination status.
+  - _render_gui(self, mode): Method for rendering the environment using the pygame library.
+
+## How to Run 
+1. Clone the Repository:
+```bash
+https://github.com/SheidaAbedpour/MDP-CliffWalking.git
+```
+2. Install Dependencies:
+```bash
+pip install -r requirement.txt
+```
+3. Run project:
+```bash
+python main.py
+```
+4. View the results, including the optimal policy and corresponding values.
+
+## Acknowledgments
+This project is based on the [CliffWalking](https://gymnasium.farama.org/environments/toy_text/cliff_walking/) environment from the Gym library. The project structure and documentation follow best practices and guidelines.