# MIT Deep RL

## MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL)

RL is teaching by experience: an agent learns a behavior by interacting with an environment and receiving reward signals, rather than from labeled examples.

Defining a useful state space, action space, and reward is the hard part. Extracting meaningful data from this formalization is also very hard.


{% embed url="https://www.youtube.com/watch?v=zR11FLZ-O9M" %}

### Environment and Actions

![Deep RL lecture -MIT](https://3698175758-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MAwtzMy_pbrChIExFtN%2Fuploads%2Fgit-blob-7abd0e3f3038eb85caff2457b9661114122c88f5%2Fimage.png?alt=media)

A key challenge in applying RL to real-world problems is how to provide the experience. One option is a realistic simulation combined with transfer learning to the real world.

#### Components of RL agent

![](https://3698175758-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MAwtzMy_pbrChIExFtN%2Fuploads%2Fgit-blob-9e48fdb064342ac4d3edd0ff9ab90bbe71c345fc%2Fimage.png?alt=media)

### Maximize the reward

A good strategy for an agent is to always choose the action that maximizes the (discounted) future reward.

![](https://3698175758-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MAwtzMy_pbrChIExFtN%2Fuploads%2Fgit-blob-2f2cd0400598f5700ea1185658cbe50d074c3ad5%2Fimage.png?alt=media)
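The discounted future reward can be sketched in a few lines. This is a minimal illustration (not code from the lecture); the function name and the example values are my own:

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted return G = r_0 + gamma*r_1 + gamma^2*r_2 + ...
    Computed backwards so each step is a single multiply-add."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Three steps of reward 1.0 with gamma = 0.5:
# G = 1 + 0.5*1 + 0.25*1 = 1.75
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

The discount factor `gamma` trades off immediate against future reward: `gamma` near 0 makes the agent myopic, `gamma` near 1 makes it far-sighted.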

### Optimal Policy

Both the environment model and the reward structure have a big impact on the optimal policy.

### Types of Reinforcement Learning

RL algorithms can be classified as either model-based or model-free:

* Model-based: the agent learns or is given a model of the environment's dynamics and can plan with it (e.g., chess)
* Model-free: the agent learns directly from experience, without an explicit model
  * Value-based: learns the value of actions and acts greedily with respect to those values; typically off-policy. Example: Q-Learning
  * Policy-based: directly learns the best policy; typically on-policy
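Value-based methods need an exploration rule on top of the learned values; the standard choice is ε-greedy. A minimal sketch (my own illustration, not from the lecture):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon explore (random action),
    otherwise exploit the current value estimates (greedy action)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# epsilon=0.0 is purely greedy: picks the action with the highest Q-value
print(epsilon_greedy([0.1, 0.9], epsilon=0.0))  # action index 1
```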

![](https://3698175758-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MAwtzMy_pbrChIExFtN%2Fuploads%2Fgit-blob-3d929e7253c21967c28b3c00faa9a94b292645a8%2Fimage.png?alt=media)

![](https://3698175758-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MAwtzMy_pbrChIExFtN%2Fuploads%2Fgit-blob-5d33a0d860fd48b692a5f432c87e14f0f0ee86d2%2Fimage.png?alt=media)

#### Q-Learning (Deep Q-Learning Network)

Q-Learning is a model-free, off-policy, value-based method.

![](https://3698175758-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MAwtzMy_pbrChIExFtN%2Fuploads%2Fgit-blob-ac17e38025946677fe63592f2d175a52c3a07679%2Fimage.png?alt=media)

Conventional (tabular) Q-Learning maintains a Q-table of state-action values and updates it from experience. This becomes impractical when the state or action space is large, since the table needs a row/column for every state-action combination.
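The tabular update can be sketched as follows. This is a minimal illustration under my own naming (states, actions, and hyperparameters are made up), not the lecture's code:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One off-policy Q-learning step:
    move Q(s,a) toward the TD target r + gamma * max_a' Q(s',a')."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Q = defaultdict(float)  # every unseen (state, action) starts at 0.0
q_update(Q, s=0, a="right", r=1.0, s_next=1, actions=["left", "right"])
print(Q[(0, "right")])  # moved from 0.0 toward 1.0 by alpha: 0.1
```

The `max` over next actions is what makes it off-policy: the update assumes the greedy action will be taken next, regardless of what the behavior policy actually does.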

Deep Q-Learning uses a neural network to approximate the Q-function. This removes the need to know or model the physics of the environment, and it generalizes across states that a table would treat as unrelated.

![](https://3698175758-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MAwtzMy_pbrChIExFtN%2Fuploads%2Fgit-blob-5b9a18637b049e9550572a46aaa5ee45839b7649%2Fimage.png?alt=media)

![](https://3698175758-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MAwtzMy_pbrChIExFtN%2Fuploads%2Fgit-blob-af2d622c3f914af6d69682524913f6a56b8b2590%2Fimage.png?alt=media)

![](https://3698175758-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MAwtzMy_pbrChIExFtN%2Fuploads%2Fgit-blob-d4161f4f1c11539bbc462ac36a016c2c6ef8fd64%2Fimage.png?alt=media)
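A key ingredient of DQN is the experience replay buffer: transitions are stored and sampled in random minibatches, which breaks the correlation between consecutive frames. A minimal sketch (my own illustration; the class name and capacity are assumptions):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next, done) transitions.
    Random sampling decorrelates the training batches."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop off

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for t in range(5):
    buf.push(s=t, a=0, r=1.0, s_next=t + 1, done=False)
batch = buf.sample(3)
print(len(buf), len(batch))  # 5 3
```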

### Policy Gradient

![](https://3698175758-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MAwtzMy_pbrChIExFtN%2Fuploads%2Fgit-blob-98dee453d084f1550a5f1b5f8eda2abb2b7b5fb0%2Fimage.png?alt=media)

#### Vanilla Policy Gradient

![](https://3698175758-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MAwtzMy_pbrChIExFtN%2Fuploads%2Fgit-blob-effc2b8e0070b7fa368a4e86c34f1a4b2b7c6a78%2Fimage.png?alt=media)
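The vanilla policy gradient (REINFORCE) update nudges the policy parameters in the direction `G_t * grad log pi(a_t|s_t)`. A minimal sketch for a tabular softmax policy (my own illustration with made-up shapes and learning rate, not the lecture's code):

```python
import numpy as np

def reinforce_step(theta, states, actions, returns, lr=0.01):
    """One REINFORCE update on a tabular softmax policy.
    theta: (n_states, n_actions) logits; pi(a|s) = softmax(theta[s])."""
    grad = np.zeros_like(theta)
    for s, a, G in zip(states, actions, returns):
        probs = np.exp(theta[s] - theta[s].max())  # stable softmax
        probs /= probs.sum()
        # grad of log softmax w.r.t. logits: one_hot(a) - probs
        g = -probs
        g[a] += 1.0
        grad[s] += G * g  # weight by the return of the trajectory
    return theta + lr * grad

theta = np.zeros((1, 2))
# One episode: in state 0 we took action 1 and got return 1.0,
# so the logit of action 1 should increase relative to action 0.
theta = reinforce_step(theta, states=[0], actions=[1], returns=[1.0])
print(theta)
```

Because the updates are weighted by returns collected under the current policy, this is on-policy: old trajectories become invalid once the policy changes.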

#### Advantage Actor-Critic (A2C)

A2C combines the two families: a critic (as in DQN) estimates state values, while an actor (as in REINFORCE) learns the policy, with the advantage function reducing the variance of the policy gradient.

![](https://3698175758-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MAwtzMy_pbrChIExFtN%2Fuploads%2Fgit-blob-494cec26dac1ddc30e2f8c0fcff6bad6929c7f03%2Fimage.png?alt=media)
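The one-step advantage used by A2C measures how much better the taken action was than the critic expected. A minimal sketch (my own function name and values):

```python
def advantage(r, v_s, v_next, gamma=0.99, done=False):
    """A2C one-step advantage: A = r + gamma * V(s') - V(s).
    The bootstrap term gamma * V(s') is dropped at episode end."""
    target = r if done else r + gamma * v_next
    return target - v_s

# Critic predicted V(s)=1.0, we got r=0 and landed in a state worth V(s')=1.0:
# A = 0 + 0.9*1.0 - 1.0 = -0.1, so the action was slightly worse than expected.
print(advantage(r=0.0, v_s=1.0, v_next=1.0, gamma=0.9))
```

Positive advantage pushes the actor's probability of that action up; negative pushes it down.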

#### Deep Deterministic Policy Gradient (DDPG)

![](https://3698175758-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MAwtzMy_pbrChIExFtN%2Fuploads%2Fgit-blob-c56681c445e9eafcfd8ec36610286e8f8528d05f%2Fimage.png?alt=media)

![](https://3698175758-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MAwtzMy_pbrChIExFtN%2Fuploads%2Fgit-blob-779ae3afbf0e56a057a308c00ea72e3282326617%2Fimage.png?alt=media)

![](https://3698175758-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MAwtzMy_pbrChIExFtN%2Fuploads%2Fgit-blob-985607015105593f617a9d332e01d4d858dd005e%2Fimage.png?alt=media)
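A distinctive piece of DDPG is the slow-moving target networks, updated by Polyak averaging rather than copied outright. A minimal sketch over plain lists of parameters (my own simplification; real implementations operate on network weight tensors):

```python
def soft_update(target_params, online_params, tau=0.005):
    """DDPG soft (Polyak) target update: target <- tau*online + (1-tau)*target.
    Small tau keeps the TD targets slowly moving and stabilizes training."""
    return [tau * w + (1.0 - tau) * t
            for w, t in zip(online_params, target_params)]

# With tau=0.5 the target moves halfway toward the online weights:
print(soft_update(target_params=[0.0], online_params=[1.0], tau=0.5))  # [0.5]
```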

