MIT Deep RL
MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL)
RL is learning by experience: the agent improves by interacting with its environment and observing rewards, rather than by learning from labeled examples.
Defining a useful state space, action space, and reward is the hard part, and getting meaningful data out of that formalization is even harder.
Environment and Actions

A key challenge for RL in real-world applications is how to provide the experience. One option is a realistic simulation combined with transfer learning to the real world.
Components of an RL agent

Maximize the reward
A good strategy for an agent is to always choose the action that maximizes the expected (discounted) future reward.
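Concretely, with discount factor $\gamma \in [0, 1)$ and reward $r_t$ received at time step $t$, the discounted future reward (return) from time $t$ is

$$G_t = r_{t+1} + \gamma\, r_{t+2} + \gamma^2\, r_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k\, r_{t+k+1}$$

and the agent's goal is to act so that the expected value of $G_t$ is as large as possible.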

Optimal Policy
Both the environment model and the reward structure have a big impact on the optimal policy.
Types of Reinforcement Learning
RL algorithms can be classified as either model-based or model-free.
Model-based: the agent plans using a model of the environment's dynamics (e.g., Chess).
Model-free:
Value-based: off-policy; learn a value function and choose the best action from it. Example: Q-Learning.
Policy-based: on-policy; directly learn the best policy. Example: policy gradient methods.


Q-Learning and Deep Q-Networks (DQN)
It is a model-free, off-policy, value-based method.

In conventional Q-Learning, the agent maintains a Q-table and updates its entries from experience. This becomes impractical as soon as the number of states and actions grows beyond what a table can hold.
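A minimal sketch of the tabular update, assuming a small discrete environment (the state/action counts and hyperparameters below are hypothetical, not from the lecture):

```python
import numpy as np

N_STATES, N_ACTIONS = 16, 4          # hypothetical sizes of the Q-table
alpha, gamma, epsilon = 0.1, 0.99, 0.1

Q = np.zeros((N_STATES, N_ACTIONS))  # one row per state, one column per action

def act(s):
    # epsilon-greedy behaviour policy: mostly exploit the table, sometimes explore
    if np.random.rand() < epsilon:
        return np.random.randint(N_ACTIONS)
    return int(Q[s].argmax())

def update(s, a, r, s_next, done):
    # Off-policy target: bootstrap from the best next action, regardless of
    # which action the behaviour policy will actually take next.
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
```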
Deep Q-Learning instead uses a neural network to approximate the Q-function, so the agent does not need to know or model the physics of the environment.
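A minimal sketch of the DQN loss, assuming a small fully-connected network, a 4-dimensional observation, and 2 discrete actions (hypothetical sizes); the replay buffer and training loop are omitted:

```python
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99   # hypothetical problem sizes

# Online network (trained) and target network (periodically synced copy)
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_loss(s, a, r, s_next, done):
    # s, s_next: float tensors [batch, obs_dim]; a: long tensor [batch];
    # r, done: float tensors [batch]
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Same Q-learning target as the tabular case, computed by the target network
        target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    return nn.functional.smooth_l1_loss(q_sa, target)
```

Each gradient step minimizes this loss on a batch sampled from the replay buffer, and the target network is periodically copied from the online network to keep the targets stable.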



Policy Gradient

Vanilla Policy Gradient
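Vanilla policy gradient (REINFORCE) updates the policy directly, increasing the log-probability of each action in proportion to the discounted return that followed it. A minimal sketch of one update from a finished episode, assuming a small discrete-action policy network (hypothetical sizes):

```python
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99   # hypothetical problem sizes
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_update(states, actions, rewards):
    # states: float tensor [T, obs_dim]; actions: long tensor [T]; rewards: list of T floats
    returns, g = [], 0.0
    for r in reversed(rewards):           # discounted return G_t, computed backwards
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))

    log_probs = torch.distributions.Categorical(logits=policy(states)).log_prob(actions)
    loss = -(log_probs * returns).mean()  # push up log-prob of actions with high return

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```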

Advantage Actor-Critic (A2C)
Combines the two ideas above: a critic learns a value function (the value-based, DQN-like part), while an actor updates the policy with a policy gradient (the REINFORCE-like part), using the advantage instead of the raw return to reduce variance.
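A minimal sketch of the two A2C loss terms, assuming a small separate actor and critic with hypothetical sizes; rollout collection and the optimizer steps are omitted:

```python
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99   # hypothetical problem sizes
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))
critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))

def a2c_losses(s, a, r, s_next, done):
    # Critic (value-based part): regress V(s) toward the one-step bootstrapped target
    v_s = critic(s).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1 - done) * critic(s_next).squeeze(1)
    critic_loss = nn.functional.mse_loss(v_s, target)

    # Actor (policy-gradient part): REINFORCE-style update weighted by the advantage
    advantage = (target - v_s).detach()
    log_prob = torch.distributions.Categorical(logits=actor(s)).log_prob(a)
    actor_loss = -(log_prob * advantage).mean()
    return actor_loss, critic_loss
```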

Deep Deterministic Policy Gradient (DDPG)


