MIT Deep RL


Last updated 3 years ago


MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL)

RL is teaching by experience: the agent learns from interaction with an environment rather than from labeled examples.

Defining a useful state space, action space, and reward is the hard part. Getting meaningful data from that formalization is also very hard.

Environment and Actions

A key challenge for RL in real-world applications is how to provide the experience. One option is realistic simulation combined with transfer learning.

Components of an RL agent: a policy, a value function, and (optionally) a model of the environment.

Maximize the reward

A good strategy for an agent is to always choose the action that maximizes the (discounted) future reward.
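The discounted future reward mentioned above can be sketched in a few lines; this is an illustrative helper (not from the lecture), where `gamma` is the discount factor:

```python
# Discounted future reward: G_t = r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ...
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards discounted by gamma, from the first step onward."""
    g = 0.0
    # Iterate backwards so each step folds in the discounted future.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

The backward loop avoids recomputing powers of `gamma` at each step.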

Optimal Policy

Both the environment model and the reward structure have a big impact on the optimal policy.

Types of Reinforcement Learning

RL methods can be classified as either model-based or model-free.

  • Model-based: the agent uses a model of the environment's dynamics (e.g., Chess).

  • Model-free

    • Value-based: off-policy; learns action values and can choose the best action. Example: Q-Learning.

    • Policy-based: on-policy; directly learns the best policy.

Q-Learning (Deep Q-Network, DQN)

It is a model-free, off-policy, value-based method.

Conventional Q-Learning maintains a Q-table and updates it iteratively, but a table becomes impractical when the state and action spaces are large.
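The table update just described can be sketched as follows. This is a minimal illustration (the states, actions, and hyperparameters are made up) of the standard update rule Q(s,a) ← Q(s,a) + α·(r + γ·max Q(s',·) − Q(s,a)):

```python
from collections import defaultdict

Q = defaultdict(float)  # Q-table: maps (state, action) -> estimated return
ACTIONS = [0, 1]        # toy action space for illustration

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Bootstrap from the best action in the next state (off-policy).
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    td_target = r + gamma * best_next
    # Move the current estimate toward the TD target.
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

q_update(s=0, a=1, r=1.0, s_next=2)
print(Q[(0, 1)])  # 0.1 after one update from an all-zero table
```

Because the target uses `max` over next actions regardless of what the agent actually does next, the method is off-policy.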

Deep Q-Learning uses a neural network to approximate the Q-function. This does not require knowing or understanding the physics of the environment.

Policy Gradient

Vanilla Policy Gradient

Advantage Actor-Critic (A2C)

A2C combines value estimation (as in DQN) with policy gradients (as in REINFORCE): a critic learns a value function that reduces the variance of the actor's updates.
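The actor-critic combination can be illustrated with the advantage computation (the numbers below are made up): the critic's value estimates replace the raw REINFORCE return, and the actor's gradient is scaled by how much better an action turned out than the critic expected.

```python
def advantage(r, v_s, v_s_next, gamma=0.99):
    """One-step advantage A(s,a) = r + gamma * V(s') - V(s).

    Positive advantage: the action did better than the critic expected,
    so the actor's log-probability for it is pushed up; negative, down.
    """
    return r + gamma * v_s_next - v_s

# Toy example: reward 1.0, critic predicted 0.5 for s and 0.0 for s'.
a = advantage(r=1.0, v_s=0.5, v_s_next=0.0)
print(a)  # 0.5
```

Subtracting the critic's baseline V(s) leaves the gradient's expectation unchanged while lowering its variance, which is the point of adding the critic.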

Deep Deterministic Policy Gradient (DDPG)

Deep RL lecture (MIT)