AIC - Module Reinforcement Learning

Freek Stulp (ENSTA-ParisTech) and Michele Sebag (LRI)

Note that I have put some solutions on the Material page.

For the Prioritized Sweeping projects, here is the MDP version of the Maze:



The objectives of this course are to understand and acquire practical experience with:

The "teaser" slides are available here.

Prerequisites and Requirements

Basic linear algebra and the Python programming language.


The book and code we use are available here.

Outline and Schedule

The course will take place in the P.U.I.O. building, rooms E203/E204.

The generic format is that each week, there is a lecture (approx. 1 hour), followed by two hours of exercises and project work at the computer. Below is the schedule of the course, with links to the lectures and exercises.

Date  | Week   | 14h00-15h00                                                    | 15h15-16h00   | 16h00-17h15
23.11 | Week 1 | Introduction + Markov Decision Processes + Dynamic Programming |               | ExerII
30.11 | Week 2 |                                                                |               |
07.12 | Week 3 | Monte Carlo methods + Temporal Difference methods              | ExerII/ExerIII | ExerIII
14.12 | Week 4 | Projects + Function Approximation                              | ExerIII       | ExerIII
Christmas holidays
04.01 | Week 5 | Function Approximation                                         | ExerIV        | ExerIV
11.01 | Week 6 | Direct Policy Search                                           | ExerIV        | ExerIV
18.01 | Week 7 | Summary + Case Studies                                         | Project       | Project
01.02 | Week 8 | Exam                                                           |               |


Here is a more detailed overview of the topics covered during the course, along with the corresponding exercises.
  1. Introduction
  2. Problem formalization
  3. Dynamic Programming
  4. Discrete RL
  5. Continuous RL
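To make the Dynamic Programming topic concrete, here is a minimal value iteration sketch in Python. It assumes a tiny, hypothetical 3-state chain MDP (not the course Maze) encoded as a dictionary P[s][a] of (probability, next_state, reward) triples; the names P, q_value and value_iteration, and the values gamma=0.9 and theta=1e-8, are illustrative choices rather than anything from the course material. The loop repeatedly applies the Bellman optimality backup V(s) = max_a sum_{s'} p(s'|s,a) [r + gamma V(s')] until the largest change falls below theta, then reads off a greedy policy.

```python
# Hypothetical 3-state chain MDP (NOT the course Maze):
# states 0, 1, 2, where state 2 is terminal (absorbing, reward 0).
# Actions: 0 = left, 1 = right. Moving right from state 1 yields reward 1.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}


def q_value(P, V, s, a, gamma):
    """Expected one-step return of action a in state s under value estimate V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, theta=1e-8):
    """Apply the Bellman optimality backup until the largest change is below theta."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < theta:
            break
    # Extract a greedy policy with respect to the converged values
    # (ties are broken in favour of the first action listed).
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P)
print(V)       # {0: 0.9, 1: 1.0, 2: 0.0}
print(policy)  # {0: 1, 1: 1, 2: 0}
```

The in-place update uses freshly computed values within the same sweep, which typically converges in fewer sweeps than keeping a separate copy of V; both variants converge to the same optimal values for gamma < 1.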

Possible projects:

  1. Prioritized Sweeping
  2. Dyna
  3. Eligibility Traces
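As a possible starting point for the Prioritized Sweeping project, here is a minimal tabular sketch of the standard formulation for deterministic environments. Like Dyna, it learns a model from real transitions and performs simulated (planning) updates, but instead of sampling previously seen state-action pairs uniformly it updates the pairs whose values would change most first, then propagates changes backwards to their predecessors. Everything environment-specific is an assumption made for illustration: the 6-state corridor, the step function, the reward of 1 at the goal, and the parameter values (alpha, gamma, epsilon, theta, n_planning) are all hypothetical and not taken from the course Maze.

```python
import heapq
import random
from collections import defaultdict

# Hypothetical deterministic corridor (NOT the course Maze):
# states 0..N_STATES-1, the last state is the terminal goal.
N_STATES = 6
ACTIONS = (0, 1)  # 0 = left, 1 = right
GOAL = N_STATES - 1


def step(s, a):
    """Assumed dynamics: move one cell; reward 1 when the goal is reached."""
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return (1.0 if s2 == GOAL else 0.0), s2


def prioritized_sweeping(episodes=20, alpha=0.5, gamma=0.95, epsilon=0.1,
                         theta=1e-4, n_planning=10, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)           # Q[(s, a)], initialised to 0
    model = {}                       # model[(s, a)] = (r, s2), deterministic
    predecessors = defaultdict(set)  # predecessors[s2] = {(s, a) leading to s2}
    pqueue = []                      # min-heap of (-priority, s, a)

    def max_q(s):
        # The terminal goal state has value 0.
        return 0.0 if s == GOAL else max(Q[(s, b)] for b in ACTIONS)

    def greedy(s):
        best = max(Q[(s, b)] for b in ACTIONS)
        return rng.choice([b for b in ACTIONS if Q[(s, b)] == best])

    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # Epsilon-greedy action in the real environment.
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(s)
            r, s2 = step(s, a)
            model[(s, a)] = (r, s2)
            predecessors[s2].add((s, a))
            p = abs(r + gamma * max_q(s2) - Q[(s, a)])
            if p > theta:
                # Simplification: duplicates are allowed in the heap instead
                # of updating the priority of a pair already queued.
                heapq.heappush(pqueue, (-p, s, a))
            # Planning: process the highest-priority pairs first.
            for _ in range(n_planning):
                if not pqueue:
                    break
                _, ps, pa = heapq.heappop(pqueue)
                pr, ps2 = model[(ps, pa)]
                Q[(ps, pa)] += alpha * (pr + gamma * max_q(ps2) - Q[(ps, pa)])
                # Re-prioritise every pair known to lead into ps.
                for qs, qa in predecessors[ps]:
                    qr, _ = model[(qs, qa)]
                    p = abs(qr + gamma * max_q(ps) - Q[(qs, qa)])
                    if p > theta:
                        heapq.heappush(pqueue, (-p, qs, qa))
            s = s2
    return Q


Q = prioritized_sweeping()
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)})
```

Setting the priority threshold theta to a large value leaves the queue mostly empty and reduces the method to plain one-step Q-learning, while replacing the priority queue with uniform sampling from the model gives Dyna-Q; comparing these variants on the Maze is one natural way to structure the Prioritized Sweeping and Dyna projects.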