Note that I put some solutions on the Material page.
For the Prioritized Sweeping projects, here is the MDP version of the Maze: mdp_maze.py
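The interface of mdp_maze.py is not described here, so the following is only a minimal sketch on a hypothetical five-state corridor, not the actual maze. It shows prioritized sweeping applied to a known deterministic model. States are backed up in order of their Bellman error (largest first), and after each backup only the predecessors of the updated state are re-checked and, if needed, re-queued. All names (`step`, `backup`, `prioritized_sweeping`, `theta`) are illustrative assumptions.

```python
import heapq

# Hypothetical toy model, NOT the mdp_maze.py interface:
# a corridor of states 0..4, terminal goal at 4, reward -1 per step.
N_STATES = 5
GOAL = 4
ACTIONS = (-1, +1)  # move left / move right
GAMMA = 1.0


def step(s, a):
    """Deterministic model: returns (next_state, reward); walls clip the move."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, -1.0


def backup(V, s):
    """Bellman optimality backup for a non-terminal state s."""
    return max(r + GAMMA * V[s2] for s2, r in (step(s, a) for a in ACTIONS))


def prioritized_sweeping(theta=1e-6, max_updates=1000):
    V = [0.0] * N_STATES
    # Predecessor map: preds[s2] = states that can reach s2 in one step.
    preds = {s: set() for s in range(N_STATES)}
    for s in range(N_STATES):
        if s == GOAL:
            continue
        for a in ACTIONS:
            s2, _ = step(s, a)
            preds[s2].add(s)
    # Queue every non-terminal state whose Bellman error exceeds theta.
    heap = []
    for s in range(N_STATES):
        if s != GOAL:
            err = abs(backup(V, s) - V[s])
            if err > theta:
                heapq.heappush(heap, (-err, s))  # negate: heapq is a min-heap
    updates = 0
    while heap and updates < max_updates:
        _, s = heapq.heappop(heap)
        V[s] = backup(V, s)
        updates += 1
        # Only predecessors of s can have a changed Bellman error.
        for p in preds[s]:
            err = abs(backup(V, p) - V[p])
            if err > theta:
                heapq.heappush(heap, (-err, p))
    return V, updates


V, n = prioritized_sweeping()
print(V)  # [-4.0, -3.0, -2.0, -1.0, 0.0]
```

Stale queue entries are not removed; popping one just repeats a backup, which keeps the sketch short at the cost of a few redundant updates.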
The objectives of this course are to understand and acquire practical experience with:
- Formalization of Reinforcement Learning (RL) problems
- Dynamic Programming for model-based optimization
- Solution methods for discrete model-free RL problems
- Value function approximation for continuous RL problems
- Direct policy search with parameterized policies.
- Applications of RL
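As a small illustration of the first two objectives (formalizing a problem as an MDP and solving it with dynamic programming), here is a minimal value iteration sketch. The two-state MDP, the table format `P[s][a] = [(probability, next_state, reward), ...]`, and the function names are hypothetical choices for this example, not the course's exercise code.

```python
# Hypothetical two-state MDP, for illustration only.
# P[s][a] lists (probability, next_state, reward) triples.
P = {
    "A": {"stay": [(1.0, "A", 0.0)],
          "go": [(0.8, "B", 5.0), (0.2, "A", 0.0)]},
    "B": {"stay": [(1.0, "B", 1.0)],
          "go": [(1.0, "A", 0.0)]},
}


def q_value(P, V, s, a, gamma):
    """Expected one-step return of action a in state s, bootstrapping from V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-10):
    """Apply Bellman optimality backups until the largest change is below tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P)
print(policy)  # {'A': 'go', 'B': 'go'}
```

In this toy MDP, leaving B is optimal even though "stay" there pays 1 per step: returning to A gives repeated access to the reward of 5.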
The "teaser" slides are available here.
Prerequisites and Requirements
Basic linear algebra and the Python programming language.
The book and code we use are available here.
Outline and Schedule
The course will take place in the P.U.I.O building, rooms E203/E204.
Each week follows the same format: a lecture (approx. 1 hour), followed by two hours of exercises and project work at the computer. Below is the course schedule, with links to the lectures and exercises.
|23.11||Week 1||Introduction + Markov Decision Processes + Dynamic Programming||ExerII|
|07.12||Week 3||Monte Carlo methods + Temporal Differencing methods||ExerII/ExerIII||ExerIII|
|14.12||Week 4||Projects + Function Approximation||ExerIII||ExerIII|
|04.01||Week 5||Function Approximation||ExerIV||ExerIV|
|11.01||Week 6||Direct Policy Search||ExerIV||ExerIV|
|18.01||Week 7||Summary + Case Studies||Project||Project|
Syllabus
Here is a more detailed overview of the topics treated during the course, along with the corresponding exercises.
- What is RL?
- Course AIC/RL
- Markov Decision Processes
- Monte Carlo Value Learning - ExerIII
- Monte Carlo Q-Value Learning - ExerIII
- Temporal Differencing (V and Q) - ExerIII
- Discretization - ExerIV
- Function Approximation - ExerIV
- Direct Policy Search - ExerIV
- Prioritized Sweeping
- Eligibility Traces
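The temporal differencing topic can be illustrated with tabular Q-learning. The sketch below assumes a hypothetical six-state corridor (reward 1 for reaching the rightmost, terminal state, 0 otherwise) and ε-greedy exploration. The environment, names, and hyperparameters are illustrative only, not those of the exercises.

```python
import random

# Hypothetical toy task: states 0..5, state 5 is the terminal goal.
N_STATES = 6
GOAL = N_STATES - 1
ACTIONS = (0, 1)  # 0 = left, 1 = right


def env_step(s, a):
    """Deterministic corridor; the left wall at state 0 blocks movement."""
    s2 = max(s - 1, 0) if a == 0 else s + 1
    reward = 1.0 if s2 == GOAL else 0.0
    return s2, reward, s2 == GOAL


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action choice; ties are broken at random.
            if rng.random() < epsilon or Q[s][0] == Q[s][1]:
                a = rng.choice(ACTIONS)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2, r, done = env_step(s, a)
            # Off-policy TD target: bootstrap from the greedy value of s2.
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q


Q = q_learning()
greedy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(GOAL)]
print(greedy)  # [1, 1, 1, 1, 1]
```

Replacing the target `r + gamma * max(Q[s2])` with the value of the action actually taken next would turn this into SARSA, the on-policy variant.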