## Note that I put some solutions on the Material page.

# For Prioritized Sweeping Projects

Here is the MDP version of the Maze: mdp_maze.py

# Summary

### Objectives

The objectives of this course are to understand and acquire practical experience with:

- Formalization of Reinforcement Learning (RL) problems
- Dynamic Programming for model-based optimization
- Solution methods for discrete model-free RL problems
- Value function approximation for continuous RL problems
- Direct policy search with parameterized policies
- Applications of RL

The "teaser" slides are available here.

### Prerequisites and Requirements

Basic linear algebra, the Python programming language

### Material

The book and code we use are available here.

# Outline and Schedule

The course will be held in the P.U.I.O building, rooms E203/E204.

Each week consists of a lecture (approx. 1 hour) followed by two hours of exercises and project work at the computer. Below is the schedule of the course, with links to the lectures and exercises.

| Date | Number | 14h00-15h00 | 15h15-16h00 | 16h00-17h15 |
|---|---|---|---|---|
| 23.11 | Week 1 | Introduction + Markov Decision Processes + Dynamic Programming | ExerII | |
| 30.11 | Week 2 | | | |
| 07.12 | Week 3 | Monte Carlo methods + Temporal Differencing methods | ExerII/ExerIII | ExerIII |
| 14.12 | Week 4 | Projects + Function Approximation | ExerIII | ExerIII |
| 04.01 | Week 5 | Function Approximation | ExerIV | ExerIV |
| 11.01 | Week 6 | Direct Policy Search | ExerIV | ExerIV |
| 18.01 | Week 7 | Summary + Case Studies | Project | Project |
| 01.02 | Week 8 | Exam | | |

### Syllabus

Here is a more detailed overview of the topics treated during the course, along with the corresponding exercises.

- Introduction
  - What is RL?
  - Applications
  - Course AIC/RL
- Problem formalization
  - Markov Decision Processes
  - Dynamic Programming
- Discrete RL
  - Monte Carlo Value Learning - **ExerIII**
  - Monte Carlo Q-Value Learning - **ExerIII**
  - Temporal Differencing (V and Q) - **ExerIII**
- Continuous RL
  - Discretization - **ExerIV**
  - Function Approximation - **ExerIV**
  - Direct Policy Search - **ExerIV**

Possible projects:

- Prioritized Sweeping
- DYNA
- Eligibility Traces