DOLCIT Seminar

Thursday, June 20, 2019, 4:00 PM

Optimization Aspects of Temporal Abstraction in Reinforcement Learning

Speaker: Pierre-Luc Bacon, Stanford Artificial Intelligence Lab, Stanford University
Location: Annenberg 213

Temporal abstraction refers to the idea that complicated sequential decision-making problems can sometimes be simplified by considering the "big picture" first. In this talk, I will give an overview of some of my work on learning such temporal abstractions end-to-end within the "option-critic" architecture (Bacon et al., 2017). I will then explain how other related hierarchical RL frameworks, such as Feudal RL by Dayan and Hinton (1993), can also be approached under the same option-critic architecture. However, we will see that this formulation leads to a so-called "bilevel" optimization problem. While this is a more difficult problem, the good news is that the literature on bilevel optimization is rich and many of its tools have yet to be rediscovered by our community. Finally, I will show how "iterative differentiation" techniques (Griewank and Walther, 2008) can be applied to our problem while providing a new interpretation of the "inverse RL" approach of Rust (1988).

Series: RSRG/DOLCIT Seminar Series

Contact: Pamela Albertson