Rigorous Systems Research Group (RSRG) Seminar
In this work we study how to tackle decision-making for safety-critical systems under uncertainty. To that end, we formulate Reinforcement Learning with Almost Sure constraints, in which one seeks a policy that allows no more than $\Delta\in\mathbb{N}$ unsafe events in any trajectory, with probability one. We argue that this type of constraint is better suited to safety-critical systems than the average constraints typically employed in Constrained Markov Decision Processes, and, moreover, that constraints of this kind make feasible policies much easier to find.
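For intuition, one hedged way to write such an almost-sure constraint, using an unsafe set $\mathcal{U}$ and indicator notation of our own choosing (not necessarily the exact formulation presented in the talk), is
$$\max_{\pi}\ \mathbb{E}_{\pi}\!\left[\sum_{t\geq 0}\gamma^{t}\, r(s_t,a_t)\right] \quad \text{s.t.}\quad \mathbb{P}_{\pi}\!\left(\sum_{t\geq 0}\mathbb{1}\{s_t\in\mathcal{U}\}\leq \Delta\right)=1,$$
in contrast to the usual CMDP requirement that an expected cumulative cost stay below a budget.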
The talk is didactically split into two parts, first considering the case $\Delta=0$ and then the general case $\Delta\geq 0$. At the core of our theory is a barrier-based decomposition of the Q-function, which decouples the problems of optimality and feasibility and allows them to be learned either independently or in conjunction. We develop an algorithm for characterizing the set of all feasible policies that provably converges in expected finite time, and we derive sample-complexity bounds for learning this set with high probability. Simulations corroborate our theoretical findings and showcase how our algorithm can be wrapped around other learning algorithms to hasten the search for first feasible and then optimal policies.
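As a rough illustration of the barrier idea in the $\Delta=0$ case, a minimal sketch in our own notation (the talk's exact construction may differ) is the decomposition
$$Q(s,a)=\hat{Q}(s,a)+B(s,a),\qquad B(s,a)=\begin{cases}0 & \text{if } (s,a) \text{ admits a continuation with no unsafe events},\\ -\infty & \text{otherwise},\end{cases}$$
so that maximizing $Q$ automatically discards infeasible actions, while the reward part $\hat{Q}$ and the barrier $B$ can in principle be estimated separately.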
Bio: Agustin Castellano is starting his 2nd year as a Ph.D. student in Electrical and Computer Engineering at Johns Hopkins University (JHU), under the supervision of Enrique Mallada. He is currently working on Reinforcement Learning algorithms for safety-critical systems, work supported by a MINDS Fellowship at JHU. Prior to this, he completed his M.Sc. in Electrical Engineering at Universidad de la República, Uruguay, where he focused on designing and applying learning algorithms for power system optimization in the presence of storage. For his dissertation he was awarded the first prize by the National Academy of Engineers.