Finite-Time Performance Bounds and Adaptive Learning Rate Selection for TD Learning
Temporal difference (TD) learning is a widely used algorithm for estimating the value function of a Markov decision process (MDP) under a given policy. Here, we consider TD learning with linear function approximation and a constant learning rate, and obtain bounds on its finite-time performance. Motivated by these bounds, we present a heuristic that adapts the learning rate to achieve fast convergence. Joint work with Lei Ying and Harsh Gupta.
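For readers unfamiliar with the setting, the following is a minimal sketch of standard TD(0) with linear function approximation and a constant learning rate. It is not the adaptive heuristic from the talk, which the abstract does not specify. The function name td0_linear, its parameters, and the one-state example are all hypothetical and chosen only for illustration.

```python
import random


def td0_linear(features, transitions, rewards, gamma, alpha, num_steps, seed=0):
    """TD(0) with a linear value estimate V(s) = theta . phi(s).

    features[s]    -- feature vector phi(s) for state s (illustrative layout)
    transitions[s] -- next-state probabilities under the fixed policy
    rewards[s]     -- reward received when leaving state s
    alpha          -- constant learning rate (step size)
    """
    rng = random.Random(seed)
    num_states = len(transitions)
    theta = [0.0] * len(features[0])
    state = 0
    for _ in range(num_steps):
        # Sample the next state from the policy-induced Markov chain.
        next_state = rng.choices(range(num_states), weights=transitions[state])[0]
        v = sum(t * f for t, f in zip(theta, features[state]))
        v_next = sum(t * f for t, f in zip(theta, features[next_state]))
        # Temporal difference error for the observed transition.
        td_error = rewards[state] + gamma * v_next - v
        # Constant-step-size update along the current feature vector.
        theta = [t + alpha * td_error * f for t, f in zip(theta, features[state])]
        state = next_state
    return theta


# Assumed toy example: one state with reward 1 and discount 0.5,
# whose true value is 1 / (1 - 0.5) = 2.
theta = td0_linear(
    features=[[1.0]],
    transitions=[[1.0]],
    rewards=[1.0],
    gamma=0.5,
    alpha=0.1,
    num_steps=500,
)
print(theta)
```

In this toy chain the iterate contracts toward the true value at a rate governed by alpha, which hints at why the choice of a constant learning rate matters for finite-time behavior.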
Contact: Kamyar Azizzadenesheli email@example.com