Average Reward Reinforcement Learning
Principal Investigator: Vaneet Aggarwal
Most real-world problems have infinite-horizon average-reward objectives, yet this setting is far less well understood than its discounted counterpart. The key reason is that the contraction property of the Bellman operator, which underpins the main results in the discounted setup, no longer holds when the discount factor is removed. In our work, we aim to lay the foundations of average-reward reinforcement learning.
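The failure of contraction can be seen directly: because transition rows sum to one, shifting a value function by a constant shifts the Bellman operator's output by exactly gamma times that constant, so the sup-norm distance contracts only when gamma < 1. A minimal numerical sketch on a hypothetical 2-state, 2-action MDP (the transition and reward numbers are illustrative, not taken from the publications below):

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
# P[s, a, s'] = transition probability; r[s, a] = reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 0.0], [0.0, 2.0]])

def bellman(V, gamma):
    """Bellman optimality operator: (T V)(s) = max_a [r(s,a) + gamma * E[V(s')]]."""
    return np.max(r + gamma * (P @ V), axis=1)

V = np.array([1.0, -2.0])
c = 3.0  # shift V by a constant in every state

for gamma in (0.9, 1.0):
    # Since each row of P sums to 1, T(V + c*1) = T(V) + gamma*c*1 exactly,
    # so the sup-norm gap equals gamma*c: it shrinks only when gamma < 1.
    gap = np.max(np.abs(bellman(V + c, gamma) - bellman(V, gamma)))
    print(f"gamma={gamma}: ||T(V+c1) - T(V)||_inf = {gap:.2f} (shift c = {c})")
```

With gamma = 0.9 the gap is 2.70 < 3.00, while with gamma = 1 it stays exactly 3.00: the undiscounted operator is only non-expansive, which is why average-reward analyses must replace fixed-point contraction arguments with other tools.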
Representative Publications
Bhrij Patel, Wesley A. Suttle, Alec Koppel, Vaneet Aggarwal, Brian M. Sadler, Amrit Singh Bedi, and Dinesh Manocha, "Global Optimality without Mixing Time Oracles in Average-reward RL via Multi-level Actor-Critic," in Proc. ICML, Jul 2024.
Qinbo Bai, Washim Uddin Mondal, and Vaneet Aggarwal, "Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes," in Proc. AAAI, Feb 2024.
Qinbo Bai, Washim Uddin Mondal, and Vaneet Aggarwal, "Learning General Parameterized Policies for Infinite Horizon Average Reward Constrained MDPs via Primal-Dual Policy Gradient Algorithm," in Proc. NeurIPS, Dec 2024.
Swetha Ganesh, Washim Uddin Mondal, and Vaneet Aggarwal, "Variance-Reduced Policy Gradient Approaches for Infinite Horizon Average Reward Markov Decision Processes," arXiv, Apr 2024.
Swetha Ganesh and Vaneet Aggarwal, "An Accelerated Multi-level Monte Carlo Approach for Average Reward Reinforcement Learning with General Policy Parametrization," arXiv, Jul 2024.

