Non-Linear Utilities and Constraints in Online Decision Making: A Learning-Based Approach
Principal Investigator: Vaneet Aggarwal
Many real-world problems require optimizing an objective that is non-linear in the cumulative rewards, e.g., fairness objectives in scheduling. Further, most engineering problems have constraints, e.g., reaching a goal safely or satisfying average power constraints. In this work, we aim to develop efficient algorithms that account for non-linear objectives and constraints. Reinforcement learning algorithms such as DQN owe their success to the Markov Decision Process framework and to the fact that maximizing a sum of rewards admits backward induction and reduces to the Bellman optimality equation. These properties fail to hold in the presence of non-linear objectives and/or constraints, making the analysis challenging. We have proposed multiple families of algorithms with provable guarantees for these problems. As shown alongside, for a network traffic optimization problem, the proposed Actor-Critic (IFRL AC) and Policy Gradient (IFRL PG) approaches significantly outperform standard approaches that do not explicitly account for non-linearity. In the table below, we consider two families of algorithms and summarize the results for concave utility functions, constraints, or both. The model-based approaches use posterior or optimistic sampling; the model-free approaches are based on policy gradients or primal-dual methods. An X indicates a case for which we have results; the remaining cells represent scenarios where we are working toward efficient results.
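To see why a concave utility of cumulative rewards breaks the usual sum-of-rewards reduction, consider a minimal two-user fairness sketch (the policies, reward vectors, and log utility below are illustrative assumptions, not the settings from the papers): the policy that maximizes the plain sum is not the one that maximizes a concave utility of the per-user totals, so the objective cannot be decomposed into additive per-step rewards.

```python
import math

# Hypothetical two-user example: each "policy" yields a vector of
# cumulative per-user rewards. Names and values are illustrative only.
policy_rewards = {
    "A": (10.0, 0.0),   # maximizes the plain sum, starves user 2
    "B": (5.0, 4.0),    # more balanced across users
}

def linear_objective(r):
    # Standard RL objective: sum of cumulative rewards.
    return sum(r)

def fairness_objective(r):
    # Concave (proportional-fairness-style) utility of cumulative rewards.
    return sum(math.log(1.0 + x) for x in r)

best_linear = max(policy_rewards, key=lambda p: linear_objective(policy_rewards[p]))
best_fair = max(policy_rewards, key=lambda p: fairness_objective(policy_rewards[p]))
print(best_linear, best_fair)  # A under the sum, B under the concave utility
```

Because the ranking of policies changes under the concave utility, a Bellman-style backward induction over per-step rewards no longer recovers the optimal policy.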
| | Concave Utility | Constraints | Concave Utility and Constraints |
|---|---|---|---|
| Model-Based | X | X | X |
| Model-Free, Finite State Space | X | X | X |
| Model-Free, Infinite State Space | X | X | |
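The primal-dual idea behind the model-free constrained approaches can be sketched on a two-action toy problem (an illustrative stand-in for the constrained-MDP setting; all numbers and the greedy primal step are assumptions, not the algorithms from the papers): the primal step optimizes the Lagrangian reward minus a multiplier times the cost, and the dual step raises the multiplier whenever the cost budget is exceeded.

```python
# Minimal primal-dual sketch on a two-action toy problem.
rewards = [1.0, 0.6]   # expected reward of each action (illustrative)
costs   = [0.9, 0.3]   # expected cost of each action (illustrative)
budget  = 0.5          # constraint: long-run average cost <= budget

lam, eta = 0.0, 0.05   # dual variable (Lagrange multiplier) and step size
counts = [0, 0]
for _ in range(2000):
    # Primal step: pick the action maximizing the Lagrangian r - lam * c.
    a = max(range(2), key=lambda i: rewards[i] - lam * costs[i])
    counts[a] += 1
    # Dual step: projected gradient ascent on the constraint violation.
    lam = max(0.0, lam + eta * (costs[a] - budget))

avg_cost = sum(c * n for c, n in zip(costs, counts)) / sum(counts)
print(counts, round(avg_cost, 3), round(lam, 3))
```

The multiplier grows while the cost-heavy action is over-played, which shifts the primal step toward the cheaper action until the long-run average cost settles near the budget.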
Representative Publications
- Ather Gattami, Qinbo Bai, and Vaneet Aggarwal, "Reinforcement Learning for Multi-Objective and Constrained Markov Decision Processes," in Proc. AISTATS, Apr 2021.
- Qinbo Bai, Vaneet Aggarwal, Ather Gattami, "Provably Efficient Model-Free Algorithm for MDPs with Peak Constraints," Accepted to Journal of Machine Learning Research, Jun 2022.
- Mridul Agarwal, Qinbo Bai, and Vaneet Aggarwal, "Concave Utility Reinforcement Learning with Zero-Constraint Violations," Accepted to Transactions on Machine Learning Research, Nov 2022.
- Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal, "Regret Guarantees for Model-Based Reinforcement Learning with Long-Term Average Constraints," in Proc. UAI, Aug 2022.
- Mridul Agarwal and Vaneet Aggarwal, "Reinforcement Learning for Joint Optimization of Multiple Rewards," Accepted to Journal of Machine Learning Research, Jul 2022.
- Mridul Agarwal, Vaneet Aggarwal, and Tian Lan, "Multi-Objective Reinforcement Learning with Non-Linear Scalarization," in Proc. AAMAS, May 2022.
- Qinbo Bai, Amrit Bedi, Mridul Agarwal, Alec Koppel, and Vaneet Aggarwal, "Achieving Zero Constraint Violation for Constrained Concave Utility Reinforcement Learning via Primal-Dual Approach," Submitted to Journal of Machine Learning Research, 2022.
- Qinbo Bai, Amrit Bedi, Mridul Agarwal, Alec Koppel, and Vaneet Aggarwal, "Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach," in Proc. AAAI, Feb 2022.
- Qinbo Bai, Amrit Singh Bedi, and Vaneet Aggarwal, "Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm," in Proc. AAAI, Feb 2023.
- Qinbo Bai, Mridul Agarwal, and Vaneet Aggarwal, "Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm," Journal of Artificial Intelligence Research, vol. 74, pp. 1565-1597, Aug 2022.