Multi-Objective Reinforcement Learning

Most real-world problems require balancing several competing goals at once rather than optimising a single reward. These goals typically include standard performance measures, but can also incorporate safety and alignment objectives, which a multi-objective formulation expresses in a far more natural and designer-friendly way than single-objective approaches.

Multi-objective reinforcement learning (MORL) extends reinforcement learning to explicitly represent and reason about these trade-offs, and ARAAC has played a leading role in establishing it as a sub-field. Our work spans the theory of MORL and its application to building safe, human-aligned autonomous agents.

Key Researchers

Professor Peter Vamplew

Federation University Australia

Peter is co-founder and co-leader of ARAAC, and a senior member of the Future of Life Institute’s AI Existential Safety Community. He has played a leading role in establishing multi-objective reinforcement learning (MORL) as a sub-field of reinforcement learning explicitly designed for problems with multiple conflicting objectives, which describes most real-world problems.

Professor Richard Dazeley

Deakin University

Richard is the Leader of the Machine Intelligence Lab at Deakin University (Geelong) and the Deputy Head of School. He is a leading researcher in the human alignment of autonomous agents through safe, ethical, explainable and interactive methods that use multi-objective reinforcement learning (MORL), and is a senior member of the AI Existential Safety Community.

Associate Professor Cameron Foale

Federation University Australia

Cameron has an interest in building usable, fair, transparent, and scalable connected eHealth systems, and applying AI techniques to time-series data.

Hadassah Harland (Haddie)

Deakin University

Haddie is a PhD student at Deakin University (Geelong) and Top-Up Scholarship recipient with CSIRO’s Data61 Robotics and Autonomous Systems Group, with an interest in Human-Machine Collaboration.

Scott Johnson

Deakin University

Scott is currently studying for his Honours degree at Deakin University, with a focus on the transfer of safety knowledge between environments using Multi-Objective Reinforcement Learning. He has worked as a research assistant on several ML projects for both Deakin University and Federation University.

ARAAC Publications

K. Ding, P. Vamplew, C. Foale, R. Dazeley (2025). An empirical investigation of value-based multi-objective reinforcement learning for stochastic environments. The Knowledge Engineering Review.
P. Vamplew, C. Foale, C. F. Hayes, P. Mannion, E. Howley, R. Dazeley, S. Johnson, J. Källström, G. Ramos, R. Rădulescu, W. Röpke, D. M. Roijers (2024). Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning. International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
H. Harland, R. Dazeley, B. Nakisa, F. Cruz, P. Vamplew (2023). AI apology: interactive multi-objective reinforcement learning for human-aligned AI. Neural Computing and Applications.
P. Vamplew, C. Foale, R. Dazeley (2022). The impact of environmental stochasticity on value-based multiobjective reinforcement learning. Neural Computing and Applications.
C. F. Hayes, R. Rădulescu, E. Bargiacchi, J. Källström, M. Macfarlane, M. Reymond, T. Verstraeten, L. M. Zintgraf, R. Dazeley, F. Heintz, E. Howley, A. A. Irissappane, P. Mannion, A. Nowé, G. Ramos, M. Restelli, P. Vamplew, D. M. Roijers (2022). A practical guide to multi-objective reinforcement learning and planning. Autonomous Agents and Multi-Agent Systems (JAAMAS).
P. Vamplew, B. J. Smith, J. Källström, G. Ramos, R. Rădulescu, D. M. Roijers, C. F. Hayes, F. Heintz, P. Mannion, P. J. K. Libin, R. Dazeley, C. Foale (2022). Scalar reward is not enough: a response to Silver, Singh, Precup and Sutton (2021). Autonomous Agents and Multi-Agent Systems (JAAMAS).
P. Vamplew, C. Foale, R. Dazeley, A. Bignold (2021). Potential-based multiobjective reinforcement learning approaches to low-impact agents for AI safety. Engineering Applications of Artificial Intelligence.
T. T. Nguyen, N. D. Nguyen, P. Vamplew, S. Nahavandi, R. Dazeley, C. P. Lim (2020). A multi-objective deep reinforcement learning framework. Engineering Applications of Artificial Intelligence.
P. Vamplew, R. Dazeley, C. Foale, S. Firmin, J. Mummery (2018). Human-aligned artificial intelligence is a multiobjective problem. Ethics and Information Technology.
T. G. Karimpanal, E. Wilhelm (2017). Identification and off-policy learning of multiple objectives using adaptive clustering. Neurocomputing.
P. Vamplew, R. Issabekov, R. Dazeley, C. Foale, A. Berry, T. Moore, D. Creighton (2017). Steering approaches to Pareto-optimal multiobjective reinforcement learning. Neurocomputing.
P. Vamplew, R. Dazeley, C. Foale (2017). Softmax exploration strategies for multiobjective reinforcement learning. Neurocomputing.
D. M. Roijers, P. Vamplew, S. Whiteson, R. Dazeley (2013). A survey of multi-objective sequential decision-making. Journal of Artificial Intelligence Research.