Multi-Objective Reinforcement Learning
Most real-world problems require balancing several competing goals at once rather than optimising a single reward. These objectives typically include standard performance measures, but they can also include safety and alignment objectives, which a multi-objective formulation expresses far more naturally, and more conveniently for the system designer, than single-objective approaches.
Multi-objective reinforcement learning (MORL) extends reinforcement learning to explicitly represent and reason about these trade-offs, and ARAAC has played a leading role in establishing it as a sub-field. Our work spans the theory of MORL and its application to building safe, human-aligned autonomous agents.
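The sketch below illustrates what "representing trade-offs" can mean in practice. It is a minimal, hypothetical example, not code from ARAAC's work. It assumes each candidate policy's expected return is already known as a vector with one entry per objective (here, task performance and safety), all objectives are maximised, and the policy names, return values and preference weights are invented for illustration. It shows Pareto dominance, the resulting set of non-dominated policies, and how a user's preferences (here, a simple weighted sum) select among them.

```python
# Hypothetical illustration of vector-valued returns in MORL.
# Assumption: all objectives are maximised and returns are already estimated.
from typing import Sequence

Vector = tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """True if `a` Pareto-dominates `b`: at least as good on every
    objective and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(returns: dict[str, Vector]) -> set[str]:
    """Names of the policies whose returns no other policy dominates.
    (A vector never dominates itself, so no self-check is needed.)"""
    return {
        name
        for name, r in returns.items()
        if not any(dominates(other, r) for other in returns.values())
    }


def linear_utility(r: Vector, weights: Sequence[float]) -> float:
    """One simple way to express preferences over objectives: a weighted sum."""
    return sum(w * x for w, x in zip(weights, r))


# Invented returns for three candidate policies: (performance, safety).
returns = {
    "fast": (10.0, 2.0),
    "careful": (6.0, 9.0),
    "poor": (5.0, 1.0),  # worse than "fast" on both objectives
}

print(sorted(pareto_front(returns)))  # ['careful', 'fast']

# Different preference weights pick different policies from the front.
for weights in [(0.8, 0.2), (0.3, 0.7)]:
    best = max(returns, key=lambda n: linear_utility(returns[n], weights))
    print(weights, best)  # (0.8, 0.2) fast, then (0.3, 0.7) careful
```

A weighted sum is used here only because it is the simplest utility function to show; it is one choice among many, and it cannot recover every Pareto-optimal policy when the front is non-convex.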
Key Researchers
Associate Professor Cameron Foale
Federation University Australia
Cameron is interested in building usable, fair, transparent and scalable connected eHealth systems, and in applying AI techniques to time-series data.
Ethan Watkins (EJ)
ARAAC
EJ is a chemist by training who has pivoted his career towards AI safety research, with the aim of ensuring that advances in AI result in human flourishing. He is particularly interested in reinforcement learning and is excited to explore the potential of multi-objective approaches to train agents that are better aligned with human goals. He is currently working with ARAAC researchers as an intern.
Hadassah Harland (Haddie)
Deakin University
Haddie is a PhD student at Deakin University (Geelong) and a Top-Up Scholarship recipient with CSIRO’s Data61 Robotics and Autonomous Systems Group, with an interest in human-machine collaboration.
Professor Peter Vamplew
Federation University Australia
Peter is co-founder and co-leader of ARAAC, and a senior member of the Future of Life Institute’s AI Existential Safety Community. He has played a leading role in establishing multi-objective reinforcement learning (MORL) as a sub-field of reinforcement learning, explicitly designed for problems with multiple conflicting objectives (which describes most real-world problems).
Professor Richard Dazeley
Deakin University
Richard is the Leader of the Machine Intelligence Lab at Deakin University (Geelong) and the Deputy Head of School. He is a leading researcher in the human alignment of autonomous agents through safe, ethical, explainable and interactive methods utilising multi-objective reinforcement learning (MORL), and is a senior member of the AI Existential Safety Community.
Scott Johnson
Deakin University
Scott is currently studying for his Honours degree at Deakin University, with a focus on the transfer of safety knowledge between environments using Multi-Objective Reinforcement Learning. He has worked as a research assistant on several ML projects for both Deakin University and Federation University.
ARAAC Publications
- (2025). An empirical investigation of value-based multi-objective reinforcement learning for stochastic environments. The Knowledge Engineering Review.
- (2024). Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning. International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
- (2023). AI apology: interactive multi-objective reinforcement learning for human-aligned AI. Neural Computing and Applications.
- (2022). The impact of environmental stochasticity on value-based multiobjective reinforcement learning. Neural Computing and Applications.
- (2022). A practical guide to multi-objective reinforcement learning and planning. Autonomous Agents and Multi-Agent Systems (JAAMAS).
- (2022). Scalar reward is not enough: a response to Silver, Singh, Precup and Sutton (2021). Autonomous Agents and Multi-Agent Systems (JAAMAS).
- (2021). Potential-based multiobjective reinforcement learning approaches to low-impact agents for AI safety. Engineering Applications of Artificial Intelligence.
- (2020). A multi-objective deep reinforcement learning framework. Engineering Applications of Artificial Intelligence.
- (2018). Human-aligned artificial intelligence is a multiobjective problem. Ethics and Information Technology.
- (2017). Identification and off-policy learning of multiple objectives using adaptive clustering. Neurocomputing.
- (2017). Steering approaches to Pareto-optimal multiobjective reinforcement learning. Neurocomputing.
- (2017). Softmax exploration strategies for multiobjective reinforcement learning. Neurocomputing.
- (2013). A survey of multi-objective sequential decision-making. Journal of Artificial Intelligence Research.