Belief Propagation of Pareto Front in Multi-Objective MDP Graphs
Buonanno A.;
2023-01-01
Abstract
In the context of Markov Decision Processes (MDPs), the framework of forward-backward probability propagation on factor graphs has proven useful for finding optimal policies. However, when rewards are vector-valued, the trade-offs among the constituent objectives must be evaluated. In this work, assuming multiple rewards, we show how to use the belief propagation framework to dynamically generate the Pareto front and propagate it as a forward flow distribution. The idea is applied to path planning on discrete 1D and 2D grids, where different sets of states carry vector rewards in the form of priors.
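To illustrate what propagating a Pareto front forward can look like on a 1D grid, here is a minimal Python sketch. It is not the method of the paper: it replaces probabilistic message passing with a deterministic, set-valued forward recursion. It also assumes that rewards are additive along a path, that every objective is maximized, and that an agent may move left, stay, or move right at each step. The names `dominates`, `pareto_front`, and `propagate_fronts` are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

Vec = Tuple[float, ...]


def dominates(a: Vec, b: Vec) -> bool:
    """True if a is at least as good as b in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Vec]) -> List[Vec]:
    """Return the non-dominated vectors (maximization), sorted for reproducibility."""
    unique = set(points)
    return sorted(p for p in unique if not any(dominates(q, p) for q in unique))


def propagate_fronts(rewards: List[Vec], start: int, steps: int) -> List[List[Vec]]:
    """Forward recursion on a 1D grid (assumed moves: left, stay, right).

    fronts[s] holds the Pareto front of cumulative reward vectors over all
    paths that begin at `start` and reach state s after `steps` moves.
    """
    n = len(rewards)
    fronts: List[List[Vec]] = [[] for _ in range(n)]
    fronts[start] = [rewards[start]]
    for _ in range(steps):
        new_fronts: List[List[Vec]] = [[] for _ in range(n)]
        for s in range(n):
            candidates = []
            for prev in (s - 1, s, s + 1):
                if 0 <= prev < n:
                    for v in fronts[prev]:
                        # Assumption: rewards accumulate additively along a path.
                        candidates.append(tuple(a + b for a, b in zip(v, rewards[s])))
            # Prune dominated vectors before passing the front forward.
            new_fronts[s] = pareto_front(candidates)
        fronts = new_fronts
    return fronts


if __name__ == "__main__":
    # Three states, two objectives; state 1 rewards objective 0, state 2 rewards objective 1.
    grid_rewards = [(0, 0), (1, 0), (0, 1)]
    for state, front in enumerate(propagate_fronts(grid_rewards, start=0, steps=2)):
        print(state, front)
```

Pruning dominated vectors at every step keeps each per-state set small, and in this additive setting it loses nothing: a dominated partial sum stays dominated after the same reward is added to both vectors.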