A Unifying View of Estimation and Control Using Belief Propagation With Application to Path Planning

Buonanno, Amedeo
2022-01-01

Abstract

The use of estimation techniques on stochastic models to solve control problems is an emerging paradigm that falls under the rubric of Active Inference (AI) and Control as Inference (CAI). In this work, we use probability propagation on factor graphs to show that various algorithms proposed in the literature can be seen as specific composition rules in a factor graph. We show how this unified approach, presented both in probability space and in the log-probability space, provides a very general framework that includes the Sum-product, the Max-product, Dynamic programming, and mixed Reward/Entropy criteria-based algorithms. The framework also expands the algorithmic design options, leading to new, smoother or sharper policy distributions. We propose original recursions such as a generalized Sum/Max-product algorithm, a Smooth Dynamic programming algorithm, and a modified version of the Reward/Entropy algorithm. The discussion is carried out with reference to a path planning problem, where the recursions that arise from the various cost functions, although they may appear similar in scope, bear noticeable differences. We provide a comprehensive table of composition rules and a comparison through simulations, first on a small synthetic grid with a single goal and obstacles, and then on a grid extrapolated from a real-world scene with multiple goals and a semantic map.
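
As an illustration of the kind of temperature-parameterized composition rule the abstract alludes to, the sketch below (our own assumption, not the paper's actual recursion) runs backward message passing on a toy chain: an inverse-temperature parameter `beta` interpolates between the Sum-product rule (`beta = 1`) and the Max-product / Dynamic programming rule (`beta -> infinity`). The function name `soft_backward_messages` and the corridor example are hypothetical.

```python
import numpy as np
from scipy.special import logsumexp

def soft_backward_messages(log_T, log_phi_goal, horizon, beta):
    """Backward message passing with a soft-max composition rule.

    The inverse temperature beta interpolates between the Sum-product
    rule (beta = 1) and the Max-product / Dynamic programming rule
    (beta -> infinity). Names and details here are illustrative only.

    log_T        : (S, S) array, log_T[s, s'] = log p(s' | s)
    log_phi_goal : (S,) log evidence favoring goal states at the horizon
    """
    m = log_phi_goal.copy()  # message flowing backward from the last slice
    for _ in range(horizon):
        # m(s) = (1/beta) * log sum_{s'} exp(beta * (log p(s'|s) + m(s')))
        m = logsumexp(beta * (log_T + m[None, :]), axis=1) / beta
    return m

# Toy usage: a 1-D corridor of 10 cells with random-walk moves
# (left, stay, right) and the goal in the rightmost cell.
S = 10
T = np.zeros((S, S))
for s in range(S):
    for s2 in (s - 1, s, s + 1):
        if 0 <= s2 < S:
            T[s, s2] = 1.0
T /= T.sum(axis=1, keepdims=True)

log_phi = np.full(S, -1e3)  # effectively -inf away from the goal
log_phi[-1] = 0.0           # goal cell

with np.errstate(divide="ignore"):
    log_T = np.log(T)

for beta in (1.0, 5.0, 50.0):  # sum-product -> nearly max-product
    m = soft_backward_messages(log_T, log_phi, horizon=S, beta=beta)
    print(f"beta={beta:5.1f}  m = {np.round(m, 2)}")
```

At low `beta` the messages behave like smooth Sum-product marginals over all paths to the goal; as `beta` grows they concentrate on the single best path, recovering the sharper, Dynamic-programming-like behavior described in the abstract.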
Keywords: Belief propagation; dynamic programming; Markov decision process; path planning; reinforcement learning
Files in this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12079/73328
Warning: the displayed data have not been validated by the university.

Citations
  • Scopus: 5