Reinforcement learning variants for stochastic dynamic combinatorial optimization problems in transportation / Florentin D. Hildebrandt, Alexander Bode, Marlin W. Ulmer, Dirk C. Mattfeld
VerfasserHildebrandt, Florentin D. ; Bode, Alexander ; Ulmer, Marlin Wolf ; Mattfeld, Dirk C.
ErschienenMagdeburg, Germany : Otto-von-Guericke-Universität Magdeburg, Fakultät für Wirtschaftswissenschaft, [2023]
Umfang1 Online-Ressource (38 Seiten, 0,82 MB) : Diagramme
SerieWorking paper series ; 2023, no. 06
 Das Dokument ist frei verfügbar
Reinforcement learning variants for stochastic dynamic combinatorial optimization problems in transportation [0.82 mb]
With rising customer expectations and increasing computational potential many transportation services face real-time decision making in stochastic and dynamic environments. They often need to find and adapt complex plans that are effective now but also flexible with respect to future developments. Mathematically these three challenges of searching the large and complex decision space for effective and flexible decisions are reflected in the three parts of the famous Bellman Equation namely the reward function (effective) the value function (flexible) and the decision space (search). In the transportation literature reinforcement learning (RL) has shown potential to quickly evaluate the reward- and value function of the Bellman Equation for a limited number of decisions but struggles to search a complex constrained decision space immanent in most transportation problems. The question of how to combine the thorough search of the complex decision space with RL-evaluation techniques is still open. We propose three RL-based solutions each inspired by one component of the Bellman Equation to search for and evaluate decisions in an integrated manner. The first and second method learn to dynamically manipulate the reward function and decision space to encourage effective and flexible and prohibit inflexible decisions respectively. The third method models the Bellman Equation as a mixedinteger linear programming formulation in which the value function is given by a neural network approximator. We compare our proposed solution methods in a structured analysis for carefully designed problem classes based on longhaul medium-haul and short-haul transportation logistics. We demonstrate the overall effectiveness of our methods compared to prominent benchmark methods and highlight how the methods’ performances depend not only on the problem classes but also on the instances’ parameterizations