24 April 2025
Stara Kotłownia
Europe/Warsaw timezone

Comparing the Efficiency of Selected Reinforcement Learning Algorithms in Stability Control and Navigation Tasks

24 Apr 2025, 10:45
30m
SK 04/05 (Stara Kotłownia)

Warsaw University of Technology, Main Campus

Speaker

Oskar Wyłucki

Description

This paper presents a comprehensive comparison of the efficiency of
four key reinforcement learning algorithms (DQN, PPO, REINFORCE, and
A2C) in stability control and navigation tasks. The study was conducted
in two test environments: Cart Pole, representing a basic balance
maintenance task, and Lunar Lander, constituting a complex navigational
challenge requiring precise landing. As part of the research, the
algorithms were implemented using various neural network architectures
adapted to the specific requirements of each environment. For the Cart
Pole environment, simpler architectures were applied, while for the more
complex Lunar Lander environment, enhanced networks with additional
learning process stabilization techniques were implemented, such as
layer normalization and orthogonal initialization.
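
To make the two stabilization techniques named above concrete, here is a minimal plain-Python sketch. It is an illustration only, not the implementation used in the study: the function names `layer_norm` and `orthogonal_rows` are hypothetical, the learnable scale and shift of a full layer-normalization layer are omitted, and a real experiment would most likely rely on a deep learning library instead.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize one activation vector to zero mean and (almost) unit variance.

    Learnable scale/shift parameters are left out to keep the sketch short.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def orthogonal_rows(n_rows, n_cols, gain=1.0, seed=0):
    """Build an n_rows x n_cols weight matrix whose rows are orthonormal.

    Rows are drawn from a Gaussian and orthogonalized with Gram-Schmidt,
    then scaled by `gain`. This sketch only handles n_rows <= n_cols.
    """
    if n_rows > n_cols:
        raise ValueError("this sketch requires n_rows <= n_cols")
    rng = random.Random(seed)
    rows = []
    while len(rows) < n_rows:
        v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
        # Remove the components along the rows accepted so far.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip the (unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return [[gain * a for a in row] for row in rows]
```

Orthogonal weights keep the scale of a signal roughly constant as it passes through a layer, and layer normalization re-centers and rescales activations per sample; both are commonly used to keep deep reinforcement learning training from diverging.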

The methodology focuses on a systematic analysis of key performance
aspects: convergence speed, sample efficiency, adaptability to different
initial conditions, and stability of the learning process over time. For
each algorithm and environment, standardized experiments were conducted
and detailed performance metrics were recorded throughout training. The
experiments revealed significant differences in how the algorithms
perform under varying levels of environmental complexity.
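
As an illustration of how convergence speed and sample efficiency could be quantified from recorded training logs, the following sketch counts the episodes and the environment steps needed before a moving average of episode returns first reaches a target. The abstract does not define its metrics, so the function names, the moving-average criterion, the window size, and the threshold are all assumptions made for this example.

```python
from collections import deque


def episodes_to_threshold(returns, threshold, window=100):
    """Convergence speed: 1-based index of the first episode at which the
    mean return over the last `window` episodes reaches `threshold`.

    Returns None if the threshold is never reached.
    """
    recent = deque(maxlen=window)
    total = 0.0
    for episode, ret in enumerate(returns, start=1):
        if len(recent) == window:
            total -= recent[0]  # value about to be evicted by append()
        recent.append(ret)
        total += ret
        if len(recent) == window and total / window >= threshold:
            return episode
    return None


def steps_to_threshold(returns, lengths, threshold, window=100):
    """Sample efficiency: environment steps consumed up to and including
    the episode found by `episodes_to_threshold`, or None if never reached.
    """
    episode = episodes_to_threshold(returns, threshold, window)
    return None if episode is None else sum(lengths[:episode])
```

For example, with per-episode returns `[10, 20, 30, 40, 50]`, episode lengths of 5 steps each, `window=2`, and `threshold=35`, the moving average first reaches the target at episode 4, after 20 environment steps. Applying the same criterion to every algorithm and environment keeps such a comparison consistent.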

The comparative analysis showed that the algorithms differ markedly in
learning approach, training stability, and the ability to make efficient
use of accumulated experience. These observations emphasize that the
choice of algorithm depends strongly on the specifics of the task, the
complexity of the environment, and the available computational
resources. This research provides practical insights into algorithm
selection and configuration for reinforcement learning tasks of varying
complexity in the domains of stability control and navigation.

Index Terms—reinforcement learning, deep Q-network, proximal policy
optimization, REINFORCE, advantage actor-critic, stability control,
navigation tasks, Cart Pole, Lunar Lander

Author

Co-author

Dr Radosław Roszczyk (Faculty of Electrical Engineering)
