Let’s code a discrete-action Reinforcement Learning rocket landing agent from scratch!
Welcome to another part of my step-by-step reinforcement learning tutorial with gym and TensorFlow 2. I’ll show you how to implement a Reinforcement Learning algorithm known as Proximal Policy Optimization (PPO) to teach an AI agent how to land a rocket (LunarLander-v2). By the end of this tutorial, you’ll have an idea of how to apply an on-policy learning method in an actor-critic framework to learn to navigate a discrete game environment. In a follow-up tutorial, I’ll do the same for a continuous environment. I’ll explain what these terms mean in the context of the PPO algorithm and implement them in Python with the help of TensorFlow 2.
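To give a first feel for what PPO optimizes, here is a minimal sketch of its clipped surrogate objective in plain Python. It is not the tutorial's TensorFlow 2 implementation: the function name ppo_clipped_objective and its parameters are hypothetical, and clip_eps=0.2 is assumed here only because it is a commonly used value.

```python
import math

def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of samples.

    All names here are illustrative assumptions, not the tutorial's code.
    new_log_probs / old_log_probs: log-probability of each taken action
    under the current policy and under the policy that collected the data.
    advantages: advantage estimate for each sample (e.g. from the critic).
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)

# Usage: the action became twice as likely and had a positive advantage,
# so the gain is capped at (1 + clip_eps) * advantage.
objective = ppo_clipped_objective([math.log(0.5)], [math.log(0.25)], [1.0])
```

The objective is maximized during training. The clipping keeps each update close to the policy that gathered the data, which is what makes the method on-policy: old samples are only trusted while the new policy stays near the old one.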
Text version tutorial: https://pylessons.com/LunarLander-v2-PPO/
Full video playlist: https://www.youtube.com/watch?v=D795oNqa-Vk&list=PLbMO9c_jUD47r9QZKpLn5CY_Mt-NFY8cC
GitHub code: https://github.com/pythonlessons/Reinforcement_Learning
Support My Channel Through Patreon:
https://www.patreon.com/PyLessons
One-Time Contribution Through PayPal:
https://www.paypal.com/paypalme/PyLessons