This model simulates a lunar module as it attempts to make a safe landing on the moon. Several key factors are considered as the module approaches the designated landing area and each must have values within a safe zone to avoid crashing or drifting into space. The featured model is publicly available from AnyLogic.
Once the model is running, a panel below the module displays the elements that are monitored as it lands. The X, Y, and Z values show the module's distance from its target, while VX, VY, and VZ display the module's speed. Horizontal acceleration is adjusted via the joystick in the central control panel and vertical acceleration is managed with the rocket control. The current fuel level is also shown in the central panel.
Several camera angles may also be selected to view the module as it lands.
A set of system parameters establishes the maximum speed at which the module can travel as it lands. Exceeding those values will result in a crash.
The module's distance from the target at the beginning of the simulation is defined by the inDistance variables for X and Y. As the module moves, the getDistance functions determine its new distance.
Power parameters define how both horizontal and vertical acceleration change the speed of the module.
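The interaction of the power and speed-limit parameters can be sketched as follows, assuming simple Euler integration. The names and values (power, maxLandingSpeed, dt) are illustrative and not taken from the AnyLogic model.

```java
// A sketch of the motion update described above, assuming simple Euler
// integration. The names and values are assumptions, not the model's own.
class LanderKinematics {
    double vx = 0, vy = 0, vz = 0;    // current speed along each axis
    double power = 10.0;              // acceleration at full throttle (assumed)
    double maxLandingSpeed = 200.0;   // crash threshold from the system parameters

    // Advance the speed by one time step given throttle settings in [-1, 1].
    void step(double tx, double ty, double tz, double dt) {
        vx += tx * power * dt;
        vy += ty * power * dt;
        vz += tz * power * dt;
    }

    // Exceeding the maximum safe speed results in a crash.
    boolean exceedsSpeedLimit() {
        return Math.abs(vx) > maxLandingSpeed
            || Math.abs(vy) > maxLandingSpeed
            || Math.abs(vz) > maxLandingSpeed;
    }
}
```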
Step 1 - Perform a run with random actions to check Pathmind Helper setup.
Go through the steps of the Pathmind Helper Setup guide to make sure that everything is working correctly. Completing this step will also demonstrate how the model performs when using random actions instead of following a trained policy.
Step 2 - Examine the reinforcement learning elements.
Observations - There are nine observations, covering everything that must be monitored for the module to make a safe landing, such as distance and speed.
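Seven of the quantities the panel tracks (three distances, three speeds, and fuel) are natural observation candidates; a hypothetical sketch of assembling them into a vector follows. The normalization scales, and the idea of normalizing at all, are assumptions for illustration, not the model's actual observation list.

```java
// Hypothetical sketch of assembling an observation vector from the quantities
// named in the model description. Scale constants are assumed, not the model's.
class LanderObservations {
    static double[] observe(double x, double y, double z,
                            double vx, double vy, double vz, double fuel) {
        final double DIST_SCALE = 1500.0;   // assumed maximum distance
        final double SPEED_SCALE = 200.0;   // maximum safe landing speed
        final double FUEL_SCALE = 100.0;    // assumed full tank
        return new double[] {
            x / DIST_SCALE, y / DIST_SCALE, z / DIST_SCALE,
            vx / SPEED_SCALE, vy / SPEED_SCALE, vz / SPEED_SCALE,
            fuel / FUEL_SCALE
        };
    }
}
```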
Reward Variables - The reward variables for the model are defined as remaining fuel, distance from the target, whether the module landed safely or crashed, throttle direction, and velocity.
Actions - This model contains three decision points (the X, Y, and Z thrusters), each with three possible actions to control the throttles.
The actions established in the do_action function include all the possible movements that the module can make: forward, backward, or holding steady along the x, y, and z axes.
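The do_action mapping can be sketched like this: each thruster takes one of three actions per decision. The throttle field names and the STEP increment are assumptions, not the model's actual implementation.

```java
// Sketch of the do_action logic: each thruster (X, Y, Z) takes one of three
// actions - backward, hold steady, or forward. Names and STEP are assumed.
class LanderActions {
    double throttleX = 0, throttleY = 0, throttleZ = 0;
    static final double STEP = 0.1;   // assumed throttle increment per decision

    // action[i] is 0 (backward), 1 (hold steady), or 2 (forward) for axis i.
    void doAction(int[] action) {
        throttleX += STEP * (action[0] - 1);
        throttleY += STEP * (action[1] - 1);
        throttleZ += STEP * (action[2] - 1);
    }
}
```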
Done - The conditions that complete the simulation are defined in the simulation itself. Five possible scenarios can end a run: crashing, landing, taking too long to land, drifting too far from the target, and running out of fuel.
Event Trigger - This model uses the Pathmind Event Trigger with the recurrence defined in an outside parameter. The trigger adjusts the module's speed once per second. Some of the conditions that end the simulation are also checked in this event trigger.
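The five end conditions can be combined into a single check, sketched below. The threshold parameters (timeLimit, maxDrift) are placeholders rather than the model's actual values.

```java
// Sketch of the five episode-ending conditions, combined into one check.
// Threshold parameters are placeholders, not taken from the model.
class LanderDone {
    static boolean isDone(boolean crashed, boolean landed,
                          double elapsed, double timeLimit,
                          double driftFromTarget, double maxDrift,
                          double fuel) {
        return crashed
            || landed
            || elapsed > timeLimit          // taking too long to land
            || driftFromTarget > maxDrift   // drifting too far from the target
            || fuel <= 0;                   // running out of fuel
    }
}
```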
Step 3 - Export model and get Pathmind policy.
Complete the steps in the Exporting Models and Training guide to export your model, complete training, and download the Pathmind policy.
For reference, the reward function used for this model is:

reward += after.fuelRemaining - before.fuelRemaining; // penalize fuel consumption
reward += Math.abs(before.distanceToX) - Math.abs(after.distanceToX); // reward closing in along X
reward += Math.abs(before.distanceToY) - Math.abs(after.distanceToY); // reward closing in along Y
reward += before.distanceToZ - after.distanceToZ; // reward descending toward the surface
reward += after.landed == 1 ? 3 : 0; // bonus for a safe landing
reward -= after.crashed == 1 ? 0.3 : 0; // penalty for crashing
reward -= after.gotAway == 1 ? 1 : 0; // penalty for drifting away from the target
reward -= before.distanceToZ <= 100. / 1500. && Math.abs(after.speedX) > 200 ? 0.01 : 0; // penalize excess X speed near the surface
reward -= before.distanceToZ <= 100. / 1500. && Math.abs(after.speedY) > 200 ? 0.01 : 0; // penalize excess Y speed near the surface
reward -= before.distanceToZ <= 100. / 1500. && Math.abs(after.speedZ) > 200 ? 0.01 : 0; // penalize excess Z speed near the surface
Step 4 - Run training.
Click Start Training. You will receive a message notifying you that training has begun, and an email when it completes. Once that happens, you’ll be able to export the Pathmind policy.
Step 5 - Run the simulation with the Pathmind policy.
Back in AnyLogic, update the Mode field to Use Policy. Click Browse and select the downloaded policy file.
Run the simulation again. Make sure the Use AI checkbox is selected so that manual control is not needed. Note that the module will land on target, within the time limit, and without depleting its fuel resource.
You may also want to run the included Monte Carlo experiment to see how reliably the Moon Lander reaches the target.
As you can see, the Moon Lander doesn't always reach the target safely. Try adjusting the reward function to see if you can improve the results.
While making a safe landing on the moon might not be a daily headache for most simulation modelers, other use cases can benefit from similarly structured models. Robotic machinery might need to consider safety and spacing in work zones as it moves through a factory. Autonomous vehicle manufacturers could look to optimize reaching target destinations within speed limits while simultaneously monitoring power levels. In these cases and many others, reinforcement learning offers new insights and superior performance in optimizing simulation models.