This introductory demo illustrates key ideas of reinforcement learning with a model that is small enough to be quickly understood. (Thanks to Kelvin Yeung of DecisionLab for this contribution!)
The Cheese Chasing simulation is based on a grid with 36 possible "point nodes," a single mouse agent, and a piece of cheese as the reward. The goal is for the mouse to find the cheese in as few steps as possible. The mouse receives the reward once it finds the cheese.
Step 1 - Run the simulation to check Pathmind Helper setup.
Go through the steps of the Check Pathmind Helper Setup guide to make sure that everything is working correctly. Completing this step will also demonstrate how the model performs using random actions instead of a policy.
Step 2 - Examine the Pathmind properties.
Observations - This model uses six observations: the location of the mouse (1. row and 2. column), the mouse's distance from the cheese (3. row distance and 4. column distance), and the location of the cheese itself (5. row and 6. column).
Note: Observations can optionally be normalized to values between -1 and 1, which is why each observation in this model is divided by 5.0.
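As a rough illustration of the six observations and the division by 5.0, the sketch below builds the observation array in plain Java. It assumes a 6 x 6 grid with row and column indices from 0 to 5 (consistent with the 36 nodes and the divisor of 5.0) and assumes the distances are signed (cheese minus mouse). The class name, method name, and parameter names are hypothetical and are not taken from the model.

```java
import java.util.Arrays;

public class CheeseObservations {
    // Assumption: indices run from 0 to 5, so 5.0 is the largest possible value.
    static final double MAX_INDEX = 5.0;

    // Returns the six observations in the order listed above.
    static double[] observe(int mouseRow, int mouseCol, int cheeseRow, int cheeseCol) {
        return new double[] {
            mouseRow / MAX_INDEX,                 // 1. mouse row
            mouseCol / MAX_INDEX,                 // 2. mouse column
            (cheeseRow - mouseRow) / MAX_INDEX,   // 3. row distance (signed, assumption)
            (cheeseCol - mouseCol) / MAX_INDEX,   // 4. column distance (signed, assumption)
            cheeseRow / MAX_INDEX,                // 5. cheese row
            cheeseCol / MAX_INDEX                 // 6. cheese column
        };
    }

    public static void main(String[] args) {
        // Mouse in the top-right corner, cheese in the bottom-left corner.
        System.out.println(Arrays.toString(observe(0, 5, 5, 0)));
    }
}
```

Under these assumptions, positions fall between 0 and 1 and distances fall between -1 and 1, so every observation stays inside the normalized range.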
Metrics - The metric in this model tracks whether the mouse has found the cheese. A reward of +1 is granted whenever the mouse finds the cheese. Otherwise, no reward is given.
Actions - This model contains one decision point with four possible actions: move left, right, up, or down.
The doAction function referenced in this field defines the four possible actions that the mouse can take.
Event Trigger - An action is triggered (move left, right, up or down) once per second.
Done - The simulation is complete whenever the mouse finds the cheese.
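To make the Actions and Done properties more concrete, here is a minimal, self-contained sketch of a grid with a doAction-style method and a done check, driven by random actions. It is not the model's actual code: the action encoding (0 to 3), the clamping at the grid edges, the 6 x 6 grid size, and every name other than doAction are assumptions made for illustration.

```java
import java.util.Random;

public class CheeseGrid {
    static final int SIZE = 6; // assumption: 6 x 6 grid = 36 nodes

    int mouseRow, mouseCol;
    final int cheeseRow, cheeseCol;

    CheeseGrid(int mouseRow, int mouseCol, int cheeseRow, int cheeseCol) {
        this.mouseRow = mouseRow;
        this.mouseCol = mouseCol;
        this.cheeseRow = cheeseRow;
        this.cheeseCol = cheeseCol;
    }

    // Hypothetical encoding: 0 = left, 1 = right, 2 = up, 3 = down.
    // Moves that would leave the grid are clamped (assumption).
    void doAction(int action) {
        switch (action) {
            case 0: mouseCol = Math.max(0, mouseCol - 1); break;
            case 1: mouseCol = Math.min(SIZE - 1, mouseCol + 1); break;
            case 2: mouseRow = Math.max(0, mouseRow - 1); break;
            case 3: mouseRow = Math.min(SIZE - 1, mouseRow + 1); break;
            default: throw new IllegalArgumentException("Unknown action: " + action);
        }
    }

    // Done: the mouse is on the same node as the cheese.
    boolean isDone() {
        return mouseRow == cheeseRow && mouseCol == cheeseCol;
    }

    // Takes random actions until done or until maxSteps is reached; returns steps taken.
    static int runRandomEpisode(CheeseGrid grid, Random rng, int maxSteps) {
        int steps = 0;
        while (!grid.isDone() && steps < maxSteps) {
            grid.doAction(rng.nextInt(4));
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        CheeseGrid grid = new CheeseGrid(0, 0, 5, 5);
        int steps = runRandomEpisode(grid, new Random(42), 10_000);
        System.out.println("Random actions: " + steps + " steps, done = " + grid.isDone());
    }
}
```

Each loop iteration stands in for one triggered action. A random walk like this typically needs far more steps than a direct path, which gives a baseline for comparison with a trained policy.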
Step 3 - Export model and get Pathmind policy.
Complete the steps in the Exporting Models and Training guide to export your model, complete training, and download the Pathmind policy.
Use the following reward function for training:
reward = after.cheeseFound; // +1 reward whenever the cheese is found
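The following sketch shows, under stated assumptions, what a reward of this form amounts to over an episode. The Metrics class is a hypothetical stand-in for the metrics snapshot that `after` refers to, and treating cheeseFound as a 0.0 or 1.0 value is an assumption; neither is taken from the Pathmind API.

```java
public class CheeseReward {
    // Hypothetical stand-in for the metrics snapshot taken after an action.
    static final class Metrics {
        final double cheeseFound; // assumption: 1.0 if the cheese was found, else 0.0

        Metrics(double cheeseFound) {
            this.cheeseFound = cheeseFound;
        }
    }

    // Mirrors "reward = after.cheeseFound;".
    static double reward(Metrics after) {
        return after.cheeseFound;
    }

    // Sums the per-step rewards of one episode.
    static double episodeReward(Metrics[] steps) {
        double total = 0.0;
        for (Metrics step : steps) {
            total += reward(step);
        }
        return total;
    }

    public static void main(String[] args) {
        // Three steps without cheese, then the cheese is found on the fourth step.
        Metrics[] episode = {
            new Metrics(0.0), new Metrics(0.0), new Metrics(0.0), new Metrics(1.0)
        };
        System.out.println(episodeReward(episode)); // prints 1.0
    }
}
```

The reward is sparse: every step before the cheese is found contributes nothing, and only the final step contributes +1.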
A policy will be generated after training completes. A trained policy file is included in the tutorial folder.
Step 4 - Run the simulation with the Pathmind policy.
Once you’ve downloaded the Pathmind policy, return to AnyLogic. Open the Pathmind Helper properties and change the "Mode" radio button to Use Policy. Click Browse and locate the downloaded policy file.
Now run the model again and watch the trained policy at work. You will see that the mouse finds the cheese in the minimum number of steps no matter where it is located.
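One way to check the "minimum number of steps" claim is to compare the step count of a run against the shortest possible path. Assuming the grid has no obstacles and the mouse moves one node per action in the four directions listed earlier, the shortest path is the Manhattan distance between mouse and cheese. The class and method names below are hypothetical.

```java
public class MinimumSteps {
    // Shortest path length on an unobstructed grid with left/right/up/down moves.
    static int minimumSteps(int mouseRow, int mouseCol, int cheeseRow, int cheeseCol) {
        return Math.abs(cheeseRow - mouseRow) + Math.abs(cheeseCol - mouseCol);
    }

    public static void main(String[] args) {
        System.out.println(minimumSteps(0, 0, 5, 5)); // prints 10
        System.out.println(minimumSteps(2, 3, 2, 3)); // prints 0
    }
}
```

If a run with the policy takes this many steps from the mouse's starting node, the mouse followed a shortest path.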
The Cheese Chasing example demonstrates how reinforcement learning can learn an efficient path from a simple reward.
More realistic applications for this type of model might be helping robotic equipment move through a factory by the shortest path, or an autonomous vehicle navigate to its destination.