- Sign up for Pathmind
- Download and install the Pathmind Helper
- Download AnyLogic Professional Edition
- Download the Getting Started tutorial files
Pathmind helps you use reinforcement learning to find new paths to achieve efficiency, speed, and profitability in simulations. It offers the benefits of reinforcement learning without requiring advanced knowledge of software engineering or neural networks. When added to an AnyLogic model, Pathmind surfaces new insights into real-world problems.
This Getting Started guide shows how to set up an AnyLogic model for success with reinforcement learning. You will be provided with a simple introductory model and the functions needed for the Pathmind Helper in AnyLogic. You will export that model into the Pathmind web application to train a policy. Lastly, you will validate the trained policy in AnyLogic using a Monte Carlo experiment.
- Build a model in AnyLogic that simulates a real-world problem.
- Use the Pathmind Helper to add reinforcement learning into your AnyLogic model.
- Export the AnyLogic simulation as a standalone Java application.
- Upload the AnyLogic export to the Pathmind web app, where training takes place in the cloud.
- Download the new policy file and validate it back in AnyLogic.
Introductory Model Overview
The Pathmind introductory model features a state chart in a simple stochastic, or random, environment. Within the chart is a final goal, as well as Start and Intermediate states.
Each second, the agent must choose between two actions: move or do nothing. It can freely cycle between Start and Intermediate, but it can reach the goal only after remaining in the Intermediate state until a timeout elapses. The timeout is a random number between two and five seconds, which is the source of randomness in the model. The objective is to train the agent to wait and do nothing in order to reach the final goal.
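The dynamics described above can be approximated in plain Java. This is only a sketch, not the AnyLogic state chart itself: the class name IntroModelSketch, the uniform timeout drawn from [2, 5), and the rule that a fresh timeout is drawn each time the agent enters Intermediate are assumptions made for illustration.

```java
import java.util.Random;

// Plain-Java approximation of the introductory state chart (not AnyLogic code).
class IntroModelSketch {
    enum State { START, INTERMEDIATE, GOAL }

    static final int DO_NOTHING = 0;
    static final int MOVE = 1;

    private final Random rng;
    private State state = State.START;
    private double timeout; // seconds to wait in INTERMEDIATE (assumed uniform in [2, 5))
    private int waited;     // whole seconds spent in INTERMEDIATE since entering it

    IntroModelSketch(long seed) {
        this.rng = new Random(seed);
    }

    State getState() {
        return state;
    }

    // Applies one decision, then lets one second of model time pass.
    void step(int action) {
        if (state == State.GOAL) {
            return; // the goal is final; nothing more to do
        }
        if (action == MOVE) {
            if (state == State.START) {
                state = State.INTERMEDIATE;
                timeout = 2.0 + 3.0 * rng.nextDouble(); // assumption: fresh timeout on entry
                waited = 0;
            } else if (waited >= timeout) {
                state = State.GOAL;  // waited long enough
            } else {
                state = State.START; // moved too early
            }
        }
        if (state == State.INTERMEDIATE) {
            waited++;
        }
    }

    public static void main(String[] args) {
        IntroModelSketch model = new IntroModelSketch(7L);
        int[] actions = {MOVE, DO_NOTHING, DO_NOTHING, DO_NOTHING, DO_NOTHING, MOVE};
        for (int action : actions) {
            model.step(action);
            System.out.println("action=" + action + " -> " + model.getState());
        }
    }
}
```

Under these assumptions, moving twice in a row returns the agent to Start, while waiting a full five seconds in Intermediate before moving always reaches the goal.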
To explore the simulation, open the model in AnyLogic. In the Pathmind Helper properties, uncheck the Enabled checkbox.
Now run the simulation. Use the Move button to transition the agent from the Start state to the Intermediate state. Notice that immediately clicking Move again sends the agent back to Start. Now try transitioning to Intermediate and waiting between two and five seconds before clicking Move. Because the timeout has elapsed, the agent can now move to the goal.
Deciding how long to wait before taking an action may seem like a trivial problem in such a small model, but it is a quick way to demonstrate how reinforcement learning works. Optimization becomes harder as models take on additional parts and complexities.
Clicking through the model is a good introduction to some reinforcement learning concepts that are used in the Pathmind Helper:
- Action - a decision to move or do nothing.
- Event Trigger - the frequency at which the Move button is clicked.
- Observation - a numerical representation of where the agent is located in the state chart at a given moment in time.
- Reward Variables - a reward given to the agent for reaching the goal.
Pathmind Helper Properties
Open the Pathmind Helper properties and re-check the Enabled checkbox. Observe the reinforcement learning elements.
Number of Agents - This indicates the number of RL agents (i.e. decision points) in your model. In this tutorial, there is only one decision point.
Observations - Observations serve as the eyes and ears of a simulation and include any information about the current state of the environment. In this model, the observation is a one-hot array expressing the current location of the agent.
[Start, Intermediate, goal]
[1.0, 0.0, 0.0] - Agent in "Start" state.
[0.0, 1.0, 0.0] - Agent in "Intermediate" state.
[0.0, 0.0, 1.0] - Agent in "goal" state.
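As an illustration, a one-hot array like the ones listed above could be built as follows. This is a minimal sketch: the ObservationSketch class, the State enum, and the oneHot method are hypothetical names and are not part of the Pathmind Helper.

```java
import java.util.Arrays;

class ObservationSketch {
    // Order matches the layout [Start, Intermediate, goal].
    enum State { START, INTERMEDIATE, GOAL }

    // Returns an array with 1.0 at the position of the current state and 0.0 elsewhere.
    static double[] oneHot(State current) {
        double[] observation = new double[State.values().length];
        observation[current.ordinal()] = 1.0;
        return observation;
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(Arrays.toString(oneHot(s)) + " - Agent in " + s);
        }
    }
}
```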
Reward Variables - Reward Variables are the building blocks of the reward function and are used to determine if an action was good or bad. Often, reward variables represent simulation metrics such as revenue or time spent. Those metrics are usually the elements that are being optimized in a simulation. They are combined within the reward function to teach the algorithm which actions are best as it seeks to optimize, usually for several metrics at once. Each action results in points, or a reward, being given.
This model grants a reward of 1 when the agent reaches the goal.
Actions - Actions define what agents are allowed to perform. In this case, there are two discrete choices: do nothing (0) or move (1).
This action (0 or 1) is passed as an argument to doAction(action), which is then executed by the AnyLogic model.
Event Trigger - The event trigger tells Pathmind when to trigger the next action. Some models use time-based event triggers, while others rely on conditional triggers. This model performs one action every second.
Done - Every simulation needs an endpoint. Some simulations conclude after a defined period of time, while others reach their end when certain conditions become true. In this case, the simulation ends when the goal is reached.
It is good practice to perform a test run before exporting a model. Doing so will ensure that the model is functional and the Pathmind Helper elements are working correctly.
In the Pathmind Helper properties, make sure that Use Random Actions is selected in the Mode field. In the Event Trigger section, select the Debug Mode checkbox. Now run the simulation.
Once the simulation is running, open the Debug Panel. If set up correctly, data will be printed for each action an agent performs.
At this point, the model will perform poorly since random actions are being used instead of a policy. Running a test is necessary, however, for both error checking and understanding how the model behaves before a policy is in place.
After checking for errors, the model can be exported as a standalone Java application for import into the Pathmind web application.
Uploading to Pathmind
Once a model is set up, it can be brought into the Pathmind web application for training. The application is organized in a hierarchy:
Projects → Models → Experiments
Each project corresponds to a problem being modeled in a simulation. Within each project, updated versions of that model can be uploaded. Multiple experiments can then be run on each version, and each experiment can have a different reward function.
To begin the upload process, open the Pathmind web application and select Create New Project. When prompted, upload the exported folder.
Writing the Reward Function and Training
On the next screen, you can begin writing a reward function. The right reward function is critical for getting the best results from training. Generally, it is best to start with a simple reward function, which helps build intuition about how an agent learns within a model. The variable names defined on the previous page will automatically populate in the reward function.
This example model uses the following reward function:
reward = after.goalReached - 0.1;
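One way to read this reward function is that every action costs 0.1 and reaching the goal earns 1, so shorter episodes score higher. The sketch below works through that arithmetic. The RewardVariables class, the unused before argument, and the episodeReturn helper are hypothetical stand-ins for illustration and do not reproduce Pathmind's own code.

```java
class RewardSketch {
    // Hypothetical stand-in for a snapshot of the reward variables.
    static final class RewardVariables {
        final double goalReached; // 1.0 once the goal is reached, otherwise 0.0

        RewardVariables(double goalReached) {
            this.goalReached = goalReached;
        }
    }

    // Mirrors: reward = after.goalReached - 0.1;
    static double reward(RewardVariables before, RewardVariables after) {
        return after.goalReached - 0.1; // 'before' is not used by this particular function
    }

    // Total reward of an episode that reaches the goal on its final step.
    static double episodeReturn(int steps) {
        RewardVariables notYet = new RewardVariables(0.0);
        RewardVariables reached = new RewardVariables(1.0);
        double total = 0.0;
        for (int i = 1; i <= steps; i++) {
            total += reward(notYet, i == steps ? reached : notYet);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("6 steps:  " + episodeReturn(6));
        System.out.println("20 steps: " + episodeReturn(20));
    }
}
```

Under this reading, an episode of n steps that ends at the goal earns 1 - 0.1 * n in total, so a 6-step episode earns about 0.4 while a 20-step episode earns about -1.0.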
Now click Train Policy, and training will begin. It should take about 10 minutes for training to complete.
Once training is complete, export the policy.
Back in AnyLogic, navigate to the Pathmind Helper properties. Change Mode to Use Policy. Click Browse and navigate to the downloaded policy zip file.
Now run the policy to inspect its behavior.
With the policy in place, the agent moves to the goal in as few steps as possible. Now re-run the policy using the included Monte Carlo experiment.
Compared to using random actions, or even human trial and error, the policy reaches the goal in as few moves as possible. In a real-world application, that improved performance could translate into increased revenue or more efficient processes.
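The idea behind this comparison can be sketched outside AnyLogic by averaging the number of steps to the goal over many simulated episodes. Everything in this sketch is an assumption for illustration: the simplified dynamics (a uniform timeout in [2, 5) redrawn on each entry to Intermediate), the Policy interface, and the hand-written WAIT_FIVE rule, which stands in for a trained policy and, unlike the tutorial's observation, is allowed to see how long the agent has waited.

```java
import java.util.Random;

class MonteCarloSketch {
    interface Policy {
        // state: 0 = Start, 1 = Intermediate; waited: seconds spent in Intermediate.
        // Returns 0 (do nothing) or 1 (move).
        int act(int state, int waited, Random rng);
    }

    // Picks 0 or 1 with equal probability, like a random-actions test run.
    static final Policy RANDOM = (state, waited, rng) -> rng.nextInt(2);

    // Hand-written stand-in for a trained policy: leave Start at once, then wait five seconds.
    static final Policy WAIT_FIVE = (state, waited, rng) -> (state == 0 || waited >= 5) ? 1 : 0;

    // Runs one episode and returns the number of actions taken to reach the goal.
    static int stepsToGoal(Policy policy, Random rng, int maxSteps) {
        int state = 0;
        int waited = 0;
        double timeout = 0.0;
        for (int steps = 1; steps <= maxSteps; steps++) {
            int action = policy.act(state, waited, rng);
            if (action == 1) {
                if (state == 0) {
                    state = 1;
                    waited = 0;
                    timeout = 2.0 + 3.0 * rng.nextDouble(); // assumed timeout in [2, 5)
                } else if (waited >= timeout) {
                    return steps; // goal reached
                } else {
                    state = 0;    // moved too early, back to Start
                }
            }
            if (state == 1) {
                waited++;
            }
        }
        return maxSteps; // safety cap if the goal was never reached
    }

    // Averages the episode length over many independent episodes.
    static double meanSteps(Policy policy, int episodes, long seed) {
        Random rng = new Random(seed);
        long total = 0;
        for (int i = 0; i < episodes; i++) {
            total += stepsToGoal(policy, rng, 10_000);
        }
        return (double) total / episodes;
    }

    public static void main(String[] args) {
        System.out.println("random actions: " + meanSteps(RANDOM, 1000, 1L));
        System.out.println("wait, then move: " + meanSteps(WAIT_FIVE, 1000, 1L));
    }
}
```

In this simplified setting the waiting rule always finishes in six steps, while random actions take noticeably longer on average because most early moves send the agent back to Start.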
- View the Pathmind Helper help article.
- Explore other tutorial models to see examples of how Pathmind can be applied in a wide range of projects and industries.
- Visit the Knowledge Base for helpful guides on using the Pathmind Helper and web application.
Have any questions? Contact us to learn more!