Sign up for Pathmind
Download and install the Pathmind Helper
Download the Getting Started tutorial files
Pathmind helps you use reinforcement learning to find new paths to achieve efficiency, speed, and profitability in simulations. It offers the benefits of reinforcement learning without requiring advanced knowledge of software engineering or neural networks. When added to an AnyLogic model, Pathmind surfaces new insights into real-world problems.
This Getting Started guide shows how to set up an AnyLogic model for success with reinforcement learning. You will be provided with a simple introductory model and the functions needed for the Pathmind Helper in AnyLogic. You will export that model into the Pathmind web application to train a policy. Lastly, you will validate the trained policy in AnyLogic using a Monte Carlo experiment.
Build a model in AnyLogic that simulates a real-world problem.
Use the Pathmind Helper to add reinforcement learning into your AnyLogic simulation.
Upload the AnyLogic simulation to Pathmind, where training takes place in the cloud.
Download the policy and validate it back in AnyLogic.
Introductory Model Overview
The Pathmind introductory model features a state chart in a simple stochastic, or random, environment. Within the chart is a final goal, as well as Start and Intermediate states.
Each second, the agent must choose between two decisions:
move or do nothing. It can freely cycle between Start and Intermediate, but can only reach the goal if it remains in the Intermediate state until a timeout elapses. The randomness of the model comes from that timeout, which is a random number between two and five seconds. The goal is to train the agent to wait and do nothing in order to reach the final goal.
To explore the simulation, open the model in AnyLogic. In the Pathmind Helper properties, uncheck the Enabled checkbox.
Now run the simulation. Use the Move button to transition the agent from the Start to Intermediate state. Notice that immediately clicking Move again will send the agent back to Start. Now try transitioning to Intermediate and waiting between two and five seconds before clicking Move. Since the timeout is reached, the agent can now move to the goal.
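The behavior described above can be approximated in plain Java. This is only a sketch and not the AnyLogic model itself: the class name, the whole-second timeout, and the rule that a move after the timeout leads to the goal are all assumptions made for illustration.

```java
import java.util.Random;

/** Hypothetical stand-in for the tutorial's state chart (not the AnyLogic model). */
class SimpleStateChart {
    enum State { START, INTERMEDIATE, GOAL }

    private final Random rng;
    private State state = State.START;
    private int secondsWaited = 0; // seconds spent doing nothing in INTERMEDIATE
    private int timeout = 0;       // drawn each time the agent enters INTERMEDIATE

    SimpleStateChart(long seed) {
        this.rng = new Random(seed);
    }

    State getState() {
        return state;
    }

    /** Advances the model by one second. action: 0 = do nothing, 1 = move. */
    void step(int action) {
        if (state == State.GOAL) {
            return; // the episode is over
        }
        if (action == 1) {
            if (state == State.START) {
                state = State.INTERMEDIATE;
                secondsWaited = 0;
                timeout = 2 + rng.nextInt(4); // assumed: whole seconds, 2 to 5 inclusive
            } else {
                // Moving too early sends the agent back to START.
                state = (secondsWaited >= timeout) ? State.GOAL : State.START;
            }
        } else if (state == State.INTERMEDIATE) {
            secondsWaited++;
        }
    }
}
```

Calling `step(1)` twice in a row returns the agent to Start, while `step(1)`, five calls to `step(0)`, and a final `step(1)` always reach the goal in this sketch, because the timeout never exceeds five seconds.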
Deciding how long to wait before taking an action may seem trivial in such a small model, but it demonstrates the core idea compactly. Optimization becomes much harder as models take on additional parts and complexity.
Pathmind Helper Properties
Open the Pathmind Helper properties and re-check the Enabled checkbox. Observe the reinforcement learning elements.
Number of Agents - This indicates the number of "controlled" agents (i.e. decision points) in your model. In this tutorial, there is only one decision point.
Observations - Observations serve as the eyes and ears of a simulation and include any information about the current state of the environment. In this model, observations are a one-hot array expressing the current location of the agent.
[Start, Intermediate, goal]
[1.0, 0.0, 0.0] - Agent in "Start" state.
[0.0, 1.0, 0.0] - Agent in "Intermediate" state.
[0.0, 0.0, 1.0] - Agent in "goal" state.
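A one-hot array like the one above can be produced with a few lines of Java. The enum and method names here are hypothetical; in the tutorial model the array is already defined in the Pathmind Helper's Observations field.

```java
import java.util.Arrays;

/** Illustrative one-hot encoding of the agent's location; names are assumptions. */
class OneHotObservation {
    enum State { START, INTERMEDIATE, GOAL }

    /** Returns an array with 1.0 at the index of the current state and 0.0 elsewhere. */
    static double[] encode(State current) {
        double[] observation = new double[State.values().length];
        observation[current.ordinal()] = 1.0;
        return observation;
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + " -> " + Arrays.toString(encode(s)));
        }
    }
}
```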
Metrics - Metrics are the building blocks of the reward function and are used to determine if an action was good or bad. Often, they embody important KPIs such as revenue and cost. They are combined within the reward function to teach the algorithm which actions are best as it seeks to optimize, usually for several metrics at once. Each action results in points, or a reward, being given.
This model grants a reward of 1 when the agent reaches the goal.
Actions - Actions define what agents are allowed to perform. In this case, there are two discrete choices: do nothing (0) or move (1).
This action (0 or 1) is passed as an argument to doAction(action), which is then executed by the AnyLogic model.
Done - Every simulation needs an endpoint. Some simulations conclude after a defined period of time, while others reach their end when certain conditions become true. In this case, the simulation ends when the goal is reached.
Event Trigger - The event trigger tells Pathmind when to trigger the next action. Some models use time-based event triggers, while others rely on conditional triggers. This model performs one action each second.
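To see how these elements fit together, the following sketch wires observations, actions, the done condition, and a once-per-second trigger into a single loop. It is a simplified illustration, not the Pathmind Helper's implementation: the `Model` interface, the toy `CountingModel`, and `runEpisode` are hypothetical names, and the toy model is deliberately simpler than the tutorial's state chart.

```java
import java.util.function.ToIntFunction;

/** Illustrative decision loop; all names here are assumptions, not Pathmind APIs. */
class DecisionLoopSketch {

    /** Hypothetical view of what a decision loop needs from a model. */
    interface Model {
        double[] observations();   // what the agent can see
        void doAction(int action); // 0 = do nothing, 1 = move
        boolean isDone();          // end-of-episode condition
    }

    /** Toy model (assumption): the episode ends after three "move" actions. */
    static final class CountingModel implements Model {
        private int moves = 0;

        public double[] observations() {
            return new double[] { moves };
        }

        public void doAction(int action) {
            if (action == 1) {
                moves++;
            }
        }

        public boolean isDone() {
            return moves >= 3;
        }
    }

    /** Triggers one decision per simulated second; returns the seconds elapsed. */
    static int runEpisode(Model model, ToIntFunction<double[]> policy, int maxSeconds) {
        int second = 0;
        while (!model.isDone() && second < maxSeconds) {
            int action = policy.applyAsInt(model.observations());
            model.doAction(action);
            second++;
        }
        return second;
    }
}
```

A policy that always moves, written as `obs -> 1`, finishes the toy episode in three seconds, while `obs -> 0` runs until the `maxSeconds` cap because the done condition never becomes true.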
Testing the Model
It is good practice to perform a test run before exporting a model. Doing so will ensure that the model is functional and the Pathmind Helper elements are working correctly.
In the Pathmind Helper properties, select the Debug Mode checkbox. Now run the simulation.
Once the simulation is running, open the Developer Panel. If set up correctly, data will be printed for each action an agent performs.
Uploading to Pathmind
Once a model is set up, it can be brought into the Pathmind web application for training. The application is organized in a hierarchy:
Projects → Models → Experiments
Each project corresponds to a problem being modeled in a simulation. Within each project, updated versions of that model can be uploaded. Multiple experiments can then be run on each version, and each experiment can have a different reward function.
To begin the upload process, complete the steps in the Exporting Models and Training guide to export your model to Pathmind.
Writing the Reward Function and Training
In the next screen, you can begin crafting a reward function. The right reward function is critical for getting the best results from training. Generally, it is best to start with a simple reward function. In this example, set the Goal to maximize the goalReached metric. This will automatically generate a reward function that seeks to maximize that metric. We will dig deeper into rewards in subsequent tutorials, so don't worry about the details right now.
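One common way to express "maximize a metric" is to reward the change in that metric between two consecutive decision points. The sketch below illustrates the idea in plain Java; the `Metrics` class and the `before`/`after` parameter names are assumptions for illustration and are not the code Pathmind generates.

```java
/** Illustrative reward calculation; class and parameter names are assumptions. */
class RewardSketch {

    /** Hypothetical snapshot of the model's metrics at one decision point. */
    static final class Metrics {
        final double goalReached;

        Metrics(double goalReached) {
            this.goalReached = goalReached;
        }
    }

    /** Reward for one step: positive only when goalReached has increased. */
    static double reward(Metrics before, Metrics after) {
        double reward = 0.0;
        reward += after.goalReached - before.goalReached; // maximize goalReached
        return reward;
    }
}
```

With this formulation, every step that leaves the agent in Start or Intermediate yields a reward of 0, and the step that reaches the goal yields 1, matching the single reward the model grants.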
Now click Train Policy, and training will begin. It should take about 10 minutes for training to complete. Keep in mind that the Learning Progress charts won't look very interesting because this introductory example is so simple.
Once training is complete, export the policy.
Back in AnyLogic, select the policy file that you just exported from Pathmind and run the included Monte Carlo experiment.
With the policy in place, the agent moves to the goal in as few steps as possible.
Random actions, or even human trial and error, typically take longer to get there. In a real-world application, that improved performance could translate into increased revenue or more efficient processes.
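The idea behind such a comparison can be sketched outside AnyLogic as a small Monte Carlo loop. Everything below is an assumption made for illustration: the simplified model (whole-second timeout between two and five, a move after the timeout reaches the goal), the hand-written "wait five seconds, then move" rule standing in for a trained policy, and the coin-flip baseline. It does not reproduce the included AnyLogic experiment or its results.

```java
import java.util.Random;

/** Illustrative Monte Carlo comparison on a simplified model; not the AnyLogic experiment. */
class MonteCarloSketch {

    /** Runs one episode and returns the seconds needed to reach the goal (capped). */
    static int episode(boolean waitPolicy, Random rng, int maxSeconds) {
        boolean inIntermediate = false;
        int waited = 0;
        int timeout = 0;
        for (int t = 1; t <= maxSeconds; t++) {
            // Scripted rule: move out of Start, then wait five seconds before moving again.
            // Baseline: flip a fair coin every second.
            boolean move = waitPolicy ? (!inIntermediate || waited >= 5) : rng.nextBoolean();
            if (!move) {
                if (inIntermediate) {
                    waited++;
                }
            } else if (!inIntermediate) {
                inIntermediate = true;
                waited = 0;
                timeout = 2 + rng.nextInt(4); // assumed: 2 to 5 whole seconds
            } else if (waited >= timeout) {
                return t; // goal reached
            } else {
                inIntermediate = false; // moved too early, back to Start
            }
        }
        return maxSeconds;
    }

    /** Averages the episode length over many seeded runs. */
    static double averageSeconds(boolean waitPolicy, int runs, long seed) {
        Random rng = new Random(seed);
        long total = 0;
        for (int i = 0; i < runs; i++) {
            total += episode(waitPolicy, rng, 10_000);
        }
        return (double) total / runs;
    }
}
```

Comparing `averageSeconds(true, 1000, 1L)` with `averageSeconds(false, 1000, 1L)` shows the kind of gap a Monte Carlo experiment is meant to surface: the scripted rule finishes every episode in the same number of seconds, while the coin-flip baseline often moves too early and has to start over.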
View the Pathmind Helper help articles.
Explore other tutorial models to see examples of how Pathmind can be applied in a wide range of projects and industries.
Visit the Knowledge Base for helpful guides on using the Pathmind Helper and web application.
Have any questions? Contact us to learn more!