1. Complete the Supply Chain Optimization tutorial.
2. Download the tutorial files.

Simulation Overview

Routing product deliveries cost-effectively can be a challenge for large manufacturers. Deciding which factories should deliver to which warehouses based on real-world trial and error often results in lost time and unnecessary costs. The team at CNX Consulting Partners turned to simulation modeling when faced with that problem. CNX built an AnyLogic model that allowed them to virtually test delivery strategies. Then, the team brought that simulation to Pathmind to explore how reinforcement learning could determine the best routes for speed and profitability.

The simulation itself features two sources of goods: Factory 1 and Factory 2. Goods are specified to leave the factories once per hour. The final destinations, or sinks, for the goods are Warehouse 1 and Warehouse 2. Delays and blocks were set up along each route, allowing a maximum of one product through the selectOutput block at a time. 

Once the routes were in place, additional elements were needed to establish the dynamics of the model. Stochastic elements representing the delivery delay between each factory and warehouse were added and named time_F1_W1, time_F1_W2, and so on. In this model, the cutoff time for a profitable delivery is 24 hours, and the maximum time a delivery can take to complete is 36 hours. To make the use case more realistic, these delay variables are randomly reset once per week to an integer between 12 and 36, where each integer represents an hour.
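The weekly randomization can be sketched in plain Java. This is an assumed reconstruction, not the model's actual code: in AnyLogic the draw would typically use a built-in distribution function inside a recurring event, while here `java.util.Random` stands in for it.

```java
import java.util.Random;

// Sketch of the weekly delay update (assumed logic; in the model this
// would run in a recurring event, once per simulated week).
public class DelayUpdate {
    static final Random RNG = new Random();

    // Delivery delays in hours, one per factory-warehouse route.
    static int time_F1_W1, time_F1_W2, time_F2_W1, time_F2_W2;

    // A random integer between 12 and 36 inclusive, each representing
    // a delivery delay in hours.
    static int randomDelayHours() {
        return 12 + RNG.nextInt(36 - 12 + 1);
    }

    static void updateDelays() {
        time_F1_W1 = randomDelayHours();
        time_F1_W2 = randomDelayHours();
        time_F2_W1 = randomDelayHours();
        time_F2_W2 = randomDelayHours();
    }
}
```

Because the delays change every week, a fixed routing rule can become unprofitable as soon as the delays shift, which is what makes this a good candidate for a learned policy.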

A changeDelays event was also added, allowing the time variables to automatically update the delay events along the routes twice every two months.

Next, items were added to track profitability outcomes. The variables profitable_deliveries and nonprofitable_deliveries count how many deliveries made money, while w1_count and w2_count keep track of how many deliveries end up at each warehouse.

Similarly, profit_percent finds the percentage of profitable deliveries and flow_percent tracks the portion of deliveries that are sent to each warehouse. These variables are updated by the calcProfit event.
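The bookkeeping behind calcProfit can be sketched as follows. The counter names come from the text; the percentage arithmetic and the zero-division guard are assumptions, not the model's verbatim code.

```java
// Sketch of the calcProfit bookkeeping (assumed arithmetic; the model's
// event recomputes these percentages from the counters named above).
public class ProfitStats {
    static int profitable_deliveries, nonprofitable_deliveries;
    static int w1_count, w2_count;

    // Percentage of deliveries that made money (profit_percent).
    static double profitPercent() {
        int total = profitable_deliveries + nonprofitable_deliveries;
        return total == 0 ? 0.0 : 100.0 * profitable_deliveries / total;
    }

    // Share of deliveries routed to Warehouse 1 (one of the flow_percent
    // values; Warehouse 2's share is the remainder).
    static double flowPercentW1() {
        int total = w1_count + w2_count;
        return total == 0 ? 0.0 : 100.0 * w1_count / total;
    }
}
```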


Step 1 - Perform a run with random actions to check Pathmind setup.

Go through the steps of the Check Pathmind Helper Setup guide to make sure that everything is working correctly. Completing these steps will also demonstrate how the model performs using random actions instead of a policy.

Step 2 - Examine the reinforcement learning components.

Observations - Using the factory variable, the observations track whether products are leaving Factory 1 (0) or Factory 2 (1). They also capture the delay times for the routes ahead.
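The observation vector can be sketched as below. The ordering and the choice to include both route delays are assumptions for illustration; the model's actual getObservation implementation may differ.

```java
public class Observation {
    // Builds the observation array (assumed ordering): the factory flag
    // (0 = Factory 1, 1 = Factory 2) followed by the current delay
    // times, in hours, for that factory's two possible routes.
    static double[] getObservation(int factory, int delayToW1, int delayToW2) {
        return new double[] { factory, delayToW1, delayToW2 };
    }
}
```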

Reward Variables - These variables track profitable and unprofitable deliveries so that the reward function can maximize the former and minimize the latter.

Actions - This model contains one decision point with two possible actions.

Finally, the doAction() function executes the actions prescribed by the policy. This function tells the model which warehouse to route deliveries to using the f1_warehouse and f2_warehouse booleans.
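A minimal sketch of that routing logic is below. The action encoding (0 = Warehouse 1, 1 = Warehouse 2) and the meaning of the booleans are assumptions; only the function and variable names come from the model.

```java
// Sketch of the doAction() logic (assumed; the policy's action index
// sets the routing boolean that the route's selectOutput block reads).
public class Routing {
    // true = route to Warehouse 1, false = Warehouse 2 (assumed convention)
    static boolean f1_warehouse, f2_warehouse;

    static void doAction(int factory, int action) {
        boolean toWarehouse1 = (action == 0);      // action 0 -> Warehouse 1
        if (factory == 0) f1_warehouse = toWarehouse1; // product from Factory 1
        else              f2_warehouse = toWarehouse1; // product from Factory 2
    }
}
```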

Done - The Done field determines when the simulation will end. This simulation's run length is defined as one year in the Simulation: Main properties.

Step 3 - Export model and get Pathmind policy.

Complete the steps in the Exporting Models and Training guide to export your model, complete training, and download the Pathmind policy.

Reward Function:

reward += after.profitableDeliveries - before.profitableDeliveries; // Maximize profitable deliveries
reward -= after.unprofitableDeliveries - before.unprofitableDeliveries; // Minimize unprofitable deliveries

We have also included a trained policy file in the tutorial folder.

Step 4 - Run the simulation with the Pathmind policy.

Once you’ve downloaded the Pathmind policy, return to AnyLogic. Open the Pathmind Helper properties and set the Mode radio button to Use Policy. Click Browse and select the downloaded policy file; its path will appear in the policyFile text field.

Run the model again using the included Monte Carlo experiment and observe the improvement now that the reinforcement learning policy is being used. Deliveries are being made to the best locations for maximum profitability.

Now compare this to the random-action baseline from Step 1.


Adding Pathmind to this simulation demonstrates how reinforcement learning can be used to increase efficiency for manufacturers and distributors. While the delivery routes featured in this model are simple, they show the basic ideas of how these tools can be used to optimize more advanced models.

In reviewing Pathmind, Stefan Hauers, Consultant at CNX Consulting Partners, said that “it takes some time to understand how reinforcement learning works, what actions can be optimized, and which algorithm should be chosen.” 

He added that “since Pathmind reduces the effort of implementation of reinforcement learning to a minimum, I could fully focus on my experiments. With the support of Pathmind’s development team, I achieved well-performing solutions for my use case and will continue to create more and more complex use cases.”

Please visit CNX Consulting to read more about their supply chain management and transformation expertise.
