To validate the performance of a trained Pathmind policy, you will need to run a Monte Carlo experiment in AnyLogic. A Monte Carlo experiment automatically executes hundreds of simulation runs, each with a random initial seed, and the resulting distribution of outcomes can be used to validate the policy.
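To make the idea concrete, here is a minimal plain-Java sketch of what a Monte Carlo experiment does conceptually: repeat a run many times with a different seed each time and collect one metric per run. It is not AnyLogic or Pathmind code. The class `MonteCarloSketch`, the toy `runOnce` model, and the values 200 and 0.1 are hypothetical and chosen only for illustration.

```java
import java.util.Random;

class MonteCarloSketch {
    // Hypothetical stand-in for one simulation run: out of 200 arriving
    // customers, count how many balk, each with the given probability.
    static int runOnce(long seed, double balkProbability) {
        Random rng = new Random(seed);
        int balked = 0;
        for (int customer = 0; customer < 200; customer++) {
            if (rng.nextDouble() < balkProbability) {
                balked++;
            }
        }
        return balked;
    }

    // Runs the toy model once per iteration, each time with a different seed,
    // and returns the metric recorded in every run.
    // Assumption: the loop index is used as the seed so the sketch is repeatable.
    static int[] runExperiment(int iterations, double balkProbability) {
        int[] results = new int[iterations];
        for (int i = 0; i < iterations; i++) {
            results[i] = runOnce(i, balkProbability);
        }
        return results;
    }

    // Average of the metric across all runs.
    static double mean(int[] values) {
        double sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        int[] results = runExperiment(100, 0.1);
        System.out.println("Runs: " + results.length + ", mean balked: " + mean(results));
    }
}
```

Running the same seed twice gives the same result, while different seeds give different results. That spread across seeds is what the Monte Carlo histogram displays.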
Step 1 - Determine which metrics to track.
In your AnyLogic model, note which metrics you'd like to measure. These can be specific variables, AnyLogic histogram data, or any other value your model exposes.
Step 2 - Create a new Monte Carlo experiment.
Step 3 - Configure your Monte Carlo.
Select a name for the experiment.
Set Number of Iterations (i.e. the number of simulation runs) to 100, the minimum number we recommend.
Define the metrics you want to track.
Title - This is the graph label, and it can be anything.
Expression - The metric you would like to track. Typically, this is a variable in your AnyLogic simulation.
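Number of Intervals - The number of bars (bins) in the histogram. Together with the interval size, this determines the range of metric values the chart covers.
Initial Interval Size - The initial width of each bar, in the units of your metric.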
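As a rough illustration of how the two interval settings interact, the plain-Java sketch below bins a handful of values into fixed-width intervals. It is not AnyLogic's implementation: the class `HistogramSketch`, the lower bound of 0, and the sample values are assumptions made for this example, and the sketch ignores any automatic range adjustment AnyLogic may perform.

```java
import java.util.Arrays;

class HistogramSketch {
    // Counts how many values fall into each of numberOfIntervals bins,
    // where every bin is intervalSize wide and the first bin starts at lowerBound.
    // Values outside the covered range are ignored in this sketch.
    static int[] binCounts(double[] values, double lowerBound,
                           int numberOfIntervals, double intervalSize) {
        int[] counts = new int[numberOfIntervals];
        for (double v : values) {
            int index = (int) Math.floor((v - lowerBound) / intervalSize);
            if (index >= 0 && index < numberOfIntervals) {
                counts[index]++;
            }
        }
        return counts;
    }

    // Width of the value range covered by the histogram.
    static double coveredRange(int numberOfIntervals, double intervalSize) {
        return numberOfIntervals * intervalSize;
    }

    public static void main(String[] args) {
        double[] metric = {12, 18, 25, 31, 47}; // made-up sample values
        int[] counts = binCounts(metric, 0, 5, 10);
        System.out.println(Arrays.toString(counts)); // prints [0, 2, 1, 1, 1]
        System.out.println(coveredRange(5, 10));     // prints 50.0
    }
}
```

With 5 intervals of size 10, the chart covers a range of 50. If your metric can reach 300, either the number of intervals or the interval size needs to be larger.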
Step 4 - Make sure the Monte Carlo Model Time and Randomness settings match what you have set in your Simulation experiment.
If Model Time and Randomness do not match what you have set in your Simulation experiment, the Monte Carlo results will be invalid!
Step 5 - Run your Monte Carlo.
Change the Pathmind Helper "Mode" to "Use Policy" and point it to the policy zip file you exported from Pathmind.
Run your Monte Carlo. This can take several hours, depending on the length and complexity of your simulation.
When the Monte Carlo experiment finishes, you should see a distribution for each metric you tracked. The next step is to compare these results with a baseline. Typically, comparable baselines include:
Optimizers such as OptQuest
Monte Carlo Using Pathmind Policy
Using the trained Pathmind policy, the number of balked customers is about 75 on average. Lower is better in this case.
Monte Carlo Using Random Actions
In comparison, the average number of balked customers is about 225 using random actions.
That is roughly three times as many balked customers as with the trained policy, meaning that the trained policy drastically outperformed our baseline of random actions.
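The comparison can also be expressed as a relative reduction. The short plain-Java sketch below uses the two approximate averages quoted above (225 and 75). The class and method names are hypothetical, and the calculation assumes a metric where lower is better.

```java
class BaselineComparison {
    // Fraction by which the policy lowers a lower-is-better metric
    // relative to the baseline (0.5 means the metric was halved).
    static double relativeReduction(double baselineMean, double policyMean) {
        return (baselineMean - policyMean) / baselineMean;
    }

    public static void main(String[] args) {
        double randomActionsMean = 225.0; // approximate average with random actions
        double policyMean = 75.0;         // approximate average with the trained policy
        double reduction = relativeReduction(randomActionsMean, policyMean);
        System.out.printf("Reduction in balked customers: %.0f%%%n", reduction * 100);
    }
}
```

With these figures the policy cuts balked customers by about two thirds relative to random actions. The same calculation works for any other baseline you run.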