To validate the performance of a trained Pathmind policy, you will need to run a Monte Carlo experiment in AnyLogic. A Monte Carlo experiment automatically executes a large number of simulation runs, each with a different random seed, so you can judge the policy on the distribution of its results rather than on a single run.
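To make the idea concrete, here is a minimal sketch in plain Java of what a Monte Carlo experiment does conceptually. It is not AnyLogic or Pathmind code: the class MonteCarloSketch, the toy runOnce simulation, the 500 customers, and the 15% balking probability are all assumptions made up for illustration.

```java
import java.util.Random;

public class MonteCarloSketch {
    // Hypothetical stand-in for one simulation run. It returns a metric
    // (here, balked customers) that depends only on the random seed.
    static int runOnce(long seed) {
        Random rng = new Random(seed);
        int balked = 0;
        for (int customer = 0; customer < 500; customer++) {
            // Assumed 15% chance that a customer leaves without being served.
            if (rng.nextDouble() < 0.15) {
                balked++;
            }
        }
        return balked;
    }

    // Runs `iterations` replications, each with its own seed drawn from a
    // master generator so the whole experiment is reproducible.
    static int[] runAll(int iterations, long baseSeed) {
        Random seeder = new Random(baseSeed);
        int[] results = new int[iterations];
        for (int i = 0; i < iterations; i++) {
            results[i] = runOnce(seeder.nextLong());
        }
        return results;
    }

    // Arithmetic mean of the collected metric values.
    static double mean(int[] values) {
        double sum = 0;
        for (int v : values) {
            sum += v;
        }
        return values.length == 0 ? 0.0 : sum / values.length;
    }

    public static void main(String[] args) {
        int[] results = runAll(100, 1L);
        System.out.println("Mean balked customers over 100 runs: " + mean(results));
    }
}
```

In AnyLogic, the experiment handles the seeding, the repeated runs, and the collection of results for you. The sketch only shows why many seeded runs give a distribution of outcomes instead of a single number.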
Step 1: Determine which metrics to track.
In your AnyLogic model, note which metrics you'd like to measure, since you'll have to reference them in the wizard. A metric can be a specific variable, AnyLogic histogram data, or any other value your model exposes.
Step 2: Create a new Monte Carlo experiment.
Step 3: Configure your Monte Carlo.
Select a name for the experiment.
Set the Number of Iterations (i.e. the number of simulation runs) to 100, the minimum number we recommend.
Define the metrics you need to track.
Title - This is the graph label, and it can be anything.
Expression - The metric you would like to track. Typically, this is a variable in your AnyLogic simulation.
Number of Intervals - The maximum number of bars in the histogram. Choose it to cover the range of possible values in your metric.
Initial Interval Size - The width of each bar in the histogram.
In the case of the Coffee Shop model, here is what you enter:
Title - "Balked Customers"
Expression - root.custFailExit.countPeds() This refers to the point in the Coffee Shop model where customers exit without being served.
custFailExit is an element of the Coffee Shop model, and countPeds() is a built-in AnyLogic function that comes attached to it. To enter this expression for your custom models, you have to know which values you want to count and how they are derived from the model.
Number of intervals - This is the maximum number of bars you want on your histogram. To choose a value, you need a rough, order-of-magnitude estimate of the results you expect, such as up to 200 balked customers.
Initial interval size - In this model, each customer counts as one, so enter 1. Customers are discrete units, rather than larger groupings or continuous amounts.
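As a rough illustration of how these two settings interact, the sketch below bins per-run results into fixed-width intervals in plain Java. The class HistogramSketch, the bin method, and the sample data are hypothetical, and clamping out-of-range values into the first or last bar is a simplification chosen for this sketch. It does not claim to reproduce how AnyLogic's histogram handles values outside the initial range.

```java
public class HistogramSketch {
    // Counts how many values fall into each fixed-width interval.
    // The covered range is numberOfIntervals * intervalSize, starting at 0.
    // Values outside that range are clamped into the first or last interval
    // (an assumption of this sketch, not AnyLogic's documented behavior).
    static int[] bin(int[] values, int numberOfIntervals, double intervalSize) {
        int[] counts = new int[numberOfIntervals];
        for (int v : values) {
            int index = (int) Math.floor(v / intervalSize);
            index = Math.max(0, Math.min(numberOfIntervals - 1, index));
            counts[index]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up balked-customer counts from five runs.
        int[] balkedPerRun = {72, 75, 75, 80, 68};
        // 200 intervals of size 1 cover counts from 0 up to 200.
        int[] counts = bin(balkedPerRun, 200, 1.0);
        System.out.println("Runs with exactly 75 balked customers: " + counts[75]);
    }
}
```

With 200 intervals of size 1, every whole-number count up to 200 gets its own bar. A larger interval size would group several counts into one bar and cover a wider range with the same number of bars.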
So the Coffee Shop Monte Carlo parameters would look like this:
Step 4: Run your Monte Carlo.
Change the Pathmind Helper "Mode" to "Use Policy" and point it to the policy.zip that you exported from Pathmind.
Run your Monte Carlo. This may take several hours, depending on the length of your simulation.
At the conclusion of your Monte Carlo, you should see a histogram that roughly follows a normal distribution. The next step is to compare these results with a baseline. Typical baselines include:
- Random Actions
- Optimizers such as OptQuest
Monte Carlo Using Pathmind Policy
As you can see, using the trained Pathmind policy, the number of balked customers is about 75 on average. Lower is better in this case.
Monte Carlo Using Random Actions
In comparison, the average number of balked customers is about 225 using random actions, roughly three times as many as with the trained policy.
The trained policy therefore drastically outperformed our baseline of random actions.
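If you export the per-run results, the comparison can also be expressed numerically. The following is a minimal sketch in plain Java. The class BaselineComparison and its helper methods are hypothetical names, and the only inputs taken from this article are the approximate averages of 75 and 225 balked customers.

```java
public class BaselineComparison {
    // Arithmetic mean of per-run metric values.
    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return values.length == 0 ? 0.0 : sum / values.length;
    }

    // Sample standard deviation (divides by n - 1), useful for judging
    // how much the runs vary around the mean.
    static double sampleStdDev(double[] values) {
        if (values.length < 2) {
            return 0.0;
        }
        double m = mean(values);
        double squared = 0;
        for (double v : values) {
            squared += (v - m) * (v - m);
        }
        return Math.sqrt(squared / (values.length - 1));
    }

    // Percentage by which the policy lowers a lower-is-better metric
    // relative to the baseline.
    static double percentReduction(double baselineMean, double policyMean) {
        return 100.0 * (baselineMean - policyMean) / baselineMean;
    }

    public static void main(String[] args) {
        // Approximate averages quoted in this article.
        double reduction = percentReduction(225.0, 75.0);
        System.out.printf("Reduction in balked customers: %.1f%%%n", reduction);
    }
}
```

For the averages above, the reduction works out to about two thirds. Looking at the standard deviation of each set of runs alongside the means helps confirm that the gap is larger than the run-to-run variation.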