Simulation metrics provide a snapshot of policy performance at each training iteration. Each simulation metric corresponds to a metric that you defined in your AnyLogic model.
The purpose of providing simulation metrics is twofold:
1. Easily compare multiple experiments and reward functions using "ground truth" metrics.
2. Quickly determine whether a reward function produces the desired behavior without running the policy back in AnyLogic.
Interpreting Simulation Metrics
You can think of simulation metrics as a real-time Monte Carlo experiment. By sidestepping the need to run a separate Monte Carlo experiment, you can save many hours.
Pathmind calculates the mean and variance of each metric across all episodes in the latest training iteration. Together, these values provide a running estimate of policy performance as training progresses.
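As a rough illustration of this aggregation, the sketch below computes a per-iteration mean and variance from per-episode metric values. The metric name and numbers are invented for the example; in practice the values come from the metrics you defined in your AnyLogic model.

```python
import statistics

# Hypothetical per-episode values of a metric (e.g. average wait time)
# collected during three training iterations. In Pathmind, each inner
# list would be the metric's value from every episode in one iteration.
iterations = [
    [12.4, 11.9, 13.1, 12.2],  # iteration 1
    [10.8, 11.2, 10.5, 11.0],  # iteration 2
    [9.6, 9.9, 9.4, 9.7],      # iteration 3
]

# Summarize each iteration the same way the dashboard does conceptually:
# one mean and one variance per metric per iteration.
for i, episodes in enumerate(iterations, start=1):
    mean = statistics.mean(episodes)
    var = statistics.pvariance(episodes)
    print(f"iteration {i}: mean={mean:.2f}, variance={var:.3f}")
```

A falling sequence of means (with shrinking variance) is exactly the downward-trending sparkline you want to see for a metric the policy should minimize.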
The sparklines associated with each metric show historical performance over the entire training run. For example, the sparkline above for "avgWaitTime" (the metric we care about most) shows the policy improving with each iteration: the average wait time decreases as training progresses.
Tips & Tricks
You can include any metric from the simulation, even if it is not used in your reward function. Use this to track the policy's performance on all key business metrics for each Pathmind experiment, not just the ones being optimized.
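To make the separation concrete, here is a minimal sketch in which the reward depends on only one metric while every metric is still recorded for comparison across experiments. The function name, metric names, and values are all illustrative, not part of Pathmind's API.

```python
# Hypothetical episode metrics: only avgWaitTime drives the reward,
# but throughput and utilization are tracked as "ground truth" anyway.
episode_metrics = {
    "avgWaitTime": 9.7,   # used in the reward
    "throughput": 312,    # tracked only
    "utilization": 0.84,  # tracked only
}

def reward(metrics):
    # Reward improves as wait time drops; other metrics are ignored here.
    return -metrics["avgWaitTime"]

print("reward:", reward(episode_metrics))
print("tracked metrics:", episode_metrics)
```

Even though the policy never "sees" throughput or utilization through the reward, their per-iteration trends reveal whether optimizing wait time quietly degrades other business metrics.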