Simulation metrics give a snapshot of policy performance at each training iteration. Each simulation metric corresponds to a reward variable that you defined in your AnyLogic model.
The purpose of providing simulation metrics is twofold:
- Enable comparisons between multiple experiments and reward functions using "ground truth" metrics.
- Quickly determine if a reward function results in the desired behavior without the need to run the policy in AnyLogic.
Interpreting Simulation Metrics
You can think of simulation metrics as a real-time Monte Carlo experiment. Because you no longer need to run a separate Monte Carlo experiment in AnyLogic, you can save many hours of work and waiting.
Pathmind calculates the average value of each reward variable over all episodes in the latest training iteration, so the metrics update as training progresses. The number of episodes averaged plays the same role as the number of iterations in a Monte Carlo experiment.
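The averaging described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Pathmind's implementation: the function name `iteration_metrics`, the reward variable names, and the episode values are all hypothetical.

```python
from statistics import mean


def iteration_metrics(episodes):
    """Average each reward variable over all episodes of one training iteration.

    `episodes` is a list of dicts mapping a reward variable name to its value
    at the end of that episode (an assumed data layout, for illustration only).
    """
    if not episodes:
        return {}
    names = episodes[0].keys()
    return {name: mean(ep[name] for ep in episodes) for name in names}


# Hypothetical end-of-episode values for three episodes in one iteration.
episodes = [
    {"successful": 40, "balked": 10},
    {"successful": 50, "balked": 6},
    {"successful": 60, "balked": 2},
]

metrics = iteration_metrics(episodes)
print(metrics)
```

Each entry in `metrics` corresponds to one simulation metric for that iteration. Averaging over more episodes gives a steadier estimate, just as adding iterations does in a Monte Carlo experiment.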
The sparkline associated with each metric shows its history over the entire training run. For example, the sparkline above for "successful" (the metric we care about) trends upward, which indicates that the policy is improving as training progresses.
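If you want to put a number on the trend rather than judge it by eye, one option is to fit a straight line to the per-iteration values of a metric and look at the sign of its slope. The sketch below assumes you have copied those values into a list yourself; `trend_slope` and the sample history are hypothetical and are not part of Pathmind.

```python
def trend_slope(history):
    """Least-squares slope of a metric's per-iteration values.

    The x-axis is the iteration index (0, 1, 2, ...). A positive slope means
    the metric rises on average over the training run.
    """
    n = len(history)
    if n < 2:
        return 0.0  # a trend needs at least two points
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(history))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Hypothetical per-iteration averages of the "successful" metric.
successful_history = [12.0, 15.5, 14.0, 19.0, 23.5, 26.0]

slope = trend_slope(successful_history)
improving = slope > 0
print(improving)
```

A positive slope describes the overall direction only. As in the sample history, individual iterations can still dip while the policy improves over the run as a whole.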
Tips & Tricks
You can add any metric from your simulation as a reward variable, even if your reward function does not use it. This lets you track the policy's performance on all key business metrics in every Pathmind experiment.