After exporting your AnyLogic simulation to Pathmind, you will be asked to write a reward function.
As a starting point, we recommend that you try two things.
Start with the default "after minus before" formulation of the reward function (detailed explanation here) to build intuition. This formulation compares the value of a metric before an action is taken with its value afterward, so the policy is rewarded for the change each action causes.
reward += after.reward - before.reward; // Maximizes the metric
reward -= after.reward - before.reward; // Minimizes the metric
Maximizing or minimizing a metric using this reward function is usually enough to achieve your business objectives.
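To make the "after minus before" idea concrete, here is a minimal, self-contained Java sketch. It is illustrative only: Pathmind supplies the "before" and "after" snapshots for you, and the names Metrics, waitTime, and throughput below are hypothetical placeholders for whatever observations your own simulation exposes.

```java
public class RewardSketch {
    // Hypothetical snapshot of simulation metrics at one point in time.
    static class Metrics {
        double waitTime;    // a metric we want to minimize
        double throughput;  // a metric we want to maximize
        Metrics(double waitTime, double throughput) {
            this.waitTime = waitTime;
            this.throughput = throughput;
        }
    }

    // "After minus before": reward the change that the action caused.
    static double reward(Metrics before, Metrics after) {
        double reward = 0;
        reward += after.throughput - before.throughput; // maximize throughput
        reward -= after.waitTime - before.waitTime;     // minimize wait time
        return reward;
    }

    public static void main(String[] args) {
        Metrics before = new Metrics(10.0, 5.0);
        Metrics after  = new Metrics(8.0, 6.0);
        // Throughput rose by 1 and wait time fell by 2, so reward = 1 + 2 = 3.
        System.out.println(reward(before, after));
    }
}
```

Because each term measures a change rather than an absolute level, an action is rewarded only when it actually improves a metric, which is what makes this formulation a good default.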
Test each reward metric independently as a sanity check. For example, first train a policy using "reward0" only.
Then train a second policy using "reward1" only.
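One common way to isolate a metric is to keep the full reward function in place and comment out every term except the one under test. The sketch below illustrates this; "reward0" and "reward1" are hypothetical metric names standing in for whatever your simulation reports, and the array-based snapshots are an assumption for the sake of a runnable example.

```java
public class IsolateMetrics {
    // Experiment 1: train on the change in reward0 only.
    static double rewardUsingMetric0(double[] before, double[] after) {
        double reward = 0;
        reward += after[0] - before[0]; // reward0 only
        // reward += after[1] - before[1]; // reward1 disabled for this run
        return reward;
    }

    // Experiment 2: train on the change in reward1 only.
    static double rewardUsingMetric1(double[] before, double[] after) {
        double reward = 0;
        // reward += after[0] - before[0]; // reward0 disabled for this run
        reward += after[1] - before[1]; // reward1 only
        return reward;
    }

    public static void main(String[] args) {
        double[] before = {2.0, 7.0};
        double[] after  = {3.5, 6.0};
        System.out.println(rewardUsingMetric0(before, after)); // change in reward0
        System.out.println(rewardUsingMetric1(before, after)); // change in reward1
    }
}
```

Comparing the two resulting policies shows you how each metric, on its own, shapes the learned behavior before you combine them.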
In Pathmind, you may run as many experiments in parallel as you'd like by clicking the + New Experiment button.
You can leverage this feature to quickly test the impact of each reward on the policy's behavior.
Once you provide a reward function, click the Train Policy button to start training.