After exporting your AnyLogic simulation to Pathmind, you will be asked to write a reward function. A reward function tells the reinforcement learning agent which outcomes are desirable by scoring the results of its actions.
As a starting point, we recommend that you try two things:
1. Simply maximize or minimize a metric that you care about to see how well the policy learns. Maximizing or minimizing a metric is usually enough to achieve your business objectives.
2. Test each reward metric independently as a sanity check. For example, first train a policy using "reward0" only. Then train a second policy using "reward1" only.
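The two suggestions above can be sketched in plain Java. This is only an illustration under stated assumptions, not Pathmind's reward function syntax: the class RewardSketch, the methods maximize and minimize, and the revenue and wait-time metrics are all hypothetical names. It also assumes that a reward is computed as the change in a metric across one agent action, which is one common convention and may differ from how your model defines "reward0" and "reward1".

```java
// Hypothetical sketch: none of these names come from Pathmind or AnyLogic.
final class RewardSketch {

    // Reward for a metric you want to maximize:
    // positive when the metric went up during the step.
    static double maximize(double before, double after) {
        return after - before;
    }

    // Reward for a metric you want to minimize:
    // positive when the metric went down during the step.
    static double minimize(double before, double after) {
        return before - after;
    }

    public static void main(String[] args) {
        // Made-up metric values observed before and after one agent action.
        double revenueBefore = 100.0, revenueAfter = 120.0; // higher is better
        double waitBefore = 30.0, waitAfter = 25.0;         // lower is better

        // Experiment 1: the reward uses the first metric only.
        double rewardExperiment1 = maximize(revenueBefore, revenueAfter);

        // Experiment 2: the reward uses the second metric only.
        double rewardExperiment2 = minimize(waitBefore, waitAfter);

        System.out.println(rewardExperiment1); // prints 20.0
        System.out.println(rewardExperiment2); // prints 5.0
    }
}
```

Keeping each experiment to a single term makes it easier to see which metric drives a given behavior before you combine terms into one reward function.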
In Pathmind, you may run as many experiments in parallel as you'd like by clicking the + New Experiment button.
Use this feature to quickly test how each individual reward affects the policy's behavior.
Once you provide a reward function, click the Train Policy button to start training.