This article applies only to simulations that contain multiple controlled agents. When training a policy with multiple controlled agents, you must separate your reward signals into collective (i.e., global) and individual (i.e., local) rewards.
Collective rewards are metrics that express the performance of the entire system. In a factory model, for example, total throughput is a collective reward because it does not represent the performance of any particular agent.
In Pathmind, the reward to maximize throughput would look like:
reward += after.totalThroughput - before.totalThroughput;
Individual rewards measure the performance of a particular agent in the system. Continuing with the factory example, utilization is a metric that applies only to a specific agent.
In Pathmind, the reward to maximize utilization follows the same pattern:
reward += after.utilization - before.utilization;
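To illustrate the idea in plain Java (this is a hedged sketch, not the Pathmind reward-function syntax; the arrays and method names here are hypothetical), each controlled agent receives its own "after minus before" delta, computed independently of the other agents:

```java
import java.util.Arrays;

public class IndividualReward {
    // Hypothetical per-agent utilization snapshots taken before and
    // after a simulation step; index i corresponds to agent i.
    static double[] individualRewards(double[] utilBefore, double[] utilAfter) {
        double[] rewards = new double[utilBefore.length];
        for (int i = 0; i < rewards.length; i++) {
            // Same "after minus before" pattern, evaluated per agent.
            rewards[i] = utilAfter[i] - utilBefore[i];
        }
        return rewards;
    }

    public static void main(String[] args) {
        double[] before = {0.50, 0.25};
        double[] after  = {0.75, 0.25};
        // Agent 0 improved its utilization; agent 1 stayed flat,
        // so only agent 0 earns a positive individual reward.
        System.out.println(Arrays.toString(individualRewards(before, after)));
    }
}
```

The key point is that an individual reward is scoped to one agent, whereas a collective reward is shared by all of them.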
Bringing It Together
Once you have specified both collective and individual rewards, you can combine them in a single reward function.
// Collective Reward
reward += after.totalThroughput - before.totalThroughput;
// Individual Reward
reward += after.utilization - before.utilization;
In this way, the policy learns to maximize both the total throughput of the system and the utilization of each individual agent.
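To make the combination concrete, here is a minimal, self-contained sketch in plain Java (not the Pathmind reward-function syntax; the SimState class and its fields are hypothetical stand-ins for the before/after variables above) showing that the combined reward is simply the sum of the collective and individual deltas:

```java
// Hypothetical snapshot of simulation metrics; the field names mirror
// the before/after variables used in the reward snippets above.
class SimState {
    double totalThroughput; // collective metric (whole system)
    double utilization;     // individual metric (one agent)

    SimState(double totalThroughput, double utilization) {
        this.totalThroughput = totalThroughput;
        this.utilization = utilization;
    }
}

public class RewardSketch {
    // Combined reward: collective delta plus individual delta.
    static double reward(SimState before, SimState after) {
        double reward = 0.0;
        // Collective Reward
        reward += after.totalThroughput - before.totalThroughput;
        // Individual Reward
        reward += after.utilization - before.utilization;
        return reward;
    }

    public static void main(String[] args) {
        SimState before = new SimState(100.0, 0.50);
        SimState after  = new SimState(105.0, 0.75);
        // Collective delta = 5.0, individual delta = 0.25
        System.out.println(reward(before, after)); // prints 5.25
    }
}
```

Because the two terms are summed, improving either metric increases the reward; if one matters more than the other in your simulation, you could scale each term with a weight before adding them.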