This article applies only to simulations with multiple controlled agents. When training a policy that controls multiple agents, you must separate rewards into collective (i.e. global) and individual (i.e. local) rewards.

Collective Rewards

Collective rewards are metrics that express the performance of the entire system. If you were modeling a factory, total throughput would be an example of a collective reward, since it doesn't represent the performance of any particular agent.

In Pathmind, a reward term that maximizes throughput would look something like:

reward += after.totalThroughput - before.totalThroughput;
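If the before/after pattern is new to you, here is a minimal Java sketch of the idea, assuming a hypothetical SimulationState class (the class and its field are illustrative, not part of Pathmind's API). The reward is the change in a system-wide metric between two snapshots, so the policy is credited only for progress made during the current step.

// Hypothetical sketch of the before/after delta pattern.
// SimulationState and its field are illustrative assumptions.
class CollectiveRewardSketch {

    static class SimulationState {
        double totalThroughput; // e.g. total units produced by the whole factory
    }

    // Reward the change in a system-wide metric between two snapshots.
    static double collectiveReward(SimulationState before, SimulationState after) {
        return after.totalThroughput - before.totalThroughput;
    }
}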

Individual Rewards

Individual rewards measure the performance of particular agents in the system. Continuing with the factory example, each agent's utilization is a metric that applies only to that agent, not to the system as a whole.

In Pathmind, the reward to maximize utilization follows the same pattern as the throughput reward:

reward += after.utilization - before.utilization;
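The key difference from a collective reward is that an individual reward is computed once per agent. As a rough Java sketch, assuming a hypothetical AgentState class that holds each agent's own statistics (again illustrative, not Pathmind's API):

// Hypothetical sketch: one reward per controlled agent.
// AgentState and its field are illustrative assumptions.
class IndividualRewardSketch {

    static class AgentState {
        double utilization; // fraction of time this agent was busy, 0.0 to 1.0
    }

    // Each agent is credited only for the change in its own utilization,
    // computed from its own before/after snapshots.
    static double[] individualRewards(AgentState[] before, AgentState[] after) {
        double[] rewards = new double[after.length];
        for (int i = 0; i < after.length; i++) {
            rewards[i] = after[i].utilization - before[i].utilization;
        }
        return rewards;
    }
}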

Bringing It Together

Once you specify both collective and individual rewards, you can combine them in your reward function.

// Collective Reward
reward += after.totalThroughput - before.totalThroughput;

// Individual Reward
reward += after.utilization - before.utilization;

In this way, the policy will seek to maximize both the total throughput of the entire system and the utilization of each individual agent.
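Note that the two terms may live on different scales: throughput might be measured in units per step, while utilization is a fraction between 0 and 1. If one term dominates training, you can weight the terms. The weights below are illustrative values to tune for your own simulation, not recommendations:

// Weighted combination (example weights, tune for your model)
reward += 1.0 * (after.totalThroughput - before.totalThroughput);
reward += 10.0 * (after.utilization - before.utilization);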
