Pathmind allows you to train multiple agents using a single shared policy. For most use cases this is unnecessary, but you may need multiple agents if your use case has the following characteristics:

  1. A credit assignment problem. This means that the policy is unable to distinguish between the individual components that lead to the success of the entire system.
  2. Multiple agents with inconsistent cycle times. Imagine a simulation of a race track in which cars can cycle around, each reaching the finish line at different, overlapping times. Because of this inconsistency, it is difficult for a single-agent policy to separate the performance of each individual car.

Step 1 - Set the number of reinforcement learning agents in your simulation

Once the number of agents is set, run your simulation in debug mode. Notice that the actions, observations, reward variables, and done flag are now unique to each agent, as denoted by its Agent ID.

Pathmind:
Step: 1
Agent ID: 0
Observations:
Array: [43.0, 0.0, 0.0, 0.0, 70.0, 0.0, 0.0, 0.0, 70.0, 0.0, 0.0, 0.9999999999999996]
Number of Observations: 12
Reward Variables:
Array: [0.0, 900.1955427073212, 2.068086687094852]
Number of Reward Variables: 3
Last Action: [18.0, 17.0, 9.0, 16.0, 14.0, 0.0]
Skip: false
Done: false
Agent ID: 1
Observations:
Array: [43.0, 0.0, 0.0, 0.0, 70.0, 0.0, 0.0, 0.0, 70.0, 0.0, 0.0, 0.9999999999999996]
Number of Observations: 12
Reward Variables:
Array: [0.0, 900.1955427073212, 2.068086687094852]
Number of Reward Variables: 3
Last Action: [0.0, 12.0, 0.0, 15.0, 5.0, 11.0]
Skip: false
Done: false

Keep in mind that each time Pathmind is triggered, it returns an action for every agent at the same time, regardless of whether each agent actually needs one. If this causes issues (e.g. only one agent at a time needs an action), you will need to implement a "skip" condition.
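As a rough illustration, a skip condition could look something like the sketch below. The names are hypothetical (they stand in for whatever your own model tracks), not part of the Pathmind API:

```java
// Minimal sketch of a "skip" condition, assuming your model tracks which
// truck currently needs an action. All names here are hypothetical.
boolean skip(int agentId, int indexOfTruckNeedingAction) {
    // Ignore the policy's output for every agent except the one whose turn it is.
    return agentId != indexOfTruckNeedingAction;
}
```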

Step 2 - Pass agentId as an argument to your observations, reward, and actions

This argument is an integer identifying the agent in question. For example, 0 refers to Agent 0, 1 refers to Agent 1, and so on.
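For example, per-agent observations might use agentId to look up the relevant agent. This is only a sketch; Truck, trucks, and the fields shown are placeholders for whatever agent population your model uses:

```java
// Sketch only: build the observation array for one specific agent.
// Truck and trucks are placeholders for your own model's agent population.
double[] observationsFor(int agentId, List<Truck> trucks) {
    Truck truck = trucks.get(agentId);   // the agent this call refers to
    return new double[] {
        truck.queueLength,
        truck.distanceToDestination,
        truck.fuelLevel
    };
}
```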

Step 3 - Craft your observations, reward, and actions based on agentId

In the screenshot below, an action is only executed if the agentId matches the index of the current truck (i.e. the agent that needs the next action at a given point in time).
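Conceptually, that gating looks something like the sketch below. Again, the names are hypothetical and stand in for your own model's variables:

```java
// Sketch of agentId-based action gating: only the truck whose index matches
// agentId applies the policy's action; every other agent ignores it.
void doAction(int agentId, int indexOfCurrentTruck, double[] action, List<Truck> trucks) {
    if (agentId == indexOfCurrentTruck) {
        Truck truck = trucks.get(agentId);
        truck.setTargetSpeed(action[0]);   // hypothetical way of applying the action
    }
}
```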

You can repeat the steps above for observations and reward, but only if each agent requires unique information that should not be shared among the agents.
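For instance, if each truck tracks its own performance, per-agent reward variables could be assembled the same way. The fields below are hypothetical examples, not prescribed names:

```java
// Sketch only: per-agent reward variables, assuming each truck records its
// own throughput and lateness. Field names are placeholders.
double[] rewardVariablesFor(int agentId, List<Truck> trucks) {
    Truck truck = trucks.get(agentId);
    return new double[] {
        truck.deliveriesCompleted,
        truck.totalLateness
    };
}
```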
