As you commence your reinforcement learning journey, the starting point is always the action space. A well-framed action space makes it much easier for a reinforcement learning policy to learn desired behavior.
Step 1: Identify the type of action space that is most relevant for your use case.
Pathmind supports three types of action spaces.
Discrete Actions - A discrete action represents an explicit choice. For example, a traffic light has two discrete choices of 0 (stop) or 1 (go).
Continuous Actions - A continuous action represents a continuous range of choices. Consider a knob that controls power. You can ask the policy to decide how much to rotate the knob.
Tuple Actions - A tuple action represents multiple, simultaneous decisions. You can have multiple discrete actions, multiple continuous actions, or a mix of both. For instance, to drive a car, you need to both press on the gas and turn the steering wheel simultaneously.
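The three action types can be sketched in plain Python. This is an illustrative sketch only; the function names and sampling logic below are hypothetical and are not part of the Pathmind API.

```python
import random

# Discrete action: an explicit choice from a finite set.
# (Traffic-light example: 0 = stop, 1 = go.)
def sample_discrete(n_choices=2):
    return random.randrange(n_choices)

# Continuous action: a value drawn from a continuous range.
# (Power-knob example: how far to rotate, between low and high.)
def sample_continuous(low=0.0, high=1.0):
    return random.uniform(low, high)

# Tuple action: multiple simultaneous decisions, mixing types.
# (Car example: a continuous steering angle plus a discrete throttle choice.)
def sample_tuple():
    steering = sample_continuous(-1.0, 1.0)  # continuous component
    throttle = sample_discrete(2)            # discrete component
    return (steering, throttle)

action = sample_tuple()
```

The tuple case is simply a container for the other two: each component is sampled and learned on its own scale, but the policy emits them together as one decision.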
Step 2: Map actions to all components in your simulation.
Assign actions in a way that gives the policy as much freedom and control as possible. Do not micromanage the policy, as that hinders its ability to learn on its own.
There are two strategies we recommend you try.
The Control Center Approach
The control center approach treats the policy as your central control center. This approach is best suited for use cases in which agents must be centrally coordinated. For example, if you were simulating an airport control tower, each plane cannot decide to take off on its own. It must listen to the control tower to avoid collisions with other planes.
In Pathmind terminology, this is when the number of controlled agents is set to 1.
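In code terms, the control center approach means one policy call produces actions for every agent at once. The sketch below is hypothetical (the function, the stand-in rule, and the observation format are invented for illustration and are not Pathmind APIs):

```python
# Control center approach: a single policy decides for all agents at once
# (in Pathmind terms, number of controlled agents = 1).

def control_tower_policy(observations):
    """Given observations for all planes, return one action per plane.

    A trivial stand-in for a learned policy: clear at most one plane
    for takeoff at a time, so planes never take off simultaneously.
    """
    actions = []
    cleared = False
    for obs in observations:
        if obs["ready"] and not cleared:
            actions.append(1)  # 1 = cleared for takeoff
            cleared = True
        else:
            actions.append(0)  # 0 = hold
    return actions

planes = [{"ready": True}, {"ready": True}, {"ready": False}]
print(control_tower_policy(planes))  # [1, 0, 0]
```

The key property is that the policy sees the whole system in one observation and emits a coordinated joint action, which is what makes collision avoidance expressible.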
The Multi-Agent Approach
The multi-agent approach allows the policy to control each agent independently.
This is suitable for use cases in which each agent should seek to maximize its own individual performance. For example, if you were modeling a race track, each car should seek to complete the race as fast as possible.
In Pathmind terminology, this is when the number of controlled agents is greater than 1.
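By contrast, the multi-agent approach applies the policy once per agent, with no coordination between calls. Again, a hypothetical sketch (the rule and observation keys are invented for illustration, not Pathmind APIs):

```python
# Multi-agent approach: the same policy is applied to each agent
# independently (in Pathmind terms, number of controlled agents > 1).

def car_policy(observation):
    """Decide one car's throttle from its own observation alone."""
    # Stand-in rule: slow down in a curve, full throttle otherwise.
    return 0.4 if observation["in_curve"] else 1.0

cars = [{"in_curve": False}, {"in_curve": True}]
# The policy is invoked once per agent; each call sees only that
# agent's own state, so each car optimizes its own lap time.
throttles = [car_policy(obs) for obs in cars]
print(throttles)  # [1.0, 0.4]
```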
Step 3: Determine how actions are triggered in your simulation.
Finally, you must determine when an action should be triggered. Trigger actions only when necessary; otherwise, you risk confusing the policy and it will not learn.
In Pathmind, you can trigger actions in two ways:
Cyclic Triggers - These are actions that are triggered on a fixed, recurring interval. For example, if you were simulating a power plant, you could ask the policy to check electricity production once per hour and adjust the system accordingly in order to meet a production target.
Conditional Triggers - These are actions that are triggered based on a condition. For example, if you were modeling a retail store, you can trigger a restock only when inventory depletes to a certain level.
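The two trigger types can be contrasted in a simple simulation loop. This sketch is hypothetical (the tick loop, thresholds, and inventory model are invented for illustration and are not part of the Pathmind API):

```python
# Cyclic vs. conditional triggers inside a simulation loop.

CHECK_INTERVAL = 60      # cyclic: act once every 60 ticks (e.g. hourly)
RESTOCK_THRESHOLD = 10   # conditional: act when inventory runs low

def should_trigger(tick, inventory):
    cyclic = tick % CHECK_INTERVAL == 0           # fixed, recurring interval
    conditional = inventory <= RESTOCK_THRESHOLD  # state-based condition
    return cyclic, conditional

triggered = []
inventory = 30
for tick in range(1, 181):
    inventory -= 1  # demand steadily depletes stock
    cyclic, conditional = should_trigger(tick, inventory)
    if conditional:
        inventory = 50  # the action meaningfully changes the state
        triggered.append(tick)

print(triggered)  # [20, 60, 100, 140, 180]
```

Note that the restock action visibly changes the state (inventory jumps back to 50), so the policy can connect its decision to an outcome.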
If you trigger an action but it does not meaningfully impact the state of your system, the policy will become confused and not understand the impact of its choices.