Please note that a Pathmind Professional subscription is required to use action masking.

In many situations, an agent is able to select actions that are impossible at a particular moment in time. For example, on a manufacturing line, a reinforcement learning policy can direct a machine to begin processing the next product even though the machine is currently occupied. Over time, a policy should learn to avoid these "invalid" actions, but having to discover this by trial and error makes learning confusing and inefficient.

To avoid this issue, Pathmind allows you to apply an action mask, which tells the policy which actions are allowed at any given moment in time. Because "invalid" actions are excluded, the policy is not distracted by noisy and useless information. Please take a look at this article for additional information about the motivation for using action masks.

Action masking currently supports single discrete actions only (@Discrete where size = 1). We are building support for action masking with tuple actions (@Discrete where size > 1).

Step 1 - Open the Pathmind Helper properties and locate the Action Masks field.

Note: {true, false} is a static placeholder for demonstration purposes. You must replace this with a function that constructs the mask each time Pathmind is triggered. This is explained in Step 2.

Step 2 - Construct your action mask.

Whenever Pathmind is triggered, you must return a boolean array (boolean[]) in which the element at each index corresponds to the action with the same number. A value of false masks the action because it is invalid, whereas a value of true leaves the action unmasked because it is valid.

For example, you can pass the boolean array by calling a function each time Pathmind is triggered.
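As a minimal sketch of such a function, the Java below builds a two-element mask for the manufacturing example. The class name, the function name buildActionMask, the machineBusy flag, and the meaning assigned to each action are all hypothetical and stand in for whatever state your own model exposes.

```java
final class ActionMaskExample {
    /**
     * Builds the mask for a policy with two actions.
     * Assumed meaning (hypothetical): action 0 = wait, action 1 = start the next product.
     *
     * @param machineBusy true while the machine is still processing a product
     * @return mask where true = action is valid, false = action is masked
     */
    static boolean[] buildActionMask(boolean machineBusy) {
        boolean canWait = true;           // waiting is always possible
        boolean canStart = !machineBusy;  // cannot start while the machine is occupied
        // Index 0 corresponds to action 0, index 1 to action 1.
        return new boolean[] { canWait, canStart };
    }
}
```

With this sketch, buildActionMask(true) yields {true, false}, so the policy cannot start a new product while the machine is occupied, and buildActionMask(false) yields {true, true}.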

Step 3 - Audit your action mask.

Turn on debug mode and inspect the console output to confirm that the action mask is working as intended.

In the example above, the policy is allowed two actions: 0 and 1. Within the action mask array, index 0 corresponds to action 0 and index 1 corresponds to action 1.

Step 4 - Query the trained policy.

Once you have obtained a policy from Pathmind, you must prepend the action mask to the observations whenever you query the policy.

The action mask should be placed at the beginning of observations as shown above.
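The ordering can be sketched in Java as follows. The helper prependMask is hypothetical, and encoding the mask as 1.0 (valid) and 0.0 (masked) is an assumption made for illustration; confirm the format your policy endpoint expects before relying on it.

```java
final class MaskedObservations {
    /**
     * Returns a new array holding the mask entries first, followed by the observations.
     * Assumption: mask values are encoded as 1.0 for true and 0.0 for false.
     */
    static double[] prependMask(boolean[] mask, double[] observations) {
        double[] combined = new double[mask.length + observations.length];
        for (int i = 0; i < mask.length; i++) {
            combined[i] = mask[i] ? 1.0 : 0.0;
        }
        // Copy the original observations in after the mask, preserving their order.
        System.arraycopy(observations, 0, combined, mask.length, observations.length);
        return combined;
    }
}
```

For a mask of {true, false} and observations {0.5, 3.0}, this produces {1.0, 0.0, 0.5, 3.0}.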
