Please note that a Pathmind Professional subscription is required to use action masking.

In many situations, an agent is allowed to select actions that are impossible at particular moments in time. For example, on a manufacturing line, a reinforcement learning policy can direct a machine to begin processing the next product even though the machine is currently occupied. Over time, a policy should learn to avoid these "invalid" actions, but having to learn this makes training noisy and inefficient.

To avoid this issue, Pathmind allows you to apply an action mask, which tells the policy whether each action is allowed at any given moment in time. Because "invalid" actions are masked out, the policy does not have to learn from noisy and useless information. Take a look at this article for additional information about the motivation for using action masks.

Action masking is supported for single actions (@Discrete where size = 1) and tuple actions (@Discrete where size > 1) only. If you need action masking on continuous actions, please contact Pathmind support.

Step 1 - Open the Pathmind Helper properties and locate the Action Masks field.

Note: {true, false} is a static placeholder for demonstration purposes. You must replace this with a function that constructs the mask each time Pathmind is triggered. This is explained in Step 2.

Step 2 - Construct your action mask.

Whenever Pathmind is triggered, you must return a boolean array (boolean[]) in which each element corresponds to the action with the same index.

  • True means do not mask because the action is valid.

  • False means mask the action because the action is invalid.

For example, you can pass the boolean array by calling a function each time Pathmind is triggered.
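
The following is a minimal sketch of such a function, assuming the manufacturing example from the introduction with two actions. The meaning of each action (action 0 = wait, action 1 = start the next product), the class and method names, and the machineBusy flag are all hypothetical and not part of the Pathmind API. In your own model, the equivalent function would be called from the Action Masks field.

```java
import java.util.Arrays;

// Hypothetical helper; names are illustrative and not part of the Pathmind API.
class ActionMaskSketch {

    // Builds the mask for two actions:
    //   index 0 -> action 0 (assumed: wait), always valid
    //   index 1 -> action 1 (assumed: start the next product), valid only if the machine is free
    static boolean[] buildActionMask(boolean machineBusy) {
        return new boolean[] { true, !machineBusy };
    }

    public static void main(String[] args) {
        // Machine occupied: action 1 is masked.
        System.out.println(Arrays.toString(buildActionMask(true)));
        // Machine free: both actions are valid.
        System.out.println(Arrays.toString(buildActionMask(false)));
    }
}
```

Because the mask is rebuilt from the current model state on every call, it changes as the machine becomes busy or free, unlike the static {true, false} placeholder.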

Step 3 - Audit your action mask.

Turn on debug mode and inspect the console output to confirm that the action mask is working as intended.

In the example above, the policy is allowed two actions: 0 and 1. Within the action mask array, index 0 corresponds to action 0 and index 1 corresponds to action 1.

Step 4 - Query the trained policy.

Once you have obtained a policy from Pathmind, you must manually add the action mask to the front of observations to query the policy.

The action mask should be placed at the beginning of observations as shown above.
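
The ordering described here can be sketched as follows. This is an illustration only: it assumes that observations are passed as a double[] and that the mask is encoded as 1.0 for true and 0.0 for false, neither of which is confirmed by this article. Check the input format expected by your exported policy before relying on it. The class and method names are hypothetical.

```java
// Hypothetical helper; not part of the Pathmind API.
class MaskedObservationSketch {

    // Returns a new array with the mask first, followed by the observations.
    // Assumption: true is encoded as 1.0 and false as 0.0.
    static double[] prependMask(boolean[] mask, double[] observations) {
        double[] combined = new double[mask.length + observations.length];
        for (int i = 0; i < mask.length; i++) {
            combined[i] = mask[i] ? 1.0 : 0.0;
        }
        // Copy the original observations in after the mask entries.
        System.arraycopy(observations, 0, combined, mask.length, observations.length);
        return combined;
    }
}
```

For a mask of {true, false} and observations of {0.5, 2.0}, this sketch produces {1.0, 0.0, 0.5, 2.0}.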

Masking Tuple Actions

You can apply the mask to tuple actions by appending the mask for each action in the tuple, one after another, to form a single boolean array.
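
As a rough sketch of appending the masks, the helper below concatenates one mask per action in the tuple into a single boolean[]. The class and method names are hypothetical, and it assumes the masks are appended in the same order as the actions in the tuple.

```java
// Hypothetical helper; not part of the Pathmind API.
class TupleMaskSketch {

    // Concatenates the per-action masks in the order they are given.
    static boolean[] concatMasks(boolean[]... masks) {
        int total = 0;
        for (boolean[] mask : masks) {
            total += mask.length;
        }
        boolean[] combined = new boolean[total];
        int offset = 0;
        for (boolean[] mask : masks) {
            System.arraycopy(mask, 0, combined, offset, mask.length);
            offset += mask.length;
        }
        return combined;
    }
}
```

For a tuple of two actions with masks {true, false} and {true, true}, the combined mask is {true, false, true, true}.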
