This feature is currently in private beta. Please reach out to us using the chat widget at the bottom right to get early access.
In many situations, an agent can select actions that are impossible at a particular moment in time.
For example, on a manufacturing line, a reinforcement learning policy may direct a machine to begin processing the next product even though the machine is currently occupied. Over time, a policy should learn to avoid these "invalid" actions, but having to learn this from experience makes training more difficult and confusing for the policy.
To avoid this issue, Pathmind allows you to apply an "action mask", which tells the policy which actions are allowed at any given moment in time. With "invalid" actions masked out, the policy can ignore them and is not distracted by noisy, useless information.
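To make the idea concrete, here is a minimal, generic sketch of how an action mask can be applied to a policy's output. It is not Pathmind's API: the function name `masked_action_probabilities`, the two-action setup ("wait" and "start next product"), and the `machine_busy` flag are all hypothetical and chosen only to mirror the manufacturing example above. The sketch assumes the policy produces one raw score per action and that the mask is a list of booleans of the same length.

```python
import math


def masked_action_probabilities(scores, action_mask):
    """Turn raw policy scores into probabilities, giving masked-out actions zero probability.

    Hypothetical helper for illustration only; not part of any Pathmind API.
    """
    if len(scores) != len(action_mask):
        raise ValueError("scores and action_mask must have the same length")
    if not any(action_mask):
        raise ValueError("at least one action must be allowed")

    # Subtract the largest allowed score before exponentiating, for numerical stability.
    max_allowed = max(s for s, ok in zip(scores, action_mask) if ok)

    # Allowed actions get a softmax weight; disallowed actions get a weight of exactly zero.
    weights = [math.exp(s - max_allowed) if ok else 0.0
               for s, ok in zip(scores, action_mask)]
    total = sum(weights)
    return [w / total for w in weights]


# Assumed action layout: index 0 = "wait", index 1 = "start next product".
machine_busy = True
action_mask = [True, not machine_busy]  # starting a product is invalid while the machine is busy

# Even though the raw score favors "start next product", the mask rules it out.
probs = masked_action_probabilities([0.2, 1.5], action_mask)
```

In this sketch the policy can never sample the masked action, so it does not need to spend training time discovering that the action is useless while the machine is occupied.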