In some cases, a Pathmind experiment may finish sooner than expected. Early stopping can occur for several reasons.
An experiment may end before training is complete if it exceeds Pathmind’s maximum thresholds:
Maximum training iterations - 500 iterations
Maximum training time - 12 hours
Maximum total episodes - 200,000 episodes
In rare situations, a model may need to be adjusted so that it can train within these limits.
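The three limits above can be written as a simple check. This is only an illustrative sketch: the constant and function names are hypothetical and are not part of Pathmind, and it assumes a limit counts as hit once the value is reached.

```python
from typing import Optional

# Limits quoted in this article. The names are illustrative, not Pathmind settings.
MAX_ITERATIONS = 500
MAX_TRAINING_HOURS = 12
MAX_TOTAL_EPISODES = 200_000


def limit_reached(iterations: int, hours: float, episodes: int) -> Optional[str]:
    """Return the name of the first limit reached, or None if training can continue.

    Assumption: a limit is treated as hit when the value is reached (>=).
    """
    if iterations >= MAX_ITERATIONS:
        return "iterations"
    if hours >= MAX_TRAINING_HOURS:
        return "time"
    if episodes >= MAX_TOTAL_EPISODES:
        return "episodes"
    return None
```

For example, limit_reached(120, 12.5, 40_000) returns "time", meaning the run ran out of training time long before the iteration or episode limits applied.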
Initial Learning Check
Training will automatically terminate if no learning has taken place by the 50-iteration mark. In some cases, trying a different reward function fixes the issue. In others, you may need to test different observations and metrics.
After 250 iterations, Pathmind begins checking how much the mean reward changes over each iteration. If the mean reward does not change by more than 1% over 75 consecutive iterations, Pathmind ends training early and generates a policy. The absence of any substantial change over that many iterations suggests that training has already reached the best outcome achievable with your reward function. Stopping early in this situation still produces a policy while eliminating unnecessary training time.
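The rule above can be sketched in a few lines to show how the three numbers (250 iterations, 1%, 75 consecutive iterations) interact. This is not Pathmind's implementation. The function name and parameters are hypothetical, and the sketch assumes that "change" means the relative difference between one iteration's mean reward and the previous iteration's.

```python
import math
from typing import Sequence


def should_stop_early(
    mean_rewards: Sequence[float],
    start_iteration: int = 250,  # checking begins after this iteration
    window: int = 75,            # consecutive quiet iterations required
    tolerance: float = 0.01,     # 1% relative change
) -> bool:
    """Return True if the mean reward has plateaued.

    mean_rewards[i] is the mean reward reported after iteration i + 1.
    Assumption: change is measured relative to the previous iteration.
    """
    quiet = 0  # consecutive iterations whose change stayed within tolerance
    for i in range(1, len(mean_rewards)):
        iteration = i + 1
        if iteration <= start_iteration:
            continue  # the check is not active yet
        prev, curr = mean_rewards[i - 1], mean_rewards[i]
        if prev == 0:
            # Relative change is undefined at zero; only "no change" counts as quiet.
            change = 0.0 if curr == 0 else math.inf
        else:
            change = abs(curr - prev) / abs(prev)
        quiet = quiet + 1 if change <= tolerance else 0
        if quiet >= window:
            return True
    return False
```

Under this reading, the earliest possible stop is at iteration 325: the check starts after iteration 250 and needs 75 quiet iterations in a row. A single iteration with a larger change resets the count.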
Miscellaneous Stopping Conditions
In some cases, training may end before reaching 250 iterations. This can happen for the following reasons:
Training exceeded the 12-hour maximum. Please read this article to learn how to speed up training times.
The 200,000 total episode limit was reached. Simulations with extremely short episodes (e.g., fewer than 10 steps) will typically hit this limit. If this is a problem, try increasing the density of Pathmind triggers.
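A rough calculation shows why short episodes run into the episode limit so early. The sketch below assumes a constant number of episodes per iteration, which is a simplification. The function name is hypothetical, and the actual number of episodes per iteration depends on your simulation.

```python
import math

MAX_TOTAL_EPISODES = 200_000  # episode limit quoted in this article


def iterations_until_episode_cap(episodes_per_iteration: float) -> int:
    """Estimate the iteration at which the total episode limit is reached.

    Assumption: every iteration runs the same number of episodes.
    """
    if episodes_per_iteration <= 0:
        raise ValueError("episodes_per_iteration must be positive")
    return math.ceil(MAX_TOTAL_EPISODES / episodes_per_iteration)


print(iterations_until_episode_cap(400))    # 500
print(iterations_until_episode_cap(800))    # 250
print(iterations_until_episode_cap(2_000))  # 100
```

Under this assumption, a run averaging 800 or more episodes per iteration reaches the episode limit by iteration 250, and a run averaging 400 or more reaches it by iteration 500.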