When you run an experiment in Pathmind, you will notice four lines moving in parallel.
Each line represents how well a policy is learning to achieve the goals defined in your reward function.
Pathmind leverages a technique called Population Based Training to automate the hyperparameter tuning process.
When training starts, Pathmind launches four parallel reinforcement learning "trials," corresponding to the four lines shown above. A "trial" is essentially the process of teaching a reinforcement learning policy the sequence of actions that best leads to your desired outcome (i.e. the outcome you defined in your reward function).
Why are there four trials?
In the machine learning world, the quality of training depends largely on your hyperparameter selection. Finding the right hyperparameters is a time-consuming process of trial and error. Pathmind automates it for you.
- Each trial is randomly initialized with different hyperparameters (e.g. learning rate, batch sizes, gamma, etc.) to maximize the chances of discovering the best performing combination.
- Periodically, each trial will look at the other trials' results and automatically inherit hyperparameters that work well, throwing away hyperparameters that do not. This means that hyperparameters are changed on the fly (i.e. midway through training).
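The exploit/explore loop described above can be sketched in a few lines of Python. This is a minimal illustration of Population Based Training, not Pathmind's actual implementation: the hyperparameter ranges, perturbation factors, and scores below are made-up assumptions for demonstration.

```python
import random

POPULATION_SIZE = 4  # Pathmind runs four parallel trials


def random_hyperparams():
    """Each trial starts with randomly initialized hyperparameters."""
    return {
        "learning_rate": 10 ** random.uniform(-5, -2),
        "batch_size": random.choice([64, 128, 256]),
        "gamma": random.uniform(0.9, 0.999),
    }


def perturb(hp):
    """Slightly mutate inherited hyperparameters (the 'explore' step).
    The 0.8/1.2 perturbation factors are illustrative choices."""
    new = dict(hp)
    new["learning_rate"] *= random.choice([0.8, 1.2])
    new["gamma"] = min(0.999, new["gamma"] * random.choice([0.98, 1.02]))
    return new


def pbt_step(trials, scores):
    """The 'exploit' step: the bottom half of the population copies the
    best trial's hyperparameters and perturbs them mid-training; the top
    half keeps training with its current hyperparameters."""
    ranked = sorted(range(len(trials)), key=lambda i: scores[i], reverse=True)
    best = ranked[0]
    for i in ranked[len(ranked) // 2:]:       # worst-performing trials
        trials[i] = perturb(trials[best])     # inherit + mutate on the fly
    return trials


random.seed(0)
trials = [random_hyperparams() for _ in range(POPULATION_SIZE)]
# Fake evaluation scores standing in for each trial's mean reward.
scores = [0.1, 0.9, 0.4, 0.2]
trials = pbt_step(trials, scores)
```

After the step, the two weakest trials (scores 0.1 and 0.2) carry perturbed copies of the best trial's hyperparameters, while the best trial continues unchanged. In a real run this exchange repeats periodically throughout training.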
Once training concludes, you will be presented with the best-performing policy discovered across all trials, as measured by your reward function.