Prerequisites

  1. Complete the Get Started tutorial.

  2. Download the tutorial files.

Introduction

Using Pathmind, you can easily adapt existing simulation models for deep reinforcement learning (RL) in order to make more informed decisions in the face of variability.

This tutorial will show you how to adapt a simple AnyLogic tutorial model, the Bass Diffusion Model, for RL with Pathmind. The steps below will walk you through connecting the model to an RL policy, training the policy to make intelligent decisions, and evaluating the RL policy’s performance compared to a baseline optimizer. The procedure followed here can be applied to AnyLogic models of varying complexity — from solving a simple puzzle to controlling the complex routing of trucks across northern Europe.

Model Overview

In 1969, Frank Bass developed a marketing model to describe product adoption within a population. The model simulates how members of the population transform from potential adopters to adopters of a product in one of two ways: (1) they hear an advertisement about the product from an ad agency; or (2) they are told about the product by other members of the population.

Since its inception, the Bass diffusion model has been used to describe the dynamics of marketing campaigns and to forecast demand. Here, you will use the Bass diffusion model in a different, more powerful way. You will apply RL to the model and train an intelligent policy that prescribes the optimal allocation of resources for the ad agency. The policy will make and adjust a key decision to meet a target number of adopters with minimal expenditures.

Building the Bass Diffusion Model

The Bass Diffusion Model is a differential equation model that describes product adoption within a population. It can be built in a straightforward way in AnyLogic, and it is available as an introduction to the AnyLogic system dynamics library. The tutorial can be found in the AnyLogic help menu:

If you are unfamiliar with the Bass Diffusion model or with system dynamics in AnyLogic, we recommend working through steps 1-13 of the tutorial to build the model from the ground up. This process takes around an hour and will give you an intimate understanding of the model.

Alternatively, the final version of the model can be opened directly by clicking the Bass Diffusion – Phase 6 link at the bottom of the Step 13 page:

Description of the Model

Below is a diagram of the AnyLogic model directly after Step 13 of the above tutorial:

The population changes from PotentialAdopters to Adopters at a rate determined by AdoptionFromAd and AdoptionFromWOM. The first rate-determining factor, AdoptionFromWOM, is fixed by the system’s parameters: ContactRate, AdoptionFraction, and TotalPopulation. The other rate-determining factor, AdoptionFromAd, depends on a decision made by the ad agency: MonthlyExpenditures.

In the basic model, an optimizer can be run to determine the optimal value of MonthlyExpenditures that meets a constraint of at least 80,000 Adopters while minimizing TotalExpenditures. The performance of the model when operating under the optimizer will be used as the baseline for evaluating the trained RL policy.

Preliminary Work on Adapting the Model

Before changing any simulation model to be controlled by RL, we must first ask the following questions:

Q1: What is the main decision to be made?

A1: Deciding how much money the ad agency should spend on advertising and when it should run ads.

Q2: What information is necessary to inform the decision?

A2: The current number of adopters and potential adopters, as well as the time.

Q3: What metrics indicate whether the right decision was made?

A3: The ad agency’s total expenditures to date, the number of adopters and whether or not that number meets the goal of 80,000 adopters.

Q4: When should the decision be made or adjusted?

A4: Once per month.

The next section will demonstrate how the answers to the questions above determine how we fill out the Pathmind Helper.

Connect the Model to the Pathmind Helper

The Pathmind Helper is the interface between the existing simulation model and the Pathmind Policy to be trained. Below is a procedure for introducing the Pathmind Helper into the Bass Diffusion Model, then connecting the Helper to the model based on the answers to questions 1-4 in the previous section.

Importing the Pathmind Helper

Follow the instructions on downloading the Pathmind Helper. Drag and drop pathmindHelper from the Palette to the main agent of the model.

Create a Boolean parameter named isPathmindEnabled to indicate whether Pathmind is enabled. Set its default value to true.

In the pathmindHelper properties, set the Enabled field to isPathmindEnabled.

Q1: What is the main decision to be made? – doAction()

A1: Deciding how much money the ad agency should spend on advertising and when it should run ads.

As discussed in the previous section, each time Pathmind is triggered, the policy should decide:

  • How much to spend on advertisements in the coming month (MonthlyExpenditures)

  • Whether or not to switch off advertising (ToggleAds)

Create a function named doAction(). This function will take the RL policy’s decisions as arguments. It will do two things:

  1. It will update MonthlyExpenditures

  2. If toggleAds equals 1, it will send a message to the statechart to switch off ads.

MonthlyExpenditures = expenditure;
if (toggleAds == 1)
    statechart.fireEvent("StopAdvertising");
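
For reference, written out as plain Java (with the argument names and types taken from the Actions class shown later, and the return type assumed to be void), the complete function would look like this:

void doAction(double expenditure, int toggleAds) {
    // Update the ad agency's spending for the coming month
    MonthlyExpenditures = expenditure;
    // A value of 1 tells the statechart to stop advertising
    if (toggleAds == 1) {
        statechart.fireEvent("StopAdvertising");
    }
}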

Note: Since the doAction() function switches advertisements on and off, you must put a guard on the timeout transition to only fire if Pathmind is not enabled.

Create a transition that is triggered by a particular message (in this example, "StopAdvertising"). Pathmind will use this transition to switch off advertisements.

To avoid divide-by-zero errors, add conditions to the flows so that if a flow’s source stock is empty, its outflow is automatically set to 0.

Draw a link from PotentialAdopters to AdoptionRate and from Adopters to DiscardRate.

In the Properties tab, define the DiscardRate Flow object to be:

Adopters < 1 ? 0 : delay(AdoptionRate, ProductLifeTime)

Similarly, define the AdoptionRate to be:

PotentialAdopters < 1 ? 0 : AdoptionFromAd + AdoptionFromWOM

Within the Pathmind Helper Properties tab, copy and paste the following code snippet into the Actions field:

class Actions {
    // Monthly ad spend for the coming month, between $0 and $10,000
    @Continuous(low = 0, high = 10000, shape = 1) double expenditure;
    // 1 = switch advertisements off, 0 = no change
    @Discrete(n = 2, size = 1) int toggleAds;
    void doIt() { doAction(expenditure, toggleAds); }
}

Q2: What information is needed to make the decision? – getObservation()

A2: The current number of adopters and potential adopters, as well as the time, are all needed to make an informed decision.

Define observations:

class Observations {
    double potentials = PotentialAdopters;
    double adopters = Adopters;
    double month = getMonth();
}

Q3: What metrics indicate the right decision was made? – getMetrics()

A3: The ad agency’s total expenditures to date, the number of adopters and whether or not that number meets the goal of 80,000 adopters.

Define Metrics:

class Metrics {
    double totalExpenditures = TotalExpenditures;
    double adopters = Adopters;
    boolean hasMetGoal = Adopters > 80000;
}

Q4: When should the decision be made? – Pathmind Event Trigger

A4: Once per month.

Monthly expenditures should be decided ahead of time, at the beginning of each month. Within the Pathmind Helper, set Pathmind to trigger actions once per month. Note that the triggers should stop once advertising is switched off.

To make sure that MonthlyExpenditures is updated by Pathmind before monthlyEvent updates TotalExpenditures, set the first occurrence of monthlyEvent to 0.1 minutes.

Adjusting the Episode Parameters before Training

Increase the maximum available memory from 64 to 128.

In the Simulation experiment properties, set the simulation duration to 1.5 years to match the optimizer’s horizon.

Training an Intelligent Policy

Export the model to the Pathmind web app. Below is a schematic showing the steps to export an adapted model for training in the web app. From the File menu, select New -> Experiment. Then, in the New Experiment popup window, select Reinforcement Learning and click Finish. In the Properties tab of the resulting experiment, select Export to Pathmind. The model will be uploaded to the Pathmind web app, where you can conduct your training experiments.

For more information on specifics of the Pathmind web app user interface, see this list of tutorials.

The reward function, which takes in user-defined metrics and blends them into a single reward value at each trigger step, is shown below.

Reward Function:

// If the goal has been met, reward the gap between the 80,000 target and the adopter count at the previous step;
// otherwise, reward the increase in adopters during this step
reward += after.hasMetGoal ? (80000 - before.adopters) : (after.adopters - before.adopters);
// Penalize the money spent during this step, weighted by a factor of 5
reward -= (after.totalExpenditures - before.totalExpenditures) * 5;
// Add 1000 when the episode ends with the goal met, otherwise subtract 500
reward += isDone(-1) && after.hasMetGoal ? 1000 : -500;

Copy and paste this reward function into the Pathmind web app and select Train Policy to kick off training. Training takes around 30 minutes for this model, and its progress can be tracked using the Learning Progress chart at the bottom of the web app’s main page.

Evaluating the Trained Policy

Once training is complete, an Export Policy button will appear in the top right corner of the web app. This will download the policy as a zip file that can then be referenced in the Pathmind Helper back in AnyLogic.

To evaluate the performance of the policy, run the simulation several times with Pathmind enabled, and record the values of totalExpenditures, adopters, and whether the goal of 80,000 adopters was met. If you have the AnyLogic Professional edition, you can alternatively run a Monte Carlo experiment to gather statistics on these three values.
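
If you prefer to script the bookkeeping, the sketch below records the three values across runs. It is a minimal example, assuming a Parameter Variation (or Monte Carlo) experiment whose completed run's top-level agent is available as root in the experiment's After simulation run action, and three experiment-level collections that you create yourself; the collection names are hypothetical.

// Experiment-level collections (create these yourself; names are hypothetical):
//   LinkedList<Double> expenditures, LinkedList<Double> adopterCounts, LinkedList<Boolean> goalMet

// "After simulation run" action of the experiment:
expenditures.add(root.TotalExpenditures);   // total ad spend for the completed run
adopterCounts.add(root.Adopters);           // final number of adopters
goalMet.add(root.Adopters > 80000);         // whether this run met the 80,000 adopter goal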

Below is a summary of the policy’s performance compared to that of the built-in optimizer described in Step 13 of the Bass Diffusion AnyLogic tutorial.

| id | Reward Function | Total Expenditure ($) | Adopters | Meeting the Goal | Improvement over Optimizer | Note |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | OptQuest (optimizer) | 5,564 | 83,491 | 100.0% | – | duration of 1.5 years |
| 2 | Reward function shown above | 4,082 | 83,017 | 99.4% | 26.64% | Trained in Pathmind |
| 3 | Reward function shown above | 3,273 | 83,491 | 100.0% | 41.18% | Trained longer |

The Pathmind RL policy outperforms the optimizer by a margin of more than 25%.

This performance gain can be attributed to Pathmind’s ability to dynamically change the monthly expenditures and toggle ads on and off. In contrast, the traditional optimizer must choose a fixed value for the monthly expenditures, resulting in inefficiency. The Bass Diffusion Model is a dynamic problem that requires a dynamic decision variable in order to be solved. Static optimizers are unable to react and adapt to different scenarios within a dynamic system, and are thus not the ideal solution for sequential decision problems. RL, on the other hand, can react, adapt, and generalize to a variety of scenarios, prescribing the appropriate behavior to the model through user-defined decision points.

Extra Credit

At this point, it may seem that you have given up control over the scenario to an AI agent. However, while the policy can find clever ways to achieve the goal assigned to it through the reward function, you remain in full control of the goals that are set for the policy. To convince yourself of this, return to the Pathmind web app and train several more experiments with alternate reward functions. In each of the new experiments, change the importance weight on the third term of the reward function, as in the example below.
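
For instance, one variation (the weights here are illustrative only, not taken from the tutorial) places heavier emphasis on finishing the episode with the goal met:

reward += after.hasMetGoal ? (80000 - before.adopters) : (after.adopters - before.adopters);
reward -= (after.totalExpenditures - before.totalExpenditures) * 5;
// Larger terminal bonus and penalty than in the original reward function
reward += isDone(-1) && after.hasMetGoal ? 2000 : -1000;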

Once training has completed, evaluate each of the policies back in AnyLogic. Note how the behavior of the policy can be tuned by adjusting the reward function. This skill of reward shaping is central to training intelligent RL policies. More information on reward shaping can be found here.

Next Steps

1. View the Pathmind Helper help article.

2. Explore other tutorial models to see examples of how Pathmind can be applied in a wide range of projects and industries.

3. Visit the Knowledge Base for helpful guides on using the Pathmind Helper and web application.

Have any questions? Contact us to learn more!
