Prerequisites

  1. Complete the Warehouse tutorial.
  2. Download the tutorial files. 

Simulation Overview

This model presents a simple coffee shop operation. Throughout the day, customers enter the shop and place an order. If the order arrives within a specific time limit, the customer collects it and pays the bill. They then either grab a chair or exit immediately; both scenarios represent a positive customer experience. Customers, however, will only wait so long for an order. Those who face too long a delay exit the shop angrily and have a negative customer experience.

In addition to managing orders, servers must also make sure that the kitchen area stays clean as the day progresses. The simulation looks for the right balance between keeping customers happy and keeping the kitchen clean.

The model contains three agents: the server, customers, and chairs. Within each agent are additional functional elements. The server agent, for example, holds a statechart with all the possible states the server can be in, including takingOrder and cleaning.

Within the customers agent, a balkTimer Event sets the timeout for how long a customer is willing to wait before giving up on an order and leaving.
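As a rough sketch, the balkTimer's action could look something like the lines below. The names orderReceived and routeToFailExit are placeholders invented for illustration, not identifiers taken from the model.

// Sketch of a balk-timeout action (placeholder names)
// If the order has not arrived when the timer fires, count the balk
// and send the customer toward the custFailExit sink.
if (!orderReceived) {
    balkedCustomers++;      // assumed counter behind the balkedCustomers reward variable
    routeToFailExit();      // placeholder for routing the pedestrian to custFailExit
}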

Customer arrivals are determined by the arrivalSched Schedule.

A PedSource block defines the customer entrance point. From there, PedGoTo and PedService blocks make up the possible customer movements through the coffee shop. Customers who receive their orders within the time limit move to the custSuccessExit sink. Those who reach the timeout set by the balkTimer event leave via the custFailExit sink.

Tutorial

Step 1 - Perform a run with random actions to check Pathmind Helper setup.

Go through the steps of the Check Pathmind Helper Setup guide to make sure that everything is working correctly. Completing this step will also demonstrate how the model performs using random actions instead of a policy.

Step 2 - Examine the Pathmind Properties

Observations - This model uses five observations: the size of the order queue, the size of the collect order queue, the size of the bill pay queue, the kitchen cleanliness level, and the current time.
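Inside the Pathmind Helper, these observations are gathered into a numeric array. A minimal sketch is shown below; the element names (orderQueue, collectOrderQueue, billPayQueue) are illustrative and may differ from the names used in the actual model.

// Illustrative sketch of the five observations (element names are placeholders)
double[] observations = new double[] {
    orderQueue.size(),          // size of the order queue
    collectOrderQueue.size(),   // size of the collect order queue
    billPayQueue.size(),        // size of the bill pay queue
    kitchenCleanlinessLevel,    // current kitchen cleanliness level
    time()                      // current model time
};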

Reward Variables - The reward variables in this model track kitchen cleanliness, successful customer exits, balked customers, and average service time.
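These map directly onto the variables used in the reward function later in this guide. A minimal sketch of how they might be collected each step:

// Sketch of the four reward variables (names match the reward function below)
double[] rewardVariables = new double[] {
    kitchenCleanlinessLevel,    // kitchen cleanliness
    successfulCustomers,        // successful customer exits
    balkedCustomers,            // balked (timed-out) customers
    avgServiceTime              // average service time
};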

Actions - This model contains one decision point with four possible actions.

The doAction function referenced in this field defines the four possible actions that the server can take.
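A typical doAction implementation dispatches on the integer action chosen by the policy. The sketch below is illustrative only; the four method names are placeholders rather than the exact names used in the model.

// Illustrative dispatch on the chosen action (method names are placeholders)
void doAction(int action) {
    switch (action) {
        case 0: takeNextOrder();  break;  // serve the next customer in the order queue
        case 1: deliverOrder();   break;  // hand a finished order to the customer
        case 2: collectPayment(); break;  // take the bill payment
        case 3: cleanKitchen();   break;  // clean the kitchen area
    }
}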

Done - This simulation is set to run for 8 hours.
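In AnyLogic terms, an 8-hour episode boils down to a simple check against model time, for example:

// Episode ends after 8 hours of model time
boolean isDone() {
    return time() >= 8 * hour();
}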

Event Trigger - This model's event trigger is conditional and only triggers an action when the server is in its idle state.
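A conditional trigger of this kind usually reduces to a boolean check on the server's statechart, along these lines (the state name Idle is an assumption for this sketch):

// Fire a Pathmind action only while the server is idle
boolean canTriggerAction() {
    return server.inState(Server.Idle);  // "Idle" is an assumed state name
}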

Step 3 - Export model and get Pathmind policy.

Complete the steps in the Exporting Models and Training guide to export your model, complete training, and download the Pathmind policy.

Reward Function - Use the following reward function when setting up training:

reward += after.kitchenCleanlinessLevel - before.kitchenCleanlinessLevel; // Maximize kitchen cleanliness
reward += after.successfulCustomers - before.successfulCustomers; // Maximize successful exits
reward -= after.balkedCustomers - before.balkedCustomers; // Minimize balked customers
reward -= after.avgServiceTime - before.avgServiceTime; // Minimize average service time

A policy will be generated after training completes*. A trained policy file is included in the tutorial folder.

* Please note that the coffee shop simulation requires 12 hours to train because AnyLogic's Pedestrian library causes the model to execute slowly. Keep this in mind if you plan to use the Pedestrian library.

Step 5 - Run the simulation with the Pathmind policy.

Once you’ve downloaded the Pathmind policy, return to AnyLogic. Open the Pathmind Helper properties and change the "Mode" radio button to Use Policy. Click Browse and locate the downloaded policy file.

Now run the model again using the included Monte Carlo experiment. Observe the decline in balked customers and the increase in successful exits now that a Pathmind policy is in place.

Then compare the policy results with the random-action run from Step 1 to measure the difference in balked customers and successful exits.

Conclusion

As you can see, the trained policy resulted in happier customers and a more efficient operation, demonstrating just how powerful reinforcement learning can be for companies looking to improve their operations. Beyond a coffee shop, similar models of retail stores, ticket counters, and nearly any other customer-service environment can be improved in the same way.
