Churn Simulation Model Overview

Churn Simulation Advances

This post introduces the new advanced customer churn simulation for learning the techniques of Fighting Churn With Data (FCWD). The book was originally released with a simple simulation. The simple simulation served its purpose but had gaps: some of the code from Fighting Churn With Data either would not run or produced meaningless results on it. The new advanced churn simulation solves this. Now all of the code in Fighting Churn With Data can be studied, tested, and practiced using the new simulation. Table 1 compares the features of the two simulations:

Advanced Simulation (New!) | Simple Simulation (Classic)
5 different product plans with different prices and limits | 1 product plan with one price and no limits
Discounts for some customers | No discounts
Up to 100 users per customer account | Single-user customer accounts only
Monthly, bi-annual, and annual payment options | Monthly payments only
25 different user actions | 8 different user actions
2 user actions with $ values to analyze | No valued user actions
All FCWD code runs and gives meaningful results | Not all FCWD code runs
Table 1: Features of the advanced churn simulation

This means two things for users:

  1. You can practice all the techniques in Fighting Churn With Data.
  2. The data is challenging enough to test machine learning models.

All of the simulation code is available as open source on GitHub, and you can customize the simulation to generate unique data sets. Customizing the simulation code will be the subject of a future post. For more information on the advanced churn simulation, check out the ChurnSim White Paper hosted on SSRN (the Social Science Research Network).

Churn Simulation Overview

Figure 1 summarizes the ChurnSim model. Customers choose product plans and take actions. Customers take actions at a rate determined by a Log-Normal behavioral model. Customer actions provide positive or negative utility. This is utility in the economic sense, meaning usefulness or enjoyment. Next, the customer’s utility determines whether they churn, upgrade, or downgrade. Finally, the observable data is written to a simple but realistic churn database.

Figure 1: ChurnSim Model Overview
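
The flow in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration and not the ChurnSim code: the function name simulate_month, every parameter value, the saturating utility, and the logistic churn rule are assumptions made for this example. It also leaves out plans, upgrades, downgrades, and the database step.

```python
import math
import random

def simulate_month(rng, mu=1.0, sigma=1.0, price=20.0):
    """One simulated customer-month (hypothetical, heavily simplified)."""
    # Behavior: the monthly action rate is drawn from a log-normal distribution.
    rate = rng.lognormvariate(mu, sigma)
    # Utility: grows with activity but levels off (assumed functional form).
    utility = 50.0 * (1.0 - math.exp(-rate / 5.0))
    # Decision: the more utility exceeds price, the lower the churn probability.
    churn_prob = 1.0 / (1.0 + math.exp((utility - price) / 5.0))
    churned = rng.random() < churn_prob
    return {"rate": rate, "utility": utility,
            "churn_prob": churn_prob, "churned": churned}

rng = random.Random(42)  # fixed seed so the run is repeatable
customers = [simulate_month(rng) for _ in range(1000)]
churn_rate = sum(c["churned"] for c in customers) / len(customers)
print(f"simulated monthly churn rate: {churn_rate:.1%}")
```

Only the rate and the churn outcome would be observable in a real data set; the utility stays hidden inside the model, which is what makes the resulting data a useful exercise.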

Realistic Churn Rate Simulation

A realistic set of churn rates is one result of the new simulation. Figure 2 shows simulated churn rates that are realistic for an enterprise SaaS company. First, the MRR churn rate is higher than the logo (count-based) churn rate. Second, the net retention rate is greater than 100%. Net retention greater than 100% is sometimes called “negative churn”.

Figure 2: Churn Simulation Realistic Churn Rates

The realistic relationship between the churn rates results from careful construction of the simulation. The simulation balances the utility that customers receive with the price they pay for accounts with different numbers of users. To learn more about churn rate calculation, check out this post about churn rate calculation with SQL.
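
To make the three rates concrete, here is a small worked example in Python. The ten accounts are made up for illustration and are not output from the simulation. Definitions of MRR churn vary; this sketch assumes it counts MRR lost to both cancellations and downgrades, and that net retention is the ending MRR of the starting customers divided by their starting MRR.

```python
# Hypothetical accounts: (MRR at start of period, MRR at end of period).
accounts = [
    (100, 100), (100, 100), (100, 100), (100, 150),
    (200, 200), (200, 300), (200, 260),
    (300, 600),   # large expansion, e.g. more users added
    (300, 240),   # downgrade
    (400, 0),     # churned
]

start_mrr = sum(start for start, _ in accounts)

# Logo churn: share of starting customers that left.
logo_churn = sum(1 for _, end in accounts if end == 0) / len(accounts)

# MRR churn (assumed definition): MRR lost to cancellations and downgrades.
mrr_churn = sum(max(start - end, 0) for start, end in accounts) / start_mrr

# Net retention: ending MRR of starting customers over their starting MRR.
net_retention = sum(end for _, end in accounts) / start_mrr

print(f"logo churn:    {logo_churn:.1%}")
print(f"MRR churn:     {mrr_churn:.1%}")
print(f"net retention: {net_retention:.1%}")
```

With these made-up numbers, logo churn is 10%, MRR churn is 23%, and net retention is 102.5%. One large account cancelling pushes MRR churn above logo churn, while expansion in the remaining accounts lifts net retention above 100%.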

Realistic Behavior Model in a Churn Simulation

The Log-Normal behavior rate model makes the simulated behavior frequency realistic. Figure 3 shows that the frequency distribution has a long tail: a few simulated customers have rates far above the mean. Further, the log-log plot (right) shows that the frequency of high rates follows a power law.

Figure 3: Long tail / power law distribution of customer behaviors
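
The long tail is easy to reproduce with the Python standard library. The sketch below draws action rates from a log-normal distribution; the parameters mu = 1.0 and sigma = 1.0 and the sample size are illustrative assumptions, not the values ChurnSim uses.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
# Illustrative parameters only; not taken from the simulation.
rates = [rng.lognormvariate(1.0, 1.0) for _ in range(10_000)]

mean_rate = statistics.fmean(rates)
median_rate = statistics.median(rates)
top_1pct = sorted(rates)[-100:]            # the 100 most active customers
top_share = sum(top_1pct) / sum(rates)     # their share of all activity

print(f"median rate: {median_rate:.2f}")
print(f"mean rate:   {mean_rate:.2f}")
print(f"max rate:    {max(rates):.2f}")
print(f"share of activity from top 1% of customers: {top_share:.1%}")
```

The mean sits well above the median, and the most active 1% of customers account for far more than 1% of all activity, which is the signature of a long-tailed distribution.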

Beyond a certain point, more and more actions by the customer have no further impact on the churn rate. Figure 4 shows the diminishing returns to customer actions: a few actions per month lead to a big churn rate reduction. However, after a certain point, additional actions have no impact. These diminishing returns are a core part of the simulation model for customer utility. For more information, check out this post on customer behavioral analysis.

Figure 4: Diminishing returns in churn reduction
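
One way to get this shape is to pass a saturating utility through a logistic function. The sketch below is an assumption for illustration: the function churn_probability, its exponential saturation, and its constants are not taken from the simulation code.

```python
import math

def churn_probability(actions_per_month, scale=5.0, max_utility=4.0):
    """Hypothetical churn curve with diminishing returns to activity."""
    # Saturating utility: each extra action adds less than the one before.
    utility = max_utility * (1.0 - math.exp(-actions_per_month / scale))
    # Logistic link: more utility means a lower churn probability.
    return 1.0 / (1.0 + math.exp(utility - 1.0))

for n in (0, 2, 5, 10, 20, 50, 100):
    print(f"{n:>3} actions/month -> churn probability {churn_probability(n):.3f}")

# Compare the effect of five extra actions at low and at high activity.
early_gain = churn_probability(0) - churn_probability(5)
late_gain = churn_probability(50) - churn_probability(55)
print(f"reduction from 0 to 5 actions:   {early_gain:.4f}")
print(f"reduction from 50 to 55 actions: {late_gain:.4f}")
```

The first five actions cut the churn probability sharply, while five more actions for an already active customer change it by almost nothing.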

Realistic Machine Learning Results

One interesting use of the simulation is analyzing variability in the results of machine learning models. Figure 5 shows the results of SHAP analysis for an XGBoost model trained on three different instantiations of the data set. The top three most important features are the same for every model. However, after the top three features there is a lot of variability in the ranking. To learn more, read this post on explainable machine learning with SHAP analysis.

Figure 5: Results of SHAP analysis for an XGBoost model trained on three different instantiations of the data set.
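
A simple way to quantify this kind of variability is to record the best and worst rank each feature reaches across runs. The sketch below assumes the importance rankings have already been computed. The helper rank_spread is hypothetical, and the feature names and rankings are placeholders, not results from the simulation or from Figure 5.

```python
def rank_spread(rankings):
    """Map each feature to (best rank, worst rank) across runs; ranks start at 1."""
    spread = {}
    for ranking in rankings:
        for position, feature in enumerate(ranking, start=1):
            best, worst = spread.get(feature, (position, position))
            spread[feature] = (min(best, position), max(worst, position))
    return spread

# Placeholder rankings for three hypothetical runs (most important first).
run_rankings = [
    ["feature_a", "feature_b", "feature_c", "feature_d", "feature_e", "feature_f"],
    ["feature_a", "feature_b", "feature_c", "feature_e", "feature_f", "feature_d"],
    ["feature_a", "feature_b", "feature_c", "feature_f", "feature_d", "feature_e"],
]

spread = rank_spread(run_rankings)
for feature, (best, worst) in spread.items():
    label = "stable" if best == worst else "varies"
    print(f"{feature}: best rank {best}, worst rank {worst} ({label})")
```

In this placeholder data the first three features hold the same rank in every run, while the remaining features move between ranks 4 and 6, mirroring the pattern described above.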

Learn More!

In conclusion, the new simulation has a lot of great, realistic features. By using the new simulation, students and machine learning researchers can learn a lot about customer churn. For more information, check out the ChurnSim White Paper hosted on SSRN (the Social Science Research Network). You can also download the simulation code from GitHub. Lastly, if you want to set up and run the churn simulation yourself, watch the demonstration video: