Churn Prevention with Reinforcement Learning

Creating a churn propensity model is now fairly standard for data scientists. Churn is one of the most common data science problems today, because so many companies depend on recurring revenue. But how do you go from a churn model to churn prevention? It is much harder than it sounds.

The Hazards of Model-Based Churn Prevention

Suppose you have a machine learning model that can predict churn, commonly known as a churn propensity model. How do you prevent churn? It seems obvious: use the model to predict who is going to churn, and then do something to encourage them to stay, such as offering a discount or a free training program. But if you take this approach, there are going to be problems: the churn model doesn’t know about your interventions, so their effects get mixed into the patterns the model learns. Technically, the interventions become confounders.

If you target churn interventions at the most at-risk customers, those customers will churn less because of your interventions. The next time you fit the churn model, it will see a lower churn rate for the targeted customers and carry that lower rate into its future predictions. The effect is concentrated in the most at-risk customers (because those are the ones you targeted), so the model loses accuracy exactly where you need it most: identifying the most at-risk customers.

This undesirable situation is summarized in the diagram below.

Figure: Model-based churn interventions confound the model. Targeting churn interventions using the model’s churn risk score creates confounders in the model.
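To make the mechanism concrete, here is a toy simulation. It assumes, hypothetically, that true churn probabilities are uniform on [0, 0.5), that the model ranks customers perfectly, and that the intervention multiplies a targeted customer’s churn probability by 0.4. The function name and all numbers are illustrative assumptions. In expectation, the targeted top 20% have a true risk of about 0.45 but churn at about 0.45 × 0.4 = 0.18, below the roughly 0.2 churn rate of everyone else, and that observed rate is what the next refit learns from.

```python
import random

def simulate_targeting(n=20_000, top_fraction=0.2, intervention_multiplier=0.4, seed=0):
    """Toy model of refitting after targeting the riskiest customers.

    Hypothetical assumptions: true churn probability ~ Uniform[0, 0.5), the model
    ranks customers perfectly, and the intervention multiplies churn probability
    by `intervention_multiplier` for targeted customers only.
    """
    rng = random.Random(seed)
    # Highest predicted risk == highest true risk under the perfect-ranking assumption.
    risks = sorted((0.5 * rng.random() for _ in range(n)), reverse=True)
    cutoff = int(n * top_fraction)
    targeted, untargeted = risks[:cutoff], risks[cutoff:]

    def observed_churn(group, multiplier):
        # Fraction of the group that actually churns, which is what the refit sees.
        return sum(rng.random() < p * multiplier for p in group) / len(group)

    return {
        "targeted_true_risk": sum(targeted) / len(targeted),
        "targeted_observed_churn": observed_churn(targeted, intervention_multiplier),
        "untargeted_true_risk": sum(untargeted) / len(untargeted),
        "untargeted_observed_churn": observed_churn(untargeted, 1.0),
    }
```

The gap between `targeted_true_risk` and `targeted_observed_churn` is the confounding: the refit model only ever sees the second number.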

Pathological Churn Confounders

That’s only the beginning of the problems. Consider a more pathological situation: suppose a customer behavior predicts churn, and that same behavior also predicts a positive response to the intervention.

Customers with this risk-raising behavior may then be predicted to be low risk, but only because of the intervention. The behavior that should predict high churn risk ends up predicting low churn risk.

For example, suppose some customers are less affluent, and your metrics show they make fewer in-app purchases. Those customers are also at high churn risk. If you intervene by giving them a discount, they are likely to accept it and not churn. The next time you fit your model, it will learn that few in-app purchases are associated with not churning, but only because you are in the habit of giving those customers discounts.
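The in-app purchase example can be worked through with expected churn rates. The segment names, base churn rates, and discount effects below are hypothetical. The point is the sign flip: before the discount policy, low purchases signal higher churn; after it, the data the model is refit on says the opposite.

```python
# Hypothetical churn rates and discount effects, for illustration only.
BASE_CHURN = {"low_purchases": 0.30, "high_purchases": 0.15}
# Less affluent, low-purchase customers respond strongly to a discount.
DISCOUNT_MULTIPLIER = {"low_purchases": 0.3, "high_purchases": 0.8}

def churn_rate(segment: str, discounted: bool) -> float:
    """Expected churn rate for a segment, with or without a discount."""
    rate = BASE_CHURN[segment]
    return rate * DISCOUNT_MULTIPLIER[segment] if discounted else rate

def rates_seen_at_refit(discount_low_purchasers: bool) -> dict:
    """Churn rates the next model refit observes under a given discount policy."""
    return {
        segment: churn_rate(segment, discount_low_purchasers and segment == "low_purchases")
        for segment in BASE_CHURN
    }

before = rates_seen_at_refit(discount_low_purchasers=False)
after = rates_seen_at_refit(discount_low_purchasers=True)
# Before the policy: low purchases -> 0.30 churn vs 0.15 for high purchases.
# After the policy: low purchases -> about 0.09 churn vs 0.15, so the refit model
# "learns" that low purchases mean low risk, even though the behavior is just as risky.
```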

Behavior-Based Churn Interventions

A better approach is to use the behavioral features from your churn model, rather than its risk score, to target interventions. This is the approach advocated in the book Fighting Churn With Data. It has several advantages:

Choosing churn interventions based on specific risky behaviors is more intuitive than choosing interventions based on churn risk alone. It is easy to pick an appropriate intervention. For example:
- If a customer does not use key product features, send them emails promoting those features.
- If a customer doesn’t use the product enough to get good value, recommend that they switch to a lower-tier plan. (Having a tiered pricing plan is better than giving out discounts because it does not undermine your pricing!)
Choosing churn interventions based on behavior leads to less pathological confounding of the churn model. True, your churn rate will go down because you are making interventions, and your model will learn the new, lower churn rate over time. But because interventions are targeted on broad customer behaviors rather than on the model’s risk score, there will not be pathological confounders in the model.
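One way to encode behavior-based targeting is as a small set of rules, each mapping a behavioral metric to an intervention, like the two examples above. This is a minimal sketch: the metric names (`key_feature_uses_per_month`, `usage_per_dollar`), the thresholds, and the intervention labels are all hypothetical. Note that no churn risk score appears anywhere in the targeting logic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]

@dataclass(frozen=True)
class BehaviorRule:
    name: str
    fires: Callable[[Metrics], bool]  # does this customer show the risky behavior?
    intervention: str

# Hypothetical metrics and thresholds, chosen only for illustration.
RULES = [
    BehaviorRule(
        name="low_key_feature_use",
        fires=lambda m: m["key_feature_uses_per_month"] < 2,
        intervention="email_promoting_key_features",
    ),
    BehaviorRule(
        name="low_value_for_plan",
        fires=lambda m: m["usage_per_dollar"] < 0.5,
        intervention="recommend_lower_tier",
    ),
]

def choose_interventions(metrics: Metrics) -> List[str]:
    """Return the intervention for every behavioral rule that fires."""
    return [rule.intervention for rule in RULES if rule.fires(metrics)]
```

For example, a customer who never uses the key features but gets good value for money would receive only the feature-promotion email.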

Targeting Low Risk Customers

Note that in this approach you don’t want to target only customers who are at high risk overall. For example, target all customers who do not use a product feature, including those who may be at low risk because of other behaviors. If a customer is not using a good product feature, it’s worth it for them to learn about it even if they aren’t at risk. If the interventions are costly, you may still have to ration them to the most at-risk customers. But to the extent that you do not target customers based on the model risk score, your model is less vulnerable to pathological confounding.
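The rationing logic can be sketched as follows: everyone whose behavior qualifies is targeted, regardless of risk, and the risk score is consulted only when a budget forces you to ration. The field names (`id`, `risk`), the `select_targets` function, and a `budget` counted in number of interventions are assumptions for illustration.

```python
from typing import Any, Callable, Dict, List, Optional

Customer = Dict[str, Any]

def select_targets(
    customers: List[Customer],
    qualifies: Callable[[Customer], bool],
    budget: Optional[int] = None,
) -> List[Any]:
    """Target every customer with the behavior; ration by risk only if over budget."""
    eligible = [c for c in customers if qualifies(c)]
    if budget is None or budget >= len(eligible):
        # No rationing needed: low-risk customers with the behavior are included too.
        return [c["id"] for c in eligible]
    # Costly intervention: spend the limited budget on the highest-risk qualifiers.
    ranked = sorted(eligible, key=lambda c: c["risk"], reverse=True)
    return [c["id"] for c in ranked[:budget]]
```

The smaller the share of decisions that go through the `budget` branch, the less the model risk score influences who gets treated.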

Figure: Churn interventions using behavioral metrics are not subject to confounding, but miss out on the full power of the model.

Disadvantages of Behavioral Targeting

The disadvantage of this approach is that you are not using the full power of your machine learning model. The model tells you which behaviors lead to churn, but its churn predictions are discarded (or, at best, used for model verification).

Still, behavior-based churn intervention targeting is the best approach for smaller organizations and for those without advanced analytics skills in-house.

Churn Prevention With Reinforcement Learning

There is a way to use a churn model to prevent churn directly, but it requires making data about your interventions a “first class” component of your churn model by adding intervention metrics as features. For example:

If you send emails promoting product features, make the open rate for those emails a model feature (metric). Make separate features for every type of promotion.
If you recommend to a customer that they change their tier, make a model feature representing your past offers to the customer and how they responded.
If you offer customers discounts, make features representing what discounts the customer has and how long they have been receiving discounts.
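Here is a sketch of how intervention metrics like these could be computed from a per-customer event log. The event schema (`promo_email`, `tier_offer`, `discount_start`, `discount_end`), the function name, and the feature names are hypothetical. Events after the `as_of` date are ignored so that features only use information available at prediction time.

```python
from collections import defaultdict
from datetime import date
from typing import Any, Dict, List

def intervention_features(events: List[Dict[str, Any]], as_of: date) -> Dict[str, float]:
    """Turn one customer's intervention history into model features (hypothetical schema)."""
    sent: Dict[str, int] = defaultdict(int)
    opened: Dict[str, int] = defaultdict(int)
    features: Dict[str, float] = {
        "tier_offers": 0, "tier_offers_accepted": 0,
        "discount_pct": 0, "days_on_discount": 0,
    }
    discount_start = None
    for e in sorted(events, key=lambda e: e["date"]):
        if e["date"] > as_of:
            continue  # future events would leak outcomes into the features
        if e["type"] == "promo_email":
            sent[e["promo"]] += 1
            opened[e["promo"]] += int(e["opened"])
        elif e["type"] == "tier_offer":
            features["tier_offers"] += 1
            features["tier_offers_accepted"] += int(e["accepted"])
        elif e["type"] == "discount_start":
            features["discount_pct"] = e["pct"]
            discount_start = e["date"]
        elif e["type"] == "discount_end":
            features["discount_pct"] = 0
            discount_start = None
    # A separate open-rate feature for every type of promotion.
    for promo, n_sent in sent.items():
        features[f"open_rate_{promo}"] = opened[promo] / n_sent
    if discount_start is not None:
        features["days_on_discount"] = (as_of - discount_start).days
    return features
```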

The emphasis of your churn model must shift:

You are no longer predicting churn based on customer behavior.
You are predicting the response to your interventions, given what you know about customer behavior. Customer behavior still matters: it provides the context for modeling the effect of each intervention.
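A minimal sketch of this shift, assuming a logistic model with hypothetical, hand-picked coefficients: churn probability depends on a behavior feature, the intervention, and their interaction, and the customer gets whichever intervention minimizes predicted churn. In a real system the coefficients would be fit on historical data that includes the intervention features; every name and number here is illustrative. With these coefficients, a low-usage customer is best served by a lower-tier offer, while other customers are best served by a feature email, so behavior acts as context for choosing the intervention.

```python
import math
from typing import Dict

# Hypothetical coefficients for illustration only (not fit on real data).
INTERCEPT = -1.0
BEHAVIOR_COEF = {"low_usage": 1.2}
INTERVENTION_COEF = {"none": 0.0, "feature_email": -0.3, "lower_tier_offer": -0.2}
# Interaction terms: how a behavior changes the effect of each intervention.
INTERACTION_COEF = {
    ("low_usage", "lower_tier_offer"): -0.9,
    ("low_usage", "feature_email"): 0.1,
}

def churn_prob(behavior: Dict[str, float], intervention: str) -> float:
    """Predicted churn probability given behavior context and a candidate intervention."""
    z = INTERCEPT + INTERVENTION_COEF[intervention]
    for feature, value in behavior.items():
        z += BEHAVIOR_COEF.get(feature, 0.0) * value
        z += INTERACTION_COEF.get((feature, intervention), 0.0) * value
    return 1.0 / (1.0 + math.exp(-z))

def best_intervention(behavior: Dict[str, float]) -> str:
    """Score every candidate intervention and pick the lowest predicted churn."""
    return min(INTERVENTION_COEF, key=lambda a: churn_prob(behavior, a))
```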

Your modeling framework also needs a new component: Reinforcement Learning.

What Reinforcement Learning Does

Reinforcement learning is a family of algorithms that choose the best intervention for each customer while also guiding the discovery of which interventions work best. There is a tradeoff between giving each customer the best intervention known so far and sampling the interventions enough for the model to learn what works best. Adequate sampling requires some random, and possibly suboptimal, actions. In the reinforcement learning literature, this is known as the tradeoff between exploitation and exploration. Such a system is illustrated in the diagram below.

Figure: Churn reinforcement model interventions. Using a churn model as a component of a reinforcement learning system captures the full power of the model without introducing confounding effects.
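The exploration/exploitation tradeoff can be illustrated with an epsilon-greedy contextual bandit, one of the simplest reinforcement learning strategies. This is a sketch, not a description of any particular product, and a production system would likely use something more sophisticated. It assumes customers fall into discrete segments and that the reward is 1 if the customer is retained after the intervention and 0 otherwise; the class name, segment labels, and simulated retention rates are hypothetical. With probability `epsilon` it explores a random intervention; otherwise it exploits the intervention with the best observed retention rate for that segment.

```python
import random
from collections import defaultdict

class EpsilonGreedyRetention:
    """Epsilon-greedy contextual bandit over discrete customer segments.

    Reward is 1 if the customer was retained after the intervention, else 0.
    """

    def __init__(self, interventions, epsilon=0.1, seed=None):
        self.interventions = list(interventions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.trials = defaultdict(int)    # (segment, intervention) -> times tried
        self.retained = defaultdict(int)  # (segment, intervention) -> customers retained

    def estimate(self, segment, intervention):
        n = self.trials[(segment, intervention)]
        # Untried interventions look infinitely good, so each gets tried at least once.
        return self.retained[(segment, intervention)] / n if n else float("inf")

    def greedy(self, segment):
        """Exploit: the intervention with the best observed retention rate."""
        return max(self.interventions, key=lambda a: self.estimate(segment, a))

    def choose(self, segment):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.interventions)  # explore
        return self.greedy(segment)                      # exploit

    def update(self, segment, intervention, retained):
        self.trials[(segment, intervention)] += 1
        self.retained[(segment, intervention)] += int(retained)

# Hypothetical true retention rates, used only to simulate customer responses.
TRUE_RETENTION = {
    "low_usage": {"none": 0.5, "feature_email": 0.6, "lower_tier_offer": 0.9},
    "high_usage": {"none": 0.8, "feature_email": 0.95, "lower_tier_offer": 0.6},
}

bandit = EpsilonGreedyRetention(["none", "feature_email", "lower_tier_offer"], epsilon=0.1, seed=1)
world = random.Random(2)
for _ in range(20_000):
    segment = world.choice(list(TRUE_RETENTION))
    action = bandit.choose(segment)
    bandit.update(segment, action, world.random() < TRUE_RETENTION[segment][action])
```

In practice `update` would be fed by the intervention-results pipeline, and the simple per-segment counts would be replaced by a regularly refit model of intervention response.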

The downside of model-based churn prevention with reinforcement learning is complexity; it is usually more than a single data scientist can handle alone. Don’t forget: you don’t just need to make a churn model. To make the reinforcement learning use case work, you need a reliable data pipeline that continuously delivers data about the results of interventions. You will also need to refit your model far more frequently: at least once a week, and probably every day. The stakes are much higher than with a standard churn risk model: if your model has issues, they translate directly into poor interventions with your customers. That’s why choosing churn interventions with customer behavioral metrics (described above) is much safer for smaller organizations.

Reinforcement Learning at OfferFit

This is why I am excited to be part of the OfferFit team as the Director of Machine Learning Implementation: OfferFit has created a reinforcement learning platform for churn and all stages of the customer lifecycle. OfferFit’s platform is a turnkey solution for one of the most challenging problems in churn management: churn prevention that uses the full power of machine learning. And of course, reinforcement learning is super effective for the upside of the customer lifecycle: upsell and cross-sell! I’ll write more about churn management with reinforcement learning in future posts.