Churn Prediction (in Retail): Approach in a Real Project

Rhydham Gupta
Oct 30, 2022

In simple terms, Churn Prediction means predicting which customers will stop purchasing in the near future. But why do we need it?

Say we own a grocery store named ATmart, and we’ve posted solid growth of 10% for the past 2 years. But current-year estimates project growth of about -1% at the current pace. What should we do?

There are 2 possible strategies: 1) acquire new customers and 2) increase the retention rate of existing customers. Which should be our priority? Customer acquisition is often the most expensive step in the customer journey, so let’s focus on the second strategy. The first step to increasing the retention rate is to identify the customers who will stop buying in the coming months. This is what we refer to as Churn Prediction. Let’s see it in action.

At ATmart, churn analysis gives us a list of customers who are at risk of leaving. We segment them further into 3 groups and send each group an offer:
Group 1: Additional 5% discount next month
Group 2: Gift coupon on a minimum purchase of 2K
Group 3: 10% discount on combo packs
Hopefully, this puts us back on the path to growth.

Since we have already seen the importance, let’s see how to do Churn Prediction Analysis the right way.

Steps:

  1. Defining the modeling construct
  2. Input dataset with dependent and independent variables
  3. Choosing the right classification model
  4. Evaluation

Defining the modeling construct

This is a very important step in building a churn prediction model. It is highly subjective: I have seen average data scientists rush through it, while good data scientists spend a good amount of time here. The step is mostly about making decisions. Let’s look at it in more detail.

For ATmart, a few questions that we would like to answer:

  1. How do we define an eligible customer for our case?
    For example, these could be customers who have transacted in
    the current month,
    or the past 2 months,
    or the past 1 year,
    or maybe a more complex condition, like someone who has transacted in at least 3 months of the past year. You get the point.
  2. What is the definition of churn that we want to predict?
    For example, a churned customer can be one
    who will not make any transaction in the next 1 month,
    or who will not make any transaction in the next 2 months, etc.

Now, the common approach to answering these questions is to ask the business and dig deep into the data. For ATmart, we contacted the store manager, who said that if a customer does not make any transaction for 2 months, the chances of them coming back are very low. So our churned customer is one who does not make any transaction in 2 months.
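One way to sanity-check the store manager's 2-month rule against the data is to look at how long customers actually stay away before coming back. The snippet below is only a rough sketch of that idea, not the analysis used at ATmart: the purchase records, the function names and the 60-day threshold (standing in for "2 months") are all made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase log: (customer_id, purchase_date)
purchases = [
    ("A", date(2022, 1, 5)), ("A", date(2022, 1, 25)), ("A", date(2022, 2, 20)),
    ("B", date(2022, 1, 10)), ("B", date(2022, 4, 20)),
    ("C", date(2022, 3, 1)), ("C", date(2022, 3, 15)), ("C", date(2022, 4, 2)),
]

def return_gaps_in_days(purchases):
    """Days between consecutive purchases, pooled over all customers."""
    by_customer = defaultdict(list)
    for customer_id, purchase_date in purchases:
        by_customer[customer_id].append(purchase_date)
    gaps = []
    for dates in by_customer.values():
        dates.sort()
        gaps.extend((later - earlier).days for earlier, later in zip(dates, dates[1:]))
    return gaps

def share_returning_after(gaps, threshold_days):
    """Share of repeat purchases that happened after a gap longer than the threshold."""
    if not gaps:
        return 0.0
    return sum(gap > threshold_days for gap in gaps) / len(gaps)

gaps = return_gaps_in_days(purchases)
share_after_60_days = share_returning_after(gaps, threshold_days=60)
print(gaps, share_after_60_days)
```

If only a small share of repeat purchases happen after a gap longer than the threshold, treating a silent customer as churned at that point is a reasonable working definition.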

Secondly, for the definition of eligible customers, we need to go into the data. But what should we look for? Understand this point carefully: our objective is to learn the characteristics of customers who have left us in the past. If churned customers start showing a change in buying pattern about 3 months before they leave, then our eligible customer can be someone who has had at least 1 transaction in the last 3 months. Note that we finalize this using a detailed analysis that I will cover in another article. Hurray! We have our answers, so let’s proceed to the next step.

Input dataset with dependent and independent variables

If you are wondering why we are talking about so many definitions, let’s connect the dots in this step. Any model needs dependent and independent variables, but we only have a transactional dataset. How do we get there?

Note: We have data till Oct’22 and we are trying to predict churn for Nov-Dec’22.

Let’s begin with the prediction dataset. For each customer, we need an output that gives us the churn probability, so the data has to be aggregated at the customer level: one row per customer, with the independent variables as columns.

Since we only have data till Oct’22, we aggregate the independent variables over Aug-Oct’22, feed them into the model, and get the churn predictions for the next two months.

Now you might be wondering how to create the training dataset. Let’s see.

Dependent Variable: We create a flag using the 2-month churn condition. Since we have data till Oct’22, we use the last 2 months (Sep-Oct’22) to create the dependent flag. Any eligible customer (one who made at least 1 transaction in the preceding 3-month period, June-Aug’22) who made no transaction in Sep-Oct’22 is a churned customer. We put 1 against those customers and 0 against the others.
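To make the flag concrete, here is a minimal pandas sketch of the labelling rule described above. The tiny transaction table, the column names (`customer_id`, `txn_date`, `amount`) and the helper `build_churn_labels` are assumptions made for illustration; a real transaction log will look different. Windows are half-open, so June-Aug’22 is written as 2022-06-01 up to (but not including) 2022-09-01.

```python
import pandas as pd

# Hypothetical transaction log: one row per purchase
transactions = pd.DataFrame(
    {
        "customer_id": ["A", "A", "B", "C", "C", "D"],
        "txn_date": pd.to_datetime(
            ["2022-06-15", "2022-09-03", "2022-07-20",
             "2022-08-01", "2022-10-12", "2022-09-25"]
        ),
        "amount": [40.0, 25.0, 60.0, 15.0, 30.0, 55.0],
    }
)

def build_churn_labels(txns, obs_start, obs_end, out_start, out_end):
    """One row per eligible customer with a 0/1 churn flag.

    Eligible: at least one transaction in [obs_start, obs_end).
    Churned:  no transaction in [out_start, out_end).
    """
    obs_start, obs_end = pd.Timestamp(obs_start), pd.Timestamp(obs_end)
    out_start, out_end = pd.Timestamp(out_start), pd.Timestamp(out_end)
    in_obs = (txns["txn_date"] >= obs_start) & (txns["txn_date"] < obs_end)
    in_out = (txns["txn_date"] >= out_start) & (txns["txn_date"] < out_end)
    eligible = sorted(txns.loc[in_obs, "customer_id"].unique())
    active_later = list(txns.loc[in_out, "customer_id"].unique())
    labels = pd.DataFrame({"customer_id": eligible})
    labels["churned"] = (~labels["customer_id"].isin(active_later)).astype(int)
    return labels

labels = build_churn_labels(
    transactions,
    obs_start="2022-06-01", obs_end="2022-09-01",   # June-Aug'22
    out_start="2022-09-01", out_end="2022-11-01",   # Sep-Oct'22
)
print(labels)
```

In this toy data, customer B bought in July but not in Sep-Oct, so B is flagged as churned. Customer D only appears in September, so D is not eligible and does not enter the training set at all.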

Independent Variables: We can have many independent variables, but note that every aggregation should cover only the 3-month period June-Aug’22, so that no information from the Sep-Oct’22 outcome window leaks into the features. Some examples are total transaction amount, number of unique products purchased, frequency of purchase, recency, payment preference, etc. This exercise should be exhaustive: try to include every variable that you think could be an indicator of customers who churned.
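A few of these variables can be sketched with a single pandas `groupby`. Again, the sample data, the column names and the helper `build_features` are assumptions for illustration, and only a handful of the possible features are shown.

```python
import pandas as pd

# Hypothetical transaction log with a product column
transactions = pd.DataFrame(
    {
        "customer_id": ["A", "A", "A", "B", "C"],
        "txn_date": pd.to_datetime(
            ["2022-06-05", "2022-07-10", "2022-08-28", "2022-06-20", "2022-09-02"]
        ),
        "product_id": ["p1", "p2", "p1", "p3", "p1"],
        "amount": [20.0, 35.0, 10.0, 50.0, 99.0],
    }
)

def build_features(txns, window_start, window_end):
    """Aggregate transactions in [window_start, window_end) to one row per customer."""
    start, end = pd.Timestamp(window_start), pd.Timestamp(window_end)
    window = txns[(txns["txn_date"] >= start) & (txns["txn_date"] < end)]
    features = window.groupby("customer_id").agg(
        total_amount=("amount", "sum"),
        n_transactions=("txn_date", "count"),
        n_unique_products=("product_id", "nunique"),
        last_txn=("txn_date", "max"),
    )
    # Recency: days between the last purchase and the end of the window
    features["recency_days"] = (end - features["last_txn"]).dt.days
    return features.drop(columns="last_txn").reset_index()

# Training features use only June-Aug'22
train_features = build_features(transactions, "2022-06-01", "2022-09-01")
print(train_features)
```

Keeping the window as a parameter means the same function can build the prediction dataset later: calling it with the Aug-Oct’22 window (2022-08-01 to 2022-11-01) gives the rows to score for Nov-Dec’22.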

Choosing the right classification model

We have already done the heavy lifting. Time to choose a model for our training dataset. In my experience, tree-based algorithms such as Random Forest and XGBoost work well in most cases, as they provide good accuracy and interpretability.
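As a rough illustration, here is how a Random Forest could be fitted with scikit-learn on customer-level features. The data below is synthetic and the rule that generates churn is invented purely so the example runs end to end; in a real project the features and flag would come from the training dataset built earlier.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
feature_names = ["recency_days", "n_transactions", "total_amount"]

# Synthetic stand-ins for customer-level features
recency_days = rng.integers(1, 92, size=n)
n_transactions = rng.integers(1, 20, size=n)
total_amount = rng.uniform(10, 500, size=n)
X = np.column_stack([recency_days, n_transactions, total_amount])

# Invented rule: customers who have not bought for a while churn more often
churn_prob = 1 / (1 + np.exp(-(recency_days - 60) / 10))
y = (rng.random(n) < churn_prob).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Churn probability per customer, plus a quick look at what drives the model
churn_scores = model.predict_proba(X_test)[:, 1]
importances = dict(zip(feature_names, model.feature_importances_))
print(importances)
```

The feature importances are one simple way to get at the interpretability mentioned above: they show which variables the forest relied on most when separating churners from non-churners.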

Sometimes the event rate (the proportion of churned customers in the training dataset) is very small, so you often have to use techniques for handling an imbalanced dataset.
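One such technique is class weighting, where mistakes on the rare churn class are penalised more heavily during training. The sketch below uses scikit-learn's `compute_class_weight` on a synthetic dataset with a 5% event rate; the data is made up, and class weighting is just one option (resampling the training data is another common one).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.default_rng(42)

# Synthetic, imbalanced training set: 50 churners out of 1000 customers
y = np.array([1] * 50 + [0] * 950)
X = rng.normal(size=(1000, 3)) + y[:, None] * 1.5  # churners are shifted

event_rate = y.mean()

# "balanced" weights are n_samples / (n_classes * class_count)
classes = np.array([0, 1])
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weight = dict(zip(classes.tolist(), weights.tolist()))

model = RandomForestClassifier(
    n_estimators=100, class_weight=class_weight, random_state=0
)
model.fit(X, y)
print(event_rate, class_weight)
```

With a 5% event rate, each churner ends up weighted roughly 19 times as heavily as a non-churner. Passing `class_weight="balanced"` directly to the classifier gives the same weights without computing them by hand.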

Evaluation

As a general practice, we use cross-validation to check that the model is not overfitting. For evaluation, we can use metrics like Precision, Recall, F1-score, the ROC curve, etc.
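Here is a small sketch of what that could look like with scikit-learn's `cross_validate`, using stratified folds so each fold keeps roughly the same churn rate. The data is synthetic and the model settings are arbitrary; the point is only to show train and test scores side by side.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

rng = np.random.default_rng(1)
n = 600

# Synthetic features and churn flag (the first feature carries the signal)
X = rng.normal(size=(n, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 1.0).astype(int)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_validate(
    RandomForestClassifier(n_estimators=100, random_state=1),
    X,
    y,
    cv=cv,
    scoring=["precision", "recall", "f1", "roc_auc"],
    return_train_score=True,
)

# Average each metric over the 5 folds
summary = {
    name: float(np.mean(values))
    for name, values in scores.items()
    if name.startswith(("train_", "test_"))
}
for name, value in sorted(summary.items()):
    print(f"{name}: {value:.3f}")
```

A large gap between the `train_` and `test_` versions of a metric is the usual hint that the model is overfitting.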

Another very popular way of validating churn models is decile analysis with gain charts and lift charts. The idea is simple: if I pick 10% of the customers at random, I expect to capture about 10% of the total churned customers. If I instead pick the top 10% of customers by descending predicted probability, what % of the total churned customers do I capture? If, say, that 10% of customers captures 30% of the total churned customers, the model gives a lift of 3x over random selection.
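The arithmetic behind a gain chart fits in a few lines of plain Python. The sketch below uses 100 made-up customers, 10 of whom churned, with scores arranged so that 3 churners land in the top decile, which mirrors the 10%-captures-30% example above. The function name and the data are illustrative assumptions.

```python
def cumulative_gain(y_true, scores, n_bins=10):
    """Cumulative share of churners captured after each bin, highest scores first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_churners = sum(y_true)
    gains = []
    captured = 0
    start = 0
    for b in range(1, n_bins + 1):
        end = round(b * len(order) / n_bins)
        captured += sum(y_true[i] for i in order[start:end])
        gains.append(captured / total_churners)
        start = end
    return gains

# 100 hypothetical customers, already listed from highest to lowest score
scores = [1 - i / 100 for i in range(100)]
churner_ranks = {0, 3, 7, 12, 15, 24, 38, 51, 66, 90}
y_true = [1 if i in churner_ranks else 0 for i in range(100)]

gains = cumulative_gain(y_true, scores)

# Lift of the top decile: share captured divided by share of customers contacted
top_decile_lift = round(gains[0] / (1 / 10), 2)
print(gains[0], top_decile_lift)
```

Plotting `gains` against 10%, 20%, ..., 100% of customers gives the gain chart, and dividing each value by its customer share gives the lift chart.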

Hope you enjoyed this article! Check out my blog for more such articles.

Rhydham Gupta

I am a Data Scientist. I believe that observing and decoding data is an art. Same data, different eyes, different stories.