Clustering — A New Way for MMM Modelling

Rhydham Gupta
6 min read · May 17, 2022

Clustering is very important in data science. Even before data science became popular in industry, business decisions were often made on the basis of clustering.

  • Launching a new product — you want to segment and define your target audience.
  • Launching a marketing campaign — you identify the segment you want to address and tailor the content accordingly.
  • Declining profits — you identify the product categories that are driving the drop in profits, etc.

Whenever we talk about clustering, the first solution that comes to mind is K-means clustering. Rightly so: it is a powerful technique and simple to interpret. But in this article, I want to touch upon the following:

  1. What is the use of clustering in MMM modelling
  2. Why K-means might not suit our clustering requirements
  3. What a possible solution looks like

Now, let’s start with our example of the FoodChocolate company, which is doing MMM modelling to understand and optimize the impact of various marketing channels.

First of all, what is the role of clustering in MMM modelling?

We have store-level data for the FoodChocolate company: sales data and marketing data (number of sales calls made to a store, banner marketing at the store, emails sent to each store, etc.). Now, when you want to fit the model, you have various options:

  • You can fit a regression model at the store level
  • You can aggregate the data at the DMA/state level and fit the model at that level
  • You can fit the regression model at the national level (the only data points would be different dates)

The last option, fitting a national model, seems the most convenient, but it has its own set of challenges. Firstly, you have very few data points, and if you are fitting 10–20 variables your model is almost certain to overfit. Moreover, by aggregating the data you average out a lot of signal. For example, if store1 is influenced more by sales calls while store2 is influenced more by emails, aggregation makes you lose the power to distinguish between the two. You will get the same coefficient for all stores, which might not serve our purpose.

On the other hand, if you try to fit the model at the store level, the data sparsity will be very high, which will make it very difficult to fit the model.

Fitting at the DMA/state level would be a better option, but there is still a high chance that a store in a particular state will respond completely differently from the other stores in the same state.

Understand this point —

Clustering is done to group together homogeneous stores that are most likely to respond similarly to marketing.

Why K-Means might not suit our clustering requirements?

  1. How do we decide which variables to include in the clustering? In K-means clustering, that decision is based entirely on personal judgement.
  2. In K-means, if there are correlated variables, the clusters become highly biased, because that dimension gets more weight in deciding the clusters.
  3. In MMM modelling, even the dependent variable, sales, has two dimensions: overall sales volume and the time series (the sales corresponding to each week/month). Since we are modelling the time series, some similarity in the sales patterns of the stores is also required to group them together, and there is no straightforward way to incorporate time-series data into K-means clustering.

So, for the above-mentioned reasons, it is not a good idea to use plain K-means clustering in the case of MMM modelling.

What’s the solution?

Now, this approach is unconventional and fairly involved, so you will need to pay a little more attention to follow it.

So, let me first share the framework of the approach with you.

The framework of the Clustering Approach

Step-1 Making the clusters on the individual dimensions

Remember that one of the shortcomings of K-means clustering is that if there are multiple correlated variables, they get more say in the clustering. We solve this problem by running K-means separately on each individual dimension (a set of variables that are, most of the time, highly correlated with one another).

Let’s say that in our FoodChocolate data, we identified three main pillars of the data that are important for clustering.

  • Sales dimension — total sales ($), total units sold, chocolate sales, wafer sales, other sales
  • Marketing dimension — calls data, emails, messages, Facebook posts
  • Similar growth trend — we also want stores that have shown similar growth in sales grouped together, since they are more likely to show similar responsiveness to marketing. Hence we use time-series clustering for this dimension; DTW (dynamic time warping) clustering is one technique that can be used.
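As a minimal sketch of this step (the data and variable names are made up for illustration), K-means can be run separately on each dimension using scikit-learn, standardizing first so that no single variable dominates. For the growth-trend dimension, a DTW-based method such as tslearn's TimeSeriesKMeans could be substituted for plain K-means:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_stores = 60

# Toy store-level features for two of the dimensions (names hypothetical).
sales = rng.normal(size=(n_stores, 5))      # total sales, units, chocolate, wafers, other
marketing = rng.normal(size=(n_stores, 4))  # calls, emails, messages, posts

labels = {}
for name, X in [("sales", sales), ("marketing", marketing)]:
    X_std = StandardScaler().fit_transform(X)  # scale so no variable dominates
    labels[name] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_std)

# For the growth-trend dimension, tslearn's TimeSeriesKMeans(metric="dtw")
# could be run on each store's weekly sales series instead.
print(sorted(set(labels["sales"])))  # three cluster ids per dimension
```

Running K-means per dimension like this is what keeps correlated variables from swamping the distance metric, since they only compete within their own dimension.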

Now let’s assume for the moment that we get 3 clusters for each dimension. The maximum number of clusters we can get is 3 × 3 × 3 = 27. For any analysis, this many clusters is very difficult to comprehend. We need a way to combine some of these clusters. BUT HOW?
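One simple way to form these combined cells is to concatenate the per-dimension labels into a composite key, as in this sketch with made-up labels for six stores:

```python
import pandas as pd

# Hypothetical per-dimension cluster labels for six stores (3 clusters each).
df = pd.DataFrame({
    "sales_cl": [0, 0, 1, 2, 1, 2],
    "mktg_cl":  [1, 1, 0, 2, 0, 2],
    "trend_cl": [2, 2, 0, 1, 0, 1],
})

# A composite label such as "0-1-2" identifies one of up to 3*3*3 = 27 cells.
df["cluster"] = (df["sales_cl"].astype(str) + "-"
                 + df["mktg_cl"].astype(str) + "-"
                 + df["trend_cl"].astype(str))
print(df["cluster"].nunique())  # → 3 distinct combinations occur in this toy data
```

In practice many of the 27 cells will be empty or tiny, which is another reason the combining step below is needed.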

Step-2 Defining the homogeneity criterion used to combine these clusters

On which dimension should these clusters be clubbed? Coming back to the original idea of doing clustering in MMM modelling:

Stores with similar marketing and similar sales are highly likely to show similar responsiveness to marketing.

So, the next question is how to combine sales and marketing to define the homogeneity criterion. Take the dependent variable (sales) and the most important marketing variables (calls, emails), apply principal component analysis, and select PC1 (the first principal component) as the homogeneity criterion. Put another way, our clusters should now be able to explain the differences in the means of PC1 with confidence. HOW? Significance testing.
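A sketch of this PCA step on simulated data (the variable names and correlation structure are illustrative only): standardize sales and the key marketing variables, then keep the first principal component as a single homogeneity score per store.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_stores = 100

# Hypothetical store-level variables: sales plus the key marketing drivers.
sales = rng.normal(100, 20, n_stores)
calls = 0.5 * sales + rng.normal(0, 5, n_stores)   # correlated with sales
emails = rng.normal(50, 10, n_stores)

X = np.column_stack([sales, calls, emails])
X_std = StandardScaler().fit_transform(X)  # put variables on a common scale

pca = PCA(n_components=1)
pc1 = pca.fit_transform(X_std).ravel()  # one homogeneity score per store
print(pc1.shape)  # → (100,)
```

PC1 captures the dominant shared variation in sales and marketing, so comparing cluster means on PC1 is a compact proxy for comparing them on all of these variables at once.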

Step-3 Using Hypothesis testing to combine clusters

Now, fundamentally, we have to take each pair of clusters and check whether the sample means of the homogeneity series are significantly different. This can be done using ANOVA: pairs with p-value < 0.05 stay separate, and the others should be clubbed.
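A single pairwise comparison might look like this sketch, using scipy's one-way ANOVA on simulated PC1 scores (cluster names and effect sizes are made up):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(2)

# Hypothetical PC1 scores for three candidate clusters.
pc1_a = rng.normal(0.0, 1.0, 40)
pc1_b = rng.normal(0.1, 1.0, 40)   # mean barely different from A
pc1_c = rng.normal(2.0, 1.0, 40)   # mean clearly different from A

# If p >= 0.05 the pair is indistinguishable on PC1 and can be clubbed.
_, p_ab = f_oneway(pc1_a, pc1_b)
_, p_ac = f_oneway(pc1_a, pc1_c)
print(p_ab, p_ac)  # p_ac should be tiny; A and C stay separate
```

With two groups, one-way ANOVA is equivalent to a two-sample t-test, so either framing works for the pairwise check.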

Now, thinking about it from a coding perspective: with 27 clusters, the total number of pairs would be ²⁷C₂ = 351. A more tractable option is to do it in a loop, one dimension at a time:

The first step is to assign a priority order to the dimensions: we run an ANOVA test for each dimension and rank the dimensions by descending F-statistic.

Then take one dimension at a time, apply pairwise ANOVA (Tukey's HSD) to its clusters, and club the clusters that do not show a significant difference in PC1. In the next iteration, cross the combined clusters with the next dimension to create new clusters, apply ANOVA, and club again; repeat the same process for all the dimensions. Below is an example:

Combining Cluster Process
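The clubbing step for one dimension can be sketched as follows, using statsmodels' Tukey HSD on simulated PC1 scores (the cluster labels and data are hypothetical): every pair whose difference is not significant gets merged.

```python
import numpy as np
from itertools import combinations
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(3)

# Hypothetical PC1 scores: clusters "A" and "B" share a mean, "C" is far away.
pc1 = np.concatenate([rng.normal(0, 1, 50), rng.normal(0, 1, 50), rng.normal(4, 1, 50)])
groups = np.array(["A"] * 50 + ["B"] * 50 + ["C"] * 50)

res = pairwise_tukeyhsd(pc1, groups, alpha=0.05)

# Club every pair whose difference is not significant (reject == False),
# using a tiny union-find so chains of merges stay consistent.
parent = {g: g for g in res.groupsunique}

def find(g):
    while parent[g] != g:
        g = parent[g]
    return g

for (g1, g2), reject in zip(combinations(res.groupsunique, 2), res.reject):
    if not reject:
        parent[find(g2)] = find(g1)  # merge indistinguishable clusters

merged = {g: find(g) for g in res.groupsunique}
print(merged)  # "C" keeps its own label; "A" and "B" are likely clubbed
```

Tukey's HSD is preferable to running 351 separate ANOVAs because it adjusts the p-values for the multiple comparisons within each dimension.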

Finally, after combining you will get the final clusters.

Hope you find this approach Interesting!
