Decoding Data Sparsity for MMM Modelling

Rhydham Gupta
3 min read · Mar 23, 2022

You are trying to fit a regression model. Suppose you introduce additional observations that have non-zero values in the independent variables but a dependent value of 0. What do you think will happen to the prediction curve? It will get pulled down. Think about it: the objective of the loss function is to minimize the loss, and when you introduce many observations with a 0 in the dependent variable, the loss can only be minimized when the predictions are close to 0. The model will therefore shrink the coefficients of all the independent variables to reduce the losses on those 0 observations.
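
To see this effect concretely, here is a minimal sketch with made-up data (the numbers and variable names are illustrative, not from any real dataset): we fit the same linear model before and after appending stores that received sales calls but recorded zero sales.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# 100 converted stores: sales roughly proportional to sales calls
calls = rng.uniform(1, 10, size=(100, 1))
sales = 5 * calls[:, 0] + rng.normal(0, 1, size=100)

model = LinearRegression().fit(calls, sales)
print("coefficient without zero rows:", model.coef_[0])  # close to 5

# Append 100 stores that received calls but recorded zero sales
calls_all = np.vstack([calls, rng.uniform(1, 10, size=(100, 1))])
sales_all = np.concatenate([sales, np.zeros(100)])

model_all = LinearRegression().fit(calls_all, sales_all)
print("coefficient with zero rows:", model_all.coef_[0])  # pulled toward 0
```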

Now you might be thinking: this is a completely hypothetical case, so why would anyone add 0 observations to the data in the first place? In fact, it is a real-world scenario, and a very common one in MMM modeling. How? When you market to multiple stores, not all of them convert. The majority will not start selling your goods just because you approached them. That is how the world of marketing works, and it is why we need to understand data sparsity.

Let’s walk through it with an example.

Let’s say there is a chocolate manufacturer, Chocolite, whose sales team reaches out to multiple distributors across the country to sell their chocolates. The question they want to answer is the ROI (Return on Investment) of sales calls. When we say ROI, we also need to specify at what level: Geographical Granularity X Time Dimension.

When you start modeling, you have three options: you can model at the store level, the state level, or the national level. Modeling at a more granular level gives you more granular insights, but it also brings challenges due to data sparsity. What do I mean by challenges? When you build the regression model, there will be many stores that your sales reps reached out to but that either did not convert or contribute a very low volume of sales. Will you include such stores in your modeling? If you decide to exclude some stores, what will be your basis for that decision?

Here comes the role of understanding data sparsity. But how? What is a mathematical way of representing it? Let’s go through some of the ways:

  1. Decile Analysis: Decile analysis is a very popular concept. We first sort the data, then divide it into 10 equal buckets and analyze each bucket. Let’s understand with an example:
[Figure: Decile analysis of the minimum #months of data available for the stores]

I have created a decile analysis of the #months of data available for the stores. Observations: in 4 deciles (40% of the stores) we have no sales at all, and for another 20% of the stores we have sales for fewer than 6 months, even though we are modeling on 2 years of data.

Methodology:

We follow a simple methodology to create the decile analysis: compute the #months of data available for each store, then sort the stores on that value. Each successive 10% of the stores forms one decile, in that order, which gives us the decile analysis.
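
To make these steps concrete, here is a minimal sketch of the decile build in pandas. It assumes a store-month sales table with columns store_id, month, and sales; the column names and the helper function are illustrative, not taken from the original analysis.

```python
import pandas as pd

def months_available_deciles(df: pd.DataFrame) -> pd.Series:
    # Step 1: count months with non-zero sales per store, keeping
    # stores that never sold (count = 0) via the reindex
    months = (
        df[df["sales"] > 0]
        .groupby("store_id")["month"]
        .nunique()
        .reindex(df["store_id"].unique(), fill_value=0)
    )
    # Step 2: slice the sorted stores into 10 equal-sized buckets;
    # rank(method="first") breaks the many ties at 0 months so that
    # qcut can form equal-sized deciles
    deciles = pd.qcut(months.rank(method="first"), 10, labels=range(1, 11))
    # Step 3: report the minimum #months available within each decile
    return months.groupby(deciles, observed=True).min()
```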

This method of checking data sparsity is simple yet effective. Based on it, we need to make a decision; there are generally two options in such cases:

  • Exclude the 0-sales stores.
  • Cluster the stores: model the high-sales stores separately at a granular level, and combine the low-sales stores and model them together at the national level (see the sketch after this list).
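
Here is a minimal sketch of the second option, reusing the illustrative store_id/month/sales columns from the earlier snippet. The 20th-percentile cutoff is an arbitrary placeholder, not a threshold recommended here.

```python
# Split stores by total sales; the cutoff is a placeholder assumption
store_sales = df.groupby("store_id")["sales"].sum()
cutoff = store_sales.quantile(0.20)

high_sales_stores = store_sales[store_sales > cutoff].index

# High-sales stores: keep store-level granularity for modeling
store_level_df = df[df["store_id"].isin(high_sales_stores)]

# Low-sales stores: pool into a single national series by month
national_df = (
    df[~df["store_id"].isin(high_sales_stores)]
    .groupby("month", as_index=False)["sales"]
    .sum()
)
```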

2. Number of 0s: One simple way to gauge the sparsity is to count the number of store/month observations that have 0s.
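
A quick sketch of that check, again on the illustrative store-month table:

```python
# Count store-month rows with zero sales, and stores that never sell at all
zero_rows = (df["sales"] == 0).sum()
print(f"store-months with zero sales: {zero_rows} of {len(df)}")

never_sold = df.groupby("store_id")["sales"].sum().eq(0).sum()
print(f"stores with zero total sales: {never_sold}")
```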
