Combine Multiple Broad Variables into a Single Variable: Advanced MMM Modelling

Rhydham Gupta
4 min read · Jun 15, 2022


In MMM (Marketing Mix Modelling), there are several broad-level variables (variables that are not available at store level), such as:

  • Paid Search: Google, Bing, Firefox (Metrics: Impressions, Clicks)
  • Paid Social: Facebook, Twitter, Snapchat (Metrics: Impressions, Clicks)
  • Website (Metrics: Pageviews)
  • Digital Display Ads (Metrics: Impressions, Clicks)
  • Print Circulations (Metrics: Reads)
  • TV (Metrics: GRP)

SO WHAT’S THE ISSUE?

There are two major issues:

  1. Which metric to model (Impressions vs Clicks): For each variable, we have to decide which metric to model. Compared with earlier times, we now have better ways to track marketing campaigns. Earlier we could only estimate how many people had seen a banner ad, but online campaigns now give us more granular information, such as how many people clicked the ad.
    #clicks is a better indicator of the level of engagement with a marketing channel, and hence is better able to explain the variation in sales due to that channel. The problem is that it is an uncontrollable factor.
    You can only decide the number of advertisements/campaigns you want to show; you have no direct control over clicks. The two are related, though, and this post treats the relationship as roughly linear: more ads lead to more clicks.
  2. Multicollinearity: Using this many variables in the model is very likely to introduce multicollinearity, because many of them move together, and the model’s coefficient estimates then become unstable. So a few variables need to be combined, but each one needs its own weight. Simply adding them together would imply that all variables have the same impact, which is not right.
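
One common way to check the multicollinearity issue is the variance inflation factor (VIF). The sketch below is only an illustration: the data is synthetic, and the names `vif`, `search_impr`, `search_clicks` and `tv_grp` are made up for this example. It regresses each variable on the others and reports 1 / (1 - R²); values well above roughly 5 to 10 are usually read as a warning sign.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (rows = weeks)."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        A = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Synthetic weekly data (assumption): clicks closely track impressions.
rng = np.random.default_rng(0)
n_weeks = 104
search_impr = rng.normal(1000, 100, n_weeks)
search_clicks = 0.05 * search_impr + rng.normal(0, 1, n_weeks)
tv_grp = rng.normal(50, 10, n_weeks)

X = np.column_stack([search_impr, search_clicks, tv_grp])
vifs = vif(X)
print(dict(zip(["search_impr", "search_clicks", "tv_grp"], vifs.round(1))))
```

In this toy data, impressions and clicks inflate each other’s VIF while TV stays close to 1, which is the pattern that motivates combining related variables.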

So what’s the solution?

1. Represent the engagement metric in the form of controllable metrics

Observe the following equations closely:

Website Pageviews = function(Paid Search Impressions, Paid Social Impressions, Display Ads Impressions, Print Circulations, TV)

Paid Search Clicks = function(Paid Search Impressions, Paid Social Impressions, Display Ads Impressions, Print Circulations, TV)

Paid Social Clicks = function(Paid Search Impressions, Paid Social Impressions, Display Ads Impressions, Print Circulations, TV)

Suppose we ran a paid social marketing campaign, say a Facebook video. What factors will affect the number of clicks on that post?

  • The number of Paid Social (Facebook) post impressions: more spending means Facebook will show this ad to more users.
  • The number of Paid Search (Google) impressions: these have an indirect impact. More Google ads make the user more familiar with the brand, increasing the chances of clicking the post.
  • Display ads, TV, Print: again, all these media channels build brand awareness and hence lead to more clicks on the Facebook post.

We can model this linear relationship using techniques like SEM (Structural Equation Modelling). Think of it as a set of linear regression equations estimated together.

Similarly, every other engagement metric can be modelled in the form of the above equations.

Then, based on the estimated coefficients, we can derive the engagement activity. For example, we might get the following equation:

Paid Search Clicks = 0.03*(Paid Social Impressions) + 0.08*(TV GRPs) + 0.78*(Paid Search Impressions) + 0.02*(Print Circulation)

Hence, we can calculate the updated (fitted) engagement activity from the controllable inputs and use this for modelling.
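
To make this step concrete, here is a minimal sketch that uses a single ordinary least squares regression as a stand-in for SEM (a real SEM would estimate all the engagement equations jointly). The data is synthetic and generated from the example coefficients above; the names `drivers`, `observed_clicks` and `fitted_clicks` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_weeks = 156

# Controllable inputs (synthetic, for illustration only).
drivers = ["paid_social_impr", "tv_grp", "paid_search_impr", "print_circ"]
X = np.column_stack([
    rng.uniform(5_000, 20_000, n_weeks),   # paid social impressions
    rng.uniform(50, 300, n_weeks),         # TV GRPs
    rng.uniform(1_000, 8_000, n_weeks),    # paid search impressions
    rng.uniform(10_000, 40_000, n_weeks),  # print circulation
])

# Simulate observed clicks from the example equation plus noise.
true_coef = np.array([0.03, 0.08, 0.78, 0.02])
observed_clicks = X @ true_coef + rng.normal(0, 20, n_weeks)

# Fit clicks on the controllable inputs (no intercept, as in the equation).
coef, *_ = np.linalg.lstsq(X, observed_clicks, rcond=None)

# "Updated" engagement: clicks expressed purely through controllable inputs.
fitted_clicks = X @ coef

for name, c in zip(drivers, coef):
    print(f"{name}: {c:.3f}")
```

The `fitted_clicks` series, rather than the raw clicks, is what would be carried forward into the next step.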

2. Use factor loadings from PCA (Principal Component Analysis) to combine different variables

Now let’s discuss how to combine the multiple variables into one.

We run PCA and take Principal Component 1 (PC1) as the final engagement score to be modelled. The only condition is that PC1 explains most of the variance; the threshold used here is more than 80%.

To understand the intuition behind it, let’s stay with our Facebook post example. For paid social, data is generally available at a geographic level such as State/DMA. Marketing budgets tend to be allocated in proportion to the expected output, so impressions alone do not indicate how much impact the post has, and they show little variation across geographies.

But #clicks can vary significantly from one area to another, indicating that the Facebook campaign is more effective in some areas than in others. Hence, in PCA, clicks will carry more weight than impressions.

Moreover, the clicks used in the PCA are the ones derived from the SEM equations above. Since we control only the number of impressions, not the clicks, we are indirectly capturing the impact of total impressions.
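
The PC1 step can be sketched with plain NumPy as follows. This is an illustrative sketch, not a definitive implementation: the three input series are synthetic and share a made-up latent `awareness` driver, the helper `first_principal_component` is hypothetical, and standardising the inputs before PCA is an assumption on my part.

```python
import numpy as np

def first_principal_component(X):
    """Return (PC1 score, PC1 weights, share of variance explained by PC1)."""
    X = np.asarray(X, dtype=float)
    # Standardise so that no variable dominates purely because of its units.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    weights = Vt[0]
    # The sign of a principal component is arbitrary; orient it positively.
    if weights.sum() < 0:
        weights = -weights
    explained = s[0] ** 2 / np.sum(s ** 2)
    score = Z @ weights
    return score, weights, explained

# Synthetic fitted engagement metrics driven by one shared factor.
rng = np.random.default_rng(7)
n = 120
awareness = rng.normal(0, 1, n)
X = np.column_stack([
    awareness + rng.normal(0, 0.3, n),  # fitted paid search clicks
    awareness + rng.normal(0, 0.3, n),  # fitted paid social clicks
    awareness + rng.normal(0, 0.3, n),  # fitted website pageviews
])

score, weights, explained = first_principal_component(X)
print("PC1 weights:", weights.round(3))
print("variance explained by PC1:", round(float(explained), 3))

# Only use PC1 as the combined variable if it clears the 80% threshold.
use_pc1 = explained > 0.80
```

If `use_pc1` is false, a single component is not summarising the variables well enough, and the grouping of variables is worth revisiting.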

3. After modelling the combined variable in the main model, segregate it in the secondary model

In the primary model, we test only the combined variable.

From that, we get the final contribution of the combined variable; let’s say it contributes 7% of total sales.

Next, we create the secondary model.

It is just like the primary model, except that the dependent variable is not overall sales but the contribution from the combined variable.

As independent variables, we include only the variables that were combined in PCA to create the combined variable.
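
One plausible reading of the secondary model is sketched below, under stated assumptions: the weekly contribution series `combined_contribution` is synthetic (in practice it would come out of the primary model), the split uses a plain least squares fit, and each channel’s share is taken as its fitted contribution divided by the total. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_weeks = 104

# Variables that were combined in PCA (synthetic values).
names = ["paid_search_clicks", "paid_social_clicks", "website_pageviews"]
X = np.column_stack([
    rng.uniform(200, 900, n_weeks),
    rng.uniform(100, 600, n_weeks),
    rng.uniform(1_000, 5_000, n_weeks),
])

# Stand-in for the weekly sales contribution of the combined variable,
# which the primary model would normally provide.
combined_contribution = X @ np.array([4.0, 2.5, 0.3]) + rng.normal(0, 100, n_weeks)

# Secondary model: contribution of the combined variable ~ its components.
coef, *_ = np.linalg.lstsq(X, combined_contribution, rcond=None)

# Split the combined contribution across the component variables.
channel_totals = (X * coef).sum(axis=0)
shares = channel_totals / channel_totals.sum()

combined_share_of_sales = 0.07  # the 7% example from the text
channel_share_of_sales = combined_share_of_sales * shares

for name, share in zip(names, channel_share_of_sales):
    print(f"{name}: {share:.2%} of total sales")
```

By construction, the per-channel shares add back up to the 7% attributed to the combined variable in the primary model.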

This is how we solve this tricky challenge of multiple broad variables!

I know this might seem a little abstract until you implement it on your own, but try to keep the high-level idea in mind so that the next time you face a similar challenge, you can propose a similar solution.

Written by Rhydham Gupta

I am a Data Scientist, I believe that observing and decoding data is an art. Same Data, Different Eyes Different Stories
