Using the T-Test in a Real Data Science Project
Statistics is intimidating to many data scientists, mainly because there is too much emphasis on understanding the maths and too little focus on understanding its real value in projects. In this article, we will try to understand a real scenario where hypothesis testing becomes very important.
Let us say that there is a big cosmetic brand, BeautyAll, which sells various products through multiple outlets across the country. The management is planning to implement new recommendation software that gives combo offers to customers. But before the final roll-out, they really want to test the effectiveness of the system, so let’s help them!
So out of a total of 1200 stores, we picked two random samples of 10 stores each. In one sample, we implemented the recommendations, while the outlets in the other sample carried on with business as usual. After a month, we compared the average sales of each group and found that the average was 8.8% higher for the outlets using the recommendations.
Before you get too happy and propose that the company adopt the new system, let’s ask how much confidence statistics lets us place in this result.
You may ask: it is very clear that the average is higher, so why bring in statistics? The issue is that these numbers come from a sample, not from all of the stores, yet we are trying to make a decision about all 1200 stores based on that sample. Now let’s go back to how we selected the samples: randomly. So there is always a chance that we simply happened to pick some high-selling stores for our test group (the one using recommendations). But we still need an answer, right? Hypothesis testing is what comes to our rescue.
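To make the sampling worry concrete, here is a small simulation sketch. It assumes a made-up, right-skewed sales figure for each of the 1200 stores (the lognormal parameters, the seed, and the function name are illustrative assumptions, not BeautyAll data). It repeatedly splits 20 random stores into two groups of 10, changes nothing in either group, and counts how often the two averages still differ by 8.8% or more.

```python
import random
import statistics


def share_of_chance_lifts(n_stores=1200, sample_size=10, lift=0.088,
                          n_trials=5000, seed=42):
    """Estimate how often two random groups of stores differ by `lift` or more
    when nothing at all has changed in either group."""
    rng = random.Random(seed)
    # Hypothetical monthly sales for every store (right-skewed, made-up numbers).
    population = [rng.lognormvariate(11, 0.5) for _ in range(n_stores)]

    hits = 0
    for _ in range(n_trials):
        # Draw 2 * sample_size distinct stores and split them into two groups.
        picked = rng.sample(population, 2 * sample_size)
        group_a = picked[:sample_size]
        group_b = picked[sample_size:]
        mean_a = statistics.fmean(group_a)
        mean_b = statistics.fmean(group_b)
        # Relative gap between the two group averages.
        if abs(mean_a - mean_b) / mean_b >= lift:
            hits += 1
    return hits / n_trials


share = share_of_chance_lifts()
print(f"Share of no-effect experiments with a gap of 8.8% or more: {share:.2f}")
```

The exact share depends entirely on how spread out the assumed sales are, so treat the printed number as an illustration of the idea rather than an estimate for real stores.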
Statisticians have already done the heavy lifting for us, and we simply need to use a t-test to answer this question. Let’s look at the results from the t-test.
Just focus on the P(T≤t) two-tail value, and remember that the p-value is 0.71.
In simple terms, it indicates that, based on the variance in the data, there is a 71% probability of seeing a difference this large or larger purely by chance, even if the recommendations had no effect at all. Now if you are confused, there are a few things we need to know about hypothesis testing.
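For readers who prefer code to spreadsheet output, here is a minimal sketch of a two-sample t-test using only the Python standard library. The sales figures are invented for illustration (they are not the article's data, so the p-value will not match exactly), and the sketch uses Welch's variant of the test, which is an assumption on my part since the article does not say which variant was run. In practice you would more likely call a library routine such as SciPy's `scipy.stats.ttest_ind`.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    coef /= math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed p-value: area under the t curve beyond +/-|t_stat|.
    The central area is integrated numerically with Simpson's rule."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # area between 0 and |t_stat|
    return max(0.0, 1 - 2 * central_area)


def welch_t_test(a, b):
    """Welch's two-sample t-test (does not assume equal variances)."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    t_stat = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t_stat, df, two_tailed_p(t_stat, df)


# Hypothetical monthly sales (in thousands) for 10 stores per group.
control_group = [52, 110, 75, 140, 61, 98, 180, 45, 88, 120]
test_group = [60, 125, 70, 155, 66, 104, 190, 50, 99, 135]

t_stat, df, p_value = welch_t_test(test_group, control_group)
print(f"t = {t_stat:.3f}, df = {df:.1f}, two-tailed p = {p_value:.3f}")
```

In this made-up data the test group's average is also about 8.8% higher, yet the sales vary so much from store to store that the p-value stays far above 0.05.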
Intuition —
Let’s say that instead of 1 sample of 10 stores, we had picked 10 different samples. Would the mean be the same for all the samples? No, but according to statistics (the central limit theorem), the sample means will approximately follow a normal distribution.
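A quick simulation can make this intuition visible. The sketch below assumes a deliberately skewed, made-up sales population (exponential with a mean of 100; the seed and all variable names are illustrative), draws many samples of 10 stores, and checks that the sample means cluster around the population mean with a spread close to the population standard deviation divided by the square root of the sample size.

```python
import math
import random
import statistics

rng = random.Random(7)
# Hypothetical sales for all 1200 stores; deliberately skewed, not normal.
population = [rng.expovariate(1 / 100) for _ in range(1200)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

sample_size = 10
# Repeat the "pick 10 stores" step many times and keep each sample's mean.
sample_means = [
    statistics.fmean(rng.sample(population, sample_size)) for _ in range(2000)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = pop_sd / math.sqrt(sample_size)  # spread predicted by theory

# For a normal distribution, about 68% of values lie within one standard deviation.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in sample_means
) / len(sample_means)

print(f"population mean = {pop_mean:.1f}, mean of sample means = {mean_of_means:.1f}")
print(f"spread of sample means = {sd_of_means:.1f}, predicted = {expected_sd:.1f}")
print(f"share of sample means within one standard deviation = {within_one_sd:.2f}")
```

With only 10 stores per sample and a skewed population, the bell shape is only approximate; it gets closer to normal as the sample size grows.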
The whole of hypothesis testing rests on this assumption. Generally, people tell you that at a confidence level of 95% (a significance level of 0.05), if the p-value is less than 0.05 we can reject the null hypothesis. But why?
Since we know that sample means follow a normal distribution, the area under the curve is a probability. The p-value is the probability of getting such a mean, or a more extreme one, if the null hypothesis is true. So at a very small p-value (less than 0.05), we can say that the observed number has a very low probability of coming from the null hypothesis distribution, and we can reject the null hypothesis.
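The "area under the curve" idea can be checked in a few lines. This sketch uses the standard normal curve from Python's `statistics.NormalDist` and a helper I have named `two_tailed_area` for illustration. It measures how far an observed difference sits from zero in standard errors (z) and reports the area in both tails beyond that point. With small samples like ours, a t-test uses the slightly wider t distribution instead, but the logic is the same.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def two_tailed_area(z):
    """Probability of landing at least |z| standard errors from the centre."""
    return 2 * (1 - standard_normal.cdf(abs(z)))


for z in (0.5, 1.0, 1.96, 3.0):
    p = two_tailed_area(z)
    verdict = "reject H0" if p < 0.05 else "cannot reject H0"
    print(f"z = {z:>4}: p = {p:.4f} -> {verdict}")
```

A result about 1.96 standard errors from zero is where the two tails together hold roughly 5% of the area, which is where the familiar 0.05 cut-off comes from.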
Hypothesis Testing —
In hypothesis testing, we define a null hypothesis (H0) and an alternative hypothesis (Ha). Very important: based on the sample, we can only reject the null hypothesis or fail to reject it; we never accept the null hypothesis.
Generally, the claim we are trying to test becomes the alternative hypothesis. In this case:
Null Hypothesis — The average sales are similar with and without a recommendation system
Alternative Hypothesis — The average sales are different in stores with a recommendation system
Now, since the p-value is 0.71, getting this much difference or more purely due to sampling has a probability of 71% when the null hypothesis is true, hence we cannot reject the null hypothesis. This means that we cannot trust that the 8.8% lift in average sales is due to the recommendation system; it could very easily be due to sampling variance.
So you will ask me: what now? Should I tell the management that the recommendation system is not effective? Definitely not!
We can only reject the null hypothesis; we never accept it!
It simply means that the current evidence is not enough to conclude that the difference in sales is due to the recommendation system. So what’s the solution? Get more evidence. How?
Take a bigger sample and test this hypothesis.
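How much bigger? A rough answer comes from the standard normal-approximation formula for comparing two means, n = 2 * (z_alpha/2 + z_power)^2 * sd^2 / lift^2 stores per group. The sketch below is only a planning aid under stated assumptions: the function name is mine, the 80% power target is a common convention rather than something from this project, and the store-to-store standard deviation of 40% of average sales is a made-up figure you would replace with your own estimate.

```python
import math
from statistics import NormalDist


def stores_per_group(lift, sd, alpha=0.05, power=0.80):
    """Approximate number of stores needed in each group for a two-sided test.

    `lift` is the difference in average sales we want to detect and `sd` is the
    store-to-store standard deviation, both in the same units.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # cut-off for the significance level
    z_power = z.inv_cdf(power)          # extra margin for the desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sd ** 2 / lift ** 2)


# Lift of 8.8 and a hypothetical standard deviation of 40,
# both expressed as a percentage of average store sales.
needed = stores_per_group(lift=8.8, sd=40)
print(f"Stores needed per group (approximate): {needed}")
```

The required sample grows with the square of the variability and shrinks with the square of the lift, so a noisy business with a modest lift needs far more than 10 stores per group.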