Using the Binomial Distribution in a Real Scenario

Rhydham Gupta
4 min read · Aug 1, 2022

The binomial distribution is one of those statistical distributions that data scientists often find hard to relate to real problems. In this article, we first look at some of its properties and then build intuition for its practical use with a use case.

By definition, given a probability of success p for each trial, the binomial distribution gives the probability of exactly x successes in n trials.
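The definition above corresponds to the standard formula P(X = x) = C(n, x) · p^x · (1 − p)^(n − x). A minimal sketch of it in Python follows; the function name `binomial_pmf` and the example values (3 successes in 10 trials with p = 0.3) are illustrative choices, not something taken from the article's own workbook.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    # comb(n, x) counts the ways to place x successes among n trials.
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Illustrative values: exactly 3 successes in 10 trials with p = 0.3.
prob_three = binomial_pmf(3, 10, 0.3)

# The probabilities of all possible outcomes 0..n add up to 1.
total = sum(binomial_pmf(x, 10, 0.3) for x in range(11))

print(round(prob_three, 4), round(total, 4))
```

Using `math.comb` keeps the binomial coefficient exact for small n; for very large n a log-space version is usually safer.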

Assumptions of the binomial distribution:

  • Each trial has only 2 possible outcomes, such as yes/no or heads/tails.
  • Each trial has the same probability of success.
  • The trials are independent of one another.

Example of Binomial Distribution

In the above example, we have plotted the binomial distribution for the number of successes in 10 trials, with the probability of success in each trial equal to 0.3. Did you notice?

Unlike the normal distribution, which is continuous, the binomial distribution is discrete. The X-axis can only take whole-number values (0, 1, 2, ..., n).

What do you think the shape of the binomial distribution looks like? Let’s see it with an example, using a sample scenario:

Binomial distribution with different success probability

The above graph answers our question about the shape of the binomial distribution.

  • At p = 0.5, it is symmetric and bell-shaped.
  • At p < 0.5, it is right-skewed.
  • At p > 0.5, it is left-skewed.
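The three cases above match the sign of the binomial skewness, (1 − 2p) / sqrt(n · p · (1 − p)). The sketch below checks this for a few values; n = 10 and the three p values are assumptions made for illustration, since the settings behind the original graph are not stated.

```python
from math import sqrt


def binomial_skewness(n: int, p: float) -> float:
    """Skewness of a binomial distribution with n trials and success probability p."""
    return (1 - 2 * p) / sqrt(n * p * (1 - p))


# Assumed illustration values: n = 10 trials, three success probabilities.
skews = {p: binomial_skewness(10, p) for p in (0.3, 0.5, 0.7)}

for p, skew in skews.items():
    if skew > 0:
        shape = "right-skewed"
    elif skew < 0:
        shape = "left-skewed"
    else:
        shape = "symmetric"
    print(f"p = {p}: skewness = {skew:+.3f} ({shape})")
```

Because n sits in the denominator, the skew shrinks as the number of trials grows, whatever the value of p.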

Enough theory; let’s discuss an actual use case.

Let’s say you are a headphone retailer and you recently introduced a newly designed headphone, Aplhavoice, on Amazon. Initially, you sold only a small batch and found that out of 100 products sold, 10 were returned. So the estimated probability of return is 0.1, or 10%.

Now the retailer wants to know how many returns to expect if 10,000 headphones are sold.

Given only this data, what would your best guess be? 10% of 10,000, which is 1,000. Right, but how much variation can the retailer expect? What is the probability that total returns will be less than 1,050 or 1,100? If we treat each sale as an independent trial with a return probability of 0.1, the binomial distribution can answer these questions. Let’s see how:
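The best guess and the typical spread around it follow from the binomial mean n · p and standard deviation sqrt(n · p · (1 − p)). A small sketch, using the article's figures (n = 10,000 and p = 0.1); the variable names are illustrative:

```python
from math import sqrt

n = 10_000  # headphones sold
p = 0.1     # estimated probability that one unit is returned

# Mean and standard deviation of a binomial(n, p) count of returns.
expected_returns = n * p
std_dev = sqrt(n * p * (1 - p))

print(f"expected returns: {expected_returns:.0f}")
print(f"standard deviation: {std_dev:.1f}")
```

A standard deviation of about 30 returns is what makes thresholds such as 1,050 or 1,100 interesting: they sit a little under two and a little over three standard deviations above the mean.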

To compute the binomial probabilities, we use the Excel function BINOM.DIST().

The cumulative probability is the total probability of at most x returns. For example, the value at 1,000 is the probability of returns ≤ 1,000, which includes every case from 0 to 1,000 returns.
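For readers without Excel, the same cumulative probability can be sketched in plain Python. This is an illustrative equivalent of calling BINOM.DIST with its cumulative flag set to TRUE, not the article's own workbook; the helper names are assumptions, and the term probabilities are computed in log space so that n = 10,000 does not overflow.

```python
from math import exp, lgamma, log


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x), computed in log space to stay stable for large n."""
    log_coeff = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    return exp(log_coeff + x * log(p) + (n - x) * log(1 - p))


def binomial_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x): sum of the probabilities of 0, 1, ..., x successes."""
    return sum(binomial_pmf(k, n, p) for k in range(0, x + 1))


# Probability of at most 1,000 returns out of 10,000 sales with p = 0.1.
p_at_most_1000 = binomial_cdf(1000, 10_000, 0.1)
print(f"P(returns <= 1000) is roughly {p_at_most_1000:.3f}")
```

Note that the sum starts at 0, so the case of no returns at all is included.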

Next, we calculate the probability that returns fall within a certain range.
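A range probability is the difference of two cumulative probabilities: P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a − 1). The sketch below applies this to a hypothetical range of 970 to 1,030 returns; the range and the helper names are chosen for illustration and are not taken from the original table.

```python
from math import exp, lgamma, log


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x), computed in log space to stay stable for large n."""
    log_coeff = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    return exp(log_coeff + x * log(p) + (n - x) * log(1 - p))


def binomial_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x); returns 0 for x below zero."""
    if x < 0:
        return 0.0
    return sum(binomial_pmf(k, n, p) for k in range(0, x + 1))


def binomial_range(a: int, b: int, n: int, p: float) -> float:
    """P(a <= X <= b) as a difference of two cumulative probabilities."""
    return binomial_cdf(b, n, p) - binomial_cdf(a - 1, n, p)


# Hypothetical range: between 970 and 1,030 returns (inclusive).
p_in_range = binomial_range(970, 1030, 10_000, 0.1)
print(f"P(970 <= returns <= 1030) is roughly {p_in_range:.2f}")
```

Subtracting the cumulative value at a − 1, rather than at a, keeps the lower endpoint inside the range.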

Did you observe something interesting? According to these results, the probability of total returns ≤ 1,060 is about 98%. So we can tell the retailer that, with roughly 98% probability, total returns will not exceed 1,060, provided the 10% return rate estimated from the first 100 sales holds. How does it feel to have helped the retailer?
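The same question can be turned around: for a chosen probability level, what is the smallest number of returns that covers it? The following sketch walks up the cumulative probabilities until the target is reached; the function name and the 0.98 target are illustrative assumptions.

```python
from math import exp, lgamma, log


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x), computed in log space to stay stable for large n."""
    log_coeff = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    return exp(log_coeff + x * log(p) + (n - x) * log(1 - p))


def smallest_x_covering(target: float, n: int, p: float) -> int:
    """Smallest x such that P(X <= x) is at least the target probability."""
    running = 0.0
    for x in range(n + 1):
        running += binomial_pmf(x, n, p)
        if running >= target:
            return x
    return n


# Cumulative probability at 1,060 returns, and the bound for a 98% target.
p_at_most_1060 = sum(binomial_pmf(k, 10_000, 0.1) for k in range(1061))
bound_98 = smallest_x_covering(0.98, 10_000, 0.1)

print(f"P(returns <= 1060) is roughly {p_at_most_1060:.3f}")
print(f"smallest x with P(returns <= x) >= 0.98: {bound_98}")
```

The bound only reflects the randomness of individual returns. It does not account for the uncertainty in the 10% estimate itself, which came from just 100 sales.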

Don’t be confused that this plot does not look right-skewed even though p < 0.5. With n = 10,000 trials the skew becomes very small and the distribution looks almost symmetric, and we are only plotting the range around 1,000 returns.

Bonus!

There is another distribution, the Poisson distribution, which is a limiting case of the binomial distribution. It is generally applicable when n is large and p is small. Fraud detection is one scenario where the Poisson distribution is suitable.
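To see how close the two distributions get when n is large and p is small, the sketch below compares binomial probabilities with Poisson probabilities that share the same mean n · p. The values n = 10,000 and p = 0.0002 are hypothetical, picked only to illustrate the approximation.

```python
from math import comb, exp, factorial


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial distribution."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


# Hypothetical setting: many trials, each with a tiny success probability.
n, p = 10_000, 0.0002
lam = n * p  # the Poisson mean matches the binomial mean

# Largest gap between the two probabilities over x = 0..10.
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11))

for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 5), round(poisson_pmf(x, lam), 5))
```

The Poisson form needs only the average count, which is handy when the number of trials and the per-trial probability are not known separately.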

It is used to show how many times an event is likely to occur over a specified period. For example, if the average number of frauds in a month is 1,000, it can give the probability of more than 1,100 frauds, or of any other number x, in a month.
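The fraud example can be sketched the same way: with an average of 1,000 frauds a month, the probability of more than 1,100 is one minus the cumulative Poisson probability at 1,100. The helper names below are illustrative, and the probabilities are computed in log space because the mean is large.

```python
from math import exp, lgamma, log


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson distribution, computed in log space."""
    return exp(-lam + x * log(lam) - lgamma(x + 1))


def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x): sum of the probabilities of 0, 1, ..., x events."""
    return sum(poisson_pmf(k, lam) for k in range(0, x + 1))


average_frauds = 1000.0  # average number of frauds per month, from the example

# Probability of strictly more than 1,100 frauds in a month.
p_more_than_1100 = 1.0 - poisson_cdf(1100, average_frauds)
print(f"P(frauds > 1100) = {p_more_than_1100:.5f}")
```

Changing the 1,100 threshold to any other x gives the corresponding tail probability.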

Hope you now have a better understanding of the binomial distribution!
