# Data Science Interview Experience — Top 5 tricky ML questions asked

Recently, I have been giving many interviews for the role of Data Scientist and related roles. When I started, I was disappointed by my low success rate. In the initial interviews, everything would go smoothly until the interviewer asked a tricky ML question, and my fumbled answer and the subsequent line of questioning would knock me out. Soon, I realized that though these tricky questions were asked in different shapes and contexts, the underlying fundamentals remained the same and were limited in number.

I worked on it, and as a result, after the initial hiccups, I started tasting success in interviews. Of course, this was not the only factor in clearing them, but the ML section no longer led to my elimination; instead, it earned me extra points.

Based on my experience, I am going to share the top 5 questions/concepts that interviewers focus on most in the ML section. I am confident that if you have a solid grasp of these and the related concepts, the ML section of the interview will become your strength.

**Let’s dive into these questions —**

**Q: What is the difference between XGBoost and Gradient Boosting**

In one of my initial interviews, I remember answering that XGBoost is a faster, optimized implementation of Gradient Boosting. Although this is technically correct, the interviewer is actually expecting an answer more from a data science perspective.

Ans:

- XGBoost has built-in support for regularisation (L1 and L2 penalties on the leaf weights)
- XGBoost has an in-built mechanism to deal with null values, known as sparsity-aware split finding
- It uses gradients to create the trees, hence the node-split criterion is different and is based on similarity scores
- It parallelises the search for (variable, threshold) split combinations, using the weighted quantile sketch approach on large datasets
- It applies system-level optimisations to speed up the computations

**Q: How do you control overfitting? Can we use cross-validation to control overfitting?**

Ans: You should know that cross-validation can be used to detect overfitting but cannot control it. So how can you control it?

- Feature Engineering & Feature Selection
- Parameter tuning
- Treating outliers if it is a linear algorithm
- Regularisation
- Early stopping, Pruning
- If possible, get more data
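The effect of one of these levers, pruning, is easy to demonstrate. The sketch below (illustrative, with made-up synthetic data) compares an unconstrained decision tree, which memorises the training set, against one with a capped depth:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unconstrained tree fits the training data perfectly: train accuracy
# is 1.0 while test accuracy lags behind (the overfitting gap).
overfit = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Pruning (here: capping the depth) trades a little train accuracy for a
# smaller train-test gap.
pruned = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)

for name, m in [("unconstrained", overfit), ("pruned", pruned)]:
    print(name, round(m.score(X_tr, y_tr), 2), round(m.score(X_te, y_te), 2))
```

Comparing train vs test accuracy like this is exactly the check that reveals overfitting in the first place.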

**Bonus tip** — Try to explain a scenario you have faced while working on a project. I used to say that in one of my projects I faced this overfitting issue: test accuracy was low while train accuracy was fine. On in-depth examination, I identified highly correlated variables as the cause, so I used PCA to combine them into one component and got rid of the overfitting.
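The PCA step in that anecdote can be sketched on toy data (the two near-duplicate features below are made up for illustration): when variables are highly correlated, a single principal component retains nearly all of their variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=300)
# Two almost-duplicate (highly correlated) features.
X = np.column_stack([base, base + 0.05 * rng.normal(size=300)])

# Collapse the correlated pair into a single component.
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)

# The first component captures almost all of the variance.
print(round(pca.explained_variance_ratio_[0], 3))
```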

**Q: Different evaluation metrics used for regression and which one is preferred when?**

Ans: Different evaluation metrics used for regression are —

- R² (the most commonly reported metric; it tells what percentage of the variance in the dependent variable is explained by the independent variables)
- Adjusted R² (R² penalized for a high number of variables)
- MSE (mean squared error, often used as the loss function)
- RMSE (root mean squared error)
- MAPE (mean absolute percentage error; this metric is easy to explain to the business as it indicates the average percentage error of the predictions. For e.g., a 10% MAPE indicates that, on average, predictions deviate from the actuals by 10%)
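All of these metrics are one call away in scikit-learn (the `y_true`/`y_pred` values below are made up to keep the arithmetic easy to follow; `mean_absolute_percentage_error` is available in recent scikit-learn versions):

```python
import numpy as np
from sklearn.metrics import (mean_absolute_percentage_error,
                             mean_squared_error, r2_score)

y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 330.0, 360.0])

mse = mean_squared_error(y_true, y_pred)   # in squared units of the target
rmse = np.sqrt(mse)                        # back on the target's own scale
r2 = r2_score(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)

print(mse, round(rmse, 2), round(r2, 3), round(mape, 4))
# 675.0 25.98 0.946 0.0875
```

The 0.0875 MAPE reads directly as "predictions are off by about 8.75% on average", which is why it is the easiest metric to communicate to the business.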

**Bonus** — **When to use MSE vs RMSE?**

RMSE is on the same scale as the dependent variable, whereas MSE is on the squared scale, which makes RMSE the easier one to interpret and report.

**Q: What is the difference between Bagging & Boosting**

Ans:

Bagging —

- It creates multiple trees in parallel and takes the majority vote (or average) for the final prediction
- It creates each tree on the actual dependent values
- It provides more robustness but might not give very good results on an imbalanced dataset

Boosting —

- It is a sequence of weak learners in which each tree depends on the residuals of the predictions of all the previous trees
- It creates the trees on the residuals
- Since it creates each subsequent tree on the residuals, more importance is given to the misclassified samples, so it works well on an imbalanced dataset
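The contrast between the two families maps directly onto two standard scikit-learn estimators, sketched here on made-up synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: independent trees on bootstrap samples, combined by majority vote.
bagging = RandomForestClassifier(n_estimators=100, random_state=0)

# Boosting: trees built sequentially, each one fitted to the residual
# errors left by the trees before it.
boosting = GradientBoostingClassifier(n_estimators=100, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    model.fit(X_tr, y_tr)
    print(name, round(model.score(X_te, y_te), 2))
```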

**Bonus** — In which case would you prefer Bagging over Boosting, and vice versa?

The answer lies in the points above: prefer Boosting when the dataset is imbalanced, and Bagging when you need robustness.

**Q: What are precision and recall? An interesting version of this question: it is reported that 12 of the 18 predicted frauds were classified correctly, and 80% of the total frauds were caught. Find the precision and recall.**

Ans:

You might get confused by this question if you have only a surface-level understanding of precision and recall. Let's build the confusion matrix: TP = 12 and FP = 18 − 12 = 6. Since 80% of the actual frauds were caught, the total actual frauds = 12/0.8 = 15, so FN = 15 − 12 = 3.

Precision = TP/(TP + FP) = 12/18 ≈ 0.67

Recall = TP/(TP + FN) = 12/15 = 0.8

Hurray! Importantly, if you have a sound understanding of precision and recall, you will realize that the answer is present directly in the question.

Recall — What % of the actual 1s are predicted correctly = 80% = 0.8

Precision — What is the accuracy rate of the positive predictions? Out of 18 predictions, 12 were right, hence 12/18 ≈ 0.67

Interestingly, the fun part of this question is that TN is not given in the question, and it is not even required for precision and recall.
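The whole reconstruction fits in a few lines of arithmetic (precision is shown rounded to two decimals):

```python
# Reconstruct the confusion-matrix cells from the statement alone.
predicted_frauds = 18
tp = 12                      # frauds predicted correctly
fp = predicted_frauds - tp   # 6 wrong fraud predictions
actual_frauds = tp / 0.8     # 80% of actual frauds were caught -> 15
fn = actual_frauds - tp      # 3 frauds missed

precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(round(precision, 2), round(recall, 2))  # 0.67 0.8
```

Notice that TN never appears anywhere in the calculation.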

Hope you liked this article. You can find a more detailed explanation on my blog.