Data Science Interview Questions and Answers
1. At which points does a continuous function have zero slope?
A) At the lowest point
B) At the highest point
C) At the saddle points
D) All of the above
Answer: D (at the lowest point, at the highest point, and at the saddle points)

2. We have three eigenvectors x, y and z computed from the covariance matrix of three continuous variables a, b and c that are on the same scale. Which of the following is true?
A) x . z = 0
B) x . y = 1
C) z . z = -1
D) y . z = 0.5
Answer: A (x . z = 0)

3. Suppose we have two samples of size 30 each. One sample is perfectly normally distributed and the other is slightly skewed. We want to find out whether both samples come from the same population by performing a hypothesis test. Can we transform the skewed sample into a normal distribution (using a mathematical function) and then perform the test?
A) Yes, we can, because mathematical transformations are valid
B) Yes, we can, because the transformation does not affect the sample size
C) No, we cannot, because the two samples are no longer on the same scale
D) No, we cannot, because mathematical transformations are biased
Answer: C (No, we cannot, because the two samples are no longer on the same scale)

4. Principal Component Analysis is a dimensionality reduction technique that uses the variance in the target variable to derive the principal components of the predictors. Is this statement true or false?
A) True
B) False
Answer: B (False)

5. Can we use the correlation matrix instead of the covariance matrix when we perform PCA?
A) No, we have to strictly use covariance
B) Yes, because correlation also measures association
C) Yes, because either way there is no practical difference in results
D) Yes, using correlation is in fact always better
Answer: B (Yes, because correlation also measures association)

6. We have three principal components x, y and z (sorted in descending order of explained variance: 50%, 30% and 20%) created from three continuous variables a, b and c that are on the same scale.
If i, j and k are the sums of the correlations of x, y and z, respectively, with each of the variables a, b and c [example: j = corr(y,a) + corr(y,b) + corr(y,c)], which of the following is true?
A) i ≥ j ≥ k
B) i ≤ j ≤ k
C) i + j + k = 0
D) i = j = k
Answer: A (i ≥ j ≥ k)

7. Which of the following is/are the assumption(s) of the Naive Bayes algorithm?
1) All the features in a dataset are equally important
2) All the features in a dataset are independent
A) 1
B) 2
C) 1 and 2
D) Neither 1 nor 2
Answer: C (1 and 2)

8. What is the F1 score for the given confusion matrix?
TP, TN, FP, FN = 52, 85, 43, 12
A) 0.50
B) 0.65
C) 0.75
D) 0.85
Answer: B (0.65). Precision = 52/95 ≈ 0.55 and recall = 52/64 ≈ 0.81, so F1 = 2 × 52 / (2 × 52 + 43 + 12) = 104/159 ≈ 0.65.

9. What is the range of values of Root Mean Squared Error in the context of a regression task?
A) -1 to +1, because we can have negative error values
B) 0 to 1, because error is essentially expressed as a percentage
C) 1 to 100, because error can never be exactly zero
D) Depends on the scale and magnitude of the values of the dependent variable
Answer: D (Depends on the scale and magnitude of the values of the dependent variable)
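
The F1 score asked for in question 8 can be checked by hand or with a few lines of Python. The sketch below is only an illustration: the function name `f1_score` is our own choice, not something from the quiz, and it assumes the usual definition of F1 as the harmonic mean of precision and recall.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    return 2 * precision * recall / (precision + recall)


# Counts from question 8. TN is not used by precision, recall, or F1.
tp, tn, fp, fn = 52, 85, 43, 12
score = f1_score(tp, fp, fn)
print(round(score, 2))  # 0.65
```

The result, 104/159 ≈ 0.65, corresponds to option B. The true negatives play no role here, which is one reason F1 is often preferred to accuracy when the negative class dominates.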
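
The point behind question 9, that RMSE has no fixed range and follows the units of the dependent variable, can also be seen in a small sketch. The data values below are hypothetical and made up purely for illustration, and the helper name `rmse` is our own.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical target values and predictions (illustration only).
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

base = rmse(actual, predicted)

# The same data expressed in a unit 1000 times smaller (for example km -> m).
scaled = rmse([a * 1000 for a in actual], [p * 1000 for p in predicted])

print(base, scaled)
```

Rescaling the target by a factor of 1000 rescales the RMSE by the same factor, so the value is never negative but is otherwise bounded only by the scale of the data.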