Machine Learning Interview Questions and Answers
  • ML (Machine Learning) Interview Questions
    1. Which of the following statements about the components_ attribute of sklearn.decomposition.PCA is true?
    1) It gives the principal axes in feature space
    2) It represents the directions of maximum variance in the data
    3) It is the set of all eigenvectors for the projection space

    A) 1 and 2

    B) 2 and 3

    C) 1 and 3

    D) 1, 2 and 3

    Answer:D (1, 2 and 3)

    2. Which of the following is/are true?
    1) A covariance of zero indicates that two variables are strongly related or identical.
    2) Covariance and correlation are exactly the same if the features are normalized to unit variance.
    3) Correlation is the standardized form of covariance.

    A) 1 and 2

    B) 2 and 3

    C) 1 and 3

    D) 1, 2 and 3

    Answer:B (2 and 3)
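
    The three statements can be checked numerically. The sketch below uses plain Python with small made-up data; the helper names (covariance, correlation, to_unit_variance) are illustrative and not part of any library.

```python
import math

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

def to_unit_variance(xs):
    """Scale a feature so that its sample variance is 1."""
    std = math.sqrt(covariance(xs, xs))
    return [x / std for x in xs]

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]

# Statements 2 and 3: after scaling to unit variance, covariance equals correlation.
cov_scaled = covariance(to_unit_variance(x), to_unit_variance(y))
corr = correlation(x, y)

# Statement 1 is false: v is fully determined by u (v = u ** 2),
# yet their covariance is zero.
u = [-2.0, -1.0, 0.0, 1.0, 2.0]
v = [value ** 2 for value in u]
cov_uv = covariance(u, v)
```

    Zero covariance only rules out a linear relationship, which is why statement 1 does not hold.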

    3. Which of the following is an sklearn module that can be used to chain preprocessing steps and an estimator into one single function?

    1) Decision Tree


    3) Random Forest

    4) Gradient Boosting

    A) 1 and 2

    B) 2 and 3

    C) 3 and 4

    D) 1 and 4

    Answer:B (2 and 3)

    4. Which of the following cannot be used to assess a regression model?

    A) ROC curve

    B) Mean Absolute Error

    C) Coefficient of determination

    D) Scatter plot of predicted vs. actual values

    Answer:A (ROC curve)
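
    Mean Absolute Error and the coefficient of determination both compare continuous predictions with continuous targets, whereas a ROC curve needs class labels and scores. A minimal sketch of the two regression metrics, using made-up values and illustrative function names:

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up regression targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

mae = mean_absolute_error(y_true, y_pred)  # (1 + 0 + 1 + 0) / 4
r2 = r2_score(y_true, y_pred)              # 1 - 2 / 20
```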

    5. In the context of K-means clustering, when we plot an elbow plot, the horizontal axis tells us the number of clusters. What does the vertical axis tell us?

    A) Variance in the target variable

    B) Inter-Cluster-Sum-of-Squared-Distances

    C) Intra-Cluster-Sum-of-Squared-Distances

    D) Sum of squared distances/degrees of freedom

    Answer:C (Intra-Cluster-Sum-of-Squared-Distances)
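
    The quantity on the vertical axis can be computed directly: for each cluster, sum the squared distances of its points to the cluster centroid, then add the per-cluster sums. The sketch below assumes the cluster assignments are already known (it does not run K-means itself) and uses made-up 2-D points.

```python
def wcss(clusters):
    """Within-cluster sum of squared distances to each cluster's centroid."""
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        total += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in points
        )
    return total

# Two well-separated groups of made-up points.
left = [(1, 1), (1, 2), (2, 1), (2, 2)]
right = [(8, 8), (8, 9), (9, 8), (9, 9)]

wcss_k1 = wcss([left + right])   # everything in a single cluster
wcss_k2 = wcss([left, right])    # one cluster per group
```

    The sharp drop from one cluster to two, followed by much smaller gains for larger k, is what produces the "elbow" in the plot.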

    6. Which of the following statements should be true in the context of training and testing an ML model?

    1) The test set should be representative of the population

    2) The training set should be representative of the population

    A) 1 only

    B) 2 only

    C) Both 1 and 2

    D) Neither 1 nor 2

    Answer:C (Both 1 and 2)

    7. The final decision boundary of a decision tree is always ____?

    A) Linear

    B) Curvilinear

    C) Non-linear

    D) None of the above

    Answer:D (None of the above)

    8. A decision tree that is allowed to grow to its maximum size is prone to which of the following?

    A) High bias error

    B) High recall on test set

    C) High precision on test set

    D) High variance error

    Answer:D (High variance error)

    9. How many iterations of training and testing happen if we perform Leave One Out Cross Validation on a dataset of 10,000 records?

    A) 1

    B) 100

    C) 9999

    D) 10000

    Answer:D (10000)
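
    Leave One Out Cross Validation holds out each record exactly once, so the number of train/test iterations equals the number of records. A small sketch with a hypothetical helper (leave_one_out_splits) on a 5-record dataset shows the pattern; with 10,000 records the same loop would run 10,000 times.

```python
def leave_one_out_splits(n_samples):
    """Yield (train_indices, test_indices) with one record held out per split."""
    for held_out in range(n_samples):
        train_idx = [i for i in range(n_samples) if i != held_out]
        yield train_idx, [held_out]

n_records = 5  # kept small for illustration
splits = list(leave_one_out_splits(n_records))

n_iterations = len(splits)                     # one iteration per record
train_sizes = {len(train) for train, _ in splits}
held_out_records = [test[0] for _, test in splits]
```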

    10. If one were to let a decision tree grow to the fullest extent on a practical dataset, what would the impurity of individual leaf nodes likely be?

    A) Close to 0

    B) Close to 0.5

    C) Close to 1

    D) Maximum

    Answer:A (Close to 0)
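
    A fully grown tree keeps splitting until its leaves contain (almost) only one class, and a single-class leaf has zero impurity. The sketch below computes Gini impurity for a few made-up leaves; Gini is used here as one common impurity measure, and the function name is illustrative.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

pure_leaf = ["spam", "spam", "spam", "spam"]    # what a fully grown tree tends to produce
mixed_leaf = ["spam", "spam", "spam", "ham"]
balanced_leaf = ["spam", "ham"]                 # worst case for two classes

gini_pure = gini_impurity(pure_leaf)
gini_mixed = gini_impurity(mixed_leaf)
gini_balanced = gini_impurity(balanced_leaf)
```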

    11. The following table gives the predicted ratings (by our model) and actual ratings of some products. Calculate the RMSE score for these predictions.

    Product Name    Actual Rating    Predicted Rating
    XDR             5                2
    XLP             3                4
    XTZ             4                5

    A) 1.61

    B) 1.91

    C) 2.61

    D) 2.91

    Answer:B (1.91)
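
    The calculation can be reproduced in a few lines: the errors are 3, -1 and -1, their squares sum to 11, the mean squared error is 11/3, and the square root of that is about 1.91. The variable names below are illustrative.

```python
import math

# Ratings taken from the table in the question.
actual = [5, 3, 4]
predicted = [2, 4, 5]

squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]  # [9, 1, 1]
mse = sum(squared_errors) / len(squared_errors)                     # 11 / 3
rmse = math.sqrt(mse)
```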

    12. You own an e-commerce website. User A likes products 1, 2 and 3. User B likes products 6, 3 and 4. User C likes products 1, 2 and 5. Using user-user collaborative filtering, which product would be recommended to user C when they visit the website?

    A) 1

    B) 2

    C) 3

    D) 6

    Answer:C (3)
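
    The reasoning is that user C overlaps most with user A (both like products 1 and 2), so the product A likes that C has not yet seen is recommended. A minimal sketch follows, assuming Jaccard similarity between users' liked sets; the question does not specify a similarity measure, and the function names are illustrative.

```python
# Liked products per user, as given in the question.
likes = {
    "A": {1, 2, 3},
    "B": {6, 3, 4},
    "C": {1, 2, 5},
}

def jaccard(a, b):
    """Share of products two users have in common (assumed similarity measure)."""
    return len(a & b) / len(a | b)

def recommend(user, likes):
    """Score unseen products by the similarity of the users who like them."""
    scores = {}
    for other, items in likes.items():
        if other == user:
            continue
        sim = jaccard(likes[user], items)
        for item in items - likes[user]:
            scores[item] = scores.get(item, 0.0) + sim
    return max(scores, key=scores.get)

sim_ca = jaccard(likes["C"], likes["A"])  # shares products 1 and 2
sim_cb = jaccard(likes["C"], likes["B"])  # no products in common
recommended = recommend("C", likes)
```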
