1. Which of the following statements about the components_ attribute of sklearn.decomposition.PCA is true?
1) It gives the principal axes in feature space
2) It represents the directions of maximum variance in the data
3) It is the set of all eigenvectors for the projection space
A) 1 and 2
B) 2 and 3
C) 1 and 3
D) 1, 2 and 3
Answer:D (1, 2 and 3)

2. Which of the following is/are true?
1) A covariance of zero indicates that two variables are strongly related or identical.
2) Covariance and correlation are exactly the same if the features are normalized to unit variance
3) Correlation is the standardized form of covariance.
A) 1 and 2
B) 2 and 3
C) 1 and 3
D) 1, 2 and 3
Answer:B (2 and 3)
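
Statements 2 and 3 can be checked numerically. The sketch below uses made-up sample values (an assumption, not data from this quiz) and the sample covariance with an n - 1 denominator: after both variables are rescaled to unit variance, their covariance matches their Pearson correlation.

```python
import math

def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

def standardize(x):
    """Rescale to zero mean and unit sample variance."""
    mean_x = sum(x) / len(x)
    std_x = math.sqrt(covariance(x, x))
    return [(a - mean_x) / std_x for a in x]

# Illustrative values only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]

cov_raw = covariance(x, y)                            # depends on the scale of x and y
corr = correlation(x, y)                              # scale-free
cov_std = covariance(standardize(x), standardize(y))  # equals corr
```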

3. Which of the following is an sklearn module that can be used to concat preprocessing steps and an estimator into one single function?
1) Decision Tree
2) Bagging
3) Random Forest
4) Gradient Boosting
A) 1 and 2
B) 2 and 3
C) 3 and 4
D) 1 and 4
Answer:B (2 and 3)

4. Which of the following cannot be used to assess a regression model?
A) ROC curve
B) Mean Absolute Error
C) Coefficient of determination
D) Scatterplot of predicted vs. actual values
Answer:A (ROC curve)

5. In the context of K-means clustering, when we plot an elbow plot, the horizontal axis tells us the number of clusters. What does the vertical axis tell us?
A) Variance in the target variable
B) Inter-cluster sum of squared distances
C) Intra-cluster sum of squared distances
D) Sum of squared distances/degrees of freedom
Answer:C (Intra-cluster sum of squared distances)
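
The quantity on the vertical axis can be computed by hand. The sketch below is a minimal illustration with made-up points and hand-picked cluster labels (both assumptions); it does not run K-means itself. In scikit-learn, a fitted KMeans object reports this value as its inertia_ attribute.

```python
def wcss(points, labels):
    """Within-cluster sum of squared distances to each cluster's centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum(
            sum((m[d] - centroid[d]) ** 2 for d in range(dim)) for m in members
        )
    return total

# Two well-separated groups (illustrative values only)
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]

wcss_k1 = wcss(points, [0, 0, 0, 0])  # everything in one cluster
wcss_k2 = wcss(points, [0, 0, 1, 1])  # one cluster per group
```

Going from one cluster to two drops the value sharply here; the "elbow" is the number of clusters after which further drops become small.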

6. Which of the following statements should be true in the context of training and testing an ML model?
1) The test set should be representative of the population
2) The training set should be representative of the population
A) 1 only
B) 2 only
C) Both 1 and 2
D) Neither 1 nor 2
Answer:C (Both 1 and 2)

7. The final decision boundary of a decision tree is always ____?
A) Linear
B) Curvilinear
C) Nonlinear
D) None of the above
Answer:D (None of the above)

8. A decision tree that is allowed to grow to its maximum size is prone to which of the following?
A) High bias error
B) High recall on test set
C) High precision on test set
D) High variance error
Answer:D (High variance error)

9. How many iterations of training and testing happen if we perform Leave One Out Cross Validation on a dataset of size 10,000 records?
A) 1
B) 100
C) 9999
D) 10000
Answer:D (10000)
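
Leave One Out Cross Validation holds out each record exactly once, so the number of train/test iterations equals the number of records. A minimal sketch with a hypothetical helper, shown on 5 records to keep it small:

```python
def leave_one_out_splits(n_samples):
    """Yield (train_indices, test_index) pairs, one per record."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, held_out

splits = list(leave_one_out_splits(5))
n_iterations = len(splits)  # one iteration per record
```

Each training fold has n - 1 records and each test fold has 1, so a dataset of 10,000 records gives 10,000 iterations.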

10. If one were to let a decision tree grow to the fullest extent on a practical dataset, what would the impurity of individual leaf nodes likely be?
A) Close to 0
B) Close to 0.5
C) Close to 1
D) Maximum
Answer:A (Close to 0)
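
To see why, here is a small sketch of Gini impurity, one common impurity measure (the question does not say which measure is meant, so this choice is an assumption). A fully grown tree keeps splitting until its leaves hold samples of a single class wherever the data allows, and such a leaf has impurity 0.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

pure_leaf = gini(["yes", "yes", "yes"])         # single class
mixed_node = gini(["yes", "no", "yes", "no"])   # evenly split binary node
```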

11. The following table gives the predicted ratings (from our model) and the actual ratings of some products. Calculate the RMSE score for these predictions.
Product Name | Actual Rating | Predicted Rating
XDR          | 5             | 2
XLP          | 3             | 4
XTZ          | 4             | 5

A) 1.61
B) 1.91
C) 2.61
D) 2.91
Answer:B (1.91)
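
The arithmetic can be reproduced in a few lines. The errors are 3, -1 and -1, so the squared errors are 9, 1 and 1, their mean is 11/3, and the square root of that is about 1.91.

```python
import math

actual = [5, 3, 4]      # XDR, XLP, XTZ
predicted = [2, 4, 5]

squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]  # [9, 1, 1]
mse = sum(squared_errors) / len(squared_errors)                     # 11 / 3
rmse = math.sqrt(mse)

print(round(rmse, 2))  # 1.91
```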

12. You own an e-commerce website. User A likes products 1, 2 and 3. User B likes products 6, 3 and 4. User C likes products 1, 2 and 5. Using user-user collaborative filtering, which product would be recommended to user C when they visit the website?
A) 1
B) 2
C) 3
D) 6
Answer:C (3)
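
The reasoning can be sketched as follows: find the user most similar to C, then recommend what that user likes and C has not yet seen. Jaccard similarity on the liked-product sets is used here as an assumption; the question does not name a similarity measure, and the helper names are illustrative.

```python
likes = {
    "A": {1, 2, 3},
    "B": {6, 3, 4},
    "C": {1, 2, 5},
}

def jaccard(a, b):
    """Overlap of two sets relative to their union."""
    return len(a & b) / len(a | b)

def recommend(target, likes):
    """Return the most similar other user and the products to recommend."""
    others = [user for user in likes if user != target]
    nearest = max(others, key=lambda user: jaccard(likes[target], likes[user]))
    return nearest, sorted(likes[nearest] - likes[target])

nearest_user, recommendations = recommend("C", likes)
```

User C shares products 1 and 2 with user A and nothing with user B, so A is the nearest neighbour and product 3 is the one item A likes that C does not.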
