Machine Learning Practice Test
We set the gradient to zero to obtain the minimum or maximum of a function because:
At a local maximum or minimum of a differentiable multivariable function, the gradient is the zero vector: every partial derivative vanishes, so there is no direction of ascent or descent. Setting the gradient to zero therefore locates the candidate stationary points of the function.
Which of the following machine learning algorithms is based on the principle of bagging and is extensively used and effective?
The Random Forest algorithm builds an ensemble of Decision Trees, mostly trained with the bagging method.
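A minimal sketch of this idea using scikit-learn; the dataset and hyperparameters are illustrative, not part of the original question:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy classification dataset (illustrative only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each tree is trained on a bootstrap sample of the data (bagging);
# predictions are combined by majority vote across the ensemble.
forest = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
forest.fit(X_tr, y_tr)
accuracy = forest.score(X_te, y_te)
```

Because each tree sees a different bootstrap sample, their errors are partly decorrelated, which is what makes the averaged ensemble more robust than a single tree.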
Which of the following is a good characteristic of a test dataset?
A good test dataset has a sufficiently large sample and class ratios that are representative of the population.
The following are the most regularly used metrics and tools for evaluating a classification model:
The model performance assessment for classification algorithms incorporates all of the above techniques.
What is the purpose of cross-validation?
Cross-validation is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set.
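A short sketch of k-fold cross-validation with scikit-learn; the model and dataset are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: the data is split into 5 folds, each fold
# serves once as the held-out set, so the scores estimate how the model
# generalizes to independent data.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
```

Averaging the per-fold scores gives a more stable performance estimate than a single train/test split.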
How do you deal with data in a dataset that is missing or corrupted?
Explanation: All of the above techniques are different ways of imputing the missing values.
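One common imputation technique, sketched with scikit-learn's `SimpleImputer` (the toy matrix is illustrative):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# A small matrix with one missing entry in column 0.
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, 6.0]])

# Mean imputation: each missing entry is replaced by its column mean.
imputer = SimpleImputer(strategy="mean")
X_imputed = imputer.fit_transform(X)
# The missing value in column 0 becomes (1 + 7) / 2 = 4.
```

`strategy` can also be `"median"` or `"most_frequent"`, matching the other imputation options the question alludes to.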
A disadvantage of decision trees is which of the following?
Allowing a decision tree to split to a very granular degree lets it learn every training point, up to perfect classification of the training set; this is overfitting.
Which of the following is the correct technique to preprocess data before performing regression or classification?
You should always normalize the data first. Otherwise, PCA and other dimensionality-reduction techniques will give different, scale-dependent results.
Why is it necessary to use second-order differencing in a time series?
If the second-order difference is positive, the time series will curve upward and if it is negative, the time series will curve downward at that time.
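A small illustration of the point, using a quadratic toy series (the numbers are illustrative):

```python
# Illustrative series with a quadratic trend: y_t = t**2.
series = [t ** 2 for t in range(8)]

def diff(xs):
    # First-order difference: xs[t] - xs[t-1]
    return [b - a for a, b in zip(xs, xs[1:])]

first = diff(series)    # still trending (linear in t)
second = diff(first)    # constant and positive: the series curves upward
```

One round of differencing removes a linear trend, but a quadratic trend only becomes stationary after second-order differencing, which is why it is needed for such series.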
In Sklearn, what is pca.components_?
pca.components_ is the set of all eigenvectors for the projection space.
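A quick sketch showing where `pca.components_` comes from; the random data is illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

# Standardizing first keeps PCA from being dominated by feature scale.
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(X_std)

# pca.components_ holds one eigenvector (principal axis) per row:
# shape is (n_components, n_features).
print(pca.components_.shape)  # (2, 4)
```

Each row is a unit-length direction in feature space; projecting the data onto these rows yields the principal components.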
Which of the following is a feature extraction example?
All of the above techniques transform raw data into features which can be used as inputs to machine learning algorithms.
Which of the following regularization statements is incorrect?
A large regularization coefficient results in a large regularization penalty and, therefore, a strong preference for simpler models, which can underfit the data.
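The shrinkage effect of a large penalty can be seen directly with ridge regression; the data and the two `alpha` values are illustrative:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

# A larger alpha means a larger penalty on coefficient size,
# shrinking the model toward a simpler (flatter) fit.
small = Ridge(alpha=0.01).fit(X, y)
large = Ridge(alpha=1000.0).fit(X, y)
```

With `alpha=1000` the coefficients are pushed toward zero, which is exactly the "strong preference for simpler models" the answer describes; taken too far, the shrunken model underfits.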
Which of the following statements about Naive Bayes is correct?
Naive Bayes assumes that all the features in a data set are equally important and independent.
Which of the following scenarios will K-means clustering fail to produce satisfactory results? 1) Outliers in the data 2) Data points of various densities 3) Nonconvex data points
The K-means clustering algorithm fails to give good results when the data contains outliers, when the density of data points differs across the data space, or when the clusters have nonconvex shapes.
In text mining, which of the following approaches can be used for normalization?
Lemmatization and stemming are the techniques of keyword normalization.
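A minimal rule-based sketch of stemming. Real systems use tools such as NLTK's `PorterStemmer` and `WordNetLemmatizer`; the suffix rules below are purely illustrative:

```python
def crude_stem(word):
    # Illustrative suffix stripping -- not a real stemming algorithm.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

words = ["running", "jumped", "cats"]
stems = [crude_stem(w) for w in words]  # ['runn', 'jump', 'cat']
```

Stemming chops suffixes mechanically (hence the non-word "runn"), while lemmatization maps a word to its dictionary form ("running" to "run") using vocabulary and morphology.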
How can a clustering algorithm avoid becoming caught in a bad local optima?
The K-means clustering algorithm has the drawback of converging at local minima, which can be mitigated by using multiple random initializations.
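In scikit-learn this is controlled by the `n_init` parameter; the blob dataset below is illustrative:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# n_init=10 runs k-means from 10 random centroid initializations and
# keeps the run with the lowest inertia, reducing the chance of
# getting stuck in a bad local minimum.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
```

Only the best of the ten runs (by within-cluster sum of squares) is returned, so a single unlucky initialization cannot determine the final clustering.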
After 15 iterations of gradient descent with α = 0.3, you compute J(θ). You notice that J(θ) rapidly falls before leveling out. Which of the following conclusions is most likely based on this information?
You want gradient descent to converge quickly to the minimum, so the current setting of α appears to be good.
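A toy run that reproduces this behavior; the cost function J(θ) = θ² and starting point are illustrative:

```python
# Gradient descent on J(theta) = theta**2 with alpha = 0.3.
alpha = 0.3
theta = 5.0
costs = []
for _ in range(15):
    grad = 2 * theta          # dJ/dtheta
    theta -= alpha * grad     # descent step
    costs.append(theta ** 2)  # J falls rapidly, then levels out near 0
```

The cost drops steeply in the first few iterations and then flattens as θ approaches the minimum; a learning rate that is too large would instead make J oscillate or diverge, and one that is too small would make J fall very slowly.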
Which of the following is an appropriate method for determining "k" main components?
Choose the smallest k whose cumulative explained variance reaches a chosen threshold (for example, by inspecting a scree plot). This will maintain the structure of the data while also reducing its dimension.
What is the purpose of a sentence parser?
Sentence parsers analyze a sentence and automatically build a syntax tree.
Using the automated machine learning user interface (UI), you create a machine learning model. You must guarantee that the model complies with Microsoft's transparent AI philosophy. What are your options?
Model explainability. Most businesses run on trust, and being able to open the ML “black box” helps build transparency and trust. In heavily regulated industries like healthcare and banking, it is critical to comply with regulations and best practices. One key aspect of this is understanding the relationship between input variables (features) and model output. Knowing both the magnitude and direction of the impact each feature has on the predicted value (feature importance) helps you better understand and explain the model. With model explainability, you can view feature importance as part of automated ML runs.
Different binary classification models are being evaluated by a Data Scientist. A false positive result is 5 times more
expensive than a false negative result (from a commercial standpoint).
The following criteria should be used to evaluate the models:
1) Must have a recall rate of at least 80%
2) Must have a false positive rate of 10% or less
3) Must minimize business costs
The Data Scientist creates the matching confusion matrix once each binary classification model is created.
Which confusion matrix best describes the model that meets the criteria?
The following calculations are required:
TP = True Positive
FP = False Positive
FN = False Negative
TN = True Negative
Recall = TP / (TP + FN)
False Positive Rate (FPR) = FP / (FP + TN)
Cost = 5 * FP + FN
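Plugging a hypothetical confusion matrix (not the actual counts from the exam) into these formulas:

```python
# Hypothetical confusion-matrix counts, chosen for illustration.
TP, FP, FN, TN = 85, 90, 15, 810

recall = TP / (TP + FN)   # 85 / 100 = 0.85 -> meets the >= 80% criterion
fpr = FP / (FP + TN)      # 90 / 900 = 0.10 -> meets the <= 10% criterion
cost = 5 * FP + FN        # 5 * 90 + 15 = 465
```

Among the candidate models, the one that satisfies both the recall and FPR constraints with the lowest `cost` value is the correct choice.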
A Machine Learning Engineer uses the Amazon SageMaker Linear Learner algorithm to prepare a data frame for a
supervised learning task. The ML Engineer notes that the target label classes are unbalanced, and that several feature
columns have missing data. The percentage of missing values is less than 5% for the full data frame.
What should the machine learning engineer do to reduce bias caused by missing values?
Use supervised learning to predict missing values based on the values of other features. Different supervised learning approaches might have different performances, but any properly implemented supervised learning approach should provide the same or better approximation than mean or median approximation, as proposed in responses A and B. Supervised learning applied to the imputation of missing values is an active field of research.
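A minimal sketch of supervised imputation: predict the missing entries of one column from the other features. The data, the exact linear relation, and the choice of `LinearRegression` are all illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = 2 * X[:, 0] - X[:, 1]      # column 2 depends on the other features
mask = rng.random(200) < 0.05        # ~5% of rows missing in column 2
X_missing = X.copy()
X_missing[mask, 2] = np.nan

# Train a regressor on rows where the column is observed ...
observed = ~mask
model = LinearRegression().fit(X_missing[observed, :2], X_missing[observed, 2])
# ... and predict the missing entries from the other features.
X_missing[mask, 2] = model.predict(X_missing[mask, :2])
```

Because the model exploits correlations between features, its imputations track the data much more closely than a single constant such as the column mean or median.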
A business wants to develop a fraud detection model. Due to the limited number of fraud incidents, the Data Scientist
currently does not have enough information.
Which strategy is the MOST LIKELY to catch the MOST genuine fraud cases?
When the minority class is under-represented, the Synthetic Minority Over-sampling Technique (SMOTE) adds new information by generating synthetic data points for the minority class. This technique would be the most effective in this scenario.
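A bare-bones sketch of the SMOTE idea: synthesize new minority points by interpolating between existing ones. Real SMOTE interpolates between a sample and its k nearest minority neighbors (see `imblearn.over_sampling.SMOTE`); the pairing below is a simplification, and the points are illustrative:

```python
import random

random.seed(0)
minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0)]  # toy minority-class points

def smote_like(points, n_new):
    # A synthetic point lies on the segment between two minority samples.
    synthetic = []
    for _ in range(n_new):
        a, b = random.sample(points, 2)
        t = random.random()
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic

new_points = smote_like(minority, 5)
```

The interpolated points stay inside the region occupied by the minority class, so the classifier sees more fraud-like examples without simply duplicating the few real ones.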
A fraud detection model is built using logistic regression by a Data Scientist. While the algorithm's accuracy is 99 percent,
the model fails to detect 90 percent of fraud incidents.
What activity will ensure that the model is able to detect more than 10% of fraud cases?
Decreasing the class probability threshold makes the model more sensitive and, therefore, marks more cases as the positive class, which is fraud in this case. This will increase the likelihood of fraud detection. However, it comes at the price of lowering precision.
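The effect of lowering the threshold can be shown with a handful of illustrative predicted probabilities:

```python
# Predicted fraud probabilities for ten cases (illustrative numbers).
probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.55, 0.60, 0.80, 0.95]

flag_default = [p >= 0.5 for p in probs]   # default threshold 0.5
flag_lowered = [p >= 0.3 for p in probs]   # lowered threshold 0.3

# Lowering the threshold flags more cases as fraud: higher recall,
# at the price of lower precision.
print(sum(flag_default), sum(flag_lowered))  # 4 7
```

The three extra flagged cases may include more true frauds (raising recall) but also more false alarms, which is the precision trade-off the answer describes.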
In Amazon S3, a Machine Learning team has numerous huge CSV datasets. On similar-sized datasets, models
developed with the Amazon SageMaker Linear Learner algorithm have previously taken hours to train. The
training process must be accelerated by the team's leaders.
What can a Machine Learning Expert do to help with this issue?
Amazon SageMaker Pipe mode streams the data directly to the container, which improves the performance of training jobs. In Pipe mode, your training job streams data directly from Amazon S3. Streaming can provide faster start times for training jobs and better throughput. With Pipe mode, you also reduce the size of the Amazon EBS volumes for your training instances. A would not apply in this scenario. C transforms the data structure. D is a streaming ingestion solution, but is not applicable in this scenario.