Explanation:
The gradient of a multivariable function is the zero vector at a maximum point, the point at which the function
achieves its single greatest value.
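As a quick illustration (the function below is an arbitrary choice, not the one from the question), the gradient of f(x, y) = -(x^2 + y^2) is the zero vector exactly at its maximum:

```python
import sympy as sp

# Illustrative function: f(x, y) = -(x**2 + y**2) attains its single greatest
# value at the origin, where the gradient vanishes.
x, y = sp.symbols('x y')
f = -(x**2 + y**2)

gradient = [sp.diff(f, var) for var in (x, y)]   # [-2*x, -2*y]
critical_point = sp.solve(gradient, (x, y))      # {x: 0, y: 0}

print(gradient, critical_point)
```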
Explanation:
The Random Forest algorithm builds an ensemble of Decision Trees, mostly trained with the bagging method.
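A minimal scikit-learn sketch of the idea (the dataset and parameters are illustrative, not taken from the question):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data for illustration only.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Each tree in the forest is trained on a bootstrap sample of the data (bagging).
forest = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
forest.fit(X, y)
print(len(forest.estimators_))  # 100 decision trees in the ensemble
```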
Explanations:
A good test dataset contains a sufficiently large sample of the population and equal ratios of class representation.
Explanations:
The model performance assessment for classification algorithms incorporates all of the above techniques.
Explanations:
Cross-validation is a model validation technique for assessing how the results of a statistical analysis will generalize to
an independent data set.
Explanations: All of the above techniques are different ways of imputing the missing values.
Explanations:
Allowing a decision tree to split to a very granular degree makes it prone to learning every training point so well
that it classifies the training data perfectly, which is overfitting.
Explanations:
You always need to normalize the data first; otherwise, PCA and other dimensionality reduction techniques will give
different results.
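A small sketch of the effect, assuming scikit-learn and a toy dataset with very different feature scales (both are illustrative choices):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two features with very different scales (illustrative only).
X = np.column_stack([rng.normal(0, 1, 500), rng.normal(0, 100, 500)])

pca_raw = PCA(n_components=2).fit(X)
pca_scaled = PCA(n_components=2).fit(StandardScaler().fit_transform(X))

print(pca_raw.explained_variance_ratio_)     # dominated by the large-scale feature
print(pca_scaled.explained_variance_ratio_)  # more balanced after normalization
```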
Explanations:
If the second-order difference is positive, the time series will curve upward and if it is negative, the time series will curve
downward at that time.
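For example (the series below is an arbitrary illustration), a parabola curves upward and its second-order difference is positive:

```python
import numpy as np

t = np.arange(10)
series = t**2                       # an upward-curving series

second_diff = np.diff(series, n=2)  # second-order difference
print(second_diff)                  # all values are 2 (positive), so the series curves upward
```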
Explanations:
pca.components_ is the set of all eigenvectors for the projection space.
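A short illustration using scikit-learn and the Iris dataset (an arbitrary example dataset):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2).fit(X)

# Each row of pca.components_ is one eigenvector (principal axis) of the projection space.
print(pca.components_.shape)  # (2, 4): 2 components, each a vector in the 4-D feature space
```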
Explanations:
All of the above techniques transform raw data into features which can be used as inputs to machine learning algorithms.
Explanations:
A large value results in a large regularization penalty and therefore a strong preference for simpler models, which can underfit the data.
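A hedged sketch using scikit-learn's Ridge regression, where alpha plays the role of the regularization value (the question's actual model and parameter are not shown here):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
# Toy regression data (illustrative only).
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=100)

# A large alpha applies a heavy penalty, shrinking coefficients toward zero and
# favoring a simpler model that can underfit.
print(Ridge(alpha=0.01).fit(X, y).coef_)  # close to the true coefficients
print(Ridge(alpha=1e4).fit(X, y).coef_)   # strongly shrunk toward zero
```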
Explanations:
Naive Bayes assumes that all the features in a data set are equally important and independent.
Explanations:
K-means clustering algorithm fails to give good results when the data contains outliers, the density spread of data points
across the data space is different, and the data points follow nonconvex shapes.
Explanations:
Lemmatization and stemming are the techniques of keyword normalization.
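A small NLTK example of both techniques (the word chosen is illustrative):

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer
# The lemmatizer requires the WordNet corpus: nltk.download('wordnet')

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Both normalize surface forms of a word to a common base form.
print(stemmer.stem("studies"))                   # 'studi'  (rule-based suffix stripping)
print(lemmatizer.lemmatize("studies", pos="v"))  # 'study'  (dictionary-based lemma)
```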
Explanations:
The K-Means clustering algorithm has the drawback of converging at local minima, which can be prevented by using multiple random initializations.
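In scikit-learn this corresponds to the n_init parameter; the data below is a toy example:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data for illustration only.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# n_init runs K-Means with several random initializations and keeps the solution
# with the lowest inertia, reducing the risk of settling in a poor local minimum.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(kmeans.inertia_)
```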
Explanations:
You need gradient descent to converge quickly to the minimum, so the current setting of the learning rate (a) seems to be good.
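A minimal gradient descent sketch on f(x) = x^2 (the question's actual cost function and learning rate value are not reproduced here):

```python
def gradient(x):
    return 2 * x  # derivative of f(x) = x**2

x, a = 5.0, 0.1   # a is the learning rate
for _ in range(50):
    x -= a * gradient(x)

print(x)  # close to 0 (the minimum); a well-chosen learning rate converges quickly
```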
Explanations:
This will maintain the structure of the data and also reduce its dimension.
Explanations:
Sentence parsers analyze a sentence and automatically build a syntax tree.
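A toy NLTK sketch of parsing a sentence into a syntax tree (the grammar and sentence are illustrative assumptions):

```python
import nltk

# A tiny context-free grammar, defined only for this example.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'cat'
V -> 'chased'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the dog chased the cat".split()):
    tree.pretty_print()  # prints the syntax tree built by the parser
```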
Explanation:
Model Explainability.
Most businesses run on trust and being able to open the ML “black box” helps build transparency and trust. In heavily
regulated industries like healthcare and banking, it is critical to comply with regulations and best practices. One key aspect
of this is understanding the relationship between input variables (features) and model output. Knowing both the magnitude
and direction of the impact each feature (feature importance) has on the predicted value helps better understand and explain
the model. With model explainability, we enable you to understand feature importance as part of automated ML runs.
Explanation:
The following calculations are required:
TP = True Positive
FP = False Positive
FN = False Negative
TN = True Negative
Recall = TP / (TP + FN)
False Positive Rate (FPR) = FP / (FP + TN)
Cost = 5 * FP + FN
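Plugging in illustrative counts (the actual confusion-matrix values from the question are not reproduced here):

```python
# Hypothetical counts for illustration only.
TP, FP, FN, TN = 90, 20, 10, 880

recall = TP / (TP + FN)   # 0.9
fpr = FP / (FP + TN)      # ~0.022
cost = 5 * FP + FN        # 110

print(recall, fpr, cost)
```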
Explanation:
Use supervised learning to predict missing values based on the values of other features. Different
supervised learning approaches might have different performances, but any properly implemented
supervised learning approach should provide the same or better approximation than mean or median
approximation, as proposed in responses A and B. Supervised learning applied to the imputation of
missing values is an active field of research.
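One hedged example of this approach is scikit-learn's IterativeImputer, which models each feature with missing values as a regression on the other features (the data below is a toy example):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy matrix where the second column is roughly twice the first (illustrative only).
X = np.array([[1.0, 2.0], [3.0, 6.0], [4.0, 8.0], [5.0, np.nan]])

imputer = IterativeImputer(random_state=0)
print(imputer.fit_transform(X))  # the missing entry is filled in from a regression on the first column
```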
Explanation:
With imbalanced datasets in which the minority class is underrepresented, the Synthetic Minority Over-sampling
Technique (SMOTE) adds new information by generating synthetic data points for the minority class. This technique
would be the most effective in this scenario.
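A hedged sketch using the imbalanced-learn library (the dataset is a toy example):

```python
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Toy imbalanced dataset (illustrative only).
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
print(Counter(y))  # heavily imbalanced, e.g. ~950 vs ~50

# SMOTE synthesizes new minority-class points by interpolating between neighbors.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y_res))  # the classes are now balanced
```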
Explanation:
Decreasing the class probability threshold makes the model more sensitive and, therefore, marks
more cases as the positive class, which is fraud in this case. This will increase the likelihood of fraud
detection. However, it comes at the price of lowering precision.
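A small sketch of lowering the threshold with scikit-learn (the model, data, and threshold value are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy imbalanced data; class 1 plays the role of fraud (illustrative only).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

proba = model.predict_proba(X)[:, 1]
default_preds = (proba >= 0.5).astype(int)  # default threshold
lowered_preds = (proba >= 0.2).astype(int)  # lowered threshold

# Lowering the threshold flags more cases as fraud: recall rises, precision falls.
print(default_preds.sum(), lowered_preds.sum())
```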
Explanation:
Amazon SageMaker Pipe mode streams the data directly to the container, which improves the
performance of training jobs. In Pipe mode, your training job streams data directly from Amazon S3.
Streaming can provide faster start times for training jobs and better throughput. With Pipe mode, you
also reduce the size of the Amazon EBS volumes for your training instances. A would not apply in this
scenario. C transforms the data structure. D is a streaming ingestion solution, but is not applicable in this
scenario.
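A hedged sketch of requesting Pipe mode with the SageMaker Python SDK; the image URI, role, and S3 paths below are placeholders, not values from the question:

```python
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<training-image-uri>",      # placeholder
    role="<execution-role-arn>",           # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    input_mode="Pipe",  # stream training data directly from Amazon S3 to the container
)
estimator.fit({"train": "s3://<bucket>/<prefix>/train/"})
```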