Master Data Science: Machine Learning Q&A

Which of the following can be applied to a set of data to create balanced cross-validation groupings?

createSample

createResample

createFolds

None of the above

createFolds produces balanced cross-validation groupings from a set of data; createResample, by contrast, creates simple bootstrap samples.
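The difference between the two ideas can be sketched in plain Python. This is only an illustration, not caret's implementation: the names create_folds and create_resample are hypothetical, and the round-robin dealing is one assumed way to keep class proportions similar across folds.

```python
import random
from collections import defaultdict

def create_folds(labels, k, seed=0):
    """Assign each index to one of k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold gets a fair share.
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return [sorted(f) for f in folds]

def create_resample(n, seed=0):
    """Draw one bootstrap sample: n indices sampled with replacement."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n)]

labels = ["a"] * 6 + ["b"] * 3
folds = create_folds(labels, k=3)      # every index appears in exactly one fold
boot = create_resample(len(labels))    # indices may repeat or be left out
```

Each fold here holds two "a" rows and one "b" row, while the bootstrap sample has no such guarantee.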

Identify the incorrect statement.

Three parameters are used for time series splitting

Horizon parameter is the number of consecutive values in test set sample

Simple random sampling of time series is probably the best way to resample time series data.

All of the above

Simple random sampling is probably not the best way to resample time series data, because it ignores the ordering of the observations.

Which of the following functions can be used to maximize the minimum dissimilarities?

avgDiss

minDiss

sumDiss

All of the above

minDiss maximizes the minimum dissimilarities; sumDiss, by contrast, can be used to maximize the total dissimilarities.
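To see why the two objectives can pick different points, here is a minimal Python sketch of greedy maximum dissimilarity sampling. It is an assumption-laden illustration rather than caret's code: max_dissim, min_diss and sum_diss are hypothetical Python names, and Euclidean distance is assumed as the dissimilarity measure.

```python
import math

def min_diss(distances):
    """Score a candidate by its smallest distance to the selected points."""
    return min(distances)

def sum_diss(distances):
    """Score a candidate by its total distance to the selected points."""
    return sum(distances)

def max_dissim(selected, pool, n, objective=min_diss):
    """Greedily add the n pool points that score highest against the selected set."""
    selected = list(selected)
    pool = list(pool)
    for _ in range(n):
        best = max(pool, key=lambda p: objective([math.dist(p, s) for s in selected]))
        selected.append(best)
        pool.remove(best)
    return selected

start = [(0.0, 0.0), (10.0, 0.0)]
pool = [(5.0, 0.0), (12.0, 0.0), (1.0, 0.0)]
by_min = max_dissim(start, pool, 1, min_diss)   # adds (5.0, 0.0)
by_sum = max_dissim(start, pool, 1, sum_diss)   # adds (12.0, 0.0)
```

The minimum-based objective prefers the point that sits far from every selected point, while the sum-based objective can favor a point that is very far from one selected point but close to another.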

Which of the following functions can create the indices needed for time series splitting?

binTimeSlices

newTimeSlices

createTimeSlices

None of the above

Techniques used in rolling forecasting are related to the splitting of time series.
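A rolling forecasting origin can be sketched in a few lines of Python. This is a hypothetical re-creation of the idea, not caret's createTimeSlices: the function name, the 0-based indices, and the parameter names initial_window, horizon and fixed_window are assumptions chosen to mirror the three splitting parameters mentioned in the quiz.

```python
def create_time_slices(n, initial_window, horizon=1, fixed_window=True):
    """Return (train, test) index lists for a rolling forecasting origin."""
    slices = []
    # 'stop' is the index just past the end of each training window.
    for stop in range(initial_window, n - horizon + 1):
        # A fixed window slides forward; otherwise the window grows from index 0.
        start = stop - initial_window if fixed_window else 0
        train = list(range(start, stop))
        test = list(range(stop, stop + horizon))
        slices.append((train, test))
    return slices

slices = create_time_slices(n=6, initial_window=3, horizon=2)
growing = create_time_slices(n=6, initial_window=3, horizon=2, fixed_window=False)
```

Every test set contains horizon consecutive values that come strictly after the training window, which is what simple random sampling fails to respect.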

Which of the following functions is a wrapper for different lattice plots to visualize the data?

plotsample

featurePlot

levelplot

None of the above

Caret employs featurePlot to visualize data.

Identify the incorrect statement.

Predictors might have only a handful of unique values that occur with very low frequencies

The function findLinearCombos uses the QR decomposition of a matrix to enumerate sets of linear combinations

In every situation, the data generating mechanism can create predictors that only have a single unique value

All of the above

The mechanism used to generate the data occasionally produces predictors with just one distinct value.
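Predictors of this kind can be spotted with a simple frequency summary. The Python sketch below is illustrative only: predictor_summary is a hypothetical helper, and the two statistics it reports (the ratio of the two most common value counts, and the percentage of unique values) are assumed to be reasonable indicators, not a statement of how any library decides.

```python
from collections import Counter

def predictor_summary(values):
    """Summarize how concentrated a predictor's values are."""
    counts = Counter(values).most_common()
    zero_var = len(counts) == 1
    # Ratio of the most common value's count to the second most common's.
    freq_ratio = float("inf") if zero_var else counts[0][1] / counts[1][1]
    percent_unique = 100.0 * len(counts) / len(values)
    return {"zero_var": zero_var,
            "freq_ratio": freq_ratio,
            "percent_unique": percent_unique}

constant = predictor_summary([1] * 20)        # a single unique value
skewed = predictor_summary([0] * 19 + [1])    # one rare value among many
```

A constant predictor carries no information, and a heavily skewed one can become constant inside a cross-validation fold or bootstrap sample.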

The evimp function in the ______ package is wrapped in varImp.

plot

numpy

earth

None of the above

Multivariate Adaptive Regression Splines by Jerome Friedman are implemented in the earth package.

Identify the incorrect statement.

An argument, para, is used to pick the model fitting technique

The trapezoidal rule is used to compute the area under the ROC curve

For regression, the relationship between each predictor and the outcome is evaluated

All of the above

An argument, nonpara (not para), is used to pick the model fitting technique.
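The trapezoidal rule mentioned in the options can be written out directly. This is a minimal Python sketch with made-up ROC points; trapezoid_auc is a hypothetical name, and the points are assumed to be sorted by increasing false positive rate.

```python
def trapezoid_auc(fpr, tpr):
    """Area under a curve given by points sorted by increasing fpr."""
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        mean_height = (tpr[i] + tpr[i - 1]) / 2.0
        # Each pair of neighboring points contributes one trapezoid.
        area += width * mean_height
    return area

fpr = [0.0, 0.0, 0.5, 1.0]
tpr = [0.0, 0.5, 1.0, 1.0]
auc = trapezoid_auc(fpr, tpr)                    # 0.875
chance = trapezoid_auc([0.0, 1.0], [0.0, 1.0])   # 0.5 for the diagonal
```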

Identify the incorrect statement.

In Sample Error is also called resubstitution error

In Sample Error is also called generalization error

Out of Sample Error is the error rate you get on the new dataset

All of the above

Generalization error is another name for out of sample error.
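The gap between the two error rates is easy to demonstrate with a model that memorizes its training data. The sketch below uses a hypothetical one-nearest-neighbor classifier on invented one-dimensional data, where the training point at 4.0 is assumed to carry a noisy label.

```python
def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def error_rate(train, data):
    """Fraction of points in data that the 1-NN model gets wrong."""
    wrong = sum(predict_1nn(train, x) != y for x, y in data)
    return wrong / len(data)

# The point at 4.0 is labeled "b" although its neighbors suggest "a" (noise).
train = [(1.0, "a"), (2.0, "a"), (4.0, "b"), (8.0, "b"), (9.0, "b")]
new = [(1.2, "a"), (3.9, "a"), (4.2, "a"), (8.2, "b")]

in_sample = error_rate(train, train)    # resubstitution error: 0.0
out_of_sample = error_rate(train, new)  # error on new data: 0.5
```

The in-sample error is optimistic because the model has tuned itself to the noise in the training set.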

Which of the following best describes the proper working order?

evaluation->input data->algorithms

questions->evaluation->algorithms

questions->input data->algorithms

All of the above

Questions: The process begins with defining the specific questions or problems that need to be answered or addressed through data analysis. These questions guide the entire analysis and help determine the relevant data and the approach to be used. Input Data: Once the questions are defined, relevant data is collected and prepared for analysis. Data collection can involve various methods, including surveys, experiments, web scraping, or accessing existing datasets. Algorithms: After obtaining the data, appropriate algorithms are selected and applied to analyze the data, extract patterns, make predictions, or perform any specific task to answer the defined questions. So, the correct order of working is: Questions -> Input Data -> Algorithms

Which of the following exhibits the proper relative importance?

question->data->features->algorithms

algorithms->data->features->question

question->features->data->algorithms

None of the above

Question: The starting point is to define the specific questions or problems that need to be answered or addressed through data analysis. Data: Once the questions are defined, relevant data is collected and prepared for analysis. High-quality and relevant data are essential for accurate and meaningful results. Features: After obtaining the data, relevant features or attributes are extracted or selected from the data. These features act as inputs to the algorithms. Algorithms: With the data and features in place, appropriate algorithms are applied to analyze the data, extract patterns, make predictions, or perform any specific task to answer the defined questions. So, the correct relative order of importance is Question -> Data -> Features -> Algorithms.

When making predictions, trees evaluate ______ within each group of data.

heterogeneity

equality

homogeneity

All of the above

When making predictions, trees evaluate homogeneity within each group of data.
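One common way to measure how homogeneous a group is, assumed here purely for illustration, is Gini impurity: it is 0 when a group contains a single class and grows as classes mix. The helper names below are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure group, larger for mixed groups."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted average impurity of the two groups a split produces."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

before = gini(["a", "a", "b", "b"])                    # 0.5
pure_split = split_impurity(["a", "a"], ["b", "b"])    # 0.0
poor_split = split_impurity(["a", "b"], ["a", "b"])    # 0.5
```

A tree would prefer the first split, since it leaves both groups perfectly homogeneous.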

Identify the incorrect statement.

Test transformation would mostly be imperfect

The first goal is statistical and second is data compression in PCA

Training and testing data must be processed in a different way

All of the above

Training and testing data must be processed in the same way.
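In practice this means estimating the preprocessing parameters on the training set only and reusing them on the test set. The following Python sketch assumes simple centering and scaling; fit_scaler and apply_scaler are hypothetical helpers and the numbers are invented.

```python
import statistics

def fit_scaler(values):
    """Estimate centering and scaling parameters from the training data only."""
    return statistics.mean(values), statistics.stdev(values)

def apply_scaler(values, params):
    """Apply previously estimated parameters to any data set."""
    mean, sd = params
    return [(v - mean) / sd for v in values]

train = [2.0, 4.0, 6.0]
test = [4.0, 8.0]

params = fit_scaler(train)                 # mean 4.0, standard deviation 2.0
train_scaled = apply_scaler(train, params) # [-1.0, 0.0, 1.0]
test_scaled = apply_scaler(test, params)   # [0.0, 2.0]
```

The scaled test values do not have mean zero, which is the sense in which the test transformation is usually imperfect.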

Which of the following options for a bagging method does the train function offer?

bagFDA

treebag

bagEarth

All of the above

Using the bag function also permits bagging.
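The idea behind all of these options, bootstrap aggregating, can be sketched independently of caret. In this hypothetical Python example the base learner is a one-dimensional threshold rule (a stump), each stump is fit on a bootstrap sample, and predictions are combined by majority vote; the function names and data are assumptions for illustration.

```python
import random
from collections import Counter

def fit_stump(data):
    """Pick the threshold t minimizing training errors for 'predict 1 if x >= t'."""
    candidates = sorted({x for x, _ in data}) + [float("inf")]
    def errors(t):
        return sum((x >= t) != bool(y) for x, y in data)
    return min(candidates, key=errors)

def bag(data, n_models=25, seed=0):
    """Fit one stump per bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]

def predict(thresholds, x):
    """Combine the stumps' predictions by majority vote."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]

data = [(float(x), 0) for x in range(5)] + [(float(x), 1) for x in range(6, 11)]
model = bag(data)
low = predict(model, -5.0)    # votes for class 0
high = predict(model, 20.0)   # votes for class 1
```

Averaging over many resampled fits reduces the variance of an unstable base learner, which is why bagging is usually applied to models such as trees.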

Which of the following statements about random forest is accurate?

Random forests are difficult to interpret and much less accurate

Random forests are easy to interpret but often very accurate

Random forests are difficult to interpret but often very accurate

None of the above

Random forests are often among the most accurate prediction algorithms, but an ensemble of many trees is hard to interpret.

Which of the following uses additive logistic regression as the foundation for statistical boosting?

mboost

ada

gbm

gamBoost

Statistical boosting based on additive logistic regression is provided by ada; model-based boosting, by contrast, is done using mboost.
