# Business Analysis Statistics Test #1

#### The usual formula for a simple linear regression is Y = beta0 + beta1*X + error. Which statement best describes the assumptions made about the errors?

The correct answer:

The errors are independent, normally distributed with zero mean and constant variance.
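One way to check these assumptions in practice is to fit the model and inspect the residuals. The sketch below is only an illustration: the data set SIM, its variables, and the seed are hypothetical, and the data are simulated so that the assumptions hold by construction.

```sas
/* Hypothetical simulated data: Y = 2 + 3*X + error, error ~ Normal(0, 1) */
data sim;
  call streaminit(2024);
  do i = 1 to 100;
    x = rand('uniform') * 10;
    y = 2 + 3 * x + rand('normal');
    output;
  end;
  drop i;
run;

/* Fit the regression and save residuals and predicted values */
proc reg data=sim;
  model y = x;
  output out=diag r=resid p=pred;
run;
quit;

/* Normality tests and summary statistics for the residuals */
proc univariate data=diag normal;
  var resid;
run;
```

A residual mean near zero and non-significant normality tests are consistent with the stated assumptions. Constant variance is usually judged from a plot of residuals against predicted values.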

#### Which SAS program will split the original data set into 60% training and 40% validation data sets, stratified by county?

The correct answer:

proc sort data=SASUSER.DATABASE; by county; run; proc surveyselect data=SASUSER.DATABASE samprate=0.6 out=sample outall; strata county; run;
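With OUTALL, PROC SURVEYSELECT keeps every observation and adds a 0/1 variable named Selected. The sketch below shows one way to turn that output into two partitions. The SEED= value and the output names TRAIN and VALID are assumptions added for illustration.

```sas
/* Sort by the stratification variable, as STRATA requires */
proc sort data=SASUSER.DATABASE;
  by county;
run;

/* Sample 60% within each county; OUTALL keeps all rows and flags them */
proc surveyselect data=SASUSER.DATABASE samprate=0.6
                  out=sample outall seed=12345;
  strata county;
run;

/* Selected = 1 goes to training, Selected = 0 goes to validation */
data train valid;
  set sample;
  if selected = 1 then output train;
  else output valid;
run;
```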

#### An analyst investigates Region (A, B, or C) as an input variable in a logistic regression model. The analyst finds that when Region = A, the probability of purchasing a specific item is 1. What problem does this illustrate?

The correct answer:

Quasi-complete separation. It occurs when the outcome variable separates a predictor variable, or a combination of predictor variables, almost completely: one level of a categorical variable (or one range of a numeric variable) contains only a single value of the discrete outcome. Here, every observation with Region = A is a purchase.
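A cross-tabulation of the input against the target is a quick way to spot this situation before fitting the model. The sketch below uses a hypothetical data set PURCHASES with made-up counts; only the Region levels come from the question.

```sas
/* Hypothetical cell counts: Region A has purchasers only */
data purchases;
  input region $ purchase count;
  datalines;
A 1 40
B 1 25
B 0 30
C 1 10
C 0 45
;
run;

/* A zero cell in the Region = A row signals quasi-complete separation */
proc freq data=purchases;
  weight count;
  tables region*purchase / nopercent nocol;
run;
```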

#### An analyst compares the mean salaries of men and women working at a company. The SAS data set SALARY contains the variables Gender (M or F) and Pay (dollars per year). Which SAS programs can be used to find the p-value for comparing men's salaries with women's salaries? (Select two.)

The correct answers:

proc ttest data=salary; class gender; var pay; run;

proc glm data=salary; class gender; model pay=gender; run;
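Both programs test the same hypothesis. With two groups, the pooled-variance t test and the one-way ANOVA F test are equivalent (F equals t squared), so they report the same p-value. The sketch below runs both on a small SALARY data set whose values are invented for illustration.

```sas
/* Hypothetical salaries, for illustration only */
data salary;
  input gender $ pay;
  datalines;
M 52000
M 61000
M 58000
M 49500
F 50000
F 57000
F 63000
F 54500
;
run;

/* Two-sample t test: read the p-value on the Pooled row */
proc ttest data=salary;
  class gender;
  var pay;
run;

/* One-way ANOVA: the F test for Gender gives the same p-value */
proc glm data=salary;
  class gender;
  model pay = gender;
run;
quit;
```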

#### Which statistic, calculated from a validation sample, can help decide which model to use for predicting a binary target variable?

The correct answer:

Average squared error (ASE). The model with the lowest average squared error on the validation sample is the one that is selected.
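For a binary target, the average squared error is the mean of the squared differences between the 0/1 outcome and the predicted probability. A minimal sketch follows, assuming a hypothetical scored validation data set with variables Y (actual outcome) and P_HAT (predicted probability).

```sas
/* Hypothetical scored validation rows */
data scored_valid;
  input y p_hat;
  sq_err = (y - p_hat)**2;   /* squared error for each observation */
  datalines;
1 0.8
0 0.3
1 0.6
0 0.1
;
run;

/* The mean of SQ_ERR is the average squared error */
proc means data=scored_valid mean;
  var sq_err;
run;
```

Repeating this for each candidate model on the same validation sample gives the values to compare.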

#### The total modeling data has been split into training, validation, and test data. Which data are best suited for model assessment?

The correct answer:

Validation data. It serves as the first test against unseen data, letting data scientists assess how well the model predicts new observations. Not every workflow uses validation data, but it also offers useful information for tuning the hyperparameters that influence how the model fits the data.

#### Which statistic, when applied to a larger model, suggests a better model?

The correct answer:

Adjusted R-squared. It is a modified version of R-squared that accounts for predictors in a regression model that are not significant. In other words, the adjusted R-squared shows whether adding more predictors actually improves a regression model.
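The usual formula is adjusted R-squared = 1 - (1 - R-squared) * (n - 1) / (n - p - 1), where n is the number of observations and p is the number of predictors. The sketch below applies it to two hypothetical models; every number in it is made up for illustration.

```sas
/* Hypothetical fit statistics for a smaller and a larger model */
data compare;
  input model $ r2 n p;
  /* Penalize R-squared for the number of predictors */
  adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1);
  datalines;
small 0.78 50 3
large 0.80 50 10
;
run;

proc print data=compare noobs;
run;
```

With these inputs the smaller model scores about 0.766 and the larger about 0.749, so the larger model's higher raw R-squared does not survive the penalty for its extra predictors.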

#### Which of the following best describes a discordant pair of observations in the LOGISTIC procedure?

The correct answer:

The observation with the event has a lower predicted probability than the observation without the event.
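PROC LOGISTIC reports the share of such pairs as Percent Discordant in its association table. To make the definition concrete, the sketch below pairs every event with every non-event and classifies each pair by direct comparison. The data set SCORED and its values are hypothetical, and the exact comparison is a simplification of whatever tie handling the procedure applies.

```sas
/* Hypothetical scored data: y = 1 is the event, p_hat = predicted probability */
data scored;
  input y p_hat;
  datalines;
1 0.80
1 0.30
0 0.50
0 0.20
;
run;

/* Cross every event (e) with every non-event (n) and count pair types */
proc sql;
  select sum(e.p_hat > n.p_hat) as concordant,
         sum(e.p_hat < n.p_hat) as discordant,
         sum(e.p_hat = n.p_hat) as tied
  from scored(where=(y=1)) as e,
       scored(where=(y=0)) as n;
quit;
```

Here the only discordant pair is the event scored 0.30 matched with the non-event scored 0.50; the other three pairs are concordant.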

#### An analyst has selected a model as the champion because it shows better model fit than a competing model with more predictors. Which statistic justifies this rationale?

The correct answer:

Adjusted R-squared. It is a corrected goodness-of-fit measure for linear models: it shows how much of the variance in the target is explained by the inputs, penalized for the number of inputs.

#### When mean imputation is performed on data that has already been partitioned for honest assessment, what is the most appropriate way to handle the imputation?

The correct answer:

The sample means from the training data set are applied to the validation and test data sets.
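One common way to do this in SAS is PROC STDIZE, which can save the training statistics and reuse them on another partition. The sketch below is a minimal illustration: the data sets TRAIN and VALID, the variable X, and all values are hypothetical.

```sas
/* Hypothetical partitions with missing values in X */
data train;
  input x;
  datalines;
10
20
.
30
;
run;

data valid;
  input x;
  datalines;
.
50
;
run;

/* Impute training data with its own mean and save the statistics */
proc stdize data=train method=mean reponly
            out=train_imp outstat=train_stats;
  var x;
run;

/* Reuse the training mean on the validation data */
proc stdize data=valid method=in(train_stats) reponly
            out=valid_imp;
  var x;
run;
```

REPONLY replaces missing values without standardizing the rest, so the missing value in VALID is filled with the training mean of 20 rather than a mean computed from the validation data.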

#### A financial services manager wants to assess the probability that certain clients will default on their home equity line of credit (HELOC). A former analyst left the code below. The training data set is named HELOC, and a similar data set of more recent clients is named RECENT_HELOC. Which SAS DATA step statements will calculate the predicted probability of default for the recent clients? (Select two.) `data new_prob; set scored_heloc; <insert here>; run;`

The correct answers:

odds = exp(default); p = odds / (1 + odds);

p = 1 / (1 + exp(-default));
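Both answers convert a predicted logit (log odds) into a probability, and they are algebraically identical. The sketch below computes both in one DATA step so the results can be compared side by side. It assumes that DEFAULT holds the logit in SCORED_HELOC, and the input values are hypothetical.

```sas
/* Hypothetical scored data: DEFAULT holds the predicted logit */
data scored_heloc;
  input default;
  datalines;
-2
0
1.5
;
run;

data new_prob;
  set scored_heloc;
  odds = exp(default);
  /* Parentheses matter: odds/1+odds would evaluate to 2*odds */
  p_from_odds  = odds / (1 + odds);
  p_from_logit = 1 / (1 + exp(-default));
run;

proc print data=new_prob noobs;
run;
```

The two probability columns agree on every row; a logit of 0, for example, gives 0.5 in both.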