Apache Spark Test 2

Spark supports which cluster managers?

Mesos

YARN

Standalone Cluster Manager

All of the above

Explanation:
Spark currently supports the following cluster managers, so the answer is All of the above:
Standalone is a simple cluster manager included with Spark that makes it easy to set up a cluster.
Apache Mesos is a general cluster manager that can also run Hadoop MapReduce and service applications. (Deprecated since Spark 3.2.0.)
YARN is the resource manager introduced in Hadoop 2.
Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications.
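
In practice, you choose a cluster manager through the master URL passed to spark-submit with --master. The sketch below is a hypothetical helper, not part of Spark, that maps the common master URL forms to the manager each one selects. The host names are placeholders, and the ports assume the usual defaults (7077 for standalone, 5050 for Mesos).

```python
def cluster_manager_for(master: str) -> str:
    """Hypothetical helper (not part of Spark): name the cluster manager
    selected by a Spark master URL such as the value of --master."""
    if master == "local" or master.startswith("local["):
        return "local mode (no cluster manager)"
    if master.startswith("spark://"):
        return "Standalone"
    if master.startswith("mesos://"):
        return "Apache Mesos (deprecated)"
    if master == "yarn":
        return "Hadoop YARN"
    if master.startswith("k8s://"):
        return "Kubernetes"
    raise ValueError(f"unrecognized master URL: {master!r}")


# Placeholder hosts and ports, assumed for illustration only.
for url in ["local[*]", "spark://host:7077", "mesos://host:5050",
            "yarn", "k8s://https://host:6443"]:
    print(url, "->", cluster_manager_for(url))
```

The helper checks prefixes because the scheme of the URL, not the host, is what tells Spark which cluster manager to contact.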

Which of the following statements about Spark MLlib is correct?

Enables powerful interactive and analytical applications across live streaming data

Provides an execution platform for all the Spark applications

It is the scalable machine learning library which delivers efficiencies

All of the above

Explanation:
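MLlib is Spark's scalable machine learning library. It provides distributed implementations of common algorithms for classification, regression, clustering, and collaborative filtering, along with tools for feature extraction and ML pipelines. The first option describes Spark Streaming, and the second describes Spark Core.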

RDDs are immutable and fault-tolerant.

False

True

Explanation:
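True. An RDD is a fault-tolerant, immutable distributed collection of objects: once created, it cannot be changed, and transformations produce new RDDs instead. Each dataset in an RDD is divided into logical partitions, which can be computed on different nodes of the cluster. If a partition is lost, Spark can recompute it from the lineage of transformations that created it.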
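
To make immutability and lineage-based recovery concrete, here is a toy, pure-Python model. It is not Spark's API; the class ToyRDD and its methods are hypothetical names chosen for illustration. A transformation never modifies its input. Instead, it returns a new object that records its parent and the function to apply, and a "lost" partition is rebuilt by replaying that recorded function.

```python
from typing import Callable, List, Optional, Tuple


class ToyRDD:
    """Toy model of an RDD (hypothetical, not Spark's API)."""

    def __init__(self, partitions: Optional[List[Tuple]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable] = None) -> None:
        self._source = partitions   # only set on the root dataset
        self._parent = parent       # lineage: where this dataset came from
        self._fn = fn               # lineage: how to derive it from the parent
        self._cache: dict = {}      # partition index -> computed tuple

    @classmethod
    def parallelize(cls, data: List, num_partitions: int) -> "ToyRDD":
        # Split the input into contiguous logical partitions.
        size = -(-len(data) // num_partitions)  # ceiling division
        parts = [tuple(data[i * size:(i + 1) * size]) for i in range(num_partitions)]
        return cls(partitions=parts)

    def num_partitions(self) -> int:
        if self._parent is None:
            return len(self._source)
        return self._parent.num_partitions()

    def map(self, f: Callable) -> "ToyRDD":
        # Returns a new dataset; `self` is never modified.
        return ToyRDD(parent=self, fn=f)

    def compute_partition(self, i: int) -> Tuple:
        if i in self._cache:
            return self._cache[i]
        if self._parent is None:
            result = self._source[i]
        else:
            # Rebuild this partition from the parent using the recorded function.
            result = tuple(self._fn(x) for x in self._parent.compute_partition(i))
        self._cache[i] = result
        return result

    def lose_partition(self, i: int) -> None:
        # Simulate a node failure that drops a computed partition.
        self._cache.pop(i, None)

    def collect(self) -> List:
        return [x for i in range(self.num_partitions()) for x in self.compute_partition(i)]


base = ToyRDD.parallelize([1, 2, 3, 4, 5, 6], num_partitions=3)
squares = base.map(lambda x: x * x)
print(squares.collect())   # [1, 4, 9, 16, 25, 36]
squares.lose_partition(1)  # partition 1 of `squares` is "lost"
print(squares.collect())   # recomputed from lineage: [1, 4, 9, 16, 25, 36]
print(base.collect())      # the parent is unchanged: [1, 2, 3, 4, 5, 6]
```

Recording how a dataset was derived, rather than replicating every result, is what lets recovery stay cheap. Only the missing partition is recomputed.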

Which algorithm is not a solution for the regression problem?

Gradient-Boosted Trees

Decision Trees

Logistic Regression

Ridge Regression

Explanation:
Logistic regression is a supervised machine learning technique used to predict a categorical response, so it solves classification problems rather than regression problems. Classification means examining a data point and assigning it a class (or label). The other options (gradient-boosted trees, decision trees, and ridge regression) can all be used to predict continuous values.
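
The sketch below shows why logistic regression is a classifier. The model turns a weighted sum of features into a probability with the logistic (sigmoid) function, and a threshold then converts that probability into a class label. The weights and bias are hypothetical, hand-picked values rather than parameters learned by Spark MLlib, and the code uses plain Python instead of Spark's API.

```python
import math


def sigmoid(z: float) -> float:
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, features, threshold=0.5):
    """Return (probability of class 1, predicted class label 0 or 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    p = sigmoid(z)
    return p, int(p >= threshold)


# Hypothetical parameters, chosen by hand for illustration.
weights, bias = [2.0, -1.0], -0.5
print(predict(weights, bias, [1.0, 0.5]))  # z = 1.0: p is about 0.73, class 1
print(predict(weights, bias, [0.0, 1.0]))  # z = -1.5: p is about 0.18, class 0
```

The final output is a discrete label, not a continuous quantity. That is what places logistic regression among classification algorithms.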

Which of the following statements about SparkR is correct?

It enables users to run SQL / HQL queries on top of Spark.

It allows data scientists to analyze large datasets and interactively run jobs

It is the kernel of Spark

It is the scalable machine learning library which delivers efficiencies

Explanation:
SparkR is an R package that provides a lightweight frontend for using Apache Spark from R. In Spark 3.2.1, SparkR provides a distributed data frame implementation that supports operations such as selection, filtering, and aggregation (similar to R data frames and dplyr), but on large datasets. SparkR also supports distributed machine learning using MLlib.

Which of the following statements regarding DataFrame is correct?

The DataFrame API provides compile-time type safety

DataFrames provide a more user-friendly API than RDDs.

Both the above

None of the above

Explanation:
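A DataFrame is an easy-to-use Spark API for processing structured and semi-structured data, which makes it more user-friendly than working with RDDs directly. Every DataFrame has a schema that acts as its blueprint. A schema can contain general data types such as string and integer types, as well as Spark-specific types such as struct types. Unlike the typed Dataset API in Scala and Java, the DataFrame API does not provide compile-time type safety: errors such as a misspelled column name are detected only when the query is analyzed at runtime.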
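
To illustrate the idea of a schema as a blueprint, here is a toy, pure-Python model. It is not Spark's StructType API. SCHEMA and conforms are hypothetical names. The schema lists each field's name and type, with a nested dict standing in for a struct type, and a row is checked against it while the program runs rather than by a compiler.

```python
from typing import Any, Dict

# Hypothetical toy schema (not Spark's StructType): field name -> type,
# where a nested dict plays the role of a struct type.
SCHEMA: Dict[str, Any] = {
    "name": str,
    "age": int,
    "address": {"city": str, "zip": str},
}


def conforms(row: Dict[str, Any], schema: Dict[str, Any]) -> bool:
    """Check at runtime that a row has exactly the schema's fields and types."""
    if set(row) != set(schema):
        return False
    for field, expected in schema.items():
        value = row[field]
        if isinstance(expected, dict):  # nested struct: check recursively
            if not isinstance(value, dict) or not conforms(value, expected):
                return False
        elif type(value) is not expected:  # exact type match (rejects bool for int)
            return False
    return True


good = {"name": "Ada", "age": 36, "address": {"city": "London", "zip": "N1"}}
bad = {"name": "Bob", "age": "forty", "address": {"city": "Paris", "zip": "75001"}}
print(conforms(good, SCHEMA))  # True
print(conforms(bad, SCHEMA))   # False: "age" is not an int
```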

Which of the following statements about Spark Shell is correct?

It allows reading from many types of data sources

It helps Spark applications run easily from the system's command line

It runs/tests application code interactively

All of the above

Explanation:
The Spark Shell provides an interactive command-line environment for Scala and Python users. The SparkR shell has so far been thoroughly tested only with Spark standalone, not with all Hadoop distributions, so it is not covered here. The Spark Shell is also known as a REPL (Read-Eval-Print Loop).

Is MLlib a deprecated library?

Yes

No

Explanation:
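No. MLlib as a whole is not deprecated. It contains two APIs: the original RDD-based API in the spark.mllib package, which is in maintenance mode, and the DataFrame-based API in the spark.ml package, which is now the primary machine learning API. Neither API is deprecated. The same holds for Spark's core abstraction: the Resilient Distributed Dataset (RDD), a fault-tolerant, read-only multiset of data objects distributed over a cluster of machines, is the architectural foundation of Apache Spark. The RDD API is not deprecated, although the Dataset API is encouraged.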

On RDD, the read operation is

Coarse-grained

Fine-grained

Neither fine-grained nor coarse-grained

Either fine-grained or coarse-grained

Explanation:
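Reads on an RDD can be either coarse-grained or fine-grained, so the answer is Either fine-grained or coarse-grained. Writes are different. Transformations on an RDD are coarse-grained: they apply to the entire dataset and produce a new RDD, and you cannot update a single element in place. A read, by contrast, can scan the whole dataset (coarse-grained) or fetch individual elements (fine-grained), for example by looking up the values for one key in a key-value RDD.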
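
The contrast between coarse-grained writes and fine-grained reads can be sketched with a toy, pure-Python model of a hash-partitioned key-value dataset. This is not Spark's API. ToyPairDataset and its methods are hypothetical names. The sketch assumes behavior similar to Spark's lookup on a pair RDD with a known partitioner, which searches only the partition that the key maps to.

```python
from typing import Hashable, List, Tuple

Pair = Tuple[Hashable, object]


class ToyPairDataset:
    """Toy model (hypothetical, not Spark's API) of a hash-partitioned key-value dataset."""

    def __init__(self, partitions: List[Tuple[Pair, ...]]) -> None:
        self._partitions = partitions  # never mutated after construction

    @classmethod
    def from_pairs(cls, pairs: List[Pair], num_partitions: int) -> "ToyPairDataset":
        buckets: List[List[Pair]] = [[] for _ in range(num_partitions)]
        for key, value in pairs:
            buckets[hash(key) % num_partitions].append((key, value))
        return cls([tuple(b) for b in buckets])

    def map_values(self, f) -> "ToyPairDataset":
        # Coarse-grained write: f is applied to every element, producing a new dataset.
        return ToyPairDataset([tuple((k, f(v)) for k, v in part) for part in self._partitions])

    def collect(self) -> List[Pair]:
        # Coarse-grained read: scan every partition.
        return [pair for part in self._partitions for pair in part]

    def lookup(self, key: Hashable) -> List[object]:
        # Fine-grained read: scan only the partition the key hashes to.
        part = self._partitions[hash(key) % len(self._partitions)]
        return [v for k, v in part if k == key]


prices = ToyPairDataset.from_pairs([("apple", 3), ("pear", 5), ("fig", 7)], num_partitions=2)
discounted = prices.map_values(lambda p: p - 1)  # new dataset; `prices` is unchanged
print(discounted.lookup("pear"))     # [4]
print(prices.lookup("pear"))         # [5]
print(sorted(discounted.collect()))  # [('apple', 2), ('fig', 6), ('pear', 4)]
```

There is no method that changes one element in place. The only way to "write" is to derive a whole new dataset, while reads can target either everything or a single key.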