What is the 'attention mechanism' in deep learning primarily used for?
- A. Reducing the number of layers in a network
- B. Allowing the model to focus on relevant parts of the input when producing each output
- C. Replacing dropout for regularization
- D. Scheduling the learning rate during training
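
For reference, option B describes what attention does in practice: for each output, the model computes a weighted focus over the input positions. Below is a minimal sketch of scaled dot-product attention (the variant used in Transformers); the NumPy implementation, array shapes, and variable names are illustrative assumptions, not part of the question.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """For each query, weight the keys by relevance (softmax of scaled
    dot products) and return the corresponding weighted sum of values."""
    d_k = Q.shape[-1]
    # Similarity of each query to each key, scaled to stabilize the softmax.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over keys: each row sums to 1 and acts as a "focus" distribution.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: for each query, a mix of the values weighted by relevance.
    return weights @ V, weights

# Toy self-attention example: 3 input positions, 4-dimensional vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
output, attn = scaled_dot_product_attention(X, X, X)
print(attn)  # each row shows how much one position attends to the others
```

The attention weights make the "focus" explicit: high entries in a row mark the input positions the model relies on most when producing that output.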