Neural Networks Practice Test Video Answers
1. A
A perceptron combines inputs with weights, adds bias, and passes the sum through an activation function.
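This weighted-sum-plus-activation idea can be sketched in plain Python. The weights and bias below are hand-picked (an arbitrary illustration, not from the test) so the perceptron acts as an AND gate:

```python
# Illustrative perceptron: weighted sum of inputs, plus bias,
# passed through a step activation function.
def step(z):
    # Step activation: fires 1 when the weighted sum is non-negative.
    return 1 if z >= 0 else 0

def perceptron(inputs, weights, bias):
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(total)

# Hand-picked weights make this an AND gate: fires only when both inputs are 1.
print(perceptron([1, 1], [0.5, 0.5], -0.7))  # 1
print(perceptron([1, 0], [0.5, 0.5], -0.7))  # 0
```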
2. B
Activation functions introduce non-linearity, enabling the network to learn complex patterns.
3. B
ReLU (Rectified Linear Unit) is the most common hidden-layer activation function.
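ReLU is simple enough to write in one line: it passes positive values through unchanged and zeroes out negatives.

```python
def relu(x):
    # ReLU: max(0, x) — positives pass through, negatives become 0.
    return max(0.0, x)

print([relu(v) for v in [-2.0, -0.5, 0.0, 1.5, 3.0]])  # [0.0, 0.0, 0.0, 1.5, 3.0]
```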
4. B
Backpropagation adjusts weights using gradients from the output layer backward.
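The chain rule behind backpropagation can be seen on a single sigmoid neuron with squared-error loss (all values below are arbitrary choices for illustration). The analytic gradient is checked against a numerical one:

```python
import math

# One sigmoid neuron: z = w*x + b, y = sigmoid(z), loss = (y - t)^2.
# Backprop applies the chain rule: dL/dw = dL/dy * dy/dz * dz/dw.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, t = 2.0, 1.0      # input and target (arbitrary)
w, b = 0.3, -0.1     # current parameters (arbitrary)

z = w * x + b        # forward pass
y = sigmoid(z)
loss = (y - t) ** 2

# Backward pass: multiply the chain-rule factors.
dL_dy = 2 * (y - t)
dy_dz = y * (1 - y)  # derivative of sigmoid
dz_dw = x
grad_w = dL_dy * dy_dz * dz_dw

# Sanity check against a finite-difference gradient.
eps = 1e-6
loss_plus = (sigmoid((w + eps) * x + b) - t) ** 2
numeric = (loss_plus - loss) / eps
print(abs(grad_w - numeric) < 1e-4)  # True
```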
5. C
Feedforward networks pass data from input → hidden layers → output.
6. B
Gradient descent minimizes the loss function by adjusting weights.
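A minimal gradient-descent loop on a toy loss shows the idea: repeatedly step opposite the gradient. The quadratic below stands in for a real loss function.

```python
# Minimize f(w) = (w - 3)^2, whose gradient is 2*(w - 3).
w = 0.0
lr = 0.1
for _ in range(100):
    grad = 2 * (w - 3)  # gradient of the loss at the current weight
    w -= lr * grad      # step in the direction that reduces the loss
print(round(w, 4))  # 3.0 — converges to the minimum at w = 3
```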
7. B
Overfitting occurs when the network memorizes training data, reducing generalization.
8. B
Dropout randomly disables neurons during training to reduce overfitting.
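A sketch of "inverted" dropout, the variant used by common frameworks: each activation is kept with probability 1 − rate, and survivors are scaled by 1/(1 − rate) so the expected magnitude matches inference time.

```python
import random

def dropout(activations, rate=0.5):
    # Keep each activation with probability (1 - rate); scale survivors
    # by 1/(1 - rate) so the expected value is unchanged.
    keep = 1.0 - rate
    return [a / keep if random.random() < keep else 0.0
            for a in activations]

random.seed(0)
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5))
```
At inference time dropout is simply turned off; the inverted scaling above is what makes that valid without further correction.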
9. B
Batch normalization stabilizes and speeds up training by normalizing layer inputs.
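The core normalization step can be sketched directly: shift a batch of activations to zero mean and unit variance, then apply the learnable scale (gamma) and shift (beta). The epsilon guards against division by zero.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize to zero mean / unit variance, then scale and shift.
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta
            for x in batch]

normed = batch_norm([2.0, 4.0, 6.0, 8.0])
print(abs(sum(normed)) < 1e-9)  # True — mean is ~0 after normalization
```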
10. B
Softmax outputs probabilities that sum to 1 across all classes.
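Softmax exponentiates each logit and normalizes by the sum; subtracting the maximum first is the standard trick for numerical stability and does not change the result.

```python
import math

def softmax(logits):
    # Shift by the max for numerical stability, exponentiate, normalize.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(round(sum(probs), 6))  # 1.0 — the outputs form a probability distribution
```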
11. B
CNNs excel at analyzing images and other spatially structured data.
12. B
Pooling layers reduce spatial size, lowering computation and overfitting risk.
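A 2×2 max-pooling pass with stride 2 shows the size reduction: a 4×4 feature map becomes 2×2, keeping only the strongest activation in each window.

```python
def max_pool_2x2(fmap):
    # Slide a non-overlapping 2x2 window, keeping the max of each block.
    out = []
    for i in range(0, len(fmap), 2):
        row = []
        for j in range(0, len(fmap[0]), 2):
            row.append(max(fmap[i][j], fmap[i][j + 1],
                           fmap[i + 1][j], fmap[i + 1][j + 1]))
        out.append(row)
    return out

fmap = [[1, 3, 2, 4],
        [5, 6, 1, 0],
        [7, 2, 9, 8],
        [0, 1, 3, 5]]
print(max_pool_2x2(fmap))  # [[6, 4], [7, 9]]
```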
13. B
RNNs handle sequential data such as text or speech.
14. B
Very deep networks with sigmoid/tanh activations suffer from vanishing gradients.
15. B
LSTMs address vanishing gradients in RNNs using memory cells and gates.
16. B
An epoch is one complete pass through the training dataset.
17. B
The loss function measures error between predictions and true labels.
18. B
Adam combines momentum and adaptive learning rates for optimization.
19. B
Overfitting is reduced with dropout, early stopping, and data augmentation.
20. A
Shallow networks have at most one hidden layer between input and output.
21. B
Weight initialization helps convergence and prevents symmetry issues.
22. B
Transfer learning reuses a pre-trained model for a new, related task.
23. B
The universal approximation theorem states that a network with one hidden layer, given enough neurons, can approximate any continuous function on a compact domain.
24. B
Precision, recall, and F1-score are best for imbalanced datasets.
25. B
A policy network maps states to actions in reinforcement learning.
26. B
ReLU allows gradients for positive values, reducing vanishing gradients.
27. A
Autoencoders learn representations without labels → unsupervised learning.
28. B
Hyperparameter tuning adjusts settings like learning rate and batch size.
29. B
The bottleneck layer compresses data into a reduced representation.
30. A
GANs use generator vs. discriminator in a competitive setup.
31. B
Learning rate controls the step size in weight updates.
32. B
A learning rate that is too high → diverging or oscillating loss.
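The effect is easy to demonstrate on the toy quadratic loss f(w) = (w − 3)²: with a step size above 1.0 each update overshoots the minimum by more than it corrects, so the error grows instead of shrinking.

```python
def run(lr, steps=50):
    # Gradient descent on f(w) = (w - 3)^2 from w = 0; returns the final loss.
    w = 0.0
    for _ in range(steps):
        w -= lr * 2 * (w - 3)
    return (w - 3) ** 2

print(run(0.1) < 1e-3)  # True: a small step size converges
print(run(1.1) > 1e3)   # True: an oversized step size diverges
```
Each step multiplies the error (w − 3) by (1 − 2·lr), so convergence requires that factor to have magnitude below 1.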
33. B
A kernel is a small matrix of weights for extracting features in CNNs.
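Sliding that small weight matrix over an input (valid padding, stride 1) and taking an elementwise product-sum at each position is all a convolution does. The image and kernel values below are arbitrary illustrations:

```python
def conv2d(image, kernel):
    # Valid-padding, stride-1 2D convolution (technically cross-correlation,
    # as in most deep-learning frameworks).
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

image = [[1, 2, 0],
         [3, 1, 2],
         [0, 1, 4]]
kernel = [[1, 0],
          [0, -1]]  # responds to intensity change along the diagonal
print(conv2d(image, kernel))  # [[0, 0], [2, -3]]
```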