Convolutional Neural Networks (CNN): Object Detection Models, Questions and Answers

Question 1

An engineer is developing a system for real-time object detection on a mobile device with limited computational power.
The highest priority is inference speed, even if it means a slight trade-off in accuracy, especially for very small objects.
Which object detection model architecture is most suitable for this scenario?

Accepted Answer

A one-stage detector like YOLO or SSD, because it performs localization and classification in a single pass, optimizing for speed.

Answer

Faster R-CNN, because its two-stage approach with a Region Proposal Network (RPN) provides superior accuracy.

Answer

R-CNN, because it uses an external selective search algorithm that is computationally efficient.

Answer

Mask R-CNN, because it extends Faster R-CNN to provide pixel-level segmentation, which is beneficial for speed.

Question 2

In object detection, what is the primary purpose of the Non-Maximum Suppression (NMS) algorithm?

Accepted Answer

To keep the highest-scoring bounding box for each object and discard the overlapping predictions that duplicate it.

Answer

To generate a diverse set of initial region proposals across the entire image.

Answer

To increase the number of bounding boxes for each object to improve recall.

Answer

To calculate the classification loss for each predicted bounding box.
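
The accepted answer to Question 2 can be illustrated with a minimal sketch of greedy NMS in plain Python. The function names, the (x1, y1, x2, y2) box format, and the 0.5 IoU threshold are assumptions chosen for illustration and are not tied to any particular library.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the boxes that survive suppression."""
    # Visit boxes from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Two near-duplicate detections of one object plus one separate object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the third box does not overlap either and is kept. In practice NMS is usually applied per class.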

Question 3

What was the key innovation in the Faster R-CNN architecture that distinguished it from its predecessor, Fast R-CNN?

Accepted Answer

The introduction of a Region Proposal Network (RPN), which generates object proposals inside the network itself and shares convolutional features with the detector, replacing external selective search.

Answer

The use of a deeper backbone network like VGG-16 for feature extraction.

Answer

The implementation of RoI (Region of Interest) Pooling to handle inputs of different sizes.

Answer

The replacement of the SVM classifier with a softmax layer for object classification.

Question 4

Which of the following best describes the role of anchor boxes in models like Faster R-CNN and SSD?

Accepted Answer

They are a set of predefined reference boxes of various sizes and aspect ratios used as a starting point for predicting bounding box offsets.

Answer

They are the final, perfectly localized bounding boxes output by the model.

Answer

They are used exclusively to calculate the Intersection over Union (IoU) for the final evaluation metric.

Answer

They are dynamically generated for each image to perfectly match the ground-truth objects before training begins.
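
To make the accepted answer to Question 4 concrete, the following is a minimal sketch of how anchors might be generated at one location and how predicted offsets turn an anchor into a final box. The helper names, the scales and ratios, and the offset values are hypothetical. The decoding follows the center/size parameterization commonly associated with Faster R-CNN; real implementations differ in details such as offset scaling.

```python
import math


def make_anchors(cx, cy, scales, ratios):
    """Anchors as (cx, cy, w, h); each has area scale**2 and w / h == ratio."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * math.sqrt(r)
            h = s / math.sqrt(r)
            anchors.append((cx, cy, w, h))
    return anchors


def decode(anchor, offsets):
    """Apply predicted offsets (tx, ty, tw, th) to an anchor box."""
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = offsets
    # Center shifts are relative to the anchor size; sizes scale exponentially.
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))


# Six predefined anchors at one feature-map location (2 scales x 3 ratios).
anchors = make_anchors(cx=32, cy=32, scales=[64, 128], ratios=[0.5, 1.0, 2.0])

# Hypothetical network output for the square 64 x 64 anchor.
refined = decode(anchors[1], (0.1, -0.05, 0.0, 0.2))
print(len(anchors), refined)
```

The anchors are fixed before training and are the same for every image; only the offsets are learned, which is why the anchors serve as a starting point and not as the final output.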

Question 5

A data scientist is evaluating their object detection model's performance. They are using Intersection over Union (IoU) to determine if a predicted bounding box is a true positive. What does an IoU score of 0.8 signify?

Accepted Answer

There is a high degree of overlap: the intersection of the predicted and ground-truth boxes covers 80% of the area of their union.

Answer

The predicted box and the ground-truth box have no overlap.

Answer

The model is 80% confident that the object class is correct.

Answer

The area of the union of the two boxes is 80% of the area of their intersection.
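
A short worked example may help with Question 5. The sketch below computes IoU for two boxes chosen so that the result is exactly 0.8. The function name, the (x1, y1, x2, y2) box format, and the 0.5 true-positive threshold are assumptions for illustration; evaluation protocols use different thresholds.

```python
def box_iou(pred, truth):
    """IoU = area of intersection / area of union, boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(pred[0], truth[0]), max(pred[1], truth[1])
    ix2, iy2 = min(pred[2], truth[2]), min(pred[3], truth[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_truth = (truth[2] - truth[0]) * (truth[3] - truth[1])
    union = area_pred + area_truth - inter
    return inter / union if union > 0 else 0.0


pred = (0, 0, 10, 8)     # area 80, lies entirely inside the ground truth
truth = (0, 0, 10, 10)   # area 100

score = box_iou(pred, truth)
print(score)  # 0.8 (intersection 80, union 100)

# Assumed threshold: count the prediction as a true positive at IoU >= 0.5.
is_true_positive = score >= 0.5
```

Because the intersection can never exceed the union, IoU always lies between 0 (no overlap) and 1 (identical boxes). It measures localization quality only and says nothing about class confidence.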

Question 6

Which of the following is a defining characteristic of one-stage object detectors like SSD (Single Shot MultiBox Detector)?

Accepted Answer

They make predictions on a dense grid of locations across feature maps of multiple scales.

Answer

They use a computationally expensive selective search to find interesting regions first.

Answer

They require each image to be processed multiple times to find all objects.

Answer

They first generate region proposals and then pass each proposal to a separate classifier.
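
The dense, multi-scale prediction described in the accepted answer to Question 6 can be made tangible by counting candidate boxes. The sketch below assumes the feature-map sizes and boxes-per-location of the commonly cited SSD300 configuration; other SSD variants use different values.

```python
# (feature-map side length, default boxes per location), SSD300 assumed.
feature_maps = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

total_boxes = 0
for side, boxes_per_cell in feature_maps:
    # Every cell of every feature map predicts a fixed set of boxes.
    n = side * side * boxes_per_cell
    total_boxes += n
    print(f"{side:>2} x {side:<2} map: {n} boxes")

print("total:", total_boxes)  # total: 8732
```

All of these boxes are scored in one forward pass. The fine 38 x 38 map handles small objects and the coarse 1 x 1 map handles objects that fill the image, so no separate proposal stage is needed.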