What is the purpose of 'model quantization' in the context of ML deployment?
- A: To increase model accuracy
- B: To reduce model size and inference time by using lower-precision arithmetic
- C: To add more layers to the neural network
- D: To balance load across multiple servers
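The idea behind option B can be sketched in a few lines: map float32 weights to int8 with a per-tensor scale, which cuts storage 4x and lets inference use cheaper integer arithmetic. This is a minimal illustration of symmetric linear quantization, not any specific framework's implementation; the function names are hypothetical.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric linear quantization: float32 -> int8 plus a scale factor."""
    scale = np.abs(weights).max() / 127.0          # largest magnitude maps to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float32 weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.003, 0.9], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# q occupies 1 byte per weight instead of 4; w_hat approximates w
# to within one quantization step (the scale s).
```

Accuracy is not increased (ruling out A): each weight is rounded to the nearest of 255 levels, so quantization trades a small approximation error for smaller size and faster inference.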