What is the purpose of 'model quantization' in the context of ML deployment?
- A: To increase model accuracy
- B: To reduce model size and inference time by using lower-precision arithmetic
- C: To add more layers to the neural network
- D: To balance load across multiple servers
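The idea behind option B can be sketched in a few lines: map float32 weights to int8 with a per-tensor scale, which cuts storage 4x and lets inference use cheaper integer arithmetic. This is a minimal illustration of symmetric linear quantization, not any specific framework's implementation; the function names are hypothetical.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric linear quantization: float32 -> int8 plus a scale factor."""
    scale = np.abs(weights).max() / 127.0          # largest magnitude maps to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float32 weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.003, 0.9], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# q occupies 1 byte per weight instead of 4; w_hat approximates w
# to within one quantization step (the scale s).
```

Accuracy is not increased (ruling out A): each weight is rounded to the nearest of 255 levels, so quantization trades a small approximation error for smaller size and faster inference.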