A data scientist is preprocessing a large corpus of text for a machine learning model. They need to reduce words like 'running', 'ran', and 'runs' to a common base form. Which technique uses a dictionary and morphological analysis to convert a word to its true base form, ensuring the result is a valid word?
-
A
Tokenization
-
B
Stop Word Removal
-
C
Stemming
-
D
Lemmatization