Data Analysis for ML/AI
Text data analysis
Tokenization, vocabularies, n-grams, and exploratory data analysis (EDA) on text features.
5 lessons
Follow in order
Your path
Lessons in sequence
Work through these in order—each lesson builds on the previous one.
- Lesson 1 of 5: Tokenization fundamentals
- Lesson 2 of 5: Text normalization: case, accents, Unicode
- Lesson 3 of 5: Stopwords, stemming, and lemmatization
- Lesson 4 of 5: N-grams, bag-of-words, and TF-IDF
- Lesson 5 of 5: Text data pitfalls: encodings and dirty inputs