Data Analysis for ML/AI

Text data analysis

Tokenization, vocabularies, n-grams, EDA on text features.

5 lessons · Follow in order

Your path

Lessons in sequence

Work through these in order—each lesson builds on the previous one.

Lesson 1 of 5

Tokenization fundamentals

10 min read

Lesson 2 of 5

Text normalization: case, accents, Unicode

10 min read

Lesson 3 of 5

Stopwords, stemming, and lemmatization

9 min read

Lesson 4 of 5

N-grams, bag-of-words, and TF-IDF

10 min read

Lesson 5 of 5

Text data pitfalls: encodings and dirty inputs

9 min read