This notebook demonstrates how to do topic modeling. The following processes are described:

Use Case: For Researchers (Less explanation, better for research pipelines)

Difficulty: Intermediate

Completion time: 30 minutes

Knowledge Required:

Knowledge Recommended:

Data Format: JSON Lines (.jsonl)
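JSON Lines stores one JSON object per line. A dataset can therefore be streamed one document at a time instead of being loaded into memory all at once. The following is a minimal sketch that uses only the standard library. The helper name `read_jsonl`, the file name, and the `id` and `unigramCount` field names are assumptions for illustration; they are not confirmed by this lesson.

```python
import json
from typing import Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-blank line of a .jsonl file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, such as a trailing newline
                yield json.loads(line)


# Usage (hypothetical file name and field names):
# for doc in read_jsonl("my_dataset.jsonl"):
#     doc_id = doc.get("id")
#     counts = doc.get("unigramCount", {})
```

Because the function is a generator, memory use stays flat however large the dataset file is.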

Libraries Used:

  • pandas to load a pre-processing list
  • csv to load a custom stopwords list
  • gensim to accomplish the topic modeling
  • NLTK to create a stopwords list (if no list is supplied)
  • pyLDAvis to visualize our topic model
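The csv module is all that is needed to read a custom stopwords list. When no list is supplied, an NLTK list takes its place. The following is a minimal sketch with some assumptions. The CSV is assumed to hold stopwords as plain cells, either one per row or all in a single row. `load_stopwords` and `remove_stopwords` are hypothetical helper names. The NLTK fallback is passed in by the caller rather than imported here.

```python
import csv
import os
from typing import Iterable


def load_stopwords(path: str, fallback: Iterable[str] = ()) -> set[str]:
    """Read every non-empty CSV cell as a lowercase stopword.

    If the file does not exist, return the (lowercased) fallback words,
    e.g. an NLTK English stopwords list obtained separately.
    """
    if not os.path.exists(path):
        return {w.lower() for w in fallback}
    with open(path, newline="", encoding="utf-8") as f:
        return {
            cell.strip().lower()
            for row in csv.reader(f)
            for cell in row
            if cell.strip()
        }


def remove_stopwords(tokens: Iterable[str], stopwords: set[str]) -> list[str]:
    """Keep only tokens whose lowercase form is not a stopword."""
    return [t for t in tokens if t.lower() not in stopwords]
```

With NLTK installed and its stopwords corpus downloaded via `nltk.download("stopwords")`, you could pass `nltk.corpus.stopwords.words("english")` as the `fallback` argument.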

Research Pipeline:

  1. Build a dataset
  2. Create a "Pre-Processing CSV" with Exploring Metadata (Optional)
  3. Create a "Custom Stopwords List" with Creating a Stopwords List (Optional)
  4. Complete the Topic Modeling analysis with this notebook
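After the optional steps, the pipeline comes down to three operations:

  1. Keep only the documents named in the pre-processing CSV.
  2. Drop stopwords.
  3. Turn each document's word counts into a bag-of-words vector for the topic model.

The following sketch is a standard-library stand-in, not the notebook's code. It mimics the shape of the output of gensim's `Dictionary.doc2bow`, which is a list of `(token_id, count)` pairs. It rests on several assumptions. The pre-processing CSV is assumed to have an `id` column. Each document is assumed to carry a `unigramCount` mapping. The function names are hypothetical.

```python
import csv
from typing import Iterable


def load_id_list(path: str, column: str = "id") -> set[str]:
    """Read the document IDs to keep from one column of a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[column] for row in csv.DictReader(f) if row.get(column)}


def build_bow_corpus(
    docs: Iterable[dict], keep_ids: set[str], stopwords: set[str]
) -> tuple[dict[str, int], list[list[tuple[int, int]]]]:
    """Assign integer IDs to tokens and build one (id, count) list per kept doc."""
    token2id: dict[str, int] = {}
    corpus: list[list[tuple[int, int]]] = []
    for doc in docs:
        if doc.get("id") not in keep_ids:
            continue  # skip documents not on the pre-processing list
        bow: dict[int, int] = {}
        for token, count in doc.get("unigramCount", {}).items():
            token = token.lower()
            if not token.isalpha() or token in stopwords:
                continue  # drop numbers, punctuation, and stopwords
            tid = token2id.setdefault(token, len(token2id))
            bow[tid] = bow.get(tid, 0) + count  # merge case variants
        corpus.append(sorted(bow.items()))
    return token2id, corpus
```

With gensim installed, `gensim.corpora.Dictionary` and its `doc2bow` method take over the roles of `token2id` and `build_bow_corpus`. The resulting corpus is then passed to `gensim.models.LdaModel`.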