Research Notebooks

LDA Topic Modeling

Open this Research Notebook -> Description: This notebook demonstrates how to do topic modeling. The following processes are described:

- Using the tdm_client to retrieve a dataset
- Filtering based on a pre-processed ID list
- Filtering based on a stop words list
- Cleaning the tokens in the dataset
- Creating a …
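The core gensim workflow the notebook walks through can be sketched in a few lines. This is a minimal, illustrative example rather than the notebook's exact code: the `documents` list stands in for tokens retrieved with the tdm_client, and `num_topics=2` is an arbitrary choice.

```python
from gensim import corpora
from gensim.models import LdaModel

# Hypothetical pre-cleaned token lists standing in for a tdm_client dataset.
documents = [
    ["labor", "union", "strike", "wage"],
    ["court", "law", "judge", "ruling"],
    ["union", "wage", "contract", "labor"],
]

# Map each unique token to an integer id, then convert every document
# into a bag-of-words (token id, count) representation.
dictionary = corpora.Dictionary(documents)
corpus = [dictionary.doc2bow(doc) for doc in documents]

# Train a small LDA model; num_topics and passes are illustrative choices.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)

# Print the top terms for each discovered topic.
for topic_id, terms in lda.print_topics(num_words=4):
    print(topic_id, terms)
```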

Finding Significant Terms for Research

Open this Research Notebook -> Take me to the Learning Version of this notebook -> Description: Discover the significant words in a corpus using Gensim TF-IDF. The following code is included:

- Filtering based on a pre-processed ID list
- Filtering based on a stop words list
- Token cleaning
- Computing …
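Gensim's TfidfModel does the heavy lifting once the corpus is in bag-of-words form. The sketch below uses a hypothetical three-document corpus in place of a real dataset:

```python
from gensim import corpora
from gensim.models import TfidfModel

# Hypothetical tokenized documents standing in for a real corpus.
documents = [
    ["whale", "ship", "sea", "whale"],
    ["ship", "harbor", "sea"],
    ["whale", "captain", "voyage"],
]

# Map tokens to ids and convert each document to bag-of-words counts.
dictionary = corpora.Dictionary(documents)
corpus = [dictionary.doc2bow(doc) for doc in documents]

# Weight each token by term frequency * inverse document frequency.
tfidf = TfidfModel(corpus)

for doc in tfidf[corpus]:
    # The highest-scoring tokens are the most significant for that document.
    for token_id, score in sorted(doc, key=lambda x: x[1], reverse=True)[:2]:
        print(dictionary[token_id], round(score, 3))
```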

Create a Stopwords List for Research

Open this Research Notebook -> Take me to the Learning Version of this notebook -> Description: This notebook creates a stopwords list and exports it into a CSV file. The following processes are described:

- Loading the NLTK stopwords list
- Modifying the stopwords list in Python
- Saving a …
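The three listed steps map to a short NLTK sketch. The added words and the output filename 'stop_words.csv' below are illustrative assumptions, not the notebook's exact choices:

```python
import csv
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")  # fetch NLTK's stopword lists on first run

# Load NLTK's English stopwords, then modify the list in Python.
stop_words = stopwords.words("english")
stop_words.extend(["mr", "mrs", "said"])  # hypothetical project-specific additions

# Save one stopword per row so the list can be re-loaded later.
with open("stop_words.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for word in stop_words:
        writer.writerow([word])
```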

Exploring Metadata and Pre-Processing for Research

Open this Research Notebook -> Take me to the Learning Version of this notebook -> Description: This notebook helps researchers generate a list of IDs and export them into a CSV file. The code below is a starting point for:

- Importing a CSV …
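A pandas sketch of the generate-and-export step. The file 'metadata.csv' and its 'id' and 'year' columns are assumptions here, since the notebook's actual schema isn't shown:

```python
import pandas as pd

# Import the metadata CSV into a DataFrame.
df = pd.read_csv("metadata.csv")

# Keep only documents in a date range of interest (illustrative filter).
filtered = df[(df["year"] >= 1900) & (df["year"] <= 1950)]

# Export the surviving document IDs to CSV for use in later notebooks.
filtered["id"].to_csv("filtered_ids.csv", index=False)
```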

Exploring Word Frequencies for Research

Open this Research Notebook -> Take me to the Learning Version of this notebook -> Description: This notebook finds the word frequencies for a dataset. Optionally, it can apply the following:

- Filtering based on a pre-processed ID list
- Filtering based on a stop words list
- Use …
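Counting word frequencies with an optional stop-words filter reduces to a few lines with collections.Counter; the `documents` and `stop_words` values below are illustrative stand-ins for the dataset and lists the notebook loads:

```python
from collections import Counter

documents = [
    ["the", "whale", "and", "the", "sea"],
    ["the", "captain", "and", "the", "whale"],
]
stop_words = {"the", "and"}  # hypothetical stop words list

# Tally every token that survives the stop-words filter.
word_freq = Counter()
for doc in documents:
    word_freq.update(token for token in doc if token not in stop_words)

print(word_freq.most_common(3))  # e.g. [('whale', 2), ('sea', 1), ('captain', 1)]
```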

Tokenize Text Files with NLTK

Open this Research Notebook -> Description: This notebook takes as input:

- Plain text files (.txt) in a zipped folder called 'texts' in the data folder
- A metadata CSV file called 'metadata.csv' in the data folder (optional)

and outputs a single JSON-L file containing the unigrams, bigrams, trigrams, full text, and …
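A minimal sketch of the tokenize-and-write step with NLTK. It assumes the 'texts' folder has already been unzipped into the data folder, and the JSON-L field names shown are illustrative rather than the notebook's exact schema:

```python
import json
from pathlib import Path

import nltk
from nltk.tokenize import word_tokenize
from nltk.util import ngrams

nltk.download("punkt")  # tokenizer models, fetched on first run

with open("tokenized.jsonl", "w") as out:
    for path in Path("data/texts").glob("*.txt"):
        text = path.read_text(encoding="utf-8")
        unigrams = word_tokenize(text.lower())
        # Count unigrams, bigrams, and trigrams for this document.
        record = {
            "id": path.stem,
            "fullText": text,
            "unigramCount": dict(nltk.FreqDist(unigrams)),
            "bigramCount": dict(nltk.FreqDist(" ".join(g) for g in ngrams(unigrams, 2))),
            "trigramCount": dict(nltk.FreqDist(" ".join(g) for g in ngrams(unigrams, 3))),
        }
        out.write(json.dumps(record) + "\n")  # one JSON object per line
```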

Tokenizing Text Files

Open this Research Notebook -> Description: You may have text files and metadata that you want to tokenize into ngrams with Python. This notebook takes as input:

- Plain text files (.txt) in a folder
- A metadata CSV file called 'metadata.csv'

and outputs a single JSON-L …
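The metadata half of that pipeline can be sketched separately: read 'metadata.csv' into a lookup table and merge each row into the matching JSON-L record from the tokenizing step above. The 'id' column and both file names here are assumptions for illustration:

```python
import csv
import json

# Build a lookup from document id to its metadata row.
with open("metadata.csv", newline="") as f:
    metadata = {row["id"]: row for row in csv.DictReader(f)}

# Merge metadata into each record produced by the tokenizing step.
with open("tokenized.jsonl") as src, open("tokenized_with_metadata.jsonl", "w") as out:
    for line in src:
        record = json.loads(line)
        record.update(metadata.get(record["id"], {}))
        out.write(json.dumps(record) + "\n")
```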

Join the community

Join our email list for information about new content, lessons, features, and webinars.
