Constellate
Research Notebooks
LDA Topic Modeling
Open this research notebook ->
Description: This notebook demonstrates how to do topic modeling. The following processes are described:
- Using the tdm_client to retrieve a dataset
- Filtering based on a pre-processed ID list
- Filtering based on a stop words list
- Cleaning the tokens in the dataset
- Creating a …
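LDA topic modeling in Python is commonly built with gensim, though the notebook's own choices are not spelled out in the truncated description above. As a minimal sketch of the dictionary → bag-of-words → LDA sequence, assuming hypothetical, already-cleaned token lists in place of a Constellate dataset:

```python
from gensim import corpora
from gensim.models import LdaModel

# Hypothetical, already-cleaned token lists standing in for a Constellate dataset.
documents = [
    ["labor", "market", "wage", "employment"],
    ["novel", "narrative", "author", "character"],
    ["wage", "employment", "policy", "labor"],
]

dictionary = corpora.Dictionary(documents)                   # token -> integer id mapping
bow_corpus = [dictionary.doc2bow(doc) for doc in documents]  # bag-of-words vectors

# Fit a small LDA model; num_topics and passes are illustrative values only.
lda = LdaModel(corpus=bow_corpus, id2word=dictionary,
               num_topics=2, passes=10, random_state=42)

for topic_id, topic in lda.print_topics(num_words=4):
    print(topic_id, topic)
```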
Finding Significant Terms for Research
Open this research notebook ->
Take me to the learning version of this notebook ->
Description: Discover the significant words in a corpus using Gensim TF-IDF. The following code is included:
- Filtering based on a pre-processed ID list
- Filtering based on a stop words list
- Token cleaning
- Computing …
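Since the description names Gensim TF-IDF, here is a rough sketch of that computation on hypothetical token lists; the ID-list filtering, stop-word filtering, and token cleaning steps listed above are omitted:

```python
from gensim import corpora
from gensim.models import TfidfModel

# Hypothetical, already-cleaned token lists standing in for a corpus.
docs = [
    ["whale", "ship", "ocean", "whale"],
    ["ship", "harbor", "trade"],
    ["ocean", "current", "whale", "trade"],
]

dictionary = corpora.Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]

# Weight each term by its frequency in the document relative to the corpus.
tfidf = TfidfModel(bow)

for doc_bow in bow:
    weighted = sorted(tfidf[doc_bow], key=lambda pair: pair[1], reverse=True)
    print([(dictionary[term_id], round(weight, 3)) for term_id, weight in weighted[:3]])
```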
Create a Stopwords List for Research
Open this research notebook ->
Take me to the learning version of this notebook ->
Description: This notebook creates a stopwords list and exports it into a CSV file. The following processes are described:
- Loading the NLTK stopwords list
- Modifying the stopwords list in Python
- Saving a …
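Those three steps (load the NLTK list, modify it, save it to CSV) can be sketched roughly as follows; the output filename and the added words are placeholders, not the notebook's own choices:

```python
import csv

import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)   # fetch NLTK's built-in stopword lists

# Load the English stopwords list and extend it with domain-specific words (examples only).
stop_words = stopwords.words("english")
stop_words.extend(["would", "could", "also"])

# Save one word per row so other notebooks can re-import the list.
with open("stop_words.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    for word in stop_words:
        writer.writerow([word])
```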
Exploring Metadata and Pre-Processing for Research
Open this research notebook ->
Take me to the learning version of this notebook ->
Description: This notebook helps researchers generate a list of IDs and export them into a CSV file. The notebook's code is a starting point for:
- Importing a CSV …
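One common way to produce such an ID list is to filter a dataset's metadata CSV with pandas and export the surviving IDs. The sketch below assumes a metadata file with columns named id, publicationYear, and wordCount; those names, the filename, and the filter thresholds are illustrative assumptions, not guaranteed field names:

```python
import pandas as pd

# Read a dataset's metadata export (filename and column names are assumed).
df = pd.read_csv("metadata.csv")

# Keep, for example, sufficiently long documents from a chosen publication window.
filtered = df[(df["publicationYear"] >= 1960) & (df["wordCount"] > 500)]

# Export only the document IDs; later notebooks can use this as a pre-processed ID list.
filtered["id"].to_csv("pre_processed_ids.csv", index=False)
```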
Exploring Word Frequencies for Research
Open this research notebook ->
Take me to the learning version of this notebook ->
Description: This notebook finds the word frequencies for a dataset. Optionally, this notebook can take the following inputs:
- Filtering based on a pre-processed ID list
- Filtering based on a stop words list
- Use …
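At its simplest, a word-frequency count over tokenized documents looks like the sketch below; the token lists are placeholders and the optional filtering inputs listed above are not shown:

```python
from collections import Counter

# Hypothetical tokenized documents standing in for a dataset.
documents = [
    ["labor", "market", "wage", "labor"],
    ["market", "policy", "labor"],
]

# Accumulate counts across every document in the corpus.
word_freq = Counter()
for tokens in documents:
    word_freq.update(tokens)

print(word_freq.most_common(5))   # the most frequent words with their counts
```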
Tokenize Text Files with NLTK
Open this research notebook ->
Description: This notebook takes as input:
- Plain text files (.txt) in a zipped folder called 'texts' in the data folder
- A metadata CSV file called 'metadata.csv' in the data folder (optional)
and outputs a single JSON-L file containing the unigrams, bigrams, trigrams, full-text, and …
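The unigram/bigram/trigram step can be sketched with NLTK as below, using an inline sample string instead of the zipped 'texts' folder; treat it as an illustration of the tokenization, not the notebook's full pipeline:

```python
import nltk
from nltk.tokenize import word_tokenize
from nltk.util import ngrams

# Tokenizer models required by word_tokenize (newer NLTK releases use 'punkt_tab').
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)

# Inline sample text standing in for the contents of one .txt file.
text = "Open notebooks make text analysis methods easier to reuse and adapt."

# Lowercase and keep alphabetic tokens only (one simple cleaning choice).
tokens = [t.lower() for t in word_tokenize(text) if t.isalpha()]

unigrams = tokens
bigrams = list(ngrams(tokens, 2))
trigrams = list(ngrams(tokens, 3))

print(unigrams[:3], bigrams[:2], trigrams[:1])
```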
Tokenizing Text Files
Open this research notebook ->
Description: You may have text files and metadata that you want to tokenize into ngrams with Python. This notebook takes as input:
- Plain text files (.txt) in a folder
- A metadata CSV file called 'metadata.csv'
and outputs a single JSON-L …
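The JSON-L output format itself is simple: one JSON object per line, one line per document. A rough sketch of writing counted unigrams that way follows; the record structure and the 'unigramCount' field name are assumptions made for illustration:

```python
import json
from collections import Counter

# Hypothetical per-document records: an id plus an already-tokenized text.
records = [
    {"id": "doc1", "tokens": ["labor", "market", "labor"]},
    {"id": "doc2", "tokens": ["novel", "narrative", "author"]},
]

with open("my_dataset.jsonl", "w", encoding="utf-8") as out:
    for record in records:
        doc = {
            "id": record["id"],
            "unigramCount": dict(Counter(record["tokens"])),  # field name assumed
        }
        out.write(json.dumps(doc) + "\n")   # JSON-L: one JSON object per line
```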