Dataset Builder File Types
CSV vs. JSON Lines Files
The dataset builder creates two files:
  - A CSV file containing only metadata
  - A JSON Lines file containing metadata and the textual data
The textual data includes:
  - Unigrams
  - Bigrams
  - Trigrams
  - Full Text (where available)
The metadata may include:
  Column Name    Description
  id             a unique item ID
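As a rough illustration of how the two file types differ in practice, the sketch below reads a small in-memory stand-in for each one using only the Python standard library. Only the `id` column comes from the description above; the `title` and `unigramCount` field names, and the sample records themselves, are assumptions made for the example and may not match the files the builder actually produces.

```python
import csv
import io
import json
from collections import Counter

# Hypothetical stand-in for a JSON Lines file: one JSON object per line.
# "title" and "unigramCount" are assumed field names, not confirmed ones.
sample_jsonl = (
    '{"id": "doc-1", "title": "First", "unigramCount": {"the": 3, "cat": 1}}\n'
    '{"id": "doc-2", "title": "Second", "unigramCount": {"the": 2, "dog": 4}}\n'
)

# Hypothetical stand-in for the metadata-only CSV file.
sample_csv = "id,title\ndoc-1,First\ndoc-2,Second\n"


def read_jsonl(handle):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# JSON Lines: metadata plus textual data, parsed one record at a time.
documents = list(read_jsonl(io.StringIO(sample_jsonl)))

# Add up the per-document unigram counts into one corpus-wide tally.
totals = Counter()
for doc in documents:
    totals.update(doc.get("unigramCount", {}))

# CSV: metadata only, one row per item, keyed by the header row.
rows = list(csv.DictReader(io.StringIO(sample_csv)))

print(len(documents), "documents;", totals.most_common(2))
print([row["id"] for row in rows])
```

With a real download, `io.StringIO(...)` would be replaced by an opened file handle. Reading line by line keeps memory use low, which matters because the JSON Lines file carries the textual data and can be far larger than the CSV.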
Pre-Built Datasets from the Builder
Archaeology
  American Journal of Archaeology (1897-2020)    02b8c5c7-64bd-efe3-01d8-88c9efe7d17c
Classics
  Classical Quarterly (1907-2014)    82014740-8ed9-3c34-5716-d0879b8317f6
English
  Negro American Literature Forum (1967-1976) + Black American Literature Forum (1976-1991) + African American Review (1992-2016)    b4668c50-a970-c4d7-eb2c-bb6d04313542
  Shakespeare Quarterly (1950-2013)    f6ae29d4-3a70-36ee-d601-20a8c0311273
  ELH (1934-2014)    4999901a-fa17-31da-cfe5-2abf3a429df7
  College English (1939-2016)    a161f384-720b-b6bf-a0cc-4d7d3b857e1c
  PMLA (1889-2014)    1aea53b9-26d5-fe54-e35c-8259156ce6cd
History
  Past & Present (1952-2014)    5e117960-e384-b705-b143-5a667fe614f0
  English Historical
Can I download a dataset I created in your builder?
Download a dataset created in the builder
You can download your full JSON Lines dataset from the corpus builder using the link shown below.
Download a dataset from the Jupyter Notebook
If you have used the tdm_client to pull in a dataset, you can also download it directly from the
Working with Dataset Files
Description: This notebook describes how to:
  - Read and write files (.txt, .csv, .json)
  - Use the tdm_client to read in metadata
  - Use the tdm_client to read in data
This notebook describes how to read and write text, CSV, and JSON files using Python. Additionally, it explains how the tdm_
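The file operations this notebook covers can be sketched with the Python standard library alone. The example below is a minimal illustration, not the notebook's own code: the file names, the `id` and `title` columns, and the sample values are all made up for the demonstration, and it does not use the tdm_client.

```python
import csv
import json
import tempfile
from pathlib import Path

# Work in a throwaway directory so the example leaves no files behind elsewhere.
workdir = Path(tempfile.mkdtemp())

# Text file: write a string, then read it back.
text_path = workdir / "sample.txt"
text_path.write_text("Hello, dataset!\n", encoding="utf-8")
text = text_path.read_text(encoding="utf-8")

# CSV file: write a header and one row, then read the rows back as dicts.
csv_path = workdir / "sample.csv"
with csv_path.open("w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "title"])
    writer.writeheader()
    writer.writerow({"id": "doc-1", "title": "First"})
with csv_path.open(newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# JSON file: serialize a dict, then load it back.
json_path = workdir / "sample.json"
with json_path.open("w", encoding="utf-8") as f:
    json.dump({"id": "doc-1", "words": ["a", "b"]}, f)
with json_path.open(encoding="utf-8") as f:
    record = json.load(f)

print(repr(text), rows, record)
```

Passing `newline=""` when opening CSV files is what the `csv` module documentation recommends, since it lets the module handle line endings itself.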