Description of methods in this notebook:
This notebook shows how to explore and pre-process the metadata of a dataset using Pandas.

The following processes are described:

  • Importing a CSV file containing the metadata for a given dataset ID
  • Creating a Pandas dataframe to view the metadata
  • Pre-processing your dataset by filtering out unwanted texts
  • Exporting a list of relevant IDs to a CSV file
  • Visualizing the metadata of your pre-processed dataset as the number of documents per year and the number of pages per year
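
The processes listed above can be sketched in a few lines of Pandas. This is a minimal illustration, not the notebook's actual code: the column names (`id`, `title`, `publicationYear`, `pageCount`), the sample rows, and the filtering rules are all assumptions made for the example, and an in-memory string stands in for the real metadata CSV file.

```python
import io

import pandas as pd

# Hypothetical sample metadata standing in for the real CSV file.
# Column names and values are assumptions made for this illustration.
SAMPLE_CSV = """id,title,publicationYear,pageCount
doc-1,Review of Hamlet,1900,12
doc-2,Front Matter,1900,2
doc-3,On Macbeth,1901,20
doc-4,,1901,5
doc-5,Notes on Othello,1902,8
doc-6,Hamlet Revisited,1901,10
"""

# 1. Import the CSV and create a dataframe to view the metadata.
#    With a real file, pass its path instead of the StringIO object.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# 2. Pre-process: drop rows with no title, then drop an unwanted title.
#    These two rules are only examples of possible filters.
filtered = df.dropna(subset=["title"])
filtered = filtered[filtered["title"] != "Front Matter"]

# 3. Export the remaining IDs as CSV text.
#    With a real workflow, pass a file path instead of the buffer.
exported = io.StringIO()
filtered["id"].to_csv(exported, index=False)

# 4. Summarize the filtered metadata by year, ready for plotting.
docs_per_year = filtered.groupby("publicationYear")["id"].count()
pages_per_year = filtered.groupby("publicationYear")["pageCount"].sum()

print(docs_per_year)
print(pages_per_year)
```

If matplotlib is installed, calling `docs_per_year.plot(kind="bar")` or `pages_per_year.plot(kind="bar")` would turn each summary into a bar chart of documents per year or pages per year.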

Use Case: For Learners (Detailed explanation, not ideal for researchers)

Difficulty: Intermediate

Completion time: 45 minutes

Knowledge Required:

Knowledge Recommended:

Data Format: CSV file

Libraries Used: Pandas

Research Pipeline: None