Datasets at the Library of Congress: A Research Guide

Data Analysis Tools

Marion S. Trikosko, photographer. Prince George County Jr. College "Data Processing." 1969. Library of Congress Prints & Photographs Division.

This list offers a selection of data analysis tools that are broadly useful and accessible to beginners. Many disciplines have their own preferred tools, platforms, and metadata schemas, so researchers should check which resources are standard in their field before choosing.

Note: The Library does not currently provide access to these tools on reading room terminals. Researchers will need to install tools on their own computer in order to compute against datasets downloaded from the Library.

Programming Languages / Software Packages

  •  Python – Python is a popular open source, object-oriented programming language. Several packages can be installed to simplify data analysis tasks, such as scikit-learn for machine learning and Statsmodels for statistical modeling.

  •  R – R is a procedural programming language and freely available software environment used for computational statistics and data visualization. The R community has developed many open-source packages that are useful for both general and domain-specific data analysis procedures.

Data Processing Tools

  •  OpenRefine – OpenRefine is a free, open source tool that assists with data cleaning and transformation (also known as data wrangling).

Data Management Planning / Project Management

  •  DMPTool – The DMPTool (Data Management Plan Tool) assists with creating a data management plan according to funder specifications or generalized guidelines. A data management plan documents all activity surrounding the collection and preservation of research data.

  •  Open Science Framework (OSF) – The OSF is an open platform developed to facilitate collaborative research projects. OSF storage allows users to reliably store and share data throughout the entire project lifecycle.

  •  Registry of Open Data on Amazon Web Services (AWS) – The AWS Registry makes it easy to find datasets made publicly available through this leading cloud platform.