
Datasets at the Library of Congress: A Research Guide

Datasets are structured collections of data, generally associated with a unique body of work. This guide provides information about various dataset collections and suggests sites and resources for data science and machine learning projects.

Introduction

The Library of Congress acquires, preserves, and provides enduring access to fixed datasets selected by subject experts. Datasets provide material for the emerging data science community to build upon, and the Library strives to cultivate a broad collection useful to researchers interested in a variety of topics, including open Citizen Science, machine learning, digital humanities, and government. Alongside more traditional content, the Library prioritizes preserving datasets it determines to be at-risk born-digital content.

This guide provides information about the collection of datasets at the Library of Congress, suggests tools for researchers, considers how datasets can be used for research, and provides guidance for locating datasets that may be sources for data science and machine learning projects. It is not intended to be comprehensive; rather, the goal of this guide is to provide credible starting points.

If you have additional questions concerning the Library’s acquisition of datasets, contact digacq@loc.gov. If you would like to recommend a dataset, please submit an Ask-A-Librarian request and it will be forwarded to the appropriate Recommending Officer based on the subject or format. This guide is one of several research guides produced by the Business Reference Services.