The Library of Congress acquires, preserves, and provides enduring access to fixed datasets selected by subject experts. Datasets provide material for the emergent data science community to build upon, and the Library strives to cultivate a broad collection that is useful to researchers interested in a variety of topics, including open citizen science, machine learning, digital humanities, and government. Alongside more traditional content, the Library prioritizes preserving datasets that qualify as at-risk born-digital content.
This guide provides information about the collection of datasets at the Library of Congress, suggests tools for researchers, considers how datasets can be used for research, and provides guidance for locating datasets that may be sources for data science and machine learning projects. It is not intended to be comprehensive; rather, the goal of this guide is to provide credible starting points.