Datasets at the Library of Congress: A Research Guide

Frequently Asked Questions

Herman Hiller, photographer. One man looks on as another man prepares Univac computer to predict a winning horse. 1959. Library of Congress Prints & Photographs Division.

Do the Selected Datasets focus exclusively on government data?

No. The Library selects, preserves, and provides access to data from a wide range of sources, including the U.S. government, nonprofits, research institutions, and commercial entities. If you are interested in finding more sources of U.S. government data, you might consider searching or browsing through the Dataset Repositories section of this guide.

Are all of the Selected Datasets available for download?

Many of the datasets are available for immediate download, but a particular dataset may not be downloadable for several reasons, including:

  • License restrictions.
  • Content is still being processed so that it can be made available online (this includes creating disk images of external media containing datasets).
  • Personally identifiable information has been located and must be removed.
  • The dataset is prohibitively large.

Does the Library offer any helpful resources for individuals without a data science background?

Yes! This guide is a useful resource for people new to computational analysis and data-driven research. Additionally, the LC Labs team is dedicated to encouraging innovative use of the Library's digital collection materials. Check out their LC for Robots page for APIs, datasets, tutorials, and example projects that can help familiarize you with the basics of performing computational analysis on datasets.

You may also wish to read more about LC Labs’ collaboration with Peter DeCraene, the 2020-2022 Library of Congress Albert Einstein Distinguished Educator Fellow. A two-part series on the Signal Blog explores the question of approaching datasets as primary sources for K-12 classroom use. Part 1 examines transcription data created by By the People volunteers. Part 2 gives additional examples of how teachers could incorporate Library of Congress data. They also wrote a post describing what it was like to evaluate datasets for classroom use and create a derivative dataset that was more manageable for students.

Can the Selected Datasets serve as a repository for my active research data?

No. The Library selectively acquires fixed data output. Essentially, this means that a dataset in the Library's collection will not be modified, deleted, or replaced when a newer version is acquired. All previous versions will remain in the digital repository alongside the most up-to-date copy.

That said, you are welcome to contact a relevant reference librarian to discuss the possibility of adding your data to the collection.

I would like to recommend a dataset for the Library of Congress to acquire as part of its collection. How can I do this?

The primary method for recommending a dataset for selection is to reach out via Ask a Librarian, an online reference service. The service will help put you in touch with a reference librarian in the appropriate Library division (e.g., Science, Technology & Business; Music; Geography & Map). The dataset will be assessed for potential inclusion and, if the Library decides to acquire the content, the reference librarian will collaborate on the acquisitions process with staff from the Digital Content Management section and the Acquisitions and Bibliographic Access Directorate.

Is the Library of Congress interested in acquiring derivative datasets created by users?

Possibly! Feel free to use Ask a Librarian to contact a reference librarian who works in the relevant Library division. They will be able to discuss the matter in greater depth.

Does the Library of Congress collect every version of a dataset that is included in its collection?

Not necessarily. The Digital Content Management section works with reference librarians to determine the frequency with which the Library will collect new versions of a dataset. In some instances, the Library will collect every version of a dataset that is produced and made available; however, in many situations it is simply not feasible for the Library to collect at this granular level, particularly if the activity results in significant amounts of duplicated content.

If you have additional questions concerning the Library’s acquisition of datasets, contact [email protected]. If you would like to recommend a dataset, please submit an Ask a Librarian request and it will be forwarded to the appropriate Recommending Officer based on the subject or format. This guide is one of several research guides produced by the Business Reference Services.