Chronicling America provides access to historic newspapers digitized under the National Digital Newspaper Program (NDNP). Sponsored by the Library of Congress and the National Endowment for the Humanities (NEH), the NDNP began in 2005 and continues to this day. In anticipation of the NDNP’s 20th year, the Library launched an effort to make the digitized newspaper data more accessible to users by re-processing select newspaper content digitized prior to 2012 to improve its machine-readable text.
Machine-readable text is created by a technology called Optical Character Recognition (OCR). Using the Tesseract Open Source OCR Engine and custom post-processing scripts, the Library created this new OCR pipeline specifically for NDNP data. More information about the technologies and processes used in this OCR reprocessing effort is coming to this page soon.
For questions, please contact [email protected].
OCR is an automated process that converts the visual image of text into machine-readable text. Computer software can then search the OCR-generated text for words, phrases, numbers, or other characters. Although errors in the process are unavoidable, OCR is still a powerful tool for making text-based items searchable. For example, important concept words often appear more than once within an article. If OCR misreads one instance of a keyword in a passage but correctly reads a second instance, the passage will still be found in a full-text search. OCR technology has advanced significantly since the beginning of the NDNP, which prompted this reprocessing initiative.
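The point about repeated keywords can be illustrated with a minimal sketch. The passage text, the misreading "Tribnne", and the helper `passage_matches` are all hypothetical and chosen only for illustration; this is not how Chronicling America's search is implemented, just a plain substring match that shows the idea.

```python
def passage_matches(ocr_text: str, keyword: str) -> bool:
    """Return True if the keyword appears at least once in the OCR text.

    Hypothetical helper: a case-insensitive substring match standing in
    for a real full-text search index.
    """
    return keyword.lower() in ocr_text.lower()


# Hypothetical OCR output: the first "Tribune" was misread as "Tribnne",
# while the second instance was read correctly.
ocr_passage = (
    "The Tribnne reported the election returns on Tuesday. "
    "Readers of the Tribune waited for the final count."
)

# The passage is still found because one instance survived OCR intact.
found = passage_matches(ocr_passage, "Tribune")

# Only the correctly read instance counts as a hit.
correct_hits = ocr_passage.lower().count("tribune")

print(found, correct_hits)
```

If every instance of a keyword were misread, the same search would miss the passage, which is why better OCR quality directly improves search results.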
NDNP-Open-OCR is an open-source project developed by the Library of Congress for re-processing OCR of NDNP data. More information coming soon.
Newspapers are added to Chronicling America in batches. See Recent Additions to Chronicling America for more information. The following batches have been re-processed to improve their machine-readable, searchable text and are now available on Chronicling America.
Date Reprocessed Batch Added | Contributor | Batch Name | Page Count | Content on Batch |
---|---|---|---|---|
2024-12-09 | LC- Library of Congress, Washington, DC | dlc_alice_ver02 | 5276 | New-York Daily Tribune (sn83030213) 1860-1861 |
2024-12-09 | LC- Library of Congress, Washington, DC | dlc_basic_ver02 | 5177 | New-York Daily Tribune (sn83030213) 1862-1863 |
2024-12-09 | LC- Library of Congress, Washington, DC | dlc_cobol_ver02 | 5166 | New-York Daily Tribune (sn83030213) 1864-1865 |
2024-12-09 | LC- Library of Congress, Washington, DC | dlc_delphi_ver03 | 5035 | New-York Daily Tribune (sn83030213) 1866, New-York Tribune (sn83030214) 1866-1867 |
2024-12-09 | LC- Library of Congress, Washington, DC | dlc_euclid_ver04 | 5332 | New-York Tribune (sn83030214) 1866-1869 |
2024-12-09 | LC- Library of Congress, Washington, DC | dlc_grass_ver02 | 5436 | New-York Tribune (sn83030214) 1872-1873 |
2024-12-09 | LC- Library of Congress, Washington, DC | dlc_hugo_ver02 | 4861 | New-York Tribune (sn83030214) 1874-1875 |
2024-12-09 | LC- Library of Congress, Washington, DC | dlc_inform_ver02 | 5155 | New-York Tribune (sn83030214) 1875-1877 |
2025-01-08 | LC- Library of Congress, Washington, DC | dlc_java_ver02 | 5286 | New-York Tribune (sn83030214) 1877-1879 |
2025-01-08 | LC- Library of Congress, Washington, DC | dlc_lisp_ver02 | 930 | New-York Tribune (sn83030214) 1879 |
2025-01-08 | LC- Library of Congress, Washington, DC | dlc_kite_ver04 | 910 | New-York Tribune (sn83030214) 1877 |
2025-01-08 | LC- Library of Congress, Washington, DC | dlc_airy_ver02 | 6236 | New-York Daily Tribune (sn83030213) 1845, 1852-1854 |
2025-01-08 | LC- Library of Congress, Washington, DC | dlc_buttery_ver02 | 4614 | New-York Daily Tribune (sn83030213) 1847-1850 |
2025-01-08 | LC- Library of Congress, Washington, DC | dlc_crunchy_ver02 | 4340 | New-York Daily Tribune (sn83030213) 1849-1852 |
2025-01-08 | LC- Library of Congress, Washington, DC | dlc_dry_ver02 | 5278 | New-York Daily Tribune (sn83030213) 1854-1856 |
2025-01-08 | LC- Library of Congress, Washington, DC | dlc_eggy_ver02 | 5276 | New-York Daily Tribune (sn83030213) 1852, 1856-1858 |
2025-01-08 | LC- Library of Congress, Washington, DC | dlc_flavory_ver02 | 4182 | New-York Daily Tribune (sn83030213) 1842, 1858-1859 |
2025-01-08 | LC- Library of Congress, Washington, DC | dlc_gritty_ver02 | 6395 | New-York Tribune (sn83030212) 1841-1842, New-York Daily Tribune (sn83030213) 1842-1846 |
Notes: