In the winter of 2004, the European Division of the Library of Congress loaded onto a dedicated terminal in its reading room more than 580 CDs containing the entire catalog of the Comintern archives and more than a million pages of digitized documents. This was the culmination of a multinational effort stretching over several years.
In addition to making a significant financial contribution as a partner institution, the Library of Congress invested many staff hours to help bring the INCOMKA project to fruition. John Van Oudenaren, chief of the European Division, attended several INCOMKA planning and coordination conferences in Europe and hosted a two-day meeting on linguistic issues in February 2001. John E. Haynes of the Manuscript Division also attended meetings, provided expertise in the selection of materials for digitization, served as liaison with 154 historians in 54 countries, and reviewed the lists of Romanized personal names and English translations of keywords.
The Library of Congress assumed responsibility for converting some 175,000 personal names from Russian Cyrillic to the Latin alphabet and for translating close to 20,000 "descriptors" (keywords/subject headings), taken from Comintern archival finding aids, from Russian into English. I was given the task of coordinating this linguistic effort. The goal was to make the INCOMKA database accessible to researchers with little or no knowledge of Russian. The Library of Congress was well suited for this undertaking because of its vast collections of historical, biographical, and lexicographic works and the foreign-language diversity of its staff. Thirty-two staff members, most from the Area Studies Divisions, participated in the project.
In the summer of 2000, RGASPI sent to the Library of Congress a list of about 110,000 names taken from personal files (lichnye dela) maintained by the Comintern and, after 1943, the International Department of the CPSU. All names were in Russian Cyrillic as recorded over a period of many decades by clerks with highly divergent levels of foreign-language competence. Our task was to convert the Cyrillic version of the names to their "standard" American-English spelling.
The Comintern had files on persons from essentially all the countries of the world as it existed during the inter-war years, including some, like Tannu Tuva, that no longer exist. Many of the persons listed, such as Palmiro Togliatti, were prominent party members, while others were staunch anti-communists, e.g., Harry Truman. There were files on writers, painters, actors, civic leaders, and religious leaders. There was even a file on Karol Wojtyla, Pope John Paul II. But a large share of the persons were unknown functionaries whose names could not be attested in published sources. The Library of Congress did not have access to the files themselves, which might have provided a Latin spelling for some of the names.
The name-conversion process involved four stages. First, using a computer macro devised by our European Division colleague Michael Neubert, we produced a phonetically based transliteration from Russian according to Library of Congress Romanization rules. We then arranged the names into more than 100 country tables and distributed them to Library of Congress staff with native or near-native competence in the relevant languages. The German, French, and Swiss INCOMKA partners handled the name conversions for their respective countries. Because of the special problems posed by the Chinese names, Dr. Haynes engaged historians at the State Archives Administration of China to identify persons and provide the standard Pinyin transliteration of their names. The Chinese experts, however, were unable to recognize close to half of the entries.
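The macro itself is not reproduced here. Purely as an illustration of the first stage, a character-by-character table lookup along the following lines would yield the kind of raw transliteration described above. This is a minimal sketch in Python, not the project's actual tool: the table is a simplified rendering of the Library of Congress Romanization for Russian with diacritics and ligatures omitted and the hard and soft signs dropped, the names LC_TABLE and romanize are hypothetical, and the Cyrillic inputs are assumed reconstructions rather than entries taken from the RGASPI list.

```python
# Simplified Library of Congress Romanization table for Russian capitals.
# Assumption: diacritics and ligature marks are omitted, and the hard and
# soft signs are dropped, so this is an approximation for illustration only.
LC_TABLE = {
    "А": "A", "Б": "B", "В": "V", "Г": "G", "Д": "D", "Е": "E", "Ё": "E",
    "Ж": "ZH", "З": "Z", "И": "I", "Й": "I", "К": "K", "Л": "L", "М": "M",
    "Н": "N", "О": "O", "П": "P", "Р": "R", "С": "S", "Т": "T", "У": "U",
    "Ф": "F", "Х": "KH", "Ц": "TS", "Ч": "CH", "Ш": "SH", "Щ": "SHCH",
    "Ъ": "", "Ы": "Y", "Ь": "", "Э": "E", "Ю": "IU", "Я": "IA",
}


def romanize(name: str) -> str:
    """Return an uppercase Latin rendering of a Russian Cyrillic name.

    Characters that are not in the table (spaces, hyphens, Latin letters)
    are passed through unchanged.
    """
    return "".join(LC_TABLE.get(ch, ch) for ch in name.upper())


if __name__ == "__main__":
    # Assumed Cyrillic spellings, used only to exercise the table.
    for cyrillic in ["Хуан", "Яцек", "Джон Рид"]:
        print(cyrillic, "->", romanize(cyrillic))
```

Output of this kind is deliberately mechanical: it records how the Russian clerk spelled the name, not how the person spelled it in his or her own language.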
In the second stage, Library of Congress linguists analyzed the computer transliterations and converted sequences of letters into meaningful, language-specific combinations. For example, from the Mexican list, KHUAN became Juan; from the Polish list, IATSEK became Jacek; from the Moroccan list, KHADZH became Haj. Library staff attempted to identify individuals and provide the standard American-English spelling of their names, e.g., DZHON RID was identified as John Reed, as opposed to John Read or John Reid. Identifying individuals turned out to be an especially daunting task in the case of the tens of thousands of names originally written in neither the Cyrillic nor the Latin alphabet. To proceed from the Library of Congress phonetic transcription of a Russian phonetic transcription of a name originally recorded in a third writing system and arrive at the "correct" spelling in American usage was a challenge we were not always able to meet.
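The second stage was carried out by linguists, not by software, and much of it depended on recognizing who a person was. Still, the mechanical part of the work, mapping letter sequences to language-specific spellings, can be pictured with a short Python sketch. Everything in it is an assumption made for illustration: the rule table, the language codes, and the function name respell are hypothetical, and the rules were chosen only to reproduce the three examples above. They would misfire on many real names.

```python
# Toy, per-language substitution rules (hypothetical). Each rule replaces a
# letter sequence from the raw transliteration with a spelling that is
# plausible in the target language. Rules are applied in the order listed.
RESPELLING_RULES = {
    "es": [("KH", "J")],                # Spanish: KHUAN -> JUAN
    "pl": [("IA", "JA"), ("TS", "C")],  # Polish: IATSEK -> JACEK
    "ar": [("DZH", "J"), ("KH", "H")],  # Arabic names: KHADZH -> HAJ
}


def respell(transliterated: str, language: str) -> str:
    """Apply the toy rules for one language and return a capitalized name.

    A language without rules is left as transliterated, apart from
    capitalization, which mirrors the cases that needed human judgment.
    """
    result = transliterated.upper()
    for old, new in RESPELLING_RULES.get(language, []):
        result = result.replace(old, new)
    return " ".join(word.capitalize() for word in result.split())


if __name__ == "__main__":
    print(respell("KHUAN", "es"))
    print(respell("IATSEK", "pl"))
    print(respell("KHADZH", "ar"))
    print(respell("DZHON RID", "en"))  # no rules: identification is needed
```

The last call shows the limit of any such table: nothing in the letters DZHON RID says whether the person was John Reed, John Read, or John Reid. That decision required identifying the individual.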
In the third stage, John Haynes sent the lists to leading authorities on the history of the respective countries for "vetting." In most cases, two or three specialists reviewed each list. The lists took the form of multi-column tables presenting the original Cyrillic, the computer transliterations, and the spellings produced by the Library staff. The specialists confirmed or corrected the spellings, sometimes adding aliases, and returned the lists to the Library of Congress.
In the fourth and final stage, we incorporated the experts' input and delivered the finished tables to the Spanish software company El Corte Inglés, which loaded the information into the special version of ArchiDOC it had developed for the INCOMKA database.
Just as we were nearing completion of the name-conversion project, we received a revised list from RGASPI, which created a whole new set of complications. The revised list included tens of thousands of additional names extracted from collective personal files and from the detailed finding aids compiled by RGASPI archivists. Unfortunately, the revised list merged the additional entries with the original set of 110,000 names. El Corte Inglés, after considerable effort, eventually sorted out most of the new names for us. We completed the name conversion in-house, but time did not allow us to send the additional names to outside experts for correction.
Meanwhile, RGASPI sent us several long lists of Russian "descriptors" taken from the Comintern archival opisi (finding aids). Harold Leich, Russian area specialist in the European Division, and I translated these terms into English, and John Haynes edited them.