The Comintern database delivered to INCOMKA partners in winter 2004 uses ArchiDOC 18.104.22.168 Unicode software. Although searching is rather cumbersome, the software accomplishes its primary purpose -- enabling scholars to search the large database in either Cyrillic or Latin alphabet, identify files of interest, retrieve bibliographic information, and in some cases, view digital images of actual documents. The software provides excellent image-enhancement tools, and readers often will find the digitized documents more legible than the originals. Researchers can print pages of manuscripts for their own use, and the software inserts the bibliographic citation at the bottom of each page -- a very useful feature.
The database homepage presents four menu choices: CLASSIFICATION, DESCRIPTORS [key words], PHYSICAL FOND [bibliographic citations, listed in ascending numerical order], and LANGUAGES.
CLASSIFICATION organizes the vast Comintern archive into 11 thematic sections, shown below. All researchers who can read Russian, especially first-time users of this resource, will profit from a quick look at the 11 sections. It would enhance database accessibility if the sections and subsections were presented in English as well as Russian, but to make searching via CLASSIFICATION truly bilingual, at least the titles of the 521 inventories (opisi) should be translated.
Clicking on the plus symbol before each CLASSIFICATION heading opens a list of subheadings, which, in turn, open sub-subheadings, then opisi, then specific files. Wherever a camera icon appears, a double click will bring up the digital image of an actual document. The CLASSIFICATION sections in English are:
Section 1 might serve to illustrate the logical structure of CLASSIFICATION. Under the heading "Comintern Congresses and Plenary Sessions of the Comintern Executive Committee" there are three subheadings. The first, entitled "Congresses of the Comintern," lists seven Fondy, namely:
Each of the Fondy lists two or more opisi, which in turn list specific dela. For example, Fond 488, opis' 1 lists 18 dela, the first being "Address on Convening the First Congress of the Comintern, 24 January 1919." It happens that this 20-page file was among those digitized, and the researcher can view the document on the screen or print it. Fond 488, opis' 2 does not subdivide into dela; it is a collection of 76 photographs, which were not digitized. Altogether, the subheading "Congresses of the Comintern" comprises 15 opisi and 2,618 dela. Subsection 2, entitled "Plenums of the Executive Committee of the Communist International," comprises 14 widely scattered opisi of Fond 495 and totals 3,130 dela. Subsection 3, "International Control Commission of the Comintern, ICC," contains two opisi totaling 216 dela. Each delo contains, on average, several dozen pages, and some run to hundreds of pages.
CLASSIFICATION provides the most direct path to bibliographic records: a double left click on a title at any level in the CLASSIFICATION hierarchy brings up the bibliographic record in the opposite window of the split-screen display. Bibliographic information becomes more detailed as one proceeds from the section to the delo level. On the section and subsection levels, the record provides a title, the name of the RGASPI archivist who processed the material, and a brief contents note, which generally duplicates the title but occasionally provides a little more information. For example, Section 9, entitled "Interbrigades of the Spanish Republican Army," has the contents note "International formations and brigades of the Spanish Republican Army." On the opis' level, the record additionally provides an information start date and shows the number of dela contained therein. On the delo level, the record contains a contents note (typically a short paragraph) and information start and end dates; indicates the number of pages and the languages of the documents; and provides a list of "descriptors." The list of descriptors can be several pages in length.
Comintern database users with poor or no Russian capability are at a severe disadvantage. Although the bibliographic field names can be displayed in English, the contents note is in Russian only. And since it is the contents note (not the actual text of the digitized documents) that is searched by the so-called "text-search" function, persons not knowing Russian do not have access to this useful tool. The researcher without Russian capability who does not have a specific citation to access directly through the third menu option, PHYSICAL FOND, has only one way to identify and retrieve files, the second menu option, DESCRIPTORS.
Based on a review of Comintern archival documents, RGASPI staff identified essential terms and grouped them in ten categories of "descriptors." The lists of descriptors can be viewed in either Russian or English. Several of the categories are imprecisely delineated and often overlap. The SUBJECTS list is particularly fuzzy, and many terms are so generic that it seems doubtful a researcher would ever think of searching for them, e.g., Domestic and international situation. Terms within a list are not arranged in any hierarchy of specificity, i.e., there are neither general headings nor increasingly specific subheadings. The distinction between the categories of SOCIAL LEVEL and STATUS is especially vague. The only apparent difference is that terms in SOCIAL LEVEL are in the plural form, e.g., graduate students, and entries in STATUS are either singular, e.g., architect, or corporate bodies, e.g., Bulgarian delegation.
The descriptor category labels in Russian are not very "descriptive," and their English equivalents (supplied by RGASPI) are even more mysterious. The fuzziness of categories is not a minor inconvenience. Since the software cannot search all descriptor lists simultaneously, the researcher must explore each one separately to have an acceptable level of confidence in the search results. The only apparent advantage of breaking the descriptors up into separate thematic lists is browsability. The inconvenience of having to search several lists should be addressed in future versions of the Comintern database.
Descriptors are listed alphabetically, which facilitates browsing. To move rapidly down the long lists (with more than 175,000 entries, the personal names list is the longest), the researcher highlights any descriptor and begins typing a word or phrase -- as much or as little as desired -- and hits Enter. Within an instant, the desired descriptor (or the space where it should appear in the alphabetized list) appears. This "hot search" function is a great time-saver, but it has one major limitation: it is left-anchored. The software does not offer a simple "find in document" function, which would locate any term regardless of its position within a descriptor. The ten descriptor categories (Russian equivalents are in parentheses) are:
Having located and highlighted the desired descriptor, the researcher right clicks to find the option "Show related documents." Left clicking on this option will bring up a list of all dela that include the descriptor in their bibliographic records. Double left clicking on a delo title will display the complete bibliographic record. If the delo was digitized, a camera icon appears before the title. Double clicking on the icon will bring up the document image.
If "Show related documents" displays too many (or too few) dela, the researcher can search two or more descriptors simultaneously. This is accomplished by selecting the Search Assistant option on the toolbar, which displays a search form on the opposite half of the split screen. The researcher left clicks on the descriptors (one at a time), and drags them across the screen into the bottom window of the search form. Selecting the option "All of them" activates the Boolean AND operator, while "Some of them" engages the OR operator. The search form also allows one to specify a range of dates or a specific date.
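The logic of the two Search Assistant options can be sketched in a few lines of Python. This is only an illustration of the Boolean behavior described above, not ArchiDOC's actual implementation: the `Delo` record, the `search` function, the sample citations, and the rule that a date range matches any file whose dates overlap it are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Delo:
    """Hypothetical stand-in for one file-level bibliographic record."""
    citation: str
    descriptors: FrozenSet[str]
    start: date  # information start date
    end: date    # information end date


def search(dela: Iterable[Delo], wanted: Iterable[str], mode: str = "all",
           date_from: Optional[date] = None,
           date_to: Optional[date] = None) -> List[str]:
    """Return citations of files matching the chosen descriptors.

    mode="all"  -> every descriptor must be present (Boolean AND)
    mode="some" -> at least one descriptor must be present (Boolean OR)
    """
    wanted_set = set(wanted)
    hits = []
    for d in dela:
        if mode == "all":
            matched = wanted_set <= d.descriptors
        else:
            matched = bool(wanted_set & d.descriptors)
        # Assumed rule: keep a file only if its dates overlap the range.
        if date_from is not None and d.end < date_from:
            matched = False
        if date_to is not None and d.start > date_to:
            matched = False
        if matched:
            hits.append(d.citation)
    return hits


# Invented sample records; the citations are placeholders, not real files.
RECORDS = [
    Delo("sample 1", frozenset({"architect", "Bulgarian delegation"}),
         date(1920, 1, 1), date(1920, 12, 31)),
    Delo("sample 2", frozenset({"architect"}),
         date(1925, 1, 1), date(1925, 12, 31)),
    Delo("sample 3", frozenset({"Bulgarian delegation", "graduate students"}),
         date(1930, 1, 1), date(1930, 12, 31)),
]

both = search(RECORDS, ["architect", "Bulgarian delegation"], mode="all")
either = search(RECORDS, ["architect", "Bulgarian delegation"], mode="some")
later = search(RECORDS, ["architect", "Bulgarian delegation"], mode="some",
               date_from=date(1924, 1, 1))
```

With these invented records, "All of them" narrows the result to the single file carrying both descriptors, "Some of them" widens it to all three, and adding a start date drops the earliest file again.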
The third option on the database menu is labeled PHYSICAL FOND. For researchers who already have specific archival citations, this is the quickest path to the files. Citations are arranged in ascending numerical order, beginning with the Fond number. As with other sections of the menu, one expands or collapses lists by clicking on the plus sign at the beginning of each entry.
The fourth option on the database menu is LANGUAGES. This alphabetic list of all the document languages functions in the same way as a descriptor for searching purposes. A researcher who can read only Norwegian, for example, would click on "Norwegian" and drag the term over to the search window to limit search results to documents written in that language.
Researchers who can work with Russian have the option of keying terms directly into a search window and, by toggling to the Latin keyboard, using truncation (the percentage sign) and the Boolean operators AND, OR, NOT. ArchiDOC misleadingly calls this function text searching. What the software actually searches are file titles and/or file abstracts, each only a few lines in length. Considering that files often are hundreds of pages long and consist of dozens of documents, the odds are high that a search will show no hits. And even when a search is successful, i.e., one or more files are identified, the researcher's work has just begun. The researcher still must page through the file to identify the manuscript(s) where the search terms occur.
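A short Python sketch may make the limits of this kind of search concrete. It is a toy model, not ArchiDOC's code: the helper names, the reading of the percentage sign as "any run of letters within a word," and the sample abstract and page text (invented wording, loosely based on the file title quoted earlier) are all assumptions made for illustration.

```python
import re


def term_to_regex(term: str) -> "re.Pattern[str]":
    """Turn a search term into a whole-word regex.

    Assumption: '%' stands for any run of word characters, so
    'конгресс%' matches 'конгресс', 'конгресса', 'конгрессов', etc.
    """
    parts = [re.escape(p) for p in term.split("%")]
    return re.compile(r"\b" + r"\w*".join(parts) + r"\b", re.IGNORECASE)


def term_matches(term: str, text: str) -> bool:
    """True if the term occurs as a (possibly truncated) word in text."""
    return term_to_regex(term).search(text) is not None


# Invented example of a file record: a short Russian abstract plus page
# text that the search never sees.
ABSTRACT = "Обращение о созыве I конгресса Коминтерна"
PAGES = ["placeholder page text that mentions a manifesto"]

# Truncation matters in an inflected language: the bare stem misses the
# genitive form used in the abstract, the truncated term finds it.
stem_only = term_matches("конгресс", ABSTRACT)
truncated = term_matches("конгресс%", ABSTRACT)

# Boolean operators combine individual term tests (here AND NOT).
congress_not_plenum = (term_matches("конгресс%", ABSTRACT)
                       and not term_matches("пленум%", ABSTRACT))

# Because only the abstract is searched, a word found in the pages
# themselves produces no hit.
hit_in_abstract = term_matches("manifest%", ABSTRACT)
hit_in_pages = any(term_matches("manifest%", page) for page in PAGES)
```

In this toy model the final pair of tests shows the problem described above: the term is present in the file's pages but absent from the abstract, so a search confined to the abstract reports nothing.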
The INCOMKA project, one of the most ambitious international archival digitization efforts to date, has achieved many, but not all, of its goals. In the near future the database will be available free of charge to researchers through the World-Wide Web, although access to actual digital images will require a paid subscription. We hope the online version will be more user-friendly than the version delivered to INCOMKA partners on CDs. A "search this site" window that would allow users to search simultaneously all ten categories of "descriptors" and to find individual terms regardless of position within a "descriptor" is a badly needed improvement.
The conversion of about 175,000 personal names to Latin alphabet and the translation of almost 20,000 descriptors into English were a major undertaking. On the whole, we are pleased with the results. Nevertheless, researchers are certain to discover errors, and we hope the online version will enable them to send corrections and suggestions to the database administrators for updating. Researchers also will soon discover that none of the personal files have been, and probably never will be, digitized -- ostensibly for privacy considerations. But, thanks to the INCOMKA project, researchers now can go to RGASPI with specific citations in hand and request the personal files.
The INCOMKA project has made the Comintern archives more accessible, to be sure. But researchers who cannot read Russian still face special challenges and have fewer search strategies than those who can work with Russian. The CLASSIFICATION menu option is usable only by researchers who read Russian. In an ideal world, the titles of the 521 opisi and 230,000 files would be translated into English. At the Library of Congress, we have observed that nearly half of the Comintern database users have had limited or no Russian capability. It bears emphasizing that a large share, perhaps more than half, of the actual documents in the Comintern archives are in languages other than Russian, and that German, French, Spanish, and English account for most of these. These shortcomings notwithstanding, the research community should welcome the Comintern database enthusiastically.