BIBFRAME: A Manual for Understanding Version 2.0 and Related Tools

BIBFRAME and Linked Data

BIBFRAME is a linked data project that seeks to lower barriers to accessing library data, partly by adopting contemporary data practices, but more by fostering an environment that is not just on the World Wide Web but part of the World Wide Web. Library bibliographic data is built upon a solid infrastructure of authoritative names and subjects. It is reliable, consistent, and “clean,” thanks to its use of regulated standards. But it is encased in a data format that is not easily understood or easily deployed by non-library professionals.

With BIBFRAME and linked data, the library community has an opportunity to make its controlled and well-crafted bibliographic data accessible to a global audience. Wider accessibility of a library’s bibliographic data makes the library’s resources and holdings known and available to “outsiders.” If one of those outsiders, for example, is Google, then exposing library bibliographic data in this way can translate into more relevant search results for users, and more patrons utilizing library collections.

What is Linked Data?

The Web supports linked, related documents. It also allows for linking related data and stating the relationships among resources. The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. Key technologies that support linked data include the following:

  • Uniform Resource Identifiers (URIs) – a generic means to identify entities or concepts in the world
  • Hypertext Transfer Protocol (HTTP) – a simple yet universal mechanism for retrieving resources, or descriptions of resources
  • Resource Description Framework (RDF) – a generic graph-based data model with which to structure and link data that describes things in the world [1]

Using the Anglo-American Cataloguing Rules, 2nd edition (AACR2); Resource Description and Access (RDA); and MARC 21 to create authority and bibliographic records in library environments results in “flat” records that live in silos of data and are not integrated with the Web. By transitioning from a static, two-dimensional, collocated record to decentralized data with links that illuminate relationships, linked data potentially increases the visibility and usage of library data on the Web. Integrating library data with the large number of structured data sources and links on the Web thus potentially enhances the sharing of library data with a wider audience. Moreover, linked data allows for a fuller implementation of RDA.

Linked data is integral to the Semantic Web, a collaborative effort led by the World Wide Web Consortium (W3C) to provide a framework that allows data to be shared and reused across application, enterprise, and community boundaries. [2]

What is a Web of Data?

The semantic "web of data" provides a structure that allows machines to return information about the relationships between resources; it makes use of the existing HTTP protocol and common linked data standards such as RDF to provide the semantic structure. The traditional web of documents, by contrast, is characterized by a flat web of links between documents and files posted on the Web.

Web of Documents          | Web of Data
information resources     | “real-world objects”
links between documents   | links between things
unstructured data         | structured data
implicit semantics        | explicit semantics
for human consumption     | for humans and machines

A "web of data" uses a set of best practices for publishing and linking structured data on the Web. Its technologies are more generic and more flexible, and they make it easier for data consumers to discover and integrate data from a large number of sources and links.

Resource Description Framework (RDF)

RDF is a standard model for data interchange on the Web. RDF structures relationships between resources, people, and things on the Web, and uses a graph model to represent those relationships. RDF and related standards are maintained by the World Wide Web Consortium (W3C).

The RDF data model consists of:

  • Triple statements (informally called “triples”)
  • URIs and IRIs
  • Ontologies and vocabularies

Triples

RDF uses triples to make systematized statements about resources. Each triple consists of a subject, a predicate, and an object, and a set of triples can be modeled as a graph. Graph data underlies the Semantic Web: it represents the relationships between resources, such as books and people, in a way that computers can process.

[Figure: graph data model of the triple statement "This work was written by this author."]
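
The statement "This work was written by this author" can also be sketched in code. The following is a minimal illustration, not a real RDF library: the `Triple` type, the `objects` helper, and all example.org identifiers are assumptions made up for this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    object: str


# Hypothetical identifiers, for illustration only.
WORK = "http://example.org/works/w1"
AUTHOR = "http://example.org/people/p1"
WRITTEN_BY = "http://example.org/vocab/writtenBy"

# A graph is simply a set of triples.
graph = {Triple(WORK, WRITTEN_BY, AUTHOR)}


def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [t.object for t in graph
            if t.subject == subject and t.predicate == predicate]


# "Who wrote this work?"
print(objects(graph, WORK, WRITTEN_BY))
```

Storing statements as a set means that adding more triples about the same work or author extends the graph without changing its structure.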

Subjects, predicates, and objects can all be identified by URIs or Internationalized Resource Identifiers (IRIs). When these identifiers are dereferenced, they can return content for both humans and machines through content negotiation, the use of redirects, or the minting of hash URIs (URIs that end in a "#" fragment identifier). Humans can get a Hypertext Markup Language (HTML) page to read, and machines can retrieve an RDF/XML file that they can interpret and act upon.
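
As a rough sketch of content negotiation, the example below builds (but does not send) two HTTP requests for the same identifier, one asking for HTML and one for RDF/XML, and then splits a hash-style identifier into its document and fragment parts. The example.org URIs and the `build_request` helper are hypothetical; a real server decides which formats it is able to return.

```python
from urllib.parse import urldefrag
from urllib.request import Request

# Hypothetical identifier for a person, for illustration only.
RESOURCE = "http://example.org/people/p1"


def build_request(uri: str, want_rdf: bool) -> Request:
    """Build a request whose Accept header states the preferred format."""
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return Request(uri, headers={"Accept": accept})


# Same URI, two representations: the Accept header tells the server which.
browser_request = build_request(RESOURCE, want_rdf=False)
machine_request = build_request(RESOURCE, want_rdf=True)
print(browser_request.get_header("Accept"))
print(machine_request.get_header("Accept"))

# With a hash identifier, the client fetches only the part before "#".
document, fragment = urldefrag("http://example.org/vocab#writtenBy")
print(document, fragment)
```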

Uniform Resource Identifiers (URIs) and Internationalized Resource Identifiers (IRIs)

On the traditional Web, URIs are used primarily for Web documents: to link to them and to access them in a browser. The notion of resource identity was not so important on the traditional Web; a URL simply identified whatever we see when we type it into a browser. On the Semantic Web, URIs identify not just Web documents, but also real-world objects like people and cars, and even abstract ideas and nonexistent things like a mythical unicorn.

The IRI was defined by the Internet Engineering Task Force (IETF) in 2005, in RFC 3987, as a new internet standard that extends the existing URI scheme. While URIs are limited to a subset of the ASCII character set, IRIs may contain characters from the Universal Character Set (Unicode/ISO 10646), including Chinese characters, Japanese kanji, Korean, Cyrillic characters, and so forth.
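
The difference between the two identifier types can be sketched by converting an IRI into an ASCII-only URI: each non-ASCII character is written as percent-encoded UTF-8 bytes. This is a simplified illustration that assumes the host name is already ASCII (internationalized host names need separate handling that is not shown here); the `iri_to_uri` helper and the example.org address are hypothetical.

```python
from urllib.parse import quote, unquote


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8 bytes.

    ASCII URI punctuation and existing %XX escapes are left untouched.
    Assumes the host part of the IRI is already plain ASCII.
    """
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%")


# Hypothetical IRI with Cyrillic characters in its path.
iri = "http://example.org/wiki/Пушкин"
uri = iri_to_uri(iri)

print(uri)
print(unquote(uri) == iri)  # decoding the URI recovers the original IRI
```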

Triple Statements and URIs/IRIs

The subject of a triple is the URI identifying the described resource. The object can either be a simple literal value, like a string, number, or date; or the URI of another resource that is somehow related to the subject. The predicate, in the middle, indicates what kind of relation exists between subject and object, e.g., this is the name or date of birth (in the case of a literal), or the employer or someone the person knows (in the case of another resource). The predicate is also identified by a URI. These predicate URIs come from vocabularies, collections of URIs that can be used to represent information about a certain domain.
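
To make the two kinds of object concrete, the sketch below writes one triple with a literal object and one with a resource object in an N-Triples-like line format. The person URIs and the name are hypothetical, the predicates are borrowed from the FOAF vocabulary as an assumption (the text above does not name a vocabulary), and the escaping is simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Something identified by a URI."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A plain value such as a name or a date."""
    value: str


def to_ntriples(subject: Resource, predicate: Resource, obj) -> str:
    """Serialize one triple; URIs go in <>, literals in double quotes."""
    if isinstance(obj, Resource):
        obj_text = f"<{obj.uri}>"
    else:
        # Simplified escaping: backslash, double quote, and newline only.
        escaped = (obj.value.replace("\\", "\\\\")
                            .replace('"', '\\"')
                            .replace("\n", "\\n"))
        obj_text = f'"{escaped}"'
    return f"<{subject.uri}> <{predicate.uri}> {obj_text} ."


# Predicate URIs come from a vocabulary (FOAF here, as an assumption).
NAME = Resource("http://xmlns.com/foaf/0.1/name")
KNOWS = Resource("http://xmlns.com/foaf/0.1/knows")

# Hypothetical people, for illustration only.
person = Resource("http://example.org/people/p1")
friend = Resource("http://example.org/people/p2")

print(to_ntriples(person, NAME, Literal("Jane Doe")))  # literal object
print(to_ntriples(person, KNOWS, friend))              # resource object
```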

IRIs and literals together provide the basic material for writing down RDF statements. In addition, it is sometimes handy to be able to talk about resources without bothering to use a global identifier. A blank node is such a resource: one without a URI.
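
A blank node can be illustrated with a locally generated label in place of a URI. In the sketch below, a hypothetical "contribution" node connects a work to an agent and a role without being given a global identifier of its own; the `_:b0` label style follows N-Triples conventions, and every example.org name is made up for illustration.

```python
from itertools import count

_labels = count()


def new_blank_node() -> str:
    """Mint a label that is meaningful only inside this one dataset."""
    return f"_:b{next(_labels)}"


# Hypothetical work and vocabulary terms, for illustration only.
WORK = "<http://example.org/works/w1>"
contribution = new_blank_node()  # no URI: just a local placeholder

triples = [
    (WORK, "<http://example.org/vocab/contribution>", contribution),
    (contribution, "<http://example.org/vocab/agent>",
     "<http://example.org/people/p1>"),
    (contribution, "<http://example.org/vocab/role>", '"author"'),
]

for s, p, o in triples:
    print(s, p, o, ".")
```

The blank node can be the object of one statement and the subject of others, which is what lets it group related facts together.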

There are multiple ways of creating a URI. The Library of Congress typically works through ID.LOC.GOV, the Library of Congress Linked Data Service, where a base is defined for any given dataset. ID.LOC.GOV will be explored in further detail in Unit 4 of this manual.  
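
One simple pattern for creating URIs from a dataset base is to append a local identifier to it. The sketch below is only an illustration of that general idea and does not describe how ID.LOC.GOV works internally; the base address, the identifier, and the `mint_uri` helper are all hypothetical.

```python
from urllib.parse import quote


def mint_uri(base: str, local_id: str) -> str:
    """Join a dataset base and a local identifier into one URI."""
    if not base.endswith("/"):
        base += "/"
    # Escape characters that are not safe inside a single path segment.
    return base + quote(local_id, safe="")


# Hypothetical base and identifier, for illustration only.
NAMES_BASE = "http://example.org/authorities/names"
print(mint_uri(NAMES_BASE, "n0001"))
```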

Vocabularies and Ontologies

Vocabularies and ontologies allow us to add meaning and relationship information to triple statements. Because they are expressed in standard formats, computers can process the relationships they describe and serve meaningful search results to humans. Vocabularies and ontologies are the basic building blocks for inference techniques on the Semantic Web. Ontologies are a means of organizing and conceptualizing a domain of interest, and the term tends to be used for more complex collections of terms; "vocabulary" is used when such complexity is not necessary. Different institutions develop their own vocabularies, so your use of BIBFRAME should comply with local norms and guidelines.
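
A very small example of inference: if a vocabulary states that one class is a subclass of another, a program can conclude that anything belonging to the narrower class also belongs to the broader one. The sketch below uses the standard `rdf:type` and `rdfs:subClassOf` URIs; the example.org classes and the `infer_types` function are assumptions for illustration, and real reasoners handle many more rules than this one.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/vocab/"  # hypothetical vocabulary

triples = {
    (EX + "Novel", RDFS_SUBCLASS_OF, EX + "Text"),
    (EX + "Text", RDFS_SUBCLASS_OF, EX + "Work"),
    ("http://example.org/works/w1", RDF_TYPE, EX + "Novel"),
}


def infer_types(triples):
    """Add the rdf:type statements implied by rdfs:subClassOf."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new statement appears
        changed = False
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for sub, p2, sup in list(inferred):
                if p2 == RDFS_SUBCLASS_OF and sub == o:
                    new = (s, RDF_TYPE, sup)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred


# Statements that were never written down but follow from the vocabulary.
for statement in sorted(infer_types(triples) - triples):
    print(statement)
```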

Notes

  1. Heath, Tom and Bizer, Christian. “Linked Data: Evolving the Web into a Global Data Space.” Accessed April 19, 2021. http://linkeddatabook.com/editions/1.0/.
  2. Ibid.