MarcOnt/MarcOntMediationServices

From Corrib Clan Wiki

MarcOnt Initiative - Mediation Services for Digital Libraries

Sebastian Kruk, Marcin Synak and Kerstin Zimmermann

Abstract

The Semantic Web effort, which partially originated from the digital library community (Dublin Core), provides technologies such as ontologies that can potentially be applied to the problem of managing resources. The goal of the MarcOnt Initiative is to create a new bibliographic description standard in the form of an ontology, together with related tools that utilise semantic technologies. In this paper, we present the first version of the MarcOnt ontology and the mediation services that support different legacy bibliographic formats.

Introduction

A digital library using semantic technologies can provide more accurate searching, better support for L2L (library-to-library) communication and many other services. The first step is to move existing information about resources to semantic descriptions compatible with one common ontology. Unfortunately, it is very hard to convince potential users to make the effort of creating semantic descriptions of resources such as books. Such a mundane task can be automated, however, because the required information already exists and is stored in one of the popular legacy bibliographic formats such as MARC21 or BibTeX. The MarcOnt Initiative (http://www.marcont.org/) aims to create an ontology for bibliographic purposes with the aid of active participants from the librarian community, and to provide tools that simplify the creation of semantic descriptions from existing information.

MarcOnt Ontology

The MarcOnt ontology offers at least the descriptive capacity of the MARC21 format (hence the name: Marc + Ont). However, it does not reuse the MARC21 structure, but rather creates one suitable for preserving semantic information. At the moment, we have a first draft of the MarcOnt ontology, which provides the most commonly required functionality. Further evolution of the ontology will require a more active role from the librarian community.

Managing multiple bibliographic formats.

Currently, a number of bibliographic description formats are used in library systems. The most popular include MARC21 and its derivatives, Dublin Core (DC) and BibTeX. An ontology is well suited to describing bibliographic resources. Creating a network of concepts (or 'classes') with appropriate properties and constraints allows us to capture information about this particular domain in an efficient manner. The ontology also prepares the captured descriptions for further processing.

Unfortunately, capturing existing descriptions in, for example, the popular MARC21 format is no straightforward task. Information often cannot be mapped one-to-one. In MARC21, for example, the publication title can appear in several places, because there are different forms of title (abbreviated, alternative, etc.). Nevertheless, we can use the ontology as a bridge between different bibliographic formats.
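The bridging idea can be illustrated with a minimal Python sketch. It is not the MarcOnt implementation: the property names (title, abbreviatedTitle, alternativeTitle), the adapter function names and the sample values are assumptions made for illustration only. The MARC21 tags used are 245 (title statement), 210 (abbreviated title) and 246 (varying form of title). Two input adapters produce one common description, and an output adapter turns it into Dublin Core-style elements.

```python
# Hypothetical property names; the real MarcOnt vocabulary may differ.
MARC_TITLE_FIELDS = {
    "245": "title",             # MARC21 245: title statement
    "210": "abbreviatedTitle",  # MARC21 210: abbreviated title
    "246": "alternativeTitle",  # MARC21 246: varying form of title
}


def from_marc(fields):
    """Input adapter: (tag, value) pairs -> common description."""
    description = {}
    for tag, value in fields:
        prop = MARC_TITLE_FIELDS.get(tag)
        if prop is not None:  # tags without a mapping are skipped in this sketch
            description.setdefault(prop, []).append(value)
    return description


def from_bibtex(entry):
    """Input adapter: BibTeX entry fields -> common description."""
    description = {}
    if "title" in entry:
        description["title"] = [entry["title"]]
    return description


def to_dublin_core(description):
    """Output adapter: common description -> Dublin Core-style elements."""
    dc = {}
    if "title" in description:
        dc["dc:title"] = description["title"][0]
    alternatives = (description.get("abbreviatedTitle", [])
                    + description.get("alternativeTitle", []))
    if alternatives:
        dc["dcterms:alternative"] = alternatives
    return dc


# Made-up sample data: three title forms from one MARC21 record.
marc_fields = [
    ("245", "An Example Title"),
    ("210", "Ex. Title"),
    ("246", "Another Form of the Title"),
]
common = from_marc(marc_fields)
dublin_core = to_dublin_core(common)
```

Because both input adapters target the same intermediate description, supporting a further format in this sketch means adding one adapter rather than one converter per pair of formats.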

MarcOnt's mediation services architecture.

Figure: MarcOnt mediation services architecture

The MarcOnt mediation service

The heart of the architecture is the RDF storage system for semantic descriptions. Input and output adapters are implemented using the Sesame inferencing mechanism. On one side of the adapters are the semantic descriptions; on the other are bibliographic descriptions converted to an RDF model. How a description is converted to the RDF model depends on its original file format. MARC21 records are stored in binary files, so conversion requires first parsing them into MARC-XML and then transforming the data to an RDF model using XSLT. There is no common RDF representation of MARC21 data, so we use our own format, which is a simple translation of the MARC21 taxonomy and has no semantic value. Mapping BibTeX to and from MarcOnt requires similar steps. Dublin Core presents a much simpler task, because DC is already in RDF.
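The second conversion step, from MARC-XML to a flat RDF model with no semantic value, can be sketched as follows. The sketch uses Python's standard xml.etree.ElementTree in place of the XSLT transform described above, purely to keep the example self-contained. The namespace http://example.org/marc21-flat#, the predicate naming scheme (field + tag + subfield code), the function names and the sample record are all assumptions; the actual MarcOnt intermediate format is not specified here.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARC-XML documents.
MARCXML_NS = "{http://www.loc.gov/MARC21/slim}"
# Hypothetical namespace for the flat, non-semantic representation.
FLAT_NS = "http://example.org/marc21-flat#"


def marcxml_to_triples(xml_text, record_uri):
    """Turn the data fields of one MARC-XML record into flat triples."""
    root = ET.fromstring(xml_text)
    # Accept either a bare <record> or a <collection> wrapper.
    if root.tag == MARCXML_NS + "record":
        record = root
    else:
        record = root.find(MARCXML_NS + "record")
    if record is None:
        raise ValueError("no MARC-XML <record> element found")
    triples = []
    for datafield in record.iter(MARCXML_NS + "datafield"):
        tag = datafield.get("tag")
        for subfield in datafield.iter(MARCXML_NS + "subfield"):
            # One predicate per (tag, subfield code); no meaning is attached.
            predicate = f"{FLAT_NS}field{tag}{subfield.get('code')}"
            value = (subfield.text or "").strip()
            triples.append((record_uri, predicate, value))
    return triples


def to_ntriples(triples):
    """Serialise triples as N-Triples lines (minimal literal escaping only)."""
    lines = []
    for subject, predicate, literal in triples:
        escaped = (literal.replace("\\", "\\\\")
                          .replace('"', '\\"')
                          .replace("\n", "\\n"))
        lines.append(f'<{subject}> <{predicate}> "{escaped}" .')
    return "\n".join(lines)


# Made-up sample record with a single title field.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
  </datafield>
</record>"""

flat = marcxml_to_triples(SAMPLE, "http://example.org/record/1")
serialised = to_ntriples(flat)
```

The resulting triples only mirror the field structure of the record. Attaching meaning to them is left to the adapters.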

Supporting other formats is a matter of developing appropriate adapters and transforms or parsers. Currently we are developing adapters for MARC21. This enormous task requires writing hundreds of inferencing rules that translate MARC21 data to appropriate MarcOnt individuals and properties with as little information loss as possible. Accomplishing this requires tools for simple rule creation. These tools will be part of the MarcOnt portal being created under the WMap project (http://wmap.marcont.org/).
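To give a feel for what such translation rules do, here is a small Python stand-in. The adapters described above rely on Sesame inferencing rules, not on Python, and the property names (marcont:hasTitle and so on), the rule table and the sample values are hypothetical. The MARC21 positions used are 245$a (title), 100$a (personal name, main entry) and 260$b (name of publisher). Fields that no rule covers are collected separately, which makes potential information loss visible.

```python
# Each rule maps a (tag, subfield code) pair to a hypothetical MarcOnt property.
RULES = {
    ("245", "a"): "marcont:hasTitle",
    ("100", "a"): "marcont:hasAuthor",
    ("260", "b"): "marcont:hasPublisher",
}


def apply_rules(fields, rules=RULES):
    """Translate (tag, code, value) entries; return (mapped, unmapped)."""
    mapped, unmapped = [], []
    for tag, code, value in fields:
        prop = rules.get((tag, code))
        if prop is None:
            # No rule yet: keep the field so the gap can be reported.
            unmapped.append((tag, code, value))
        else:
            mapped.append((prop, value))
    return mapped, unmapped


# Made-up sample input; 300$a (extent) has no rule in this sketch.
sample_fields = [
    ("245", "a", "An example title"),
    ("100", "a", "Example, Author"),
    ("300", "a", "200 p."),
]
mapped, unmapped = apply_rules(sample_fields)
```

Keeping the rules as plain data is one way to let a rule-creation tool edit them without touching adapter code, though the design of the actual MarcOnt tools may differ.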



Further reading:

 * Benjelloun, Omar; Garcia-Molina, Hector; Su, Qi; Widom, Jennifer: Swoosh: A Generic Approach to Entity Resolution

Corrib cluster project is supported by Enterprise Ireland under Grant No. ILP/05/203, Science Foundation Ireland under Grant No. SFI/02/CE1/I131.
Hosted at DERI, NUI Galway.