JeromeDL/CommunicationProtocols

From Corrib Clan Wiki

Communication protocols in Digital Libraries

author: Mariusz Cygan


Abstract

The Internet is a huge source of information. Unfortunately, the desired data is not always easy to find: a search can take hours, and the quality of what is found is often low.

Dedicated digital libraries can address these problems. Using technologies such as XML and Web Services, digital libraries make searching faster and more accurate. The information they hold is also well organized and certified.

As more and more digital libraries are created, the problem of connecting them appears. Dispersion, different ontologies and different metadata formats all make communication harder. To solve these issues, special communication protocols were developed and introduced.

In this article I present a few communication protocols used in digital libraries. I describe each protocol, its main features and the ways it can connect libraries. Finally, I envision how communication protocols for heterogeneous libraries can resolve the communication problems.


Introduction

Digital libraries possess all the merits of traditional libraries. Their information is usually well organized and reliable. Furthermore, they make searching faster and more accurate. It would be a good idea to connect digital libraries together, so that a user could obtain results from many sources. That is why special communication protocols were developed and introduced.

Different digital libraries can use different metadata standards, and different metadata standards imply different communication protocols. As a result, communication protocols are being developed in parallel with digital libraries.

In this paper I present three protocols: DIENST, OAI-PMH and ELP. They are only a small sample, but they represent some of the basic ideas of communication between libraries.


DIENST

Overview

DIENST is a protocol for communication with distributed digital library servers. Its architecture is built on individually defined services that, when combined, form a distributed digital library. The functionality of Dienst includes storage of and access to resources (digital objects), deposit of new resources, discovery and browsing of those resources, and user registration. Communication with and among individual Dienst services takes place via an open protocol. The protocol defines the following basic services:

  • Repository Service - stores digital documents, each with unique name; supports multiple versions and different components
  • Index Service - serves queries
  • Query Mediator Service - dispatches queries to appropriate index servers
  • Info Service - returns information about the state of a server
  • Collection Service - provides information about how the services interact
  • Registry Service - stores information about users


Main features

Verbs and Versions

Individual Dienst protocol requests are called Verbs. Each service supports a set of verbs, and a service may support more than one version of a verb. Versions allow backward compatibility. Information on the services, verbs and versions is available through the Info Service.


HTTP embedding

Dienst is built on top of HTTP: all protocol messages are embedded within HTTP. A response can be returned in one of three MIME types: text/plain, text/html or text/xml. This brings two main advantages: easy and free access through a Web browser, and the benefit of the continuous development of Web technologies.


Request and Response

Dienst protocol requests are expressed as URLs embedded in HTTP requests. The path portion of the URL consists of the following information: the token 'Dienst', the service name, the version, the verb and any fixed arguments; keyword arguments follow in the query string. If the Repository service implements the Shred verb in version 1.2 and that verb accepts some optional keyword arguments, then the URL for this request might be:

http://bar.com/Dienst/Repository/1.2/Shred?delay=9&amperage=7.4
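
The URL layout described above can be illustrated with a short Python sketch. It is not part of the Dienst specification: the function names build_dienst_url and parse_dienst_url are hypothetical, and the sketch assumes that keyword arguments are ordinary URL query parameters, as the example URL suggests.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl


def build_dienst_url(host, service, version, verb, fixed_args=(), keyword_args=None):
    """Assemble a Dienst-style request URL from its parts (illustrative helper)."""
    # Path: the literal token 'Dienst', then service, version, verb, fixed arguments.
    segments = ["Dienst", service, version, verb, *fixed_args]
    url = f"http://{host}/" + "/".join(segments)
    # Keyword arguments are assumed to travel in the query string.
    if keyword_args:
        url += "?" + urlencode(keyword_args)
    return url


def parse_dienst_url(url):
    """Split a Dienst-style URL back into its components (illustrative helper)."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if len(segments) < 4 or segments[0] != "Dienst":
        raise ValueError("not a Dienst request URL")
    return {
        "service": segments[1],
        "version": segments[2],
        "verb": segments[3],
        "fixed_args": segments[4:],
        "keyword_args": dict(parse_qsl(parts.query)),
    }


url = build_dienst_url("bar.com", "Repository", "1.2", "Shred",
                       keyword_args={"delay": "9", "amperage": "7.4"})
print(url)
print(parse_dienst_url(url))
```

Running the sketch rebuilds the example URL from the text and then splits it back into service, version, verb and arguments.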

Responses to requests are formatted as HTTP responses. The response can be provided as plain text, HTML or XML.

  • text/plain is used for responses that contain unstructured information
  • text/xml is used for responses that contain structured information
  • text/html is used for responses intended for the user interface


Dates

All dates in the protocol are encoded in the format CCYY-MM-DD, for example 2004-04-01. CC stands for the century, YY for the year within the century, MM for the month and DD for the day.
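
As an illustration of this date format, the following Python sketch parses and formats such strings. The helper names are hypothetical, and the strict check on zero-padded fields is an assumption about how a careful client might validate input, not a rule quoted from the Dienst specification.

```python
import re
from datetime import date

# Four digits for century and year, two for the month, two for the day.
_DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_dienst_date(text):
    """Parse a CCYY-MM-DD string into a date; raise ValueError if malformed."""
    match = _DATE_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not in CCYY-MM-DD form: {text!r}")
    year, month, day = (int(group) for group in match.groups())
    # date() itself rejects impossible values such as month 13.
    return date(year, month, day)


def format_dienst_date(value):
    """Render a date as CCYY-MM-DD with zero-padded fields."""
    return value.isoformat()


print(parse_dienst_date("2004-04-01"))
print(format_dienst_date(date(2004, 4, 1)))
```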


OAI-PMH

Overview

The Open Archives Initiative Protocol for Metadata Harvesting provides an application-independent interoperability framework based on metadata harvesting. The OAI-PMH defines two classes of participants:

  • Data Providers - provide free access to metadata and may provide free access to full texts or other resources
  • Service Providers - harvest and store metadata


Image:CommunicationProtocols.jpg


Main features

HTTP embedding

OAI-PMH is based on HTTP. Request arguments are issued as GET or POST parameters.


Request and Response

All requests are submitted using the GET or POST method of HTTP, and repositories must support both. Responses are formatted as HTTP responses with the content type text/xml, and the response body must be well-formed XML.
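
A minimal sketch of how a harvester might encode the same request for both methods is given below. The base URL is a placeholder, the helper names are hypothetical, and the verb ListRecords with the argument metadataPrefix=oai_dc is taken from the OAI-PMH specification rather than from this article. No network call is made.

```python
from urllib.parse import urlencode


def oai_get_url(base_url, verb, **arguments):
    """Return the URL for an OAI-PMH request sent with HTTP GET."""
    return base_url + "?" + urlencode({"verb": verb, **arguments})


def oai_post_request(verb, **arguments):
    """Return (headers, body) for the same request sent with HTTP POST."""
    body = urlencode({"verb": verb, **arguments}).encode("ascii")
    # For POST, the arguments travel form-encoded in the request body.
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return headers, body


# Placeholder repository address, not a real endpoint.
BASE_URL = "http://repository.example.org/oai"

print(oai_get_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
print(oai_post_request("ListRecords", metadataPrefix="oai_dc"))
```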


Dates

Dates and times are uniformly expressed in UTC throughout the protocol. All time indicators must end with the special UTC designator 'Z', e.g.: 1957-03-20T20:30:00Z.
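
The conversion to this UTC form can be sketched in Python as follows. The helper names are hypothetical. The sample value is the timestamp from the text, entered here as a local time one hour ahead of UTC purely for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_oai_timestamp(moment):
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ssZ in UTC."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def from_oai_timestamp(text):
    """Parse a timestamp ending in 'Z' into a timezone-aware UTC datetime."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


# 21:30 at UTC+01:00 is 20:30 in UTC.
local = datetime(1957, 3, 20, 21, 30, tzinfo=timezone(timedelta(hours=1)))
print(to_oai_timestamp(local))
print(from_oai_timestamp("1957-03-20T20:30:00Z"))
```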


Metadata schema

OAI-PMH supports any metadata format encoded in XML, and a repository can disseminate multiple metadata formats. Dublin Core is the minimal format specified for basic interoperability.
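
To make the Dublin Core baseline concrete, the sketch below reads the fields of a small oai_dc metadata fragment. The namespace URIs are those published for oai_dc and the Dublin Core element set. The sample record reuses this article's own title and author, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Illustrative fragment; a real response wraps it in further OAI-PMH elements.
SAMPLE_RECORD = f"""
<oai_dc:dc xmlns:oai_dc="{OAI_DC_NS}" xmlns:dc="{DC_NS}">
  <dc:title>Communication protocols in Digital Libraries</dc:title>
  <dc:creator>Mariusz Cygan</dc:creator>
</oai_dc:dc>
"""


def dublin_core_fields(xml_text):
    """Collect Dublin Core elements into a dict of name -> list of values."""
    root = ET.fromstring(xml_text)
    fields = {}
    for element in root:
        # ElementTree tags look like '{namespace}localname'.
        namespace, _, name = element.tag[1:].partition("}")
        if namespace == DC_NS:
            # Dublin Core elements are repeatable, so values are kept in lists.
            fields.setdefault(name, []).append(element.text or "")
    return fields


print(dublin_core_fields(SAMPLE_RECORD))
```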


Flow Control

Reply lists from data providers are sometimes large, so OAI-PMH supports partitioning. When a response is long, the repository replies with an incomplete list and a resumptionToken. To obtain the complete list, the harvester issues one or more further requests with the resumptionToken as an argument.
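
The harvesting loop described above can be sketched as follows. To keep the example self-contained, a stand-in function fake_repository plays the role of the data provider. Its identifiers and tokens are invented, and its responses are simplified (a real response also carries elements such as responseDate and request). The verb ListIdentifiers and the element names come from the OAI-PMH specification, not from this article.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def fake_repository(arguments):
    """Stand-in for a data provider: serves three identifiers in two parts."""
    token = arguments.get("resumptionToken")
    if token is None:
        ids, next_token = ["oai:example.org:1", "oai:example.org:2"], "page-2"
    elif token == "page-2":
        ids, next_token = ["oai:example.org:3"], ""
    else:
        raise ValueError("unknown resumptionToken")
    headers = "".join(f"<header><identifier>{i}</identifier></header>" for i in ids)
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
        f"{headers}<resumptionToken>{next_token}</resumptionToken>"
        "</ListIdentifiers></OAI-PMH>"
    )


def harvest_identifiers(send_request, metadata_prefix="oai_dc"):
    """Follow resumptionTokens until the repository reports a complete list."""
    arguments = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    identifiers = []
    while True:
        root = ET.fromstring(send_request(arguments))
        for header in root.iter(OAI + "header"):
            identifiers.append(header.findtext(OAI + "identifier"))
        token = root.findtext(f".//{OAI}resumptionToken")
        if not token:
            # A missing or empty resumptionToken marks the end of the list.
            break
        # Follow-up requests carry only the verb and the token.
        arguments = {"verb": "ListIdentifiers", "resumptionToken": token}
    return identifiers


print(harvest_identifiers(fake_repository))
```

In a real harvester, send_request would perform the HTTP request and return the response body. Passing it in as a parameter keeps the loop testable without a network.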


Errors and exceptions

Repositories must indicate OAI-PMH errors by including one or more error elements in the response. All error identifiers are defined in the OAI-PMH protocol.
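
A harvester might check a response for error elements along the following lines. This is a sketch: the function name and the sample response are illustrative, the sample is simplified, and the error code badVerb is one of the codes listed in the OAI-PMH specification.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Simplified sample response of the kind sent for an unknown verb.
SAMPLE_ERROR_RESPONSE = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    '<error code="badVerb">Illegal OAI verb</error>'
    "</OAI-PMH>"
)


def find_errors(xml_text):
    """Return a list of (code, message) pairs for every error element."""
    root = ET.fromstring(xml_text)
    # Error elements are direct children of the OAI-PMH root element.
    return [
        (error.get("code"), (error.text or "").strip())
        for error in root.findall(OAI + "error")
    ]


errors = find_errors(SAMPLE_ERROR_RESPONSE)
if errors:
    for code, message in errors:
        print(f"repository reported {code}: {message}")
```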


ELP

Overview

The Extensible Library Protocol (ELP) is the first protocol that uses Web Services. It was designed specifically for the JeromeDL library. ELP's most distinctive capabilities are:

  • ability to communicate with libraries using different metadata formats
  • ability to recognize duplicates of books held by different libraries
  • easy registration of new libraries
  • P2P-based architecture

The Extensible Library Protocol is built as an L2L extension for the JeromeDL library. It is based on Web Services (SOAP) and uses a Dublin Core-based ontology as the base metadata for describing queries and results.

Main features

Extensibility

ELP is easily extensible both in requests and responses. Extensions use their own namespaces.

XML-based communication

ELP uses SOAP to exchange information. SOAP stands for Simple Object Access Protocol. It is a simple XML-based protocol for exchanging data over HTTP. Communication over HTTP is preferable to other Remote Procedure Call (RPC) mechanisms because HTTP is supported by all Internet browsers and servers. SOAP provides a way for applications running on different operating systems, with different technologies and programming languages, to communicate. It is very popular and is supported by all major software corporations.
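
To give an impression of what such a message could look like, the sketch below builds a SOAP 1.1 envelope carrying a query described with a Dublin Core element. It does not reproduce the actual ELP message format. The namespace urn:example:elp, the element name query and the function name are hypothetical placeholders, and the book title is arbitrary sample data. Only the SOAP envelope and Dublin Core namespaces are standard.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
DC = "http://purl.org/dc/elements/1.1/"                  # Dublin Core elements
ELP_NS = "urn:example:elp"                               # hypothetical namespace

# Readable prefixes for the serialized XML.
ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("dc", DC)
ET.register_namespace("elp", ELP_NS)


def build_search_envelope(title):
    """Wrap a title query in a SOAP envelope (illustrative message layout)."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # Hypothetical query element living in the protocol's own namespace.
    query = ET.SubElement(body, f"{{{ELP_NS}}}query")
    # The query itself is expressed with a Dublin Core element.
    ET.SubElement(query, f"{{{DC}}}title").text = title
    return ET.tostring(envelope, encoding="unicode")


print(build_search_envelope("Ulysses"))
```

Because every element is namespace-qualified, further elements from other namespaces can be added without clashing with the existing ones. Namespaces serve the same purpose for extensions.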


Metadata standards

ELP allows the use of any existing or future metadata standard. The only requirement is that it must be serializable in XML. ELP uses a Dublin Core-based ontology as the base metadata for describing queries and results.


Conclusions

In this paper I have tried to present communication protocols for digital libraries. Although quite a few of them exist, there is still a need to develop new ones. These new protocols are expected to offer a more flexible way of communicating.

One of the new ideas is to create a protocol based on Semantic Web Services. Such protocols would enable automatic interoperability within a heterogeneous network of digital libraries. The Semantic Web lets us describe services semantically, so it would be possible to tell a client what a service really does. In this way a program could take over tasks that previously required a human.

Given digital libraries and a communication protocol, we could use a P2P backbone with a HyperCuP topology of WSMX servers to connect them. But that is a subject for another article.

References

  1. Dienst Protocol Specification. http://www.cs.cornell.edu/cdlrg/dienst/protocols/DienstProtocol.htm
  2. Dienst, A Protocol for a Distributed Digital Document Library. http://www.broadcatch.com/dienst.html
  3. The Open Archives Initiative Protocol for Metadata Harvesting. http://www.openarchives.org/OAI/openarchivesprotocol.html
  4. OAI for Beginners - the Open Archives Forum online tutorial. http://www.oaforum.org/tutorial/
  5. Okraszewski M., Krawczyk H.: Semantic Web Services in L2L
  6. Bugalski P., Grzonkowski S.: HyperCuP

Corrib cluster project is supported by Enterprise Ireland under Grant No. ILP/05/203, Science Foundation Ireland under Grant No. SFI/02/CE1/I131.
Hosted at DERI, NUI Galway.