FOAFRealm/Documents/SesameJoined

last edited 2005-04-26 05:33:49 by AdamGzella

Purpose of this document

This document describes the Sesame system, its architecture and its features. Some parts consider migrating from Jena (another RDF store) to Sesame and the possibility of using Sesame in the MarcOnt and FOAFRealm projects.

Short description

Sesame is an open source Java framework for storing, querying and reasoning with RDF and RDF Schema. It can be used as a database for RDF and RDF Schema, or as a Java library for applications that need to work with RDF internally. Sesame provides the necessary tools to parse, interpret, query and store all this information, either embedded in an application, in a separate database, or on a remote server.

Sesame has been designed with flexibility in mind. It can be deployed on top of a variety of storage systems (relational databases, in-memory, filesystems, keyword indexers, etc.), and offers developers a wide range of tools to leverage the power of RDF and RDF Schema, such as a flexible access API, which supports both local and remote (through HTTP, SOAP or RMI) access, and several query languages, of which SeRQL is the most powerful one.

Repository

A central concept in the Sesame framework is the repository. A repository is a storage container for RDF. This can simply mean a Java object (or set of Java objects) in memory, or it can mean a relational database. Whatever way of storage is chosen, it is important to realize that almost every operation in Sesame happens with respect to a repository: when you add RDF data, you add it to a repository. When you do a query, you query a particular repository.

Storage

Sesame supports several ways of storing repositories: in an RDBMS, in memory or in files. An RDBMS is the most likely choice. Currently, Sesame is able to use PostgreSQL, MySQL, Oracle (9i or newer) and SQL Server. Sesame has been tested with versions of MySQL starting from 3.23.47. If available, Sesame uses the character set features that were introduced in MySQL 4.1 to properly handle non-ASCII characters. Sesame automatically detects whether these features are available and falls back to using BLOBs when an older version of MySQL is used. In that case, Sesame is not able to properly handle non-ASCII characters in literals and namespace names.

Installation and configuration (Configure Sesame!)

Installation

The easiest way to install Sesame is as follows:

  • Go to the web applications directory ( [TOMCAT_DIR]/webapps/ by default) and create a directory 'sesame' there.
  • Extract the sesame.war file (which can be found in the lib directory of the binary Sesame distribution) to the newly created 'sesame' directory.
    • for example: jar -xf [PATH/TO/]sesame.war
    • You can also use a program like WinZip or unzip to extract the archive.
  • In case you are planning to use a database with Sesame, copy the appropriate JDBC-driver file(s) to the directory [SESAME_DIR]/WEB-INF/lib/
  • Copy the file [SESAME_DIR]/WEB-INF/system.conf.example to [SESAME_DIR]/WEB-INF/system.conf. The example file contains some repository entries for different Sails and databases, and one user account. The file may require some modifications in order to work on your machine. See the "Server administration" chapter of the Sesame user guide to learn how to do this.
  • (Re)start your Tomcat server and Sesame should now be up and running. You can access the Sesame web interface at http://[MACHINE_NAME]:[TOMCAT_PORT]/sesame

To use Sesame as a library in a Java application, you need the Sesame jar file sesame.jar. This file can be found in the lib/ directory of the binary download. Including this jar file in the classpath is enough to use the functionality of Sesame in a Java application.

Configuration

Sesame's configuration is specified in the file [SESAME_DIR]/WEB-INF/system.conf . You can edit this configuration file locally on your Sesame server using the Configure Sesame! tool available in [SESAME_DIR]/WEB-INF/bin/ . To start this tool type:

	[SESAME_DIR]/WEB-INF/bin/configSesame.sh (or configSesame.bat under Windows)

This tool allows you to change all the Sesame settings, including

  • users configuration
  • repositories configuration
  • server configuration (admin password, log directory, log level, RMI settings)

Change tracking

Another tool developed in On-To-Knowledge is OMM, the Ontology Middleware Module, which was developed by OntoText. OMM is an extension of Sesame that adds features such as change tracking and improved security. More details on this module are available on the OMM project page http://www.ontotext.com/omm/.

Sesame architecture

[Figure: Sesame architecture diagram (attachment ses_arch.png)]

Starting at the bottom, the Storage And Inference Layer, or SAIL API, is an internal Sesame API that abstracts from the storage format used (i.e. whether the data is stored in an RDBMS, in memory, or in files, for example), and provides reasoning support. SAIL implementations can also be stacked on top of each other, to provide functionality such as caching or concurrent access handling. Each Sesame repository has its own SAIL object to represent it.
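The idea of stacking storage layers can be sketched in plain Java. This is only an illustration of the layering principle under stated assumptions: StatementStore, MemoryStore and SynchronizedStore are hypothetical names invented here, not Sesame's actual SAIL interfaces.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class StackingDemo {
    // Builds a two-layer stack: a synchronizing layer on top of an in-memory layer.
    static StatementStore buildStack() {
        return new SynchronizedStore(new MemoryStore());
    }

    public static void main(String[] args) {
        StatementStore store = buildStack();
        store.add("ex:actor1", "ex:hasName", "Tom Hanks");
        store.add("ex:actor1", "ex:hasName", "Tom Hanks"); // duplicate, not stored twice
        System.out.println(store.size());
    }
}

// Hypothetical storage abstraction; NOT Sesame's Sail interface.
interface StatementStore {
    boolean add(String subject, String predicate, String object);
    int size();
}

// Bottom layer: keeps statements in memory.
final class MemoryStore implements StatementStore {
    private final Set<String> statements = new LinkedHashSet<>();

    public boolean add(String subject, String predicate, String object) {
        return statements.add(subject + " " + predicate + " " + object);
    }

    public int size() {
        return statements.size();
    }
}

// Stacked layer: serializes access to whatever store lies beneath it.
final class SynchronizedStore implements StatementStore {
    private final StatementStore delegate;

    SynchronizedStore(StatementStore delegate) {
        this.delegate = delegate;
    }

    public synchronized boolean add(String subject, String predicate, String object) {
        return delegate.add(subject, predicate, object);
    }

    public synchronized int size() {
        return delegate.size();
    }
}
```

Because every layer implements the same interface, client code does not need to know how many layers sit between it and the actual storage.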

On top of the SAIL, we find Sesame's functional modules, such as the SeRQL, RQL and RDQL query engines, the admin module, and RDF export.

Access to these functional modules is available through Sesame's Access APIs, consisting of two separate parts: the Repository API and the Graph API. The Repository API provides high-level access to Sesame repositories, such as querying, storing of RDF files, extracting RDF, etc. The Graph API provides more fine-grained support for RDF manipulation, such as adding and removing individual statements, and creation of small RDF models directly from code. The two APIs complement each other in functionality, and are in practice often used together.

The Access APIs provide direct access to Sesame's functional modules, either to a client program or to the next component of Sesame's architecture, the Sesame server. This is a component that provides HTTP-based access to Sesame's APIs. Then, on the remote HTTP client side, we again find the access APIs, which can again be used for communicating with Sesame, this time not as a library, but as a server running on a remote location.

Inference

Sesame supports RDF Schema inferencing. This means that given a set of RDF and/or RDF Schema, Sesame can find the implicit information in the data. Sesame does this by adding all implicit information to the repository at the moment the data is added.

It is important to realize that inferencing in Sesame is associated with the type of repository that you use. Sesame supports several different types of repositories. Some of these support inferencing, others do not. Whether you want Sesame to do inferencing for you is a choice that depends very much on your application.
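Adding implicit information at insertion time can be illustrated with a small self-contained sketch. It is not Sesame's implementation: InferencingStore is a hypothetical class, and it applies only two RDF Schema rules (transitivity of rdfs:subClassOf, and inheritance of rdf:type along rdfs:subClassOf), whereas full RDF Schema entailment has more.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of inference at insertion time; not Sesame's implementation.
final class InferencingStore {
    static final String TYPE = "rdf:type";
    static final String SUBCLASS_OF = "rdfs:subClassOf";

    // Each statement is stored as a three-element list: subject, predicate, object.
    private final Set<List<String>> statements = new HashSet<>();

    // Stores the statement, then adds everything the two rules entail.
    void add(String subject, String predicate, String object) {
        statements.add(Arrays.asList(subject, predicate, object));
        boolean changed = true;
        while (changed) { // repeat until no rule produces a new statement
            changed = false;
            List<List<String>> snapshot = new ArrayList<>(statements);
            for (List<String> a : snapshot) {
                boolean inheritable = a.get(1).equals(TYPE) || a.get(1).equals(SUBCLASS_OF);
                if (!inheritable) continue;
                for (List<String> b : snapshot) {
                    // a = (X, p, C) and b = (C, rdfs:subClassOf, D) entail (X, p, D)
                    if (b.get(1).equals(SUBCLASS_OF) && b.get(0).equals(a.get(2))) {
                        changed |= statements.add(Arrays.asList(a.get(0), a.get(1), b.get(2)));
                    }
                }
            }
        }
    }

    boolean contains(String subject, String predicate, String object) {
        return statements.contains(Arrays.asList(subject, predicate, object));
    }

    int size() {
        return statements.size();
    }

    public static void main(String[] args) {
        InferencingStore store = new InferencingStore();
        store.add("ex:Painter", SUBCLASS_OF, "ex:Artist");
        store.add("ex:rembrandt", TYPE, "ex:Painter");
        // The statement below was never added explicitly, but was inferred.
        System.out.println(store.contains("ex:rembrandt", TYPE, "ex:Artist"));
    }
}
```

The trade-off of this approach is that queries stay simple (inferred statements are ordinary statements), while adding data becomes more expensive and the repository grows.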

SAIL and how to create a MySQL connection

As mentioned, the SAIL API allows Sesame to use a variety of storage mechanisms. The FOAFRealm project uses a MySQL database to store RDF data, so it is important to know how Sesame cooperates with MySQL.

Before you start using the MySQL database, you must create a new user (with all privileges) and a new database. The names of both the user and the database must be passed to the SAIL module.

A connection to MySQL can be made in two ways.

  • There can be a static (previously created) repository. It can then be configured (using Configure Sesame!) to use the MySQL database. (Remember to set the appropriate user and database names.)
  • All these actions can be performed in the source code of the application. It could look like this:
RepositoryConfig repConfig = new RepositoryConfig("myCustomRep");

// A synchronization layer stacked on top of the RDBMS-backed storage layer.
SailConfig syncSail = new SailConfig("org.openrdf.sesame.sailimpl.sync.SyncRdfSchemaRepository");
SailConfig memSail = new org.openrdf.sesame.sailimpl.rdbms.RdfRepositoryConfig(
	"com.mysql.jdbc.Driver",
	"jdbc:mysql://localhost:3306/testdb",
	"user_name",
	"user_password");

repConfig.addSail(syncSail);
repConfig.addSail(memSail);
repConfig.setWorldReadable(true);
repConfig.setWorldWriteable(true);

// 'service' is a Sesame service object obtained beforehand.
LocalRepository myCustomRepository = service.createRepository(repConfig);

Sesame API

This is the most important part for programmers. As described above, the API consists of two complementary parts.

Repository API

The Repository API is the central access point for Sesame repositories. It can be used to query and update the contents of both local and remote repositories. The Repository API handles all the details of client-server communication, allowing you to handle remote repositories as easily as local ones.

The main interfaces for the Repository API can be found in the package org.openrdf.sesame.repository. The implementations of these interfaces for local and remote repositories can be found in subpackages of this package.

Generally, there are three types of repositories available:

  • Local repository. An example of this type was shown above in the section on SAIL and MySQL (creating a repository with MySQL support).
  • Remote repository. Use this type when you want to work with a remote Sesame server. A connection must be established first, either over plain HTTP or through the RMI service. Examples of creating a remote repository are shown below:

HTTP connection

java.net.URL sesameServerURL = new java.net.URL("http://HOSTNAME/SESAME_DIR/");
SesameService service = Sesame.getService(sesameServerURL);
service.login("USERNAME", "PASSWORD");

RMI connection

java.net.URI sesameServerURI = new java.net.URI("rmi://HOSTNAME:PORT/");
SesameService service = Sesame.getService(sesameServerURI);
service.login("USERNAME", "PASSWORD");

Finally, the repository is obtained from the service:

SesameRepository myRepository = service.getRepository("remoteRepositoryID");
  • Server repository. This type should be used when the repository is created on the local Sesame server. The local service is obtained with:
LocalService service = SesameServer.getLocalService();

Using this object we can get to the available repositories in an identical fashion as in the local repository scenario.

This type of repository is probably the best fit for FOAFRealm, because it allows the application to connect to the Sesame server as an external service.

Querying a repository

A number of methods are available on SesameRepository objects (all of them are listed in the Sesame Javadoc). Querying a repository serves as an example here. Sesame supports three query languages; the simple query below uses SeRQL. The result is a QueryResultsTable, which offers methods for working with the table (e.g. getColumnCount(), getRowCount(), getValue()).

String query = "SELECT * FROM {x} p {y}";
QueryResultsTable resultsTable = myRepository.performTableQuery(QueryLanguage.SERQL, query);

Adding RDF data to a repository

Adding RDF to a repository can be done in several ways: the RDF can be in the form of a local file, a location on the Web, or a Java String object. Individual RDF statements cannot be added through the Repository API. For that purpose, use the Graph API.

The method in the Repository API for adding data is named addData() and it takes several parameters. The most important is the data source, the first parameter. The parameter baseURI specifies the base URI that any relative URIs in the data should be resolved against. format specifies the format of the data (RDF/XML, N-Triples or Turtle). verifyData is a boolean flag that specifies whether the data should be checked for syntactic correctness before attempting upload. Finally, listener specifies an AdminListener object to which status updates during upload, and possible errors and warnings, are reported.

The example below shows one way of adding RDF data to a repository.

java.net.URL myRDFData = new java.net.URL("http://www.foo.com/bar/myRdfFile.rdf");
String baseURI = "http://my.base.uri#";
boolean verifyData = true;
AdminListener myListener = new StdOutAdminListener();

myRepository.addData(myRDFData, baseURI, RDFFormat.RDFXML, verifyData, myListener);
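What resolving relative URIs against a base URI means can be shown with the standard java.net.URI class alone. This sketch does not call Sesame, and Sesame's parsers may differ in details; the class name BaseUriDemo and the example.org address are assumptions made for illustration.

```java
import java.net.URI;

final class BaseUriDemo {
    // Resolves a (possibly relative) reference against a base URI.
    static String resolve(String baseUri, String reference) {
        return URI.create(baseUri).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.foo.com/bar/myRdfFile.rdf";
        // Fragment-only reference: stays within the base document.
        System.out.println(resolve(base, "#actor1"));
        // Relative path: replaces the last path segment of the base.
        System.out.println(resolve(base, "other.rdf"));
        // Absolute reference: the base is ignored.
        System.out.println(resolve(base, "http://example.org/abs"));
    }
}
```

Choosing the base URI therefore decides which absolute identifiers the relative names in an RDF file end up with.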

Graph API (simple queries and data adds)

The Graph API provides more fine-grained support for RDF manipulation, such as adding and removing individual statements, and creation of small RDF models directly from code.

The main interface for the Graph API is org.openrdf.model.Graph. The purpose of this interface is to offer a convenient way of handling RDF graphs from code. Graphs can be built by programmatically adding statements to them, or they can be created by evaluating a SeRQL-construct query on a Sesame repository.

This API is also very important when Sesame is used instead of Jena. The Sesame methods should cover the Jena functions that are needed (for example those used in the FOAFRealm project).

The next few sections give some examples that show how to use this API.

Creating an empty Graph and adding statements to it

An empty graph can be acquired by simply creating a GraphImpl object:

Graph myGraph = new org.openrdf.model.impl.GraphImpl();

To add new statements to this graph, we have to create the building blocks of these statements (the subject, predicate, and object) first. This can be done using a ValueFactory object, which can be obtained from the graph, as is shown in the following example:

ValueFactory myFactory = myGraph.getValueFactory();
String namespace = "http://www.foo.com/bar#";

URI mySubject = myFactory.createURI(namespace, "actor1");
URI myPredicate = myFactory.createURI(namespace, "hasName");
Literal myObject = myFactory.createLiteral("Tom Hanks");

myGraph.add(mySubject, myPredicate, myObject);
or
mySubject.addProperty(rdfType, actorClass);

Adding/removing a Graph to/from a repository

Now that we have created a graph, we can use it to add all of its statements to our repository:

myRepository.addGraph(myGraph);

We can remove the graph from the repository in the same way:

myRepository.removeGraph(myGraph);

Creating a Graph for an existing repository

Often, you will not want to create a new graph from scratch, but rather create a graph that contains statements that are in a (local) Sesame repository. The Repository API allows you to do just this: it can produce a Graph object that wraps a local repository:

Graph myGraph = myLocalRepository.getGraph();

The produced Graph object allows you to use all of its convenience methods for manipulating the RDF graph. All changes that are made on the Graph object are directly passed on to the underlying repository.

Creating a graph using graph queries

This approach should be used when the whole repository is not needed in the Graph, or when you want a local copy or subset of the repository. It must also be used when you work over a remote server connection. The following code fragment uses a SeRQL-construct query to create a Graph object containing all rdfs:subClassOf statements:

String query = "CONSTRUCT * FROM {SubClass} rdfs:subClassOf {SuperClass}";
Graph classHierarchy = myRepository.performGraphQuery(QueryLanguage.SERQL, query);

A graph created in this fashion will be independent from the repository it was extracted from: changes to the graph will not be passed on to the repository.

The Graph API, in combination with graph queries, can be also very useful when you wish to update (e.g. change the value of) a large number of statements in your repository.

Query Languages (SeRQL, RQL, RDQL)

Sesame, as mentioned, supports three query languages. The most powerful is SeRQL, which is described in detail below.

Jena supports the RDQL language, so as long as we want to keep using RDQL, migrating to Sesame should pose no problem. It is nevertheless worth considering a change of language to SeRQL in order to take advantage of its features.

The SeRQL query language

SeRQL ("Sesame RDF Query Language", pronounced "circle") is a new RDF/RDFS query language that is currently being developed by Aduna as part of Sesame. It combines the best features of other (query) languages (RQL, RDQL, N-Triples, N3) and adds some of its own.

Some of SeRQL's most important features are:

  • graph transformation,
  • RDF Schema support,
  • XML Schema datatype support,
  • expressive path expression syntax,
  • optional path matching.

Variables

Variables are identified by names. These names must start with a letter or an underscore ('_') and can be followed by zero or more letters, numbers, underscores, dashes ('-') or dots ('.').

SeRQL keywords are not allowed to be used as variable names. Currently, the following keywords are used or reserved for future use in SeRQL: select, construct, from, where, using, namespace, true, false, not, and, or, like, label, lang, datatype, null, isresource, isliteral, sort, in, union, intersect, minus, exists, forall, distinct, limit, offset.

Keywords in SeRQL are all case-insensitive. Variable names, in contrast, are case-sensitive.
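The naming rules above can be checked with a short self-contained sketch. SerqlNames is a hypothetical helper, not part of Sesame, and it assumes that "letter" means an ASCII letter; the actual SeRQL grammar may accept more characters.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper; assumes "letter" means an ASCII letter.
final class SerqlNames {
    // A letter or underscore, then letters, digits, underscores, dashes or dots.
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_.-]*");

    // Keywords used or reserved in SeRQL, as listed in the text above.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "select", "construct", "from", "where", "using", "namespace",
            "true", "false", "not", "and", "or", "like", "label", "lang",
            "datatype", "null", "isresource", "isliteral", "sort", "in",
            "union", "intersect", "minus", "exists", "forall", "distinct",
            "limit", "offset"));

    static boolean isValidVariable(String name) {
        // Keywords are case-insensitive, so compare in lower case.
        return NAME.matcher(name).matches()
                && !KEYWORDS.contains(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isValidVariable("Artefact_Title")); // well-formed name
        System.out.println(isValidVariable("SELECT"));         // keyword in any case
        System.out.println(isValidVariable("1abc"));           // starts with a digit
    }
}
```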

URIs

There are two ways to write down URIs in SeRQL: either as full URIs or as abbreviated URIs. Full URIs must be surrounded with "<" and ">", for example <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>.

As URIs tend to be long strings with the first part being shared by several of them (i.e. the namespace), SeRQL allows one to use abbreviated URIs (or Qnames) by defining (short) names for these namespaces which are called "prefixes". A Qname always starts with one of the defined prefixes and a colon (":"). After this colon, the part of the URI that is not part of the namespace follows. The first part, consisting of the prefix and the colon, is replaced by the full namespace by the query engine. An example Qname is:

    • sesame:index.html
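The replacement of a prefix by its namespace can be sketched as follows. QNames is a hypothetical helper rather than Sesame code, and the namespace http://www.openrdf.org/ bound to the prefix "sesame" is an assumption chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that mimics what a query engine does with QNames.
final class QNames {
    // Replaces "prefix:" by the namespace registered for that prefix.
    static String expand(String qname, Map<String, String> prefixes) {
        int colon = qname.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a QName: " + qname);
        }
        String prefix = qname.substring(0, colon);
        String namespace = prefixes.get(prefix);
        if (namespace == null) {
            throw new IllegalArgumentException("Undefined prefix: " + prefix);
        }
        return namespace + qname.substring(colon + 1);
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = new HashMap<>();
        prefixes.put("sesame", "http://www.openrdf.org/"); // assumed namespace
        System.out.println(expand("sesame:index.html", prefixes));
    }
}
```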

Literals

RDF literals consist of three parts: a label, a language tag, and a datatype. The language tag and the datatype are optional and at most one of these two can accompany a label (a literal cannot have both a language tag and a datatype). The notation of literals in SeRQL has been modelled after their notation in N-Triples; literals start with the label, which is surrounded by double quotes, optionally followed by a language tag with a "@" prefix or by a datatype URI with a "^^" prefix. Example literals are "Tom Hanks", "night"@en and "42"^^<http://www.w3.org/2001/XMLSchema#int>.

Path expressions

One of the most prominent parts of SeRQL are path expressions. Path expressions are expressions that match specific paths through an RDF graph. Most current RDF query languages allow you to define path expressions of length 1, which can be used to find (combinations of) triples in an RDF graph. SeRQL, like RQL, allows you to define path expressions of arbitrary length.

Imagine that we want to query an RDF graph for persons who work for companies that are IT companies. Querying for this information comes down to finding a specific pattern in the RDF graph: a person node connected by a "works for" edge to a company node, which in turn has the type "IT company". In SeRQL notation this path expression is written as:

{Person} foo:worksFor {Company} rdf:type {foo:ITCompany}
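How such a path of length two is matched against a set of triples can be sketched in plain Java. This is a naive nested-loop illustration, not the way Sesame's query engine works; the class PathMatch and the ex: resources are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Naive illustration of matching a two-step path; not Sesame's query engine.
final class PathMatch {
    // Finds every Person with: {Person} foo:worksFor {Company} rdf:type {foo:ITCompany}
    static List<String> itEmployees(List<String[]> triples) {
        List<String> persons = new ArrayList<>();
        for (String[] first : triples) {
            if (!first[1].equals("foo:worksFor")) continue;
            for (String[] second : triples) {
                // The object of the first step is the subject of the second step.
                boolean sharesCompany = second[0].equals(first[2]);
                if (sharesCompany && second[1].equals("rdf:type")
                        && second[2].equals("foo:ITCompany")) {
                    persons.add(first[0]);
                }
            }
        }
        return persons;
    }

    public static void main(String[] args) {
        List<String[]> triples = Arrays.asList(
                new String[] {"ex:alice", "foo:worksFor", "ex:acme"},
                new String[] {"ex:acme", "rdf:type", "foo:ITCompany"},
                new String[] {"ex:bob", "foo:worksFor", "ex:bakery"},
                new String[] {"ex:bakery", "rdf:type", "foo:Bakery"});
        System.out.println(itEmployees(triples));
    }
}
```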

Short cuts

{subj1} pred1 {obj1, obj2, obj3}

is equivalent to

{subj1} pred1 {obj1},
{subj1} pred1 {obj2},
{subj1} pred1 {obj3}

{subj1} pred1 {obj1};
        pred2 {obj2}

is equivalent to

{subj1} pred1 {obj1},
{subj1} pred2 {obj2}

Optional path expressions

Consider an RDF graph that contains information about people that have names, ages, and optionally e-mail addresses. This is a situation that is likely to be very common in RDF data. A logical query on this data is a query that yields all names, ages and, when available, e-mail addresses of people. The optional part of the path expression is enclosed in square brackets, e.g.:

{Person} person:name {Name};
         person:age  {Age};
         [person:email {EmailAddress}]
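The intended result of such a query (name and age are required, the e-mail address is returned only when available) can be sketched without Sesame. OptionalMatch is a hypothetical class, the sketch assumes single-valued properties, and it uses null for a missing e-mail address.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration of optional matching; assumes each property has at most one value.
final class OptionalMatch {
    // Returns one row {name, age, email} per person; email may be null.
    static List<String[]> rows(Map<String, Map<String, String>> people) {
        List<String[]> result = new ArrayList<>();
        for (Map<String, String> properties : people.values()) {
            String name = properties.get("person:name");
            String age = properties.get("person:age");
            if (name == null || age == null) {
                continue; // mandatory parts missing: the person does not match
            }
            // The optional part never excludes a person from the result.
            result.add(new String[] {name, age, properties.get("person:email")});
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> people = new LinkedHashMap<>();
        Map<String, String> alice = new HashMap<>();
        alice.put("person:name", "Alice");
        alice.put("person:age", "30");
        alice.put("person:email", "alice@example.org");
        Map<String, String> bob = new HashMap<>();
        bob.put("person:name", "Bob");
        bob.put("person:age", "41");
        people.put("ex:alice", alice);
        people.put("ex:bob", bob);
        for (String[] row : rows(people)) {
            System.out.println(row[0] + ", " + row[1] + ", " + row[2]);
        }
    }
}
```

Without the optional part, a person lacking an e-mail address would have to be left out of the result entirely.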

Select and construct queries

The SeRQL query language supports two querying concepts. The first one can be characterized as returning a table of values, or a set of variable-value bindings. The second one returns a true RDF graph, which can be a subgraph of the graph being queried, or a graph containing information that is derived from it. The first type of queries are called "select queries", the second type of queries are called "construct queries".

Both types of queries are very similar to SQL queries. In simplified form, a select query looks like:

SELECT DISTINCT
   variable list
FROM
   path list
WHERE
   logic conditions
USING NAMESPACE
   namespace definition

and construct queries:

CONSTRUCT
    path list
FROM
    path list
WHERE
   logic conditions
USING NAMESPACE
    namespace definition
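The clause order of a select query can be made concrete with a small string-building sketch. SerqlSelect is a hypothetical helper, not part of Sesame; treating the WHERE and USING NAMESPACE clauses as optional is an assumption, and the foo namespace is made up.

```java
// Hypothetical helper that assembles a select query in the clause order shown above.
final class SerqlSelect {
    // conditions and namespaces may be null, in which case the clause is omitted.
    static String build(boolean distinct, String variables, String paths,
                        String conditions, String namespaces) {
        StringBuilder query = new StringBuilder("SELECT ");
        if (distinct) {
            query.append("DISTINCT ");
        }
        query.append(variables).append(" FROM ").append(paths);
        if (conditions != null) {
            query.append(" WHERE ").append(conditions);
        }
        if (namespaces != null) {
            query.append(" USING NAMESPACE ").append(namespaces);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(true, "Person",
                "{Person} foo:worksFor {Company}",
                null,
                "foo = <http://example.org/foo#>"));
    }
}
```

The resulting string could then be handed to a query method of the Repository API, using the SeRQL query language constant.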

Examples

Description: Find all artefacts whose English title contains the string "night" and the museum where they are exhibited. The artefact must have been created by someone with first name "Rembrandt". The artefact and museum should both be represented by their titles.

SELECT DISTINCT
   label(ArtefactTitle), MuseumName
FROM
   {Artefact} arts:created_by {} arts:first_name {"Rembrandt"},
   {Artefact} arts:exhibited {} dc:title {MuseumName},
   {Artefact} dc:title {ArtefactTitle}
WHERE
   isLiteral(ArtefactTitle) AND
   lang(ArtefactTitle) = "en" AND
   label(ArtefactTitle) LIKE "*night*"
USING NAMESPACE
   dc   = <http://purl.org/dc/elements/1.0/>,
   arts = <http://www.arts.com/schema.rdf#>


Description: This query derives that an artist who has created a painting is a painter. The relation between the painter and the painting is modelled as art:hasPainted.

CONSTRUCT
    {Artist} rdf:type {art:Painter};
             art:hasPainted {Painting}
FROM
    {Artist} rdf:type {art:Artist};
             art:hasCreated {Painting} rdf:type {art:Painting}
USING NAMESPACE
    rdf = <http://www.w3.org/1999/02/22-rdf-syntax-ns#>,
    art = <http://www.arts.com/rdf-schema.rdf#>

More information

More information on the SeRQL query language, such as the list of all operators and functions and the grammar in BNF, can be found at http://www.openrdf.org/doc/users/ch06.html.

Main advantages

  • Sesame is fairly easy to install and use,
  • it can be used both as a jar library in an application and as a server,
  • a server installation makes it possible to switch to some other repository with minimal changes to the existing code, provided there is a proper intermediate layer between the application and the server,
  • it supports an HTTP interface,
  • it is open source,
  • it is written in Java, which makes it platform independent,
  • it supports the two most popular open source RDBMSs, PostgreSQL and MySQL; the latter is already used in the project,
  • its query language is simple to learn.

Possibility of use for MarcOnt and FOAFRealm

Sesame could be used in MarcOnt for storing ontologies and for its query language.

The most suitable configuration for MarcOnt (and for FOAFRealm as well) is a server deployed in Tomcat with MySQL as storage. Both prerequisites are already used in the project, which speaks in favour of this solution.

Possible use of the change tracking feature requires further study of the Ontology Middleware Module and of its capabilities in comparison with the MarcOnt requirements.

Considerations about using Sesame with FOAFRealm are given throughout the text above. The main question was whether Jena can be replaced by Sesame. As shown, this is possible and would probably be the better solution.

Sources

User guide for Sesame.

http://www.openrdf.org/doc/users/userguide.html

Sesame API Javadoc

http://www.openrdf.org/doc/api/sesame/

Corrib cluster project is supported by Enterprise Ireland under Grant No. ILP/05/203, Science Foundation Ireland under Grant No. SFI/02/CE1/I131.
Hosted at DERI, NUI Galway.