MarcOnt/Projects/WMap/Documents/UseOfSesame
From Corrib Clan Wiki
Introduction
Sesame is an open source Java framework for storing, querying and reasoning with RDF and RDF Schema. It can be used as a database for RDF and RDF Schema, or as a Java library for applications that need to work with RDF internally. Sesame provides the necessary tools to parse, interpret, query and store all this information, whether embedded in an application, in a separate database, or on a remote server.
Repositories and inferencing
A central concept in the Sesame framework is the repository. A repository is a storage container for RDF. This can simply mean a Java object (or set of Java objects) in memory, or it can mean a relational database. Whichever storage method is chosen, it is important to realize that almost every operation in Sesame happens with respect to a repository: when you add RDF data, you add it to a repository, and when you run a query, you query a particular repository.
Sesame, as mentioned, supports RDF Schema inferencing. This means that given a set of RDF and/or RDF Schema, Sesame can find the implicit information in the data. Sesame supports this by simply adding all implicit information to the repository as well when data is being added.
It is important to realize that inferencing in Sesame is associated with the type of repository that you use. Sesame supports several different types of repositories. Some of these support inferencing, others do not.
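The approach described above, where implicit statements are added to the repository at the moment data is added, can be sketched in a few lines of Java. This is a toy illustration and not the Sesame API: the class TinyInferencingStore, its methods, and the ex: names used with it are hypothetical, and only two RDF Schema rules (subclass transitivity and type inheritance) are applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy store, NOT the Sesame API: it materialises implied
// statements every time a statement is added.
final class TinyInferencingStore {
    static final String TYPE = "rdf:type";
    static final String SUBCLASS = "rdfs:subClassOf";

    // Each statement is a [subject, predicate, object] list.
    private final Set<List<String>> statements = new LinkedHashSet<>();

    void add(String subj, String pred, String obj) {
        statements.add(Arrays.asList(subj, pred, obj));
        // Re-apply the rules until no new statement can be derived.
        boolean changed = true;
        while (changed) {
            changed = false;
            List<List<String>> snapshot = new ArrayList<>(statements);
            for (List<String> a : snapshot) {
                boolean typeOrSubclass = a.get(1).equals(TYPE) || a.get(1).equals(SUBCLASS);
                if (!typeOrSubclass) {
                    continue;
                }
                for (List<String> b : snapshot) {
                    // a = (x, rdf:type | rdfs:subClassOf, C) and b = (C, rdfs:subClassOf, D)
                    // together imply (x, same predicate, D).
                    if (b.get(1).equals(SUBCLASS) && a.get(2).equals(b.get(0))) {
                        changed |= statements.add(Arrays.asList(a.get(0), a.get(1), b.get(2)));
                    }
                }
            }
        }
    }

    boolean contains(String subj, String pred, String obj) {
        return statements.contains(Arrays.asList(subj, pred, obj));
    }
}
```

After adding ex:ITCompany as a subclass of ex:Company and an instance of ex:ITCompany, the store also answers true for the instance being of type ex:Company, although that statement was never added explicitly. A repository type without inferencing would simply skip the rule loop.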
Installing Sesame
Library installation
To use Sesame as a library in a Java application, one needs the Sesame jar file sesame.jar. This file can be found in the lib/ directory of the binary download. Including this jar file in the classpath makes the functionality of Sesame available to a Java application.
Server installation
To use Sesame as a standalone server, one needs the Sesame war file sesame.war. This file can be found in the lib/ directory of the binary download. Another requirement is a Java servlet container for running the Sesame servlets, such as Tomcat 5. Deployment is no different from deploying any other Java web application. Once the server is deployed, the appropriate repositories must be configured, which is done with a GUI application that is available for download.
Storage
Sesame supports several ways of storing repositories: in an RDBMS, in memory, or in files. The most likely choice is an RDBMS. Currently, Sesame is able to use PostgreSQL, MySQL, Oracle (9i or newer) and SQL Server. Sesame has been tested with versions of MySQL starting from 3.23.47. Where available, Sesame uses the character set features that were introduced in MySQL 4.1 to properly handle non-ASCII characters. Sesame automatically detects whether these features are available and falls back to using BLOBs when an older version of MySQL is used. In that case, Sesame will not be able to properly handle non-ASCII characters in literals and namespace names.
Server administration
All administration tasks can be done with a simple Java application, Configure Sesame!, or by manually editing the XML configuration file, which is not recommended. The tool can load and store the configuration from and to a file or a running server, manage users, and configure repositories.
Change tracking
Another tool developed in On-To-Knowledge is OMM, the Ontology Middleware Module, which was developed by OntoText. OMM is an extension of Sesame that adds features such as change tracking and improved security. More details on this module are available on the OMM project page, http://www.ontotext.com/omm/.
The SeRQL query language
SeRQL ("Sesame RDF Query Language", pronounced "circle") is a new RDF/RDFS query language that is currently being developed by Aduna as part of Sesame. It combines the best features of other (query) languages (RQL, RDQL, N-Triples, N3) and adds some of its own.
Some of SeRQL's most important features are:
- graph transformation,
- RDF Schema support,
- XML Schema datatype support,
- expressive path expression syntax,
- optional path matching.
Variables
Variables are identified by names. These names must start with a letter or an underscore ('_') and can be followed by zero or more letters, numbers, underscores, dashes ('-') or dots ('.').
SeRQL keywords are not allowed to be used as variable names. Currently, the following keywords are used or reserved for future use in SeRQL: select, construct, from, where, using, namespace, true, false, not, and, or, like, label, lang, datatype, null, isresource, isliteral, sort, in, union, intersect, minus, exists, forall, distinct, limit, offset.
Keywords in SeRQL are all case-insensitive; variable names, in contrast, are case-sensitive.
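The naming rules above can be checked mechanically. The following is a minimal sketch, not part of Sesame: the class SerqlVariableNames and its method are hypothetical, and "letter" and "number" are assumed to mean ASCII letters and digits, which the text does not specify.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Sesame.
final class SerqlVariableNames {
    // A letter or underscore, followed by letters, digits, underscores, dashes or dots.
    // Assumption: only ASCII letters and digits are allowed.
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_.\\-]*");

    // Keywords listed in the text above (used or reserved).
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "select", "construct", "from", "where", "using", "namespace",
            "true", "false", "not", "and", "or", "like", "label", "lang",
            "datatype", "null", "isresource", "isliteral", "sort", "in",
            "union", "intersect", "minus", "exists", "forall", "distinct",
            "limit", "offset"));

    static boolean isValid(String name) {
        // Keywords are case-insensitive, so compare in lower case.
        return NAME.matcher(name).matches()
                && !KEYWORDS.contains(name.toLowerCase(Locale.ROOT));
    }
}
```

With this check, Person and _x.y-1 are accepted, while 1abc (starts with a digit) and Select (a keyword in any casing) are rejected. A name such as selected is fine because only whole keywords are reserved.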
URIs
There are two ways to write down URIs in SeRQL: either as full URIs or as abbreviated URIs. Full URIs must be surrounded with "<" and ">". Examples of this are:
- <http://www.openrdf.org/index.html>
- <mailto:sesame@openrdf.org>
- <file:///C:\rdffiles\test.rdf>
As URIs tend to be long strings with the first part being shared by several of them (i.e. the namespace), SeRQL allows one to use abbreviated URIs (or Qnames) by defining (short) names for these namespaces which are called "prefixes". A Qname always starts with one of the defined prefixes and a colon (":"). After this colon, the part of the URI that is not part of the namespace follows. The first part, consisting of the prefix and the colon, is replaced by the full namespace by the query engine. An example Qname is:
- sesame:index.html
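The replacement of a prefix by its namespace can be illustrated with a small sketch. The class QnameExpander is hypothetical and is not how the Sesame query engine is implemented. The namespace bound to the prefix sesame in the usage note is also an assumption, chosen so that the Qname above expands to the first full URI example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Sesame.
final class QnameExpander {
    private final Map<String, String> namespaces = new HashMap<>();

    void declare(String prefix, String namespace) {
        namespaces.put(prefix, namespace);
    }

    // Replaces "prefix:" with the declared namespace and wraps the result in < and >.
    String expand(String qname) {
        int colon = qname.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a Qname: " + qname);
        }
        String namespace = namespaces.get(qname.substring(0, colon));
        if (namespace == null) {
            throw new IllegalArgumentException("Undeclared prefix in: " + qname);
        }
        return "<" + namespace + qname.substring(colon + 1) + ">";
    }
}
```

Assuming the prefix sesame has been declared for the namespace http://www.openrdf.org/, expand("sesame:index.html") yields the full URI <http://www.openrdf.org/index.html>.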
Literals
RDF literals consist of three parts: a label, a language tag, and a datatype. The language tag and the datatype are optional and at most one of these two can accompany a label (a literal can not have both a language tag and a datatype). The notation of literals in SeRQL has been modelled after their notation in N-Triples; literals start with the label, which is surrounded by double quotes, optionally followed by a language tag with a "@" prefix or by a datatype URI with a "^^" prefix. Example literals are:
- "foo"
- "foo"@en
- "<foo/>"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>
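The literal notation can be sketched as a small formatting routine that also enforces the rule that a literal has at most one of language tag and datatype. The class SerqlLiteral is hypothetical, and the escaping of backslashes and double quotes inside the label is an assumption borrowed from N-Triples.

```java
// Hypothetical helper, not part of Sesame.
final class SerqlLiteral {
    // Pass null for lang or datatypeUri when that part is absent.
    static String format(String label, String lang, String datatypeUri) {
        if (lang != null && datatypeUri != null) {
            throw new IllegalArgumentException(
                    "A literal cannot have both a language tag and a datatype");
        }
        // Assumption: N-Triples style escaping of backslash and double quote.
        String escaped = label.replace("\\", "\\\\").replace("\"", "\\\"");
        StringBuilder out = new StringBuilder();
        out.append('"').append(escaped).append('"');
        if (lang != null) {
            out.append('@').append(lang);
        }
        if (datatypeUri != null) {
            out.append("^^<").append(datatypeUri).append('>');
        }
        return out.toString();
    }
}
```

Calling format("foo", "en", null) produces "foo"@en, and passing a datatype URI instead of a language tag produces the "^^" form shown in the third example above.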
Path expressions
Path expressions are one of the most prominent parts of SeRQL. Path expressions are expressions that match specific paths through an RDF graph. Most current RDF query languages allow you to define path expressions of length 1, which can be used to find (combinations of) triples in an RDF graph. SeRQL, like RQL, allows you to define path expressions of arbitrary length.
Imagine that we want to query an RDF graph for persons who work for companies that are IT companies. Querying for this information comes down to finding a pattern in the RDF graph: a person connected to a company by foo:worksFor, where that company has rdf:type foo:ITCompany. In SeRQL, this path expression is written as:
{Person} foo:worksFor {Company} rdf:type {foo:ITCompany}
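To make the idea of a path of length 2 concrete, the following sketch matches the path expression above against a plain list of triples by chaining two triple lookups on the shared Company node. It only illustrates the matching idea and is not how Sesame evaluates queries. The class PathMatchSketch and the array representation of triples are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of Sesame.
final class PathMatchSketch {
    // Each triple is a {subject, predicate, object} array.
    // Returns {Person, Company} bindings for:
    //   {Person} foo:worksFor {Company} rdf:type {foo:ITCompany}
    static List<String[]> itEmployees(List<String[]> triples) {
        List<String[]> bindings = new ArrayList<>();
        for (String[] first : triples) {
            if (!first[1].equals("foo:worksFor")) {
                continue;
            }
            for (String[] second : triples) {
                // The object of the first triple must be the subject of the second.
                boolean sameCompany = second[0].equals(first[2]);
                boolean isItCompany = second[1].equals("rdf:type")
                        && second[2].equals("foo:ITCompany");
                if (sameCompany && isItCompany) {
                    bindings.add(new String[] {first[0], first[2]});
                }
            }
        }
        return bindings;
    }
}
```

A person who works for a company that is not typed as foo:ITCompany produces no binding, because the second step of the path fails to match.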
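Short cuts
SeRQL offers two short cuts for path expressions that share a subject. The first is a comma-separated list of objects:
{subj1} pred1 {obj1, obj2, obj3}
is equivalent to
{subj1} pred1 {obj1},
{subj1} pred1 {obj2},
{subj1} pred1 {obj3}
The second is a semicolon, which reuses the subject for a further predicate:
{subj1} pred1 {obj1};
pred2 {obj2}
is equivalent to
{subj1} pred1 {obj1},
{subj1} pred2 {obj2}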
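The equivalences above amount to a purely textual expansion, which the following sketch spells out. The class PathShortcuts and its methods are hypothetical and are not a SeRQL parser: they take the subject, predicates and objects as separate strings and emit the long form.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Sesame.
final class PathShortcuts {
    // Expands {subj} pred {obj1, obj2, ...} into one path expression per object.
    static String expandObjects(String subj, String pred, List<String> objs) {
        List<String> paths = new ArrayList<>();
        for (String obj : objs) {
            paths.add("{" + subj + "} " + pred + " {" + obj + "}");
        }
        return String.join(",\n", paths);
    }

    // Expands {subj} pred1 {obj1}; pred2 {obj2}; ... where each array is {pred, obj}.
    static String expandBranches(String subj, List<String[]> predObjPairs) {
        List<String> paths = new ArrayList<>();
        for (String[] pair : predObjPairs) {
            paths.add("{" + subj + "} " + pair[0] + " {" + pair[1] + "}");
        }
        return String.join(",\n", paths);
    }
}
```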
Optional path expressions
Consider an RDF graph that contains information about people that have names, ages, and optionally e-mail addresses. This is a situation that is likely to be very common in RDF data. A logical query on this data is a query that yields all names, ages and, when available, e-mail addresses of people. The optional part of a path expression is enclosed in square brackets, e.g.:
{Person} person:name {Name};
person:age {Age};
[person:email {EmailAddress}]
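The effect of an optional part can be sketched as follows: name and age must be present for a person to appear in the result, while a missing e-mail address leaves that variable unbound instead of dropping the row. The class OptionalMatchSketch is hypothetical and simplified. It takes only the first value per property, whereas a real query engine would return one row per combination of values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of Sesame.
final class OptionalMatchSketch {
    // Returns the first object for (subj, pred), or null if there is none.
    private static String firstObject(List<String[]> triples, String subj, String pred) {
        for (String[] t : triples) {
            if (t[0].equals(subj) && t[1].equals(pred)) {
                return t[2];
            }
        }
        return null;
    }

    // Each triple is a {subject, predicate, object} array.
    // Each result row is {Name, Age, EmailAddress}; EmailAddress may be null.
    static List<String[]> namesAgesEmails(List<String[]> triples) {
        List<String[]> rows = new ArrayList<>();
        for (String[] t : triples) {
            if (!t[1].equals("person:name")) {
                continue;
            }
            String age = firstObject(triples, t[0], "person:age");
            if (age == null) {
                continue; // mandatory part of the path did not match
            }
            // Optional part: an absent e-mail address does not remove the row.
            String email = firstObject(triples, t[0], "person:email");
            rows.add(new String[] {t[2], age, email});
        }
        return rows;
    }
}
```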
Select and construct queries
The SeRQL query language supports two querying concepts. The first one can be characterized as returning a table of values, or a set of variable-value bindings. The second one returns a true RDF graph, which can be a subgraph of the graph being queried, or a graph containing information that is derived from it. The first type of queries are called "select queries", the second type of queries are called "construct queries".
Both types of queries are very similar to SQL queries. In simplified form, a select query looks like:
SELECT DISTINCT
variable list
FROM
path list
WHERE
logic conditions
USING NAMESPACE
namespace definition
and construct queries:
CONSTRUCT
path list
FROM
path list
WHERE
logic conditions
USING NAMESPACE
namespace definition
Examples
Description: Find all artefacts whose English title contains the string "night" and the museum where they are exhibited. The artefact must have been created by someone with first name "Rembrandt". The artefact and museum should both be represented by their titles.
SELECT DISTINCT
label(ArtefactTitle), MuseumName
FROM
{Artefact} arts:created_by {} arts:first_name {"Rembrandt"},
{Artefact} arts:exhibited {} dc:title {MuseumName},
{Artefact} dc:title {ArtefactTitle}
WHERE
isLiteral(ArtefactTitle) AND
lang(ArtefactTitle) = "en" AND
label(ArtefactTitle) LIKE "*night*"
USING NAMESPACE
dc = <http://purl.org/dc/elements/1.0/>,
arts = <http://www.arts.com/schema.rdf#>
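The LIKE condition in this query uses "*" as a wildcard. A minimal sketch of such a match is given below, under two assumptions that the text does not confirm: "*" stands for any sequence of characters (including none), and the comparison is case-sensitive. The class LikeSketch is hypothetical and not part of Sesame.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Sesame.
final class LikeSketch {
    // Assumptions: '*' matches zero or more characters; matching is case-sensitive.
    static boolean like(String label, String pattern) {
        // Split on '*' and keep empty leading and trailing parts.
        String[] parts = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*"); // one wildcard between consecutive parts
            }
            regex.append(Pattern.quote(parts[i])); // everything else is literal text
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(label).matches();
    }
}
```

Under these assumptions, like("A night scene", "*night*") is true, while a title that contains only the capitalised word "Night" does not match.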
Description: Derive that an artist who has created a painting is a painter. The relation between the painter and the painting is modelled as art:hasPainted.
CONSTRUCT
{Artist} rdf:type {art:Painter};
art:hasPainted {Painting}
FROM
{Artist} rdf:type {art:Artist};
art:hasCreated {Painting} rdf:type {art:Painting}
USING NAMESPACE
rdf = <http://www.w3.org/1999/02/22-rdf-syntax-ns#>,
art = <http://www.arts.com/rdf-schema.rdf#>
More information
More information on the SeRQL query language, such as a list of all operators and functions and the grammar in BNF, can be found at http://www.openrdf.org/doc/users/ch06.html.
Main advantages
- Sesame is fairly easy to install and use,
- it can be used both as a jar library in an application and as a server,
- with a server installation, Sesame can be replaced by another repository with minimal changes to the existing code, provided there is a proper intermediate layer between the application and the server,
- it supports an HTTP interface,
- it is open source,
- it is written in Java, which makes it platform independent,
- it supports the two most popular open source RDBMSs, PostgreSQL and MySQL; the latter is already used in the project,
- its query language is simple to learn.
Possibility of use for MarcOnt
Sesame could be used in MarcOnt for storing ontologies and for querying them with its query language.
The configuration most likely to fit MarcOnt is a server deployed in Tomcat with MySQL as storage. Both prerequisites are already used in the project, which favours this solution.
Using the change tracking feature requires further study of the Ontology Middleware Module and of how its capabilities compare with MarcOnt's requirements.



