S3B/SQE/Algorithm
From Corrib Clan Wiki
Description of the Semantic Query Expansion algorithm in JeromeDL
- The user writes a query.
- The search is processed by the Lucene full-text indexer.
- The results are first sorted using the TF/IDF method (term frequency/inverse document frequency):
- each phrase found is given some points; the more often a word appears in the document, the more points it gets
- however, if a word is very common across the indexed documents, it loses points
- the ranking is ordered by the number of points
- Weights are associated with the results according to their semantic similarity to the query. This is where Semantic Query Expansion is actually called.
- Words are given meanings, which are taken from one of the "controlled vocabularies", e.g. authority files (users, authors, etc.), keywords, categories, data types. Note that new modules can be implemented. As a result we get a set of Semantic Objects in place of Strings: String -> SemanticObject[]
- The Semantic Objects from the previous step are processed by a weight function. Weights are provided by three contexts:
- long-term context - i.e. the user's interests. They are loaded in three steps:
- Loading a list of topics of interest from the user's profile.
- Checking the user's bookmarks, which means folders, their descriptions and their contents.
- Repeating the first two steps using the profiles of the user's friends (the level of "friendship" is determined in FOAFRealm) - this is called the "extrapolated user's profile".
- mid-term context - i.e. last searches and recently browsed resources. Weights are additionally adjusted according to how long ago those searches and visits took place.
- short-term context - i.e. last queries and the query itself.
- The final weight is computed as the sum of the weights from each context.
- If the result (the similarity score) does not satisfy some predefined constraints (e.g. a similarity threshold), the search is expanded, i.e. invoked recursively. New concepts, such as synonyms, are loaded from the RDF base and taken into consideration. The algorithm then goes back to pt. 4.2, the weight function step.
- Otherwise, if the results are satisfactory, the search process is finished.
- All actions taken by the algorithm are saved in a log which can be read by the user. The user can manually change some steps taken by the algorithm to try to refine the query results, e.g. by changing weights and then making the search process start again.
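
The TF/IDF ranking idea from the list above can be illustrated with a minimal Python sketch. It uses the simplified textbook formula (relative term frequency multiplied by the logarithm of the inverse document frequency), not Lucene's actual scoring function, and the names `tf_idf_scores` and `rank` are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms, documents):
    """Score each document (a list of tokens) against the query terms.

    Simplified textbook TF/IDF; this is an illustration, not Lucene's formula.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term] == 0:
                continue  # term absent from this document: no points
            tf = counts[term] / len(tokens)            # more occurrences, more points
            idf = math.log(n_docs / doc_freq[term])    # common terms lose points
            score += tf * idf
        scores.append(score)
    return scores


def rank(query_terms, documents):
    """Return document indices ordered by descending score."""
    scores = tf_idf_scores(query_terms, documents)
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
```

With this formula, a term that occurs in every document contributes nothing to the score, which is the "very common words lose points" behaviour described in the list.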
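
The weighting and expansion steps can be sketched in the same way. This is only an illustration under several assumptions: concepts are plain strings, each context is a dictionary mapping a concept to a weight, the similarity score is taken to be the sum of the concept weights, and related concepts come from an in-memory dictionary instead of the RDF base. None of the names below (`Contexts`, `concept_weight`, `expand_query`) come from JeromeDL.

```python
from dataclasses import dataclass, field


@dataclass
class Contexts:
    """Hypothetical container for the three weight sources."""
    long_term: dict = field(default_factory=dict)   # user's (extrapolated) interests
    mid_term: dict = field(default_factory=dict)    # recent searches and browsing
    short_term: dict = field(default_factory=dict)  # last queries and the query itself


def concept_weight(concept, ctx):
    """Final weight: the sum of the weights from each context."""
    return (ctx.long_term.get(concept, 0.0)
            + ctx.mid_term.get(concept, 0.0)
            + ctx.short_term.get(concept, 0.0))


def expand_query(concepts, ctx, related, threshold, max_depth=3, log=None):
    """Recursively expand the query until the similarity reaches the threshold.

    Assumption: similarity is the sum of the concept weights. The recursion
    also stops when max_depth is exhausted or no new concepts are available.
    """
    log = [] if log is None else log
    concepts = list(dict.fromkeys(concepts))  # drop duplicates, keep order
    similarity = sum(concept_weight(c, ctx) for c in concepts)
    log.append((tuple(concepts), similarity))  # every step is recorded

    if similarity >= threshold or max_depth == 0:
        return concepts, similarity, log

    # Load new concepts (e.g. synonyms) for the current ones.
    new = [r for c in concepts for r in related.get(c, []) if r not in concepts]
    if not new:
        return concepts, similarity, log
    return expand_query(concepts + new, ctx, related, threshold,
                        max_depth - 1, log)
```

The returned log plays the role of the user-readable record of the algorithm's actions: a caller could inspect it, adjust the weights in `Contexts`, and run the expansion again.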



