Chapter 13. Case study 2: SIREn
Contributed by RENAUD DELBRU, NICKOLAI TOUPIKOV, MICHELLE CATASTA, ROBERT FULLER, and GIOVANNI TUMMARELLO
In this case study, the crew from the Digital Enterprise Research Institute (DERI; http://www.deri.ie) describes how they created the Semantic Information Retrieval Engine (SIREn) using Lucene. SIREn (which is open source and available at http://siren.sindice.com) searches the semantic web, also known as Web 3.0 or the “Web of Data,” which is a quickly growing collection of semistructured documents available from web pages adopting the Resource Description Framework (RDF)[1] standard. With RDF, pages publicly available on the web encode structural relationships between arbitrary entities and objects via predicates. Although the standard has been defined for some time, it’s only recently that websites have begun adopting it in earnest.
A publicly accessible demonstration of SIREn is running at http://sindice.com. It covers more than 50 million crawled structured documents, which together yield over 1 billion entity, predicate, and object triples. SIREn is a powerful alternative to the more common RDF triplestores, which are typically backed by relational databases and are therefore often limited when it comes to full-text search.
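To make the entity, predicate, and object model concrete, here is a minimal sketch in Python of how a handful of RDF-style statements might be represented and queried in memory. It is only an illustration of the data model, not of how SIREn or Lucene stores anything: the `Triple` type, the `objects_of` helper, and the `example.org` URIs are all hypothetical, and only the FOAF property names come from a real vocabulary.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: an entity linked to an object via a predicate."""
    entity: str
    predicate: str
    obj: str


# Hypothetical sample data; the example.org URIs are made up for illustration.
ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

triples = [
    Triple(ALICE, FOAF_NAME, "Alice"),
    Triple(ALICE, FOAF_KNOWS, BOB),
    Triple(BOB, FOAF_NAME, "Bob"),
]


def objects_of(data: List[Triple], entity: str, predicate: str) -> List[str]:
    """Return every object attached to `entity` through `predicate`."""
    return [t.obj for t in data if t.entity == entity and t.predicate == predicate]
```

Calling `objects_of(triples, ALICE, FOAF_KNOWS)` walks the list and returns the objects that Alice is linked to by the "knows" predicate. A linear scan like this is fine for three statements but not for a billion, which is the scale at which an inverted index becomes attractive.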