WebLab 1.2.2/WebLab Search

From WebLab Wiki
Revision as of 04:55, 19 October 2011

This page specifies the content of the ResultSet that any search service implementation should return.

Specification

To be compatible with other WebLab services and portlets, a ResultSet MUST follow these rules:

  • a ResultSet contains a PieceOfKnowledge
  • this PieceOfKnowledge contains all data about the results of the search
  • this PieceOfKnowledge is divided into two parts: information about the ResultSet itself and the description of the results

ResultSet Description

The ResultSet description gives the number of results, the query offset and a reference to the URI of the query that produced the ResultSet:

<rdf:Description rdf:about="weblab://searcher/ResultSet1799034111" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:wretrieval="http://weblab.ow2.org/core/1.2/ontology/retrieval#">
   <rdf:type rdf:resource="http://weblab.ow2.org/core/1.2/ontology/model#ResultSet"/>
   <wretrieval:isResultOf rdf:resource="weblab://myQuery/thisone"/>
   <wretrieval:hasNumberOfResults>42</wretrieval:hasNumberOfResults>
   <wretrieval:hasQueryOffset>0</wretrieval:hasQueryOffset>
</rdf:Description>

The XML above describes a ResultSet returned by a searcher:

  • the ResultSet URI is weblab://searcher/ResultSet1799034111
  • its number of results is 42
  • the query URI is weblab://myQuery/thisone
  • the query offset is 0
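These ResultSet-level values can be read back with the JDK's namespace-aware DOM parser. The following is a minimal sketch, not part of the WebLab API: the class ResultSetDescriptionReader and its Summary record are hypothetical names, and the input is assumed to be an rdf:RDF element that declares the rdf prefix (the fragment above does not declare it).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader for the ResultSet-level part of the PieceOfKnowledge (plain JDK, not WebLab). */
class ResultSetDescriptionReader {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String WRETRIEVAL_NS = "http://weblab.ow2.org/core/1.2/ontology/retrieval#";
    static final String RESULT_SET_TYPE = "http://weblab.ow2.org/core/1.2/ontology/model#ResultSet";

    /** The ResultSet URI plus the three values its description carries. */
    record Summary(String resultSetUri, String queryUri, int numberOfResults, int queryOffset) {}

    static Summary read(String rdfXml) {
        Document doc = parse(rdfXml);
        NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
        for (int i = 0; i < descriptions.getLength(); i++) {
            Element d = (Element) descriptions.item(i);
            Element type = first(d, RDF_NS, "type");
            if (type == null || !RESULT_SET_TYPE.equals(type.getAttributeNS(RDF_NS, "resource"))) {
                continue; // a Hit or a result, not the ResultSet itself
            }
            return new Summary(
                    d.getAttributeNS(RDF_NS, "about"),
                    required(d, "isResultOf").getAttributeNS(RDF_NS, "resource"),
                    Integer.parseInt(required(d, "hasNumberOfResults").getTextContent().trim()),
                    Integer.parseInt(required(d, "hasQueryOffset").getTextContent().trim()));
        }
        throw new IllegalArgumentException("no rdf:Description typed as ResultSet");
    }

    /** Returns the first wretrieval:local element inside d, or fails if it is absent. */
    static Element required(Element d, String local) {
        Element e = first(d, WRETRIEVAL_NS, local);
        if (e == null) {
            throw new IllegalArgumentException("missing wretrieval:" + local);
        }
        return e;
    }

    /** First descendant element with the given namespace and local name, or null. */
    static Element first(Element parent, String ns, String local) {
        NodeList n = parent.getElementsByTagNameNS(ns, local);
        return n.getLength() == 0 ? null : (Element) n.item(0);
    }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // required for the *NS lookups above
            return f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable RDF/XML", e);
        }
    }
}
```

Matching on the rdf:type value rather than on position lets the same reader work when the ResultSet description is mixed with Hits and results in one rdf:RDF element.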

Results Description

A ResultSet is composed of Hits. A Hit is an entity that describes a result and its metadata.

Metadata

Mandatory metadata are:

  • reference to the ResultSet
  • reference to the result entity

Optional metadata are:

  • short description of the result (a text snippet, for example)
  • score
  • rank
<rdf:Description rdf:about="weblab://searcher/hit#0" xmlns:wretrieval="http://weblab.ow2.org/core/1.2/ontology/retrieval#">
      <rdf:type rdf:resource="http://weblab.ow2.org/core/1.2/ontology/retrieval#Hit"/>
      <wretrieval:inResultSetHit rdf:resource="weblab://searcher/ResultSet1799034111"/>
      <wretrieval:isLinkedTo rdf:resource="weblab://mytentityA"/>
      <wretrieval:hasDescription>short description of the result</wretrieval:hasDescription>
      <wretrieval:hasScore>0.6665389</wretrieval:hasScore>
      <wretrieval:hasRank>1</wretrieval:hasRank>
</rdf:Description>

The XML example above defines a Hit with the following metadata:

  • the Hit belongs to the ResultSet whose URI is weblab://searcher/ResultSet1799034111
  • the Hit references the result whose URI is weblab://mytentityA
  • the description of the result is "short description of the result"
  • the score of this result, relative to the other results, is 0.6665389
  • the rank of this result in the ResultSet is 1
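The split between mandatory and optional metadata can be enforced when a search service builds its Hits. The following is a minimal sketch, assuming a hypothetical Hit record (not a WebLab class) that rejects missing mandatory references and emits only the optional properties that are set, in the shape of the example above. The rdf prefix is assumed to be declared by the enclosing rdf:RDF element.

```java
import java.util.Objects;

/** Hypothetical Hit value: two mandatory references plus optional description, score and rank. */
record Hit(String uri, String resultSetUri, String resultUri,
           String description, Double score, Integer rank) {

    static final String WRETRIEVAL_NS = "http://weblab.ow2.org/core/1.2/ontology/retrieval#";

    Hit {
        // Mandatory metadata: the Hit's own URI, its ResultSet and the result entity.
        Objects.requireNonNull(uri, "hit URI is mandatory");
        Objects.requireNonNull(resultSetUri, "ResultSet reference is mandatory");
        Objects.requireNonNull(resultUri, "result reference is mandatory");
    }

    /** Serializes the Hit as an rdf:Description; optional properties are omitted when null. */
    String toRdfXml() {
        StringBuilder sb = new StringBuilder();
        sb.append("<rdf:Description rdf:about=\"").append(escape(uri))
          .append("\" xmlns:wretrieval=\"").append(WRETRIEVAL_NS).append("\">\n");
        sb.append("  <rdf:type rdf:resource=\"").append(WRETRIEVAL_NS).append("Hit\"/>\n");
        sb.append("  <wretrieval:inResultSetHit rdf:resource=\"").append(escape(resultSetUri)).append("\"/>\n");
        sb.append("  <wretrieval:isLinkedTo rdf:resource=\"").append(escape(resultUri)).append("\"/>\n");
        if (description != null) {
            sb.append("  <wretrieval:hasDescription>").append(escape(description))
              .append("</wretrieval:hasDescription>\n");
        }
        if (score != null) {
            sb.append("  <wretrieval:hasScore>").append(score).append("</wretrieval:hasScore>\n");
        }
        if (rank != null) {
            sb.append("  <wretrieval:hasRank>").append(rank).append("</wretrieval:hasRank>\n");
        }
        return sb.append("</rdf:Description>").toString();
    }

    /** Minimal escaping for XML text and double-quoted attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Checking the mandatory references in the constructor means a Hit that could not be linked back to its ResultSet or result is never serialized in the first place.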

Result

A result can be any resource described in RDF whose URI is referenced by a Hit.
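Resolving a result therefore means following the Hit's wretrieval:isLinkedTo reference to the rdf:Description whose rdf:about carries that URI. The following is a minimal sketch using the JDK DOM API; HitResolver and its methods are hypothetical, and reading dc:title assumes the result is described with Dublin Core, as in the text-searcher result shown in this section.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: follows a Hit's wretrieval:isLinkedTo to the described result. */
class HitResolver {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String WRETRIEVAL_NS = "http://weblab.ow2.org/core/1.2/ontology/retrieval#";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // needed for namespace-based lookups
            return f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable RDF/XML", e);
        }
    }

    /** The rdf:Description whose rdf:about equals uri, if any. */
    static Optional<Element> describe(Document doc, String uri) {
        NodeList ds = doc.getElementsByTagNameNS(RDF_NS, "Description");
        for (int i = 0; i < ds.getLength(); i++) {
            Element d = (Element) ds.item(i);
            if (uri.equals(d.getAttributeNS(RDF_NS, "about"))) {
                return Optional.of(d);
            }
        }
        return Optional.empty();
    }

    /** Trimmed text of the first ns:local element inside d, if present. */
    static Optional<String> value(Element d, String ns, String local) {
        NodeList n = d.getElementsByTagNameNS(ns, local);
        return n.getLength() == 0 ? Optional.empty() : Optional.of(n.item(0).getTextContent().trim());
    }

    /** dc:title of the result the given Hit points to; empty if any link is missing. */
    static Optional<String> resultTitle(Document doc, String hitUri) {
        return describe(doc, hitUri)
                .map(hit -> hit.getElementsByTagNameNS(WRETRIEVAL_NS, "isLinkedTo"))
                .filter(links -> links.getLength() > 0)
                .map(links -> ((Element) links.item(0)).getAttributeNS(RDF_NS, "resource"))
                .flatMap(resultUri -> describe(doc, resultUri))
                .flatMap(result -> value(result, DC_NS, "title"));
    }
}
```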

The following XML shows a result from a text searcher (the WebLab Solr engine) that can be displayed in the result portlet:

<rdf:Description rdf:about="weblab://mytentityA">
  <dc:language>en</dc:language>
  <wprocessing:hasOriginalFileSize>4988</wprocessing:hasOriginalFileSize>
  <wprocessing:hasNativeContent rdf:resource="weblab://crawlerFolderContent/1248881938281/362"/>
  <wprocessing:isExposedAs>http://test/websites/NewsAfghan/www.myafghan.com/search20109.html</wprocessing:isExposedAs>
  <wprocessing:hasGatheringDate>2009-07-29T19:38:58+02:00</wprocessing:hasGatheringDate>
  <dct:modified>2009-04-24T21:19:01+02:00</dct:modified>
  <dct:extent>4988 bytes</dct:extent>
  <dc:title>My Afghan News - Afghanistan News Headlines for 2/29/2004</dc:title>
  <dc:source>http://www.myafghan.com/search2.asp?search=2/29/2004</dc:source>
  <wprocessing:hasOriginalFileName>search20109.html</wprocessing:hasOriginalFileName>
  <dc:format>text/html</dc:format>
</rdf:Description>

With:

Example

Creating a ResultSet in Java

A Java class provides examples of ResultSet creation: org.ow2.weblab.core.helpers.samples.ResultSetCreator, located in the helper-sample project. The following example creates a simple ResultSet with minimal information:

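		// Assumed imports: java.net.URI (new URI(...) throws java.net.URISyntaxException) plus the
		// WebLab classes ResourceFactory, ResultSet, Query and WRetrievalAnnotator and the sample
		// QueryCreator; see ResultSetCreator in the helper-sample project for their exact packages.
		// create the ResultSet object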
		ResultSet rset = ResourceFactory.createResource("sample", "minimalRSet", ResultSet.class);

		// create a "fake" query that should be the one that is issued before the creation of the ResultSet
		Query q = QueryCreator.createFullTextQuery();

		// adding the query that generated the result set in the result set resources list
		rset.getResource().add(q);

		WRetrievalAnnotator wra = new WRetrievalAnnotator(rset);
		// add annotation to link result set and query
		wra.writeResultOf(new URI(q.getUri()));
		// add offset/limit expected
		wra.writeExpectedOffset(0);
		wra.writeExpectedLimit(10);
		// add number of results
		wra.writeNumberOfResults(10);

		// create a "hit" for each result to present
		for (int i = 0; i < 10; i++) {
			// define the hit URI (identifier)
			URI hitURI = new URI("weblab://sample/hit"+i);
			wra.writeHit(hitURI);

			// add information on the hit
			wra.startInnerAnnotatorOn(hitURI);
			// add hit's rank in the result list
			wra.writeRank(i+1);
			// add a description to the hit (here this is fake)
			wra.writeDescription("ablabalbalbla l balb al lbalb labl balb alb lab lbalba lb laal balb lba lba ... qskldqlksdl");
			wra.endInnerAnnotator();
		}

XML sample

<resource xsi:type="ns3:ResultSet" uri="weblab://searcher/ResultSet1799034111" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dct="http://purl.org/dc/terms/" xmlns:ns3="http://weblab.ow2.org/core/1.2/ontology/model#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <pok uri="weblab://geographicSearch/Pok1799034111">
        <data>
            <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
                <rdf:Description rdf:about="weblab://searcher/ResultSet1799034111" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:wretrieval="http://weblab.ow2.org/core/1.2/ontology/retrieval#" >
                    <rdf:type rdf:resource="http://weblab.ow2.org/core/1.2/ontology/model#ResultSet"/>
                    <wretrieval:isResultOf rdf:resource="weblab://myQuery/thisone"/>
                    <wretrieval:hasNumberOfResults>2</wretrieval:hasNumberOfResults>
                    <wretrieval:hasQueryOffset>0</wretrieval:hasQueryOffset>
                </rdf:Description>
 
              <rdf:Description rdf:about="weblab://searcher/hit#0" xmlns:wretrieval="http://weblab.ow2.org/core/1.2/ontology/retrieval#">
                    <rdf:type rdf:resource="http://weblab.ow2.org/core/1.2/ontology/retrieval#Hit"/>
                    <wretrieval:inResultSetHit rdf:resource="weblab://searcher/ResultSet1799034111"/>
                    <wretrieval:isLinkedTo rdf:resource="weblab://mytentityA"/>
                    <wretrieval:hasDescription>short description of the result</wretrieval:hasDescription>
                    <wretrieval:hasScore>0.6665389</wretrieval:hasScore>
                    <wretrieval:hasRank>1</wretrieval:hasRank>
              </rdf:Description>
 
              <rdf:Description rdf:about="weblab://searcher/hit#1" xmlns:wretrieval="http://weblab.ow2.org/core/1.2/ontology/retrieval#">
                    <rdf:type rdf:resource="http://weblab.ow2.org/core/1.2/ontology/retrieval#Hit"/>
                    <wretrieval:inResultSetHit rdf:resource="weblab://searcher/ResultSet1799034111"/>
                    <wretrieval:isLinkedTo rdf:resource="weblab://mytentityB"/>
                    <wretrieval:hasDescription>short description of another result</wretrieval:hasDescription>
                    <wretrieval:hasScore>0.7895444</wretrieval:hasScore>
                    <wretrieval:hasRank>2</wretrieval:hasRank>
              </rdf:Description>
 
              <rdf:Description rdf:about="weblab://mytentityA" xmlns:wprocessing="http://weblab.ow2.org/core/1.2/ontology/processing#">
                    <dc:language>en</dc:language>
                    <wprocessing:hasOriginalFileSize>4988</wprocessing:hasOriginalFileSize>
                    <wprocessing:hasNativeContent rdf:resource="weblab://crawlerFolderContent/1248881938281/362"/>
                    <wprocessing:isExposedAs>http://test/websites/NewsAfghan/www.myafghan.com/search20109.html</wprocessing:isExposedAs>
                    <wprocessing:hasGatheringDate>2009-07-29T19:38:58+02:00</wprocessing:hasGatheringDate>
                    <dct:modified>2009-04-24T21:19:01+02:00</dct:modified>
                    <dct:extent>4988 bytes</dct:extent>
                    <dc:title>My Afghan News - Afghanistan News Headlines for 2/29/2004</dc:title>
                    <dc:source>http://www.myafghan.com/search2.asp?search=2/29/2004</dc:source>
                    <wprocessing:hasOriginalFileName>search20109.html</wprocessing:hasOriginalFileName>
                    <dc:format>text/html</dc:format>
              </rdf:Description>
 
              <rdf:Description rdf:about="weblab://mytentityB" xmlns:wprocessing="http://weblab.ow2.org/core/1.2/ontology/processing#">
                    <dc:language>fr</dc:language>
                    <wprocessing:hasOriginalFileSize>5180</wprocessing:hasOriginalFileSize>
                    <wprocessing:hasNativeContent rdf:resource="weblab://crawlerFolderContent/1248886578271/363"/>
                    <wprocessing:isExposedAs>http://test/websites/NewsAfghan/www.myafghan.com/search20546.html</wprocessing:isExposedAs>
                    <wprocessing:hasGatheringDate>2009-07-30T22:21:57+02:00</wprocessing:hasGatheringDate>
                    <dct:modified>2009-07-23T20:41:55+02:00</dct:modified>
                    <dct:extent>5180 bytes</dct:extent>
                    <dc:title>My Afghan News - Afghanistan News Headlines for 2/30/2004</dc:title>
                    <dc:source>http://www.myafghan.com/search2.asp?search=2/30/2004</dc:source>
                    <wprocessing:hasOriginalFileName>search20546.html</wprocessing:hasOriginalFileName>
                    <dc:format>text/html</dc:format>
              </rdf:Description>
 
 
            </rdf:RDF>
        </data>
    </pok>
</resource>
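The rules stated on this page can be checked mechanically on a document like the sample above. The following is a minimal sketch using only the JDK. ResultSetChecker is a hypothetical name. It assumes the unprefixed pok element seen in the sample and reads only the first rdf:type of each description. It checks the structure and the two mandatory Hit references, but not the optional metadata.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical conformity check for a serialized ResultSet; returns a list of problems. */
class ResultSetChecker {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String WRETRIEVAL_NS = "http://weblab.ow2.org/core/1.2/ontology/retrieval#";
    static final String RESULT_SET_TYPE = "http://weblab.ow2.org/core/1.2/ontology/model#ResultSet";
    static final String HIT_TYPE = WRETRIEVAL_NS + "Hit";

    static List<String> check(String xml) {
        List<String> problems = new ArrayList<>();
        Document doc;
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            doc = f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            problems.add("not well-formed XML: " + e.getMessage());
            return problems;
        }
        // The PieceOfKnowledge is the unprefixed <pok> element, as in the sample above.
        if (doc.getElementsByTagName("pok").getLength() == 0) {
            problems.add("no pok element");
        }
        // Collect the ResultSet URI and the Hits first, since their order is not fixed.
        String resultSetUri = null;
        List<Element> hits = new ArrayList<>();
        NodeList ds = doc.getElementsByTagNameNS(RDF_NS, "Description");
        for (int i = 0; i < ds.getLength(); i++) {
            Element d = (Element) ds.item(i);
            String type = attr(first(d, RDF_NS, "type"), RDF_NS, "resource");
            if (RESULT_SET_TYPE.equals(type)) {
                if (resultSetUri != null) {
                    problems.add("more than one ResultSet description");
                }
                resultSetUri = d.getAttributeNS(RDF_NS, "about");
            } else if (HIT_TYPE.equals(type)) {
                hits.add(d);
            }
        }
        if (resultSetUri == null) {
            problems.add("no ResultSet description");
        }
        // Each Hit must reference its ResultSet and a result entity.
        for (Element hit : hits) {
            String hitUri = hit.getAttributeNS(RDF_NS, "about");
            String inSet = attr(first(hit, WRETRIEVAL_NS, "inResultSetHit"), RDF_NS, "resource");
            String linked = attr(first(hit, WRETRIEVAL_NS, "isLinkedTo"), RDF_NS, "resource");
            if (inSet == null || !inSet.equals(resultSetUri)) {
                problems.add(hitUri + ": missing or wrong inResultSetHit");
            }
            if (linked == null || linked.isEmpty()) {
                problems.add(hitUri + ": missing isLinkedTo");
            }
        }
        return problems;
    }

    static Element first(Element parent, String ns, String local) {
        NodeList n = parent.getElementsByTagNameNS(ns, local);
        return n.getLength() == 0 ? null : (Element) n.item(0);
    }

    static String attr(Element e, String ns, String local) {
        return e == null ? null : e.getAttributeNS(ns, local);
    }
}
```

Collecting problems in a list rather than failing on the first one lets a service report every non-conforming Hit in a single pass.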