Difference between revisions of "WebLab 1.2.2/WebLab Search"

From WebLab Wiki

Revision as of 05:05, 12 March 2012

This page specifies the content of the ResultSet that any search service implementation must return.

Specification

To be compatible with other WebLab services and portlets, a ResultSet MUST follow these rules:

  • a ResultSet contains a PieceOfKnowledge
  • this PieceOfKnowledge contains all data about the results of the search
  • this PieceOfKnowledge is divided into two parts: information about the ResultSet and a description of the results

ResultSet Description

The ResultSet description holds the number of results, the offset of the query, and a reference to the query URI:

<rdf:Description rdf:about="weblab://searcher/ResultSet1799034111" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:wretrieval="http://weblab.ow2.org/core/1.2/ontology/retrieval#">
   <rdf:type rdf:resource="http://weblab.ow2.org/core/1.2/ontology/model#ResultSet"/>
   <wretrieval:isResultOf rdf:resource="weblab://myQuery/thisone"/>
   <wretrieval:hasNumberOfResults>42</wretrieval:hasNumberOfResults>
   <wretrieval:hasQueryOffset>0</wretrieval:hasQueryOffset>
</rdf:Description>

The previous XML describes a ResultSet from a searcher:

  • the ResultSet URI is weblab://searcher/ResultSet1799034111
  • its number of results is 42
  • the query URI is weblab://myQuery/thisone
  • the query offset is 0
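
As a rough illustration of how these three values can be read outside the WebLab helpers, the sketch below parses the description with the JDK's namespace-aware DOM parser. The class and method names (ResultSetDescriptionReader, parse, literal, resource) are hypothetical, the fragment is wrapped in an rdf:RDF element so that the rdf prefix is declared, and the code only handles the exact RDF/XML shape shown above, not every valid RDF serialization.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of WebLab: reads the ResultSet description with plain DOM.
class ResultSetDescriptionReader {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String WRETRIEVAL = "http://weblab.ow2.org/core/1.2/ontology/retrieval#";

    // The description from the text, wrapped in rdf:RDF (an assumption) to declare the rdf prefix.
    static final String SAMPLE =
        "<rdf:RDF xmlns:rdf=\"" + RDF + "\">"
        + "<rdf:Description rdf:about=\"weblab://searcher/ResultSet1799034111\""
        + " xmlns:wretrieval=\"" + WRETRIEVAL + "\">"
        + "<rdf:type rdf:resource=\"http://weblab.ow2.org/core/1.2/ontology/model#ResultSet\"/>"
        + "<wretrieval:isResultOf rdf:resource=\"weblab://myQuery/thisone\"/>"
        + "<wretrieval:hasNumberOfResults>42</wretrieval:hasNumberOfResults>"
        + "<wretrieval:hasQueryOffset>0</wretrieval:hasQueryOffset>"
        + "</rdf:Description></rdf:RDF>";

    // Parses the XML with namespace support and returns the root element.
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Text content of the first wretrieval:<localName> element, or null if absent.
    static String literal(Element root, String localName) {
        NodeList nodes = root.getElementsByTagNameNS(WRETRIEVAL, localName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // rdf:resource attribute of the first wretrieval:<localName> element, or null if absent.
    static String resource(Element root, String localName) {
        NodeList nodes = root.getElementsByTagNameNS(WRETRIEVAL, localName);
        if (nodes.getLength() == 0) {
            return null;
        }
        String value = ((Element) nodes.item(0)).getAttributeNS(RDF, "resource");
        return value.isEmpty() ? null : value;
    }

    public static void main(String[] args) {
        Element root = parse(SAMPLE);
        System.out.println("number of results: " + Integer.parseInt(literal(root, "hasNumberOfResults")));
        System.out.println("query offset: " + Integer.parseInt(literal(root, "hasQueryOffset")));
        System.out.println("query URI: " + resource(root, "isResultOf"));
    }
}
```

Inside a WebLab service the WRetrievalAnnotator shown in the Example section is the natural way to read these values; the DOM version is only meant to make the mapping between the XML and the listed properties explicit.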

Results Description

A ResultSet is composed of Hits. A Hit is an entity that describes a result and its metadata.

Metadata

Mandatory metadata are:

  • reference to the ResultSet
  • reference to the result entity

Optional metadata are:

  • short description of the result (snippet of text for example)
  • score
  • rank

The following XML describes a Hit that carries all of these metadata:

<rdf:Description rdf:about="weblab://searcher/hit#0" xmlns:wretrieval="http://weblab.ow2.org/core/1.2/ontology/retrieval#">
      <rdf:type rdf:resource="http://weblab.ow2.org/core/1.2/ontology/retrieval#Hit"/>
      <wretrieval:inResultSetHit rdf:resource="weblab://searcher/ResultSet1799034111"/>
      <wretrieval:isLinkedTo rdf:resource="weblab://mytentityA"/>
      <wretrieval:hasDescription>short description of the result</wretrieval:hasDescription>
      <wretrieval:hasScore>0.6665389</wretrieval:hasScore>
      <wretrieval:hasRank>1</wretrieval:hasRank>
</rdf:Description>

The previous XML example defines a Hit with the following metadata:

  • the Hit belongs to a ResultSet with the following URI: weblab://searcher/ResultSet1799034111
  • the Hit references the result whose URI is: weblab://mytentityA
  • the description of the result is: "short description of the result"
  • the score of this result compared to all results is 0.6665389
  • the rank of this result in the ResultSet is 1
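
The split between mandatory and optional metadata can be checked mechanically. The following sketch, which uses only the JDK DOM parser, collects every description typed as a Hit, rejects a Hit that lacks one of the two mandatory references, and sorts the rest by rank. HitReader, Hit and readHits are hypothetical names, the sample wraps the descriptions in an rdf:RDF element (an assumption), and the two hits are listed out of order on purpose to show the sorting.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of WebLab: reads Hit descriptions with plain DOM.
class HitReader {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String WR = "http://weblab.ow2.org/core/1.2/ontology/retrieval#";

    // Plain holder for one Hit; the optional metadata may be null.
    static final class Hit {
        String uri;
        String resultSetUri; // mandatory
        String linkedTo;     // mandatory
        String description;  // optional
        Double score;        // optional
        Integer rank;        // optional
    }

    static final String SAMPLE =
        "<rdf:RDF xmlns:rdf=\"" + RDF + "\" xmlns:wretrieval=\"" + WR + "\">"
        + "<rdf:Description rdf:about=\"weblab://searcher/hit#1\">"
        + "<rdf:type rdf:resource=\"" + WR + "Hit\"/>"
        + "<wretrieval:inResultSetHit rdf:resource=\"weblab://searcher/ResultSet1799034111\"/>"
        + "<wretrieval:isLinkedTo rdf:resource=\"weblab://mytentityB\"/>"
        + "<wretrieval:hasRank>2</wretrieval:hasRank>"
        + "</rdf:Description>"
        + "<rdf:Description rdf:about=\"weblab://searcher/hit#0\">"
        + "<rdf:type rdf:resource=\"" + WR + "Hit\"/>"
        + "<wretrieval:inResultSetHit rdf:resource=\"weblab://searcher/ResultSet1799034111\"/>"
        + "<wretrieval:isLinkedTo rdf:resource=\"weblab://mytentityA\"/>"
        + "<wretrieval:hasDescription>short description of the result</wretrieval:hasDescription>"
        + "<wretrieval:hasScore>0.6665389</wretrieval:hasScore>"
        + "<wretrieval:hasRank>1</wretrieval:hasRank>"
        + "</rdf:Description>"
        + "</rdf:RDF>";

    // Returns the Hits sorted by rank (unranked Hits last); throws if a mandatory reference is missing.
    static List<Hit> readHits(String rdfXml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        List<Hit> hits = new ArrayList<>();
        NodeList descriptions = root.getElementsByTagNameNS(RDF, "Description");
        for (int i = 0; i < descriptions.getLength(); i++) {
            Element d = (Element) descriptions.item(i);
            if (!(WR + "Hit").equals(resource(d, RDF, "type"))) {
                continue; // not a Hit, e.g. the ResultSet or a result description
            }
            Hit hit = new Hit();
            hit.uri = d.getAttributeNS(RDF, "about");
            hit.resultSetUri = resource(d, WR, "inResultSetHit");
            hit.linkedTo = resource(d, WR, "isLinkedTo");
            if (hit.resultSetUri == null || hit.linkedTo == null) {
                throw new IllegalArgumentException("Hit " + hit.uri + " lacks a mandatory reference");
            }
            hit.description = text(d, "hasDescription");
            String score = text(d, "hasScore");
            hit.score = score == null ? null : Double.valueOf(score);
            String rank = text(d, "hasRank");
            hit.rank = rank == null ? null : Integer.valueOf(rank);
            hits.add(hit);
        }
        hits.sort(Comparator.comparing((Hit h) -> h.rank, Comparator.nullsLast(Comparator.naturalOrder())));
        return hits;
    }

    // rdf:resource attribute of the first <ns:local> child element, or null if absent.
    static String resource(Element parent, String ns, String local) {
        NodeList nodes = parent.getElementsByTagNameNS(ns, local);
        if (nodes.getLength() == 0) {
            return null;
        }
        String value = ((Element) nodes.item(0)).getAttributeNS(RDF, "resource");
        return value.isEmpty() ? null : value;
    }

    // Text content of the first wretrieval:<local> child element, or null if absent.
    static String text(Element parent, String local) {
        NodeList nodes = parent.getElementsByTagNameNS(WR, local);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        for (Hit hit : readHits(SAMPLE)) {
            System.out.println(hit.rank + " " + hit.uri + " -> " + hit.linkedTo);
        }
    }
}
```

Treating a missing mandatory reference as an error, rather than silently skipping the Hit, is a design choice of this sketch; a portlet might prefer to log the problem and continue.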

Result

A result can be any resource described in RDF with its URI referenced in a Hit.

The following XML shows a result from a text searcher (the WebLab Solr engine) that can be displayed in the result portlet:

<rdf:Description rdf:about="weblab://mytentityA">
  <dc:language>en</dc:language>
  <wprocessing:hasOriginalFileSize>4988</wprocessing:hasOriginalFileSize>
  <wprocessing:hasNativeContent rdf:resource="weblab://crawlerFolderContent/1248881938281/362"/>
  <wprocessing:isExposedAs>http://test/websites/NewsAfghan/www.myafghan.com/search20109.html</wprocessing:isExposedAs>
  <wprocessing:hasGatheringDate>2009-07-29T19:38:58+02:00</wprocessing:hasGatheringDate>
  <dct:modified>2009-04-24T21:19:01+02:00</dct:modified>
  <dct:extent>4988 bytes</dct:extent>
  <dc:title>My Afghan News - Afghanistan News Headlines for 2/29/2004</dc:title>
  <dc:source>http://www.myafghan.com/search2.asp?search=2/29/2004</dc:source>
  <wprocessing:hasOriginalFileName>search20109.html</wprocessing:hasOriginalFileName>
  <dc:format>text/html</dc:format>
</rdf:Description>

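With the following namespace prefixes, as declared in the XML sample at the end of this page:

  • rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
  • dc: http://purl.org/dc/elements/1.1/
  • dct: http://purl.org/dc/terms/
  • wprocessing: http://weblab.ow2.org/core/1.2/ontology/processing#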

Example

Creating a ResultSet in Java

The helper-sample project contains a Java class with examples of ResultSet creation: org.ow2.weblab.core.helpers.samples.ResultSetCreator. The following example creates a simple ResultSet with minimal information:

		// create the ResultSet object 
		ResultSet rset = ResourceFactory.createResource("sample", "minimalRSet", ResultSet.class);

		// create a "fake" query that should be the one that is issued before the creation of the ResultSet
		Query q = QueryCreator.createFullTextQuery();

		// adding the query that generated the result set in the result set resources list
		rset.getResource().add(q);

		WRetrievalAnnotator wra = new WRetrievalAnnotator(rset);
		// add annotation to link result set and query
		wra.writeResultOf(new URI(q.getUri()));
		// add offset/limit expected
		wra.writeExpectedOffset(0);
		wra.writeExpectedLimit(10);
		// add number of results
		wra.writeNumberOfResults(10);

		// create a "hit" for each result to present
		for (int i = 0; i < 10; i++) {
			// define the hit URI (identifier)
			URI hitURI = new URI("weblab://sample/hit"+i);
			wra.writeHit(hitURI);

			// add information on the hit
			wra.startInnerAnnotatorOn(hitURI);
			// add hit's rank in the result list
			wra.writeRank(i+1);
			// add a description to the hit (here this is fake)
			wra.writeDescription("ablabalbalbla l balb al lbalb labl balb alb lab lbalba lb laal balb lba lba ... qskldqlksdl");
			wra.endInnerAnnotator();
		}
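
The offset, limit and number of results written above together determine which ranks a client should expect in the ResultSet. The sketch below works through that arithmetic. It assumes a zero-based offset and one-based ranks, as in the example above (offset 0, first rank 1), and it assumes that the number of results is the total hit count for the query rather than the page size; neither assumption is stated by the specification. PageWindow is a hypothetical class, not part of WebLab.

```java
// Hypothetical helper, not part of WebLab: computes the ranks expected in one page of results.
final class PageWindow {
    final int firstRank; // one-based rank of the first expected hit
    final int lastRank;  // one-based rank of the last expected hit

    private PageWindow(int firstRank, int lastRank) {
        this.firstRank = firstRank;
        this.lastRank = lastRank;
    }

    // offset is assumed zero-based, numberOfResults is assumed to be the total hit count.
    static PageWindow of(int offset, int limit, int numberOfResults) {
        if (offset < 0 || limit < 0 || numberOfResults < 0) {
            throw new IllegalArgumentException("offset, limit and numberOfResults must not be negative");
        }
        int first = offset + 1;
        int last = Math.min(offset + limit, numberOfResults);
        return new PageWindow(first, last);
    }

    // Number of hits expected in the page; 0 when the offset is past the last result.
    int size() {
        return Math.max(0, lastRank - firstRank + 1);
    }

    public static void main(String[] args) {
        PageWindow page = PageWindow.of(0, 10, 10); // the values written in the example above
        System.out.println("ranks " + page.firstRank + " to " + page.lastRank + ", " + page.size() + " hits");
    }
}
```

With the values of the creation example this yields ranks 1 to 10, which matches the ranks i+1 written in the loop.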

Reading a ResultSet in Java

The following example shows one possible way to read minimal information from a ResultSet:

			// getting the SearchReturn from a service
			SearchReturn res = solrSearcher.search(args);
			ResultSet set = res.getResultSet();

			// getting the original query (expected as the first resource of the ResultSet)
			Resource q = set.getResource().isEmpty() ? null : set.getResource().get(0);
			if (q instanceof Query) {
				Query query = (Query) q;
				// in most cases this is a simple StringQuery
				if (query instanceof StringQuery) {
					StringQuery sQuery = (StringQuery) query;
					// get the request
					String request = sQuery.getRequest();
					// do what you want with the request
					// ...
				}
			}
			
			// exploring the result set
			WRetrievalAnnotator wra = new WRetrievalAnnotator(new URI(set.getUri()), set.getPok());
			// getting the total number of results in the ResultSet
			Value<Integer> nbOfRes = wra.readNumberOfResults();
			if (nbOfRes != null) {
				int numberOfHits = nbOfRes.firstTypedValue();
				if (numberOfHits > 0) {
					
					// getting the offset of the result list (i.e. the position of the first hit)
					Value<Integer> expOffset = wra.readExpectedOffset();
					if (expOffset != null) {
						int offset = expOffset.firstTypedValue();
						// do what you want with the offset
						// ...
					}
					
					// getting the hits and, for each hit, the link to its docURI
					Value<URI> hits = wra.readHit();
					// do what you want with the list of hits
					// ...
					// null check added on the assumption that readHit() can return null like the reads above
					if (hits != null && hits.size() > 0) {
						for (URI hit : hits) {
							wra.startInnerAnnotatorOn(hit);
							// same assumption: the link may be missing on a malformed hit
							Value<URI> linkedTo = wra.readLinkedTo();
							if (linkedTo != null) {
								URI docURI = linkedTo.firstTypedValue();
								// do what you want with docURI
								// ...
							}
							wra.endInnerAnnotator();
						}
					}
				}
			}

XML sample

<resource xsi:type="ns3:ResultSet" uri="weblab://searcher/ResultSet1799034111" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dct="http://purl.org/dc/terms/" xmlns:ns3="http://weblab.ow2.org/core/1.2/ontology/model#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <pok uri="weblab://geographicSearch/Pok1799034111">
        <data>
            <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
                <rdf:Description rdf:about="weblab://searcher/ResultSet1799034111" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:wretrieval="http://weblab.ow2.org/core/1.2/ontology/retrieval#" >
                    <rdf:type rdf:resource="http://weblab.ow2.org/core/1.2/ontology/model#ResultSet"/>
                    <wretrieval:isResultOf rdf:resource="weblab://myQuery/thisone"/>
                    <wretrieval:hasNumberOfResults>2</wretrieval:hasNumberOfResults>
                    <wretrieval:hasQueryOffset>0</wretrieval:hasQueryOffset>
                </rdf:Description>
 
              <rdf:Description rdf:about="weblab://searcher/hit#0" xmlns:wretrieval="http://weblab.ow2.org/core/1.2/ontology/retrieval#">
                    <rdf:type rdf:resource="http://weblab.ow2.org/core/1.2/ontology/retrieval#Hit"/>
                    <wretrieval:inResultSetHit rdf:resource="weblab://searcher/ResultSet1799034111"/>
                    <wretrieval:isLinkedTo rdf:resource="weblab://mytentityA"/>
                    <wretrieval:hasDescription>short description of the result</wretrieval:hasDescription>
                    <wretrieval:hasScore>0.6665389</wretrieval:hasScore>
                    <wretrieval:hasRank>1</wretrieval:hasRank>
              </rdf:Description>
 
              <rdf:Description rdf:about="weblab://searcher/hit#1" xmlns:wretrieval="http://weblab.ow2.org/core/1.2/ontology/retrieval#">
                    <rdf:type rdf:resource="http://weblab.ow2.org/core/1.2/ontology/retrieval#Hit"/>
                    <wretrieval:inResultSetHit rdf:resource="weblab://searcher/ResultSet1799034111"/>
                    <wretrieval:isLinkedTo rdf:resource="weblab://mytentityB"/>
                    <wretrieval:hasDescription>short description of another result</wretrieval:hasDescription>
                    <wretrieval:hasScore>0.7895444</wretrieval:hasScore>
                    <wretrieval:hasRank>2</wretrieval:hasRank>
              </rdf:Description>
 
              <rdf:Description rdf:about="weblab://mytentityA" xmlns:wprocessing="http://weblab.ow2.org/core/1.2/ontology/processing#">
                    <dc:language>en</dc:language>
                    <wprocessing:hasOriginalFileSize>4988</wprocessing:hasOriginalFileSize>
                    <wprocessing:hasNativeContent rdf:resource="weblab://crawlerFolderContent/1248881938281/362"/>
                    <wprocessing:isExposedAs>http://test/websites/NewsAfghan/www.myafghan.com/search20109.html</wprocessing:isExposedAs>
                    <wprocessing:hasGatheringDate>2009-07-29T19:38:58+02:00</wprocessing:hasGatheringDate>
                    <dct:modified>2009-04-24T21:19:01+02:00</dct:modified>
                    <dct:extent>4988 bytes</dct:extent>
                    <dc:title>My Afghan News - Afghanistan News Headlines for 2/29/2004</dc:title>
                    <dc:source>http://www.myafghan.com/search2.asp?search=2/29/2004</dc:source>
                    <wprocessing:hasOriginalFileName>search20109.html</wprocessing:hasOriginalFileName>
                    <dc:format>text/html</dc:format>
              </rdf:Description>
 
              <rdf:Description rdf:about="weblab://mytentityB" xmlns:wprocessing="http://weblab.ow2.org/core/1.2/ontology/processing#">
                    <dc:language>fr</dc:language>
                    <wprocessing:hasOriginalFileSize>5180</wprocessing:hasOriginalFileSize>
                    <wprocessing:hasNativeContent rdf:resource="weblab://crawlerFolderContent/1248886578271/363"/>
                    <wprocessing:isExposedAs>http://test/websites/NewsAfghan/www.myafghan.com/search20546.html</wprocessing:isExposedAs>
                    <wprocessing:hasGatheringDate>2009-07-30T22:21:57+02:00</wprocessing:hasGatheringDate>
                    <dct:modified>2009-07-23T20:41:55+02:00</dct:modified>
                    <dct:extent>5180 bytes</dct:extent>
                    <dc:title>My Afghan News - Afghanistan News Headlines for 2/30/2004</dc:title>
                    <dc:source>http://www.myafghan.com/search2.asp?search=2/30/2004</dc:source>
                    <wprocessing:hasOriginalFileName>search20546.html</wprocessing:hasOriginalFileName>
                    <dc:format>text/html</dc:format>
              </rdf:Description>
 
 
            </rdf:RDF>
        </data>
    </pok>
</resource>
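
To show how the two parts of the PieceOfKnowledge fit together, the sketch below follows each Hit's isLinkedTo reference to the description of its result and lists the result titles in rank order. It uses only the JDK DOM parser on an abbreviated copy of the RDF above (only the rank and dc:title properties are kept). HitTitleJoiner and titlesByRank are hypothetical names, and the code assumes that every Hit carries a unique rank, which the specification lists as optional.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of WebLab: joins Hits to the titles of their results.
class HitTitleJoiner {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String WR = "http://weblab.ow2.org/core/1.2/ontology/retrieval#";
    static final String DC = "http://purl.org/dc/elements/1.1/";

    // Abbreviated copy of the rdf:RDF element of the sample above.
    static final String SAMPLE =
        "<rdf:RDF xmlns:rdf=\"" + RDF + "\" xmlns:wretrieval=\"" + WR + "\" xmlns:dc=\"" + DC + "\">"
        + "<rdf:Description rdf:about=\"weblab://searcher/hit#0\">"
        + "<wretrieval:isLinkedTo rdf:resource=\"weblab://mytentityA\"/>"
        + "<wretrieval:hasRank>1</wretrieval:hasRank>"
        + "</rdf:Description>"
        + "<rdf:Description rdf:about=\"weblab://searcher/hit#1\">"
        + "<wretrieval:isLinkedTo rdf:resource=\"weblab://mytentityB\"/>"
        + "<wretrieval:hasRank>2</wretrieval:hasRank>"
        + "</rdf:Description>"
        + "<rdf:Description rdf:about=\"weblab://mytentityA\">"
        + "<dc:title>My Afghan News - Afghanistan News Headlines for 2/29/2004</dc:title>"
        + "</rdf:Description>"
        + "<rdf:Description rdf:about=\"weblab://mytentityB\">"
        + "<dc:title>My Afghan News - Afghanistan News Headlines for 2/30/2004</dc:title>"
        + "</rdf:Description>"
        + "</rdf:RDF>";

    // Returns one line per ranked Hit, ordered by rank: "<rank>. <title> (<result URI>)".
    static List<String> titlesByRank(String rdfXml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }

        // Index every description by its rdf:about URI.
        Map<String, Element> byUri = new HashMap<>();
        NodeList descriptions = root.getElementsByTagNameNS(RDF, "Description");
        for (int i = 0; i < descriptions.getLength(); i++) {
            Element d = (Element) descriptions.item(i);
            byUri.put(d.getAttributeNS(RDF, "about"), d);
        }

        // A TreeMap keeps the lines sorted by rank (ranks are assumed unique).
        Map<Integer, String> lines = new TreeMap<>();
        for (Element d : byUri.values()) {
            NodeList linked = d.getElementsByTagNameNS(WR, "isLinkedTo");
            NodeList rank = d.getElementsByTagNameNS(WR, "hasRank");
            if (linked.getLength() == 0 || rank.getLength() == 0) {
                continue; // not a ranked Hit
            }
            String target = ((Element) linked.item(0)).getAttributeNS(RDF, "resource");
            String title = "(no description found)";
            Element result = byUri.get(target);
            if (result != null) {
                NodeList titles = result.getElementsByTagNameNS(DC, "title");
                if (titles.getLength() > 0) {
                    title = titles.item(0).getTextContent().trim();
                }
            }
            int r = Integer.parseInt(rank.item(0).getTextContent().trim());
            lines.put(r, r + ". " + title + " (" + target + ")");
        }
        return new ArrayList<>(lines.values());
    }

    public static void main(String[] args) {
        titlesByRank(SAMPLE).forEach(System.out::println);
    }
}
```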