Solr-engine/2.4.0

Indexing and search service based on remote Solr
Details
Service interfaces: Indexer, Searcher, Analyser
Exchange model: WebLab 1.2.5
Licence: LGPL 2.1
Supported OS: Windows/Linux/MacOS
Integrated COTS: Solr
Binary: solr-engine-2.4.0.war
Sources: solr-engine-2.4.0-sources.jar
Javadoc: solr-engine-2.4.0-javadoc.jar
SVN: solr-engine
Maven artifact:
<groupId>org.ow2.weblab.webservices</groupId>
<artifactId>solr-engine</artifactId>
<version>2.4.0</version>
Release Note



This service is a search engine built on Apache Solr. It (1) indexes text-based content, (2) supports text search, including an advanced set of queries, and (3) provides search support through several analyser services. The available services are:

  • <solr-engineURL>/indexer: standard text indexing service;
  • <solr-engineURL>/searcher: search service accepting StringQuery, SimilarityQuery and ComposeQuery;
  • <solr-engineURL>/highlighter: analyser service applied to a ResultSet, which creates snippets and highlights the relevant text for a given text query;
  • <solr-engineURL>/resultSetMetaEnricher: analyser service applied to a ResultSet, which adds metadata about the items linked to the result hits;
  • <solr-engineURL>/facetSuggestion: analyser service applied to a ResultSet or a StringQuery, which provides facets as suggested queries;
  • <solr-engineURL>/dataoperator: technical interface for getting reports and managing the datasets (i.e. indexes) stored on the service side.
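
As a small illustration of the endpoint layout listed above, the following Python sketch builds the URL of each service from the base URL of a deployed solr-engine. The base URL used in the usage note (http://localhost:8080/solr-engine) and the helper names are assumptions made for this example. The "?wsdl" suffix relies on the usual CXF convention for publishing the WSDL of an endpoint; check it against your deployment.

```python
# Endpoint suffixes and the kind of interface each one exposes, as listed above.
SERVICES = {
    "indexer": "Indexer",
    "searcher": "Searcher",
    "highlighter": "Analyser",
    "resultSetMetaEnricher": "Analyser",
    "facetSuggestion": "Analyser",
    "dataoperator": "technical interface",
}


def endpoint_url(base_url: str, service: str) -> str:
    """Return <solr-engineURL>/<service> for one of the known services."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    # Tolerate a trailing slash on the base URL.
    return base_url.rstrip("/") + "/" + service


def wsdl_url(base_url: str, service: str) -> str:
    """Return the WSDL location, assuming the usual CXF '?wsdl' convention."""
    return endpoint_url(base_url, service) + "?wsdl"
```

For example, endpoint_url("http://localhost:8080/solr-engine", "searcher") gives the address a Searcher client would call, and wsdl_url gives the address from which a SOAP client could generate its stubs.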


Configuration

A single configuration file, <solr-engine-home>/WEB-INF/cxf-servlet.xml, configures all services.

Installation and deployment

Launch the standalone Solr server with the correct configuration (the index schema must match the one defined in the service). Configure the service with the URL of the Solr server, then deploy it on an application server.

UsageContext effects

By default, each UsageContext is mapped to a specific index core as defined in Solr (see the Solr wiki page on cores). A document indexed with a given UsageContext will therefore only be found when searching with the same UsageContext. This provides a simple data separation mechanism. Since the UsageContext is optional, a "default core" is used when no UsageContext is passed as argument (for both indexing and search).

Finally, each service has a "noCore" mode. Setting this property to "true" makes the service use only the "default core", ignoring any UsageContext passed as argument.
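
The routing rules described in this section can be sketched with a toy in-memory model. This is not the service's implementation: the core name "default", the use of the UsageContext URI as the core name, and the ToyEngine class are all assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Optional

DEFAULT_CORE = "default"  # hypothetical name for the "default core"


def select_core(usage_context: Optional[str], no_core: bool = False) -> str:
    """Return the core a request would be routed to."""
    # In noCore mode any UsageContext is ignored.
    if no_core:
        return DEFAULT_CORE
    # Without a UsageContext, the default core is used.
    if not usage_context:
        return DEFAULT_CORE
    # Assumption: one core per UsageContext, named after its URI.
    return usage_context


class ToyEngine:
    """In-memory stand-in for the indexer and searcher, one list per core."""

    def __init__(self, no_core: bool = False) -> None:
        self.no_core = no_core
        self.cores: Dict[str, List[str]] = defaultdict(list)

    def index(self, text: str, usage_context: Optional[str] = None) -> None:
        self.cores[select_core(usage_context, self.no_core)].append(text)

    def search(self, term: str, usage_context: Optional[str] = None) -> List[str]:
        core = select_core(usage_context, self.no_core)
        # Only the selected core is searched, hence the data separation.
        return [t for t in self.cores.get(core, []) if term.lower() in t.lower()]
```

With the default settings, a text indexed under one UsageContext is returned only by searches that pass the same UsageContext. With ToyEngine(no_core=True), every request lands in the default core, so the same text is found whatever UsageContext is passed.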

Examples of SOAP Input/Output

I/O Indexer
Input
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" 
xmlns:ind="http://weblab.ow2.org/core/1.2/services/indexer">
   <soapenv:Header/>
   <soapenv:Body>
      <ind:indexArgs>
         <resource xsi:type="ns3:Document" uri="weblab://gutenberg/ebook-12092" 
xmlns:ns3="http://weblab.ow2.org/core/1.2/model#" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <annotation uri="weblab://gutenberg/ebook-12092#a0">
               <data>
                  <rdf:RDF xmlns:wp="http://weblab.ow2.org/core/1.2/ontology/processing#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
                     <rdf:Description rdf:about="weblab://gutenberg/ebook-12092" xmlns:dc="http://purl.org/dc/elements/1.1/">
                        <wp:hasGatheringDate>2011-08-18T18:56:00+0200</wp:hasGatheringDate>
                        <dc:source>http://www.gutenberg.org/files/</dc:source>
                     </rdf:Description>
                  </rdf:RDF>
               </data>
            </annotation>
            <mediaUnit xsi:type="ns3:Text" uri="weblab://gutenberg/ebook-12092#1">
               <content>The Project Gutenberg EBook of The Oxford Movement, by R.W. Church
This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.net</content>
            </mediaUnit>
         </resource>
      </ind:indexArgs>
   </soapenv:Body>
</soapenv:Envelope>
Output
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <ns7:indexReturn xmlns:ns3="http://weblab.ow2.org/core/1.2/model#" 
xmlns:ns7="http://weblab.ow2.org/core/1.2/services/indexer" />
   </soap:Body>
</soap:Envelope>
TODO: This part has to be completed by adding I/O samples for the other interfaces: Searcher and the various Analysers (meta enrichment, facet suggestion, highlighter and spell suggestion).
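
For readers who want to try the Indexer sample from a script, here is a minimal Python sketch that builds an equivalent request with the standard library only. It reuses the namespaces of the Input sample above but leaves out the annotation block for brevity. The endpoint URL, the "#1" suffix for the media unit URI, the empty SOAPAction header and the function name are assumptions; check the service WSDL for the values your deployment expects.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr

# SOAP 1.1 envelope modelled on the Indexer input sample (annotation omitted).
ENVELOPE = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
 xmlns:ind="http://weblab.ow2.org/core/1.2/services/indexer">
   <soapenv:Header/>
   <soapenv:Body>
      <ind:indexArgs>
         <resource xsi:type="ns3:Document" uri={doc_uri}
 xmlns:ns3="http://weblab.ow2.org/core/1.2/model#"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <mediaUnit xsi:type="ns3:Text" uri={unit_uri}>
               <content>{text}</content>
            </mediaUnit>
         </resource>
      </ind:indexArgs>
   </soapenv:Body>
</soapenv:Envelope>"""


def build_index_request(endpoint: str, doc_uri: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying an indexArgs message."""
    body = ENVELOPE.format(
        doc_uri=quoteattr(doc_uri),            # quoted and escaped attribute value
        unit_uri=quoteattr(doc_uri + "#1"),    # assumed URI scheme for the text unit
        text=escape(text),                     # escape &, < and > in the content
    )
    return urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": '""',  # assumption: the service accepts an empty action
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the message. If the service behaves as in the Output sample above, the response body should be an envelope containing an indexReturn element.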

Known Limitations

All services need to connect to a standard Solr server, which must be running in the background. This component depends strongly on the index configuration (i.e. field definitions) and the server configuration (i.e. query handlers) of the remote Solr server. A default solr-home with all necessary files is provided in the component's sources:

src/main/resources/solr/

Use it as solr-home to meet basic needs; it is a good starting point for fine tuning.


Dependencies

List of all dependencies of this service:

org.ow2.weblab.webservices:solr-engine:war:2.4.0
+- org.ow2.weblab.core.helpers:rdf-helper-jena:jar:1.4.0:compile
|  \- org.apache.jena:jena-core:jar:2.7.2:compile
|     +- org.apache.jena:jena-iri:jar:0.9.2:compile
|     \- xerces:xercesImpl:jar:2.9.0:compile
+- org.ow2.weblab.core.helpers:rdf-helper-jena-selection:jar:1.6.1:compile
|  \- joda-time:joda-time:jar:2.1:compile
+- org.apache.solr:solr-solrj:jar:4.2.1:compile
|  +- org.apache.zookeeper:zookeeper:jar:3.4.5:compile
|  +- org.apache.httpcomponents:httpclient:jar:4.2.1:compile
|  |  \- org.apache.httpcomponents:httpcore:jar:4.2.1:compile
|  +- org.apache.httpcomponents:httpmime:jar:4.2.3:compile
|  +- org.codehaus.woodstox:wstx-asl:jar:3.2.7:runtime
|  \- org.slf4j:slf4j-api:jar:1.6.4:compile
+- commons-io:commons-io:jar:2.3:compile
+- org.ow2.weblab.core.helpers:samples:jar:1.1.0:test
+- org.apache.solr:solr-core:jar:4.2.1:test
|  +- org.apache.lucene:lucene-core:jar:4.2.1:test
|  +- org.apache.lucene:lucene-codecs:jar:4.2.1:test
|  +- org.apache.lucene:lucene-analyzers-common:jar:4.2.1:test
|  +- org.apache.lucene:lucene-analyzers-kuromoji:jar:4.2.1:test
|  +- org.apache.lucene:lucene-analyzers-phonetic:jar:4.2.1:test
|  +- org.apache.lucene:lucene-highlighter:jar:4.2.1:test
|  +- org.apache.lucene:lucene-memory:jar:4.2.1:test
|  +- org.apache.lucene:lucene-misc:jar:4.2.1:test
|  +- org.apache.lucene:lucene-queryparser:jar:4.2.1:test
|  +- org.apache.lucene:lucene-spatial:jar:4.2.1:test
|  |  \- com.spatial4j:spatial4j:jar:0.3:test
|  +- org.apache.lucene:lucene-suggest:jar:4.2.1:test
|  +- org.apache.lucene:lucene-grouping:jar:4.2.1:test
|  +- org.apache.lucene:lucene-queries:jar:4.2.1:test
|  +- commons-codec:commons-codec:jar:1.7:compile
|  +- commons-cli:commons-cli:jar:1.2:test
|  +- commons-fileupload:commons-fileupload:jar:1.2.2:test
|  +- org.restlet.jee:org.restlet:jar:2.1.1:test
|  +- org.restlet.jee:org.restlet.ext.servlet:jar:2.1.1:test
|  +- org.slf4j:jcl-over-slf4j:jar:1.6.4:test
|  +- org.slf4j:slf4j-jdk14:jar:1.6.4:test
|  +- commons-lang:commons-lang:jar:2.6:test
|  \- com.google.guava:guava:jar:13.0.1:test
+- org.mortbay.jetty:jetty:jar:6.1.26:test
|  +- org.mortbay.jetty:jetty-util:jar:6.1.26:test
|  \- org.mortbay.jetty:servlet-api:jar:2.5-20081211:test
+- org.slf4j:slf4j-log4j12:jar:1.6.4:runtime
|  \- log4j:log4j:jar:1.2.16:compile
+- org.ow2.weblab.core:model:jar:1.2.5:compile
+- org.ow2.weblab.core:extended:jar:1.2.5:compile
+- org.ow2.weblab.core:annotator:jar:1.2.5:compile
+- org.apache.cxf:cxf-rt-frontend-jaxws:jar:2.6.3:compile
|  +- xml-resolver:xml-resolver:jar:1.2:compile
|  +- asm:asm:jar:3.3.1:compile
|  +- org.apache.cxf:cxf-api:jar:2.6.3:compile
|  |  +- org.codehaus.woodstox:woodstox-core-asl:jar:4.1.4:runtime
|  |  |  \- org.codehaus.woodstox:stax2-api:jar:3.1.1:runtime
|  |  +- org.apache.ws.xmlschema:xmlschema-core:jar:2.0.3:compile
|  |  +- org.apache.geronimo.specs:geronimo-javamail_1.4_spec:jar:1.7.1:compile
|  |  \- wsdl4j:wsdl4j:jar:1.6.2:compile
|  +- org.apache.cxf:cxf-rt-core:jar:2.6.3:compile
|  |  \- com.sun.xml.bind:jaxb-impl:jar:2.2.5:compile
|  +- org.apache.cxf:cxf-rt-bindings-soap:jar:2.6.3:compile
|  |  \- org.apache.cxf:cxf-rt-databinding-jaxb:jar:2.6.3:compile
|  +- org.apache.cxf:cxf-rt-bindings-xml:jar:2.6.3:compile
|  +- org.apache.cxf:cxf-rt-frontend-simple:jar:2.6.3:compile
|  \- org.apache.cxf:cxf-rt-ws-addr:jar:2.6.3:compile
|     \- org.apache.cxf:cxf-rt-ws-policy:jar:2.6.3:compile
|        \- org.apache.neethi:neethi:jar:3.0.2:compile
+- org.apache.cxf:cxf-rt-transports-http:jar:2.6.3:compile
+- org.apache.cxf:cxf-rt-management:jar:2.6.3:compile
+- org.springframework:spring-web:jar:3.0.7.RELEASE:compile
|  +- aopalliance:aopalliance:jar:1.0:compile
|  +- org.springframework:spring-beans:jar:3.0.7.RELEASE:compile
|  +- org.springframework:spring-context:jar:3.0.7.RELEASE:compile
|  |  +- org.springframework:spring-aop:jar:3.0.7.RELEASE:compile
|  |  +- org.springframework:spring-expression:jar:3.0.7.RELEASE:compile
|  |  \- org.springframework:spring-asm:jar:3.0.7.RELEASE:compile
|  \- org.springframework:spring-core:jar:3.0.7.RELEASE:compile
+- xalan:xalan:jar:2.7.1:compile
|  \- xalan:serializer:jar:2.7.1:compile
+- xml-apis:xml-apis:jar:1.3.04:compile
+- commons-logging:commons-logging:jar:1.1.1:compile
+- junit:junit:jar:4.10:test
|  \- org.hamcrest:hamcrest-core:jar:1.1:test
\- javax.servlet:servlet-api:jar:2.5:provided