Folder Crawler Service/1.5.3

From WebLab Wiki
Folder Crawler Service
Details
Service Interfaces: QueueManager, SourceReader, Configurable
Exchange model: WebLab 1.2.2
Licence: LGPL 2.1
Supported OS: Windows/Linux/MacOs
Binary: folder-crawler-service-{{{version}}}.war
Sources: folder-crawler-service-{{{version}}}-sources.jar
Javadoc: folder-crawler-service-{{{version}}}-javadoc.jar
SVN: folder-crawler-service
Maven Artifact:
<groupId>org.ow2.weblab.webservices</groupId>
<artifactId>folder-crawler-service</artifactId>
<version>{{{version}}}</version>
Release Note


This service gathers files from a local folder and creates a Resource pointing to each file, so that other services can work on these files.

Configuration

The property file FolderCrawlerService.config contains the following options:

  • folders: the folder to crawl (../../toIndex by default)
  • extensions: the file extensions to reject or to accept, depending on reject, separated by ; (svn;jpg by default)
  • recursive: if true, sub-folders are crawled recursively (true by default)
  • reject: if true, files with the extensions listed above are skipped; if false, only files with those extensions are crawled (true by default)
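
The interplay of extensions, reject and recursive can be easier to follow in code. The Python sketch below is not the service's implementation; it only mirrors the semantics documented above. It assumes that an extension is the text after the last dot of a file name and that matching is case-insensitive. The function names parse_extensions, is_crawled and list_files are hypothetical.

```python
import os


def parse_extensions(value):
    """Split a ';'-separated value such as 'svn;jpg' into a set of extensions."""
    return {part.strip().lower() for part in value.split(";") if part.strip()}


def is_crawled(filename, extensions, reject=True):
    """Apply the documented 'reject' semantics to a single file name.

    Assumption: the extension is whatever follows the last dot,
    compared case-insensitively.
    """
    extension = filename.rpartition(".")[2].lower() if "." in filename else ""
    listed = extension in extensions
    # reject=True: skip listed extensions; reject=False: keep only listed ones.
    return not listed if reject else listed


def list_files(folder, extensions, recursive=True, reject=True):
    """Return the paths that would be crawled under 'folder'."""
    selected = []
    for root, dirs, files in os.walk(folder):
        if not recursive:
            dirs[:] = []  # pruning in place stops os.walk from descending
        for name in files:
            if is_crawled(name, extensions, reject):
                selected.append(os.path.join(root, name))
    return selected
```

With the default values (extensions set to svn;jpg and reject set to true), a file named test.txt is selected and a .jpg file is skipped; setting reject to false inverts both results.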

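The Configurable interface makes it possible to associate a specific folder with a specific UsageContext. The ConfigureArgs must contain the usageContext URI and a configuration whose RDF data describes that same URI with a folder property (namespace http://weblab.ow2.org/core/1.2/ontology/processing#crawler/) holding the path of the folder to crawl, as in the configure request example below.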
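
To make the shape of such a configure call concrete, here is a minimal Python sketch that assembles an envelope like the configure request shown in the SOAP examples of this page. The namespaces and element names are copied from that example; the function name build_configure_request and the "crawler" prefix are hypothetical, and the default configuration URI is only the placeholder used in the example.

```python
import xml.etree.ElementTree as ET

SOAPENV = "http://schemas.xmlsoap.org/soap/envelope/"
CONFIGURABLE = "http://weblab.ow2.org/core/1.2/services/configurable"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CRAWLER = "http://weblab.ow2.org/core/1.2/ontology/processing#crawler/"


def qname(namespace, local_name):
    """Build an ElementTree qualified name of the form '{namespace}local'."""
    return "{%s}%s" % (namespace, local_name)


def build_configure_request(usage_context, folder,
                            configuration_uri="weblab://random.uri"):
    """Return a configure envelope binding 'folder' to 'usage_context'."""
    # Prefixes only affect serialization; "crawler" is an arbitrary choice.
    ET.register_namespace("soapenv", SOAPENV)
    ET.register_namespace("con", CONFIGURABLE)
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("crawler", CRAWLER)

    envelope = ET.Element(qname(SOAPENV, "Envelope"))
    ET.SubElement(envelope, qname(SOAPENV, "Header"))
    body = ET.SubElement(envelope, qname(SOAPENV, "Body"))
    args = ET.SubElement(body, qname(CONFIGURABLE, "configureArgs"))
    ET.SubElement(args, "usageContext").text = usage_context

    configuration = ET.SubElement(args, "configuration",
                                  {"uri": configuration_uri})
    data = ET.SubElement(configuration, "data")
    rdf = ET.SubElement(data, qname(RDF, "RDF"))
    # The RDF subject is the usage context itself, as in the example request.
    description = ET.SubElement(rdf, qname(RDF, "Description"),
                                {qname(RDF, "about"): usage_context})
    ET.SubElement(description, qname(CRAWLER, "folder")).text = folder
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_configure_request("weblab://randomUri/randomUri",
                                  "/folder/To/Crawl"))
```

The sketch only produces the XML text; sending it to a deployed service requires the endpoint address of that deployment, which is not covered here.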

UsageContext effects

Specifying a usage context makes the service crawl the folder that was previously configured for that context.
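
The relation between usage contexts and folders can be pictured as a simple lookup table. The Python sketch below is a hypothetical model, not the service's code: the class name FolderRegistry is invented for illustration, and the fallback to the default folder for an unknown usage context is an assumption that this page does not confirm.

```python
class FolderRegistry:
    """Hypothetical in-memory model of per-usage-context folder configuration."""

    def __init__(self, default_folder="../../toIndex"):
        # Default taken from the 'folders' option of the property file.
        self.default_folder = default_folder
        self._folders = {}

    def configure(self, usage_context, folder):
        """Mirror of the configure operation: bind a folder to a context."""
        self._folders[usage_context] = folder

    def reset_configuration(self, usage_context):
        """Mirror of resetConfiguration: forget the binding, if any."""
        self._folders.pop(usage_context, None)

    def folder_for(self, usage_context=None):
        """Return the folder to crawl for a usage context.

        Assumption: an unknown or missing context falls back to the default.
        """
        return self._folders.get(usage_context, self.default_folder)


if __name__ == "__main__":
    registry = FolderRegistry()
    registry.configure("weblab://randomUri/randomUri", "/folder/To/Crawl")
    print(registry.folder_for("weblab://randomUri/randomUri"))
    registry.reset_configuration("weblab://randomUri/randomUri")
    print(registry.folder_for("weblab://randomUri/randomUri"))
```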

Examples of SOAP Input/Output

  • configure Request
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:con="http://weblab.ow2.org/core/1.2/services/configurable">
   <soapenv:Header/>
   <soapenv:Body>
      <con:configureArgs>
         <usageContext>weblab://randomUri/randomUri</usageContext>
         <configuration uri="weblab://random.uri" xmlns:ns3="http://weblab.ow2.org/core/1.2/model#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <data>
               <rdf:RDF xmlns:dct="http://purl.org/dc/terms/" xmlns:j.0="http://weblab.ow2.org/core/1.2/ontology/processing#crawler/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:wlr="http://weblab.ow2.org/core/1.2/ontology/retrieval#" xmlns:wlp="http://weblab.ow2.org/core/1.2/ontology/processing#">
                  <rdf:Description rdf:about="weblab://randomUri/randomUri">
                     <j.0:folder>/folder/To/Crawl</j.0:folder>
                  </rdf:Description>
               </rdf:RDF>
            </data>
         </configuration>
      </con:configureArgs>
   </soapenv:Body>
</soapenv:Envelope>
  • configure Return
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <ns4:configureReturn xmlns:ns2="http://weblab.ow2.org/core/1.2/services/queuemanager" xmlns:ns3="http://weblab.ow2.org/core/1.2/model#" xmlns:ns4="http://weblab.ow2.org/core/1.2/services/configurable" xmlns:ns5="http://weblab.ow2.org/core/1.2/services/sourcereader"/>
   </soap:Body>
</soap:Envelope>
  • resetConfiguration Request
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:con="http://weblab.ow2.org/core/1.2/services/configurable">
   <soapenv:Header/>
   <soapenv:Body>
      <con:resetConfigurationArgs>
         <usageContext>weblab://randomUri/randomUri</usageContext>
      </con:resetConfigurationArgs>
   </soapenv:Body>
</soapenv:Envelope>
  • resetConfiguration Return
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <ns4:resetConfigurationReturn xmlns:ns2="http://weblab.ow2.org/core/1.2/services/queuemanager" xmlns:ns3="http://weblab.ow2.org/core/1.2/model#" xmlns:ns4="http://weblab.ow2.org/core/1.2/services/configurable" xmlns:ns5="http://weblab.ow2.org/core/1.2/services/sourcereader"/>
   </soap:Body>
</soap:Envelope>
  • nextResource Request
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:que="http://weblab.ow2.org/core/1.2/services/queuemanager">
   <soapenv:Header/>
   <soapenv:Body>
      <que:nextResourceArgs>
         <usageContext>weblab://randomUri/randomUri</usageContext>
      </que:nextResourceArgs>
   </soapenv:Body>
</soapenv:Envelope>
  • nextResource Return
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <ns4:nextResourceReturn xmlns:ns2="http://weblab.ow2.org/core/1.2/model#" xmlns:ns3="http://weblab.ow2.org/core/1.2/services/sourcereader" xmlns:ns4="http://weblab.ow2.org/core/1.2/services/queuemanager" xmlns:ns5="http://weblab.ow2.org/core/1.2/services/configurable">
         <resource xsi:type="ns2:Document" uri="weblab://crawlerFolder/file0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <annotation uri="weblab://crawlerFolder/file0#a0">
               <data>
                  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:wp="http://weblab.ow2.org/core/1.2/ontology/processing#">
                     <rdf:Description rdf:about="weblab://crawlerFolder/file0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/">
                        <wp:hasNativeContent rdf:resource="file:/apache-tomcat-7.0.16/bin/data/content/weblab.7749916438996635653.content"/>
                        <wp:hasGatheringDate>2012-02-22T15:01:51+0100</wp:hasGatheringDate>
                        <wp:hasOriginalFileName>test.txt</wp:hasOriginalFileName>
                        <wp:hasOriginalFileSize>2</wp:hasOriginalFileSize>
                        <dc:source>/folder/To/Crawl/test.txt</dc:source>
                        <dcterms:extent>2 bytes</dcterms:extent>
                        <dcterms:modified>2012-02-22T14:58:49+0100</dcterms:modified>
                     </rdf:Description>
                  </rdf:RDF>
               </data>
            </annotation>
         </resource>
      </ns4:nextResourceReturn>
   </soap:Body>
</soap:Envelope>
  • getResource Request
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:sour="http://weblab.ow2.org/core/1.2/services/sourcereader">
   <soapenv:Header/>
   <soapenv:Body>
      <sour:getResourceArgs>
         <usageContext>weblab://randomUri/randomUri</usageContext>
         <offset>0</offset>
         <limit>1</limit>
      </sour:getResourceArgs>
   </soapenv:Body>
</soapenv:Envelope>
  • getResource Return
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <ns5:getResourceReturn xmlns:ns2="http://weblab.ow2.org/core/1.2/services/configurable" xmlns:ns3="http://weblab.ow2.org/core/1.2/services/queuemanager" xmlns:ns4="http://weblab.ow2.org/core/1.2/model#" xmlns:ns5="http://weblab.ow2.org/core/1.2/services/sourcereader">
         <resources uri="weblab://folderCrawlerService/tempCollection-1329919468795">
            <resource xsi:type="ns4:Document" uri="weblab://crawlerFolder/file0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
               <annotation uri="weblab://crawlerFolder/file0#a0">
                  <data>
                     <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:wp="http://weblab.ow2.org/core/1.2/ontology/processing#">
                        <rdf:Description rdf:about="weblab://crawlerFolder/file0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/">
                           <wp:hasNativeContent rdf:resource="file:/apache-tomcat-7.0.16/bin/data/content/weblab.7139005485517505911.content"/>
                           <wp:hasGatheringDate>2012-02-22T15:04:28+0100</wp:hasGatheringDate>
                           <wp:hasOriginalFileName>test.txt</wp:hasOriginalFileName>
                           <wp:hasOriginalFileSize>2</wp:hasOriginalFileSize>
                           <dc:source>/folder/To/Crawl/test.txt</dc:source>
                           <dcterms:extent>2 bytes</dcterms:extent>
                           <dcterms:modified>2012-02-22T14:58:49+0100</dcterms:modified>
                        </rdf:Description>
                     </rdf:RDF>
                  </data>
               </annotation>
            </resource>
         </resources>
      </ns5:getResourceReturn>
   </soap:Body>
</soap:Envelope>
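
On the client side, the nextResource and getResource returns shown above can be read with any XML parser. The following Python sketch uses the standard library to pull a few annotation values out of such a response. The function name extract_resources and the keys of the returned dictionaries are hypothetical; the namespaces and element names are the ones that appear in the examples.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "wp": "http://weblab.ow2.org/core/1.2/ontology/processing#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def extract_resources(soap_response):
    """Return one dict per <resource> element of a nextResource or
    getResource return, built from its first RDF description."""
    root = ET.fromstring(soap_response)
    results = []
    # <resource> is unqualified in the examples, so a plain tag name matches;
    # the <resources> wrapper of getResource has a different tag and is skipped.
    for resource in root.iter("resource"):
        description = resource.find(
            "annotation/data/rdf:RDF/rdf:Description", NS)
        if description is None:
            continue
        native = description.find("wp:hasNativeContent", NS)
        results.append({
            "uri": resource.get("uri"),
            "native_content": (native.get("{%s}resource" % NS["rdf"])
                               if native is not None else None),
            "file_name": description.findtext(
                "wp:hasOriginalFileName", namespaces=NS),
            "size": description.findtext(
                "wp:hasOriginalFileSize", namespaces=NS),
            "source": description.findtext("dc:source", namespaces=NS),
        })
    return results
```

Values are returned as strings exactly as they appear in the response; converting the file size to an integer or parsing the dates is left to the caller.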

Known Limitations

N/A

Dependencies

List of all dependencies of this service:

org.ow2.weblab.webservices:folder-crawler-service:war:1.5.3-SNAPSHOT
+- org.ow2.weblab.components:folder-crawler:jar:1.8.3-SNAPSHOT:compile
|  \- org.ow2.weblab.components:content-manager:jar:1.8.3:compile
|  \- commons-io:commons-io:jar:2.0.1:compile
+- org.ow2.weblab.core.helpers:rdf-helper-jena:jar:1.3.2:compile
|  \- com.hp.hpl.jena:jena:jar:2.6.4:compile
|     +- com.hp.hpl.jena:iri:jar:0.8:compile
|     +- com.ibm.icu:icu4j:jar:3.4.4:compile
|     +- xerces:xercesImpl:jar:2.7.1:compile
|     +- org.slf4j:slf4j-api:jar:1.5.8:compile
|     +- org.slf4j:slf4j-log4j12:jar:1.5.8:runtime
|  \- log4j:log4j:jar:1.2.13:runtime
+- org.ow2.weblab.core:model:jar:1.2.2:compile
+- org.ow2.weblab.core:extended:jar:1.2.2:compile
+- org.ow2.weblab.core:annotator:jar:1.2.4:compile
|  \- joda-time:joda-time:jar:1.6.2:compile
+- org.apache.cxf:cxf-rt-frontend-jaxws:jar:2.4.0:compile
|  +- xml-resolver:xml-resolver:jar:1.2:compile
|  +- asm:asm:jar:3.3:compile
|  +- org.apache.cxf:cxf-api:jar:2.4.0:compile
|  |  +- org.apache.cxf:cxf-common-utilities:jar:2.4.0:compile
|  |  +- org.apache.ws.xmlschema:xmlschema-core:jar:2.0:compile
|  |  \- org.apache.neethi:neethi:jar:3.0.0:compile
|  |  +- wsdl4j:wsdl4j:jar:1.6.2:compile
|  |  \- org.codehaus.woodstox:woodstox-core-asl:jar:4.1.1:compile
|  |  \- org.codehaus.woodstox:stax2-api:jar:3.0.2:compile
|  +- org.apache.cxf:cxf-rt-core:jar:2.4.0:compile
|  |  +- com.sun.xml.bind:jaxb-impl:jar:2.1.13:compile
|  |  \- org.apache.geronimo.specs:geronimo-javamail_1.4_spec:jar:1.7.1:compile
|  +- org.apache.cxf:cxf-rt-bindings-soap:jar:2.4.0:compile
|  |  +- org.apache.cxf:cxf-tools-common:jar:2.4.0:compile
|  |  \- org.apache.cxf:cxf-rt-databinding-jaxb:jar:2.4.0:compile
|  +- org.apache.cxf:cxf-rt-bindings-xml:jar:2.4.0:compile
|  +- org.apache.cxf:cxf-rt-frontend-simple:jar:2.4.0:compile
|  \- org.apache.cxf:cxf-rt-ws-addr:jar:2.4.0:compile
+- org.apache.cxf:cxf-rt-transports-http:jar:2.4.0:compile
|  +- org.apache.cxf:cxf-rt-transports-common:jar:2.4.0:compile
|  \- org.springframework:spring-web:jar:3.0.5.RELEASE:compile
|     +- aopalliance:aopalliance:jar:1.0:compile
|     +- org.springframework:spring-beans:jar:3.0.5.RELEASE:compile
|     +- org.springframework:spring-context:jar:3.0.5.RELEASE:compile
|     |  +- org.springframework:spring-aop:jar:3.0.5.RELEASE:compile
|     |  +- org.springframework:spring-expression:jar:3.0.5.RELEASE:compile
|     |  \- org.springframework:spring-asm:jar:3.0.5.RELEASE:compile
|     \- org.springframework:spring-core:jar:3.0.5.RELEASE:compile
+- xalan:xalan:jar:2.7.1:compile
|  \- xalan:serializer:jar:2.7.1:compile
|     \- xml-apis:xml-apis:jar:1.3.04:compile
+- commons-logging:commons-logging:jar:1.1.1:compile
+- junit:junit:jar:4.8.2:test
\- javax.servlet:servlet-api:jar:2.4:provided