Bundle 1.2.6


The bundle gathers a coherent set of services and portlets around the WebLab platform to demonstrate its capabilities in terms of service integration and orchestration for unstructured document processing and retrieval. The bundle can crawl a local folder (toIndex) to analyze text-based documents, index them, and finally offer access to them through a portal. The processing capabilities are limited (only the default rules of the named-entity extraction engine are used), but the bundle provides a complete processing chain and eases the integration and testing of new components, either in the processing chain or in the user interface.

This bundle is regularly released on the Download page and built nightly with the latest services and portlets, see [1]. The WebLab team uses the latest Bundle to make sure that new services, portlets and components satisfy the integration and compatibility rules.


Overview

WebLab Bundle 1.2.6 provides the following features:

 * a configured Apache Tomcat server to deploy new services
 * a configured Liferay server to deploy new portlets
 * a configured Petals ESB to create new chains of services 

To ease integration, we provide the following basic functionalities:

 * Desktop document processing
 * WARC processing
 * Metadata extraction
 * Text and Metadata search
 * Entity extraction (People, Organisation, Location)
 * Annotated Document view

Installation

1. Download the latest Bundle.
2. Unzip the archive named WebLab-Bundle-XXX.zip

You're done!

The top-level directory structure should be:

WebLab-Bundle-1.2.6
├── apache-tomcat-7.0.26
│   └── [...]
├── conf
│   ├── contentManager.properties
│   ├── exposed-configuration
│   ├── registry.xml
│   ├── samples
│   ├── services
│   └── solr
├── data
│   ├── toIndex
│   └── warcs
├── liferay-portal-6.1.0-ce-ga1
│   └── [...]
├── petals-esb-4.0
│   └── [...]
├── README.txt
├── weblab.bat
├── weblab-launcher.jar
└── weblab.sh
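
The installation can also be scripted. The following is a minimal Python sketch, not part of the Bundle: it unzips the archive and checks that the top-level entries from the listing above are present. The function names are illustrative, and the archive name must be replaced by the actual file you downloaded.

```python
import zipfile
from pathlib import Path

# Top-level files and directories taken from the directory listing above
# (the versioned server directories are left out on purpose).
EXPECTED_ENTRIES = ("conf", "data", "README.txt", "weblab.bat",
                    "weblab-launcher.jar", "weblab.sh")


def extract_bundle(archive, destination):
    """Unzip the bundle archive into `destination` and return that directory."""
    destination = Path(destination)
    with zipfile.ZipFile(archive) as bundle:
        bundle.extractall(destination)
    return destination


def missing_entries(bundle_home):
    """Return the expected top-level entries that are absent from `bundle_home`."""
    home = Path(bundle_home)
    return [name for name in EXPECTED_ENTRIES if not (home / name).exists()]


# Hypothetical usage (archive name is a placeholder):
#   extract_bundle("WebLab-Bundle-XXX.zip", ".")
#   print(missing_entries("WebLab-Bundle-1.2.6"))
```

Python's zipfile module does not restore Unix permission bits, so weblab.sh may need to be made executable after extracting the archive this way.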

Launcher

The WebLab Bundle is controlled through the WebLab Launcher, which provides the following commands:

  • weblab.sh start or weblab.bat start will start the Bundle
  • weblab.sh stop or weblab.bat stop will stop the Bundle
  • weblab.sh status or weblab.bat status will give you status for each part of the Bundle

The WebLab Launcher has more advanced features, documented on the WebLab Launcher page.
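
When the Bundle is driven from a script that must run on both Linux and Windows, the choice between weblab.sh and weblab.bat can be automated. The sketch below is only an illustration: the helper names are hypothetical, and it covers just the three sub-commands listed above. The meaning of the launcher's exit code is not documented here, so it is returned as-is.

```python
import platform
import subprocess
from pathlib import Path

ACTIONS = ("start", "stop", "status")  # sub-commands listed above


def launcher_command(bundle_home, action, system=None):
    """Build the launcher command line for the current (or given) platform."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    system = system or platform.system()
    script = "weblab.bat" if system == "Windows" else "weblab.sh"
    # Use an absolute path so the command works whatever the working directory is.
    return [str(Path(bundle_home).resolve() / script), action]


def run_launcher(bundle_home, action):
    """Run the launcher from the bundle home and return its exit code."""
    command = launcher_command(bundle_home, action)
    return subprocess.run(command, cwd=bundle_home).returncode


# Hypothetical usage:
#   run_launcher("WebLab-Bundle-1.2.6", "start")
#   run_launcher("WebLab-Bundle-1.2.6", "status")
```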

First Steps

The *data* directory contains sample resources that let you test the WebLab Bundle functionalities.

 * *toIndex* directory contains pdf files about WebLab
 * *warcs* directory contains a WARC
   <https://webarchive.jira.com/wiki/display/Heritrix/Heritrix> file
   crawled from weblab-project.org

A typical use is:

1. add documents (word, pdf ...) in the *data/toIndex* directory
2. open your browser at http://localhost:8080
3. click on the *Index* tab
4. click on the button to execute the processing chain (see
   below) on your documents
5. wait until the first documents have been indexed
6. click on the *Search* tab to search for your documents
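
Step 1 above can be automated when documents come from another location. This is a minimal sketch with a hypothetical helper name; it only copies files into *data/toIndex* and leaves the indexing itself to the *Index* tab of the portal.

```python
import shutil
from pathlib import Path


def stage_documents(documents, bundle_home):
    """Copy each document into <bundle_home>/data/toIndex and return the new paths."""
    target = Path(bundle_home) / "data" / "toIndex"
    target.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 returns the path of the copied file inside the target directory.
    return [Path(shutil.copy2(doc, target)) for doc in documents]


# Hypothetical usage (file names are placeholders):
#   stage_documents(["report.pdf", "notes.doc"], "WebLab-Bundle-1.2.6")
```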


Details

WebLab Bundle allows you to process desktop files (word, pdf ...) and warcs [2].

Included services:

 * folder-listener
 * tika-nomalizer
 * simple-gazetteer
 * gate-extraction
 * solr-engine 

Included portlets:

 * launch-chain
 * simple-search
 * facet
 * result
 * metadata
 * document-viewer 

The following processing chain is executed in the Bundle:

[Figure: weblab-processing-chain.png]

These services and chains are defined and configured in the WebLab Registry.


Bundle configuration

The configuration files are located in the *conf* directory of the WebLab Bundle home.

Hierarchy:

conf
├── contentManager.properties
├── camelBeans.xml
├── registry.xml
├── exposed-configuration
│   ├── weblab-client.xml
│   └── weblab-portlet-filters.xml
├── samples
│   ├── cxf-bean.xml.sample
│   ├── server.xml
│   ├── sa-jbi.sample
│   ├── su-jbi.consumer
│   └── su-jbi.sample
├── services
│   ├── folder-listener.cxfBeanFile.xml
│   └── simple-file-repository.cxf-servlet.xml
└── solr
[...]

HowTos

The following how-tos answer the most frequently asked questions about the WebLab Bundle.


How to change the Tomcat/Liferay version?

To change the Tomcat or Liferay server version:

1. unzip the WebLab-Bundle archive
2. back up the web applications you want to keep (for example:
   web-services from Tomcat)
3. remove the server you no longer want (for example: apache-tomcat-7.0.26)
4. replace it with your version (for example: apache-tomcat-my-version)
5. update weblab.properties to point to your server home; see Custom
   server </index.php?title=WebLab_1.2.5/WebLab_Launcher#Custom> (for
   example: /path/to/WebLab-Bundle-1.2.5/apache-tomcat-my-version)


How to change the directory monitored for documents to process?

To change the directory:

1. set *weblab.files* in weblab.properties (see Custom files
   directory
   </index.php?title=WebLab_1.2.5/WebLab_Launcher#Tomcat_Custom>) to
   the path of the directory.
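
The property can be edited by hand or with a small script. The sketch below is an illustration under several assumptions: the location of weblab.properties is passed in by the caller, the file uses plain `key=value` lines, and the helper name is hypothetical. It does not handle the other separators or the escaping rules of Java properties files.

```python
from pathlib import Path


def set_property(properties_file, key, value):
    """Set `key=value` in a properties file, replacing an existing entry if any."""
    path = Path(properties_file)
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    updated, found = [], False
    for line in lines:
        stripped = line.lstrip()
        is_comment = stripped.startswith(("#", "!"))
        name = stripped.split("=", 1)[0].strip()
        if not is_comment and "=" in stripped and name == key:
            updated.append(f"{key}={value}")  # replace the existing entry
            found = True
        else:
            updated.append(line)  # keep comments and other keys untouched
    if not found:
        updated.append(f"{key}={value}")
    path.write_text("\n".join(updated) + "\n", encoding="utf-8")


# Hypothetical usage (both paths are placeholders):
#   set_property("/path/to/weblab.properties", "weblab.files", "/srv/documents")
```

Backslash is an escape character in Java properties files, so forward slashes are the safer choice for Windows paths.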


How to add a new service/portlet?

You may want to go further by adding Services or Portlets to this demonstration. To do so, you can use the WebLab Developer Dashboard to learn how to develop services, portlets and chains. You can also ask questions on the WebLab mailing list: user@weblab-project.org

NB: the "data/toIndex" and "data/warcs" folders contain sample files provided for testing purposes.

How to add a new endpoint referring to a WebLab service?

1. Make sure your service is available
2. Copy the URL of the WSDL exposed by your service (e.g.
   http://localhost:8181/simple-gazetteer/simple-gazetteer?wsdl )
3. Execute the following command:

weblab.sh chain add your_wsdl_url or weblab.bat chain add your_wsdl_url

You are done. You can verify that the endpoint is available by checking the Petals status with:

weblab.sh petals status or weblab.bat petals status
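
Steps 1 to 3 can be combined in a script that refuses to register a service whose WSDL does not answer. This is a sketch only: the helper names are hypothetical, the reachability test is a plain HTTP GET expecting status 200, and the command line is the `chain add` form shown above.

```python
import http.client
import urllib.request
from urllib.parse import urlparse


def wsdl_is_reachable(wsdl_url, timeout=5.0):
    """Return True if an HTTP GET on the WSDL URL answers with status 200."""
    try:
        with urllib.request.urlopen(wsdl_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Covers refused connections, HTTP errors and malformed URLs.
        return False


def chain_add_command(launcher, wsdl_url):
    """Build the `chain add` command line for the given launcher script."""
    parts = urlparse(wsdl_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an HTTP URL: {wsdl_url!r}")
    return [launcher, "chain", "add", wsdl_url]


# Hypothetical usage:
#   url = "http://localhost:8181/simple-gazetteer/simple-gazetteer?wsdl"
#   if wsdl_is_reachable(url):
#       print(chain_add_command("./weblab.sh", url))
```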

Other

If you have another question, send an email to *user@weblab-project.org*


Known Issues

Why does WebLab not process all my WARC files?

The WARC crawler in folder-listener has a garbage collector issue; see http://jira.ow2.org/browse/WEBLAB-798


I already have a Tomcat installed on my computer, why does WebLab not work?

WebLab also uses Tomcat. Since Tomcat applications may require the CATALINA_HOME environment variable to be defined, an existing value can conflict with the Bundle and prevent WebLab from starting. Please make sure that CATALINA_HOME points to the WebLab Tomcat.
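
The check can be done before starting the Bundle. The following sketch uses a hypothetical helper that compares CATALINA_HOME with the Tomcat directory of the Bundle; the path in the usage example is a placeholder.

```python
import os
from pathlib import Path


def catalina_home_conflict(bundle_tomcat, environ=None):
    """Return the conflicting CATALINA_HOME value, or None if there is no conflict."""
    environ = os.environ if environ is None else environ
    current = environ.get("CATALINA_HOME")
    if not current:
        return None  # variable unset or empty: nothing to compare
    if Path(current).resolve() == Path(bundle_tomcat).resolve():
        return None  # already points to the Bundle's Tomcat
    return current


# Hypothetical usage:
#   conflict = catalina_home_conflict("WebLab-Bundle-1.2.6/apache-tomcat-7.0.26")
#   if conflict:
#       print("CATALINA_HOME points elsewhere:", conflict)
```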


I cannot index documents and the portlet reports an error at startup. Why are some portlets missing or not deployed in Liferay?

Spring sometimes generates an invalid applicationContext.xml file that contains the wrong version number, which prevents portlets from deploying correctly. See http://jira.ow2.org/browse/WEBLAB-832 to fix the problem.


Why does PEtALS not start when there is a space in its path variable?

See http://jira.petalslink.com/browse/PETALSESBCONT-89
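
Until that issue is fixed, a pre-flight check on the installation path can avoid the problem. This is a trivial sketch with a hypothetical helper name; it only tests the resolved path for a space character.

```python
from pathlib import Path


def path_contains_space(path):
    """Return True if the absolute form of `path` contains a space."""
    return " " in str(Path(path).resolve())


# Hypothetical usage:
#   if path_contains_space("WebLab-Bundle-1.2.6/petals-esb-4.0"):
#       print("Move the Bundle to a path without spaces")
```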


When I click on the "Launching the indexing process !" button, I keep getting the following error message "Error during indexation process."

Something went wrong during WebLab initialisation. Check the WebLab status (with the command *weblab.bat status* or *weblab.sh status*): all elements must be *STARTED*. If not, please contact support at *user@weblab-project.org* and attach the *weblab.log* file.