Bundle 2.0.0

From WebLab Wiki
Revision as of 07:48, 13 May 2014 by Asaval

The bundle aims to gather coherent services and portlets around the WebLab platform to demonstrate its capabilities for service integration and orchestration in unstructured document processing and retrieval. The bundle can crawl a local folder (toIndex) to analyze text-based documents, index them, and then offer access to them through a portal. Its processing capabilities are limited (the named-entity extraction engine uses only its default rules), but it demonstrates a complete processing chain and eases the integration and testing of new components, either in the processing chain or in the user interface.

This bundle is released regularly on the Download page and built nightly with the latest services and portlets (see [1]). The WebLab team uses the latest Bundle to make sure that new services, portlets, and components satisfy the integration and compatibility rules.

Overview

WebLab Bundle integrates several platforms:

  • an Apache Tomcat server to deploy application services
  • a Liferay server to deploy user interface portlets
  • Apache Camel, an EIP (Enterprise Integration Patterns) messaging framework hosted in the lightweight OSGi container Apache Karaf, to manage service chains

This bundle provides the following features:

  • Desktop document processing
  • WARC processing
  • Metadata extraction
  • Text and Metadata search
  • Entity extraction (People, Organisation, Location)
  • Annotated Document view

The WebLab Bundle allows you to process desktop files (Word, PDF, ...) and WARC files.

Included services:

Included portlets:

The following processing chain is executed in the Bundle:

[Figure: WebLab Bundle processing chain (chaine.png)]

These services and chains are defined and configured in a Camel Context deployed in Karaf.

Installation

1. To install the WebLab Bundle, download the latest Bundle.
2. Unzip the archive named WebLab-Bundle-XXX.zip.

You're done!
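If you install the Bundle from a script, the two steps above can be sketched as a pair of shell functions. This is a minimal sketch, not part of the Bundle: the names install_bundle and find_launcher are hypothetical, and it assumes the archive has already been downloaded, that unzip is installed, and that weblab.sh lies somewhere inside the extracted tree (its exact location is not documented here).

```shell
# Sketch only: unpack a downloaded WebLab Bundle archive and locate its launcher.
# Assumptions: the archive is a local .zip file, `unzip` is installed, and
# weblab.sh sits somewhere inside the extracted tree.

# Print the path of the first weblab.sh found under DIR (hypothetical helper).
find_launcher() {
  local dir="$1"
  local launcher
  launcher=$(find "$dir" -type f -name weblab.sh -print -quit 2>/dev/null)
  if [ -z "$launcher" ]; then
    echo "weblab.sh not found under $dir" >&2
    return 1
  fi
  printf '%s\n' "$launcher"
}

# Unzip ARCHIVE into DEST, then print where the launcher ended up.
install_bundle() {
  local archive="$1" dest="$2"
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  unzip -q "$archive" -d "$dest" || return 1
  find_launcher "$dest"
}
```

For example, the following unpacks the archive into ~/weblab and prints the path of the launcher used in the next section:

 install_bundle ~/Downloads/WebLab-Bundle-XXX.zip ~/weblab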

Launcher

The WebLab Bundle is controlled through the WebLab Launcher.

Starting WebLab

To start WebLab on Linux/Mac:

 weblab.sh start 

To start WebLab on Windows:

 weblab.bat start 

Stopping WebLab

To stop WebLab on Linux/Mac:

 weblab.sh stop 

To stop WebLab on Windows:

 weblab.bat stop 
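On Linux/Mac, the stop and start commands above can be chained to restart WebLab, for instance after changing its configuration. This is a sketch under assumptions: the weblab_restart function and the WEBLAB_HOME variable (the directory containing weblab.sh, defaulting to the current directory) are hypothetical and not part of the Launcher.

```shell
# Hypothetical helper: restart WebLab with the documented stop/start commands.
# Assumption: WEBLAB_HOME points to the directory containing weblab.sh
# (defaults to the current directory).
weblab_restart() {
  local launcher="${WEBLAB_HOME:-.}/weblab.sh"
  if [ ! -x "$launcher" ]; then
    echo "launcher not found or not executable: $launcher" >&2
    return 1
  fi
  # start only runs if stop reported success.
  "$launcher" stop && "$launcher" start
}
```

Chaining with && means that a failing stop is reported rather than silently followed by a start. For example:

 WEBLAB_HOME=~/weblab weblab_restart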

Launcher Documentation

The WebLab Launcher has more advanced features, which are described in the Launcher documentation.

Configuration

WebLab configuration files are located in the conf directory. Configuration options are detailed in the Launcher documentation.

First Steps

The data directory provides sample resources for testing the WebLab Bundle features:

  • the toIndex directory contains PDF files about WebLab
  • the warcs directory contains a WARC file crawled from weblab-project.org
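To check that the sample data is in place before a first run, a short shell function can count the files in both directories. It is a sketch under stated assumptions: sample_data_summary is a hypothetical name, it takes the path of the data directory (default: data, relative to the current directory), it counts only PDF files in toIndex, and it treats files ending in .warc or .warc.gz as WARC files.

```shell
# Hypothetical helper: count the sample files shipped in the data directory.
# Assumptions: WARC files end in .warc or .warc.gz; only PDFs are counted in toIndex.
sample_data_summary() {
  local data="${1:-data}"
  local pdfs warcs
  # -iname matches .pdf and .PDF alike; a missing directory counts as zero files.
  # tr strips the padding that some wc implementations add.
  pdfs=$(find "$data/toIndex" -type f -iname '*.pdf' 2>/dev/null | wc -l | tr -d ' ')
  warcs=$(find "$data/warcs" -type f \( -name '*.warc' -o -name '*.warc.gz' \) 2>/dev/null | wc -l | tr -d ' ')
  echo "toIndex: $pdfs PDF file(s)"
  echo "warcs: $warcs WARC file(s)"
}
```

Run it from the bundle root as:

 sample_data_summary data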

A typical use is:

  1. add documents (Word, PDF, ...) to the data/toIndex directory
  2. open your browser at http://localhost:8080
  3. click on the Search tab to search for your documents
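The first two steps of this workflow can also be scripted. The sketch below defines two hypothetical helpers: add_documents copies files into data/toIndex under a given bundle directory, and wait_for_portal polls http://localhost:8080 with curl until the portal answers or a timeout (in seconds) expires. It assumes curl is installed, and it only checks that the portal responds, not that the new documents have already been indexed.

```shell
# Hypothetical helpers for the typical workflow; assumes curl is installed.

# Copy FILE... into BUNDLE_DIR/data/toIndex, the folder the bundle crawls.
add_documents() {
  if [ "$#" -lt 2 ]; then
    echo "usage: add_documents BUNDLE_DIR FILE..." >&2
    return 1
  fi
  local target="$1/data/toIndex"
  shift
  if [ ! -d "$target" ]; then
    echo "no data/toIndex directory at: $target" >&2
    return 1
  fi
  cp -- "$@" "$target/"
}

# Poll URL until it answers, sleeping 5 s between attempts;
# give up once the total sleep time reaches TIMEOUT seconds.
wait_for_portal() {
  local url="${1:-http://localhost:8080}" timeout="${2:-120}"
  local waited=0
  # -f: treat HTTP errors (>= 400) as failures; -s: no progress output.
  until curl -fs --max-time 5 -o /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "portal not reachable at $url after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "portal is up at $url"
}
```

For example, from the bundle root:

 add_documents . ~/Documents/report.pdf
 wait_for_portal http://localhost:8080 180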

Getting Started (Screencast)

This video demonstrates WebLab Bundle capabilities: