Bundle 2.0.0/ESB-Management

From WebLab Wiki

The Enterprise Service Bus (ESB) is composed of:

  • Karaf, a lightweight OSGi container,
  • ActiveMQ, a messaging and Integration Patterns server,
  • Camel, an integration framework based on Enterprise Integration Patterns (EIP) for defining routing and mediation rules,
  • Hawtio, a web console.

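The ESB supports two main functions: gathering data and converting it into WebLab Documents, and routing those documents through a processing chain of WebLab services.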

Overview

Karaf

Karaf provides the OSGi framework in which ActiveMQ, Camel, and Hawtio are installed. It also supports OSGi bundle deployment, which is how new processing chains or service definitions are added.

A simple way to deploy a new OSGi bundle or Camel XML file is to copy it to the deploy directory in Karaf.
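
As a rough illustration of what such a Camel XML file might look like, here is a minimal sketch of a Spring XML file wrapping a single route. The Spring and Camel namespaces are assumptions, since the snippets on this page elide their enclosing elements; the context id, route id and timer endpoint are hypothetical and are not part of the WebLab Bundle.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the namespaces are assumed, the ids and the timer endpoint are hypothetical -->
<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="
		http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
		http://camel.apache.org/schema/spring http://camel.apache.org/schema/spring/camel-spring.xsd">

	<camelContext id="exampleContext" xmlns="http://camel.apache.org/schema/spring">
		<route id="exampleDeployCheck" autoStartup="true">
			<!-- Fire once a minute so the deployment can be checked in the log -->
			<from uri="timer:exampleDeployCheck?period=60000" />
			<log loggingLevel="INFO" message="Example route is deployed and running" />
		</route>
	</camelContext>
</beans>
```

Once the file is copied to the deploy directory, the log message should appear in the Karaf log roughly once a minute, provided the matching Karaf deployer and Camel features are installed.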

ActiveMQ

ActiveMQ provides a fast and reliable messaging system. It is mostly used to manage JMS queues and to transfer messages.

Camel

Camel lets users define their own routes and manage the flow of information. If you intend to add new services or define new processing chains, we advise you to read the Camel documentation, especially the Enterprise Integration Patterns documentation.

We call a "Processing Chain" the combination of routing pattern definitions and service definitions.

In the following figure, the purple arrows define routing patterns and circles are service definitions:

chaine.png

Here is an example of an XML declaration of the Camel route used by the WebLab Processing Chain:

[...]
<route id="processWeblabDocsFromQueue" streamCache="true" autoStartup="true">

	<!-- Consuming messages -->
	<from uri="jms:queue:weblabdocsin?concurrentConsumers=1" />

	<log loggingLevel="INFO" message="New incoming WL resource to process" />
		
	<!-- ============= -->
	<!-- SERVICE CALLS -->
	<!-- ============= -->

	<!-- call Tika service -->
	<to uri="weblab:analyser:service-tika" />
	<!-- call NGramJ service -->
	<to uri="weblab:analyser:service-ngramj" />
	<!-- call Gate service -->
	<to uri="weblab:analyser:service-gate" />

	<!-- call Solr and the repository services in parallel -->
	<multicast parallelProcessing="true">
		<to uri="weblab:indexer:service-solr" /> 
		<to uri="weblab:resourceSaver:service-repository" />
	</multicast>
		
</route>
[...]

You will find more explanations on this route below.

Hawtio

Hawtio is a web interface dedicated to WebLab administrators. It allows you to manage OSGi bundles, ActiveMQ JMS queues, Camel routes and much more.

By default, the Hawtio web console is available at the following address: http://localhost:8282/hawtio (login/password: weblab/weblab).

The following figure is an example of the Camel route above deployed in Karaf:

hawtio-processing-chain.png

Data gathering to WebLab Documents

This part explains how to fetch data and convert it into one or more WebLab Documents ready to be processed by WebLab services.

It corresponds to the blue box in the following figure:

chaine-gathering.png


For each data source we want to listen to, we apply the following pattern:

  1. connect to the data source and fetch available data,
  2. convert the data or part of the data into a WebLab Document,
  3. send each WebLab Document into a JMS queue named weblabdocsin.

We split data gathering from data processing in order to improve scalability. Users can define as many data sources as they want and send all the WebLab Documents to one or more processing chains. Since all WebLab Documents are sent to JMS queues, users can also add several consumers on these queues.
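
A minimal sketch of adding consumers, assuming the same queue name and the concurrentConsumers option already used by the processing route on this page. The route id and the value 4 are illustrative only.

```xml
<route id="processWeblabDocsInParallel" streamCache="true" autoStartup="true">

	<!-- Four consumers read from the same queue; each message is delivered to only one of them -->
	<from uri="jms:queue:weblabdocsin?concurrentConsumers=4" />

	<log loggingLevel="INFO" message="New incoming WL resource to process" />

	<!-- Service calls go here, as in the processing chain route -->
	<to uri="weblab:analyser:service-tika" />

</route>
```

With several consumers, documents are no longer guaranteed to be processed in the order in which they were queued.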


In the WebLab Bundle, we can process desktop files and WARC files from their respective directories:

  1. these directories (/path/to/weblab/data/toIndex and /path/to/weblab/data/warcs) are our two data sources,
  2. for each file from those directories, we create WebLab Documents (see how to create WebLab Document for more details),
  3. we send those WebLab Documents to the JMS queue weblabdocsin.

The following XML defines this process:

<!-- This route consumes Files -->
<route id="consumeFile" streamCache="true" autoStartup="true">

	<!-- Read each file from the /path/to/weblab/data/toIndex directory with the Camel file component (see http://camel.apache.org/file2.html) -->
	<from uri="file:/path/to/weblab/data/toIndex" />

	<log message="new file to process ..." />

	<!-- Prepare annotations to add to the WebLab Document (its source, gathering date and size) -->
	<setHeader headerName="weblab:dc:source">
		<simple>${headers.CamelFileAbsolutePath}</simple>
	</setHeader>
	<setHeader headerName="weblab:wlp:hasGatheringDate">
		<simple resultType="java.util.Date">${date:now:yyyy-MM-dd'T'HH:mm:ss.SSSZ}</simple>
	</setHeader>
	<setHeader headerName="weblab:dc:modified">
		<simple resultType="java.util.Date">${header.CamelFileLastModified}</simple>
	</setHeader>
	<setHeader headerName="weblab:wlp:hasOriginalFileSize">
		<simple>${header.CamelFileLength}</simple>
	</setHeader>

	<!-- Convert the given file to a java.io.File -->
	<convertBodyTo type="java.io.File" />

	<!-- To produce a WebLab Document -->
	<to uri="weblab://create?type=Document&amp;outputMethod=xml" />

	<convertBodyTo type="java.lang.String" />

	<!-- Send the WebLab Document to a JMS queue -->
	<inOnly uri="jms:queue:weblabdocsin" />
</route>

The result of this route on the file /path/to/weblab/data/toIndex/WebLab ICSSEA08.pdf is:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<resource xsi:type="ns3:Document" uri="http://resource_3978a8a5-ad81-4f40-910c-ab8ad3f489ee" xmlns:ns3="http://weblab.ow2.org/core/1.2/model#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <annotation uri="http://resource_3978a8a5-ad81-4f40-910c-ab8ad3f489ee#a0">
        <data>
            <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:wlp="http://weblab.ow2.org/core/1.2/ontology/processing#">
                <rdf:Description rdf:about="http://resource_3978a8a5-ad81-4f40-910c-ab8ad3f489ee" xmlns:dc="http://purl.org/dc/elements/1.1/">
                    <wlp:hasNativeContent rdf:resource="file:/path/to/weblab/data/content/weblab.6472339864147457026.content"/>
                    <wlp:hasOriginalFileSize rdf:datatype="http://www.w3.org/2001/XMLSchema#long">670707</wlp:hasOriginalFileSize>
                    <dc:source>/path/to/weblab/data/toIndex/WebLab ICSSEA08.pdf</dc:source>
                    <dc:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2014-02-17T14:39:00+01:00</dc:modified>
                    <wlp:hasGatheringDate rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2014-02-17T15:59:05.857+01:00</wlp:hasGatheringDate>
                </rdf:Description>
            </rdf:RDF>
        </data>
    </annotation>
</resource>

WebLab Document processing chain

This part explains how WebLab Documents are routed and processed through WebLab services.

It corresponds to the blue box in the following figure:

chaine-processing.png

The WebLab Document is the central structure: it is sent from one WebLab service to the next, and each service enriches it with annotations produced by processing and analysing its content. To achieve this, you have to:

  1. register all WebLab services as OSGi services
  2. create a Camel route using these services

Service definition

WebLab services are deployed in Tomcat. You can access them here: http://localhost:8181/manager/html (login/password: weblab/weblab). To register a WebLab service in the ESB, you have to define it as an OSGi service in Karaf.

There are already some WebLab services defined as OSGi services referencing CXF web services deployed on Tomcat. Here is the definition of the language extraction service (using ngramj):

[...]

<!-- CXF Endpoint referring to the language extraction service deployed on Apache Tomcat at http://localhost:8181/ngramj-language-extraction/analyser -->
<cxf:cxfEndpoint id="ngramj"
	address="http://localhost:8181/ngramj-language-extraction/analyser"
	serviceClass="org.ow2.weblab.core.services.Analyser">
	<cxf:properties>
		<entry key="dataFormat" value="PAYLOAD" />
		<entry key="allowStreaming" value="true" />
	</cxf:properties>
</cxf:cxfEndpoint>

<!-- The OSGi service referencing the previous CXF Endpoint -->
<osgi:service id="osgi-service-ngramj" ref="ngramj" interface="org.apache.camel.Endpoint">
	<osgi:service-properties>
		<entry key="name" value="service-ngramj"/><!-- We will use this name when we refer to the service in Camel routes -->
	</osgi:service-properties>
</osgi:service>

[...]

All WebLab services can be referenced using their OSGi name property. By default those services are deployed with the following names:

  • weblab service tika with the OSGi service name: service-tika,
  • weblab service ngramj with the OSGi service name: service-ngramj,
  • weblab service gate with the OSGi service name: service-gate,
  • weblab service solr with the OSGi service name: service-solr,
  • weblab service simple-repository with the OSGi service name: service-repository.
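
Registering a further service follows the same two-step pattern. The sketch below is a complete file under stated assumptions: the Spring, Spring DM (osgi) and Camel CXF (cxf) namespaces are inferred from the prefixes used above, and the service name service-example, its bean ids and its address are hypothetical placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the namespaces are assumed, "example" and its address are hypothetical -->
<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns:osgi="http://www.springframework.org/schema/osgi"
	xmlns:cxf="http://camel.apache.org/schema/cxf"
	xsi:schemaLocation="
		http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
		http://www.springframework.org/schema/osgi http://www.springframework.org/schema/osgi/spring-osgi.xsd
		http://camel.apache.org/schema/cxf http://camel.apache.org/schema/cxf/camel-cxf.xsd">

	<!-- Step 1: CXF endpoint pointing at the web service deployed on Tomcat -->
	<cxf:cxfEndpoint id="example"
		address="http://localhost:8181/example-service/analyser"
		serviceClass="org.ow2.weblab.core.services.Analyser">
		<cxf:properties>
			<entry key="dataFormat" value="PAYLOAD" />
			<entry key="allowStreaming" value="true" />
		</cxf:properties>
	</cxf:cxfEndpoint>

	<!-- Step 2: expose the endpoint as an OSGi service with a name usable in Camel routes -->
	<osgi:service id="osgi-service-example" ref="example" interface="org.apache.camel.Endpoint">
		<osgi:service-properties>
			<entry key="name" value="service-example" />
		</osgi:service-properties>
	</osgi:service>

</beans>
```

A route could then refer to this service as weblab:analyser:service-example, in the same way the routes on this page refer to service-ngramj.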

Route definition

Using the exposed OSGi services, we can create a processing chain that refers to each service by its OSGi name property:

[...]
<route id="processWeblabDocsFromQueue" streamCache="true" autoStartup="true">

	<!-- Consuming messages -->
	<from uri="jms:queue:weblabdocsin?concurrentConsumers=1" />

	<log loggingLevel="INFO" message="New incoming WL resource to process" />
		
	<!-- ============= -->
	<!-- SERVICE CALLS -->
	<!-- ============= -->

	<!-- call Tika service -->
	<to uri="weblab:analyser:service-tika" />
	<!-- call NGramJ service -->
	<to uri="weblab:analyser:service-ngramj" />
	<!-- call Gate service -->
	<to uri="weblab:analyser:service-gate" />

	<!-- call Solr and the repository services in parallel -->
	<multicast parallelProcessing="true">
		<to uri="weblab:indexer:service-solr" /> 
		<to uri="weblab:resourceSaver:service-repository" />
	</multicast>
		
</route>
[...]

We use the WebLab Endpoint to call services with the following syntax:

weblab:service-interface:OSGi-service-name-property

In this route the chain processes the document in several steps:

  1. it retrieves a document from the JMS queue weblabdocsin,
  2. it normalises the document content with service-tika,
  3. it detects the language of the document with service-ngramj,
  4. it extracts named entities with service-gate,
  5. finally, it indexes the annotated document with service-solr and stores it with service-repository, both at the same time.
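
As a hedged illustration of the syntax, a shorter chain can be assembled from the same default service names. This variant is hypothetical and is not a route shipped with the WebLab Bundle: it skips language detection and entity extraction, and only normalises, indexes and stores each document. It reads from an illustrative queue named weblabdocsfast so that it does not compete with the route above for messages.

```xml
<route id="indexOnlyChain" streamCache="true" autoStartup="true">

	<!-- Hypothetical queue, separate from weblabdocsin -->
	<from uri="jms:queue:weblabdocsfast?concurrentConsumers=1" />

	<!-- weblab:<service interface>:<OSGi service name property> -->
	<to uri="weblab:analyser:service-tika" />

	<!-- Index and store in parallel, as in the full chain -->
	<multicast parallelProcessing="true">
		<to uri="weblab:indexer:service-solr" />
		<to uri="weblab:resourceSaver:service-repository" />
	</multicast>

</route>
```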

More details are available in Camel-WebLab services definition and call.

See Also