Gate-extraction/2.5.0

Gate extraction service
Details
Service Interfaces: Analyser, Configurable
Exchange model: WebLab 1.2.5
Licence: LGPL 2.1
Supported OS: Windows/Linux/MacOs
Integrated COTS: Gate
Binary: gate-extraction-2.5.0.war
Sources: gate-extraction-2.5.0-sources.jar
Javadoc: gate-extraction-2.5.0-javadoc.jar
SVN: gate-extraction
Maven Artifact:

<groupId>org.ow2.weblab.webservices</groupId>
<artifactId>gate-extraction</artifactId>
<version>2.5.0</version>
Release Note


This service is an integration of the well-known text processing framework GATE.

It can load a GATE application of type CorpusAnalyser or ConditionalCorpusAnalyser from a GAPP XML file (sometimes given the .xgapp extension).

Each text section received by the process method is converted to a GATE Document and processed by the loaded GATE application.

At the end, the GATE Annotations carried by each Document are written onto the WebLab Document and Text units by an extensible class, GateConverter.

Configuration

There are two ways to configure the behavior of this service.

  • Use the constructor of the class (through the CXF/Spring bean definition) to configure the four main settings:
    • gateHomePath: The path to the GATE home folder (provided as a Spring Resource).
    • defaultGappFile: The default GAPP file (provided as a Spring Resource), used when no specific configuration has been set through the Configurable interface (see below).
    • pluginsPath: The path to the plugins repository (provided as a Spring Resource).
    • converter: An instance of the GateConverter in charge of reading GATE documents and annotating WebLab texts.


  • Finally, the Configurable interface can be used to set specific values for a given usageContext. Values defined through the constructor are the defaults; they are used when the usage context is unknown or has not been configured. The configuration only allows changing the GAPP file used for a specific context.


<configuration xmlns:model="http://weblab.ow2.org/core/1.2/model#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" uri="weblab:///configuration/t" xsi:type="model:PieceOfKnowledge">
	<data>
		<rdf:RDF xmlns:gateConf="http://weblab.ow2.org/core/1.2/ontology/processing#gate/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
			<rdf:Description rdf:about="http://weblab.ow2.org/webservices/gateservice">
				<gateConf:gappFilePath>src/main/resources/eng_fre.gapp</gateConf:gappFilePath>
			</rdf:Description>
		</rdf:RDF>
	</data>
</configuration>


  • On top of that, GAPP files are highly configurable applications that can be defined through the GATE UI and exported as XML GAPP files. They let you define, for instance, specific plugins (LanguageResource) to be executed over documents of a given language. Two other files can be overridden for configuration purposes: gate.xml and user-gate.xml. Both must be in the GATE home folder.

UsageContext effects

The usageContext is the key used to select the GATE pipeline that should process the input document.

If the usageContext is unknown (or null), a default application is loaded. If the usageContext has been set up through the configure method but not yet used by the process method, the configured application is loaded. If the usageContext has already been used by the process method, the loaded application is reused. The execution of a GATE application is protected by a synchronised block, since GATE applications might not be thread safe.
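
The selection rules above can be illustrated with a small Python sketch. This is not the service's code (the service itself is a Java web application): the class name PipelineRegistry, the load_application callable and the choice to drop a cached application when its context is reconfigured are all assumptions made for the illustration. A lock plays the role of the synchronised block.

```python
import threading


class PipelineRegistry:
    """Hypothetical model of the per-usageContext application cache."""

    def __init__(self, default_gapp, load_application):
        self._default_gapp = default_gapp
        self._load = load_application  # callable: GAPP path -> application
        self._configured = {}          # usageContext -> GAPP path set by configure
        self._loaded = {}              # usageContext (None = default) -> application
        self._lock = threading.Lock()  # stands in for the synchronised block

    def configure(self, usage_context, gapp_path):
        with self._lock:
            self._configured[usage_context] = gapp_path
            # Assumption: reconfiguring a context discards its cached application.
            self._loaded.pop(usage_context, None)

    def process(self, usage_context, text):
        with self._lock:
            # Unknown or null contexts fall back to the default application.
            key = usage_context if usage_context in self._configured else None
            if key not in self._loaded:
                gapp = self._configured.get(key, self._default_gapp)
                self._loaded[key] = self._load(gapp)
            # Execution stays inside the lock: the application may not be thread safe.
            return self._loaded[key](text)


# Usage with a fake loader that records which GAPP files get loaded.
loaded_paths = []

def fake_load(path):
    loaded_paths.append(path)
    return lambda text: (path, text)

registry = PipelineRegistry("default.gapp", fake_load)
registry.process(None, "first call")       # loads the default application
registry.process("unknown", "second")      # reuses the default application
registry.configure("french", "french.gapp")
registry.process("french", "troisième")    # loads the configured application
registry.process("french", "quatrième")    # reuses it
```

Holding the lock during execution serialises all calls, which is the simplest way to stay safe when the wrapped application gives no thread-safety guarantee.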

Examples of SOAP Input/Output

Analyser:process

Input

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:anal="http://weblab.ow2.org/core/1.2/services/analyser">
   <soapenv:Header/>
   <soapenv:Body>
      <anal:processArgs>
         <resource xsi:type="model:Document" uri="weblab://SmallEnglishTest/1" xmlns:model="http://weblab.ow2.org/core/1.2/model#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <annotation uri="weblab://SmallEnglishTest/1#0-a2">
               <data>
                  <rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
                     <rdf:Description rdf:about="weblab://SmallEnglishTest/1">
                        <dc:language>en</dc:language>
                     </rdf:Description>
                  </rdf:RDF>
               </data>
            </annotation>
            <mediaUnit xsi:type="model:Text" uri="weblab://SmallEnglishTest/1#0">
               <content>WebLab: An integration infrastructure to ease the development of multimedia processing applications
Patrick GIROUX, Stephan BRUNESSAUX, Sylvie BRUNESSAUX, Jérémie DOUCY, Gérard DUPONT, Bruno GRILHERES, Yann MOMBRUN, Arnaud SAVAL
Information Processing Control and Cognition (IPCC)
EADS Defence and Security Systems
Parc d'Affaire des Portes
27106 Val de Reuil
http://weblab-project.org
ipcc@weblab-project.org
{patrick.giroux, stephan.brunessaux, sylvie.brunessaux, jeremie.doucy, gerard.dupont, bruno.grilheres, yann.mombrun, arnaud.saval}@eads.com
Abstract:
In this paper, we introduce the EADS' WebLab platform (http://weblab-project.org) that aims at providing an integration infrastructure for multimedia information processing components. In the following, we explain the motivations that have led to the realisation of this project within EADS and the requirements that have led our choices. After a quick review of existing information processing platforms, we present the chosen service oriented architecture, and the three layers of the WebLab project (infrastructure, services and applications).
Then, we detail the chosen exchange model and normalised services interfaces that enable semantic interoperability between information processing components. We present the technical choices made to guarantee technical interoperability between the components by the use of an Enterprise Service Bus (ESB).
Moreover, we present the orchestration and portal mechanisms that we have added to the WebLab to enable architects to quickly build multimedia processing applications. In the following, we illustrate the integration process by describing three applications that have been developed on top of this architecture on three R&amp;D projects (Vitalas, WebContent and eWok-Hub). Finally, we propose some perspectives such as the realisation of an information processing services directory, or a toolkit following MDA (Model Driven Architecture) approach to ease the integration process.
Keywords:
Integration infrastructure, Service Oriented Architecture, Semantics, Multimedia Information Processing Platform.</content>
            </mediaUnit>
         </resource>
      </anal:processArgs>
   </soapenv:Body>
</soapenv:Envelope>

Output

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <ns3:processReturn xmlns:ns2="http://weblab.ow2.org/core/1.2/model#" xmlns:ns3="http://weblab.ow2.org/core/1.2/services/analyser">
         <resource xsi:type="ns2:Document" uri="weblab://SmallEnglishTest/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <annotation uri="weblab://SmallEnglishTest/1#0-a2">
               <data xmlns:model="http://weblab.ow2.org/core/1.2/model#">
                  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
                     <rdf:Description rdf:about="weblab://SmallEnglishTest/1">
                        <dc:language>en</dc:language>
                     </rdf:Description>
                  </rdf:RDF>
               </data>
            </annotation>
            <mediaUnit xsi:type="ns2:Text" uri="weblab://SmallEnglishTest/1#0">
               <annotation uri="weblab://SmallEnglishTest/1#0-a0">
                  <data>
                     <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dct="http://purl.org/dc/terms/" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:wlp="http://weblab.ow2.org/core/1.2/ontology/processing#" xmlns:wlr="http://weblab.ow2.org/core/1.2/ontology/retrieval#" xmlns:wookie="http://weblab.ow2.org/wookie#">
                        <rdf:Description rdf:about="http://weblab.ow2.org/wookie/instances/Unit#enterprise_service">
                           <rdfs:label>Enterprise Service</rdfs:label>
                           <rdf:type rdf:resource="http://weblab.ow2.org/wookie#Unit"/>
                        </rdf:Description>
                        <rdf:Description rdf:about="http://weblab.ow2.org/wookie/instances/Person#patrick_giroux">
                           <rdfs:label>Patrick GIROUX</rdfs:label>
                           <rdf:type rdf:resource="http://weblab.ow2.org/wookie#Person"/>
                        </rdf:Description>
                        <rdf:Description rdf:about="weblab://SmallEnglishTest/1#0-2">
                           <wlp:refersTo rdf:resource="http://weblab.ow2.org/wookie/instances/Person#stephan_brunessaux"/>
                        </rdf:Description>
                        <rdf:Description rdf:about="weblab://SmallEnglishTest/1#0-1">
                           <wlp:refersTo rdf:resource="http://weblab.ow2.org/wookie/instances/Unit#enterprise_service"/>
                        </rdf:Description>
                        <rdf:Description rdf:about="http://weblab.ow2.org/wookie/instances/Person#stephan_brunessaux">
                           <rdfs:label>Stephan BRUNESSAUX</rdfs:label>
                           <rdf:type rdf:resource="http://weblab.ow2.org/wookie#Person"/>
                        </rdf:Description>
                        <rdf:Description rdf:about="http://weblab.ow2.org/wookie/instances/Unit#dupont">
                           <rdfs:label>DUPONT</rdfs:label>
                           <rdf:type rdf:resource="http://weblab.ow2.org/wookie#Unit"/>
                        </rdf:Description>
                        <rdf:Description rdf:about="http://weblab.ow2.org/wookie/instances/Person#bruno_grilheres">
                           <rdfs:label>Bruno GRILHERES</rdfs:label>
                           <rdf:type rdf:resource="http://weblab.ow2.org/wookie#Person"/>
                        </rdf:Description>
                        <rdf:Description rdf:about="weblab://SmallEnglishTest/1#0-3">
                           <wlp:refersTo rdf:resource="http://weblab.ow2.org/wookie/instances/Person#patrick_giroux"/>
                        </rdf:Description>
                        <rdf:Description rdf:about="weblab://SmallEnglishTest/1#0-7">
                           <wlp:refersTo rdf:resource="http://weblab.ow2.org/wookie/instances/Person#yann_mombrun"/>
                        </rdf:Description>
                        <rdf:Description rdf:about="http://weblab.ow2.org/wookie/instances/Unit#eads_defence_and_security_systems">
                           <rdfs:label>EADS Defence and Security Systems</rdfs:label>
                           <rdf:type rdf:resource="http://weblab.ow2.org/wookie#Unit"/>
                        </rdf:Description>
                        <rdf:Description rdf:about="weblab://SmallEnglishTest/1#0-a0">
                           <wlp:isProducedBy rdf:resource="http://weblab.ow2.org/webservices/gateservice"/>
                        </rdf:Description>
                        <rdf:Description rdf:about="weblab://SmallEnglishTest/1#0-5">
                           <wlp:refersTo rdf:resource="http://weblab.ow2.org/wookie/instances/Unit#dupont"/>
                        </rdf:Description>
                        <rdf:Description rdf:about="weblab://SmallEnglishTest/1#0-4">
                           <wlp:refersTo rdf:resource="http://weblab.ow2.org/wookie/instances/Person#bruno_grilheres"/>
                        </rdf:Description>
                        <rdf:Description rdf:about="http://weblab.ow2.org/wookie/instances/Person#yann_mombrun">
                           <rdfs:label>Yann MOMBRUN</rdfs:label>
                           <rdf:type rdf:resource="http://weblab.ow2.org/wookie#Person"/>
                        </rdf:Description>
                        <rdf:Description rdf:about="weblab://SmallEnglishTest/1#0-6">
                           <wlp:refersTo rdf:resource="http://weblab.ow2.org/wookie/instances/Unit#eads_defence_and_security_systems"/>
                        </rdf:Description>
                     </rdf:RDF>
                  </data>
               </annotation>
               <segment xsi:type="ns2:LinearSegment" start="100" end="114" uri="weblab://SmallEnglishTest/1#0-3"/>
               <segment xsi:type="ns2:LinearSegment" start="116" end="134" uri="weblab://SmallEnglishTest/1#0-2"/>
               <segment xsi:type="ns2:LinearSegment" start="177" end="183" uri="weblab://SmallEnglishTest/1#0-5"/>
               <segment xsi:type="ns2:LinearSegment" start="185" end="200" uri="weblab://SmallEnglishTest/1#0-4"/>
               <segment xsi:type="ns2:LinearSegment" start="202" end="214" uri="weblab://SmallEnglishTest/1#0-7"/>
               <segment xsi:type="ns2:LinearSegment" start="281" end="314" uri="weblab://SmallEnglishTest/1#0-6"/>
               <segment xsi:type="ns2:LinearSegment" start="1383" end="1401" uri="weblab://SmallEnglishTest/1#0-1"/>
               <content>WebLab: An integration infrastructure to ease the development of multimedia processing applications
Patrick GIROUX, Stephan BRUNESSAUX, Sylvie BRUNESSAUX, Jérémie DOUCY, Gérard DUPONT, Bruno GRILHERES, Yann MOMBRUN, Arnaud SAVAL
Information Processing Control and Cognition (IPCC)
EADS Defence and Security Systems
Parc d'Affaire des Portes
27106 Val de Reuil
http://weblab-project.org
ipcc@weblab-project.org
{patrick.giroux, stephan.brunessaux, sylvie.brunessaux, jeremie.doucy, gerard.dupont, bruno.grilheres, yann.mombrun, arnaud.saval}@eads.com
Abstract:
In this paper, we introduce the EADS' WebLab platform (http://weblab-project.org) that aims at providing an integration infrastructure for multimedia information processing components. In the following, we explain the motivations that have led to the realisation of this project within EADS and the requirements that have led our choices. After a quick review of existing information processing platforms, we present the chosen service oriented architecture, and the three layers of the WebLab project (infrastructure, services and applications).
Then, we detail the chosen exchange model and normalised services interfaces that enable semantic interoperability between information processing components. We present the technical choices made to guarantee technical interoperability between the components by the use of an Enterprise Service Bus (ESB).
Moreover, we present the orchestration and portal mechanisms that we have added to the WebLab to enable architects to quickly build multimedia processing applications. In the following, we illustrate the integration process by describing three applications that have been developed on top of this architecture on three R&amp;D projects (Vitalas, WebContent and eWok-Hub). Finally, we propose some perspectives such as the realisation of an information processing services directory, or a toolkit following MDA (Model Driven Architecture) approach to ease the integration process.
Keywords:
Integration infrastructure, Service Oriented Architecture, Semantics, Multimedia Information Processing Platform.</content>
            </mediaUnit>
         </resource>
      </ns3:processReturn>
   </soap:Body>
</soap:Envelope>
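
A client can use the output above by joining each segment to the entity it refers to. The sketch below is one possible way to do that in Python with the standard library, shown on a trimmed-down Text media unit rather than the full response. The function name extract_mentions and the sample are illustrative assumptions; the sketch also assumes that start and end are character offsets into the content, which matches the sample above (offsets 100 to 114 cover "Patrick GIROUX").

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
WLP = "{http://weblab.ow2.org/core/1.2/ontology/processing#}"


def extract_mentions(media_unit):
    """Return (covered text, entity URI, entity label) for each segment."""
    content = media_unit.findtext("content", default="")
    refers_to = {}  # segment URI -> entity URI (wlp:refersTo)
    labels = {}     # entity URI -> rdfs:label
    for desc in media_unit.iter(RDF + "Description"):
        about = desc.get(RDF + "about")
        target = desc.find(WLP + "refersTo")
        if target is not None:
            refers_to[about] = target.get(RDF + "resource")
        label = desc.findtext(RDFS + "label")
        if label is not None:
            labels[about] = label
    mentions = []
    for seg in media_unit.findall("segment"):
        start, end = int(seg.get("start")), int(seg.get("end"))
        entity = refers_to.get(seg.get("uri"))
        mentions.append((content[start:end], entity, labels.get(entity)))
    return mentions


# Hand-written sample that mimics the structure of the response above.
SAMPLE = """<mediaUnit xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ns2="http://weblab.ow2.org/core/1.2/model#"
    xsi:type="ns2:Text" uri="weblab://SmallEnglishTest/1#0">
  <annotation uri="weblab://SmallEnglishTest/1#0-a0">
    <data>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
               xmlns:wlp="http://weblab.ow2.org/core/1.2/ontology/processing#">
        <rdf:Description rdf:about="http://weblab.ow2.org/wookie/instances/Person#patrick_giroux">
          <rdfs:label>Patrick GIROUX</rdfs:label>
        </rdf:Description>
        <rdf:Description rdf:about="weblab://SmallEnglishTest/1#0-3">
          <wlp:refersTo rdf:resource="http://weblab.ow2.org/wookie/instances/Person#patrick_giroux"/>
        </rdf:Description>
      </rdf:RDF>
    </data>
  </annotation>
  <segment xsi:type="ns2:LinearSegment" start="8" end="22" uri="weblab://SmallEnglishTest/1#0-3"/>
  <content>Author: Patrick GIROUX</content>
</mediaUnit>"""

mentions = extract_mentions(ET.fromstring(SAMPLE))
```

On a full SOAP response, the same function could be applied to each mediaUnit element found under the returned resource.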

Configurable:configure

Input

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:con="http://weblab.ow2.org/core/1.2/services/configurable">
   <soapenv:Header/>
   <soapenv:Body>
      <con:configureArgs>
         <usageContext>default.gapp-configuration</usageContext>
         <configuration uri="weblab:///configuration/t" xsi:type="ns111:PieceOfKnowledge" xmlns:ns111="http://weblab.ow2.org/core/1.2/model#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <data>
               <rdf:RDF xmlns:gate="http://weblab.ow2.org/core/1.2/ontology/processing#gate/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
                  <rdf:Description rdf:about="http://weblab.ow2.org/webservices/gateservice">
                     <gate:gappFilePath>../webapps/gate-extraction/WEB-INF/classes/default.gapp</gate:gappFilePath>
                  </rdf:Description>
               </rdf:RDF>
            </data>
         </configuration>
      </con:configureArgs>
   </soapenv:Body>
</soapenv:Envelope>

Output

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <conf:configureReturn xmlns:conf="http://weblab.ow2.org/core/1.2/services/configurable" />
   </soap:Body>
</soap:Envelope>
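
For clients that need to produce the configure request programmatically, a minimal Python sketch follows. It fills the envelope shown above from a template and escapes the two variable values. The function name build_configure_request and the example values are assumptions for illustration. Sending the request is left out because the endpoint URL depends on the deployment.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Same structure as the sample request; only the two placeholders vary.
CONFIGURE_TEMPLATE = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:con="http://weblab.ow2.org/core/1.2/services/configurable">
   <soapenv:Header/>
   <soapenv:Body>
      <con:configureArgs>
         <usageContext>{usage_context}</usageContext>
         <configuration uri="weblab:///configuration/t" xsi:type="model:PieceOfKnowledge" xmlns:model="http://weblab.ow2.org/core/1.2/model#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <data>
               <rdf:RDF xmlns:gate="http://weblab.ow2.org/core/1.2/ontology/processing#gate/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
                  <rdf:Description rdf:about="http://weblab.ow2.org/webservices/gateservice">
                     <gate:gappFilePath>{gapp_path}</gate:gappFilePath>
                  </rdf:Description>
               </rdf:RDF>
            </data>
         </configuration>
      </con:configureArgs>
   </soapenv:Body>
</soapenv:Envelope>"""


def build_configure_request(usage_context, gapp_path):
    """Return the configure SOAP envelope as a string, with values XML-escaped."""
    return CONFIGURE_TEMPLATE.format(
        usage_context=escape(usage_context),
        gapp_path=escape(gapp_path),
    )


# Example values are hypothetical; parsing the result checks it is well-formed.
request = build_configure_request("english-french", "conf/eng_fre.gapp")
parsed = ET.fromstring(request)
```

Escaping matters as soon as a usage context or a path contains characters such as "&" or "<", which would otherwise break the envelope.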

Configurable:resetConfiguration

Input

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:conf="http://weblab.ow2.org/core/1.2/services/configurable">
   <soapenv:Header/>
   <soapenv:Body>
      <conf:resetConfigurationArgs>
         <usageContext>default.gapp-configuration</usageContext>
      </conf:resetConfigurationArgs>
   </soapenv:Body>
</soapenv:Envelope>

Output

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <conf:resetConfigurationReturn xmlns:conf="http://weblab.ow2.org/core/1.2/services/configurable" />
   </soap:Body>
</soap:Envelope>

Known Limitations

  • WEBLAB-673 - "-" is marked as a location when using the French plugin. This is already fixed in version 2.5.1-SNAPSHOT.


Dependencies

List of all dependencies of this service:

org.ow2.weblab.webservices:gate-extraction:war:2.5.0
+- org.ow2.weblab.core.helpers:rdf-helper-jena:jar:1.4.1:compile
|  +- org.apache.jena:jena-core:jar:2.7.4:compile
|  |  +- org.apache.jena:jena-iri:jar:0.9.4:compile
|  |  +- xerces:xercesImpl:jar:2.9.0:compile
|  |  +- org.slf4j:slf4j-api:jar:1.6.4:compile
|  |  \- org.slf4j:slf4j-log4j12:jar:1.6.4:compile
|  \- org.apache.commons:commons-lang3:jar:3.2.1:compile
+- uk.ac.gate:gate-core:jar:8.0:compile
|  +- uk.ac.gate:gate-asm:jar:3.1:compile
|  +- uk.ac.gate:gate-compiler-jdt:jar:4.3.2-P20140317-1600:compile
|  +- commons-lang:commons-lang:jar:2.6:compile
|  +- commons-io:commons-io:jar:2.4:compile
|  +- org.jdom:jdom:jar:1.1.3:compile
|  +- net.sourceforge.nekohtml:nekohtml:jar:1.9.14:compile
|  +- org.codehaus.woodstox:woodstox-core-lgpl:jar:4.2.0:compile
|  |  \- org.codehaus.woodstox:stax2-api:jar:3.1.1:compile
|  +- org.apache.ivy:ivy:jar:2.3.0:compile
|  +- org.apache.ant:ant:jar:1.9.3:compile
|  |  \- org.apache.ant:ant-launcher:jar:1.9.3:compile
|  +- com.thoughtworks.xstream:xstream:jar:1.4.7:compile
|  +- xpp3:xpp3:jar:1.1.4c:runtime
|  +- jaxen:jaxen:jar:1.1.6:runtime
|  +- gnu.getopt:java-getopt:jar:1.0.13:compile
|  +- org.springframework:spring-aop:jar:3.0.7.RELEASE:compile
|  |  \- org.springframework:spring-asm:jar:3.0.7.RELEASE:compile
|  +- com.fasterxml.jackson.core:jackson-databind:jar:2.3.2:compile
|  |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.3.0:compile
|  |  \- com.fasterxml.jackson.core:jackson-core:jar:2.3.2:compile
|  \- org.xhtmlrenderer:flying-saucer-core:jar:9.0.4:compile
+- org.apache.tika:tika-core:jar:1.5:runtime
+- org.ow2.weblab.core:model:jar:1.2.5:compile
+- org.ow2.weblab.core:extended:jar:1.2.5:compile
+- org.ow2.weblab.core:annotator:jar:1.2.6:compile
|  \- joda-time:joda-time:jar:2.3:compile
+- org.apache.cxf:cxf-rt-frontend-jaxws:jar:2.6.11:compile
|  +- xml-resolver:xml-resolver:jar:1.2:compile
|  +- asm:asm:jar:3.3.1:compile
|  +- org.apache.cxf:cxf-api:jar:2.6.11:compile
|  |  +- org.codehaus.woodstox:woodstox-core-asl:jar:4.2.0:compile
|  |  +- org.apache.ws.xmlschema:xmlschema-core:jar:2.0.3:compile
|  |  +- org.apache.geronimo.specs:geronimo-javamail_1.4_spec:jar:1.7.1:compile
|  |  \- wsdl4j:wsdl4j:jar:1.6.3:compile
|  +- org.apache.cxf:cxf-rt-core:jar:2.6.11:compile
|  |  \- com.sun.xml.bind:jaxb-impl:jar:2.2.5.1:compile
|  +- org.apache.cxf:cxf-rt-bindings-soap:jar:2.6.11:compile
|  |  \- org.apache.cxf:cxf-rt-databinding-jaxb:jar:2.6.11:compile
|  +- org.apache.cxf:cxf-rt-bindings-xml:jar:2.6.11:compile
|  +- org.apache.cxf:cxf-rt-frontend-simple:jar:2.6.11:compile
|  \- org.apache.cxf:cxf-rt-ws-addr:jar:2.6.11:compile
|     \- org.apache.cxf:cxf-rt-ws-policy:jar:2.6.11:compile
|        \- org.apache.neethi:neethi:jar:3.0.2:compile
+- org.apache.cxf:cxf-rt-transports-http:jar:2.6.11:compile
+- org.apache.cxf:cxf-rt-management:jar:2.6.11:compile
+- org.springframework:spring-web:jar:3.0.7.RELEASE:compile
|  +- aopalliance:aopalliance:jar:1.0:compile
|  +- org.springframework:spring-beans:jar:3.0.7.RELEASE:compile
|  +- org.springframework:spring-context:jar:3.0.7.RELEASE:compile
|  |  \- org.springframework:spring-expression:jar:3.0.7.RELEASE:compile
|  \- org.springframework:spring-core:jar:3.0.7.RELEASE:compile
+- xalan:xalan:jar:2.7.1:compile
|  \- xalan:serializer:jar:2.7.1:compile
+- xml-apis:xml-apis:jar:1.3.04:compile
+- commons-logging:commons-logging:jar:1.1.3:compile
+- log4j:log4j:jar:1.2.17:runtime
+- junit:junit:jar:4.11:test
|  \- org.hamcrest:hamcrest-core:jar:1.3:test
\- javax.servlet:servlet-api:jar:2.5:provided