Structure Normaliser Service/1.0

Structure Normaliser Service
Details
Service Interfaces: Analyser
Exchange model: WebLab 1.2.2
Licence: LGPL 2.1
Supported OS: Linux
Binary: pdf-normalizer-service-Structure Normaliser Service-SNAPSHOT.war
Sources: pdf-normalizer-service-Structure Normaliser Service-SNAPSHOT-sources.jar
Javadoc: pdf-normalizer-service-Structure Normaliser Service-SNAPSHOT-javadoc.jar
SVN: pdf-normalizer
Maven Artifact

<groupId>org.ow2.weblab.webservices</groupId>
<artifactId>pdf-normalizer-service</artifactId>
<version>Structure Normaliser Service-SNAPSHOT</version>


This service retrieves the dc:source annotation from a resource and tries to convert the document it points to into PDF. It supports documents stored on the file system as well as documents fetched from the web over HTTP or FTP. Its goal is to normalise any multimedia document into WebLab resources without losing the structure information.

The service creates WebLab MediaUnits and uses the structure ontology to record position, format and style as WebLab annotations.
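
The source handling described above can be pictured with a small sketch. It is not taken from the service's code: the class SourceKind, its Kind enum and the rule that a value without a scheme is treated as a file system path are all assumptions made for illustration. It only shows how a dc:source value could be sorted into "local file", "remote (http/ftp)" or "unsupported" before any conversion is attempted.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical helper, not part of the WebLab API. */
final class SourceKind {

    enum Kind { LOCAL_FILE, REMOTE, UNSUPPORTED }

    /** Classifies a dc:source value by its URI scheme. */
    static Kind classify(String dcSource) {
        if (dcSource == null || dcSource.trim().isEmpty()) {
            return Kind.UNSUPPORTED;
        }
        final URI uri;
        try {
            uri = new URI(dcSource.trim());
        } catch (URISyntaxException e) {
            // Values that are not valid URIs (e.g. unescaped spaces) are rejected here.
            return Kind.UNSUPPORTED;
        }
        String scheme = uri.getScheme();
        if (scheme == null) {
            // Assumption: a value without a scheme is a plain file system path.
            return Kind.LOCAL_FILE;
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "file":
                return Kind.LOCAL_FILE;
            case "http":
            case "ftp":
                // Only the protocols named in the description are accepted.
                return Kind.REMOTE;
            default:
                return Kind.UNSUPPORTED;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("/data/docs/report.pdf"));
        System.out.println(classify("http://example.org/report.pdf"));
        System.out.println(classify("mailto:someone@example.org"));
    }
}
```

The paths and host names in main are placeholders. How the real service reads the annotation and downloads remote documents is not covered by this sketch.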

Installation

This service needs pdftohtml, which you can install with the script egc_tools_install.sh.

Run the script; it displays the following prompt:

$ ./egc_tools_install.sh 
What do you want to install ?
All (by default, press enter)
1. only Libreoffice
2. only wkhtmltopdf
3. only pdftohtml

Select 3, then press Enter.

The script asks for your administrator password so that it can download the required libraries, compile pdftohtml from source and install it on your system.


Remark: most existing Linux distributions already ship this library (poppler). However, the preinstalled versions are generally too old: compatible versions of poppler start from 0.20.x.
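
To decide whether an already installed poppler is recent enough, the version constraint in the remark above can be expressed as a small check. This is a minimal sketch, not code from the service: the class PopplerVersionCheck is hypothetical, and the sample strings are illustrative stand-ins for the first line printed by a command such as pdftohtml -v. The check only extracts the first major.minor number it finds and compares it with 0.20.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the service. */
final class PopplerVersionCheck {

    // First "major.minor" pair found in the text; digit counts are bounded
    // so that Integer.parseInt cannot overflow.
    private static final Pattern VERSION =
            Pattern.compile("(\\d{1,9})\\.(\\d{1,9})");

    /** Returns true when the reported version is 0.20 or later. */
    static boolean isCompatible(String versionText) {
        if (versionText == null) {
            return false;
        }
        Matcher m = VERSION.matcher(versionText);
        if (!m.find()) {
            return false;
        }
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        // Any major version above 0 is newer than 0.20.x.
        return major > 0 || minor >= 20;
    }

    public static void main(String[] args) {
        // Illustrative inputs only; pass the version line reported by your own installation.
        System.out.println(isCompatible("pdftohtml version 0.18.4"));
        System.out.println(isCompatible("pdftohtml version 0.20.4"));
    }
}
```

Only the version line should be passed in, because the first number pair in the text is the one that gets compared.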

Configuration

None.

UsageContext effects

None.

Examples of SOAP Input/Output

Known Limitations

The Structure Normaliser can only process PDF files for the moment.

Dependencies

org.ow2.weblab.webservices:pdf-normalizer-service:war:1.0
+- jdom:jdom:jar:1.0:compile
+- jaxen:jaxen:jar:1.1:compile
|  +- dom4j:dom4j:jar:1.6.1:compile
|  +- xml-apis:xml-apis:jar:1.3.02:compile
|  \- xom:xom:jar:1.0:compile
|     \- xerces:xmlParserAPIs:jar:2.6.2:compile
+- org.ow2.weblab.core.helpers:rdf-helper-jena:jar:1.3.2:compile
|  \- com.hp.hpl.jena:jena:jar:2.6.4:compile
|     +- com.hp.hpl.jena:iri:jar:0.8:compile
|     +- com.ibm.icu:icu4j:jar:3.4.4:compile
|     +- org.slf4j:slf4j-api:jar:1.5.8:compile
|     \- org.slf4j:slf4j-log4j12:jar:1.5.8:runtime
+- org.apache.commons:commons-exec:jar:1.1:compile
+- org.jsoup:jsoup:jar:1.6.1:compile
+- net.sf.mime-util:mime-util:jar:1.2:compile
|  \- log4j:log4j:jar:1.2.14:runtime
+- org.ow2.weblab.core.helpers:rdf-helper-jena-selection:jar:1.5.3:compile
|  \- joda-time:joda-time:jar:1.6.2:compile
+- commons-io:commons-io:jar:2.0.1:compile
+- org.ow2.weblab.components:content-manager:jar:1.8.4-SNAPSHOT:compile
+- xerces:xercesImpl:jar:2.7.1:compile
+- commons-codec:commons-codec:jar:1.6:compile
+- org.ow2.weblab.core:model:jar:1.2.2:compile
+- org.ow2.weblab.core:extended:jar:1.2.2:compile
+- org.ow2.weblab.core:annotator:jar:1.2.4:compile
+- org.apache.cxf:cxf-rt-frontend-jaxws:jar:2.4.0:compile
|  +- xml-resolver:xml-resolver:jar:1.2:compile
|  +- asm:asm:jar:3.3:compile
|  +- org.apache.cxf:cxf-api:jar:2.4.0:compile
|  |  +- org.apache.cxf:cxf-common-utilities:jar:2.4.0:compile
|  |  +- org.apache.ws.xmlschema:xmlschema-core:jar:2.0:compile
|  |  \- org.apache.neethi:neethi:jar:3.0.0:compile
|  |     +- wsdl4j:wsdl4j:jar:1.6.2:compile
|  |     \- org.codehaus.woodstox:woodstox-core-asl:jar:4.1.1:compile
|  |        \- org.codehaus.woodstox:stax2-api:jar:3.0.2:compile
|  +- org.apache.cxf:cxf-rt-core:jar:2.4.0:compile
|  |  +- com.sun.xml.bind:jaxb-impl:jar:2.1.13:compile
|  |  \- org.apache.geronimo.specs:geronimo-javamail_1.4_spec:jar:1.7.1:compile
|  +- org.apache.cxf:cxf-rt-bindings-soap:jar:2.4.0:compile
|  |  +- org.apache.cxf:cxf-tools-common:jar:2.4.0:compile
|  |  \- org.apache.cxf:cxf-rt-databinding-jaxb:jar:2.4.0:compile
|  +- org.apache.cxf:cxf-rt-bindings-xml:jar:2.4.0:compile
|  +- org.apache.cxf:cxf-rt-frontend-simple:jar:2.4.0:compile
|  \- org.apache.cxf:cxf-rt-ws-addr:jar:2.4.0:compile
+- org.apache.cxf:cxf-rt-transports-http:jar:2.4.0:compile
|  +- org.apache.cxf:cxf-rt-transports-common:jar:2.4.0:compile
|  \- org.springframework:spring-web:jar:3.0.5.RELEASE:compile
|     +- aopalliance:aopalliance:jar:1.0:compile
|     +- org.springframework:spring-beans:jar:3.0.5.RELEASE:compile
|     +- org.springframework:spring-context:jar:3.0.5.RELEASE:compile
|     |  +- org.springframework:spring-aop:jar:3.0.5.RELEASE:compile
|     |  +- org.springframework:spring-expression:jar:3.0.5.RELEASE:compile
|     |  \- org.springframework:spring-asm:jar:3.0.5.RELEASE:compile
|     \- org.springframework:spring-core:jar:3.0.5.RELEASE:compile
+- xalan:xalan:jar:2.7.1:compile
|  \- xalan:serializer:jar:2.7.1:compile
+- commons-logging:commons-logging:jar:1.1.1:compile
+- junit:junit:jar:4.8.2:test
\- javax.servlet:servlet-api:jar:2.4:provided