WebLab-Demo-1.1/Demonstration Installation Guide
This documentation is the WebLab demonstration installation guide.
This simple demo (available here) aims to show the capabilities of the WebLab platform in terms of service integration and orchestration for unstructured document processing and retrieval. In a nutshell, the demo crawls a local folder, analyses the text-based documents it finds, indexes them, and finally offers access to them through a portal. The processing capabilities are very limited (only a gazetteer based on a static dictionary), but the demo provides a complete processing chain and aims to ease the integration and testing of new components, either in the processing chain or in the user interface.
This demo presents an information retrieval system based on the complete WebLab architecture. It is composed of several WebLab services:
- A homemade folder crawler able to explore the content of a given folder,
- A normaliser, based on Apache Tika, that extracts the text content of common desktop files (MS Office, PDF, RTF, etc.),
- A homemade text formatter that removes extra newlines,
- A homemade information extraction service that detects words from gazetteers (technical gazetteers) in the document and annotates them,
- An indexer, based on Apache Solr, that indexes the text content and makes it searchable.
A WebLab BPEL chain that contains the 5 previous services.
And 4 WebLab portlets:
- A launchCrawl portlet that launches the crawling of documents in the directory,
- A search portlet that sends queries to the Solr searcher,
- A result portlet that displays the results of the query,
- An annotated document portlet that displays the document with the annotations added by the gazetteer service.
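To make the flow of the chain more concrete, here is a toy sketch in Python of what the formatter, gazetteer and indexer steps do conceptually. It is not the WebLab implementation (the real services are web services orchestrated by BPEL, with Tika and Solr doing the heavy lifting). The dictionary content and every function name below are hypothetical, and crawling and normalisation are assumed to have already produced plain text.

```python
import re

# Hypothetical static dictionary: term -> label (not the demo's real gazetteer).
GAZETTEER = {"weblab": "PROJECT", "solr": "TOOL", "tika": "TOOL"}


def format_text(text):
    """Text formatter step: collapse runs of newlines into a single newline."""
    return re.sub(r"\n{2,}", "\n", text)


def annotate(text, gazetteer=GAZETTEER):
    """Gazetteer step: return (start, end, word, label) for each known word."""
    annotations = []
    for match in re.finditer(r"\w+", text):
        label = gazetteer.get(match.group().lower())
        if label is not None:
            annotations.append((match.start(), match.end(), match.group(), label))
    return annotations


def build_index(documents):
    """Indexer step: map each lowercased word to the documents containing it."""
    index = {}
    for name, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index.setdefault(word, set()).add(name)
    return index


def run_chain(raw_documents):
    """Apply the steps in order: format, then annotate, then index."""
    cleaned = {name: format_text(text) for name, text in raw_documents.items()}
    annotations = {name: annotate(text) for name, text in cleaned.items()}
    return annotations, build_index(cleaned)


def search(index, word):
    """Search step: return the sorted names of documents containing word."""
    return sorted(index.get(word.lower(), set()))
```

Calling `run_chain({"a.txt": "WebLab uses\n\n\nSolr."})` returns the annotations per document together with the index, and `search(index, "solr")` then plays the role of the search portlet.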
Before installing, check the following prerequisites:
- A Sun JDK 1.6.17 or later must be installed to run the WebLab demo (we encountered problems with OpenJDK),
- JAVA_HOME must be declared and must point to a path without spaces (e.g. on Windows c:\java and not c:\program files\java), and java must be available in your path,
- Ports 8080, 8005, 8009 (for Tomcat), 8181, 8105, 8109 (for Liferay) and 8084, 7600, 7700, 7800, 7900 (for Petals) must be free on the computer that runs the WebLab.
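The JAVA_HOME and port requirements can be checked before installing with a small script such as the sketch below. It is not part of the WebLab distribution, and the function names are illustrative. It assumes that a port counts as "in use" when something accepts TCP connections on it on 127.0.0.1, which is only a rough approximation of "free".

```python
import os
import socket

# Ports listed in the prerequisites (Tomcat, Liferay, Petals).
REQUIRED_PORTS = [8080, 8005, 8009, 8181, 8105, 8109, 8084, 7600, 7700, 7800, 7900]


def check_java_home(env=os.environ):
    """Return a list of problems with JAVA_HOME (an empty list means OK)."""
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return ["JAVA_HOME is not set"]
    if any(ch.isspace() for ch in java_home):
        return ["JAVA_HOME contains whitespace: %r" % java_home]
    return []


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def find_problems(env=os.environ, ports=REQUIRED_PORTS):
    """Collect JAVA_HOME problems and the required ports that are taken."""
    problems = check_java_home(env)
    problems += ["port %d is already in use" % p for p in ports if port_in_use(p)]
    return problems
```

Running `print(find_problems())` on the target machine prints an empty list when JAVA_HOME looks fine and none of the listed ports answers.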
Your computer should have at least 2 GB of RAM to run the WebLab demo, but 4 GB is recommended to run it efficiently. Remember that the WebLab demo is a server application, not a desktop one.
Launch the installWebLab.sh (Linux) or installWebLab.bat (Windows) script from the root folder. The installation path must not contain any whitespace (e.g. on Windows c:\weblab and not c:\program files\weblab).
The demo mainly uses the following ports: Liferay on 8181, Tomcat on 8080 and Petals on 8084. Other ports are used to monitor shutdown/restart commands; however, since everything is installed on one machine, no specific network configuration is needed. Only port 8181 needs to be accessible if you want to connect an external client to the machine that hosts the WebLab.
- Launch the script called startupWebLab.sh (Linux) or startupWebLab.bat (Windows), depending on your OS (it may take several minutes),
- Open http://localhost:8181/ in your favorite browser,
- Log in with the email firstname.lastname@example.org and the password demo,
- You can now launch an indexing run that will crawl and analyse the content of your "toIndex" folder. By default, it contains some documents about the WebLab project. You can then search the acquired documents (for example, type "test" in the search field to find documents containing the word test).
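Since startup may take several minutes, it can be convenient to poll the portal URL instead of reloading the browser by hand. The following sketch is not shipped with the demo, and its function names, timeouts and polling interval are arbitrary assumptions. It treats any HTTP answer on http://localhost:8181/ as "the portal is up", which does not guarantee that every portlet has finished deploying.

```python
import http.client
import time
import urllib.error
import urllib.request

# Opener that ignores proxy settings, since the portal is on the local machine.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def portal_is_up(url, timeout=5.0):
    """Return True if an HTTP server answers at url, whatever the status."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, reset or timed out: not up yet.
        return False


def wait_for_portal(url="http://localhost:8181/", max_wait=600.0, interval=5.0):
    """Poll url until it answers or until max_wait seconds have elapsed."""
    deadline = time.monotonic() + max_wait
    while True:
        if portal_is_up(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, calling `wait_for_portal()` right after the startup script returns `True` as soon as the portal answers, or `False` if nothing answers within ten minutes.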
If you want to start a new indexing session from scratch, launch the script called resetWebLab.sh (Linux) or resetWebLab.bat (Windows). It clears the current index and repository and makes the WebLab ready for a new indexing session. You don't need to stop the application before calling a reset.
You can stop the application by calling shutdownWebLab.sh (Linux) or shutdownWebLab.bat (Windows). Note that the servers may take several seconds to stop, depending on your hardware resources.
You may want to go further by adding some services or portlets to this demonstration. To do so, please go to Customizing services and processing chains.