WebLab Content Exchange
This page describes the model currently proposed in WebLab for exchanging raw content. This need is critical at the beginning of any processing chain that consumes documents from an unstructured source of information. In most cases, the documents come in their native format. Technically speaking, these are either raw files or streams, and in either case they can only be treated as arrays of bytes. This kind of content is not well handled over SOAP, which is used for the rest of WebLab information exchange. Thus another paradigm was needed.
After testing many different solutions (including byte arrays in XML, SOAP attachments, FTP...), we now recommend two simple solutions:
- simple file exchange over a network shared folder;
- use of the HTTP protocol and a WebDAV server.
These two solutions reflect two different compromises, and their particular advantages and drawbacks are described below.
What content?
The simplest solution is to rely on a standard shared file system.
Pros:
- ease of use: nothing needs to change on the service side, since content is accessible as simple files.
- efficiency: using the OS native file system to share files (possibly across a network) is expected to be efficient (no detailed benchmark so far).
Cons:
- only possible on a local network: this solution does not easily allow sharing content over the web (i.e. when the system is distributed) and is thus only possible for installations that are on the same network.
- rights management: access rights to content may need to go through OS user rights, which could be complex to synchronize with application-level user rights (not fully tested so far).
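As a rough illustration of the shared-folder approach, the sketch below shows a producer writing raw bytes into a shared directory and a consumer reading them back. The function names (publish_content, read_content), the random file naming and the ".part" temporary suffix are assumptions made for this example; they are not part of WebLab.

```python
import tempfile
import uuid
from pathlib import Path


def publish_content(shared_root: Path, data: bytes, suffix: str = "") -> Path:
    """Write raw bytes into the shared folder and return the resulting path.

    Hypothetical helper: the naming scheme is an assumption of this sketch.
    """
    shared_root.mkdir(parents=True, exist_ok=True)
    # A random name avoids collisions between several producers.
    target = shared_root / f"{uuid.uuid4().hex}{suffix}"
    # Write under a temporary name, then rename, so that a consumer
    # never opens a half-written file.
    partial = target.with_name(target.name + ".part")
    partial.write_bytes(data)
    partial.replace(target)
    return target


def read_content(path: Path) -> bytes:
    """Read the content back as an array of bytes (consumer side)."""
    return path.read_bytes()
```

In such a setup, only the path (or a URI derived from it) would travel in the SOAP message, while the bytes themselves stay on the shared file system. Both sides must see the folder under a path they can resolve, which is the local-network constraint listed above.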
HTTP and WebDAV server
Pros:
- stays on HTTP (which is already in use for SOAP)
- rights are managed on the WebDAV server
Cons:
- less efficient: the complexity added by the HTTP layer should make exchanges less efficient (no detailed benchmark so far).
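For comparison, the HTTP and WebDAV approach could look like the following minimal sketch, which uses only the Python standard library. The helper names (build_put_request, put_content, get_content) and the default content type are assumptions of this example; any URL passed to them is a placeholder for an actual WebDAV server. Authentication is not shown, although a server that manages rights would normally require credentials.

```python
import urllib.request


def build_put_request(url: str, data: bytes,
                      content_type: str = "application/octet-stream") -> urllib.request.Request:
    """Build an HTTP PUT request carrying the raw bytes as its body."""
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": content_type},
    )


def put_content(url: str, data: bytes, timeout: float = 30.0) -> int:
    """Upload raw bytes to the given URL and return the HTTP status code."""
    with urllib.request.urlopen(build_put_request(url, data), timeout=timeout) as response:
        return response.status


def get_content(url: str, timeout: float = 30.0) -> bytes:
    """Download the content stored at the given URL as an array of bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

Here the URL of the stored resource plays the role that the file path plays in the shared-folder approach: it can be passed in the SOAP message while the bytes travel over plain HTTP. A WebDAV server typically answers a successful PUT with 201 (resource created) or 204 (resource overwritten).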