WebLab 1.2.5/WebLab Exchange Model
In this page you will find the description of the 1.2.5 version of the data exchange model.
Background & guidelines
As services need to collaborate through the WebLab workflow, a common data exchange model must be designed. It will be shared by all services, which can then be chained easily: a producer service encodes its results following the model and provides them to a consumer service, which decodes and then processes them. The orchestration script is simpler since it does not need any translation mechanism between services. A single data exchange format also reduces the computational effort, simplifies service chaining, and lets new services be introduced without much adaptation.
Note that the data exchange model allows describing the structure and parts of the content of any document without defining a new document format. The main issue is to build a description which permits each service to process the document easily. Thus the model must handle a description of the document structure in order to provide easy access to any section or sub-section of it, in accordance with any segmentation strategy. In addition, an annotation mechanism should be designed in order to provide metadata on any of those sections. These metadata can range from low level characteristics to high level extracted semantic information. They can be processed by other services (e.g. low level picture description features) or made directly available to the end-user (e.g. semantic annotations).
However, the model should not describe the rendering features of the original document (such as bold and italic for text documents) when they are not related to information extraction tasks. The model is also not intended to transport raw data, such as the original content, but it allows encapsulating small normalised parts.
Its main capabilities should be:
- describing different views of the same document or part of a document (such as a grey scale view of a picture or faces extracted from a photograph) without necessarily involving the full document content,
- adaptability and extensibility to any multimedia document,
- annotating the document content at various levels of abstraction (from low level features to high level semantic annotations) through an annotation framework that is interoperable with standards,
- scalability to very large document corpora.
The existing document models used in state-of-the-art tools (IPTC, MPEG-7, FDL, KLV or MXF) do not appear to be well adapted to the scope of the model defined here. Most of them are limited to certain document types (e.g. pictures for IPTC), do not provide a sufficient level of abstraction for annotation and/or are intended to transport content. Thus we developed our own WebLab model, taking these multimedia standard initiatives as a starting point. Note that other issues arise concerning the raw content exchange and storage format (see more information about data storage in WebLab).
In order to develop the exchange model, we began by defining the data model, from which we derive the model grammar. For technical purposes, this grammar is expressed as an XML Schema, which service interfaces can reuse in their own definitions. The data exchange model serves as a basis for defining the SOAP messages. This reference describes the object classes which enable the description of any data type and their respective annotations.
The Semantic Web standards (RDF, RDFS/OWL) have been used to ensure the sustainability of the model, its compatibility with existing software components in the application domain of the WebLab platform, and the capability to exploit existing domain ontologies or ontologies built with semantic information extraction tools.
The core of the WebLab data exchange model is the Resource, defined hereafter. All other classes are sub-classes of Resource and/or are defined on top of it.
Note that the data exchange model has evolved, and many changes have been made so that content is processed correctly and services produce valid answers. The complete version history can be found on a dedicated page.
A resource refers to any object that can be manipulated in the WebLab platform. It is identified by a unique URI. It can hold annotations that describe the resource itself at a semantic level. It can also hold low level descriptors that describe it at a lower level. It is a common interface inherited by almost all WebLab objects. Any reference to a resource uses its URI.
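As a rough illustration of this description, the sketch below models a resource and URI-based references in Python. The class names, field names and the example URI are assumptions made for this example only; the normative definition remains the XML Schema of the model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    """Illustrative stand-in for a WebLab resource (not the official API)."""
    uri: str                                                       # unique identifier
    annotations: List["Resource"] = field(default_factory=list)   # semantic level
    descriptors: List["Resource"] = field(default_factory=list)   # low level


class ResourceRegistry:
    """Hypothetical helper: resolves a reference (a URI) to its resource."""

    def __init__(self) -> None:
        self._by_uri: Dict[str, Resource] = {}

    def add(self, resource: Resource) -> None:
        # URIs must be unique, so a second resource with the same URI is rejected.
        if resource.uri in self._by_uri:
            raise ValueError(f"duplicate URI: {resource.uri}")
        self._by_uri[resource.uri] = resource

    def get(self, uri: str) -> Resource:
        return self._by_uri[uri]


registry = ResourceRegistry()
document = Resource(uri="weblab://example/doc1")  # made-up URI
registry.add(document)
```

Keeping references as plain URIs means a service can mention a resource without embedding it, which matches the goal of not transporting more data than needed.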
Piece of Knowledge
A PieceOfKnowledge (or PoK) object should be understood in the sense of the definition provided by the W3C in its Resource Description Framework. Note that we rely on the concepts and design formalism of RDF, which should not be confused with the RDF/XML serialisation format. Each PoK is composed of a set of RDF statements, each of which is a triple: subject, property and object. Thus any description can be applied to any resource as long as the property and object elements refer to a consistent vocabulary in a certain domain. This vocabulary can be expressed in accordance with a specified ontology. However, one of the project's guidelines is extensibility, so a PieceOfKnowledge should be extendable using any OWL or RDFS vocabulary. The set of triple statements describing a resource is contained in the PieceOfKnowledge object, serialised in the RDF/XML format. Since a PoK is a resource, it can also contain Annotations. The statements they contain can be seen as meta-annotations describing, for example, how and when the first level of annotations was created (which service, with what resource or configuration).
An Annotation contains a set of RDF statements which refer to a specific resource or part of a resource (segment), known as the subject of the annotation. An annotation object is therefore fully dependent on the resource it describes: an Annotation is a PieceOfKnowledge in which each statement refers to the resource it depends on. Annotation is one of the major objects manipulated through the WebLab platform, since it allows adding any particular information to any type of WebLab resource. For example, many processing services add annotations in order to link the extracted information to the resource processed (e.g. a language recognition service annotates a document to record the language identified, or the fact that the language could not be recognised).
A ComposedResource defines a simple aggregate of resources and allows grouping them within the same unit. Since a multimedia document is a resource, the simplest ComposedResource is a corpus of documents, such as the videos of the same programme over one year. It could also be a set of resources needed by a service, such as a set of annotations or a document with an attached PoK that allows it to be analysed. As a resource, a ComposedResource can be annotated to provide, for example, meta-information on its content or the reasons for the aggregation.
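The relationship between the two objects can be sketched as follows. This is a simplified Python model, not the WebLab API: triples are plain tuples of strings rather than serialised RDF/XML, the class layout and the `subjects` field are assumptions, and the URIs are made up. The Dublin Core `language` property is used only as a familiar example of a vocabulary term.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


@dataclass
class PieceOfKnowledge:
    uri: str
    statements: List[Triple] = field(default_factory=list)


@dataclass
class Annotation(PieceOfKnowledge):
    # URIs this annotation is allowed to describe: the annotated resource
    # and, optionally, its segments (hypothetical field for this sketch).
    subjects: Set[str] = field(default_factory=set)

    def add(self, subject: str, prop: str, obj: str) -> None:
        # Every statement must refer to the resource the annotation depends on.
        if subject not in self.subjects:
            raise ValueError(f"{subject} is not described by this annotation")
        self.statements.append((subject, prop, obj))


DOC = "weblab://example/doc1"  # made-up URI
annotation = Annotation(uri=DOC + "#annotation0", subjects={DOC})
# A language recognition service records the language it identified.
annotation.add(DOC, "http://purl.org/dc/elements/1.1/language", "fr")
```

A plain PieceOfKnowledge accepts statements about any subject, whereas the Annotation in this sketch rejects statements about anything other than the resource it is attached to.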
Low Level Descriptor
A low level descriptor is composed of features that have been extracted from a document or a part of a document. It aims at representing low level annotations (at a numerical level rather than a semantic level), mainly for computing similarity between objects. For instance, a Text could use a lowLevelDescriptor to store TF-IDF values, and an Image could use a lowLevelDescriptor to hold color histograms. A low level descriptor may be composed of a set of features: for instance, an image may contain an edge feature vector as well as a red color histogram vector. As a resource, a low level descriptor can be annotated to provide, for example, meta-information on its creation date, its version or its application domain.
The format used to represent feature values in the low level descriptor object is open, since it is highly application dependent and no common model has been proposed to date. However, initiatives like the XRFF model should be taken as a starting point to avoid designing "yet-another-model".
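To make the similarity use case concrete, here is a minimal Python sketch that compares two descriptors feature by feature. Since the feature format is open, the dictionary layout, the feature names and the choice of cosine similarity are all assumptions made for this example.

```python
import math
from typing import Dict, List

Descriptor = Dict[str, List[float]]  # feature name -> feature vector (assumed layout)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is null)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def compare(d1: Descriptor, d2: Descriptor) -> Dict[str, float]:
    """Similarity for every feature present in both descriptors."""
    return {name: cosine_similarity(d1[name], d2[name]) for name in d1 if name in d2}


image_a: Descriptor = {"red_histogram": [0.5, 0.3, 0.2], "edges": [1.0, 0.0]}
image_b: Descriptor = {"red_histogram": [0.5, 0.3, 0.2], "edges": [0.0, 1.0]}
scores = compare(image_a, image_b)
```

Here the two images have identical red histograms but orthogonal edge vectors, so the per-feature scores differ; how such scores are combined is left to the application.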
Document description elements
The media unit is the major object manipulated through the WebLab platform. It is assumed that almost all processing services accept a media unit as a common argument and provide an enhanced media unit as a result. A media unit can be defined as the structural, annotatable representation of any multimedia content. It is a resource and thus can be retrieved by a URI and annotated with any kind of descriptive annotation or low level descriptor. As described in the following sections, any particular type of media (e.g. picture, text, video segment, audio sample or video frame) inherits from this object. It can have a composed unit as a parent when it is contained in a group of units. Finally, it can contain annotatable segments, which allow describing sub-parts of its own content.
A document is assumed to be a composed unit which holds the references to all the units which have been built on it by the successive processes. As a resource, a document can be annotated to provide, for example, meta-information on the original document source, the author or the original creation date.
A text unit is the simplest specialisation of the media unit, dedicated to full text data. Its only attribute is a string containing the text extracted from the original document content. This content is optional, since it is sometimes too large to be handled in this field. In that case, we encourage creating a content (in the content package) and annotating the text unit with the identifier (URI) of the created content. A simple text document such as an HTML page can be described by a document which contains a single text unit. As a resource, a text unit can be annotated and can also be described with low level descriptors.
An audio unit describes the audio content of a media document. It does not have any particular attribute and only allows describing the structure. As a resource, it can be annotated to describe the audio content. It can also be described using low level descriptors. The original content may be placed in the content field; otherwise, an annotation should be used to reference the content.
If the content is available, the FLAC codec is recommended as a general purpose and open encoding scheme.
An image unit describes the static visual content of a media document. It does not have any particular attribute and only allows describing the structure. As a resource, it can be annotated to describe the content. It can also be described using low level descriptors. The original content may be placed in the content field; otherwise, an annotation should be used to reference the content.
If the content is available, the PNG format is recommended as an open and lossless data compression format.
A video unit describes the dynamic visual content of a media document. As a resource, it can be annotated to describe the content. It can also be described using low level descriptors. The original content may be placed in the content field; otherwise, an annotation should be used to reference the content.
If present, the content should be Base64-encoded, and the MPEG-4 Part 2 “Advanced Simple Profile” codec is recommended to encode the data.
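The Base64 step can be sketched with the Python standard library. The helper names are hypothetical, and the byte string stands in for real encoded media data; producing the media data itself (FLAC, PNG or MPEG-4) is outside the scope of this sketch.

```python
import base64


def encode_content(raw: bytes) -> str:
    """Base64-encode raw media bytes so they can be carried in the content field."""
    return base64.b64encode(raw).decode("ascii")


def decode_content(encoded: str) -> bytes:
    """Recover the original bytes on the consumer side."""
    return base64.b64decode(encoded)


payload = b"\x00\x01binary media bytes\xff"  # placeholder for real encoded media
content_field = encode_content(payload)
restored = decode_content(content_field)
```

Base64 inflates the data by about one third, which is one more reason to keep only small normalised parts in the content field and to reference larger content by URI.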
Locating information in documents
A segment is not a WebLab Resource, but it has a URI and is owned by a media unit. It allows locating or identifying the content of a unit at a much finer level. A segment cannot itself hold annotations, but since it has a URI it can be the subject of triples in an annotation at the media unit level. It thus provides a way to annotate a very fine level of the structure without adding weight to the model by transporting the data.
The object contains positioning information which locates it within the parent unit. As media types are very different, the position is specialised for each of them and thus provides a localisation adapted to the media. This object cannot be used directly (it is abstract); however, multiple implementations adapted to each class of media are available. A class of media stands here for a list of segments which can be localised in the same manner.
A linear segment is a position described by a start and an end reference, counted in number of UTF-8 characters. It is the kind of segment that should be used for text units. A linear segment can be the subject of annotation statements, for example to provide information on a particular part of a text, such as a word.
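The sketch below shows how such offsets could be used to recover the text covered by a segment. It assumes 0-based character offsets with an exclusive end, which the description above does not specify; the class and function names and the URI are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearSegment:
    uri: str
    start: int  # offset of the first character (assumed 0-based)
    end: int    # offset just after the last character (assumed exclusive)


def segment_text(text: str, segment: LinearSegment) -> str:
    """Return the part of the text unit content covered by the segment."""
    if not 0 <= segment.start <= segment.end <= len(text):
        raise ValueError("segment lies outside the text unit")
    return text[segment.start:segment.end]


content = "Le café est fermé"
word = LinearSegment(uri="weblab://example/doc1#seg0", start=3, end=7)  # made-up URI
covered = segment_text(content, word)
```

Offsets count characters, not bytes: the accented letters in this sentence take two bytes each in UTF-8, so byte offsets would drift away from character offsets after the first accent.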
A temporal segment is a segment that is described by a start and stop reference in accordance to a millisecond metric. It is the kind of position that may be used for audio and video units.
A spatial segment aims to localise any meaningful content in a spatial document. For example, consider a picture showing some people. Specific segments can be localised with coordinate descriptors defining the silhouette of each person in the picture. Specific annotations can then be applied to each segment (e.g. to declare that those are human beings). It is the kind of segment that may be used for image and video units. When only two coordinates are associated with the spatialSegment, the shape is a disk: the first coordinate is the centre and the second one is a point on the circle. Otherwise, the coordinates describe a closed shape, since the last coordinate of the list is linked to the first one.
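The two-coordinate rule and the closed-shape rule can be illustrated with a point-containment test. This is a sketch only: the function name is hypothetical, coordinates are plain `(x, y)` tuples, and the polygon case uses a standard ray-casting test, which is one possible choice rather than anything prescribed by the model.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def contains(coords: List[Point], p: Point) -> bool:
    """Tell whether point p lies inside the shape described by coords."""
    if len(coords) == 2:
        # Disk: first coordinate is the centre, second is a point on the circle.
        centre, on_circle = coords
        return math.dist(centre, p) <= math.dist(centre, on_circle)
    if len(coords) < 3:
        raise ValueError("a shape needs at least two coordinates")
    # Closed polygon: the last coordinate is linked back to the first one.
    inside = False
    n = len(coords)
    for i in range(n):
        (x1, y1), (x2, y2) = coords[i], coords[(i + 1) % n]
        if (y1 > p[1]) != (y2 > p[1]):  # edge crosses the horizontal line through p
            x_cross = x1 + (p[1] - y1) * (x2 - x1) / (y2 - y1)
            if p[0] < x_cross:
                inside = not inside
    return inside


disk = [(0.0, 0.0), (3.0, 4.0)]                        # radius 5 around the origin
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
```

With these shapes, `contains(disk, (3.0, 0.0))` and `contains(square, (2.0, 2.0))` are true, while points such as `(6.0, 0.0)` or `(5.0, 2.0)` fall outside.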
Spatio Temporal Segment
A spatiotemporal segment aims at giving the location of an object at a given instant in a spatial document that evolves over time (e.g. a video). For example, consider a video showing a moving object. Specific segments can be localised with coordinate descriptors defining the boundaries of the object in each of the video images where it appears. Specific annotations can then be applied to each segment. The location is defined in the same way as for a SpatialSegment: when only two coordinates are given, the shape is a disk, where the first coordinate is the centre and the second one is a point on the circle. Otherwise, the coordinates describe a closed shape, since the last coordinate of the list is linked to the first one.
A coordinate is a simple container for a two-dimensional coordinate, aiming at localising information spatially in a mediaUnit, for instance in an image or a video. Coordinates are generally expressed in pixels, with the origin (0;0) at the bottom-left corner. Other units or reference systems are possible.
A TrackSegment provides a way to assign a single annotation to a set of linked segments. A TrackSegment is a kind of segment which pools several other segments in order to allow the description of an entity that appears several times in the annotated content.
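Since many raster image libraries index pixel rows from the top, a service may need to flip the vertical axis when reading such coordinates. The helper below is a hypothetical sketch of that conversion, assuming integer pixel indices and a known image height.

```python
from typing import Tuple


def to_top_left(x: int, y: int, height: int) -> Tuple[int, int]:
    """Convert a bottom-left-origin pixel coordinate to a top-left-origin one."""
    if not 0 <= y < height:
        raise ValueError("y lies outside the image")
    # The horizontal axis is unchanged; only the row index is mirrored.
    return (x, height - 1 - y)


# The bottom-left pixel of a 100-pixel-high image is on the last row
# when rows are counted from the top.
corner = to_top_left(0, 0, height=100)
```

Applying the same function twice returns the original coordinate, so it can also be used to convert back before writing a segment.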
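As an illustration, the sketch below pools temporal segments (start and end in milliseconds) into a track so that one statement about the track covers every appearance of an entity. The class layout, the URIs and the `depicts` property are assumptions for this example, not part of the model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class TemporalSegment:
    uri: str
    start_ms: int
    end_ms: int


@dataclass
class TrackSegment:
    uri: str
    segments: List[TemporalSegment] = field(default_factory=list)

    def total_duration_ms(self) -> int:
        """Cumulated time during which the tracked entity appears."""
        return sum(s.end_ms - s.start_ms for s in self.segments)


VIDEO = "weblab://example/video1"  # made-up URI
track = TrackSegment(
    uri=VIDEO + "#track0",
    segments=[
        TemporalSegment(VIDEO + "#seg0", start_ms=0, end_ms=1500),
        TemporalSegment(VIDEO + "#seg1", start_ms=4000, end_ms=6000),
    ],
)
# A single statement whose subject is the track, instead of one per segment.
statement: Tuple[str, str, str] = (
    track.uri,
    "http://example.org/depicts",        # hypothetical property
    "http://example.org/person/speaker", # hypothetical object
)
```

Without the track, the same statement would have to be repeated for each of the pooled segments.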
Query and results
A query is a resource which contains request data. This data is described, in a specific format, by each specialisation of query. The query object is therefore abstract, and several specialisations are used to describe any type of query. A proposal is to adopt common standards in order to provide consistent definitions across the whole WebLab platform. The XQuery and SPARQL formats have been identified as promising candidates for structured and semantic search respectively. As a resource, a query can be annotated to provide, for example, meta-information on its creation date, its author, etc.
This kind of query is simply a query that is not a ComposedQuery. It contains a weight (default value 1.0) which can be used when combining queries.
This query is dedicated to any query expression language which can be expressed as a simple string, such as full text queries, XQuery or SPARQL. Details on the language used to express the query should be provided in an Annotation on the query.
This query is dedicated to search by similarity, and thus its content is one or more resources.
This kind of query is simply an aggregation of queries combined with an n-ary boolean operator. Nesting composed queries inside other composed queries makes it possible to express brackets. Note that the operators AND and OR need at least two queries to be useful; applied to a single query they have no effect. NOT is a unary operator, so the list of queries should contain exactly one query; when it is used with a list of more than one query, an AND NOT operator is in fact applied.
This is a simple restriction of String that limits the values to the three boolean operators AND/OR/NOT. Note that in an n-ary context, the unary operator NOT is treated as an AND NOT.
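The operator semantics can be sketched by combining the result sets of sub-queries. In this Python sketch each sub-query is represented by the set of URIs it matches; the function name, the `universe` parameter and the reading of n-ary NOT as "first AND NOT each of the others" are assumptions, since the description above does not spell out the n-ary case in detail.

```python
from typing import List, Set


def combine(operator: str, results: List[Set[str]], universe: Set[str]) -> Set[str]:
    """Combine sub-query results with one of the operators AND, OR, NOT."""
    if not results:
        raise ValueError("at least one sub-query is required")
    if operator == "AND":
        return set.intersection(*results)   # a single query is returned unchanged
    if operator == "OR":
        return set.union(*results)          # a single query is returned unchanged
    if operator == "NOT":
        if len(results) == 1:
            return universe - results[0]    # unary NOT: complement
        # n-ary NOT is read here as: first AND NOT second AND NOT third ...
        return results[0].difference(*results[1:])
    raise ValueError(f"unknown operator: {operator}")


universe = {"d1", "d2", "d3", "d4", "d5"}
a, b, c = {"d1", "d2", "d3"}, {"d2", "d3"}, {"d3", "d4"}

# Brackets are expressed by nesting: a AND (b OR c)
nested = combine("AND", [a, combine("OR", [b, c], universe)], universe)
```

The unary case needs the `universe` of candidate resources to compute a complement, which is why it is passed explicitly in this sketch.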
A ResultSet is a specific resource used to describe the result of a Query. The results should be described semantically in the PoK linked to the ResultSet. See the WebLab ontology for the properties on ResultSet. Additional and specific information for result presentation can be added in the Resources list. As a Resource, a ResultSet can be annotated to provide more information, but any information about the results themselves and the query shall be in the pok field.