WebLab Document Annotation

From WebLab Wiki


This page describes the principles of WebLab document annotation. Follow these guidelines to develop services that are fully compatible with existing WebLab services.

Introduction

The WebLab model provides a very flexible approach to annotating documents (and resources in general), so that any metadata can be added to any document or part of a document. Every WebLab resource can be annotated. A semantic annotation either links a resource to a concept or another entity from an ontology, or provides a property value explicitly in the annotation. Each annotation is expressed as an RDF statement, following common standards, in order to facilitate interoperability between services. The statement is composed as follows:

File:Graph-ex.gif

  • A subject, which is the annotated resource;
  • A predicate, which defines the property applied to the subject;
  • An object, which can be another resource or a literal value of a simple type (string, integer, float, date...).
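
As a rough illustration of the three-part statement described above, the sketch below models an annotation as a plain Python tuple. The `Statement` and `LiteralValue` names and the `weblab://example/...` URIs are hypothetical and are not part of the WebLab API; the two property URIs come from the Dublin Core element set and are used here only as familiar examples.

```python
from typing import NamedTuple, Union


class LiteralValue(NamedTuple):
    """A literal object value with a simple type (string, integer, date...)."""
    value: str
    datatype: str = "http://www.w3.org/2001/XMLSchema#string"


class Statement(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str                      # URI of the annotated resource
    predicate: str                    # URI of the property applied to the subject
    obj: Union[str, LiteralValue]     # URI of another resource, or a literal value


def is_literal_statement(statement: Statement) -> bool:
    """Return True when the object is a literal rather than a resource URI."""
    return isinstance(statement.obj, LiteralValue)


# Hypothetical resource URIs, chosen only for illustration.
DOC = "weblab://example/document/1"

# Object is a literal value.
title = Statement(DOC, "http://purl.org/dc/elements/1.1/title", LiteralValue("Annual report"))
# Object is another resource.
link = Statement(DOC, "http://purl.org/dc/elements/1.1/source", "weblab://example/document/0")
```

Keeping the literal in its own type makes the distinction between "object is a resource" and "object is a value" explicit, which is the same distinction the statement model draws.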

Note that this is the standard Resource Description Framework as defined by the W3C (see RDF concepts and abstract syntax), and that it is expressed in the RDF/XML syntax (see RDF/XML syntax). The RDF/XML content is embedded in the model in a Piece of Knowledge (PoK for short) or an Annotation (a subclass of PoK), so that it can be linked to a resource and exchanged along the processing chain.
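
To give an idea of what such RDF/XML content can look like, the following sketch serializes a single statement with a literal object using only the Python standard library. The `to_rdf_xml` function and the `weblab://example/document/1` URI are assumptions made for illustration; the sketch does not show how WebLab wraps the result in a PoK or Annotation.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to use readable prefixes instead of ns0, ns1...
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def to_rdf_xml(subject_uri: str, property_ns: str, property_name: str, literal: str) -> str:
    """Serialize one statement with a literal object as an RDF/XML string."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # The subject is carried by rdf:about on an rdf:Description element.
    description = ET.SubElement(root, f"{{{RDF_NS}}}Description")
    description.set(f"{{{RDF_NS}}}about", subject_uri)
    # The predicate is a child element; the literal object is its text.
    prop = ET.SubElement(description, f"{{{property_ns}}}{property_name}")
    prop.text = literal
    return ET.tostring(root, encoding="unicode")


# Hypothetical subject URI, for illustration only.
xml_text = to_rdf_xml("weblab://example/document/1", DC_NS, "title", "Annual report")
print(xml_text)
```

In a real service one would more likely rely on a dedicated RDF library than build the XML by hand; the point here is only to show how subject, predicate and object map onto RDF/XML elements.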

Through this mechanism, any semantic annotation can be added to a resource. However, in the context of a specific application, common ontologies must be selected in order to achieve interoperability between the services that produce annotations and the services that exploit them. The main idea is to reuse existing ontologies as much as possible, for instance by searching for existing models in the Linked Data initiative. For a specific domain, one will probably need to define one's own model.

WebLab ontologies

We provide three ontologies as part of the WebLab Core to address the most important aspects of resource annotation:

Model as owl

The "model.owl" file is a semantic transcription of the whole data exchange model. It is mainly used when the system includes a knowledge base that serves as the single access point for search.

Processing properties

The "processing.owl" file describes the most important properties used during the execution of a standard processing chain in WebLab.

Retrieval

The last ontology, stored in "retrieval.owl", describes the properties used to annotate search results.

Best practices

On top of the open RDF annotation model, we recommend some general guidelines for annotating documents in the context of a document processing and retrieval system. These recommendations are only intended to ease the use of RDF/XML and simplify its exploitation.

  • Annotations on a Document should be at top level;
  • RDF elements referenced in Annotation;
  • Make coherent use of RDF;
  • Using Annotators to manage metadata (see Annotator Documentation);
  • Using Helpers to manage metadata (see Helpers Documentation);
  • Using RDF in your own way.