From Open Sources to Web of Data

Merger of the workshops "Sources Ouvertes et Services" and "Données Liées pour un Web de Données"

31 January 2012 in Bordeaux

Co-located with EGC2012

Workshop presentation

This workshop aims to bring together work on two themes: on the one hand, issues related to heterogeneous and independent open sources; on the other hand, problems concerning the semantic links that exist between structured data and that facilitate their processing and integration on the Web of Data. It is thus the merger of the third edition of the SOS workshop (Services and Open Sources, previously held at RFIA'2010 and EGC'2011) and the first edition of the DLWD workshop (Linked Data for a Web of Data).
This workshop will be organized in two sessions:

  • Session 1: Services and Open Sources (SOS)
  • Session 2: Linked Data for a Web of Data (DLWD)

Services and Open Sources

This session will highlight the many problems associated with processing data available from open sources (OS). OS refers to all publicly accessible media, whether free or paid, such as the Internet, public databases, journals, CD-ROMs, television and radio, as opposed to closed sources, whose access requires specific permissions. These OS provide huge amounts of heterogeneous multimedia data (images, text, audio, video, etc.) that require appropriate processing to facilitate their exploitation.

In addition to the problems related to the heterogeneity of data, the implementation of processing chains that can exploit these data represents a scientific and technical challenge. The session will cover all steps, from the discovery of information sources, through the collection and analysis of data, to the capitalization and exploitation phase.

The session will also focus on the architectural choices made when implementing applications that use OS. Indeed, these applications usually have to combine several software components (COTS, open source software, ad hoc developments, etc.) and make them work together to achieve a particular task. Emphasis will be placed on service-oriented architecture (SOA) and the use of Semantic Web technologies.

Linked Data for a Web of Data

In this second session, we will address issues related to the publication of structured data and their exploitation on the Web of Data. Over the past four years, the growing number of structured data sources made available on the Web has led to an explosive growth of the global data space, which now holds billions of assertions (31 billion in September 2011).

In this data space, semantic links can be established between data. These links allow crawlers, browsers or applications to navigate through the data sources and combine information from different sources. However, in an open environment like the Web, different URIs are regularly created to identify the same object. The relationship between two resources can be set manually, but because the data are so numerous, some approaches deal with the automatic generation of links between RDF data sources. Moreover, even though recognized vocabularies exist for representing data on the Web (FOAF, Dublin Core, etc.), these vocabularies are evolving and are often insufficient for some application areas, which develop their own schemas (or ontologies). This raises the problem of integrating linked data despite the heterogeneity of the vocabularies used. The data (or links) can be inaccurate, outdated, false or subject to restricted use, and certain approaches therefore address the quality of data sources. Finally, different architectures can be defined, which depend largely on the application domain. In this regard, several initiatives have been launched at the national level (such as the DataLift project) and at the international level (such as the LOD2 and Planet Data projects) to initiate and consolidate efforts to solve the problems caused by the mass of linked data available.
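As an illustration of the automatic generation of links mentioned above, the following minimal Python sketch compares the labels of resources from two sources and proposes an owl:sameAs link when the labels are sufficiently similar. The example URIs, the triples, the function names and the 0.9 threshold are all assumptions made for this illustration; it is not the method of any particular project, and real approaches typically combine several properties and more robust similarity measures.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical triples (subject, predicate, object) from two independent sources.
source_a = [
    ("http://example.org/a/Bordeaux", RDFS_LABEL, "Bordeaux"),
    ("http://example.org/a/Paris", RDFS_LABEL, "Paris"),
]
source_b = [
    ("http://example.com/b/city/1", RDFS_LABEL, "bordeaux"),
    ("http://example.com/b/city/2", RDFS_LABEL, "Lyon"),
]


def labels(triples):
    """Map each subject URI to its rdfs:label value."""
    return {s: o for s, p, o in triples if p == RDFS_LABEL}


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def generate_links(triples_a, triples_b, threshold=0.9):
    """Propose owl:sameAs triples for pairs whose labels are similar enough."""
    links = []
    for uri_a, label_a in labels(triples_a).items():
        for uri_b, label_b in labels(triples_b).items():
            if similarity(label_a, label_b) >= threshold:
                links.append((uri_a, OWL_SAME_AS, uri_b))
    return links


links = generate_links(source_a, source_b)
for link in links:
    print(link)
```

The pairwise comparison is quadratic in the number of resources, so on data spaces of the size discussed here a blocking or indexing step would be needed before any such comparison.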

Workshop topics

  • Identification and automatic discovery of information sources
  • Access to and gathering of information from open sources (Web, social networks, RSS, etc.)
  • Classification, filtering of information of interest, extraction of information from unstructured text and/or text using specific vocabularies (blogs, texting, forums, etc.)
  • Extracting information from large volumes of multimedia data (text, image, video, audio)
  • Modeling and capitalizing on knowledge extracted from open sources (ontologies, semantic annotations, etc.)
  • Use of knowledge extracted from open sources: reasoning, decision support, visualization, etc.
  • Detection of weak signals
  • Public and Government Data
  • Assessment of information sources
  • Provenance and trust of data and links
  • Assessment of information extracted from open sources
  • Inference, search and validation of data links
  • Interoperability of data sources and ontology alignment
  • Generation and publication of data
  • Querying the contents of the LOD
  • Development of linked data related services
  • Privacy / access control on linked data
  • Integration platforms, heterogeneous processing, services: service interoperability, semantic orchestration, etc.
  • Business intelligence applications or business models for open sources
  • OSINT Applications