This workshop aims to bring together work dealing, on the one hand, with issues related to heterogeneous and independent open sources and, on the other hand, with problems concerning the semantic links that exist between structured data and that facilitate their processing and integration over the Web of data. It is therefore the merger of the third edition of the SOS workshop (Services and Open Sources, RFIA'2010 and EGC'2011) and the first edition of the DLWD workshop (Related Data for a Web of Data).
This session will highlight the many problems associated with processing data available from open sources (OS). OS means all openly accessible media, whether free of charge or paid, such as the Internet, public databases, journals, CD-ROMs, television, and radio, as opposed to closed sources, whose access requires specific permissions. These OS provide huge amounts of heterogeneous multimedia data (images, text, audio, video, etc.) that require appropriate processing to facilitate their exploitation.
In addition to the problems related to the heterogeneity of data, the implementation of processing chains that can exploit these data represents a scientific and technical challenge. Interest will focus on all steps, from the discovery of information sources, through the collection and analysis of the data, to the capitalization and exploitation phase.
The session will also focus on the architectural choices made when implementing applications that use OS. Indeed, these applications usually attempt to combine several software components (COTS, open source software, ad hoc development, etc.) and make them work together in order to achieve a particular task. Emphasis will be placed on service-oriented architecture (SOA) and the use of Semantic Web technologies.
In this second session, we will address issues related to the publication of structured data and their exploitation on the Web of data. Over the past four years, the growing number of structured data sources made available on the Web has led to explosive growth of the global data space, to billions of assertions (31 billion in September 2011).
In this data space, semantic links can be established between data. These links allow crawlers, browsers, and applications to navigate through the data sources and to combine information from different sources. However, in an open environment like the Web, different URIs are regularly created to identify the same object. The relationship between two resources can be set manually but, given the volume of data, some approaches address the automatic generation of links between RDF data sources. Moreover, even though recognized vocabularies exist for representing data on the Web (FOAF, Dublin Core, etc.), these vocabularies are evolving and are often insufficient for some application areas, which develop their own schemas (or ontologies). This raises the problem of integrating linked data despite the heterogeneity of the vocabularies used. The data (or links) can also be inaccurate, outdated, false, or subject to usage restrictions, and certain approaches focus on the quality of data sources. Finally, different architectures can be defined, largely depending on the application domain. In this regard, several initiatives have been launched at the national level (such as the DataLift project, http://datalift.org) and at the international level (such as the LOD2, http://lod2.eu, and Planet Data, http://planet-data.eu, projects) to initiate and consolidate efforts to solve the problems caused by the mass of linked data available.
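To make the co-reference problem mentioned above concrete, the following is a minimal sketch of how URIs that identify the same object might be grouped once equivalence links between sources are known. It assumes, purely for illustration, that the links are expressed with the standard owl:sameAs property and that triples are held as plain Python tuples; the example.org URIs and the function name are hypothetical and do not come from any of the projects cited.

```python
# Illustrative sketch only: groups URIs connected by owl:sameAs links
# using a union-find structure. All example.org URIs are hypothetical.

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def group_equivalent_uris(triples):
    """Return sorted groups of URIs that are transitively linked by owl:sameAs."""
    parent = {}

    def find(uri):
        # Register unseen URIs as their own representative.
        parent.setdefault(uri, uri)
        # Walk up to the representative, shortening the path as we go.
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for subject, predicate, obj in triples:
        # Only equivalence links merge groups; other triples are ignored.
        if predicate == OWL_SAME_AS:
            root_s, root_o = find(subject), find(obj)
            if root_s != root_o:
                parent[root_o] = root_s

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Three hypothetical sources describing the same two cities.
EXAMPLE_TRIPLES = [
    ("http://example.org/a/paris", OWL_SAME_AS, "http://example.org/b/Paris"),
    ("http://example.org/b/Paris", OWL_SAME_AS, "http://example.org/c/city/75056"),
    ("http://example.org/a/lyon", OWL_SAME_AS, "http://example.org/b/Lyon"),
    ("http://example.org/a/paris", FOAF_NAME, "Paris"),
]

if __name__ == "__main__":
    for group in group_equivalent_uris(EXAMPLE_TRIPLES):
        print(group)
```

This sketch only consolidates links that already exist. Automatically discovering such links, for instance by comparing labels or other property values across RDF sources, is the harder problem addressed by the approaches this session is concerned with.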