Over the past few years, the World Wide Web has become increasingly open and social. Open source has enjoyed a renaissance over the past decade or so, visible in the proliferation of open standards, interoperability frameworks and, more recently, numerous Web APIs (Application Programming Interfaces). A growing number of web applications have made their APIs public so that developers can easily integrate their data and services, fostering an open architecture for sharing content. One emerging trend is the steady rise of so-called web mashups, which create innovative products and services by leveraging existing open Web APIs. The term “mashup” originates from the music industry, where it denotes combining tracks from two or more songs to create a wholly new song. By definition, a web mashup (or web application hybrid) takes information from multiple online services and combines it into a new application that presents the data in a unique way. Not every mashup needs to “mix” multiple data sources to come up with something original, however: in certain cases a single source of information suffices. One of the best examples of such a “single-source mashup”, and of the open data movement, is DBpedia, which can be thought of as a structured database version of Wikipedia. DBpedia offers a large number of datasets covering a broad range of accumulated human knowledge. These datasets are interlinked with external datasets on the Web, which enriches the information accessible through DBpedia even further. It is a prime example of taking information from an existing service via an open Web API and then enhancing the data to provide added value to the user.
Described by Tim Berners-Lee as one of the more famous parts of the Linked Data project, DBpedia aims to extract structured information from Wikipedia, such as infobox data, categorisation information, images, geo-coordinates and links to external pages, and to make this data available on the Web. The structured content allows users to run sophisticated and detailed queries against Wikipedia, such as “give me all German musicians that were born in Berlin in the 19th century” or “give me all soccer players with tricot number 11, playing for a club having a stadium with over 40,000 seats and born in a country with over 10 million inhabitants”. Clearly, the only limit is the imagination. Wikipedia itself cannot currently handle such expressive queries because it lacks semantic cross-references between its millions of individual article pages. As of September 2011, the DBpedia dataset already contains more than 3.64 million “things”, which include “416,000 persons, 526,000 places, 106,000 music albums, 60,000 films, 17,500 video games, 169,000 organisations, 183,000 species and 5,400 diseases”, and the list is continuously growing.
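To make the idea of querying Wikipedia like a database more concrete, here is a minimal Python sketch of how the “musicians born in Berlin in the 19th century” question might be phrased in SPARQL, the query language commonly used for Linked Data. Several things here are assumptions that the text above does not state: the endpoint URL `https://dbpedia.org/sparql`, the ontology terms `dbo:MusicalArtist`, `dbo:birthPlace` and `dbo:birthDate`, and the helper names `build_query` and `run_query`, which are purely illustrative. The sketch also simplifies the question by not checking nationality.

```python
import json
import urllib.parse
import urllib.request

# Assumed public DBpedia SPARQL endpoint; not stated in the article itself.
ENDPOINT = "https://dbpedia.org/sparql"


def build_query(birth_place, start_year, end_year, limit=20):
    """Build a SPARQL query for musical artists born in `birth_place`
    on or after 1 January `start_year` and before 1 January `end_year`.

    `birth_place` is the local name of a DBpedia resource, e.g. "Berlin".
    The class and property names are assumptions about the DBpedia ontology.
    """
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?person ?birthDate WHERE {{
  ?person a dbo:MusicalArtist ;
          dbo:birthPlace dbr:{birth_place} ;
          dbo:birthDate ?birthDate .
  FILTER (?birthDate >= "{start_year}-01-01"^^xsd:date &&
          ?birthDate <  "{end_year}-01-01"^^xsd:date)
}}
ORDER BY ?birthDate
LIMIT {limit}
""".strip()


def run_query(query, endpoint=ENDPOINT):
    """Send the query over HTTP GET and return the list of result bindings.

    Requires network access; the JSON layout follows the standard
    SPARQL query results format (results -> bindings).
    """
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(f"{endpoint}?{params}", timeout=30) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]
```

With network access, `run_query(build_query("Berlin", 1801, 1901))` would return one dictionary per matching person, each holding the resource URI and birth date. The actual results depend on the current state of the dataset, so none are shown here.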
As of January 2011, there are more than 6.5 million interlinks between DBpedia and external datasets such as Freebase, GeoNames, the CIA World Factbook, US Census data and Project Gutenberg, to name just a few. Although DBpedia is a research project, some high-profile organisations already use it. The British Broadcasting Corporation (BBC), the largest broadcaster in the world, for example, uses it to cross-link its online content according to semantic web principles, so that its numerous micro-sites can be linked together semantically. DBpedia’s semantically cross-referenced data could also be used for consistency checks of Wikipedia articles, which would help authors and editors spot inconsistencies and could provide automatic correction suggestions. This would be especially useful because the checks could be carried out across all the language editions that Wikipedia supports (283 at the time of writing).
It is easy to see that the technology behind DBpedia has the potential to completely change the way we search for information on the Web. Virtually all major search engines currently offer only simple keyword- and relevance-based searches. This new technology could instead make sophisticated and expressive queries possible, which could revolutionise access to the vast amount of information that the World Wide Web has to offer.