Innovation in Assembly

Overview

The DBpedia logo

Over the past few years, the World Wide Web has become increasingly open and social. Open source has enjoyed a renaissance over the past decade or so, visible in the proliferation of open standards and interoperability frameworks and, more recently, in the introduction of numerous Web APIs (Application Programming Interfaces). A growing number of web applications have made their APIs public so that developers can easily integrate their data and services, fostering an open architecture for sharing content.

One emerging trend is the steady rise of so-called web mashups, which create innovative products and services by building on existing open Web APIs. The term “mashup” originates from the music industry, where it describes an artist combining tracks from two or more songs to create a wholly new one. By definition, a web mashup (or web application hybrid) takes information from multiple online services and combines it into a new application that presents the data in a unique way. Not every mashup needs to “mix” multiple data sources to produce something original, however: in certain cases a single source of information suffices.

One of the best examples of such “single source mashups”, and of the open data movement, is DBpedia, which can be thought of as a structured database version of Wikipedia. DBpedia offers a large number of datasets covering a broad range of accumulated human knowledge. These are interlinked with external datasets on the Web, which enriches the information accessible through DBpedia even further. It is a prime example of taking information from an existing service via an open Web API and then enhancing the data to provide added value to the user.

Comparisons

Described by Tim Berners-Lee as one of the more famous parts of the Linked Data project, DBpedia aims to extract structured information from Wikipedia, such as infobox data, categorisation information, images, geo-coordinates and links to external pages, and to make this data available on the Web. The structured content lets users ask sophisticated, detailed queries against Wikipedia, such as “give me all German musicians that were born in Berlin in the 19th century” or “give me all soccer players with tricot number 11, playing for a club having a stadium with over 40,000 seats, who were born in a country with over 10 million inhabitants”. Clearly, only the imagination is the limit. Wikipedia itself cannot currently answer such expressive queries because it lacks semantic cross-references between its millions of individual article pages. As of September 2011, the DBpedia dataset already describes more than 3.64 million “things”, which include “416,000 persons, 526,000 places, 106,000 music albums, 60,000 films, 17,500 video games, 169,000 organisations, 183,000 species and 5,400 diseases”, and the list is continuously growing.
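To make the first example query a little more concrete, here is a minimal sketch of how it might be expressed in SPARQL and packaged as a request URL from Python. Several things here are assumptions rather than anything stated in this post: the endpoint address (`https://dbpedia.org/sparql`), the `format` parameter, and the vocabulary terms `dbo:MusicalArtist`, `dbo:birthPlace` and `dbo:birthDate`. The sketch also only filters on place and date of birth, not on nationality, so it approximates “German musicians” rather than matching it exactly. The function names are made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: DBpedia's public SPARQL endpoint (not named in the post itself).
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_musician_query(city="Berlin", start_year=1801, end_year=1901, limit=50):
    """Build a SPARQL query for musicians born in `city` from start_year up to
    (but not including) end_year. The dbo:/dbr: terms are assumed vocabulary."""
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?musician ?birthDate WHERE {{
  ?musician a dbo:MusicalArtist ;
            dbo:birthPlace dbr:{city} ;
            dbo:birthDate ?birthDate .
  FILTER (?birthDate >= "{start_year}-01-01"^^xsd:date &&
          ?birthDate <  "{end_year}-01-01"^^xsd:date)
}}
ORDER BY ?birthDate
LIMIT {limit}
""".strip()


def build_request_url(query, endpoint=DBPEDIA_ENDPOINT):
    """URL-encode the query so it can be sent as a plain HTTP GET request.
    The `format` parameter asking for JSON results is an assumption."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URL only; actually fetching it is left to the reader.
    print(build_request_url(build_musician_query()))
```

The point of the sketch is that the question is stated in terms of typed relationships (born in, born on) rather than keywords, which is exactly what plain article text cannot offer.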

Wikipedia logo

Implications

As of January 2011, there were more than 6.5 million interlinks between DBpedia and external datasets such as Freebase, GeoNames, the CIA World Factbook, US Census data and Project Gutenberg, to name just a few. Although DBpedia is a research project, it is already used by some high-profile organisations. For example, the British Broadcasting Corporation (BBC), the largest broadcaster in the world, uses it to cross-link its online content according to semantic web principles, so that its numerous micro-sites are linked together semantically. DBpedia’s semantically cross-referenced data could also be used for consistency checks of Wikipedia articles, which would greatly help authors and editors to spot inconsistencies and could also provide automatic correction suggestions. This would be especially useful because the checks could be carried out across all the different languages that Wikipedia supports (283 at the time of writing).
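As a rough illustration of what such a cross-language consistency check could look like, here is a minimal sketch. It is not how DBpedia or Wikipedia does it: the function name, the input shape and the sample values are all hypothetical, and the “suggestion” is simply the value that a strict majority of language editions agree on.

```python
from collections import Counter


def find_inconsistencies(values_by_property):
    """Compare one article's property values across language editions.

    `values_by_property` maps a property name to {language_code: value}.
    Returns {property: (suggestion, disagreeing)} for properties whose
    editions disagree. `suggestion` is the strict-majority value, or None
    when no value is held by more than half of the editions.
    """
    report = {}
    for prop, by_lang in values_by_property.items():
        counts = Counter(by_lang.values())
        if len(counts) <= 1:
            continue  # all editions agree (or there is no data)
        top_value, votes = counts.most_common(1)[0]
        if votes > len(by_lang) / 2:
            # A clear majority exists: flag only the editions that differ.
            outliers = {lang: v for lang, v in by_lang.items() if v != top_value}
            report[prop] = (top_value, outliers)
        else:
            # No clear majority: report every edition and suggest nothing.
            report[prop] = (None, dict(by_lang))
    return report


if __name__ == "__main__":
    # Made-up placeholder values for an imaginary article, not real data.
    sample = {
        "populationTotal": {"en": 1000, "de": 1000, "fr": 1200},
        "areaCode": {"en": "030", "de": "030", "fr": "030"},
    }
    print(find_inconsistencies(sample))
```

In practice values would need normalising first (units, number formats, date styles) before a comparison like this means anything, which the sketch deliberately leaves out.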

Future directions

It is easy to see that the technology behind DBpedia has the potential to completely change the way we search for information on the Web today. Virtually all major search engines currently offer only simple keyword and relevancy based searches. This new technology could instead open up the possibility of sophisticated and expressive queries, which could revolutionise access to the vast amounts of information that the World Wide Web has to offer.

References

Wikipedia – Mashup (web application hybrid)
Wikipedia – DBpedia
Wikipedia – BBC
Mashup, a new and exciting aspect of Web 2.0
DBpedia – About
DBpedia – Use Cases
Did You Blink? The Structured Web Just Arrived
DBpedia – Querying Wikipedia like a Database
DBpedia – Extracting structured data from Wikipedia
Sir Tim Berners-Lee Talks with Talis about the Semantic Web
BBC Learning Open Lab – Reference
Case Study: Use of Semantic Web Technologies on the BBC Web Sites

9 thoughts on “Innovation in Assembly”

  1. Great post – I have never heard of DBPedia, so thanks for sharing! I found it interesting that this sort of mashup enables us to search for things really specific – should be interesting in the future!

  2. “Single source mashups” is a cool idea! It really helped clear up my misconceptions. Thanks a lot! After reading your blog, I visited the DBpedia website and found that they provide a “Query Builder” and “Query interfaces” to users. Can I consider these to be APIs from DBpedia? If I am not misunderstanding, DBpedia is also a good API provider. For example, the query language they use is similar to common SQL, and in addition they use their own API for their service. It is a good platform between Wikipedia and users. I believe it will be the next Google in the future~!

    • If I understand it correctly, DBPedia collects the information from Wikipedia through its API first, but then it builds its own structured database based on the received data. So when you’re querying it, that query is actually running against DBPedia’s own dataset.

  3. Wow, very helpful blog! At the beginning I was a bit unsure about “Innovation in Assembly” until I read your post. It helped me understand heaps! I’ve also never heard of DBpedia, but I’m eager to check it out now. Excellent use of imagery there as well, it gave it a nice touch 🙂
    Thanks a bunch!

  4. A search engine that could process expressive queries would definitely be more useful than simple keyword searches, since sometimes a common word like “All” changes what is important in your search. Keyword searches often discount frequently used words like The, All, This. An expressive query based search engine would have to take semantics into consideration. This is definitely the way of the future.
