Innovation in Assembly

Overview

The DBpedia logo

Over the past few years, the World Wide Web has become steadily more open and social. Open source has enjoyed a renaissance over the past decade or so, visible in the proliferation of open standards, interoperability frameworks and, more recently, numerous public Web APIs (Application Programming Interfaces). A growing number of web applications have made their APIs public so that developers can easily integrate their data and services, fostering an open architecture of shared content. An emerging trend is the steady rise of so-called web mashups, which create innovative products and services by leveraging existing open Web APIs. The term “mashup” originates from the music industry, where it describes an artist combining tracks from two or more songs to create a wholly new song. By definition, a web mashup (or web application hybrid) takes information from multiple online services and combines it into a new application that presents the data in a unique way.

It should be noted, however, that not all mashups need to “mix” multiple data sources to come up with something original: in certain cases a single source of information suffices. One of the best examples of such “single source mashups”, and of the open data movement, is DBpedia, which can be thought of as a structured database version of Wikipedia. DBpedia offers a large number of datasets covering a broad range of accumulated human knowledge. These are interlinked with external datasets on the Web, which enriches the information accessible through DBpedia even further. It is a prime example of taking information from an existing service via an open Web API and then enhancing the data to provide added value to the user.

Comparisons

Described by Tim Berners-Lee as one of the more famous parts of the Linked Data project, DBpedia aims to extract structured information from Wikipedia, such as infobox data, categorisation information, images, geo-coordinates and links to external pages, and to make this data available on the Web. The structured content allows users to ask extremely sophisticated and detailed queries against Wikipedia, such as “give me all German musicians that were born in Berlin in the 19th century” or “give me all soccer players with shirt number 11, playing for a club having a stadium with over 40,000 seats and who were born in a country with over 10 million inhabitants”. Clearly, imagination is the only limit. Currently, Wikipedia itself is unable to handle such expressive queries because it lacks semantic cross-references between its millions of individual article pages. As of September 2011, the DBpedia dataset already contains more than 3.64 million “things”, which include “416,000 persons, 526,000 places, 106,000 music albums, 60,000 films, 17,500 video games, 169,000 organisations, 183,000 species and 5,400 diseases”, and the list is continuously growing.
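To make the first example query more concrete, here is a minimal sketch of how it might be expressed in SPARQL and wrapped in a request URL using Python. Several things here are assumptions rather than anything stated above: the endpoint address `https://dbpedia.org/sparql`, the `format` parameter, and the class and property names (`dbo:MusicalArtist`, `dbo:birthPlace`, `dbo:birthDate`), which should be checked against the current DBpedia ontology. The helper names `build_query` and `build_request_url` are hypothetical. Note also that the sketch filters on place of birth only, so it approximates “German musicians” rather than checking nationality.

```python
from urllib.parse import urlencode

# Assumed address of the public DBpedia SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"


def build_query(birth_place: str, start_year: int, end_year: int) -> str:
    """Return a SPARQL query for musicians born in `birth_place`
    with a birth date in the half-open range [start_year, end_year)."""
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?person ?birthDate WHERE {{
  ?person a dbo:MusicalArtist ;
          dbo:birthPlace dbr:{birth_place} ;
          dbo:birthDate ?birthDate .
  FILTER (?birthDate >= "{start_year}-01-01"^^xsd:date &&
          ?birthDate <  "{end_year}-01-01"^^xsd:date)
}}
ORDER BY ?birthDate
""".strip()


def build_request_url(query: str) -> str:
    """Encode the query as a GET URL that asks for JSON results.
    The `format` parameter is an assumption about the endpoint."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{ENDPOINT}?{urlencode(params)}"


# The 19th century runs from 1801 to 1900 inclusive.
query = build_query("Berlin", 1801, 1901)
url = build_request_url(query)
print(query)
```

The resulting URL could be fetched with any HTTP client. The sketch stops short of sending the request so that it runs without network access.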

Wikipedia logo

Implications

As of January 2011, there are more than 6.5 million interlinks between DBpedia and external datasets such as Freebase, GeoNames, the CIA World Factbook, US Census data and Project Gutenberg, to name just a few. Although DBpedia is a research project, it is already used by some high-profile organisations. For example, the British Broadcasting Corporation (BBC), the largest broadcaster in the world, uses it to cross-link its online content according to semantic web principles, so that its numerous micro-sites are semantically linked together. DBpedia’s semantically cross-referenced data could also be used for consistency checks of Wikipedia articles, which would greatly help authors and editors to spot inconsistencies and could also provide automatic correction suggestions. This would be especially useful because the checks could be carried out across all the different languages that Wikipedia supports (283 at the time of writing).

Future directions

It is quite easy to see that the technology behind DBpedia has the potential to completely change the way we search for information on the Web today. Instead of the simple keyword and relevance-based searches that virtually all major search engines currently provide, this new technology could open up the possibility of sophisticated and expressive queries, which could revolutionise access to the vast amounts of information that the World Wide Web has to offer.

References

Wikipedia – Mashup (web application hybrid)
Wikipedia – DBpedia
Wikipedia – BBC
Mashup, a new and exciting aspect of Web 2.0
DBpedia – About
DBpedia – Use Cases
Did You Blink? The Structured Web Just Arrived
DBpedia – Querying Wikipedia like a Database
DBpedia – Extracting structured data from Wikipedia
Sir Tim Berners-Lee Talks with Talis about the Semantic Web
BBC Learning Open Lab – Reference
Case Study: Use of Semantic Web Technologies on the BBC Web Sites

Data Is the Next ‘Intel Inside’

Overview

The LinkedIn logo

“Data is the next Intel Inside”, so goes the famous saying of Tim O’Reilly. Indeed, in the era of Web 2.0, data plays a vital role in the success of widely known web services such as Google Search, Gmail, Facebook, Twitter, LinkedIn and Flickr, to name just a few of the most important ones. Whoever controls the data controls the internet, hence the Intel analogy in one of the Web 2.0 movement’s best-known slogans.

“We live in a world clothed in data, and as we interact with it, we create more” is the motto of the 2011 Web 2.0 Summit Map, and the incredibly popular social networking site LinkedIn is a prime example of it. While LinkedIn is conceptually very similar to Facebook, it is squarely aimed at grown-up professionals in the 25 to 65 age range instead of teenagers and young adults. As of February 2012, LinkedIn has 150 million members, half of whom are from the United States, so it is rightfully called the “de facto tool for professional networking”.

Comparisons

Compared to other social networking sites, the aim of LinkedIn is to maintain a list of contact details of people with whom the user has had some sort of professional relationship. The service is very popular among employers who are looking for potential candidates and among job seekers who wish to find business opportunities recommended by someone in their contact network. The application is continually being enhanced with useful new features that set it apart from its competition. For example, in October 2008 LinkedIn introduced the new “Application Platform”, which allows members to embed data from other online services into their profiles. Members can display their latest blog entries using the WordPress application or display a list of books they are currently reading through a connection to their Amazon Reading List.

Implications

According to LinkedIn’s co-founder and chairman Reid Hoffman, the future of the World Wide Web will be all about data and how we can utilise it. Apart from the so-called explicit data that users voluntarily give out about themselves in the form of blog posts, tweets and social network profiles, there is a second class of implicit data that can be harvested from the information users share indirectly. A good example of this is LinkedIn Skills, where pouring vast amounts of user data through sophisticated mathematical algorithms reveals industry trends and insights, such as which skills are most in demand and which industries are growing fastest.
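The kind of aggregation described above can be illustrated with a toy example. LinkedIn’s actual algorithms and data are not public, so everything below is an assumption made for illustration: the profile records, the function names `skill_demand` and `fastest_growing`, and the choice of relative growth between two snapshots as the trend measure.

```python
from collections import Counter

# Hypothetical profile snapshots; each record lists the skills a member declared.
last_year = [
    {"skills": ["Python", "SQL"]},
    {"skills": ["SQL"]},
    {"skills": ["Java"]},
]
this_year = [
    {"skills": ["Python", "SQL"]},
    {"skills": ["Python", "SQL"]},
    {"skills": ["Python", "Java"]},
    {"skills": ["SQL"]},
]


def skill_demand(profiles):
    """Count how many members list each skill (once per member)."""
    counts = Counter()
    for profile in profiles:
        counts.update(set(profile["skills"]))
    return counts


def fastest_growing(previous, current):
    """Rank skills by relative growth between two snapshots.
    Skills absent from the earlier snapshot are skipped to avoid dividing by zero."""
    growth = {}
    for skill, now in current.items():
        before = previous.get(skill, 0)
        if before > 0:
            growth[skill] = (now - before) / before
    return sorted(growth.items(), key=lambda item: item[1], reverse=True)


previous_counts = skill_demand(last_year)
current_counts = skill_demand(this_year)
ranking = fastest_growing(previous_counts, current_counts)
print(current_counts)
print(ranking)
```

No member states a trend explicitly. It only emerges once many individual profiles are counted together, which is the sense in which the data is implicit.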

Potential legal and ethical issues

Although Hoffman publicly stated that “Good Internet companies do not ambush their users”, there is growing concern about the way LinkedIn uses its members’ data for its own agenda. In March 2012, a class action lawsuit was filed against several popular social networking sites, LinkedIn among them, accusing them of stealing information from users without their knowledge or prior consent. This is not the first such occasion, as the company has been accused in the past of profiting from user data through targeted advertising programs.

Future directions

While LinkedIn currently “owns” the professional networking space in practice, there is certainly room for improvement in many areas of the service. For instance, there is currently no feature that facilitates group communication among the growing number of members, and in its current form LinkedIn is little more than a massive CV database with some social media add-ons as an afterthought. With new competitors such as BranchOut and BeKnown appearing on the horizon and building similar sites on top of existing Facebook user data, LinkedIn faces the serious challenge of renewing itself to stay relevant and to remain at the top of the professional networking landscape.

References

The Web 2.0 Summit Map
Wikipedia – LinkedIn
How LinkedIn Broke Through
LinkedIn Launches New Application Platform To Help Members Get Down to Business
Well-known apps named in privacy lawsuit
LinkedIn Founder: Web 3.0 Will Be About Data
HOW TO: Optimize Your LinkedIn Profile’s New Skills Section
LinkedIn Sells Private Customer Data
LinkedIn Adds Social-Driven News, Skills, ‘Maps’ Pages
What is the Future of LinkedIn?

Harnessing Collective Intelligence

Overview

The Digg logo

Henry Jenkins describes collective intelligence as “the ability of virtual communities to leverage the combined expertise of their members”, which accurately describes the collective spirit demonstrated by the users of the many online communities in existence today. In fact, “crowdsourcing”, “collective intelligence” and “the wisdom of the crowds” (a term originally coined by James Surowiecki in his influential book of the same name) are probably the buzzwords that most accurately describe the nature of Web 2.0 as it stands today. The theory behind all these terms is deceptively simple: large groups of “unwashed” people are in general smarter, wiser and better at solving problems than an “elite club” of experts.

Officially touted as a user-driven, collective content discovery tool, the website Digg aims to harness the collective intelligence of its user base to gather, filter and analyse content so that the very best of the best can rise to the top. The premise is that by bringing together literally millions of people to do the massive work of finding, submitting, categorising, reviewing, discussing and featuring news items, blog entries, articles, images and just about every conceivable bit of information to be discovered in the vast, perpetual data flow of the World Wide Web, Digg would eventually surface the most interesting, most wanted and most relevant content: “the best stuff”, as voted by its online community.

Comparisons

Originally the brainchild of Kevin Rose, an American Internet entrepreneur and former TechTV co-host, Digg was first launched in December 2004 after an initial investment of $1,000. From its humble beginnings it rapidly became an enormous success and one of the most prominent and influential social bookmarking sites on the Internet. Its user base grew exponentially, reaching 2.7 million individual user accounts by 2008, according to JCG.org’s estimates.

The basic function of Digg is quite easy to grasp: after logging in, the user is presented with the most popular stories of the moment on the front page. It is possible to browse stories, filter content, create customised categories, add comments to a particularly interesting story and “follow” each other’s activity (similar to Twitter), but most importantly, to “Bury” content (down-vote) or “Digg” it (up-vote, very similar to Facebook’s “Like” concept). In the beginning, this novel concept of voting content up or down was what set Digg apart from existing online social bookmarking offerings, and it prompted the creation of countless similar social networking sites with content submission and voting systems.
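The voting mechanism can be sketched in a few lines of Python. This is a deliberately simplified model and not Digg’s actual promotion algorithm, which is not described here: the `Story` class, the `front_page` function and the scoring rule (diggs minus buries) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Story:
    """A submitted story with running up-vote and down-vote counts."""
    title: str
    diggs: int = 0
    buries: int = 0

    def digg(self) -> None:
        """Register one up-vote."""
        self.diggs += 1

    def bury(self) -> None:
        """Register one down-vote."""
        self.buries += 1

    @property
    def score(self) -> int:
        # Assumed scoring rule: up-votes minus down-votes.
        return self.diggs - self.buries


def front_page(stories, size=3):
    """Return the `size` highest-scoring stories, best first."""
    return sorted(stories, key=lambda story: story.score, reverse=True)[:size]


a, b, c = Story("Story A"), Story("Story B"), Story("Story C")
for _ in range(3):
    a.digg()
a.bury()      # Story A ends with a score of 2
b.digg()      # Story B ends with a score of 1
c.bury()
c.bury()      # Story C ends with a score of -2

top = front_page([c, b, a], size=2)
print([story.title for story in top])
```

A real system would also need to weigh recency and guard against vote manipulation, both of which this sketch ignores.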

Implications

From the perspective of the user, Digg is an excellent tool for finding content worth reading, especially given the user-definable categories, which let users create customised feeds that closely match their interests. In the heyday of the service, being featured on the front page was every blogger’s dream and effectively the best way to increase traffic explosively. It did not take long for the term “The Digg Effect” to be coined (also known by the phrase “dugg to death”). It refers to the situation in which the traffic generated by a particularly popular front page story overloads the featured website’s server, causing it to collapse under the large number of simultaneous users and become unavailable for a period of time.

Potential legal and ethical issues

People often mistakenly believe that the content that rises to the top on Digg is representative of what the majority of its user base thinks is important, but as has recently been pointed out, in most cases this could not be further from the truth. According to some recent statistical analysis, more than 20% of the content featured on the front page of Digg comes from a surprisingly small group of only about 20 users. Clearly, there seems to be a discrepancy between the way Digg attempts to market itself (self-organizing folksonomy, democracy of opinion, crowdsourcing) and the way its system actually works (“wisdom” derived from a homogeneous monoculture, a microscopic “elite” group of privileged individuals). Truth be told, Digg is well aware of this fact and even makes this information publicly available on its top users’ statistics page. It should also be noted that this problem is in no way particular to Digg; other popular social bookmarking sites such as Reddit or Delicious (and, as a matter of fact, even Wikipedia) exhibit exactly the same type of skewed user contribution statistics.
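The concentration figure quoted above is a simple ratio: the number of front page stories submitted by the most active users, divided by the total number of front page stories. A minimal sketch of that calculation follows. The submitter names and the function name `top_contributor_share` are made up for illustration, and the sample data is not real Digg data.

```python
from collections import Counter


def top_contributor_share(submitters, top_n):
    """Return the fraction of stories submitted by the `top_n` most active users.
    `submitters` holds one user name per front page story."""
    if not submitters:
        return 0.0
    counts = Counter(submitters)
    top_total = sum(count for _, count in counts.most_common(top_n))
    return top_total / len(submitters)


# Hypothetical front page: 10 stories from 4 distinct users.
front_page_submitters = ["alice"] * 5 + ["bob"] * 3 + ["carol", "dave"]

share = top_contributor_share(front_page_submitters, top_n=2)
print(f"Top 2 users submitted {share:.0%} of front page stories")
```

Applied to real submission logs, the same ratio would show how far a site’s front page departs from an even spread of contributors.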

Future directions

As with all Web 2.0 sites whose success depends solely on the content generated by their users, the recent mass migration of users from social bookmarking sites to more ‘hip’ services such as Twitter and Facebook raises the inevitable question: are social bookmarking sites here to stay, or are their days already numbered as they continue their slow fade into irrelevance? According to recent research, bloggers now get their social media traffic mainly from Facebook and Twitter. As of January 2009, Twitter already had twice as many young users aged 25 to 34 as Digg. The reasons why users move to new services are highly complex and are rooted not only in the usefulness of a particular piece of technology but also (and, one could argue, even more so) in societal and fashion trends. At present, no one can say for sure what state social bookmarking sites will be in a year from now. Whether Digg and similar sites can regain their former glory remains to be seen.

References

What is Digg?
Wikipedia – Digg

Discover and Share Content on Digg
How Digg Works
Wikipedia – Kevin Rose
Harvesting the Collective Intelligence of Social Networks
Top 100 Digg Users Control 56% of Digg’s HomePage Content
Digg loses popularity contest to Reddit
Digg, Reddit, Netscape: The Wisdom of Crowds or Mob Rule?
Twitter Overtakes Digg in Popularity
Can Digg Apologize Its Way Back to Popularity?
Are Social Bookmarking Sites Dying?