Extracting Semantic Data from Wikipedia Infoboxes: DBpedia

One thing that is quite amazing about Wikipedia is the sheer amount of information it provides. At the time of writing, the Wikipedia article dump was 7.8 GB compressed and 34.8 GB uncompressed, and that is without any images. So much data, and all of it freely available. But Wikipedia has one problem: because of the way entries are added, scraping it for any particular piece of data is a serious pain in the ass (at least for me).

Then the other day I came across DBpedia, an effort to make the data in Wikipedia infoboxes available through a free, queryable interface. The data is in RDF format, and you can write semantic queries (in SPARQL), which are more like questions asked of Wikipedia. It gets better: the query results can easily be retrieved as XML or JSON.

At first glance this might seem trivial, and in a way it is. But the good news is that Wikipedia keeps expanding and DBpedia keeps getting better. Imagine a world where non-programmers one day become content producers rather than just consumers. That is what I believe DBpedia will do to the internet, and to top it off there are amazing tools like Exhibit that make visualizing all this data easy and fun.
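To give a feel for what "asking Wikipedia a question" through DBpedia looks like, here is a minimal Python sketch using only the standard library. It assumes the public SPARQL endpoint lives at `https://dbpedia.org/sparql` and that Berlin's infobox population is exposed as `dbo:populationTotal` on `dbr:Berlin`; both are assumptions worth checking against the current DBpedia site. The function names are hypothetical. The parser follows the W3C SPARQL JSON results layout (`results` → `bindings`).

```python
import json
import urllib.parse
import urllib.request

# Assumed public DBpedia SPARQL endpoint; verify the current URL on dbpedia.org.
ENDPOINT = "https://dbpedia.org/sparql"

# Example question: "What is the population of Berlin?"
# dbr:Berlin and dbo:populationTotal are assumed DBpedia identifiers.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?population WHERE { dbr:Berlin dbo:populationTotal ?population }
"""


def build_query_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as the `query` parameter of a GET URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_bindings(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT):
    """Send the query over HTTP, asking for JSON results (needs network access)."""
    request = urllib.request.Request(
        build_query_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_bindings(json.load(response))
```

Calling `run_query(QUERY)` should return a list of rows such as `[{"population": "..."}]`; swapping the `Accept` header for `application/sparql-results+xml` is how you would ask for XML instead. Keeping the HTTP call separate from the parsing makes it easy to feed the flattened rows into something else, such as a visualization tool.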

