Wednesday, November 24, 2010

Google Refine: a way to clean up and transform your data

I came across the online tool Google Refine the other day.

It is something like a search-and-replace function for data, only more powerful, and it can draw on online resources.

Google acquired the technology when it bought Metaweb, the company behind Freebase and its Gridworks software.

The tool lets you manipulate data in both simple and sophisticated ways. On the simple side, you can find and replace text, or find closely related entries in a spreadsheet and combine them with a few clicks of the mouse. That makes it handy for finding typos in database entries and correcting them (I like this feature a lot!!).
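To give a feel for how "closely related entries" can be found, here is a minimal Python sketch of one common approach: reduce each entry to a normalized key and group entries that share a key. This is my own illustration of the general idea, not Refine's actual code, and the names `fingerprint` and `cluster` and the sample entries are made up for the example.

```python
import re
from collections import defaultdict


def fingerprint(value):
    """Normalize a string so that near-duplicate spellings share one key."""
    # Trim, lowercase, and drop punctuation.
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    # Split into words, remove repeats, and sort so word order is ignored.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values by fingerprint; keep only groups with differing spellings."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Made-up sample column with inconsistent entries.
entries = ["New York", "new york ", "York, New", "Boston", "boston.", "Chicago"]
clusters = cluster(entries)
for group in clusters:
    print(group)
```

A key like this only catches differences in case, punctuation, spacing and word order. Genuine misspellings (a dropped or swapped letter) would need a different comparison, such as an edit-distance measure.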

More complex manipulations include taking the information in a spreadsheet cell and using it to call up related information from the web, which can then be added to the spreadsheet as additional columns. For example, you could take a list of cities and query the web for monthly temperature and rainfall data for each of them, without having to look them up one at a time.
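The same idea, written out as a rough Python sketch: build a URL from each cell, fetch the result, and store parts of it as new columns. Everything specific here is an assumption for illustration only: the `example.com` address is a placeholder rather than a real weather service, the field names `avg_temp_c` and `rainfall_mm` are invented, and the function names are my own. It is not how Refine itself does it.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical endpoint, for illustration only; substitute a real service.
BASE_URL = "https://example.com/climate?city="


def build_url(city):
    """Build the lookup URL for one city, escaping spaces and the like."""
    return BASE_URL + quote(city)


def fetch_json(url):
    """Download a URL and parse the response body as JSON."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def add_climate_columns(rows, fetch=fetch_json):
    """Return copies of the rows with two extra columns filled from the web.

    `fetch` can be swapped for a stand-in function when trying this offline.
    """
    enriched = []
    for row in rows:
        data = fetch(build_url(row["city"]))
        new_row = dict(row)
        # Assumed field names; a real service will use its own.
        new_row["avg_temp_c"] = data.get("avg_temp_c")
        new_row["rainfall_mm"] = data.get("rainfall_mm")
        enriched.append(new_row)
    return enriched
```

Calling `add_climate_columns([{"city": "Boston"}, {"city": "Chicago"}])` would make one request per row, which is the "without having to look them up one at a time" part.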

Mind you, this requires that the information be available online, and the more complex manipulations take a bit of learning of the software's syntax. On the other hand, it's a big step up from the functions available in the typical spreadsheet!