Almost synonymous with the Big Data hype is the Hadoop hype, but what does Hadoop actually do? Developed largely at Yahoo and modeled on techniques Google published for indexing the Internet, Hadoop is not a data warehouse or a storage solution. It’s a tool that’s useful when information can be broken up, analyzed in pieces and put back together. For example, if a chain of convenience stores needs to find out how many customers used Mastercard, Visa, American Express, or cash at the pump in the past year, it can use Hadoop to retrieve that information, because the data can be divided up and processed location by location without affecting the big picture.
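To make the "break it up, analyze the pieces, put it back together" idea concrete, here is a minimal plain-Python sketch of the payment-type count. It does not use Hadoop's actual API; the store names, the transaction lists, and the function names `count_payments` and `merge_counts` are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical per-store logs: one entry per pump transaction,
# recording only the payment type that was used.
store_logs = {
    "store_a": ["visa", "cash", "visa", "mastercard"],
    "store_b": ["amex", "visa", "cash"],
    "store_c": ["cash", "cash", "mastercard"],
}


def count_payments(transactions):
    """Count payment types for ONE store, with no knowledge of other stores."""
    return Counter(transactions)


def merge_counts(partial_counts):
    """Add the per-store counts together into a chain-wide total."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Each store is processed independently, then the results are combined.
partials = [count_payments(log) for log in store_logs.values()]
totals = merge_counts(partials)
print(dict(totals))
```

Because each store's count never needs to look at another store's data, the work can be split across as many machines as there are stores, and the merged answer is the same as if everything had been counted in one place.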
However, if you’re working with data that requires examining the relationships, or connections, within it, you can’t just look at the pieces and expect to see the “big picture” of what the data is telling you. Returning to the previous example, the divide-and-combine approach breaks down when the chain wants to know which foods and beverages are being purchased together in rural vs. urban locations and how weather affects those buying patterns.
The hype around Hadoop makes it seem like a one-size-fits-all solution for leveraging big data, but the reality is that not all problems are Hadoop-able.
While Hadoop is an effective and low-cost tool for some companies, it is not helpful when mining for the critical relationships, or connections, within data. Leveraging those connections can be extremely valuable, as the healthcare and law enforcement examples later in this post show.
Rather than just crunch the data, organizations need the ability to visualize, and ultimately leverage, the patterns of connections within their data. Hadoop breaks down those critical connections because its main function is breaking up data.
Companies whose data can be represented as a large sparse graph would especially benefit from leveraging these patterns, and they cannot rely on Hadoop to do so. Such a graph is often described as a “small world graph”: nodes that aren’t directly or obviously connected turn out to be linked through a short chain of intermediaries, as in the “six degrees of separation” phenomenon.
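A toy example can show why cutting such a graph into pieces hides its connectivity. The sketch below is plain Python under stated assumptions: the five people, the edges between them, and the two-way partition are invented, and the helper names `build_adjacency` and `degrees_of_separation` are hypothetical. It uses an ordinary breadth-first search to count hops between two nodes, first on the whole graph and then on a naively split copy in which edges crossing the partition boundary are dropped.

```python
from collections import deque

# Hypothetical undirected graph: each edge links two people.
edges = [("ann", "bob"), ("bob", "cara"), ("cara", "dev"), ("dev", "eli")]

# Hypothetical naive split of the nodes into two fragments.
partition_of = {"ann": 0, "bob": 0, "cara": 1, "dev": 1, "eli": 1}


def build_adjacency(edge_list):
    """Turn a list of undirected edges into a node -> neighbors mapping."""
    adjacency = {}
    for a, b in edge_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def degrees_of_separation(adjacency, start, goal):
    """Breadth-first search: number of hops from start to goal, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == goal:
                return hops + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None  # no chain of connections exists


full_graph = build_adjacency(edges)

# Keep only edges whose two endpoints landed in the same fragment.
local_edges = [(a, b) for a, b in edges if partition_of[a] == partition_of[b]]
split_graph = build_adjacency(local_edges)

print(degrees_of_separation(full_graph, "ann", "eli"))   # hops in the whole graph
print(degrees_of_separation(split_graph, "ann", "eli"))  # None once the graph is cut
```

In the whole graph, "ann" reaches "eli" through a chain of intermediaries. In the split copy the single edge that crossed the boundary is gone, so neither fragment on its own can reveal that the two are connected at all.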
The problem is that most organizations don’t know that their data can be represented as a large sparse graph, or what becomes possible once they leverage the connections within it.
Take healthcare, for example. You have nodes for people, medicines, symptoms and side effects. To determine the type of person likely to have the fewest side effects from a certain medication, one needs to leverage the patterns of connections within the data, as opposed to breaking those connections apart into disparate clusters. This kind of analysis does not lend itself to being partitioned in a Hadoop-able way.
Law enforcement agencies have data composed of people, organizations, places and words. These connection points, if not Hadoop-ed, can reveal valuable links to networks of interest and to the key influencers within those networks. But the value lies in data that is sparse, so it needs to be assessed all at once rather than as distributed fragments.
Whether it’s healthcare, manufacturing, retail or telecom, companies with data that can be represented as a large sparse graph require effective solutions beyond the methods they are likely using now, such as spreadsheets built on guesstimates and intuition. Though Hadoop is getting a lot of attention, it may not always be the best approach to crunching Big Data for strategic insights.
What else is there, you may be wondering? Stay tuned – soon I’ll discuss ways companies can leverage the data that is not Hadoop-able.
~Radhika Subramanian, CEO, Emcien