Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information. As the term suggests, the data is mined or queried for insight. For example, retailers use data mining techniques to do basket analysis (customers who bought this also bought that) and to further understand what other factors influence a purchase.
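The “customers who bought this also bought that” idea can be made concrete with a few lines of code. The following is a minimal sketch of pair-wise basket analysis; the transaction data and item names are purely hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction data: each basket is a set of items
# bought together in one purchase.
transactions = [
    {"hawaiian shirt", "sunscreen"},
    {"hawaiian shirt", "sunscreen", "sandals"},
    {"sandals", "sunscreen"},
    {"hawaiian shirt", "sandals"},
]

# Count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequent pairs are the "bought this, also bought that" signals.
for pair, count in pair_counts.most_common(3):
    print(pair, count)
```

Real retail systems layer support, confidence, and lift metrics on top of these raw co-occurrence counts, but the core idea is exactly this counting step.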
Traditionally, data mining has consisted of analysts generating questions to feed to a database in the hope of finding an answer. This could mean asking a clothing retailer’s data, “Are customers buying Hawaiian shirts in Atlanta?” Sounds very applicable, especially given the hype around Big Data, doesn’t it?
Applicable, yes. Effective? Not so much.
Given today’s explosion of “Big Data,” companies need more advanced methods for leveraging their data – methods that don’t rely solely on tribal knowledge, personal experience or best guesses. What’s needed are new technologies and purpose-built solutions that reveal answers to questions no one even knew to ask.
That leads me to the three main reasons why traditional data mining methods are going the way of the dodo:
- The current volume of data is unprecedented. In fact, 15 of 17 sectors in the U.S. have more data stored per company than the entire U.S. Library of Congress. According to IDC, an estimated 7.9 zettabytes of data will be produced and replicated in 2015 – the equivalent of 18 million Libraries of Congress. With these massive data sets, it’s close to impossible to figure out what to query. The number of possible queries explodes exponentially with the number of data elements. Should I query about customers buying shirts in Atlanta? Or in summer? Or in summer with a Coke? Or with a hot dog? The list is endless. As one of my customers said, “I do not know what questions to ask. Therein is the limitation!” The breadth and depth of this “big” data makes querying seem like trying to strike oil while digging with a toothpick.
- Added to volume is the velocity of the data, which is piling up faster and faster. A company encounters a continuous stream of real-time data: social media updates, customer feedback, sales figures, financial data, supply chain data, product quality data, product monitoring data and on and on. There’s simply not enough time to manually query the data – it’s like a physician trying to diagnose thousands of patients at the same time. The data must constantly inform the end user – i.e., diagnose itself and recommend a treatment – for it to be of any strategic value.
- As I’ve already discussed, conventional data mining techniques are driven by the analyst – or group of analysts – tasked with coming up with a hypothesis, which is subjective and vulnerable to personal bias and human error. Given the amount of information out there, asking the right question every time is becoming more and more of a challenge, because even the smartest, most experienced analysts “don’t know what they don’t know.” Querying methods are seriously biased by what the analyst thinks to ask. Going back to the oil analogy: if the analyst thinks there is oil under a certain rock, that is the only place he will dig. He could be sitting on a gusher 50 feet away, but he’d completely miss it.
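The query explosion in the first point above is easy to quantify. One simple way to model it (an assumption for illustration, not a formal result from the article): if each of n customer attributes (city, season, paired item, and so on) can be required, excluded, or ignored by a query, the number of distinct conjunctive queries is 3^n.

```python
# Sketch of the query-explosion argument: each of n binary attributes
# can be constrained to True, constrained to False, or left out of
# the query entirely, giving 3**n possible conjunctive queries.
def possible_conjunctive_queries(n_attributes: int) -> int:
    return 3 ** n_attributes

# Even modest attribute counts overwhelm manual querying.
for n in (5, 10, 20):
    print(n, possible_conjunctive_queries(n))
# 5 attributes -> 243 queries; 20 attributes -> ~3.5 billion.
```

With 20 attributes there are already billions of candidate questions, which is the arithmetic behind “I do not know what questions to ask.”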
As long as data mining remains a manual endeavor, why limit company success to antiquated methods that by design fail to leverage the data for all it’s worth? It’s time to usher in new methods – new technologies – for transforming the enterprise from reactive – based on guesstimates, hunches, and flawed insight – to proactive – based on data-driven, actionable insight.