When it comes to big data, one of the largest data sets in the world is the Web. Thanks to Google, the Web is essentially indexed as though it were a massive database. Consequently, the world’s 1.9 billion Web users are conditioned to search, relying on Google or other search engines to find the answers they are looking for. The problem with the Google approach to big data is: What if you don’t know what question to ask?
That is the real challenge (and opportunity) of big data analytics.
Every day I receive phone calls and emails from data analysts, IT departments, and others tasked with big data projects in organizations across sectors, all asking: What can we do? How can we extract gold from the mine that is big data? While the technological advances that have made it possible to collect, store and analyze big data are tremendous, organizations have hit a wall where the amount of data available far exceeds the human capacity to process it.
Most of them are querying their way through the data, so it’s important to understand that while the number of records is big, the real challenge is the breadth of the data: the number of columns. As more and more data is aggregated, this problem continues to grow. For example, if you have a database with 100 columns and 6 choices per column, there are 6^100 (roughly 6.5 × 10^77) ways to pick one value for every column, a figure on the order of common estimates of the number of atoms in the observable universe. The sheer magnitude of potential queries exceeds the capability of mainstream data mining methods, making them mere data-shovels.
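To make the arithmetic behind the 100-column example concrete, here is a minimal Python sketch. It rests on assumptions that are not in the original example: a "query" is taken to be an exact-match filter, a second count lets each column be left unconstrained, and the 10^78 to 10^82 range is a commonly cited rough estimate of atoms in the observable universe rather than a precise figure. The names are illustrative only.

```python
# Rough count of exact-match queries over a wide table.
# Assumptions (not from the original text): a query either pins a column
# to one of its values or, in the second count, leaves it unconstrained.

COLUMNS = 100
CHOICES = 6

# Every column pinned to exactly one of its 6 values.
fully_specified = CHOICES ** COLUMNS

# Each column is either pinned to one of its 6 values or left as "any".
with_wildcards = (CHOICES + 1) ** COLUMNS

# Commonly cited rough range for atoms in the observable universe (assumption).
ATOMS_LOW = 10 ** 78
ATOMS_HIGH = 10 ** 82


def magnitude(n: int) -> int:
    """Return the power of ten of a positive integer (digit count minus one)."""
    return len(str(n)) - 1


if __name__ == "__main__":
    print(f"fully specified queries: about 10^{magnitude(fully_specified)}")
    print(f"queries with wildcards:  about 10^{magnitude(with_wildcards)}")
    print(f"wildcard count exceeds high atom estimate: {with_wildcards > ATOMS_HIGH}")
```

Under these assumptions, the fully specified count lands just below the low end of the atom estimate, while allowing unconstrained columns pushes the count past the high end. Either way, enumerating queries by hand is hopeless.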
Consider the case of network security. We spoke to an organization that uses query-based tools to detect network intrusions. But because the attacks are continuous, the analysts can’t stay ahead of the curve: they can’t detect new intrusion patterns in real time.
What if they could?
When evaluating new approaches to big data analytics, here are some key considerations:
- Does this approach reduce the effect of noise in the data?
- Does it speed up processing time?
- Does it reduce storage requirements?
- Does it automate the pursuit of needles in haystacks?
- Does it discover unexpected connections across data sources?
- Does it automatically generate questions?
Google may have made search easy for information seekers, but search and query-based tools simply cannot deliver automatic, mission-critical insights from an organization’s big data, and throwing more data scientists at the problem is cost-prohibitive and, well, manual. To truly leverage big data for competitive advantage, organizations need a new approach – one that automatically surfaces the information you need when you need it.