Five summers have passed since former Wired Editor-in-Chief Chris Anderson published an article about big data that concluded with the following statement: “Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.” Big data is not only changing science; it is also changing the way businesses interact with and gain insight from their data.
Business intelligence (BI) and statistical analysis tools are used to derive insight from data about transactions and the populations businesses serve, and to identify the next product or service that clients demand and deserve. These tools aren’t new: SAS, for instance, has been around since the 1970s. What is new is the amount of data that companies have access to.
Traditional BI tools have some excellent uses, but they were all built around classical statistical methods of querying, sampling, and modeling data. Those classical methods worked well when data was less unruly. Business data now comes in many shapes, sizes, and speeds, and little of it behaves like the tidy numeric data of the past. Chris Anderson’s 2008 statement about correlation and causation underlines the need to augment BI tools with another kind of tool, particularly one that automatically detects patterns in data.
Standard BI and statistical analysis tools require all the trappings of classical statistical methods to find information in a database: query testing, sampling, and more. Classical methods also require data to be arranged in tables populated with numeric values. Much of today’s data, however, is neither numeric nor neatly structured in a database, and data like this is better served by pattern detection.
When a pattern detection tool is paired with traditional BI and statistical analysis tools, many of the problems listed above go away. Automated pattern analysis is data-type agnostic, so data does not have to be translated and manipulated into numeric tables first.
For instance, Emcien’s pattern detection software graphs data regardless of type or format, locates the most interesting patterns, ranks them by importance, and reports them to the business analyst. Automated analysis finds correlations in a fraction of the time it takes to sample and manually query the data, and it avoids the luck of the draw that sampling depends on.
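To make the idea concrete, here is a minimal sketch of one common way to detect co-occurrence patterns without first converting data to numbers. It is not Emcien’s algorithm, which this article does not describe. Each field value becomes a "field=value" token, every pair of tokens that appears in the same record becomes a weighted edge in a co-occurrence graph, and the edges are ranked by lift (how much more often two values appear together than independence would predict). The function names, the min_support threshold, and the sample records are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical illustration of co-occurrence pattern ranking; not Emcien's method.

def to_items(record):
    # Turn every field into a "field=value" token, whatever the value's type.
    return {f"{field}={value}" for field, value in record.items()}

def rank_pairs(records, min_support=2):
    """Count co-occurring value pairs across ALL records and rank them by lift.

    Lift = P(a and b) / (P(a) * P(b)); values above 1 mean the pair appears
    together more often than independence would predict.
    """
    n = len(records)
    item_counts = Counter()
    pair_counts = Counter()  # edge weights of the co-occurrence graph
    for record in records:
        items = sorted(to_items(record))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    ranked = []
    for (a, b), together in pair_counts.items():
        if together < min_support:
            continue  # skip pairs seen too rarely to be meaningful
        lift = together * n / (item_counts[a] * item_counts[b])
        ranked.append((lift, together, a, b))
    # Highest lift first; ties broken by how often the pair occurs.
    ranked.sort(key=lambda row: (-row[0], -row[1], row[2], row[3]))
    return ranked

# Mixed text and boolean fields, used as-is with no numeric encoding.
records = [
    {"region": "south", "channel": "web", "returned": True},
    {"region": "south", "channel": "web", "returned": True},
    {"region": "north", "channel": "store", "returned": False},
    {"region": "north", "channel": "store", "returned": False},
    {"region": "north", "channel": "web", "returned": False},
]

for lift, count, a, b in rank_pairs(records)[:3]:
    print(f"{a} + {b}: lift {lift:.2f} across {count} records")
```

Because every pair in every record is counted, nothing depends on which records happen to be drawn, which is the point the article makes about avoiding the luck of the draw.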
Analysts who bypass manual data analysis can put their traditional BI tools to work much sooner and more efficiently. Instead of forming models and hypotheses around data samples, as traditional BI tools require, automated analysis finds all of the correlations in a data set and reports them. When the relevant correlations are highlighted, the analyst can form a query and go directly to the native database to find the right information for the project the first time, not after 1 or 30 or 1,000 incorrect queries.
Sampling is another hurdle that automated analysis can bypass. BI tools and other classical statistical methods have relied on sampling because data was scarce and collecting it was slow and expensive. Businesses now frequently have so much data that sampling is moot: the full data set usually has enough diversity to represent the target population, and analyzing a sample of it only introduces a margin of error that analyzing every record avoids.
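The textbook margin-of-error formula for an estimated proportion shows why a sample adds uncertainty that a full pass does not. This sketch assumes simple random sampling, the normal approximation, and the finite population correction, with z = 1.96 for roughly 95% confidence; the function name and the example figures (a pattern present in 0.1% of a million records) are hypothetical.

```python
import math

def margin_of_error(p, n, population, z=1.96):
    """Approximate margin of error for a proportion p estimated from a simple
    random sample of n records drawn from `population` records.

    Normal approximation with the finite population correction (fpc);
    z=1.96 corresponds to roughly 95% confidence.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc

population = 1_000_000
rare_rate = 0.001  # hypothetical pattern present in 0.1% of records

# A 1,000-record sample: the margin of error is larger than the pattern itself.
print(margin_of_error(rare_rate, 1_000, population))

# Scanning every record: the correction factor is zero, so there is no sampling error.
print(margin_of_error(rare_rate, population, population))
```

The correction factor shrinks to zero as the sample grows to cover the whole data set, which is the formal version of "sampling is moot" when every record can be analyzed.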
Automated pattern analysis can locate patterns across all of your data at once, so the data stays in its native form and nothing it implies is distorted along the way. You get the patterns that are actually there, not ones you accidentally created through biased or faulty sampling.
Once patterns are detected and reported, BI tools can better deliver on Anderson’s claim that “correlation supersedes causation.” When analysts know from the start that x, y, and z are correlated, they can form targeted queries to find out why, and get to the bottom of things faster than ever before, even with mountains of data to sort through.