If you’re a data or IT professional responsible for managing data or ensuring data quality, you already know a great deal about data profiling. Metrics such as min/max, standard deviation, and uniqueness help you understand your data and even uncover problems like missing values and duplicate records.
But data profiling can be, and can do, so much more than that.
Making the Invisible Visible
For most of history, we could perceive only a very small part of the electromagnetic spectrum. This is because our technology of the time, the human eyeball, was limited and could detect, or “see,” only a sliver of the total spectrum.
And so our knowledge of the energy around us, along with our ability to leverage it and protect ourselves against its potential harms, was equally limited.
Then new technologies came along, and with infrared, X-rays, UV, radio waves, microwaves, and gamma rays, we could see the whole world of energy around us.
And we got to work immediately, using our brand new visibility into this previously hidden layer of the world. It felt like we had superpowers and we used them to help us achieve our goals and make our lives better.
What Profiling Can’t See
For the longest time, data profiling has only been able to capture a sliver of our data’s unique signature, or fingerprint.
The reality and totality of our data–the full spectrum–is much broader and richer than the handful of features, or descriptive statistics (min/max, standard deviation, mean), that are captured during typical profiling.
This is because our current technology is limited. Like the eyeball, it can only see certain things.
Being blind to many features of our data’s fingerprint is a real problem for the data and IT workers who need to detect problems in their data to ensure quality. Duplicate data and missing values are quality issues, but changes in your data’s fingerprint can signal a critical business or data operations problem that must be immediately addressed. If you can’t see part of the fingerprint, then you don’t know if that part is changing.
For example, suppose customer marital status is the leading indicator of late payments in your data. If you don’t know that, you won’t notice when, overnight, marketing channel replaces marital status as the leading indicator.
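To make that example concrete, here is a minimal Python sketch. It rests on an assumption that is ours, not part of any particular profiling product: the "leading indicator" is approximated as the feature whose categories show the widest gap in late-payment rate. The field names (`marital_status`, `channel`, `late`) and the tiny datasets are made up for illustration.

```python
from collections import defaultdict

def late_rate_spread(records, feature, target="late"):
    """Gap between the highest and lowest late-payment rate across a feature's categories."""
    totals = defaultdict(int)
    lates = defaultdict(int)
    for row in records:
        totals[row[feature]] += 1
        lates[row[feature]] += row[target]
    rates = [lates[category] / totals[category] for category in totals]
    return max(rates) - min(rates)

def leading_indicator(records, features, target="late"):
    """Crude proxy (an assumption): the feature with the widest late-rate gap."""
    return max(features, key=lambda f: late_rate_spread(records, f, target))

# Hypothetical data: yesterday, late payments track marital status.
yesterday = [
    {"marital_status": "single", "channel": "email", "late": 1},
    {"marital_status": "single", "channel": "web", "late": 1},
    {"marital_status": "married", "channel": "email", "late": 0},
    {"marital_status": "married", "channel": "web", "late": 0},
]
# Hypothetical data: today, late payments track marketing channel instead.
today = [
    {"marital_status": "single", "channel": "email", "late": 1},
    {"marital_status": "single", "channel": "web", "late": 0},
    {"marital_status": "married", "channel": "email", "late": 1},
    {"marital_status": "married", "channel": "web", "late": 0},
]

features = ["marital_status", "channel"]
before = leading_indicator(yesterday, features)
after = leading_indicator(today, features)
if before != after:
    print(f"Leading indicator changed: {before} -> {after}")
```

Note that the overall late-payment rate is 50% on both days, so a profile that tracks only that single number would see nothing unusual.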
A New Era of Profiling
A new breed of data profiling, fueled by automation and cognitive technologies, is unlocking enormous value for enterprises. It makes the previously invisible features of your data’s fingerprint visible and detects changes in any of those features in real time.
These profiling technologies detect more than 45 unique features of your data, capturing its total fingerprint.
Then, as new data streams in, it’s compared to your data’s fingerprint. Unlike conventional technologies that detect changes in only a single feature (e.g., the mean), the new technologies detect any change in any feature and send proactive alerts to those who manage quality.
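The compare-and-alert idea can be sketched in a few lines of Python. This is an illustrative toy, not how any specific product works: the six features, the `fingerprint` and `changed_features` helper names, and the 25% relative-change tolerance are all assumptions chosen for the example, and a real fingerprint would cover far more features.

```python
import statistics

def fingerprint(values):
    """Summarize a numeric column as a small dictionary of features."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "mean": statistics.fmean(present),
        "stdev": statistics.pstdev(present),
        "missing_rate": 1 - len(present) / len(values),
        "unique_rate": len(set(present)) / len(present),
    }

def changed_features(baseline, current, tolerance=0.25):
    """Return every feature whose relative change exceeds the tolerance."""
    alerts = {}
    for name, old in baseline.items():
        new = current[name]
        scale = max(abs(old), 1e-9)  # avoid dividing by zero when the baseline is 0
        if abs(new - old) / scale > tolerance:
            alerts[name] = (old, new)
    return alerts

# Hypothetical column values: both batches have a mean of exactly 12.
baseline = fingerprint([10, 11, 12, 13, 14])
new_batch = fingerprint([2, 7, 12, 17, 22])

alerts = changed_features(baseline, new_batch)
for name, (old, new) in alerts.items():
    print(f"ALERT: {name} moved from {old:.2f} to {new:.2f}")
```

In this run the mean does not move at all, so a mean-only monitor would stay silent, while the min, max, and standard deviation all trip an alert.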
How many metrics do you capture on your data?