Traditional statistical software is built for numeric data. Historically, that’s exactly the kind of data everyone worked with because data analysis meant working with records of trade. How many items were sold this month? How much revenue was made last quarter? What products were shipped to each customer?
Today our data is much more complex, but the techniques that we use to make sense of that data are by and large built for the numeric data of the past. Modern data isn't limited to numbers; it is increasingly a mix of human- and machine-generated text and numbers, even numbers that aren't intended to be used as numbers, like social security numbers, area codes, or IP addresses.
And yet, if you were to ask the average analyst, or even data scientist, how they analyze the complex and varied data they encounter, they will almost certainly tell you that they must first transform each qualitative (non-numeric) data point into numeric form before they can begin to identify any relationships in the data. Many analysts are so inundated with data that they focus only on the numeric data and never take advantage of the non-numeric data they have access to.
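To make that conversion step concrete, here is a minimal sketch of the preprocessing most analysts describe, using pandas and a hypothetical customer table (the column names are illustrative, not from any real dataset):

```python
import pandas as pd

# Hypothetical customer records mixing numeric and qualitative fields.
df = pd.DataFrame({
    "revenue":      [1200.0, 950.0, 1800.0, 430.0],
    "region":       ["East", "West", "East", "South"],     # qualitative
    "support_tier": ["gold", "silver", "gold", "bronze"],  # qualitative
})

# The usual workaround: convert every non-numeric column to numbers
# before any analysis can start, for example via one-hot encoding.
encoded = pd.get_dummies(df, columns=["region", "support_tier"])

print(encoded.columns.tolist())
# ['revenue', 'region_East', 'region_South', 'region_West',
#  'support_tier_bronze', 'support_tier_gold', 'support_tier_silver']
```

Two short qualitative columns have already become six numeric ones, and the analysis hasn't even started yet.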
That conversion is an obstacle, an additional step standing between you and the results you're seeking in the data. Breaking that barrier requires the ability to quickly analyze numeric and non-numeric data at the same time, so you can see the relationships across both data types and how they affect each other. Users then get an easy, repeatable process without extra steps or additional data-preparation concerns.
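One simple way to relate the two types side by side, without an encoding step, is to let the qualitative field define the groups and summarize the numeric field within them. A minimal sketch with pandas, reusing the illustrative columns from above:

```python
import pandas as pd

# Hypothetical mixed-type data: one qualitative field, one numeric field.
df = pd.DataFrame({
    "support_tier": ["gold", "silver", "gold", "bronze", "silver", "gold"],
    "revenue":      [1800.0, 950.0, 2100.0, 430.0, 1100.0, 1650.0],
})

# Relate the two directly: summarize the numeric field within each
# qualitative group instead of recoding the categories as numbers.
summary = df.groupby("support_tier")["revenue"].agg(["mean", "count"])
print(summary)
#                 mean  count
# support_tier
# bronze         430.0      1
# gold          1850.0      3
# silver        1025.0      2
```

The categories stay categories, the numbers stay numbers, and the relationship between them is still visible.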
Converting qualitative data into numeric data changes its characteristics and its relationship to the rest of the data. But that’s a blog for another day.
What approaches are you taking to address qualitative data? What are your challenges?