In the real world, data is messy. Not just messy in the sense of missing values or data entry errors, but messy because it mixes numeric data, where traditional statistical approaches excel, with non-numeric or categorical data that doesn't fit neatly within the framework of statistical analysis. It's the combination of these two very different data types that presents both difficulties and opportunities.
At Emcien, we're so confident that there is value in combining these data types that all of our software is built around analyzing both at once.
On one hand, analyzing mixed data types together is difficult; on the other, combined data holds enormous potential for analytic and predictive results.
Procedures for handling mixtures of numeric and non-numeric data are workable in research, but under the constraints professional data analysts work within, the data prep required often just isn't feasible. For example, converting a column of left- vs right-handedness into a single numeric indicator isn't difficult, but what about representing customers across all 50 states, where each state typically becomes its own indicator column? What happens when you want to break your analysis down county by county, with thousands of categories? And at the end of all this data prep, a regression analysis will reveal only simple relationships between the variables you specify in your model.
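To make that data-prep cost concrete, here is a minimal sketch of one common approach, dummy (one-hot) encoding, in plain Python. The `one_hot` function and the sample labels are hypothetical illustrations, not Emcien's method; the point is only to show how the number of columns grows with the number of categories.

```python
def one_hot(values, drop_first=True):
    """Dummy-encode a list of category labels.

    Returns (column_names, rows). With drop_first=True, the first category
    in sorted order becomes the reference level and gets no column, which
    is the usual choice when the indicators feed a regression with an
    intercept.
    """
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]
    names = [f"is_{c}" for c in categories]
    # One 0/1 entry per remaining category for every input value.
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return names, rows


# A two-level variable collapses to a single indicator column.
handedness = ["left", "right", "right", "left"]
names, rows = one_hot(handedness)
print(names)  # ['is_right']
print(rows)   # [[0], [1], [1], [0]]

# A 50-level variable needs 49 indicator columns.
states = [f"state_{i:02d}" for i in range(50)]  # hypothetical placeholder labels
names, rows = one_hot(states)
print(len(names))  # 49
```

Dropping one level avoids perfect collinearity with the regression intercept. Even after this encoding, a regression estimates only a separate effect per category; any interactions between categories and numeric variables still have to be specified by hand.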
Including that data in a predictive model, however, can lead to some impressive results. In marketing, for example, combining numeric information like customer age and purchase frequency with preferences in style or musical taste creates a much richer picture of each customer. And because an increasing share of the data we create is non-numeric, there is more opportunity than ever to discover value.