In the minds of most people who work with data, not much can happen before data prep. For them, discovery happens only after data is cleansed and prepped, because this kind of exploratory analysis requires pristine data. Most analysis tools just can’t handle missing values, categorical data, or inconsistent date formats.

While this has long been the standard path from data to data analysis, it means a lot of work up front before anyone can determine which data is relevant to the job and which is not. If you can’t discover value in data until it has all been cleaned and prepped, you end up cleaning and prepping a lot of data that will have no impact on your final output. In fact, it’s estimated that up to 80% of the work of data scientists and analysts falls into this high-cost, low-value category.

One of the most useful features of Emcien Scan is that it shifts data discovery to the beginning of the data workflow, before any cleansing or prepping takes place. This is an enormous advantage for data practitioners. A quick scan of the data, as it exists inside the database, tells the user which data will be beneficial and which can be skipped. The connected strength even gives users an idea of how difficult a given outcome will be to predict, and which other outcomes could be predicted or investigated more easily.
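To make the idea of discovery before prep concrete, here is a minimal sketch in Python. It is not Emcien Scan's algorithm, which this post does not describe. It swaps in a generic technique, mutual information, and the function names `mutual_information` and `rank_columns` and the sample rows are all hypothetical. Every raw value, including a missing one, is treated as a plain categorical token, so nothing has to be cleaned before the columns are ranked.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length token sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def rank_columns(rows, outcome):
    """Rank every non-outcome column by mutual information with the outcome.

    Values are used exactly as stored: None and odd formats simply become
    their own tokens, so no cleansing step is needed first.
    """
    ys = [str(row.get(outcome)) for row in rows]
    columns = {key for row in rows for key in row if key != outcome}
    scores = {
        col: mutual_information([str(row.get(col)) for row in rows], ys)
        for col in columns
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw rows, missing values included, with no prep applied.
rows = [
    {"plan": "basic", "region": "east", "churned": "yes"},
    {"plan": "basic", "region": "west", "churned": "yes"},
    {"plan": None, "region": "east", "churned": "yes"},
    {"plan": "pro", "region": "west", "churned": "no"},
    {"plan": "pro", "region": "east", "churned": "no"},
    {"plan": "pro", "region": None, "churned": "no"},
]

ranked = rank_columns(rows, outcome="churned")
for column, score in ranked:
    print(f"{column}: {score:.3f} bits")
```

In this toy data, `plan` ranks above `region`, which suggests where cleansing effort would pay off first. One caveat of this simple score is that columns with a distinct value in nearly every row, such as IDs or timestamps, will look artificially informative, so a real tool would need to correct for that.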

By contrast, discovery without Scan means that users prepare data that will never be used. This is like packing a week’s worth of food for an afternoon hike, or mixing five gallons of paint to paint a single park bench. Without the right tools, the wasted effort isn’t clear until the project nears completion. With Scan, it’s immediately clear what can be accomplished and what effort is needed to get the job done.

Not having to cleanse data first means that users can cast a wider net for relevant data without making any judgements up front about which data merits cleansing and prep. Connecting directly to the database, Scan automatically discovers what data should be considered. The result is both faster turnaround and better outcomes, all without prepping data in advance.