From the enterprise CRM to the data lake, anyone managing diverse data will eventually have to explore it. Database administrators are asked to deliver more and better information to the right people, and to do that they need to discover what’s really in the data. EmcienScan turns data discovery from hours of coding and querying into an automated, repeatable process. Data discovery comes down to four questions:

  • What’s in the data?
  • How good is the data?
  • What can you do with the data?
  • What data do the end users need?

Answering these questions used to mean an involved project with lots of coding and little repeatability. Profiling the data column by column meant writing queries to collect the metadata for each column: one query for the distribution of values, another for outliers, and so on. Writing and running these queries takes significant effort, and even when the process is scripted, a single profiling run can take days.
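To make the column-by-column approach concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. It is not EmcienScan code, only an illustration of the conventional method described above. The function `profile_columns`, the `rare_share` parameter, and the sample `customers` table are hypothetical, and treating values that appear in fewer than 20% of rows as "outliers" is an assumption made for this example. The sketch issues three queries for every column, so the query count grows with the width of the table.

```python
import sqlite3


def profile_columns(conn, table, rare_share=0.2):
    """Profile each column of `table` with separate queries per column.

    Hypothetical helper: "outliers" are assumed to be non-null values whose
    share of rows is below `rare_share`.
    """
    cur = conn.cursor()
    # PRAGMA table_info returns one row per column; index 1 holds the name.
    # Identifiers are interpolated directly, so use only trusted names.
    columns = [row[1] for row in cur.execute(f"PRAGMA table_info('{table}')")]
    profile = {}
    for col in columns:
        q = f'"{col}"'
        # Query 1: row count, non-null count, and number of distinct values
        total, non_null, distinct = cur.execute(
            f'SELECT COUNT(*), COUNT({q}), COUNT(DISTINCT {q}) FROM "{table}"'
        ).fetchone()
        # Query 2: distribution of values, most frequent first
        distribution = cur.execute(
            f'SELECT {q}, COUNT(*) AS n FROM "{table}" '
            f"GROUP BY {q} ORDER BY n DESC, {q}"
        ).fetchall()
        # Query 3: rare non-null values, treated here as outliers
        rare = cur.execute(
            f'SELECT {q} FROM "{table}" WHERE {q} IS NOT NULL '
            f"GROUP BY {q} HAVING COUNT(*) < ? ORDER BY {q}",
            (rare_share * total,),
        ).fetchall()
        profile[col] = {
            "nulls": total - non_null,
            "distinct": distinct,
            "distribution": distribution,
            "outliers": [value for (value,) in rare],
        }
    return profile


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (region TEXT, tier TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [("East", "basic"), ("East", "basic"), ("East", "pro"),
         ("West", "basic"), ("West", "pro"), ("North", None)],
    )
    for column, stats in profile_columns(conn, "customers").items():
        print(column, stats)
```

On the sample data, `North` is flagged in `region` because it appears in only one of six rows, while the single missing `tier` value is reported as a null rather than an outlier.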

This is where much of today’s data discovery ends: simply knowing what the data contains and how it is distributed can be a full-time job for data stewards. Advanced data discovery goes a step further and identifies all of the connections across columns in each data set. The conventional, code-heavy way to find those connections meant querying for relationships column by column to work out which columns are relevant to each group or user.
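The cross-column step can be approximated, in a hypothetical sketch using Python and the standard-library `sqlite3` module, by querying every pair of columns for values that occur together. This is not EmcienScan's method. The function `pairwise_cooccurrence` and its lift score (how much more often two values appear together than they would if the columns were independent) are illustrative choices the text does not prescribe. A table with n columns needs n(n-1)/2 such queries, so 50 columns already means 1,225 queries.

```python
import sqlite3
from collections import Counter
from itertools import combinations


def pairwise_cooccurrence(conn, table):
    """Query every pair of columns and score how strongly their values co-occur.

    Hypothetical helper. The score is lift, P(a, b) / (P(a) * P(b)), computed
    over rows where both columns are non-null; 1.0 means no association.
    """
    cur = conn.cursor()
    # Identifiers are interpolated directly, so use only trusted names.
    columns = [row[1] for row in cur.execute(f"PRAGMA table_info('{table}')")]
    results = {}
    # n columns need n * (n - 1) / 2 queries, one per pair
    for a, b in combinations(columns, 2):
        rows = cur.execute(
            f'SELECT "{a}", "{b}", COUNT(*) AS n FROM "{table}" '
            f'WHERE "{a}" IS NOT NULL AND "{b}" IS NOT NULL '
            f'GROUP BY "{a}", "{b}" ORDER BY n DESC, "{a}", "{b}"'
        ).fetchall()
        # Marginal counts for each value, restricted to the same rows
        total = sum(n for _, _, n in rows)
        count_a, count_b = Counter(), Counter()
        for va, vb, n in rows:
            count_a[va] += n
            count_b[vb] += n
        # lift = (n / total) / ((count_a / total) * (count_b / total))
        results[(a, b)] = [
            (va, vb, n, n * total / (count_a[va] * count_b[vb]))
            for va, vb, n in rows
        ]
    return results


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, tier TEXT, channel TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("East", "pro", "web"), ("East", "pro", "web"),
         ("West", "basic", "store"), ("West", "basic", "store"),
         ("East", "basic", "web")],
    )
    for pair, scored in pairwise_cooccurrence(conn, "orders").items():
        print(pair, scored)
```

A lift above 1.0 marks value pairs that appear together more often than chance, which is one simple signal that two columns are related.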

Automating data profiling and discovery can be transformative. It gives the database administrator immediate access to information that would ordinarily take days to gather when queried column by column. With EmcienScan, data discovery is no longer an ad hoc exercise; it simply becomes a feature of the data.