As with many emerging data technologies, it’s not always clear how to describe data discovery. In many cases data discovery and data analysis overlap. Sometimes data discovery just means figuring out what kind of data is in each column. For some, data discovery includes the process of identifying data that needs to be cleansed before it can be useful. Some organizations have so much data that discovery for them just means finding the right data.
With the speed and scale of data today, discovery projects can easily turn into open-ended explorations of the data lake that return very little in the way of results.
Whether they are business users, analysts, or data scientists, people who work with data need to know which data will help them reach their objective. For us, data discovery is all about finding that value quickly. That’s why EmcienScan is designed to push value to the user without any input other than the data itself.
There are many tools that can be used for data discovery, but very few can actually perform data discovery for you. That’s where EmcienScan differentiates itself from other discovery tools. There are no inputs for what constitutes an anomaly and no search functions to test for relationships in the data, because Scan automatically finds all of that metadata for you.
True data discovery can’t be an open-ended process in which each new project starts with unknown data and can go on indefinitely, or end without reaching a conclusive yes or no. At the speed and scale of data today, we have a choice. We can continue to slog through data manually, or we can find new ways to make data work faster and more efficiently, so that it’s the value that comes through and not the effort that it takes to get there.