In a recent article, Gartner was cited as saying that IoT will bring about an enormous data crunch. All those machines and sensors, they predict, will create so much diverse and streaming data that the task of managing it will be beyond the ability of our current processes and tools.

In this instance, Gartner might actually be right. The way that we currently work with data, and machine data in particular, simply isn’t ready for the kinds of data that will be created when all of the machines around us are connected to the internet. Already we see companies with enormous stores of data packed away in Hadoop clusters because size and complexity leave analysts and data scientists stretched thin. It turns out that analytics is hard work, and analytics on truly large data sets takes time. 

When connected devices are everywhere, most of the data they produce will go untouched, and any potential value it contains will quickly fade away. Although distributed storage and processing frameworks like Hadoop have been heralded for what they will do for data science and analytics, in practice Hadoop’s biggest value has been its ability to act as cheap and easy storage (Apache Hadoop enables distributed data processing, but it doesn’t offer unique analytics functionality). If we are going to get a handle on the impending deluge of data, we will need new methods for analyzing and managing it.

Emcien’s approach to this impending problem is a loosely coupled, two-tier architecture in which data is collected and analyzed centrally while decisions are made closer to the device, or even on the device itself. This approach has two distinct advantages. The first is that it vastly reduces the volume of data that has to be transmitted and stored. The second is that it transforms how we think about responding to analysis. Both alerts and actions can be triggered by a predicted outcome, and the key is that they are triggered without waiting for a round trip of data transfer, analysis, and response. There’s no requirement for a consistent connection between the source and the analysis.
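To make the two-tier idea concrete, here is a minimal sketch in Python. It is not Emcien’s implementation: the names `Rule`, `learn_rule`, and `EdgeDevice` are hypothetical, and a simple mean-plus-three-standard-deviations threshold stands in for whatever analysis the central tier actually performs. It shows only the division of labor, with the central tier turning collected history into a compact rule and the device applying that rule locally with no network call.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """Compact decision rule shipped from the central tier to a device (hypothetical)."""
    sensor: str
    upper_limit: float
    action: str


def learn_rule(sensor: str, history: List[float], k: float = 3.0) -> Rule:
    """Central tier: derive an alert threshold from collected readings.

    Assumption: a mean + k * stdev threshold is a placeholder for the
    real central analysis, which the article does not describe.
    """
    limit = mean(history) + k * stdev(history)
    return Rule(sensor=sensor, upper_limit=limit, action="schedule_maintenance")


class EdgeDevice:
    """Device tier: evaluates each reading locally against the stored rule."""

    def __init__(self, rule: Rule) -> None:
        self.rule = rule
        # Only readings that break the rule are queued for later upload,
        # so normal readings never need to be transmitted or stored.
        self.outbox: List[str] = []

    def on_reading(self, value: float) -> Optional[str]:
        """Return an action immediately, without contacting the central tier."""
        if value > self.rule.upper_limit:
            self.outbox.append(
                f"{self.rule.sensor}: {value} > {self.rule.upper_limit:.2f}"
            )
            return self.rule.action
        return None


# Central tier: analyze historical readings once and produce a rule.
history = [70.0, 71.0, 69.0, 70.0, 72.0, 68.0]
rule = learn_rule("bearing_temp", history)

# Device tier: decide on each new reading using only the local rule.
device = EdgeDevice(rule)
actions = [device.on_reading(v) for v in [70.5, 71.2, 90.0]]
print(actions)
```

In this sketch the device sends nothing for the two normal readings and queues a single message for the out-of-range one, which illustrates both advantages: less data to move, and a response that does not depend on a live connection.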

The result is a complete system that doesn’t have to wait for a centralized analysis to make a decision, alert an analyst, or trigger a maintenance call. Solutions like this one will be necessary to help industries realize the potential of the internet of things without being overwhelmed by the volume of new data.