Delivering Multivariate Time Series Prediction

EmcienPatterns delivers multivariate time series predictions, helping enterprises enact preventative interventions when critical events are on the horizon.

Knowing when something will happen – not simply that it will happen – better enables you to take the right action at the right time in response to a prediction. As a result, you can have a bigger impact on the outcome you want to improve. For this reason, it’s quite valuable to analyze time series data and make time series predictions.

This article outlines Emcien’s approach to multivariate time series prediction, and explores the following subjects:
 

  1. Time series prediction methodologies
  2. An example of TTL: machine downtime
  3. A simple data prep step: adding a TTL column
  4. Boosting predictive power with data binning
  5. Grouping and analyzing data
  6. Predicting when machines will fail

Time Series Prediction Methodologies

“Forecasting” is somewhat ill-defined, but it typically refers to a type of time series prediction in which you estimate or project the value of one or more variables at various points in the future.

For example, many businesses forecast sales (variable) figures (value) each week for the quarter (time). And, governments forecast the GDP (variable) number (value) for the next five years (time).

But what if you want to know when a specific event or outcome will occur? For example, what if you want to know when each piece of equipment will fail? Or when each network will go down? Or when each employee will churn?

Traditional forecasts cannot answer these granular questions about outcomes.

Current statistical methods can predict an outcome from a time series of a single variable, but almost all business problems are multivariate.

And, unraveling multivariate prediction problems becomes quite complex using these more traditional, linear methods.

EmcienPatterns delivers multivariate time series predictions about outcomes quickly and easily using its powerful engine and a time series format known as “time to live” or TTL.

In analytics, TTL is a prediction format in which you identify the amount of time left until the next event occurs, where the event is often a failure or negative outcome of some kind.

An Example of TTL: Machine Downtime

In the case of machine downtime, Emcien uses TTL to identify how much time each machine has until it experiences the next failure event. This is how much time the machine has left “to live.”

What follows is an example use case illustrating how Emcien uses TTL.

Imagine a company has several critical machines – like oil drills – located at several sites across a state. Each machine has 3 different sensors that capture data about the machine’s status and performance.

Several machines located at different sites have data-capturing sensors

In reality, sensors capture and transmit data from mission-critical machines like oil drills very frequently – as often as every second or minute – because close monitoring is necessary to prevent and mitigate risks.

But in this simple example, the sensors capture data just once each day and then transmit that data to corporate headquarters for review.

Occasionally a machine will unexpectedly fail, causing significant loss for the company. This failure event is captured in the data that the machine’s sensors collect, shown below:

Data set of a single machine’s sensor readings and failure events over 2 weeks

In this machine’s data set, a row contains all the sensor readings collected and transmitted on a particular day, and whether or not the machine failed. Failure is marked with a “1.”

The company wants to know when each machine is going to fail before it does, so they can attempt to prevent the failure with proactive maintenance.

A Simple Data Prep Step: Adding a TTL Column

Emcien can deliver this multivariate time series prediction easily by adding only a simple data preparation step to the standard analysis and prediction process with EmcienPatterns.

The company must first add a special column to their data set – the historical data set containing past sensor readings, failure events, and timestamps that EmcienPatterns will analyze.

A new Time to Live column must be added and values calculated for each row

The column can be named anything, but is named Time to Live in the above example. Its purpose is to indicate the time until the next failure event – expressed in days in this particular case.

To achieve this, the value in each row is the difference between that row’s timestamp (rows being daily sensor readings and failure events) and the timestamp of the next failure event that occurs after it.

For example, the row for the daily sensor reading on 10/16/2017 has a “1” in the Time to Live column. This is because the closest failure event after that date occurs on 10/17/2017, and the difference between those two timestamps is exactly “1” day.

The company is able to add this column easily and use a macro to quickly make the calculation for each row.
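The calculation itself is simple enough to sketch in a few lines. The following Python is only an illustration of the idea, not the macro the company used: the field names (date, failure, time_to_live), the function name add_time_to_live, and the two-week log are all hypothetical, and the sketch assumes one row per day for a single machine. Rows that fall after the last recorded failure have no next failure to measure against, so they are left empty.

```python
from datetime import date, timedelta


def add_time_to_live(rows):
    """Return copies of rows with a 'time_to_live' field (days to next failure).

    rows: list of dicts with 'date' (datetime.date) and 'failure' (0 or 1)
    for ONE machine. Failure rows get 0; rows after the last failure get None.
    """
    ordered = sorted(rows, key=lambda r: r["date"])
    next_failure = None
    result = []
    # Walk backwards in time so the last failure seen is always the next
    # failure that follows the current row.
    for row in reversed(ordered):
        if row["failure"] == 1:
            next_failure = row["date"]
        if next_failure is None:
            ttl = None
        else:
            ttl = (next_failure - row["date"]).days
        result.append({**row, "time_to_live": ttl})
    result.reverse()
    return result


# Hypothetical two-week log with failures on 10/10/2017 and 10/17/2017.
start = date(2017, 10, 4)
failure_days = {date(2017, 10, 10), date(2017, 10, 17)}
log = []
for i in range(14):
    day = start + timedelta(days=i)
    log.append({"date": day, "failure": int(day in failure_days)})

with_ttl = add_time_to_live(log)
ttl_by_date = {r["date"]: r["time_to_live"] for r in with_ttl}
```

In this made-up log, the row for 10/16/2017 receives a Time to Live of 1, matching the example above. For a master data set covering many machines, the same function would be applied to each machine’s rows separately.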

Now, instead of predicting the failure using the Failure column, EmcienPatterns will predict time until next failure using the Time to Live column.

In this way, the company has effectively added a time dimension into what would otherwise be a basic failure prediction.

Boosting Predictive Power with Data Binning

Before EmcienPatterns analyzes the data, it automatically converts values in the Time to Live outcome column into value ranges, or “bins,” shown below:

Values in Time to Live column are converted into the most predictive value ranges

In this example, “1” day before failure did not change. But “2” days and “3” days before failure were converted to a new “2-3” days before failure range. And, “4,” “5,” “6,” and “7” values were converted to a combined “4-7” days before failure range.

This conversion of values into certain value ranges – the data binning process – may appear random, but it is not. Rather, for every unique data set, EmcienPatterns automatically uncovers the value ranges that will boost that data set’s predictive power, heightening prediction accuracy.

It then bins the outcomes accordingly. The “1” day before failure value did not change because EmcienPatterns determined that the “1” day value was already optimally predictive, and binning it into a range with other values would dilute its predictive power. The “2” and “3” day values were combined into a “2-3” day range because combining them improves prediction accuracy.
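To make the result of binning concrete, here is a minimal Python sketch that applies the three ranges from this example to raw Time to Live values. It does not reproduce how EmcienPatterns chooses the ranges, which is the important part and is not described here; the ranges are simply copied from the example, and the names TTL_BINS and bin_ttl are hypothetical.

```python
# Ranges copied from the example above (inclusive bounds, in days).
# Assumption: these are fixed here only for illustration; EmcienPatterns
# derives its own ranges for every data set.
TTL_BINS = [
    (1, 1, "1"),
    (2, 3, "2-3"),
    (4, 7, "4-7"),
]


def bin_ttl(ttl_days):
    """Map a Time to Live value to its range label, or None if no range covers it."""
    if ttl_days is None:
        return None
    for low, high, label in TTL_BINS:
        if low <= ttl_days <= high:
            return label
    return None  # e.g. 0 (the failure day itself) or values beyond 7


# Hypothetical TTL values for the six days leading up to a failure, plus the failure day.
ttl_values = [6, 5, 4, 3, 2, 1, 0]
labels = [bin_ttl(v) for v in ttl_values]
```

The failure day itself (a value of 0) is deliberately left without a label in this sketch, since it is not one of the ranges shown.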

Grouping and Analyzing Data

When EmcienPatterns analyzes the data, it groups together all the rows associated with each non-zero value or value range in the Time to Live column, shown below:

All rows associated with each value range in the TTL column are grouped together

The two rows associated with “1” day before a failure event are grouped together. The four rows with “2-3” days before a failure event are grouped together, and the six rows associated with “4-7” days before failure are grouped together.

The sensor readings on days the machine failed are outlier data points that typically cannot produce reliable insight about the machine’s failure. However, the sensor readings taken prior to the failure event are helpful and predictive.

Therefore, EmcienPatterns learns the predictive patterns in each group prior to failure and uses the patterns to generate a model that predicts machine failure.
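The grouping step can be pictured with a short Python sketch. This is an assumption-laden illustration, not EmcienPatterns’ internals: the row layout, the ttl_bin field, and the function group_by_ttl_bin are hypothetical, and the fourteen rows are invented to mirror a two-week log with two failures. Rows without a range label, such as the failure days themselves, are skipped.

```python
from collections import defaultdict


def group_by_ttl_bin(rows):
    """Group rows by their TTL range label, skipping rows that have no label."""
    groups = defaultdict(list)
    for row in rows:
        label = row["ttl_bin"]
        if label is None:
            continue  # failure-day rows are not used as a predictive group
        groups[label].append(row)
    return dict(groups)


# Hypothetical two-week log: six days before each failure, then the failure day.
one_cycle = ["4-7", "4-7", "4-7", "2-3", "2-3", "1", None]
rows = [{"day": i + 1, "ttl_bin": label} for i, label in enumerate(one_cycle * 2)]

groups = group_by_ttl_bin(rows)
sizes = {label: len(members) for label, members in groups.items()}
```

With this invented log, the group sizes come out as two rows for “1”, four rows for “2-3”, and six rows for “4-7”, the same counts described above.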

This data analysis process is performed on the master data set that combines data from all of the company’s machines, not simply on the small data set for a single machine.

Predicting When Machines Will Fail

When Emcien accesses new data from the machines’ sensors, it compares all the data to the predictive model.

It then identifies which machines are displaying patterns associated with a particular time window to failure (like 1 day before failure), and expresses any matches as predictions – each delivered with a likelihood number.

The company knows when machines will fail so they can implement maintenance

For example, EmcienPatterns may predict that machine #1 in Laird has an 87% likelihood of failure in 2-3 days.
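To show the general shape of such a prediction, the Python sketch below uses a deliberately naive stand-in: it counts how often a discretized sensor pattern appeared in each Time to Live range in the historical data and reports the most frequent range with its share of the observations. This is not Emcien’s algorithm, which is not described here. The pattern encoding, the function names fit_pattern_counts and predict, and every data value are hypothetical.

```python
from collections import Counter, defaultdict


def fit_pattern_counts(history):
    """Count how often each sensor pattern was observed in each TTL range.

    history: iterable of (pattern, ttl_bin) pairs, where pattern is a tuple
    of discretized readings from the three sensors.
    """
    counts = defaultdict(Counter)
    for pattern, ttl_bin in history:
        counts[pattern][ttl_bin] += 1
    return counts


def predict(counts, pattern):
    """Return (ttl_bin, likelihood) for the most frequent range, or None if unseen."""
    seen = counts.get(pattern)
    if not seen:
        return None
    ttl_bin, hits = seen.most_common(1)[0]
    # The "likelihood" here is just the empirical share of past observations.
    return ttl_bin, hits / sum(seen.values())


# Hypothetical historical rows: (sensor 1, sensor 2, sensor 3) levels and TTL range.
history = [
    (("high", "normal", "low"), "2-3"),
    (("high", "normal", "low"), "2-3"),
    (("high", "normal", "low"), "2-3"),
    (("high", "normal", "low"), "4-7"),
    (("high", "high", "low"), "1"),
    (("normal", "normal", "normal"), "4-7"),
]

counts = fit_pattern_counts(history)
prediction = predict(counts, ("high", "normal", "low"))
```

In this toy data the pattern was seen four times, three of them 2-3 days before a failure, so the sketch reports the “2-3” range with a likelihood of 0.75. A pattern that never appeared in the history produces no prediction at all.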

EmcienPatterns also provides remedies with every prediction so the company knows what parts of each machine to address, and how, so that proactive maintenance efforts to prevent failure are most effective.

The company has remedies for every failure so they know how to prevent them

And, Emcien sends predictions and remedies to the company’s enterprise applications so they have this critical information when and where they need it in order to act quickly.
