Finding the Connections in Data

EmcienPatterns's automated analysis allows you to identify and see how all of the data points in your data connect. Understanding how individual data points connect helps you analyze the trends and patterns in your data without needing to make predictions. These connections can be used for everything from managing seldom-used parts to reduce complexity in a manufacturing environment to powering retail recommendation engines that make better recommendations.

For this example, we will work with the Customer Spend data, found in our sample data sets.

Before you can process an analysis report, you first need to upload a data file; see the guide on uploading data files to learn more. Then click the Analyze Data button on the Home Screen, select the file, enter a Report Name, and click the Start Analysis button. That's all there is to it.

The self-querying analysis engine will now connect all of the individual points in the data together to see how every data point co-occurs within the data set.
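To make the idea of co-occurrence concrete, here is a minimal Python sketch. It is not EmcienPatterns' actual engine: it assumes each row is treated as a set of “column=value” data points and simply counts how often each data point, and each pair of data points, appears in the same row. The rows are made-up examples whose column names only loosely resemble the Customer Spend data, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy rows; the column names only loosely echo the Customer Spend data.
rows = [
    {"education": "university.degree", "job": "admin.", "contact": "cellular"},
    {"education": "university.degree", "job": "admin.", "contact": "cellular"},
    {"education": "high.school", "job": "services", "contact": "telephone"},
]

def to_items(row):
    """Assumption: each data point is encoded as a 'column=value' string."""
    return sorted(f"{col}={val}" for col, val in row.items())

def count_cooccurrences(rows):
    """Count each data point, and each pair of data points, across the rows they share."""
    item_counts, pair_counts = Counter(), Counter()
    for row in rows:
        items = to_items(row)  # sorted, so each pair has one canonical order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return item_counts, pair_counts

item_counts, pair_counts = count_cooccurrences(rows)
for (a, b), n in pair_counts.most_common(3):
    print(f"{a} + {b}: {n}")
```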

After the analysis is complete, click “View Analysis” to open the analysis dashboard. The dashboard gives a quick look at how connected the data set is, meaning how often and how densely data points form connection patterns with other data points.
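One rough way to picture how densely data points connect is to treat co-occurrences as a weighted graph and measure its density: the share of all possible pairs of data points that appear together at least once. That definition is our own assumption for illustration; the dashboard's actual metrics may be computed differently. The helpers `build_graph`, `density`, and `neighbors` are hypothetical, and the rows are toy data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy rows, not the real Customer Spend data.
rows = [
    {"education": "university.degree", "job": "admin.", "contact": "cellular"},
    {"education": "university.degree", "job": "admin.", "contact": "cellular"},
    {"education": "university.degree", "job": "technician", "contact": "cellular"},
]

def build_graph(rows):
    """Nodes are 'column=value' data points; an edge's weight is the number of rows they share.

    Data points that never share a row with another data point do not become nodes.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for row in rows:
        items = [f"{col}={val}" for col, val in row.items()]
        for a, b in combinations(items, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph

def density(graph):
    """Assumed connectedness measure: fraction of possible node pairs that are linked."""
    n = len(graph)
    if n < 2:
        return 0.0
    edges = sum(len(adj) for adj in graph.values()) // 2  # each edge is stored twice
    return edges / (n * (n - 1) / 2)

def neighbors(graph, node):
    """(neighbor, weight) pairs for one node, strongest connection first."""
    return sorted(graph.get(node, {}).items(), key=lambda kv: kv[1], reverse=True)

graph = build_graph(rows)
print(f"{len(graph)} nodes, density {density(graph):.2f}")
print(neighbors(graph, "education=university.degree"))
```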

However, for most analyses, the most interesting question is how certain individual data points interact with the rest of the data. This gives you insight into the data and answers the typical questions that come up after an analysis. For instance, in this data set we will ask a simple question: “What are the key traits of customers with a university degree?”

To answer this, click the categories link on the left-hand side of the page, which takes you to a list of the columns in the data set. Then click the category “Education” to open the Category Details page, pictured below.

Here we can see the distribution of the different data points within the education column, as well as their frequency and how well connected they are. On this page, we can see that the majority of customers have either a university degree or stopped their education after high school. What will be interesting is to see how the customers with a university degree connect with all of the other data points in the data. To find out, start by clicking on “university degree”.
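The frequency part of the Category Details page can be approximated by counting the values in a single column. This sketch covers only counts and shares; it does not attempt the page's connectedness measure, and the rows and the `category_distribution` helper are hypothetical.

```python
from collections import Counter

# Hypothetical toy rows; only the "education" column matters here.
rows = [
    {"education": "university.degree"},
    {"education": "university.degree"},
    {"education": "university.degree"},
    {"education": "high.school"},
    {"education": "high.school"},
    {"education": "basic.9y"},
]

def category_distribution(rows, column):
    """Return (value, count, share) tuples for one column, most frequent first."""
    counts = Counter(row[column] for row in rows if column in row)
    total = sum(counts.values())
    return [(value, n, n / total) for value, n in counts.most_common()]

for value, n, share in category_distribution(rows, "education"):
    print(f"{value:20} {n:3} {share:.0%}")
```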

This page shows how all of the other data points connect with customers whose education is a university degree. The immediate takeaways are that people with a university education are:

  • Likely to have a job in administration
  • Likely to choose their cell phone as their preferred method of contact
  • Likely not to be in default and not to have a personal loan
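A simple way to produce a list like this is to rank every other data point by its conditional probability given the selected one, P(trait | education = university degree): the share of university-degree rows that also contain the trait. The sketch assumes conditional probability is the ranking measure; the Non-Obvious Clusters view does report conditional probabilities, but EmcienPatterns' exact scoring may differ. The `profile` helper and the rows are hypothetical.

```python
from collections import Counter

# Hypothetical toy rows, not the real Customer Spend data.
rows = [
    {"education": "university.degree", "job": "admin.", "contact": "cellular", "default": "no"},
    {"education": "university.degree", "job": "admin.", "contact": "cellular", "default": "no"},
    {"education": "university.degree", "job": "technician", "contact": "cellular", "default": "no"},
    {"education": "high.school", "job": "services", "contact": "telephone", "default": "unknown"},
]

def profile(rows, column, value):
    """Rank other data points by P(data point | column=value), highest first.

    Assumption: conditional probability is the ranking measure.
    """
    matching = [row for row in rows if row.get(column) == value]
    if not matching:
        return []
    counts = Counter(
        f"{col}={val}" for row in matching for col, val in row.items() if col != column
    )
    ranked = [(item, n / len(matching)) for item, n in counts.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

for item, prob in profile(rows, "education", "university.degree"):
    print(f"{item:20} {prob:.0%}")
```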

The system finds these connections by creating a graph representation of the data. You can see a visualization of this representation by scrolling down and clicking “Explore Graph” on the right-hand side of the screen.

The same data points from the list are visualized here in the graph, where we can see all of the key traits of customers with a university education. With this information, we can start to build larger and more detailed profiles of customers with a university degree.

To see how, close the graph and click the “view clusters” link on the upper right-hand side of the page. Here, you can drag and drop items onto the filter on the right-hand side, or filter manually, to see how different data points come together and group in the data set. For instance, look at all the groups that form when we filter down to customers with a university degree who are not in default and who work in administration (the filter is visible on the right-hand side):

Building these groups, or “clusters”, is useful in many applications. For example, if we want to market to these customers, we can now drill down into a more precise microsegment to understand which products suit them best. All of the results can be downloaded as .csv files by clicking the download button at the top right of the screen.
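The filter-and-group step can be sketched as follows: keep only the rows that match every filter, then group those rows by the data points they still share. Grouping by exact matches on the remaining columns is a deliberately naive stand-in of our own; EmcienPatterns' clusters are not necessarily formed this way. The `groups_within` helper, its `min_size` parameter, and the rows are hypothetical.

```python
from collections import Counter

# Hypothetical toy rows, not the real Customer Spend data.
rows = [
    {"education": "university.degree", "default": "no", "job": "admin.", "contact": "cellular", "loan": "no"},
    {"education": "university.degree", "default": "no", "job": "admin.", "contact": "cellular", "loan": "no"},
    {"education": "university.degree", "default": "no", "job": "admin.", "contact": "telephone", "loan": "no"},
    {"education": "university.degree", "default": "no", "job": "technician", "contact": "cellular", "loan": "no"},
    {"education": "high.school", "default": "no", "job": "admin.", "contact": "cellular", "loan": "no"},
]

def groups_within(rows, filters, min_size=2):
    """Group the filtered rows by their remaining data points.

    Assumption: a "group" is a set of rows whose other column values match exactly;
    this is a naive stand-in, not EmcienPatterns' clustering.
    """
    subset = [row for row in rows if all(row.get(c) == v for c, v in filters.items())]
    groups = Counter(
        tuple(sorted(f"{c}={v}" for c, v in row.items() if c not in filters))
        for row in subset
    )
    return [(group, n) for group, n in groups.most_common() if n >= min_size]

filters = {"education": "university.degree", "default": "no", "job": "admin."}
for group, n in groups_within(rows, filters):
    print(n, group)
```

Lowering `min_size` to 1 also shows the single university-degree administrator who prefers a telephone contact.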

Rare, strongly connected data patterns are useful in many different types of data, but they are often hard to detect through manual analysis because they appear so infrequently. To find the Non-Obvious patterns in your data, click the “Perspectives” link at the top of the analysis dashboard and select “Non-Obvious Clusters”.

These pairs of data points occur together very strongly (see the 100% conditional probabilities on the right side) but very infrequently (these patterns appeared only 3 times). In this data set, the pairs are customer attributes, but your data may show very different patterns:

  • Retail data may identify items that don't sell often but always sell together, showing the possible hidden costs if one of the products were no longer sold
  • Manufacturing parts data may identify parts that are ordered together and could be bundled as one SKU
  • Demographics data might identify segments of customers that don't appear often but always share the same characteristics
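Assuming a “non-obvious” pair is one that appears together only a handful of times but with a 100% conditional probability in both directions (neither data point ever appears without the other), the search can be sketched like this. The two-way definition is our assumption, and the thresholds `max_count` and `min_conf` are illustrative defaults, not EmcienPatterns settings.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy rows: four "admin." customers and two "student" customers.
rows = (
    [{"job": "admin.", "contact": "cellular", "marital": "married"}] * 4
    + [{"job": "student", "contact": "cellular", "marital": "single"}] * 2
)

def non_obvious_pairs(rows, max_count=3, min_conf=1.0):
    """Find pairs that co-occur rarely but (almost) always together.

    Assumption: strength is the smaller of P(a | b) and P(b | a), so a pair
    qualifies only if each data point rarely appears without the other.
    """
    item_counts, pair_counts = Counter(), Counter()
    for row in rows:
        items = sorted(f"{c}={v}" for c, v in row.items())
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    found = []
    for (a, b), n in pair_counts.items():
        conf = min(n / item_counts[a], n / item_counts[b])
        if n <= max_count and conf >= min_conf:
            found.append((a, b, n, conf))
    return sorted(found, key=lambda t: t[2])

for a, b, n, conf in non_obvious_pairs(rows):
    print(f"{a} + {b}: {n} times, {conf:.0%}")
```

Raising `max_count` to 4 also returns the `job=admin.` and `marital=married` pair, which is just as strong but not rare, which is why the frequency cap matters.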

Data points that do not form patterns with any other part of the data indicate opportunities to remove items from the data set going forward. To find the data points with no connections in your data, click the “Perspectives” link at the top of the analysis dashboard and select “Disconnected Items”.

These data points occur infrequently and don't form patterns with anything else, indicating places where reductions may be useful in the future. For instance:

  • Retail data may identify items that don't sell often and don't bolster the sales of any other items either, showing opportunities to cut items with little risk
  • Manufacturing parts data would identify parts ordered so infrequently that they add more complexity cost than revenue and should be removed
  • Demographics data might identify customer attributes that are very rare and don't easily fit into a microsegmentation
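Disconnected items can be sketched in the same spirit: flag data points that are rare and whose strongest link to any other data point is weak. Here “link strength” is assumed to be the smaller of the two conditional probabilities between a pair, so a rare item that only sits next to very common ones scores low. The thresholds and the `disconnected_items` helper are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy rows.
rows = (
    [{"job": "admin.", "contact": "cellular"}] * 4
    + [{"job": "entrepreneur", "contact": "cellular"}]
    + [{"job": "student", "contact": "telephone"}] * 2
)

def disconnected_items(rows, max_count=2, max_strength=0.5):
    """Return rare data points whose strongest link to anything else is weak.

    Assumption: link strength is min(P(a | b), P(b | a)); items that never
    co-occur with anything keep a strength of 0.0.
    """
    item_counts, pair_counts = Counter(), Counter()
    for row in rows:
        items = sorted(f"{c}={v}" for c, v in row.items())
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    strongest = defaultdict(float)  # best link strength seen for each item
    for (a, b), n in pair_counts.items():
        strength = min(n / item_counts[a], n / item_counts[b])
        strongest[a] = max(strongest[a], strength)
        strongest[b] = max(strongest[b], strength)
    return sorted(
        item for item, n in item_counts.items()
        if n <= max_count and strongest[item] < max_strength
    )

print(disconnected_items(rows))
```

In these toy rows, `job=student` and `contact=telephone` are just as rare as `job=entrepreneur`, but they always appear together, so they behave like a non-obvious pair rather than disconnected items.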

Disconnected Items can also reveal issues with data preparation: in many cases, a large number of data points that don't form patterns signals an opportunity to reformat the data for analysis.