Use Case: Predicting Titanic Survivors
This article outlines how Emcien can predict the outcome of a situation given historical data. The objective of this use case will be to discover patterns in data from the sinking of the Titanic that would allow us to accurately predict whether a passenger survived or not, given the passenger’s demographic data.
Align your data with a use case
Note: This procedure uses the Titanic example data set. Your data should be similar to the example data, with columns for attribute or demographic data for a population. To download the example data set, right-click the link and click “Save Link As”.
Data Prep
Optimize your data.
The wide format is the most universal and most commonly used Emcien format. You can use the wide format for multi-dimensional data, such as demographics or configurable products. In the wide format, each transaction is represented by a single row of data.
In our example, the columns are attributes (e.g. age, home town) of passengers on the RMS Titanic. Emcien works well with any kind of demographic data and can surface patterns that lead to discrete outcomes. If your data does not look like the wide format data here, don’t worry. Emcien Patterns also works with ‘receipt’ type data. For more information about data preparation and the data types supported, see our Data Prep Guide.
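As a rough illustration of the wide format described above, the following Python sketch parses a small wide-format CSV in which each row is one passenger and each column is one attribute. The rows, the column names, and the "column: value" item notation are made up for this example and are not taken from the Titanic example file or from Emcien itself.

```python
import csv
import io

# Made-up rows for illustration only; not taken from the Titanic example file.
WIDE_CSV = """passenger_id,class,sex,age_band,survival
1,Third,male,Adult,Died
2,First,female,Adult,Survived
3,Second,female,Child,Survived
"""


def read_wide(text):
    """Parse wide-format CSV text into one dict per row (one row = one passenger)."""
    return list(csv.DictReader(io.StringIO(text)))


def to_items(row, id_column="passenger_id"):
    """Turn one wide row into 'column: value' items, skipping the ID column.

    The 'column: value' notation is a hypothetical convention for this sketch.
    """
    return [f"{col}: {val}" for col, val in row.items() if col != id_column and val]


rows = read_wide(WIDE_CSV)
for row in rows:
    print(row["passenger_id"], to_items(row))
```

Each printed line shows how a single row carries every attribute of one passenger, which is what distinguishes the wide format from ‘receipt’ type data.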
Note: Emcien's engine can also automatically band numeric data given a numeric data source. For more information, see the Data Prep Guide.
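To give a sense of what banding means, here is a minimal Python sketch that maps a numeric age onto a labeled range. The band edges and labels are hypothetical choices for this example. Emcien's engine selects its own bands automatically, and this sketch does not reproduce that logic.

```python
def band_age(age, edges=(13, 18, 40, 65)):
    """Return a text label for the band that `age` falls into.

    `edges` are hypothetical cut points chosen for this sketch; each band
    runs from the previous edge up to, but not including, the next edge.
    """
    if age is None:
        return "Unknown"
    lower = 0
    for upper in edges:
        if age < upper:
            return f"{lower}-{upper - 1}"
        lower = upper
    return f"{lower}+"


for age in (5, 29, 70, None):
    print(age, "->", band_age(age))
```

Banding turns a column with many distinct numeric values into a small number of categories, which makes it easier for patterns to form around it.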
Uploading Data
Upload your data for analysis.
This section covers how to upload data files to Emcien, either by drag and drop or with a file transfer client.
Make sure you've prepared your data before uploading to Emcien. Check out the Preparing Your Data article for more information. You can also email us at [email protected] for help getting started.
Upload using Drag and Drop
In Emcien version 2.14, you can drag and drop files into the application for processing. If you are using an earlier version, use one of the other methods listed below.
Uploading Data to the Emcien Cloud
You can upload data files to Emcien using SSH File Transfer Protocol (SFTP) and your preferred FTP client, such as FileZilla or Cyberduck.
To connect to Emcien using your FTP client, use the following credentials:
Host: feeds.emcien.com
Username: {Your Emcien Feeds server Username}
Password: {Your Emcien Feeds server Password}
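If you would rather script the upload than use a graphical client, the same connection can be sketched in Python. This example assumes the third-party paramiko library is installed, that the server listens on the standard SFTP port 22, and that files belong in the login directory. None of these details are confirmed by this article, and the helper names are hypothetical.

```python
import os
import posixpath


def remote_path_for(local_path, remote_dir="."):
    """Build the remote path: same file name, placed in `remote_dir`."""
    return posixpath.join(remote_dir, os.path.basename(local_path))


def upload_file(local_path, username, password,
                host="feeds.emcien.com", port=22, remote_dir="."):
    """Upload one file over SFTP and return the remote path used.

    Assumptions: paramiko is installed, port 22 is correct, and the login
    directory is the right destination. A bare Transport does not verify
    the server's host key, so add that check before any production use.
    """
    import paramiko  # third-party; imported here so the helper above works without it

    transport = paramiko.Transport((host, port))
    try:
        transport.connect(username=username, password=password)
        sftp = paramiko.SFTPClient.from_transport(transport)
        try:
            target = remote_path_for(local_path, remote_dir)
            sftp.put(local_path, target)
            return target
        finally:
            sftp.close()
    finally:
        transport.close()


# Example call (not executed here); substitute your own Feeds credentials:
# upload_file("titanic.csv", "your-username", "your-password")
```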
Uploading Data on Mac or Linux
Launch your preferred FTP client. If you do not have a preferred FTP client, we recommend Cyberduck.
For this article, we use screenshots of Cyberduck.
Click Open Connection.
Enter the credentials below and click Connect:
Server: Your Emcien URL. This URL was set by your IT resource during installation.
Username
Password
Your FTP client will then display the files on your server.
Drag and drop your data file to the server files in your FTP application.
On the Emcien home page click Analyze Data. You’ll see the data files on this screen.
Uploading Data on Windows
Click the Start button. The Start menu is displayed.
On the Start menu, click Computer. The Computer folder is displayed.
Right-click anywhere in the folder and click Add a network location.
The Add Network Location wizard is displayed. Click Next.
On the next tab, select Choose a custom network location. Click Next.
On the next tab, enter your Emcien URL.
Then click Next.
Clear the Log on anonymously checkbox.
In the username field, type data as the username. Click Next.
On the next tab, name your shortcut Emcien by typing it in the Type a name for this network location field. Click Next.
On the next tab, click Finish.
In the password field, type feeds1 as the password. Check the Save password checkbox so you can connect directly in the future.
Click Log On. You can now drag data files into this folder.
On the Emcien home page, click Analyze Data. You’ll see the data files on this screen.
Analysis
Begin your analysis.
Using your preferred Internet browser, navigate to the Emcien Sign In page:
- For local VM users, go to: http://localhost:5115
- For cloud users, go to: http://patterns.emcien.com/
Click Analyze Data to bring up all of your uploaded data, then select your data set.
Click Predictions and type in the category you want to predict. For this data set, we will predict the category “Survival”.
Once your project is organized, click Start Analysis.
The load screen will take you through each stage of the analysis.
When the analysis is complete, you will hear a chime and the View Analysis button will be highlighted in green. Click the button to see your results.
Results
See your results.
Begin by clicking “View Analysis”.
The Dashboard page for the Titanic analysis data set is displayed below. The interactive graphic on the home screen is the Connections Map, a representation of the analyzed data displayed by how demographics were correlated (or connected) together.
Each color represents the connections identified as relevant patterns in the data set. Mousing over each section will display examples of the corresponding clusters (or groupings of items occurring together). These include Core and Typical Connections (attribute data frequently found together), and Low Volume Disconnected Items (passenger data with no patterns associated with them). Notice at the right of the screen the link to our predictions.
Clicking on the “Survival” button will bring us to the outcomes of the prediction category “Survival”.
The category details page lists the items in the predicted category. In this case there are two outcomes: Died or Survived. To see the patterns associated with survival, click “Survived”.
The Item Details page shows the strongest patterns related to the item “Survived”. Shown here are the other items (and groups of items) in the data set that connected most strongly to the item we are viewing. To get a better sense of what this page is telling us, click on the “Tell Me” Button to the right of any line.
To find the combination of attributes that would have the strongest ability to predict our item (Survival), click on the “View Predictors” link on the right side of the screen.
The predictions screen lists the combinations of attributes (the set of rules) that lead to the outcome item.
The way to interpret these rules is “if the items on the left occur together, then the item on the right will have the given probability of occurring”. Click the “Tell Me” button for a better understanding of a row.
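One way to make this reading concrete is to compute the probability for a rule by hand: among the rows where all left-hand items occur together, count the fraction that also contain the right-hand item. The Python sketch below does this on a handful of made-up rows. It illustrates the interpretation only and is not a description of how Emcien computes its probabilities; the rows and function names are hypothetical.

```python
# Made-up rows for illustration only; not real Titanic records.
ROWS = [
    {"sex": "female", "class": "First", "survival": "Survived"},
    {"sex": "female", "class": "First", "survival": "Survived"},
    {"sex": "female", "class": "Third", "survival": "Died"},
    {"sex": "male", "class": "First", "survival": "Died"},
    {"sex": "male", "class": "Third", "survival": "Died"},
    {"sex": "male", "class": "Third", "survival": "Survived"},
]


def matches(row, conditions):
    """True if every left-hand item (column -> value) occurs in the row."""
    return all(row.get(col) == val for col, val in conditions.items())


def rule_probability(rows, conditions, outcome_column, outcome_value):
    """Fraction of rows matching `conditions` that also have the outcome.

    Returns None when no row matches the left-hand side.
    """
    matching = [r for r in rows if matches(r, conditions)]
    if not matching:
        return None
    hits = sum(1 for r in matching if r[outcome_column] == outcome_value)
    return hits / len(matching)


p = rule_probability(ROWS, {"sex": "female", "class": "First"}, "survival", "Survived")
print("female + First -> Survived:", p)
```

In this toy data, both rows that match the left-hand side also contain “Survived”, so the rule's probability is 1.0. With real data the figure would typically be lower and would come with a count of how many rows support it.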
Emcien’s rules and patterns can be easily exported through the “Download CSV” button at the top of each page, and can be used for any predictive scenario.
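As a sketch of how exported rules might be used outside the application, the Python example below checks a new record against a small list of rules and returns the matching outcome with the highest probability. The rule structure, the probabilities, and the `predict` function are all hypothetical. The actual column layout of the downloaded CSV is not described in this article, so you would need to adapt the loading step to the real file.

```python
# Hypothetical rules: (left-hand conditions, predicted outcome, probability).
# The probabilities are made-up numbers, not results from an Emcien analysis.
RULES = [
    ({"sex": "female", "class": "First"}, "Survived", 0.9),
    ({"sex": "male", "class": "Third"}, "Died", 0.8),
    ({"sex": "male"}, "Died", 0.7),
]


def predict(record, rules):
    """Return (outcome, probability) for the strongest matching rule, or None.

    A rule matches when every one of its conditions is present in the record.
    """
    best = None
    for conditions, outcome, probability in rules:
        if all(record.get(col) == val for col, val in conditions.items()):
            if best is None or probability > best[1]:
                best = (outcome, probability)
    return best


new_passenger = {"sex": "male", "class": "Third", "age_band": "Adult"}
print(predict(new_passenger, RULES))
```

Picking the single strongest matching rule is just one simple policy chosen for this sketch; other ways of combining matching rules are possible.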
While Emcien is useful for identifying patterns in historical data, it can also make real-time predictions given new information. For more information about our real-time prediction engine, see Our Guide to Making Predictions.
For more information or questions about Emcien’s predictive engine, you can contact your Services team member directly or email us at [email protected] and we will connect you with a services professional.