Implementing Band and Predict

What is band_and_predict?

band_and_predict is an example command-line script that creates a repeatable process for banding incoming data and making predictions on it. It automatically bands a test file, makes predictions on that file, and returns a results file containing those predictions.

 

Note: band_and_predict requires Bandit v50 or later. After installing Bandit, copy it to a directory in the user's $PATH variable (such as /usr/bin/ or $HOME/bin).

What needs to be completed before using band_and_predict?

Before running band_and_predict, make sure the training data file has been banded and analyzed in Emcien. The breaks file that Bandit produced during that banding will be used to band the incoming data for predictions.

What inputs are needed when calling band_and_predict?

 

Authentication token (-a):

  • Allows the user to view and make predictions

  1. From the Emcien home page, click on ‘Admin’.
  2. Click on your email under users.
  3. Generate an API token and copy it to use in the band_and_predict command.
  4. Your API token will be a 24-character string.  Example: ur390s4ju5foh2s0n989mnp7
  5. Click Update User to save the information.

Report ID (-r):

  • Identifies which training data has the predictive rules to apply to new data

  1. From the Emcien home page, click on ‘Admin’.
  2. Hover over settings next to your report and click details.
  3. Copy the ID. In our example, the ID is 68356633.

Test file (-p):

  • Incoming data that will be banded and used for predictions.

Save the unbanded test file to the same path as the band_and_predict script. For example, the test file could be named ‘sample_test_file.wide.csv’.

The test file must have the .wide.csv extension.
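If you prepare test files automatically, it can help to check the extension before calling band_and_predict. The following is a minimal sketch of such a check; the helper name has_wide_csv_extension is hypothetical and is not part of the band_and_predict script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the file name ends in .wide.csv.
has_wide_csv_extension() {
  case "$1" in
    *.wide.csv) return 0 ;;
    *) return 1 ;;
  esac
}

# File name taken from the example in this article.
test_file="sample_test_file.wide.csv"
if has_wide_csv_extension "$test_file"; then
  echo "ok: $test_file"
else
  echo "error: $test_file must end in .wide.csv" >&2
fi
```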

Breaks file (-b):

  • Contains the numerical bands to apply to the incoming test data so that it matches the ranges used for the training data.

The breaks file was created while banding the training data and is used to band the test file. Save it in the same path as the band_and_predict script. An example name would be ‘training_file_breaks.csv’.

SFTP connection settings (set in the script):

  • Allows band_and_predict to send data to the server

  1. From the Emcien home page, click on ‘Admin’.
  2. Then click ‘Edit Company’.
  3. Find the SFTP username, host, and path in the Data Server field.
  4. Open the band_and_predict script in a text editor.
  5. Insert your company’s SFTP details between the quotation marks:

sftp_user=${EMCIEN_SFTP_USER:-"acme"}
sftp_host=${EMCIEN_SFTP_HOST:-"disk.internal.acme.com"}
sftp_path=${EMCIEN_SFTP_PATH:-"make-predictions"}
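The ${VAR:-"default"} form in these lines is standard shell parameter expansion: the environment variable is used when it is set and non-empty, and the quoted default is used otherwise. Assuming the script keeps these lines as shown, you could also set EMCIEN_SFTP_USER, EMCIEN_SFTP_HOST, and EMCIEN_SFTP_PATH in your environment instead of editing the defaults. A small sketch of the behavior, where the value "analytics" is a placeholder:

```shell
#!/usr/bin/env bash
# Start from a clean state so the default applies.
unset EMCIEN_SFTP_USER

# With the variable unset, the quoted default is used.
sftp_user=${EMCIEN_SFTP_USER:-"acme"}
echo "without the variable: $sftp_user"

# With the variable set, its value replaces the default.
EMCIEN_SFTP_USER="analytics"   # placeholder value
sftp_user=${EMCIEN_SFTP_USER:-"acme"}
echo "with the variable:    $sftp_user"
```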

SFTP password (-f):

  • The password for a secure connection with the data server.

  1. From the Emcien home page, click on ‘Admin’.
  2. Then click ‘Edit Company’.
  3. Copy the password. It is the string after the “:” but before the “@” in the Data Server field.

Note: If the password has non-alphanumeric characters, be sure to escape them with a backslash (‘\’).
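As an illustration, the sketch below puts a backslash in front of every non-alphanumeric character of a password using sed. It assumes the escaping is only needed so that the password passes through the shell unchanged; the password shown is a placeholder.

```shell
#!/usr/bin/env bash
# Placeholder password containing characters the shell treats specially.
raw_password='my$ftp&pass!word'

# Insert a backslash before every character that is not a letter or digit.
escaped_password=$(printf '%s' "$raw_password" | sed 's/[^A-Za-z0-9]/\\&/g')
echo "escaped form: $escaped_password"

# Round trip: when the shell evaluates the escaped form, it gets the original back.
eval "restored_password=$escaped_password"
if [ "$restored_password" = "$raw_password" ]; then
  echo "round trip ok"
fi
```

The escaped form is what you would type after -f on the command line.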

Emcien server URL (-u):

  • The URL of the prediction engine instance that band_and_predict connects to

  1. Use the base URL for your Emcien prediction engine instance (this may be the same URL as your Emcien instance). Be sure to specify http or https.
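A quick way to catch a URL that is missing its protocol is to check the prefix before running the command. This is a minimal sketch; has_protocol is a hypothetical helper and not part of the band_and_predict script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: accepts only URLs that start with http:// or https://.
has_protocol() {
  case "$1" in
    http://*|https://*) return 0 ;;
    *) return 1 ;;
  esac
}

url="http://ip.address.of.server"   # placeholder from the example command
if has_protocol "$url"; then
  echo "ok: $url"
else
  echo "error: include http:// or https:// in the URL" >&2
fi
```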

 

How do I use band_and_predict?

 

Begin by opening your Command Line tool.

 

To see an overview of the command line parameters, call band_and_predict with the “-h” parameter (band_and_predict -h).  It returns:

 
Emcien Band and Predict (BETA) Usage:
band_and_predict [-h] -a AUTH_TOKEN -r REPORT_ID -p PRED_CSV -b BREAKS -f PASSWD -u URL
Upload and make predictions on a test data set using a specified stamper.
-h  Display help and exit 
-a  A Patterns authentication token for a user with permissions to view and make predictions 
-r  The Patterns Report ID containing the training data with which to make predictions 
-p  The CSV file to band and make predictions on 
-b  The Bandit breaks file that was created when banding the training data file (and will be used to band the prediction CSV) 
-f  Password for the SFTP user associated with the Patterns client that will be used to make predictions 
-u  The URL of the Emcien server. Include the protocol("http" or "https")  

How do I band_and_predict a file?

 

Every band_and_predict call/command requires a number of inputs to connect the results of the Train analysis with the incoming Test file. These calls/commands can be run manually through the terminal each time or can be automated as part of a larger implementation of recurring analysis and predictions.

 

Here is an example:

band_and_predict -a 'ur390s4ju5foh2s0n989mnp7' -r '68356633' -p 'sample_test_file.wide.csv' -b 'breaks_training_file.wide.csv' -f 'my_sftp_password' -u 'http://ip.address.of.server'
 

The above example will:

  1. Allow access to the Emcien API without logging in (-a 'ur390s4ju5foh2s0n989mnp7').

  2. Identify in Emcien which training analysis has the rules to make predictions (-r '68356633').

  3. Ingest the test file (-p 'sample_test_file.wide.csv').

  4. Use the training data breaks file (-b 'breaks_training_file.wide.csv') to specify how the test data should be transformed.

  5. Allow band_and_predict to send data to the SFTP server associated with the desired Emcien client (-f 'my_sftp_password').

  6. Direct band_and_predict to the Emcien server URL to use (-u 'http://ip.address.of.server').

  7. Return a results file (sample_test_file.results.csv) containing the predictions on your test data.

 

Note: You can edit the band_and_predict script so that it runs automatically on your system, or create a similar script that automatically bands and predicts new incoming data.
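One possible shape for such automation is a small wrapper that calls band_and_predict once for every .wide.csv file in a directory. The sketch below is only an illustration: the function name predict_all, the BAP_CMD override, and the directory name in the usage comment are hypothetical. This article asks for the test and breaks files to sit in the same path as the script, so adjust the paths to match your setup.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run band_and_predict once per .wide.csv file in a directory.
# Setting BAP_CMD to another command (for example "echo") gives a dry run.
BAP_CMD=${BAP_CMD:-band_and_predict}

predict_all() {
  incoming_dir=$1
  breaks_file=$2
  for test_file in "$incoming_dir"/*.wide.csv; do
    # Skip the unexpanded pattern when the directory has no matching files.
    [ -e "$test_file" ] || continue
    "$BAP_CMD" -a "$EMCIEN_AUTH_TOKEN" -r "$EMCIEN_REPORT_ID" \
      -p "$test_file" -b "$breaks_file" \
      -f "$EMCIEN_SFTP_PASS" -u "$EMCIEN_URL" \
      || echo "prediction failed for $test_file" >&2
  done
}

# Example call (hypothetical directory name):
#   predict_all ./incoming training_file_breaks.csv
```

A scheduler such as cron could then call the wrapper at whatever interval new data arrives.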

Advanced Environment Variable Options:

If you would like to use environment variables instead of command line flags, set the following variables:

  • “EMCIEN_AUTH_TOKEN”
  • “EMCIEN_REPORT_ID”
  • “EMCIEN_PRED_CSV”
  • “EMCIEN_BREAKS_CSV”
  • “EMCIEN_SFTP_PASS”
  • “EMCIEN_URL”
 

Using a flag for an option will always take precedence over the corresponding environment variable.
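For example, the values from the sample command earlier in this article could be exported once and the script then called without flags. This is a sketch using the article's placeholder values; the report ID in the last comment is hypothetical.

```shell
#!/usr/bin/env bash
# Placeholder values taken from the example command in this article.
export EMCIEN_AUTH_TOKEN='ur390s4ju5foh2s0n989mnp7'
export EMCIEN_REPORT_ID='68356633'
export EMCIEN_PRED_CSV='sample_test_file.wide.csv'
export EMCIEN_BREAKS_CSV='breaks_training_file.wide.csv'
export EMCIEN_SFTP_PASS='my_sftp_password'
export EMCIEN_URL='http://ip.address.of.server'

# With every variable set, no flags are needed. The guard keeps this sketch
# harmless on machines where band_and_predict is not installed.
if command -v band_and_predict >/dev/null 2>&1; then
  band_and_predict
else
  echo "band_and_predict not found in PATH" >&2
fi

# A flag still overrides its variable, for example (hypothetical report ID):
#   band_and_predict -r '12345678'
```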

What if I need to adjust polling intervals?

Adjust the polling interval (in seconds) to suit the duration of a typical prediction made with this script by editing the “polling_duration” variable near the top of the band_and_predict script.  For example, to poll every 5 seconds, use “polling_duration=5”.
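To show what the interval controls, here is a generic polling loop. It is not the code from the band_and_predict script: results_ready, wait_for_results, and max_attempts are hypothetical names, and the readiness check simply looks for a file. A shorter interval notices finished predictions sooner but checks more often.

```shell
#!/usr/bin/env bash
polling_duration=${polling_duration:-5}   # seconds to wait between checks
max_attempts=${max_attempts:-3}           # hypothetical cap, not from the script

# Hypothetical stand-in for "are the predictions ready?".
# Here it succeeds once the given file exists.
results_ready() {
  [ -e "$1" ]
}

# Check repeatedly, sleeping polling_duration seconds between attempts.
# Returns 0 as soon as the results are ready, 1 if every attempt fails.
wait_for_results() {
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if results_ready "$1"; then
      return 0
    fi
    sleep "$polling_duration"
    attempt=$((attempt + 1))
  done
  return 1
}

# Example call: wait_for_results sample_test_file.results.csv
```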

What additional tools will I need?

While most of the utilities that band_and_predict uses are available on most Unix-like systems (OSX and Linux variants), some non-standard tools must also be present.  Installing the missing utilities described below requires admin (sudo) access on your system. If you do not have it, consult your IT staff.

  • lftp
    • OSX:  http://lftp.yar.ru/get.html
      • brew install lftp (if homebrew is installed on system)
    • Ubuntu: sudo apt-get install lftp
    • RedHat: sudo yum install lftp
  • curl
  • Emcien Bandit v50 (or greater)
    • Copy bandit to a directory in the user’s $PATH variable (such as /usr/bin or $HOME/bin)
 

Any missing tools can be installed using APT, RPM, Homebrew (if available), or the package manager that is available on your system.  Contact your system admin for assistance in doing this or reach out to Emcien’s support site and team for help.
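To find out which of these tools are missing before you run band_and_predict, you can check your $PATH with a few lines of shell. This sketch assumes the Bandit executable is named bandit; missing_tools is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the name of every listed tool that is not found in PATH.
# "bandit" is assumed to be the name of the Bandit executable.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing=$(missing_tools lftp curl bandit)
if [ -n "$missing" ]; then
  # Left unquoted on purpose so the names print on one line.
  echo "missing tools:" $missing >&2
else
  echo "all required tools found"
fi
```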