What is Bandit?
Emcien Bandit is a command line tool that provides automated, high-speed, smart numeric data banding (or binning). Bandit automatically transforms raw numeric data into categorical data for analysis projects and pipelines. Emcien customers analyzing numeric data can easily pre-process data using Bandit’s built-in formulas or create their own bands. You can reference the full list of Bandit Commands here: http://support.emcien.com/help/article/link/bandit-commands-reference
What input file do I need?
You need a delimited text file in the Emcien wide format that contains the input data you want to band. The diabetes research data used in the examples below is one such file.
How do I use Bandit?
Download the appropriate version of Bandit for your operating system:
Unzip the package and save the Bandit executable file to a location on your computer. We recommend that the file be renamed to “Bandit” to simplify use of the processing commands. Renaming is not required, but you must use the correct file name in the processing commands.
-h help: Emcien Bandit version 50 (www.emcien.com)
-d CHAR dependent: name of the dependent category
-f CHAR form: form of output file (W for wide or R for receipt)
-s CHAR separator: Separator character
-t CHAR table: name of table file
-b CHAR bands: name of bands file
-r INT randomize: 1 for 10pct test; 2 for 20pct test
-v INT verbosity: Print out lots of extra information on stderr
How do I band a file?
To band a file, prepare the input file and pass it to Bandit with the -t command line parameter.
Here is an example using the diabetes data set:
bandit -t diabetes_input_data_file.csv
The above example will:
- Ingest the input file (-t “diabetes_input_data_file.csv”).
- Return a banded file (banded_diabetes_input_data_file.csv) optimized for analysis and a breaks file (breaks_diabetes_input_data_file.csv) containing the bands and methods used.
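The naming pattern in this example can be sketched in Python. This is a minimal illustration, assuming the output names are always the input file name with a banded_ or breaks_ prefix, as in the diabetes example. The function name expected_output_names is hypothetical and is not part of Bandit.

```python
from pathlib import Path


def expected_output_names(table_file):
    """Return the output file names this article shows for a given input file.

    Assumption: Bandit prefixes the input file name with "banded_" and
    "breaks_", as in the diabetes example. Only names are returned, because
    the article does not say which directory the files are written to.
    """
    name = Path(table_file).name
    return {"banded": "banded_" + name, "breaks": "breaks_" + name}


print(expected_output_names("diabetes_input_data_file.csv"))
```

A helper like this can be used in a pipeline to check that both files exist after Bandit finishes.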
What if I have a defined outcome category?
When using Emcien Bandit to process data that contains a defined outcome (dependent) category, you will use the -d parameter to pass in the dependent category name.
Here is an example:
bandit -t diabetes_input_data_file.csv -d "Diabetes"
The above example will:
- Ingest the input file (-t “diabetes_input_data_file.csv”).
- Use the category “Diabetes” as the dependent variable (-d “Diabetes”).
- Return the following output files:
  - Banded file (banded_diabetes_input_data_file.csv): the auto-banded file, optimized for analysis.
  - Breaks file (breaks_diabetes_input_data_file.csv): contains the bands and methods used during the auto-banding process.
  - Mutual information file (mi_diabetes_input_data_file.csv): provides a measurement of the relationship between each category and the dependent category.
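When Bandit is called from a script rather than typed at a prompt, the command can be assembled as an argument list. The sketch below uses only the -t, -d, and -b parameters listed in this article. The function name build_bandit_command and the default executable name "bandit" are assumptions for illustration.

```python
def build_bandit_command(table_file, dependent=None, bands_file=None,
                         executable="bandit"):
    """Build a Bandit argument list from the -t, -d and -b parameters.

    Assumption: the executable was saved under the name given in
    `executable` and can be found on the PATH.
    """
    command = [executable, "-t", table_file]
    if dependent is not None:
        command += ["-d", dependent]    # name of the dependent category
    if bands_file is not None:
        command += ["-b", bands_file]   # name of the bands file
    return command


command = build_bandit_command("diabetes_input_data_file.csv",
                               dependent="Diabetes")
print(command)
```

Passing the resulting list to Python's subprocess.run(command, check=True) avoids shell quoting problems with category names that contain spaces, because each argument is handed to the program unchanged.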
What if I want to control how the banding works?
If the banded file has:
- columns you would like to exclude
- columns that were banded that should not have been
- columns that you want to override with another system-defined band, your own bands, or strings,
follow these simple instructions to create a user bands file.
Open the breaks_diabetes_input_data_file.csv file that was just created. Its columns are:
Column A: Categories within your data
Column B: Banding method used
Column C: Ignore
Column D-E: Band range (minimum and maximum)
Column F: Number of rows in the data that contain that band
Column G: New name (label) for the band
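The column layout above can be read programmatically. The following is a sketch only: it assumes the breaks file is comma-separated with no header row, which this article does not confirm. The field names are our own labels for columns A through G, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import namedtuple

# Our own labels for columns A-G as described in this article.
BreakRow = namedtuple(
    "BreakRow",
    "category method ignored band_min band_max row_count label",
)


def read_breaks(handle):
    """Parse breaks rows from an open file or file-like object.

    Assumptions: comma-separated values, no header row, one band per row.
    """
    rows = []
    for record in csv.reader(handle):
        if not record:
            continue  # skip blank lines
        padded = (record + [""] * 7)[:7]  # tolerate a missing column G
        rows.append(BreakRow(*padded))
    return rows


# Made-up rows for illustration only; a real breaks file may differ.
sample = io.StringIO("Age,freq,,21,30,120,\nAge,freq,,30,45,98,\n")
for row in read_breaks(sample):
    print(row.category, row.method, row.band_min, row.band_max, row.row_count)
```

For a real file, open it with open(path, newline="") and pass the handle to read_breaks.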
Options for creating a user bands file:
- To apply a system-defined band:
  - Delete all but one line for each category to which you want to apply a system-defined band.
  - In column B of that line, type one of the system-defined banding options** and delete the remaining contents of the row.
  - Save this file under a name other than breaks_diabetes_input_data_file.csv.
  - Run Bandit again, passing your new user bands file with the -b parameter.
  - Example: bandit -t diabetes_input_data_file.csv -d "Diabetes" -b user_input_file.csv
  - Bandit will apply the specified banding method to your data. In this case, that is the freq banding method, artificially limited to three bands.
- To create a user-defined band:
  - Open the original breaks_diabetes_input_data_file.csv and adjust the min and max of each band range.
  - Replace the banding method in column B with user.
  - Type the new min and max in columns D and E, respectively.
- To apply a label to your band:
  - Replace the banding method in column B with user.
  - In column G, type the number or text string you want to represent your band.
Save the modified breaks file (the user bands file) under a different name. It is important to use a name other than the one Bandit generates for its own breaks file, which follows the format breaks_input_file.wide.csv.
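The edits described above can also be scripted. The sketch below treats one breaks row as a list of seven values (columns A through G) and applies the user-band changes: column B becomes user, columns D and E take the new min and max, and column G takes the label. The function names and the sample row are hypothetical. The check on the file name assumes that Bandit's own breaks files always start with breaks_, as in this article's examples.

```python
from pathlib import Path


def to_user_band(row, new_min=None, new_max=None, label=None):
    """Return a copy of a breaks row (columns A-G) edited as a user band."""
    edited = list(row) + [""] * (7 - len(row))  # pad short rows to 7 columns
    edited[1] = "user"                  # column B: banding method
    if new_min is not None:
        edited[3] = str(new_min)        # column D: band minimum
    if new_max is not None:
        edited[4] = str(new_max)        # column E: band maximum
    if label is not None:
        edited[6] = str(label)          # column G: label for the band
    return edited


def check_user_bands_name(file_name):
    """Reject names that look like Bandit's own breaks files (assumed prefix)."""
    if Path(file_name).name.startswith("breaks_"):
        raise ValueError("choose a name that does not start with 'breaks_'")
    return file_name


# Made-up row for illustration only.
original = ["Age", "freq", "", "21", "30", "120", ""]
print(to_user_band(original, new_min=20, new_max=35, label="young"))
print(check_user_bands_name("user_input_file.csv"))
```

The edited rows can then be written out with Python's csv.writer and passed to Bandit with the -b parameter.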
Sample user bands file:
Here is an example:
bandit -t diabetes_input_data_file.csv -d "Diabetes" -b user_input_file.csv
The above example will:
- Ingest the input file (-t “diabetes_input_data_file.csv”).
- Use the category “Diabetes” as the dependent variable (-d “Diabetes”).
- Use the user bands file (-b “user_input_file.csv”) to specify how the data should be transformed.
- Return the following output files:
  - Banded file (banded_diabetes_input_data_file.csv): the user-banded file, optimized for analysis.
  - Breaks file (breaks_diabetes_input_data_file.csv): contains the bands and methods used during the user banding process.
  - Mutual information file (mi_diabetes_input_data_file.csv): provides a measurement of the relationship between each category and the dependent category.