Data Types and Requirements

This article covers how to format and save your data before uploading it to EmcienPatterns.


Choosing a File Format

To use EmcienPatterns, save your dataset in one of the following EmcienPatterns formats:


Wide Format

Wide format is the most commonly used format and the easiest one to start with. It consists mostly of user-defined columns and requires no other specific column headings.
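As a rough illustration, the sketch below writes a small wide-format table with Python's standard csv module: one header row, then one record per row with one value per column. The column names (CustomerID, Region, Product, Channel) and the records are made up for illustration, since wide-format columns are user-defined. The sketch writes Unix-style line endings.

```python
import csv
import io

# Hypothetical, user-defined column names for illustration only.
COLUMNS = ["CustomerID", "Region", "Product", "Channel"]

# One record per row, one value per column.
records = [
    {"CustomerID": "C001", "Region": "East", "Product": "Laptop", "Channel": "Online"},
    {"CustomerID": "C002", "Region": "West", "Product": "Tablet", "Channel": "Retail"},
]

def to_wide_csv(rows, columns):
    """Serialize dict records as wide-format CSV text with a header row."""
    buffer = io.StringIO()
    # lineterminator="\n" produces Unix-style line endings.
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

print(to_wide_csv(records, COLUMNS))
```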


Long Format

Long format is typically used for transaction data containing many variables and combinations. In long format, categories and their corresponding items are listed vertically, one per row, and grouped by a unique transaction ID.
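One way to picture the long layout is to convert wide records into one row per category and item, keyed by a transaction ID. The sketch below does this with the csv module. The headings TransactionID, Category, and Item, and the OrderID source column, are hypothetical placeholders; check which column headings EmcienPatterns actually expects for long format before uploading.

```python
import csv
import io

def wide_to_long(rows, id_column):
    """Turn each wide record into (transaction_id, category, item) tuples.

    Every non-ID column becomes a category and its cell value becomes
    the item. Empty cells are skipped.
    """
    long_rows = []
    for row in rows:
        transaction_id = row[id_column]
        for category, item in row.items():
            if category == id_column or item in ("", None):
                continue
            long_rows.append((transaction_id, category, item))
    return long_rows

def to_long_csv(long_rows):
    """Serialize long rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Hypothetical headings; confirm the names EmcienPatterns expects.
    writer.writerow(["TransactionID", "Category", "Item"])
    writer.writerows(long_rows)
    return buffer.getvalue()

wide = [
    {"OrderID": "T1", "Product": "Laptop", "Region": "East"},
    {"OrderID": "T2", "Product": "Tablet", "Region": ""},
]
print(to_long_csv(wide_to_long(wide, "OrderID")))
```

Transaction T1 ends up on two rows that share its ID, which is the vertical grouping described above.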


Tagged Format

Tagged format consists of two required columns (which may be empty) and has no header row. A data file may contain at most 1,000 categories, and each line may contain at most 1,000 items.
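The two tagged-format limits can be checked before upload. This sketch does not parse the tagged file itself, because the on-disk layout of the two columns is not described here. Instead it assumes each line has already been read into a list of (category, item) pairs, then counts items per line and distinct categories per file. The function name and the pair representation are hypothetical.

```python
MAX_CATEGORIES_PER_FILE = 1_000
MAX_ITEMS_PER_LINE = 1_000

def check_tagged_limits(lines):
    """Check tagged-format limits on already-parsed lines.

    Assumption: each line is a list of (category, item) pairs.
    Returns a list of human-readable problems (empty if none).
    """
    problems = []
    categories = set()
    for number, tags in enumerate(lines, start=1):
        if len(tags) > MAX_ITEMS_PER_LINE:
            problems.append(f"line {number}: {len(tags)} items (max {MAX_ITEMS_PER_LINE})")
        # Collect distinct categories across the whole file.
        categories.update(category for category, _ in tags)
    if len(categories) > MAX_CATEGORIES_PER_FILE:
        problems.append(f"{len(categories)} categories (max {MAX_CATEGORIES_PER_FILE})")
    return problems

sample = [
    [("Product", "Laptop"), ("Region", "East")],
    [("Product", "Tablet")],
]
print(check_tagged_limits(sample))
```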


JSON Format

JSON format contains no headers and is an industry-standard format for data storage. JSON files are well suited to transactional data containing many variables and combinations, and they support large item “baskets” of up to 1,000 items per transaction.
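The 1,000-item basket limit can be checked in a similar way. This sketch assumes, hypothetically, one JSON object per line with an "id" field and an "items" list. This article does not specify the JSON layout EmcienPatterns expects, so adjust the keys to match it.

```python
import json

MAX_ITEMS_PER_TRANSACTION = 1_000

def oversized_baskets(jsonl_text, id_key="id", items_key="items"):
    """Return the IDs of transactions whose basket exceeds the item limit.

    Hypothetical layout: one JSON object per line, each holding an ID
    under `id_key` and a list of items under `items_key`.
    """
    oversized = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if len(record[items_key]) > MAX_ITEMS_PER_TRANSACTION:
            oversized.append(record[id_key])
    return oversized

sample = "\n".join([
    json.dumps({"id": "T1", "items": ["Laptop", "Mouse"]}),
    json.dumps({"id": "T2", "items": [f"SKU{i}" for i in range(1_001)]}),
])
print(oversized_baskets(sample))  # ['T2']
```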


Compression

EmcienPatterns supports CSV files compressed with GZip (.gz). Compressed CSV files must still use either the wide or the long format.
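Python's standard gzip module can produce a .csv.gz file. The helper name gzip_csv is hypothetical. It keeps the original name and appends .gz, so the format tag in the name (for example .wide) is preserved.

```python
import gzip
import shutil
import tempfile
from pathlib import Path

def gzip_csv(csv_path):
    """Compress a .csv file to <name>.csv.gz next to it and return the new path."""
    source = Path(csv_path)
    if source.suffix != ".csv":
        raise ValueError(f"expected a .csv file, got {source.name}")
    target = source.with_name(source.name + ".gz")
    # Stream bytes so large files are not loaded into memory at once.
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target

# Usage: compress a small wide-format file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "members.us.wide.csv"
    original.write_bytes(b"CustomerID,Region\nC001,East\n")
    print(gzip_csv(original).name)  # members.us.wide.csv.gz
```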

Character Encoding

EmcienPatterns supports the following character encoding: 

Line Endings

EmcienPatterns supports either Unix-style line endings (\n) or DOS line endings (\r\n).
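To see which line-ending style a file uses, you can count \r\n pairs and bare \n bytes. This hypothetical helper reports "mixed" when both appear. The article only names the two styles, so it does not say whether a mixed file is accepted. A lone \r (classic Mac OS style) is not counted. The to_unix helper converts DOS endings to Unix endings.

```python
def line_ending_style(data: bytes) -> str:
    """Classify line endings in raw bytes as 'unix', 'dos', 'mixed', or 'none'."""
    crlf = data.count(b"\r\n")
    # Every \r\n also contains a \n, so subtract those to get bare \n.
    lf_only = data.count(b"\n") - crlf
    if crlf and lf_only:
        return "mixed"
    if crlf:
        return "dos"
    if lf_only:
        return "unix"
    return "none"

def to_unix(data: bytes) -> bytes:
    """Convert DOS (\\r\\n) line endings to Unix (\\n) line endings."""
    return data.replace(b"\r\n", b"\n")

print(line_ending_style(b"a,b\r\nc,d\r\n"))  # dos
```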

Naming Your Data File

EmcienPatterns uses a file naming convention to identify the format used. File names are limited to 200 characters. Name your files using the following structure:

<filename>.<file type>.<extension>

filename: Use only the following characters:
  • ASCII alphanumeric characters: uppercase (A-Z), lowercase (a-z), and digits (0-9)
  • Whitespace
  • Periods (.)
  • Underscores (_)
  • Hyphens (-)
  • Parentheses ( )

  The following characters are not supported:
  • Backslash (\)
  • Single quote (')
  • Double quote (")

file type: long or wide

extension: .csv or .csv.gz
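The naming rules can be checked with a regular expression. This is a sketch under a few assumptions: the 200-character limit applies to the whole name, including the file type and extension; only the space character counts as whitespace; and the function name parse_data_filename is hypothetical.

```python
import re

MAX_NAME_LENGTH = 200
# Allowed characters in the <filename> part: ASCII letters and digits,
# spaces, periods, underscores, hyphens, and parentheses.
FILENAME_PART = r"[A-Za-z0-9 ._()\-]+"
NAME_PATTERN = re.compile(
    rf"(?P<filename>{FILENAME_PART})"
    r"\.(?P<file_type>long|wide)"
    r"\.(?P<extension>csv|csv\.gz)"
)

def parse_data_filename(name):
    """Split a data file name into filename, file type, and extension.

    Raises ValueError if the name is too long or does not follow
    <filename>.<long|wide>.<csv|csv.gz>.
    """
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"name is {len(name)} characters; limit is {MAX_NAME_LENGTH}")
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"{name!r} does not match <filename>.<long|wide>.<csv|csv.gz>")
    return match.groupdict()

print(parse_data_filename("clinical.all.wide.csv.gz"))
```

Because the filename part may itself contain periods, the regular expression backtracks so that the last .long or .wide before the extension is treated as the file type.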

File Name Examples

 area2-sales.long.csv 

 Uncompressed sales data from area 2 in the long format 

 members.us.wide.csv

 Uncompressed U.S. membership data in the wide format   

 area3-sales.long.csv.gz

 Compressed sales data from area 3 in the long format   

 clinical.all.wide.csv.gz

 Compressed clinical test data in the wide format