This article covers how to format and save your data before uploading it to EmcienPatterns.
Choosing a File Format
To use EmcienPatterns, you need to save the dataset as a CSV file using one of the following EmcienPatterns formats:
Wide format is the most commonly used format and the easiest to start with. It consists mostly of user-defined columns and does not require any specific column headings.
Long format is typically used for transaction data containing many variables and combinations. In long format, categories and their corresponding items are represented vertically and grouped based on a unique transaction ID.
Tagged format consists of two required columns (which may be empty) and no header. There may be a maximum of 1,000 categories per data file, with a maximum of 1,000 items per line.
JSON format contains no headers and is an industry-standard format for data storage. JSON files are well suited to transactional data containing many variables and combinations, and they support large item “baskets” of up to 1,000 items per transaction.
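To illustrate the difference between the wide and long layouts described above, here is a minimal Python sketch that turns wide rows into vertical (transaction ID, category, item) rows. The column names (order_id, color, size) and the three-field output are hypothetical and chosen only for illustration; they are not the column headings EmcienPatterns requires for long format.

```python
import csv
import io

def wide_to_long(wide_csv_text, id_column):
    """Turn wide rows (one column per category) into long rows
    of (transaction ID, category, item)."""
    reader = csv.DictReader(io.StringIO(wide_csv_text))
    long_rows = []
    for row in reader:
        for category, item in row.items():
            # Skip the ID column itself, stray extra fields, and empty cells.
            if category is None or category == id_column or not item:
                continue
            long_rows.append((row[id_column], category, item))
    return long_rows

# Hypothetical sample data: two transactions, two categories.
wide = "order_id,color,size\n1001,red,M\n1002,blue,\n"
rows = wide_to_long(wide, "order_id")
# rows == [("1001", "color", "red"), ("1001", "size", "M"), ("1002", "color", "blue")]
```

Each transaction ID repeats once per non-empty category, which is what "grouped based on a unique transaction ID" looks like in practice.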
EmcienPatterns supports CSV files compressed using GZip (.gz) compression. Compressed CSV files must still use either the wide or receipt format.
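One way to produce a GZip-compressed CSV is with Python's standard gzip module. This is a sketch, not an EmcienPatterns tool; the function name gzip_csv is made up for this example, and the output simply appends .gz to the original path.

```python
import gzip
import shutil

def gzip_csv(csv_path):
    """Write a GZip-compressed copy of csv_path next to it and return the new path."""
    gz_path = csv_path + ".gz"
    # Stream the bytes through so large files are not loaded into memory.
    with open(csv_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path
```

Compression does not change the contents, so the CSV inside must already be in a supported format before it is compressed.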
EmcienPatterns supports the following character encoding:
EmcienPatterns uses either Unix-style line endings (\n) or DOS line endings (\r\n).
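If a file mixes line endings or was saved with another style, it can be normalized before upload. The Python sketch below is one possible approach; the function name and the handling of lone \r characters (old Mac-style endings) are assumptions of this example, not documented EmcienPatterns behavior.

```python
def normalize_line_endings(data: bytes, style: str = "unix") -> bytes:
    """Return data with every line ending converted to \\n ("unix") or \\r\\n ("dos")."""
    # Collapse everything to \n first so mixed files come out uniform.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    if style == "dos":
        return unified.replace(b"\n", b"\r\n")
    return unified
```

Note that this works on raw bytes, so it also rewrites line breaks inside quoted CSV fields; check that this is acceptable for your data.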
Naming Your Data File
EmcienPatterns uses a file naming convention to identify the format used. The filename can be at most 200 characters long. Files should be named using the following structure:
|filename||Filenames should use only the following characters:
The following characters are not supported:
|file type||long or wide|
|extension||.csv or .csv.gz|
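The rules that are fully stated here (the 200-character limit and the .csv or .csv.gz extension) can be checked before upload with a small Python sketch like the one below. The function name is hypothetical, and the sketch assumes the 200-character limit applies to the whole filename including the extension. It does not check the allowed character set or where the file type appears in the name, because those details are not spelled out in this article.

```python
import os

MAX_NAME_LENGTH = 200                    # limit stated in this article
ALLOWED_EXTENSIONS = (".csv.gz", ".csv")

def check_data_filename(path):
    """Return a list of problems found; an empty list means the checks passed."""
    name = os.path.basename(path)
    problems = []
    # Assumption: the limit counts the full filename, extension included.
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name is {len(name)} characters; the limit is {MAX_NAME_LENGTH}")
    if not name.endswith(ALLOWED_EXTENSIONS):
        problems.append("extension must be .csv or .csv.gz")
    return problems
```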
File Name Examples
|Uncompressed sales data from area 2 in the long format|
|Uncompressed U.S. membership data in the wide format|
|Compressed sales data from area 3 in the long format|
|Compressed clinical test data in the wide format|