Long format is typically used for transaction data containing many variables and combinations. In long format, categories and their corresponding items are represented vertically and grouped and ordered based on a unique transaction ID.
An example of long format is displayed below.
In the above example, two transactions are shown. Each transaction is identified using a unique transaction_id :101001 and 101002. Transaction 101001 is associated with the purchase of 2 items. Transaction 101002 is associated with the purchase of 3 items. These transactions must be contiguous.
Long format does not support user-defined columns.
Using Long Format for Non-Transaction Data
In addition to transaction data, long format can be used for any data containing unique identifiers and repeating vertical rows.
An example of network data is displayed below.
In the above example, each network event is tracked by a unique identifier. This identifier is used to associate the event with different information in the item_id column.
Long Format Details
Headers are required for each column in long format. The minimum columns required are transaction_id and item_id. The acceptable columns are transaction_id, transaction_time, item_id, item_name, item_category, transaction_volume, transaction_price, item_volume, item_price.
The first column must be transaction_id.
While the data is allowed to contain any UTF-8 characters the header must be in lower ASCII.
Long format does not support user-defined columns. The required and optional columns are described in the table below.
Column Header | Required/Optional | Format | Description |
---|---|---|---|
transaction_id | Required | String |
A unique identifier for the transaction. This identifier can appear in multiple rows but should be different for each transaction. NOTE: Transactions must be contiguous. If you use the same transaction_id in two places in the file, then it will be interpreted as two transactions with the same ID. Rows sharing the same transaction_id must be adjacent. Field values must be wrapped in double quotes (“) and cannot be longer than 255 characters. All double quotes inside any string should be escaped with another double quote. For example: “String 3″ Roll” should be changed to “String 3”“ Roll” |
item_id | Required | String |
A unique identifier for the item. Typically, this column is used for SKU numbers and product numbers. Field values must be wrapped in double quotes (“) and cannot be longer than 255 characters. All double quotes inside any string should be escaped with another double quote. For example: “String 3″ Roll” should be changed to “String 3”“ Roll” |
transaction_date OR transaction_time OR transaction_time_unix |
Optional | Date/ Time |
The date and time when the transaction occurred. Field values can also be formatted using ISO or AUS standards. (See “Date and Time” section below) Field values can contain date only, or date and time. The column heading indicates the date and time format used. |
item_category | Optional | String |
A text identifier used to categorize the item. Typically, this column is used for product category. Field values must be wrapped in double quotes (“) and cannot be longer than 255 characters. All double quotes inside any string should be escaped with another double quote. For example: “String 3″ Roll” should be changed to “String 3”“ Roll” |
item_name | Optional | String |
A descriptive label of the item used in the item_id column. For example, an item_id of RTJ00345 has the associated item_name of Green Striped Lawn Chair. Field values must be wrapped in double quotes (“) and cannot be longer than 255 characters. All double quotes inside any string should be escaped with another double quote. For example: “String 3″ Roll” should be changed to “String 3”“ Roll” |
item_volume | Optional | Float |
The volume of item associated with the transaction. Typically, this volume is the quantity of a specific item purchased. Field values must be in the standard float format: a string of digits with a decimal point optionally among them. Precision beyond 4 decimal places is not used. If the field value is NULL or blank, a default value of 1 is used. Negative values and 0's are not allowed. |
item_price | Optional | Price |
The individual price of the item associated with the transaction. Typically, this price is the manufacturer's suggested retail price (MSRP) or other undiscounted price. Field values must be in dollars, and may begin with an optional $ sign. NULL or blank values are supported. Negative prices are not supported. |
transaction_volume | Optional | Integer |
The total volume of the entire transaction. Note: This column should not be confused with the item_volume column, which is used for the volume associated with specific items. Field values must be non-negative values consisting only of digits. If the field value is NULL or blank, a default value of 1 is used. |
transaction_price | Optional | Price |
The total price of the entire transaction. Note: This column should not be confused with the item_price column, which is used for the price associated with specific items. Field values must be in dollars, and may begin with an optional $ sign. NULL or blank values are supported. Negative prices are not supported. |
Date and Time
When using the transaction_date column, dates should be recorded using in the MM-DD-YYYY or MM/DD/YYYY format. Month and day values can be padded with zeros.
To use the ISO data format (YYYY-MM-DD or YYYY/MM/DD), add _ISO or -ISO to the end of the transaction_date column header.
To use the AUS data format (D-M-YY or D/M/YY), add _AUS or -AUS to the end of the transaction_date column header.
When using the transaction_time, or transaction_time_unix column, dates should be recorded using the ISO-8601 or UNIX time formats.
The ISO-8601 format is YYYY-MM-DDTHH:MM:SSZ
T and the Z are literal characters and can be upper or lower case.
The UNIX time format is the number of seconds since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970.
Naming Data File
EmcienPatterns uses a file naming convention to identify the format used. The filename is limited to 200 characters or less. Files should be named using the following structure:
<filename>.<file type>.<extension>
filename | Filenames should use only the following characters:
The following characters are not supported:
|
file type | long |
extension |
.csv or .csv.gz |
File Name Examples
area2-sales.long.csv |
Uncompressed sales data from area 2 in the receipt format |
clinical.all.long.csv.gz |
Compressed clinical test data in the long format |