The Tagged Format

This article covers how to format and save your data in the tagged format before uploading to Emcien.

The tagged format has no header row and is well suited to unstructured data. It is most commonly used for analyses that include free-form text.

Formatting

Each line of a tagged-format file begins with two required columns (either of which may be empty), followed by the items for that line. The file has no header row. A data file may contain at most 1000 categories, and each line may contain at most 1000 items. An example of the tagged format is below:

Some important things to note about the tagged format:

  • The first column is the date, or the date and time. The column is required, but it may be left empty. Details on the date format are below.
  • The second column is the transaction ID. Like the date, this column is required but may be left blank. The transaction ID can be any value that uniquely identifies a transaction.
  • An item may be represented as “Category::item” or as “item”.
  • Any cell that contains a comma must be surrounded by double quotes.
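
The rules above can be sketched in a few lines of Python. This is only an illustration, not an Emcien tool: the function name write_tagged, the sample rows, and the assumption that columns are separated by commas (suggested by the quoting rule, but not stated outright) are all hypothetical. The standard csv module adds double quotes around any cell that contains a comma.

```python
import csv
import io

MAX_ITEMS_PER_LINE = 1000  # limit stated in the Formatting section


def write_tagged(rows, out):
    """Write (date, transaction_id, items) tuples as tagged-format lines.

    Assumes a comma-delimited file with no header row (hypothetical helper).
    """
    # QUOTE_MINIMAL quotes only the cells that need it, e.g. cells with commas.
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for date, transaction_id, items in rows:
        if len(items) > MAX_ITEMS_PER_LINE:
            raise ValueError("a line may hold at most 1000 items")
        # The first two columns are always written, even when they are empty.
        writer.writerow([date or "", transaction_id or "", *items])


# Made-up sample data: one dated line, one line with both leading columns empty.
rows = [
    ("2012-07-15", "T1001", ["Color::red", "Size::large", "gift wrap"]),
    ("", "", ["Comment::arrived late, box damaged"]),
]

buf = io.StringIO()
write_tagged(rows, buf)
print(buf.getvalue(), end="")
# 2012-07-15,T1001,Color::red,Size::large,gift wrap
# ,,"Comment::arrived late, box damaged"
```

The second output line shows both required columns left empty and a comma-containing cell wrapped in double quotes.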

Date Format Details

The date column accepts four formats, but every line in a data file must use the same one. The formats and an example of each are listed below.

Format                 Example               Notes
YYYY-MM-DD             2012-07-15
YYYY-MM-DDTHH:MM:SS    2012-07-15T02:23:44   T, t, or a space must separate the date and time; Z, z, or nothing may follow the time
unixtime               1431209618            1-10 digit number; the assumed time zone is UTC+0
none                                         No value is required
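
Before uploading, it can help to check that every date value in a file uses the same format. The sketch below is one possible way to do that in Python; the names classify_date and is_consistent are hypothetical and not part of any Emcien tool. It checks only the shape of each value (it does not reject impossible dates such as month 13), and it treats an empty value as its own format, which is an assumption, since the rules here do not say whether blank and dated lines may be mixed.

```python
import re

# Shape-only patterns for the accepted date formats (illustrative, not official).
DATE_ONLY = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
DATE_TIME = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}[Tt ][0-9]{2}:[0-9]{2}:[0-9]{2}[Zz]?"
)
UNIXTIME = re.compile(r"[0-9]{1,10}")


def classify_date(value):
    """Return 'none', 'date', 'datetime', 'unixtime', or None if unrecognized."""
    if value == "":
        return "none"
    if DATE_ONLY.fullmatch(value):
        return "date"
    if DATE_TIME.fullmatch(value):
        return "datetime"
    if UNIXTIME.fullmatch(value):
        return "unixtime"
    return None


def is_consistent(values):
    """True if every value is recognized and all values share one format."""
    kinds = {classify_date(v) for v in values}
    return len(kinds) == 1 and None not in kinds


print(classify_date("2012-07-15T02:23:44"))
print(is_consistent(["2012-07-15", "1431209618"]))
```

Here the first call classifies the value as a date-and-time string, and the second reports an inconsistency because one value is a calendar date and the other is a unixtime.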

Not sure which format is best for you? The Emcien team can help prepare your data. Contact us at [email protected].