Basic Examples
Here is a simple dataset:
Here is a long form conversion with specified identifier column and variable columns:
Here is another long form derived with automatically determined identifier column and variable columns:
Here is a conversion to long form with identifier columns "a" and "b":
Scope
The first argument can be a matrix:
The first argument can be a list of associations:
The second and third arguments can be Automatic:
The options "IdentifierColumns" and "VariableColumns" can be used instead of second argument and third argument respectively:
The second and third arguments can be column indexes:
Automatic keys are derived if the dataset argument has row keys. Here is dataset with row keys:
Here is automatic conversion to long form:
The row keys are ignored if the identifier column is specified:
Options
AutomaticKeysTo
The option "AutomaticKeysTo" can be used to specify the name of the column that corresponds to the automatically determined identifier:
IdentifierColumns
It might be more convenient to specify the identifier columns with an option setting:
VariableColumns
It might be more convenient to specify the variable columns with an option setting:
VariablesTo
The option "VariablesTo" specifies the name of long form's column that has as values the names of the variable columns:
ValuesTo
The option "ValuesTo" specifies the name of long form's column that has as values the values in the variable columns:
Applications
Column names as data
The main advantage of the long form conversion is that variable column names become data. (I.e. values in a certain column.)
This advantage can be demonstrated by making an association of time series objects for the rows of multiple time series data given in wide form. (I.e. a table in which most or all of the column names correspond to time axis values.)
Here is a randomly generated table (dataset) with multiple time series data:
Each row of that random time series dataset corresponds to a yearly sequence of monthly values.
The interpretation of the dataset reveals heterogeneous semantics of its values: (1) each row corresponds to a year which is specified with row’s key, and (2) the columns have short month column names.
Convert the dataset into long form using the row keys as identifiers:
Add the column “ObservationTime” derived from the columns “Year” and “Month”:
Split the long form dataset by year and make time series with the columns "ObservationTime" and "Value":
Plot the obtained time series:
Combinations of the heterogenous data
Using long forms makes easier the programmatic manipulation of heterogenous data.
Get datasets with different number of rows and columns that correspond to items and variables of different kinds:
Convert all datasets into long form datasets:
For each long form dataset change the automatic key column to include dataset's name:
Join all long form datasets into one dataset and show a sample:
Properties and Relations
Sparse matrices (sparse arrays) have representation very similar to that of long form. Here is a random sparse matrix:
Compare the array rules of that sparse matrix with its long form representation.
Here are the array rules:
Here is the long form representation:
Note that since using Normal on the sparse matrix makes a dense matrix with zeroes, those zeroes are filtered out from the displayed long form.
Cross tabulation can be seen under certain conditions as conversion to wide form. Therefore, certain long form conversions can be "reverted" into the original forms using cross tabulation.
Here is an example using the resource function CrossTabulate over a subset of the Lake Mead elevation levels data:
Possible Issues
If the first argument is a dataset without column names the second argument and third argument are expected to consist of column indexes.
Here is a correct specification:
Here is a couple of incorrect specifications:
If the identifier columns produce multiple corresponding rows for a given combination of identifier values then only the last row is put in the long form result.
For this dataset:
the row <|"ID"→1, "a"→ x, "b" → 5|> is "lost":