User Guide
This guide describes how to use IterableTables as a Julia user.
Overview
Any type that supports the iterable tables interface does so once the IterableTables package is loaded; nothing else needs to be done.
Converting an iterable table into a destination type sometimes requires following conventions specific to that destination type, but in general these conversions follow a simple pattern. The following sections describe how to convert an iterable table into, and use it with, various packages.
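For orientation, here is a minimal sketch of one of the simplest iterable tables. It assumes the IterableTables definition of an iterable table as something that yields one NamedTuple per row when iterated. The names ds, name and value are made up for illustration, and the code uses only Base Julia.

```julia
# A vector of NamedTuples: each element is one row, and every row has the
# same field names (columns) and field types.
ds = [(name = "a", value = 1.0),
      (name = "b", value = 2.5)]

# The element type alone tells a consumer the column names and types.
T = eltype(ds)
println(fieldnames(T))   # (:name, :value)
println(fieldtypes(T))   # (String, Float64)

# Consumers read the data row by row through ordinary iteration.
for row in ds
    println(row.name, " => ", row.value)
end
```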
DataFrames, DataTables, TypedTables
For all three packages, one can simply pass an iterable table to the type's constructor to create a new instance that holds a copy of the data stored in the iterable table. For example, assuming the data source is called ds, one can use the following code:
using IterableTables, DataFrames, DataTables, TypedTables
# Construct a DataFrame
df = DataFrame(ds)
# Construct a DataTable
dt = DataTable(ds)
# Construct a TypedTable
tt = Table(ds)
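Conceptually, each of these constructors iterates over the rows of ds once and copies every field into a column. The sketch below illustrates that row-to-column copy with a hypothetical helper, to_columns, which is not part of IterableTables, DataFrames, DataTables or TypedTables; it only shows the idea, not how those packages are implemented.

```julia
# Hypothetical helper, not part of IterableTables or any table package: copy
# the rows of an iterable table into one freshly allocated vector per column.
function to_columns(ds)
    T = eltype(ds)                              # NamedTuple type describing one row
    colnames = fieldnames(T)
    coltypes = fieldtypes(T)
    cols = Any[Vector{S}() for S in coltypes]   # one empty, typed vector per column
    for row in ds
        for (i, v) in enumerate(values(row))
            push!(cols[i], v)                   # copy field i of this row into column i
        end
    end
    return NamedTuple{colnames}(Tuple(cols))
end

ds = [(a = 1, b = 2.0), (a = 2, b = 4.0)]
cols = to_columns(ds)
println(cols.a)   # [1, 2]
println(cols.b)   # [2.0, 4.0]
```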
TimeSeries
To construct a TimeArray instance, one needs a source that follows two rules: 1) it must have a column whose type is a TimeType (for example Date or DateTime), and 2) all other columns must be of one type. With such a source, and assuming that ds is an iterable table, one can use the following code to create a TimeArray:
using IterableTables, TimeSeries
ta = TimeArray(ds, timestamp_column=:name_of_timestamp_column)
If the column with the timestamp information is named timestamp in the source, one can use a single-argument constructor call:
ta = TimeArray(ds)
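The two rules above can be checked on the element type of the source before calling the constructor. This is a minimal sketch using a hypothetical helper, valid_timearray_source, which is not part of TimeSeries or IterableTables; the column names timestamp, open and close are invented for illustration.

```julia
using Dates

# Hypothetical helper (not part of TimeSeries or IterableTables): check the two
# rules a source must satisfy before it can be turned into a TimeArray.
function valid_timearray_source(ds; timestamp_column::Symbol = :timestamp)
    T = eltype(ds)
    colnames = fieldnames(T)
    coltypes = fieldtypes(T)
    idx = findfirst(==(timestamp_column), colnames)
    # Rule 1: the timestamp column exists and its type is a TimeType (Date, DateTime, ...).
    idx === nothing && return false
    coltypes[idx] <: TimeType || return false
    # Rule 2: all remaining columns share one and the same type.
    others = [coltypes[i] for i in eachindex(coltypes) if i != idx]
    return length(unique(others)) <= 1
end

ds = [(timestamp = Date(2017, 1, 1), open = 1.0, close = 1.5),
      (timestamp = Date(2017, 1, 2), open = 1.5, close = 1.2)]
println(valid_timearray_source(ds))                            # true
println(valid_timearray_source(ds, timestamp_column = :open))  # false
```

The check only looks at column types, so it runs without touching the data itself.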
Temporal
To construct a TS instance, one needs a source that follows two rules: 1) it must have a column whose type is a TimeType (for example Date or DateTime), and 2) all other columns must be of one type. With such a source, and assuming that ds is an iterable table, one can use the following code to create a TS:
using IterableTables, Temporal
ts = TS(ds, timestamp_column=:name_of_timestamp_column)
If the column with the timestamp information is named Index in the source, one can use a single-argument constructor call:
ts = TS(ds)
IndexedTables
The simplest way to construct an IndexedTable is to call the one-argument constructor on an iterable table ds:
using IterableTables, IndexedTables
it = IndexedTable(ds)
In this case the last column in the source will be the data column in the IndexedTable, and all other columns will be index columns.
One can select the index and data columns manually with the keyword arguments idxcols and datacols. Both take a vector of Symbols as arguments. For example, to make the time and region columns of a data source the index columns, one would use the following command:
it = IndexedTable(ds, idxcols=[:time, :region])
In this case all remaining columns will be turned into data columns. If one specifies only the datacols argument, one will create an IndexedTable in which all columns not listed in datacols are turned into index columns. Finally, one can also specify both the idxcols and datacols arguments at the same time, and thus even drop columns by not listing them in either argument.
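The four cases above can be summarized as a small decision procedure. The sketch below uses a hypothetical helper, split_columns, that works on a plain list of column names; it is not part of IndexedTables or IterableTables, and the column names are invented. It only restates the rules described in this section.

```julia
# Hypothetical helper (not part of IndexedTables or IterableTables): apply the
# column-selection rules to a list of column names and return
# (index columns, data columns).
function split_columns(colnames::Vector{Symbol};
                       idxcols::Union{Nothing,Vector{Symbol}} = nothing,
                       datacols::Union{Nothing,Vector{Symbol}} = nothing)
    if idxcols === nothing && datacols === nothing
        # Neither keyword: the last column is the data column, all others are index columns.
        return colnames[1:end-1], colnames[end:end]
    elseif datacols === nothing
        # Only idxcols: all remaining columns become data columns.
        return idxcols, setdiff(colnames, idxcols)
    elseif idxcols === nothing
        # Only datacols: all remaining columns become index columns.
        return setdiff(colnames, datacols), datacols
    else
        # Both: columns listed in neither argument are dropped.
        return idxcols, datacols
    end
end

source_cols = [:time, :region, :sales, :profit]
println(split_columns(source_cols))                                     # ([:time, :region, :sales], [:profit])
println(split_columns(source_cols, idxcols = [:time, :region]))         # ([:time, :region], [:sales, :profit])
println(split_columns(source_cols, datacols = [:sales]))                # ([:time, :region, :profit], [:sales])
println(split_columns(source_cols, idxcols = [:time], datacols = [:sales]))  # ([:time], [:sales])
```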
JuliaDB
The simplest way to load any iterable table ds into JuliaDB is to call the distribute function:
using IterableTables, JuliaDB
jdb = distribute(ds)
In addition to the arguments that distribute accepts in its normal JuliaDB definition, it also accepts the keyword arguments idxcols and datacols, which have the same meaning as in the IndexedTable case.
DataStreams (CSV, Feather)
Writing an iterable table to a CSV or Feather file is slightly more involved. In particular, one must first call the function IterableTables.get_datastreams_source to create a DataStreams source instance, which can then be passed to either the CSV.write or the Feather.write function.
To write an iterable table ds to a CSV file, one would therefore use the following code:
using IterableTables, CSV
CSV.write("filename.csv", IterableTables.get_datastreams_source(ds))
And to write an iterable table to a Feather file, one would use the following code:
using IterableTables, Feather
Feather.write("filename.feather", IterableTables.get_datastreams_source(ds))
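For comparison, the sketch below writes an iterable table to CSV without DataStreams, using only Base Julia. The function write_simple_csv is hypothetical and is not part of CSV.jl, Feather.jl or IterableTables. It performs no quoting or escaping, so it is only meant to show that such a sink needs nothing more than the column names from the element type and one pass over the rows.

```julia
# Hypothetical helper (not the CSV.jl, Feather.jl or DataStreams API): write an
# iterable table as CSV. No quoting or escaping is done, so values must not
# contain commas, quotes or newlines.
function write_simple_csv(io::IO, ds)
    colnames = fieldnames(eltype(ds))
    println(io, join(colnames, ","))          # header row from the column names
    for row in ds
        println(io, join(values(row), ","))   # one line per row
    end
end

ds = [(a = 1, b = 2.0), (a = 2, b = 4.0)]
write_simple_csv(stdout, ds)
# a,b
# 1,2.0
# 2,4.0
```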
VegaLite
VegaLite can plot any iterable table. Here is a simple example that draws a line chart, assuming ds has columns named a and b:
using IterableTables, VegaLite
ds |> @vlplot(:line, x=:a, y=:b)
StatsModels (and statistical models in DataFrames)
For statistical models, one can use an iterable table instead of a DataFrame. Under the hood this is achieved by providing a constructor for ModelFrame that takes an iterable table, and by providing methods for the fit function that accept an iterable table instead of a DataFrame. For most users this means that one can, for example, simply pass an iterable table to the lm and glm functions in the GLM package (assuming ds is any iterable table with columns X and Y):
using IterableTables, GLM
OLS = glm(@formula(Y ~ X), ds, Normal(), IdentityLink())
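For a Normal family with an IdentityLink, the coefficient estimates of such a model coincide with ordinary least squares. To make concrete what is being estimated, here is a sketch that fits Y ~ X by least squares using only Julia's standard library. It is a swapped-in technique, not how GLM.jl is implemented, and the data values are invented for illustration.

```julia
using LinearAlgebra

# Invented example data: an iterable table with columns X and Y.
ds = [(X = 1.0, Y = 3.1), (X = 2.0, Y = 4.9), (X = 3.0, Y = 7.2), (X = 4.0, Y = 8.8)]

# Collect the predictor and response columns by iterating over the rows.
x = [row.X for row in ds]
y = [row.Y for row in ds]

# Design matrix with an intercept column, then solve the least-squares problem.
A = hcat(ones(length(x)), x)
β = A \ y        # β[1] is the intercept, β[2] the slope for X
println(β)       # approximately [1.15, 1.94]
```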
CSVFiles, FeatherFiles, ExcelFiles and StatFiles
See the READMEs of CSVFiles.jl, ExcelFiles.jl, FeatherFiles.jl and StatFiles.jl for documentation.