Introduction
Overview
IterableTables defines a generic interface for tabular data.
The package currently supports the following data sources: DataFrames, DataStreams (including CSV, Feather, SQLite, ODBC), DataTables, IndexedTables, TimeSeries, TypedTables, DifferentialEquations (any DESolution) and any iterator that produces elements of type NamedTuple.
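As a rough illustration of the last item in that list, the sketch below builds a plain collection whose elements are named tuples. It assumes a Julia version with built-in named tuple syntax (0.7 or later); on older versions named tuples came from a separate package. The variable names are made up for this example.

```julia
# Hypothetical sample data for illustration only
people = ["John", "Sally", "Jim"]
ages = [34.0, 25.0, 67.0]

# A Vector is an iterator; here every element is a NamedTuple with
# the fields Name and Age, so each element describes one table row
rows = [(Name = n, Age = a) for (n, a) in zip(people, ages)]

# Inspect the first row by field name
first_row = first(rows)
println(first_row.Name, " is ", first_row.Age)
```

Going by the description above, a collection like this should be usable as a data source. Passing it to a sink constructor is left out here because that step depends on which package versions are installed.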
The following data sinks are currently supported: DataFrames (including things like ModelFrame etc.), DataStreams (including CSV, Feather), DataTables, IndexedTables, TimeSeries, TypedTables, StatsModels, Gadfly (currently not working) and VegaLite.
The package is tightly integrated with Query.jl: any query that creates a named tuple in the last @select statement (and doesn't @collect the results into a data structure) is automatically an iterable table data source, and any of the data sources mentioned above can be queried using Query.jl.
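The Query.jl integration can be sketched as follows. This is a minimal example, assuming DataFrames, Query and IterableTables are all installed in mutually compatible versions. The sample data and the names q and older are chosen for illustration only.

```julia
using DataFrames, Query, IterableTables

# Sample data for illustration
df = DataFrame(Name=["John", "Sally", "Jim"], Age=[34.,25.,67.], Children=[2,0,3])

# The final @select creates a named tuple via the {} syntax, and there is
# no @collect, so q is itself an iterable table data source
q = @from i in df begin
    @where i.Age > 30
    @select {i.Name, i.Children}
end

# Materialize the query result by passing it to a sink constructor
older = DataFrame(q)
```

Here the query keeps only the rows with Age above 30 and the two selected columns. Any other supported sink constructor could stand in for DataFrame in the last step.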
Installation
This package only works on Julia 0.5 and newer. You can add it with:
Pkg.add("IterableTables")
Getting started
IterableTables makes it easy to convert between different table types in Julia. It also makes it possible to use any table type in situations where packages traditionally expected a DataFrame.
For example, if you have a DataFrame
using DataFrames
df = DataFrame(Name=["John", "Sally", "Jim"], Age=[34.,25.,67.], Children=[2,0,3])
you can easily convert this into any of the supported data sink types by simply constructing a new table type and passing your source df:
using DataTables, TypedTables, IndexedTables
# Convert to a DataTable
dt = DataTable(df)
# Convert to a TypedTable
tt = Table(df)
These conversions work in pretty much any direction. For example, you can convert a TypedTable into a DataFrame:
new_df = DataFrame(tt)
Or you can convert it to a DataTable:
new_dt = DataTable(tt)
The general rule is that you can convert any source into any sink.
IterableTables also adds methods to a number of packages that have traditionally only worked with DataFrames, so that these packages work with any data source type defined in IterableTables.
For example, you can run a regression on any of the source types:
using GLM, DataFrames
# Run a regression on a TypedTable
lm(@formula(Children~Age),tt)
# Run a regression on a DataTable
lm(@formula(Children~Age),dt)
Or you can plot any of these data sources with VegaLite:
using VegaLite
# Plot a TypedTable
tt |> @vlplot(:point, x=:Age, y=:Children)
# Plot a DataTable
dt |> @vlplot(:point, x=:Age, y=:Children)
Again, this will work with any of the data sources listed above.