- Exporting database tables to compressed CSV files.
- Transferring tables from one database server to another.
- Loading database data (from both files and Python).
- Creating/managing PostgreSQL/TimescaleDB tables, views, materialized views, functions, procedures, continuous aggregates, and scheduled tasks.
- Checking for mismatched attributes between SQLAlchemy tables/models and actual tables in a database.
Currently, only PostgreSQL and PostgreSQL-based databases (e.g. TimescaleDB) are supported.
pip install dbflows
If using the export functionality (exporting database tables to compressed CSV files), you will additionally need to have the `psql` executable available. To install `psql`:
# enable PostgreSQL package repository
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget -qO- https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo tee /etc/apt/trusted.gpg.d/pgdg.asc &>/dev/null
# replace `15` with the major version of your database
sudo apt update && sudo apt install -y postgresql-client-15
Features:
- File splitting. Create separate export files based on a 'slice column' (an orderable column, e.g. datetime or integer) and/or a 'partition column' (a categorical column, e.g. a name string).
- Incremental exports (export only data that has not yet been exported). This works for both single-file and multiple/split-file output.
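For example, to export an entire table to one or more save locations: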
from dbflows import export_table
import sqlalchemy as sa
# the table to export data from
my_table = sa.Table(
"my_table",
sa.MetaData(schema="my_schema"),
sa.Column("inserted", sa.DateTime),
sa.Column("category", sa.String),
sa.Column("value", sa.Float),
)
# one or more save locations (2 in this case)
save_locs = ["s3://my-bucket/my_table_exports", "/path/to/local_dir/my_table_exports"]
# database URL
url = "postgres://user:password@hostname:port/database-name"
export_table(
table=my_table,
    engine=url,  # a database URL string or sa.Engine instance
save_locs=save_locs
)
CLI equivalent:
db export table \
my_table.my_schema \
postgresql://user:password@hostname:port/database-name \
s3://my-bucket/my_table_exports \
/path/to/local_dir/my_table_exports
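To split the export into multiple files, pass an orderable `slice_column` and a maximum size per file: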
export_table(
table=my_table,
    engine=url,  # a database URL string or sa.Engine instance
save_locs=save_locs,
slice_column=my_table.c.inserted,
file_max_size="500 MB"
)
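To write a separate file for each distinct value of a categorical column, pass a `partition_column`: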
export_table(
table=my_table,
    engine=url,  # a database URL string or sa.Engine instance
save_locs=save_locs,
partition_column=my_table.c.category
)
CLI equivalent:
# save to one or more locations (s3 paths or local)
db export table \
my_table.my_schema \
postgresql://user:password@hostname:port/database-name \
s3://my-bucket/my_table_exports \
/path/to/local_dir/my_table_exports \
--partition-column category # or "-p category"
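Slice and partition columns can be combined: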
export_table(
table=my_table,
    engine=url,  # a database URL string or sa.Engine instance
save_locs=save_locs,
slice_column=my_table.c.inserted,
file_max_size="500 MB",
partition_column=my_table.c.category,
)
Create a `PgLoader` instance for your table and use its `load` method to load batches of rows.
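A minimal sketch of what that can look like, reusing `my_table` and `url` from the export examples above; the constructor arguments and row format shown here are assumptions, so check the `PgLoader` docstrings for the exact signatures:

```python
from datetime import datetime

from dbflows import PgLoader

# construct a loader for the table; the `table` and `engine` argument
# names are assumptions, not the confirmed signature
loader = PgLoader(table=my_table, engine=url)

# load a batch of rows (assumed here to be dicts keyed by column name)
loader.load(
    [
        {"inserted": datetime(2024, 1, 1), "category": "a", "value": 1.0},
        {"inserted": datetime(2024, 1, 2), "category": "b", "value": 2.5},
    ]
)
```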
Use `import_csvs` to load CSV files with parallel worker threads. This internally uses `timescaledb-parallel-copy`, which can be installed with:

go install github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy@latest
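A call might look like the sketch below; every parameter name here (`engine`, `table`, `csv_dir`, `workers`) is an assumption for illustration, so consult the `import_csvs` signature for the actual arguments:

```python
from dbflows import import_csvs

# load all CSV files from a directory using parallel worker threads
# (all argument names below are assumptions -- check the real signature)
import_csvs(
    engine=url,                   # database URL from the examples above
    table=my_table,               # target table
    csv_dir="/path/to/csv_dir",   # directory containing the CSV files
    workers=4,                    # number of parallel copy workers
)
```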