ameingast / postgresql-simple-migration Goto Github PK

View Code? Open in Web Editor NEW

85.0 6.0 48.0 82 KB

PostgreSQL Schema Migrations for Haskell

License: Other

Haskell 100.00%

postgresql-simple-migration's Introduction

PostgreSQL Migrations for Haskell

Welcome to postgresql-simple-migrations, a tool for helping you with PostgreSQL schema migrations.

This project is an open-source database migration tool. It favors simplicity over configuration.

It is implemented in Haskell and uses the (excellent) postgresql-simple library to communicate with PostgreSQL.

It comes in two flavors: a library that features an easy to use Haskell API and as a standalone application.

Database migrations can be written in SQL (in this case PostgreSQL-sql) or in Haskell.

Why?

Database migrations should not be hard. They should be under version control and documented both in your production systems and in your project files.

What?

This library executes SQL/Haskell migration scripts and keeps track of their meta information.

Scripts are be executed exactly once and any changes to scripts will cause a run-time error notifying you of a corrupted database.

The meta information consists of:

an MD5 checksum of the executed script to make sure already existing scripts cannot be modified in your production system.
a time-stamp of the date of execution so you can easily track when a change happened.

This library also supports migration validation so you can ensure (some) correctness before your application logic kicks in.

How?

This utility can be used in two ways: embedded in your Haskell program or as a standalone binary.

Standalone

The standalone program supports file-based migrations. To execute all SQL-files in a directory $BASE_DIR, execute the following command to initialize the database in a first step.

CON="host=$host dbname=$db user=$user password=$pw"
./dist/build/migrate/migrate init $CON
./dist/build/migrate/migrate migrate $CON $BASE_DIR

To validate already executed scripts, execute the following:

CON="host=$host dbname=$db user=$user password=$pw"
./dist/build/migrate/migrate init $CON
./dist/build/migrate/migrate validate $CON $BASE_DIR

For more information about the PostgreSQL connection string, see: libpq-connect.

Library

The library supports more actions than the standalone program.

Initializing the database:

main :: IO ()
main = do
    let url = "host=$host dbname=$db user=$user password=$pw"
    con <- connectPostgreSQL (BS8.pack url)
    withTransaction con $ runMigration $
        MigrationContext MigrationInitialization True con

For file-based migrations, the following snippet can be used:

main :: IO ()
main = do
    let url = "host=$host dbname=$db user=$user password=$pw"
    let dir = "."
    con <- connectPostgreSQL (BS8.pack url)
    withTransaction con $ runMigration $
        MigrationContext (MigrationDirectory dir) True con

To run Haskell-based migrations, use this:

main :: IO ()
main = do
    let url = "host=$host dbname=$db user=$user password=$pw"
    let name = "my script"
    let script = "create table users (email varchar not null)";
    con <- connectPostgreSQL (BS8.pack url)
    withTransaction con $ runMigration $
        MigrationContext (MigrationScript name script) True con

Validations wrap MigrationCommands. This means that you can re-use all MigrationCommands to perform a read-only validation of your migrations.

To perform a validation on a directory-based migration, you can use the following code:

main :: IO ()
main = do
    let url = "host=$host dbname=$db user=$user password=$pw"
    con <- connectPostgreSQL (BS8.pack url)
    withTransaction con $ runMigration $ MigrationContext
        (MigrationValidation (MigrationDirectory dir)) True con

Database migrations should always be performed in a transactional context.

The standalone binary takes care of proper transaction handling automatically.

The library does not make any assumptions about the current transactional state of the system. This means that the caller of the library has to take care of opening/closing/rolling-back transactions. This way you can execute multiple migration-commands or validations in sequence while still staying in the transaction you opened.

The tests make use of this. After executing all migration-tests, the transaction is rolled back.

Compilation and Tests

The program is built with the cabal build system. The following command builds the library, the standalone binary and the test package.

cabal configure --enable-tests && cabal build -j

To execute the tests, you need a running PostgreSQL server with an empty database called test. Tests are executed through cabal as follows:

cabal configure --enable-tests && cabal test

To build the project in a cabal sandbox, use the following code:

cabal sandbox init
cabal install -j --only-dependencies --enable-tests --disable-documentation
cabal configure --enable-tests
cabal test

To remove the generated cabal sandbox, use:

cabal sandbox delete

To Do

Collect executed scripts and check if already executed scripts have been deleted.

postgresql-simple-migration's People

Contributors

Stargazers

Watchers

postgresql-simple-migration's Issues

Convert tests to use Github actions instead of travis

https://mail.haskell.org/pipermail/haskell-cafe/2020-November/132948.html

Travis is no longer good for open source projects, would you be willing to ok with moving tests to Github actions instead?

Support time 1.8

commercialhaskell/stackage#2678

Raise upper bounds on base64-bytestring dependency

It looks like this change is OK:

From 079c5300001a0c3b16855b4d84f510eba0e7cc4e Mon Sep 17 00:00:00 2001
From: Magnus Therning <[email protected]>
Date: Tue, 5 Jan 2021 16:29:21 +0100
Subject: [PATCH] Allow newer base64-bytestring versions

---
 postgresql-simple-migration.cabal | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/postgresql-simple-migration.cabal b/postgresql-simple-migration.cabal
index f56dafc..49c0a4e 100644
--- a/postgresql-simple-migration.cabal
+++ b/postgresql-simple-migration.cabal
@@ -38,7 +38,7 @@ Library
     default-extensions:     OverloadedStrings, CPP, LambdaCase
     default-language:       Haskell2010
     build-depends:          base                        >= 4.6      && < 5.0,
-                            base64-bytestring           >= 1.0      && < 1.1,
+                            base64-bytestring           >= 1.0      && < 1.2,
                             bytestring                  >= 0.10     && < 0.11,
                             cryptohash                  >= 0.11     && < 0.12,
                             directory                   >= 1.2      && < 1.4,
@@ -52,7 +52,7 @@ Executable migrate
     default-extensions:     OverloadedStrings, CPP, LambdaCase
     default-language:       Haskell2010
     build-depends:          base                        >= 4.6      && < 5.0,
-                            base64-bytestring           >= 1.0      && < 1.1,
+                            base64-bytestring           >= 1.0      && < 1.2,
                             bytestring                  >= 0.10     && < 0.11,
                             cryptohash                  >= 0.11     && < 0.12,
                             directory                   >= 1.2      && < 1.4,
-- 
2.30.0

I believe it could also be done via Hackage directly and thus not requiring a new release.

Support for running Haskell in migrations

First, let me thank you for this library. It looks good and solves most of my issues regarding sql migrations with postgres. Again thank you for sharing this code with the ecosystem.

In the readme you state that postgresql-simple-migration can run Haskell scripts. I'm fairly new to Haskell but reading the documentation it seems that I can only run SQL scripts. Is that right or is it something I have missed? I want to be able to run haskell code to verify certain migrations went as expected and not only execute SQL scripts.

Replace verbose context field with function

Would you be open to these two changes:

Use MonadIO instead of IO, and
Replace the verbose context field with a function (String -> m ())

These changes would allow integration with existing logging frameworks so migration messages don't go to stdout when logs are going somewhere else.

If these changes sound acceptible I'd be happy to supply a PR.

Migrations should be executed within a transaction

User can forget to use begin and commit in migration scripts. In case of error database will come to inconsistent state. To prevent this execute_ call should be covered by withTransaction.

More verbosity would help track script issues

Currently, errors on running script will get you a pretty printed sql exception, which is nice. However, the particular query that failed is not printed, nor the script that was run. I'd be more than happy to write a PR to display these in case of error, if that's ok with you.

Finer-grained transaction control

The CLI tool wraps the whole migration sequence in a single transaction. When using the library, it's up to the user to decide whether or not to wrap the sequence in a transaction. So it's really an all-or-nothing choice: a single transaction, or none at all.

Some operations (like adding a variant to an enum type, for instance) require a commit before being usable, which prevent them from being present in a composite migration sequence (unless the whole sequence is run outside of a transaction).

From the CLI tool standpoint, I think having a simple way to choose the transaction strategy (no transaction, single transaction, per-file transaction) would be a great improvement.

From the lib standpoint this would imply having a way to optionally wrap each migration in a transaction.

Question: Why not include timezone in executed_at?

maintained?

is this package being maintained? the version bounds refer to an old version of postgresql-simple.

Can MigrationCommand be a Semigroup?

Would it make sense to add another constructor to MigrationCommand. This constructor would be MigrationCommands :: [MigrationCommand] -> MigrationCommand. This would represent a sequence of commands to run one after another.

question: should the MigrationDirectory read only .sql files?

Regarding https://github.com/ameingast/postgresql-simple-migration/blob/master/src/Database/PostgreSQL/Simple/Migration.hs#L95-L98 should this function read only .sql files rather than reading all the files?

The reason why I am asking this is, in one occasion it was reading the original file and backup/autosave(e.g. #foo.sql#, foo.sql~) files and was trying to run the same migration twice exiting with an error.

Now coming to think about it, in general whenever a program is trying to read from some specific files which has "configuration" stuff, there is a tendency to add some sort of restrictions. e.g read from a specific files, specific format etc. Maybe here we can put a restriction on file extension so that, unnecessary errors don't creep in. Since migration expects raw sql, using .sql extension could be a good idea.

Add to Stackage?

Of all the migration packages out there, this one fits my needs the best. Besides my question about timezones, the only other downside is that this package isn't on Stackage. Would you be interested in adding it? I volunteer to be the stackage liaison (i.e. the listed maintainer) if that would make your life any simpler.

Thanks!

`migrate` doesn't see `schema_migrations`

So migrate init works. I checked in psql and the table is there but I get back a complaint that the relation doesn't exist.

>> ./migrate init "$CON"

Initializing schema
NOTICE:  relation "schema_migrations" already exists, skipping

>> psql
bstearns=# \c mydb
mydb=# \d schema_migrations 
                      Table "public.schema_migrations"
   Column    |            Type             | Collation | Nullable | Default 
-------------+-----------------------------+-----------+----------+---------
 filename    | character varying(512)      |           | not null | 
 checksum    | character varying(32)       |           | not null | 
 executed_at | timestamp without time zone |           | not null | now()

>> ./migrate migrate "$CON" ./migrations/

SqlError:
  sqlState: 
42P01
  sqlExecStatus: 
FatalError
  sqlErrorMsg: 
relation "schema_migrations" does not exist
  sqlErrorDetail: 

  sqlErrorHint:

Please choose a more descriptive name for the "migrate" utility

While packaging postgresql-simple-migration for openSUSE, our reviewers complained about the tool migrate, which feels like a somewhat poor naming choice because the name gives no clue at all about what it's migrating. Do you think it would be possible to rename that executable to something more descriptive, like postgresql-simple-migrate maybe?

MigrationInitialization can be noisy

When MigrationInitialization is run when schema_migrations already exists it prints out NOTICE: relation "schema_migrations" already exists, skipping.

Perhaps I've misunderstood how to use this library, but I've got a list of MigrationCommands which I'm sequencing over during program startup, the first of which is MigrationInitialization.

GHC 8.0.1 compat / time 1.6

building with stack resolver: 2016-06-15

--  Failure when adding dependencies:    
      time: needed (>=1.4 && <1.6), 1.6.0.1 found (latest applicable is 1.5.0.1)
    needed for package postgresql-simple-migration-0.1.5.0

Is exit code always 0?

I started using this program to do migrations as it's very simple to use and does the job, thank you for this.

I run migrations on my deployment phase and I just changed one of existing migrations to see what happens. Nothing happened, my deployment went as usual. I was surprised that validation didn't throw an error. After checking logs (and adding some additional error logs to show exit codes) I found that program returns 0 exit code even if there are any errors:

# validate
Sep 05 00:58:42 dev db-start[6489]: Checksum mismatch:        0001.sql
Sep 05 00:58:42 dev db-start[6489]: 0
# migrate
Sep 05 00:58:42 dev db-start[6489]: Fail:        0001.sql
Sep 05 00:58:42 dev db-start[6489]: 0

Would it be possible to change exit code in case of errors? In other way I have to parse the output which is probably not the best way to communicate problems to the other process.