allenap / rust-postgresfixture

A fixture for creating PostgreSQL clusters and databases, and tearing them down again, intended for use during development and testing.

License: Apache License 2.0

Rust 100.00%

rust-lang rust-library testing testing-tools

rust-postgresfixture's Issues

Default timezone to UTC

When a database is created, its timezone appears to be inherited from the local machine. For consistency, UTC could be chosen instead. I think this only actually affects clients: it's used as a default for client connections, which may override it with their own timezone. Docs.

createdb fails

Well, still on my way to try using postgresfixture, I'm hitting issues in createdb: it's returning the following error:

DatabaseError(Error { kind: Db, cause: Some(DbError { severity: "ERROR", parsed_severity: Some(Error), code: SqlState(E42601), message: "syntax error at or near \"-\"", detail: None, hint: None, position: Some(Original(21)), where_: None, schema: None, table: None, column: None, datatype: None, constraint: None, file: Some("scan.l"), line: Some(1176), routine: Some("scanner_yyerror") }) })

I have literally no idea how to interpret that, but my guess would be it's an issue coming from somewhere inside the postgres crate?
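
One observation, offered as a guess rather than a diagnosis: `position: Some(Original(21))` is exactly where the `-` falls in a statement such as `CREATE DATABASE test-db`, which is what you would see if the database name reached the server as an unquoted identifier. PostgreSQL only accepts a hyphen in an identifier when the identifier is wrapped in double quotes, with any embedded double quote doubled. A minimal sketch of that quoting rule follows; `quote_identifier` is a hypothetical helper, not something taken from postgresfixture or the postgres crate.

```rust
/// Quote a PostgreSQL identifier so that characters like `-` are accepted.
/// Hypothetical helper for illustration only.
fn quote_identifier(name: &str) -> String {
    // Wrap in double quotes and double any embedded double quote.
    format!("\"{}\"", name.replace('"', "\"\""))
}
```

With this, `quote_identifier("test-db")` yields `"test-db"` including the quotes, which could then be interpolated into the `CREATE DATABASE` statement. Until something like this is in place, a name without a hyphen (for example `test_db`) would presumably sidestep the error.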

FWIW, here is the beginning of my test, which triggers the panic on the last quoted line, and postgres --version returns 14.6 on my (nixos) machine:

fn build_pg_cluster(data: &Path) -> postgresfixture::cluster::Cluster {
    let mut runtime = None;
    let mut best_version = None;
    for r in postgresfixture::runtime::Runtime::find_on_path() {
        if let Ok(v) = r.version() {
            match (&mut runtime, &mut best_version) {
                (None, None) => {
                    runtime = Some(r);
                    best_version = Some(v);
                }
                (Some(runtime), Some(best_version)) => {
                    if *best_version < v {
                        *runtime = r;
                        *best_version = v;
                    }
                }
                _ => unreachable!(),
            }
        }
    }
    postgresfixture::cluster::Cluster::new(
        data,
        runtime.expect("postgresql seems to not be installed in path"),
    )
}

// ..., inside a macro that generates my tests
        #[test]
        fn $name() {
            if std::env::var("RUST_LOG").is_ok() {
                tracing_subscriber::fmt::init();
            }
            let lockfile = tempfile::tempfile().expect("creating tempfile");
            let datadir = tempfile::tempdir().expect("creating tempdir");
            let datadir_path: &Path = datadir.as_ref();
            let cluster = build_pg_cluster(datadir_path);
            postgresfixture::coordinate::run_and_destroy(&cluster, lockfile.into(), || {
                cluster.createdb("test-db").expect("creating test-db db");

BTW, would it make sense to include this build_pg_cluster in postgresfixture itself, maybe as a Runtime::find_latest_in_path() -> Option<Runtime>?
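
For illustration, the selection logic in `build_pg_cluster` could be factored into a small generic helper along these lines. This is only a sketch: `find_latest` and its parameters are hypothetical names, and it is written against plain closures rather than the real `Runtime` type so that it stands alone. Like the loop above, it skips candidates whose version cannot be determined and keeps the first candidate on ties.

```rust
/// Return the candidate with the greatest version, ignoring candidates whose
/// version lookup fails. Hypothetical helper; not part of postgresfixture.
fn find_latest<R, V, E>(
    candidates: impl IntoIterator<Item = R>,
    version_of: impl Fn(&R) -> Result<V, E>,
) -> Option<R>
where
    V: PartialOrd,
{
    let mut best: Option<(R, V)> = None;
    for candidate in candidates {
        if let Ok(version) = version_of(&candidate) {
            // Replace the current best only when strictly newer.
            let is_better = match &best {
                Some((_, best_version)) => *best_version < version,
                None => true,
            };
            if is_better {
                best = Some((candidate, version));
            }
        }
    }
    best.map(|(candidate, _)| candidate)
}
```

Under that assumption, the test helper would reduce to something like `find_latest(Runtime::find_on_path(), |r| r.version())`, using only the two calls already shown in the snippet above.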

Don't default to `fsync = off`

The fsync = off setting is dangerous. Maybe it shouldn't be the default, even in this tool.

Replace shell-escape with shell-quote

shell-quote is, I believe, more accurate/precise, plus it's one of mine. I may need to implement rules for escaping strings in sh however – currently shell-quote only covers bash.

Match runtime to that in a preexisting cluster

When running postgresfixture shell or postgresfixture exec with a preexisting cluster it might be useful if it could try and find a compatible runtime, e.g. by reading the PG_VERSION file.
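
As a rough sketch of the first step, the `PG_VERSION` file in a cluster's data directory holds the version the cluster was initialised with (for example `14`, or `9.6` for older releases) followed by a newline. The function name `read_pg_version` is made up here, and matching the result against the discovered runtimes is left out.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Read the version string recorded in `<data_dir>/PG_VERSION`.
/// Hypothetical helper; returns the contents with surrounding whitespace removed.
fn read_pg_version(data_dir: &Path) -> io::Result<String> {
    let raw = fs::read_to_string(data_dir.join("PG_VERSION"))?;
    Ok(raw.trim().to_string())
}
```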

Improve error handling in main.rs

Right now it panics if there's a problem (see below). However, for most of the error conditions below we just want to exit non-zero, perhaps with a short error message.

fn shell(database_dir: PathBuf, database_name: &str) -> i32 {
    let cluster = postgresfixture::Cluster::new(
        match database_dir.is_absolute() {
            true => database_dir,
            false => env::current_dir()
                .expect("could not get current working directory")
                .join(database_dir),
        },
        postgresfixture::Runtime::default(),
    );
    cluster.start().expect("could not start cluster");
    if !cluster
        .databases()
        .expect("could not list databases")
        .contains(&database_name.to_string())
    {
        cluster
            .createdb(database_name)
            .expect("could not create database");
    }
    cluster.shell(database_name).expect("shell failed");
    cluster.stop().expect("could not stop cluster");
    0
}
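
One possible shape for this, sketched under the assumption that each subcommand is changed to return a `Result` (using `?` in place of `expect`): a single helper turns that result into an exit code and prints a short message on failure. The name `exit_code` and the message prefix are illustrative only.

```rust
use std::fmt::Display;

/// Convert a subcommand's result into a process exit code, printing a short
/// message to stderr on failure instead of panicking. Hypothetical helper.
fn exit_code<E: Display>(result: Result<i32, E>) -> i32 {
    match result {
        Ok(code) => code,
        Err(err) => {
            eprintln!("postgresfixture: {}", err);
            1
        }
    }
}
```

`main` would then end with something like `std::process::exit(exit_code(shell(dir, name)))`, assuming `shell` has been given a `Result<i32, _>` return type whose error implements `Display`.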

Default database should be `postgres`

A cluster comes with three default databases: template0, template1, and postgres. Instead of creating a new database we could use the postgres database.

Command to list discovered runtimes

For example:

$ postgresfixture runtimes
Version  Path
12.1     /usr/lib/postgresql/12.1
...
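
A sketch of how that table might be rendered, assuming the versions and paths have already been collected as strings. `format_runtimes` is a hypothetical name, and the fixed column width of 8 is chosen only to match the example above.

```rust
/// Render (version, path) pairs as a two-column table with a header row.
/// Hypothetical helper; the version column is left-aligned and padded to 8.
fn format_runtimes(runtimes: &[(String, String)]) -> String {
    let mut out = format!("{:<8} {}\n", "Version", "Path");
    for (version, path) in runtimes {
        out.push_str(&format!("{:<8} {}\n", version, path));
    }
    out
}
```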

Explain why/where/when you would use `postgresfixture`

This is a fairly simple tool that on the face of it can be replaced by a little shell scripting. But the details matter, so it's worth explaining why one would use this rather than DIY scripting, and also why one would not use this tool.

Cluster not stopped if command killed by a signal

Killing an execed command with a signal, like SIGINT by pressing Ctrl-C, causes postgresfixture to leave the cluster running. It can be cleaned up readily by running postgresfixture exec true, but it shouldn't happen.

$ postgresfixture exec sleep 99
^C

$ lsof +D cluster/
COMMAND    PID  USER   FD   TYPE DEVICE SIZE/OFF       NODE NAME
postgres 35716 gavin  cwd    DIR    1,4      928 4618892182 cluster
postgres 35716 gavin    1w   REG    1,4    19287 4618893174 cluster/backend.log
postgres 35716 gavin    2w   REG    1,4    19287 4618893174 cluster/backend.log
...
postgres 35725 gavin  cwd    DIR    1,4      928 4618892182 cluster
postgres 35725 gavin    1w   REG    1,4    19287 4618893174 cluster/backend.log
postgres 35725 gavin    2w   REG    1,4    19287 4618893174 cluster/backend.log
postgres 35725 gavin    4u   REG    1,4        0 4618892297 cluster/global/6100

$ postgresfixture exec true

$ lsof +D cluster/
... nothing ...

Add logging

There's not a lot of information about what's going on. In general I think this is good: stay silent unless there's something to say. However, it would be useful to have --verbose and/or --debug flags that can make postgresfixture more chatty, e.g. to explain when locks are being waited for, etc.

Fix bad example shell session output in README

I copied the skeleton of the README from my petname project, but accidentally left some bits around:

Disable `synchronous_commit` when going "faster"

When running with --faster-but-less-safe, try setting synchronous_commit = 'off' too. Allegedly it'll speed things up.

Add some docs on how to actually use this

Perhaps you could write a Getting Started thing for the README on how to actually use this. I think it'd be a helpful tool to have in the ecosystem.

Allow for cluster destruction from the command-line

Both the shell and exec subcommands leave the cluster in place when they exit, even if they're the last consumer of the cluster. An option to also remove/destroy the cluster would be useful.

Add an `exec` subcommand

This should execute an arbitrary command, e.g. pg_dump, a test suite, etc.

Disable `full_page_writes` too

When setting fsync = off, the PostgreSQL docs recommend setting full_page_writes = off too.
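
To make the combination concrete, here is a sketch that builds the three "faster but less safe" settings mentioned across these issues as `-c name=value` arguments, a form the `postgres` server accepts on its command line. How postgresfixture actually passes settings to the server is not shown here, so the function name and the idea of passing them this way are assumptions.

```rust
/// Build `-c name=value` server arguments for the durability trade-offs
/// discussed above. Hypothetical helper; all three settings risk data loss
/// after a crash and are only appropriate for throwaway clusters.
fn faster_but_less_safe_args() -> Vec<String> {
    ["fsync=off", "full_page_writes=off", "synchronous_commit=off"]
        .iter()
        .flat_map(|setting| vec!["-c".to_string(), setting.to_string()])
        .collect()
}
```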

Cluster is not stopped or destroyed on drop

I think it would be useful to have a mechanism that will automatically stop and, optionally, destroy a cluster when the Cluster is dropped, i.e. something with a Drop implementation.

Thoughts:

Stop a cluster on drop.
Destroy a cluster on drop.
Panic if that fails, or something else? Panicking is probably not a good idea because it'll cause an abort.
Configurable when creating the Cluster.
Configurable while using the Cluster.
Separate ClusterGuard struct instead of bundling Drop implementation into Cluster itself? The Python version has a separate ClusterFixture class for example.
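
A minimal sketch of the separate-guard idea, written against a stand-in trait so that it compiles on its own. `Stoppable`, `ClusterGuard`, and `destroy_on_drop` are all hypothetical names; the real `Cluster` methods may have different signatures. Failures during drop are reported on stderr rather than panicking, for the reason given above.

```rust
/// Stand-in for the operations a guard needs from a cluster (assumption).
trait Stoppable {
    type Error: std::fmt::Debug;
    fn stop(&self) -> Result<(), Self::Error>;
    fn destroy(&self) -> Result<(), Self::Error>;
}

/// Stops the wrapped cluster when dropped, and optionally destroys it.
struct ClusterGuard<C: Stoppable> {
    cluster: C,
    destroy_on_drop: bool,
}

impl<C: Stoppable> Drop for ClusterGuard<C> {
    fn drop(&mut self) {
        // Always try to stop; only destroy if stopping succeeded.
        let mut result = self.cluster.stop();
        if result.is_ok() && self.destroy_on_drop {
            result = self.cluster.destroy();
        }
        // Report rather than panic: panicking while unwinding aborts.
        if let Err(err) = result {
            eprintln!("could not clean up cluster: {:?}", err);
        }
    }
}
```

Making `destroy_on_drop` a plain field covers both "configurable when creating" and "configurable while using", since a caller holding the guard mutably can flip it at any point before the guard goes out of scope.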

Does not work with PostgreSQL >= 11

Implement new locking scheme

At present, a Cluster's lock file is stored in the parent directory of the cluster, and has a static name, .cluster.lock. This means that using Cluster to manage multiple clusters with the same parent directory could be problematic as locks conflict. Also, the name .cluster.lock does not give away much information as to what might have created the lock file.

Idea for new locking scheme:

If the DATA directory (into which the cluster will be created) does not exist, create it.
- This may fail because a concurrent process has created it. Detect this and continue.
Lock file DATA/.postgresfixture.lock
- exclusively when creating, starting, stopping, or destroying the cluster.
- shared when using the cluster.
- Is it possible to downgrade from exclusive flock to shared? Yes, but ⚠️ might necessitate multiple lock files:
  - From flock(2) on Linux:
    
    A process may hold only one type of lock (shared or exclusive) on a file. Subsequent flock() calls on an already locked file will convert an existing lock to the new lock mode.
  - However, from flock(2) on MacOSX:
    
    A shared lock may be upgraded to an exclusive lock, and vice versa, simply by specifying the appropriate lock type; this results in the previous lock being released and the new lock applied (possibly after other processes have gained and released the lock).
When destroying cluster, move DATA to DATA.XXXXXX (where XXXXXX is random) after stopping the cluster but before deleting all files.
- What happens when process A is destroying a cluster and process B comes to, say, also destroy the cluster? The following would be bad:
  - B blocks, waiting for an exclusive lock on DATA/.postgresfixture.lock.
  - A moves DATA to DATA.XXXXXX.
  - A destroys DATA.XXXXXX and releases all locks.
  - Meanwhile, process C creates a new cluster in DATA and uses it.
  - B still has a file descriptor for the lock file. That file has been deleted from the filesystem,
    but B can nevertheless now lock it exclusively.
  - B then goes about destroying DATA because it thinks it has an exclusive lock.
  - C gets confused, angry.
  - FIX: Ensure that the lock's File refers to the same file as a newly-opened File for the lock. Use the same-file crate for this: same_file::Handle instances are equal when they refer to the same underlying file.
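
The FIX step can be sketched as follows. Note that this swaps in a Unix-only comparison of device and inode numbers from the standard library in place of `same_file::Handle`, purely so the example has no dependencies; `lock_is_current` is a hypothetical name. After acquiring the lock, a process would call this and, if it returns `false`, release the lock and start over with a freshly opened file.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Check that the lock file we hold open is still the file found at
/// `lock_path`. Returns `Ok(false)` if the path is gone or now names a
/// different file. Unix-only sketch.
fn lock_is_current(held: &File, lock_path: &Path) -> io::Result<bool> {
    let held_meta = held.metadata()?;
    let current_meta = match std::fs::metadata(lock_path) {
        Ok(meta) => meta,
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(false),
        Err(err) => return Err(err),
    };
    // Same device and same inode means the same underlying file.
    Ok(held_meta.dev() == current_meta.dev() && held_meta.ino() == current_meta.ino())
}
```

In the scenario above, B's check would fail once A has moved DATA aside and C has created a new DATA, because B's descriptor refers to the old lock file's inode while the path now resolves to C's new one.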

Default locale to `C`

Locale settings appear to be inherited from the local environment, e.g. if LC_ALL=en_GB.UTF-8 in the calling environment, a cluster will be initialised with the en_GB.UTF-8 locale. This might only affect client connections, which can override with their own. However, given that locale is also an option when creating a database, it seems prudent to ensure that new databases are created with the same consistent and predictable locale, e.g. C, unless overridden.

Switch to structopt

The experience in petname is that structopt is a much cleaner development experience than clap alone.

Result<bool> that cannot return Ok(false)

Hey!

I was reading the docs some more and noticed that, e.g., createdb and dropdb return Result<bool, Error> but do not document what the bool actually means. Reading the source code to figure it out, it looks like they can only ever return Ok(true).

Why not make them just return Ok(())?