Giter Club home page Giter Club logo

pg-upgrade-data-check's Introduction

pg-upgrade-data-check

Installing

go install github.com/processout/pg-upgrade-data-check@latest or use one of the binaries available in https://github.com/processout/pg-upgrade-data-check/releases

Description

pg-upgrade-data-check will compare Postgres table contents of two databases (source and target) to ensure there are no differences in data.

It will detect discrepancies like;

  • Row missing in the target database table
  • Extra row in the target database table
  • Difference in any column value for the same id

Note: It's designed to work purely on tables content (it will not check indexes, etc). Note 2: It will only work for tables that have numeric id that is incremented for each row.

Configuration

All configurations related to tables are contained in the examples/config.yaml file. You must specify the table that needs to be checked in .tables and then configure a function for collect and compare:

  • collect: is executed in first and second stage in order to identify range of id's to compare, usually the max of the current table.
  • compare: should return two values: id and hash; hash is used to compare rows of the same id between two databases.

A full example would be:

 tables:
   users:
     collect: select max(id) from %s limit 1;
     compare: select id, md5(%s::TEXT) as hash from %s where id>=%d and id<=%d order by id asc;

Each command will require the database DSN, which might be indicated by --source-url, --target-url or both.

Usage

As pg-upgrade-data-check has been designed primarily for verifying effects of cloning & replication, it operates in three stages: before, after and compare. Each of these are commands in the cli itself, e.g: pg-upgrade-data-check before --source-url postgres://postgres@localhost:6001/postgres

Stage 1: before

pg-upgrade-data-check before --source-url postgres://postgres@localhost:6001/postgres --config config.yaml

For each configured table find the starting id. This query is executed on the source database, and output is saved to data/before_ids.json file. For safety reasons it is not possible to execute this step if the destination file already exist. Delete that file first if you don't care about previous result.

When using pg-upgrade-data-check during Postgres upgrade this stage should be executed before you create a replication slot on the source database.

Stage 2: after

pg-upgrade-data-check after --target-url postgres://postgres@localhost:6002/postgres --config config.yaml

For each configured table find the ending id. Again, this is executed on the source database, with output written this time to data/after_ids.json file. It has the same safety mechanism as previous stage.

When using pg-upgrade-data-check during Postgres upgrade this stage should be executed after target database has been restored from snapshot and logical replication caught up to source.

Stage 3: compare

pg-upgrade-data-check compare --source-url postgres://postgres@localhost:6001/postgres --target-url postgres://postgres@localhost:6002/postgres --config config.yaml

Load the contents of after_ids.json (generated by the previous stages) and for all tables from configuration compare rows between startId and stopId, detecting additionally any missing or extra rows on the target db.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.