chipster-openshift

Bash scripts and Dockerfiles for building and deploying Chipster's REST API backend on OpenShift Origin.

Maintenance

Restore backup

Sequential

Let's assume we are restoring the session-db backup.

  • Scale down session-db and backup deployments to 0
  • Remove the old database. Use the terminal of the session-db-postgres pod to run dropdb session_db_db. If postgres (and thus the pod) refuses to start, use the debug terminal and remove the postgres data folder.
dropdb session_db_db
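# if postgres refuses to start, the data folder can instead be removed in the
# debug terminal; the path below is assumed from the pg_ctl command later on
rm -rf /var/lib/pgsql/data/userdata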
  • Create a new database in the terminal of the session-db-postgres pod
createdb session_db_db
psql session_db_db
# synchronous_commit off trades durability for speed during the bulk restore
alter system set synchronous_commit to off;
# press Ctrl+D to quit psql
pg_ctl reload -D /var/lib/pgsql/data/userdata
  • Scale the backup deployment to 1. Run the following on your laptop:
oc rsh dc/backup
# read the database password and JDBC URL from the Chipster config;
# stripping the jdbc: prefix leaves a postgresql:// URI that psql accepts
export PGPASSWORD="$(cat /opt/chipster-web-server/conf/chipster.yaml | grep db-pass-session-db | cut -d " " -f 2)"
PGURL="$(cat /opt/chipster-web-server/conf/chipster.yaml | grep db-url-session-db | cut -d " " -f 2 | sed s/jdbc://)"
pushd db-backups
BACKUP_FILE=""  # fill in the name of the backup file to restore

# check the uncompressed file size
cat $BACKUP_FILE | lz4 -d | pv > /dev/null
cat $BACKUP_FILE | lz4 -d | pv | psql --dbname $PGURL --username user > ../logs/session-db-restore.log
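Note that psql reports errors on stderr, so the redirect above captures only normal output. A variant that also logs errors, plus a quick check afterwards (a sketch, not part of the original procedure):

cat $BACKUP_FILE | lz4 -d | pv | psql --dbname $PGURL --username user > ../logs/session-db-restore.log 2>&1
# any ERROR lines in the log warrant a closer look
grep "^ERROR" ../logs/session-db-restore.log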

Parallel

Running psql to restore a SQL dump processes the dump file at about 70 kB/s when it is creating large objects and at about 800 kB/s when it is writing blob data. At that speed it takes several hours to restore a database of a few gigabytes. pg_restore supports parallel restore, but only for the custom and directory formats, whereas we have a plain SQL dump.
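For comparison, if the dump were taken in the custom format instead, a parallel restore would be straightforward. A minimal sketch (the dump file name is hypothetical):

# dump in custom format (-Fc), then restore it with 10 parallel jobs;
# pg_restore accepts --jobs only for the custom and directory formats
pg_dump --format=custom --file=session_db_db.dump --dbname $PGURL --username user
pg_restore --jobs=10 --dbname $PGURL --username user session_db_db.dump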

It seems possible to speed this up by splitting the dump and creating the large objects and blobs in parallel. At the moment this produced log warnings about files extending beyond their EOF mark; let's try again later when the Postgres data is no longer stored on GlusterFS.

# calculate row numbers of each section

lo_start="$(cat all.sql | grep "SELECT pg_catalog.lo_create" -n | head -n 1 | cut -d ":" -f 1)"
table_start="$(cat all.sql | grep "Data for Name: " -n | head -n 1 | cut -d ":" -f 1)"
blobs_start="$(cat all.sql | grep "Data for Name: BLOBS" -n | cut -d ":" -f 1)"
commit_end="$(cat all.sql | grep "COMMIT;" -n | cut -d ":" -f 1)"
eof="$(cat all.sql | wc -l)"
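# optional check (not in the original procedure): print the computed
# boundaries; they should be non-empty and in ascending order
echo "lo: $lo_start tables: $table_start blobs: $blobs_start commit: $commit_end eof: $eof"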

# split the file sections into separate files
# these offsets happened to work for one file; calculate more precise row numbers above if necessary

sed -n "1,$(($lo_start - 5))p" all.sql > start.sql
sed -n "$(($lo_start - 4)),$(($table_start - 2))p" all.sql > lo.sql
sed -n "$(($table_start - 1)),$(($blobs_start - 2))p" all.sql > tables.sql
sed -n "$(($blobs_start - 1)),$(($blobs_start + 4))p" all.sql > blobs_begin.sql
sed -n "$(($blobs_start + 5)),$(($commit_end - 1))p" all.sql > blobs.sql
sed -n "$(($commit_end - 0)),$(($commit_end + 1))p" all.sql > blobs_commit.sql
sed -n "$(($commit_end + 2)),$(($eof + 1))p" all.sql > end.sql

# split the large object and blob files into smaller pieces
# each record is 9 rows
mkdir -p lo
pushd lo 
split -l $((9 * 10000)) ../lo.sql lo_
popd

# each record has a varying number of rows, so split at the empty lines
# collect line numbers of empty lines first
mkdir -p blobs
pushd blobs
cat ../blobs.sql | grep -n "^$" | cut -d ":" -f 1 > empty-lines
# then split the line numbers into suitably sized pieces
split -l $((4 * 10000)) empty-lines blobs_

# finally, split the sql based on the last line number in each file
start=1
for f in $(ls blobs_*); do
	end="$(tail -n 1 $f)"
	echo $f $start $end
	sed -n "$(($start)),$(($end))p" ../blobs.sql > sql_${f}
	cat ../blobs_begin.sql sql_${f} ../blobs_commit.sql > trans_${f}
	rm sql_${f} ${f} 
	start=$(($end + 1))
done
popd

# restore (the parallel steps below require GNU parallel)
time cat start.sql | pv | psql --dbname $PGURL --username user

time ls lo/* | parallel -j 10 "cat {} | psql --dbname $PGURL --username user"

time cat tables.sql | pv | psql --dbname $PGURL --username user

time ls blobs/trans_blobs_* | parallel -j 10 "cat {} | psql --dbname $PGURL --username user"

time cat end.sql | pv | psql --dbname $PGURL --username user
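Once the restore has finished, the synchronous_commit setting disabled earlier can be turned back on in the terminal of the session-db-postgres pod (a follow-up sketch mirroring the earlier step):

psql session_db_db
alter system set synchronous_commit to on;
# press Ctrl+D to quit psql
pg_ctl reload -D /var/lib/pgsql/data/userdata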
