
mbslave's Introduction

MusicBrainz Database Mirror


This repository contains a collection of scripts for managing a replica of the MusicBrainz database.

The main motivation for these scripts is to be able to customize your database. If you don't need such customizations, it might be easier to use the replication tools provided by MusicBrainz itself.

Installation

You need to have Python 3.x installed on your system. You can use pipx to install this package:

sudo apt install python3 pipx
pipx install 'mbslave'

There are two ways to configure the application.

  1. You can use a config file:

    curl https://raw.githubusercontent.com/acoustid/mbslave/main/mbslave.conf.default -o mbslave.conf
    vim mbslave.conf
    

    By default, the mbslave script looks for the config file in the current directory. If you want it to be found from anywhere, either save it to /etc/mbslave.conf or set the MBSLAVE_CONFIG environment variable. For example:

    export MBSLAVE_CONFIG=/usr/local/etc/mbslave.conf
    
  2. Alternatively, you can use environment variables:

    export MBSLAVE_DB_HOST=127.0.0.1
    export MBSLAVE_DB_PORT=5432
    export MBSLAVE_DB_NAME=musicbrainz
    export MBSLAVE_DB_USER=musicbrainz
    export MBSLAVE_DB_PASSWORD=XXX
    export MBSLAVE_DB_ADMIN_USER=postgres
    export MBSLAVE_DB_ADMIN_PASSWORD=XXX
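
    Before running any mbslave command, it can help to confirm that the environment is complete. The following is a minimal sketch, not part of mbslave: it treats all seven variables listed above as required, which is an assumption (mbslave may supply defaults for some of them), and the helper name `missing_settings` is hypothetical.

```python
import os

# Variable names are taken from the list above. Treating every one of them
# as required is an assumption made for this sketch.
REQUIRED_VARIABLES = [
    "MBSLAVE_DB_HOST",
    "MBSLAVE_DB_PORT",
    "MBSLAVE_DB_NAME",
    "MBSLAVE_DB_USER",
    "MBSLAVE_DB_PASSWORD",
    "MBSLAVE_DB_ADMIN_USER",
    "MBSLAVE_DB_ADMIN_PASSWORD",
]


def missing_settings(env=None):
    """Return the names of variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]
```

    Calling `missing_settings()` with no arguments inspects the current process environment; an empty list means every variable is set.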
    

Database Setup

If you are starting from scratch and want a full copy of the MusicBrainz database, you can use the mbslave init command. This will create a new database and populate it with the latest data from the MusicBrainz database:

mbslave init --create-user --create-database

This requires a running PostgreSQL server, configured so that mbslave can connect to it using both a regular account and a superuser account. How you do that depends on your environment.

The other option is to create the database manually and use mbslave psql to apply the scripts from MusicBrainz. In this case you are expected to know what you are doing.

Database Replication

You can also keep the database up to date by applying incremental changes.

You need to get an API token from the MetaBrainz website and either add it to mbslave.conf or set the MBSLAVE_MUSICBRAINZ_TOKEN environment variable.

After that, you can use the mbslave sync command to download the latest updates:

mbslave sync
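
To keep a replica current, the sync command has to be run repeatedly. The sketch below wraps `mbslave sync` in a simple loop; it is one possible approach, not something shipped with mbslave. The function names, the one-hour default interval, and the `max_runs` parameter are all assumptions for illustration, and a cron job or systemd timer would do the same job.

```python
import subprocess
import time


def sync_once(run=subprocess.run):
    """Run `mbslave sync` once and report whether it exited with status 0."""
    result = run(["mbslave", "sync"])
    return result.returncode == 0


def sync_loop(interval_seconds=3600, max_runs=None,
              run=subprocess.run, sleep=time.sleep):
    """Run the sync repeatedly, sleeping between runs.

    The default interval is an assumption. With max_runs=None the loop
    never ends. Returns the number of failed runs when max_runs is set.
    """
    failures = 0
    runs = 0
    while max_runs is None or runs < max_runs:
        if not sync_once(run):
            failures += 1
            print("mbslave sync failed; will retry on the next run")
        runs += 1
        sleep(interval_seconds)
    return failures
```

The `run` and `sleep` parameters exist only so the loop can be exercised without invoking the real command or waiting.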

Schema Upgrade

When the MusicBrainz database schema changes, the replication will stop working. This is usually announced on the MusicBrainz blog. When it happens, you need to upgrade the database.

Release 2024-05-13 (29)

Run the upgrade scripts:

mbslave psql -f updates/schema-change/29.all.sql
echo 'UPDATE replication_control SET current_schema_sequence = 29;' | mbslave psql

Release 2023-05-22 (28)

Run the upgrade scripts:

mbslave psql -f updates/schema-change/28.all.sql
echo 'UPDATE replication_control SET current_schema_sequence = 28;' | mbslave psql

Release 2022-05-16 (27)

Run the upgrade scripts:

mbslave psql -f updates/schema-change/27.mirror.sql
echo 'UPDATE replication_control SET current_schema_sequence = 27;' | mbslave psql

Release 2021-05-17 (26)

Run the upgrade scripts:

mbslave psql -f updates/schema-change/26.slave.sql
echo 'UPDATE replication_control SET current_schema_sequence = 26;' | mbslave psql

2020-05-18 Upgrade to PostgreSQL 12

These steps are recommended even if you were already running PostgreSQL 12 before MusicBrainz made PostgreSQL 12 the minimum supported version.

Run the pre-upgrade script:

mbslave psql -f updates/20200518-pg12-before-upgrade.sql

If you are not already on PostgreSQL 12, upgrade your cluster now (using pg_upgradecluster or pg_upgrade, depending on your OS).

After upgrading, or if already on PostgreSQL 12, run:

mbslave psql -f updates/20200518-pg12-after-upgrade.sql

Release 2019-05-14 (25)

Run the upgrade scripts:

mbslave psql -f updates/schema-change/25.slave.sql
echo 'UPDATE replication_control SET current_schema_sequence = 25;' | mbslave psql

Release 2017-05-25 (24)

Run the upgrade scripts:

mbslave psql -f updates/schema-change/24.slave.sql
echo 'UPDATE replication_control SET current_schema_sequence = 24;' | mbslave psql
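
The script suffix differs between releases (`slave`, `mirror`, `all`), which is easy to get wrong when typing the commands by hand. The following sketch collects the commands listed above into a lookup. The function names are hypothetical, and applying the upgrades one sequence at a time in ascending order is an assumption. The PostgreSQL 12 steps between releases 25 and 26 are not covered.

```python
# Suffix of each schema-change script, copied from the commands listed above.
SCRIPT_SUFFIX = {
    24: "slave",
    25: "slave",
    26: "slave",
    27: "mirror",
    28: "all",
    29: "all",
}


def upgrade_commands(sequence):
    """Return the two shell commands for upgrading to one schema sequence."""
    suffix = SCRIPT_SUFFIX[sequence]
    return [
        f"mbslave psql -f updates/schema-change/{sequence}.{suffix}.sql",
        "echo 'UPDATE replication_control SET current_schema_sequence = "
        f"{sequence};' | mbslave psql",
    ]


def upgrade_plan(current, target):
    """List the commands needed to go from `current` to `target`, in order."""
    commands = []
    for sequence in range(current + 1, target + 1):
        commands.extend(upgrade_commands(sequence))
    return commands
```

For example, `upgrade_plan(27, 29)` returns the four commands shown under releases 28 and 29, in that order.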

Tips and Tricks

Single Database Schema

MusicBrainz uses a number of schemas by default. If you are embedding the MusicBrainz database into an existing database for your application, it's convenient to merge them all into a single schema. That can be done by changing your config like this:

[schemas]
musicbrainz=musicbrainz
statistics=musicbrainz
cover_art_archive=musicbrainz
wikidocs=musicbrainz
documentation=musicbrainz

After this, you only need to create the "musicbrainz" schema and import all the tables there.
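
To check that a mapping really collapses everything into one schema, you can list the distinct target schemas. This is a minimal sketch using Python's standard `configparser`; how mbslave itself reads the file is not shown here, and the helper name `target_schemas` is hypothetical.

```python
import configparser

# The [schemas] section from the example above.
CONFIG_TEXT = """
[schemas]
musicbrainz=musicbrainz
statistics=musicbrainz
cover_art_archive=musicbrainz
wikidocs=musicbrainz
documentation=musicbrainz
"""


def target_schemas(config_text):
    """Return the sorted, de-duplicated schema names that tables map to."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return sorted(set(parser["schemas"].values()))


print(target_schemas(CONFIG_TEXT))  # ['musicbrainz']
```

A result with more than one name means at least one source schema is still mapped somewhere else.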

Full Import Schema Upgrade

You can use the schema mapping feature to do a zero-downtime upgrade of the database with a full data import. Temporarily map all schemas to, for example, "musicbrainz_NEW", import the new database there, and then rename it:

echo 'BEGIN; ALTER SCHEMA musicbrainz RENAME TO musicbrainz_OLD; ALTER SCHEMA musicbrainz_NEW RENAME TO musicbrainz; COMMIT;' | mbslave psql -S
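
If you use different schema names, the same statement can be generated rather than edited by hand. The sketch below is an illustration only: the function name and its parameters are hypothetical, and the identifier check is deliberately strict. Note that PostgreSQL folds unquoted identifiers to lowercase, so `musicbrainz_NEW` refers to the schema `musicbrainz_new`.

```python
import re

# Plain, unquoted SQL identifiers only; anything else is rejected.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def swap_schema_sql(live="musicbrainz", staged="musicbrainz_NEW",
                    retired="musicbrainz_OLD"):
    """Build one transaction that retires `live` and promotes `staged`."""
    for name in (live, staged, retired):
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a plain SQL identifier: {name!r}")
    return (
        "BEGIN; "
        f"ALTER SCHEMA {live} RENAME TO {retired}; "
        f"ALTER SCHEMA {staged} RENAME TO {live}; "
        "COMMIT;"
    )
```

With the default arguments the function returns the exact statement shown above, which can then be piped to `mbslave psql -S`.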

mbslave's People

Contributors

alastair, amcap1712, dependabot[bot], felix, freso, gerion0, lalinsky, maxetmoritz, mjpieters, mwiencek, reosarevok, sfussenegger, wsovine, yvanzo

mbslave's Issues

Database creation no longer works after a new schema is released

Commands: pip install 'mbdata[replication]' --break-system-packages, mbslave init --create-database
Error log:

...
2024-05-26 20:47:17 INFO:mbdata.replication:Loading link_type to musicbrainz.link_type
2024-05-26 20:47:17 Traceback (most recent call last):
2024-05-26 20:47:17   File "/usr/local/bin/mbslave", line 8, in <module>
2024-05-26 20:47:17     sys.exit(main())
2024-05-26 20:47:17              ^^^^^^
2024-05-26 20:47:17   File "/usr/local/lib/python3.11/dist-packages/mbdata/replication.py", line 803, in main
2024-05-26 20:47:17     args.func(config, args)
2024-05-26 20:47:17   File "/usr/local/lib/python3.11/dist-packages/mbdata/replication.py", line 297, in mbslave_auto_import_main
2024-05-26 20:47:17     load_tar(url, fileobj, db, config, config.schemas.ignored_schemas, config.tables.ignored_tables)
2024-05-26 20:47:17   File "/usr/local/lib/python3.11/dist-packages/mbdata/replication.py", line 258, in load_tar
2024-05-26 20:47:17     cursor.copy_expert('COPY {} FROM STDIN'.format(fulltable), tar.extractfile(member))
2024-05-26 20:47:17 psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type integer: "2014-05-18 09:46:23.72719+00"
2024-05-26 20:47:17 CONTEXT:  COPY link_type, line 1, column priority: "2014-05-18 09:46:23.72719+00"
2024-05-26 20:47:17 
2024-05-26 20:47:17 Traceback (most recent call last):
2024-05-26 20:47:17   File "/usr/local/bin/mbslave", line 8, in <module>
2024-05-26 20:47:17     sys.exit(main())
2024-05-26 20:47:17              ^^^^^^
2024-05-26 20:47:17   File "/usr/local/lib/python3.11/dist-packages/mbdata/replication.py", line 803, in main
2024-05-26 20:47:17     args.func(config, args)
2024-05-26 20:47:17   File "/usr/local/lib/python3.11/dist-packages/mbdata/replication.py", line 656, in mbslave_init_main
2024-05-26 20:47:17     run_script('mbslave auto-import')
2024-05-26 20:47:17   File "/usr/local/lib/python3.11/dist-packages/mbdata/replication.py", line 609, in run_script
2024-05-26 20:47:17     subprocess.run(['bash', '-euxc', script], check=True)
2024-05-26 20:47:17   File "/usr/lib/python3.11/subprocess.py", line 571, in run
2024-05-26 20:47:17     raise CalledProcessError(retcode, process.args,
2024-05-26 20:47:17 subprocess.CalledProcessError: Command '['bash', '-euxc', 'mbslave auto-import']' returned non-zero exit status 1.

It seems like the PyPI package has not been updated yet.

pipx install failing

I tried to install mbslave with pipx as described in the README, but it failed. The pipx log says:

ERROR: Could not find a version that satisfies the requirement mbslave (from versions: none)
ERROR: No matching distribution found for mbslave

Well …?

mbslave sync fails with missing dbmirror2.pending_data relation

I had this error after upgrading to the latest version:

psycopg2.errors.UndefinedTable: relation "dbmirror2.pending_data" does not exist
cursor.execute('TRUNCATE dbmirror2.pending_data')

I tried creating a fresh copy and going from there, but it still didn't work. This time the whole dbmirror2 schema was missing:

psycopg2.errors.InvalidSchemaName: schema "dbmirror2" does not exist
cursor.execute('TRUNCATE dbmirror2.pending_data')

Seems like the changes introduced by pull-request #5 are causing this error.

Should mbslave/sql/dbmirror2/ReplicationSetup.sql be executed after init to fix this?

I tried that too, but ran right into the next problem:

  File "/usr/local/lib/python3.12/site-packages/mbslave/replication.py", line 432, in process
    cursor.execute(sql, params)
psycopg2.errors.UndefinedTable: relation "artist_release_group_pending_update" does not exist
LINE 1: INSERT INTO artist_release_group_pending_update VALUES (NEW....
                    ^
QUERY:  INSERT INTO artist_release_group_pending_update VALUES (NEW.id)
CONTEXT:  PL/pgSQL function musicbrainz.a_ins_release_group_mirror() line 3 at SQL statement

This one was a bit surprising, as musicbrainz.artist_release_group_pending_update exists. So I'm a bit lost now. Could this be a permission issue?

Thanks!

Init fails due to violated check constraints

I'm initializing the database with these commands:

pipx install 'mbslave'
mbslave init

and running into this error:

...
INFO:mbslave.replication:Loading l_series_work to musicbrainz.l_series_work
INFO:mbslave.replication:Loading l_url_work to musicbrainz.l_url_work
INFO:mbslave.replication:Loading l_work_work to musicbrainz.l_work_work
INFO:mbslave.replication:Loading label to musicbrainz.label
Traceback (most recent call last):
  File "/opt/pipx_bin/mbslave", line 8, in <module>
    sys.exit(main())
  File "/opt/pipx/venvs/mbslave/lib/python3.10/site-packages/mbslave/replication.py", line 787, in main
    args.func(config, args)
  File "/opt/pipx/venvs/mbslave/lib/python3.10/site-packages/mbslave/replication.py", line 297, in mbslave_auto_import_main
    load_tar(url, fileobj, db, config, config.schemas.ignored_schemas, config.tables.ignored_tables)
  File "/opt/pipx/venvs/mbslave/lib/python3.10/site-packages/mbslave/replication.py", line 258, in load_tar
    cursor.copy_expert('COPY {} FROM STDIN'.format(fulltable), tar.extractfile(member))
psycopg2.errors.CheckViolation: new row for relation "label" violates check constraint "label_label_code_check"
DETAIL:  Failing row contains (294731, dacec7dc-806e-4f1a-ab41-cc0c46b297e0, beau by Republic, 2022, 10, 9, null, null, null, 202210, 1, 7741, , 0, 2024-05-14 21:06:04.842073+00, f).
CONTEXT:  COPY label, line 39973: "294731	dacec7dc-806e-4f1a-ab41-cc0c46b297e0	beau by Republic	2022	10	9	\N	\N	\N	202210	1	7741		0	202..."

Traceback (most recent call last):
  File "/opt/pipx_bin/mbslave", line 8, in <module>
    sys.exit(main())
  File "/opt/pipx/venvs/mbslave/lib/python3.10/site-packages/mbslave/replication.py", line 787, in main
    args.func(config, args)
  File "/opt/pipx/venvs/mbslave/lib/python3.10/site-packages/mbslave/replication.py", line 640, in mbslave_init_main
    run_script('mbslave auto-import')
  File "/opt/pipx/venvs/mbslave/lib/python3.10/site-packages/mbslave/replication.py", line 593, in run_script
    subprocess.run(script, check=True, shell=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'mbslave auto-import' returned non-zero exit status 1.

mbslave sync does not ignore replication data for ignored schemas

Executing mbslave sync after initializing the DB with ignore=... configured leads to constraint violations when importing data into pending_data:

$ mbslave sync
INFO:mbslave.replication:Downloading https://metabrainz.org/api/musicbrainz/replication-163114-v2.tar.bz2?token=***
INFO:mbslave.replication:Packet was produced at 2023-11-09 00:17:25.761114+00
Traceback (most recent call last):
  File "/usr/local/bin/mbslave", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/mbslave/replication.py", line 795, in main
    args.func(config, args)
  File "/usr/local/lib/python3.12/site-packages/mbslave/replication.py", line 504, in mbslave_sync_main
    process_tar(packet, db, config, ignored_schemas, ignored_tables, schema_seq, replication_seq, hook)
  File "/usr/local/lib/python3.12/site-packages/mbslave/replication.py", line 453, in process_tar
    importer.load_pending_data(member_file)
  File "/usr/local/lib/python3.12/site-packages/mbslave/replication.py", line 349, in load_pending_data
    cursor.copy_expert('COPY dbmirror2.pending_data FROM STDIN', fp)
psycopg2.errors.CheckViolation: new row for relation "pending_data" violates check constraint "tablename_exists"
DETAIL:  Failing row contains (124502041, statistics.statistic, i, 1511151662, null, {"id" : 17644772, "name" : "count.event.type.5", "value" : 40, "..., null, null).
CONTEXT:  COPY pending_data, line 2: "124502041	statistics.statistic	i	1511151662	\N	{"id" : 17644772, "name" : "count.event.type.5", "val..."

config:

[schemas]
musicbrainz=musicbrainz
statistics=statistics
cover_art_archive=cover_art_archive
event_art_archive=event_art_archive
wikidocs=wikidocs
documentation=documentation
ignore=statistics,wikidocs,documentation

This error might be caused by pull-request #5
