enterprisedb / barman
Barman - Backup and Recovery Manager for PostgreSQL
Home Page: https://www.pgbarman.org/
License: GNU General Public License v3.0
Issue with a large number of old files in /var/lib/postgresql//main/pg_xlog/archive_status/.done (>100k)
"archive_command": "rsync -a --remove-source-files %p barman@server:/var/lib/barman/client/incoming/%f",
We have a PostgreSQL replication cluster consisting of a primary server and several standby servers, and we use Barman 1.6 on a separate server to manage our backups. The primary PG server is configured to archive WAL files to the barman server. We also run base backups of the primary server from the barman server.
Recently, we upgraded the PG replication cluster from 9.4 to 9.5. After the upgrade to 9.5, WAL file archiving resumed without any issues, and base backups completed successfully. We validate our backups by performing "barman recover" operations from the barman server to a remote server. Although the barman recover command ran successfully, PostgreSQL failed to start on the remote host. Error from the PG log: FATAL,XX000,"could not locate required checkpoint record"
I checked the output from the barman recover command, and couldn't find the Begin WAL or End WAL files anywhere on the barman server. Investigating further, I checked the barman wals folder on the barman server, and discovered that the 9.5 WAL files reported as archived in our barman.log file no longer exist in the wals folder. However, all of the WAL files that were archived for 9.4 still exist there. Apparently, barman deletes the 9.5 WAL files after we take our nightly base backup of the primary server.
barman check z-prod
Server z-prod:
PostgreSQL: OK
wal_level: OK
directories: OK
retention policy settings: OK
backup maximum age: OK (no last_backup_maximum_age provided)
compression settings: OK
failed backups: OK (there are 0 failed backups)
minimum redundancy requirements: OK (have 18 backups, expected at least 0)
ssh: OK (PostgreSQL server)
not in recovery: OK
archive_mode: OK
archive_command: OK
continuous archiving: OK
archiver errors: OK
Following issue #65, make backup copy process via rsync identical to pg_basebackup by excluding files like:
Add to the output of backup command a line containing start and end time of a backup delete operation
Starting with PostgreSQL 10, the PostgreSQL versioning scheme will switch from three components to two.
Check the Barman code for incorrect assumptions on PostgreSQL version string format.
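As an illustration of the kind of assumption to look for, here is a minimal sketch of a parser that accepts both the three-component and the two-component format and returns a number comparable to server_version_num. The function name parse_pg_version is hypothetical and is not taken from the Barman code base.

```python
import re


def parse_pg_version(version_text):
    """Return an integer comparable to PostgreSQL's server_version_num.

    Hypothetical helper, not part of Barman. Suffixes such as "beta1"
    are ignored.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version_text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version_text)
    parts = [int(p) if p is not None else 0 for p in match.groups()]
    major = parts[0]
    if major >= 10:
        # Two components: MAJOR.MINOR -> MAJOR * 10000 + MINOR
        return major * 10000 + parts[1]
    # Three components: X.Y.Z -> X * 10000 + Y * 100 + Z
    return major * 10000 + parts[1] * 100 + parts[2]
```

Code that splits the version string on "." and always expects three parts would fail on "10.1", which is what the check above is meant to catch.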
When I try to recover a backup I get the following error:
[root@barman-qa-01 ~]# barman recover --remote-ssh-command 'ssh postgres@barman-qa-02' test-01 20161107T124401 /var/lib/pgsql/9.5/data/
Starting remote restore for server test-01 using backup 20161107T124401
Destination directory: /var/lib/pgsql/9.5/data/
Copying the base backup.
Copying required WAL segments.
Generating archive status files
Identify dangerous settings in destination directory.
EXCEPTION: [Errno 13] Permission denied: '/tmp/barman_recovery-aYBKVi/postgresql.conf'
See log file for more details.
barman.log:
2016-11-08 15:38:30,903 [6715] barman.command_wrappers DEBUG: Command: "ssh postgres@barman-qa-02 'test -d /var/lib/pgsql/9.5/data/pg_xlog/archive_status'"
2016-11-08 15:38:31,062 [6715] barman.command_wrappers DEBUG: Command return code: 0
2016-11-08 15:38:31,063 [6715] barman.command_wrappers DEBUG: Command stdout:
2016-11-08 15:38:31,063 [6715] barman.command_wrappers DEBUG: Command stderr:
2016-11-08 15:38:31,065 [6715] barman.recovery_executor INFO: Identify dangerous settings in destination directory.
2016-11-08 15:38:31,067 [6715] barman.cli ERROR: [Errno 13] Permission denied: '/tmp/barman_recovery-Yd5gHN/postgresql.conf'
See log file for more details.
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/barman/cli.py", line 1022, in main
p.dispatch(pre_call=global_config)
File "/usr/lib/python2.6/site-packages/argh/helpers.py", line 47, in dispatch
return dispatch(self, *args, **kwargs)
File "/usr/lib/python2.6/site-packages/argh/dispatching.py", line 121, in dispatch
for line in lines:
File "/usr/lib/python2.6/site-packages/argh/dispatching.py", line 197, in _execute_command
for line in result:
File "/usr/lib/python2.6/site-packages/argh/dispatching.py", line 153, in _call
result = args.function(args)
File "/usr/lib/python2.6/site-packages/barman/cli.py", line 411, in recover
remote_command=args.remote_ssh_command)
File "/usr/lib/python2.6/site-packages/barman/server.py", line 1164, in recover
target_xid, target_name, exclusive, remote_command)
File "/usr/lib/python2.6/site-packages/barman/backup.py", line 445, in recover
exclusive, remote_command)
File "/usr/lib/python2.6/site-packages/barman/recovery_executor.py", line 229, in recover
self._analyse_temporary_config_files(recovery_info)
File "/usr/lib/python2.6/site-packages/barman/recovery_executor.py", line 884, in _analyse_temporary_config_files
"%s.origin" % conf_file)
File "/usr/lib/python2.6/site-packages/barman/recovery_executor.py", line 946, in _pg_config_mangle
with open(filename, 'w') as f:
IOError: [Errno 13] Permission denied: '/tmp/barman_recovery-Yd5gHN/postgresql.conf'
file:
[root@barman-qa-01 barman_recovery-Yd5gHN]# ls -la /tmp/barman_recovery-Yd5gHN/postgresql.conf
-r--r--r-- 1 barman barman 758 Nov 7 12:44 /tmp/barman_recovery-Yd5gHN/postgresql.conf
my system:
[root@barman-qa-01 ~]# cat /etc/redhat-release
CentOS release 6.5 (Final)
[root@barman-qa-01 ~]# python --version
Python 2.6.6
[root@barman-qa-01 ~]# rpm -aq *barman*
barman-2.0-1.rhel6.noarch
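The listing above shows the temporary copy of postgresql.conf with mode -r--r--r--, which would explain why opening it for writing fails. A minimal sketch of one possible workaround follows: add the owner-write bit before rewriting the file. The helper name ensure_owner_writable is hypothetical, and this is not necessarily how Barman addresses the problem.

```python
import os
import stat


def ensure_owner_writable(path):
    """Add the owner-write bit to path if missing; return True if changed.

    Hypothetical helper for illustration only.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & stat.S_IWUSR:
        return False
    os.chmod(path, mode | stat.S_IWUSR)
    return True
```

Calling this on the temporary file before it is opened with mode 'w' would turn 0444 into 0644 and leave already writable files untouched.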
Hi,
I don't really know if it is an issue regarding barman, but each time we run a full backup, we get the following in the logs at the end:
2016-06-26 06:31:43 CEST [10475-1] postgres@postgres LOG: restore point "barman_20160626T040001" created at 1CC/2C0000C8
2016-06-26 06:31:43 CEST [10475-2] postgres@postgres STATEMENT: SELECT pg_create_restore_point('barman_20160626T040001')
2016-06-26 06:31:43 CEST [10475-3] postgres@postgres LOG: could not receive data from client: Connection reset by peer
2016-06-26 06:31:43 CEST [10475-4] postgres@postgres LOG: unexpected EOF on client connection with an open transaction
As per https://groups.google.com/forum/#!topic/pgbarman/HYbsFdnzcJk, improve management of BadXlogSegmentName in archive-wal command
When WAL files are obsolete for some reason barman emits the following log message:
barman.backup INFO: Older than first backup. Trashing file <filename> from <servername>
This doesn't give enough information on the real reason for the trashing.
Investigate possible improvements.
Barman seems to include the pg_replslot directory in the base backups. This is not good as it can lead to issues of WAL retention after restore. See the PostgreSQL documentation at https://www.postgresql.org/docs/9.6/static/continuous-archiving.html#BACKUP-BASE-BACKUP
The build is OK with pytest-catchlog==1.1 (built 8 days ago).
But it fails with latest pytest-catchlog==1.2.0
$ tox -e py34
GLOB sdist-make: /srv/proj/barman/setup.py
py34 recreate: /srv/proj/barman/.tox/py34
py34 installdeps: pytest, mock, pytest-catchlog, pytest-timeout
py34 inst: /srv/proj/barman/.tox/dist/barman-1.5.1b1.zip
py34 installed: argcomplete==1.0.0,argh==0.26.1,barman==1.5.1b1,mock==1.3.0,pbr==1.8.1,psycopg2==2.6.1,py==1.4.30,pytest==2.8.2,pytest-catchlog==1.2.0,pytest-timeout==0.5,python-dateutil==2.4.2,six==1.10.0,wheel==0.24.0
py34 runtests: PYTHONHASHSEED='1069482987'
py34 runtests: commands[0] | py.test tests
=========================================================== test session starts ============================================================
platform linux -- Python 3.4.3+, pytest-2.8.2, py-1.4.30, pluggy-0.3.1
rootdir: /srv/proj/barman, inifile:
plugins: catchlog-1.2.0, timeout-0.5
collected 265 items
...
========================================= 2 failed, 263 passed, 66 pytest-warnings in 1.72 seconds =========================================
ERROR: InvocationError: '/srv/proj/barman/.tox/py34/bin/py.test tests'
_________________________________________________________________ summary __________________________________________________________________
ERROR: py34: commands failed
The documentation says that both 'streaming_archiver_name' and 'streaming_backup_name' are global options, but code defines them as server only. Please make them global too.
As noted by Thiago Ivan in the mailing list, streaming archiver with 9.2 is not working:
main: /usr/pgsql-9.2/bin/pg_receivexlog: unrecognized option '--dbname=host=test user=postgres'
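A minimal sketch of one way around this, assuming (as the error suggests) that the 9.2 binary lacks --dbname but accepts the separate --host, --port and --username options. The function name and the supports_dbname flag are hypothetical, and the conninfo parsing is deliberately naive (no quoted values).

```python
def receivexlog_connection_args(conninfo, supports_dbname):
    """Build connection arguments for pg_receivexlog.

    Hypothetical helper. When --dbname is not supported, fall back to
    individual options derived from simple key=value pairs.
    """
    if supports_dbname:
        return ["--dbname=" + conninfo]
    option_for = {"host": "--host", "port": "--port", "user": "--username"}
    args = []
    for item in conninfo.split():
        key, _, value = item.partition("=")
        if key in option_for:
            args.append("%s=%s" % (option_for[key], value))
    return args
```

With the connection string from the error message, the fallback branch yields --host=test and --username=postgres.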
I'm still learning barman and postgres backups and I'm lost on how to validate that everything is running fine. I have everything set up with docker containers and cron jobs that run barman cron every minute and barman backup every 15 minutes.
Since the clusters are large (roughly 50 GB each) the backup part takes a long time and uses a lot of resources. So, after reading more docs I realized I don't need full backups every 15 minutes since the WAL files have everything for PITR. However, I do not understand exactly how you know whether your "backup chain" is complete...
If I had barman backup running each 24h but I stopped barman for a few hours, and Postgres deleted some wal files, what happens when I restart barman? Will it warn me or let me know about the issue somehow?
If I understand it correctly, wal_keep_segments=150, would always keep 150 wal files just in case, but if postgres needed to create (eg) 200 WAL files it wouldn't keep the extra 50 right?
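One way to reason about whether the chain is complete is to look for holes in the sequence of archived segment names. The sketch below is an illustration only, not a Barman feature: it assumes 24-character segment names, the default 16 MiB segment size, PostgreSQL 9.3 or later (256 segments per log file) and a single timeline. The function names are hypothetical.

```python
# Assumption: 16 MiB segments on PostgreSQL 9.3 or later.
SEGMENTS_PER_LOG = 0x100


def segment_number(name):
    """Map a 24-character WAL name to a linear segment counter."""
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return log * SEGMENTS_PER_LOG + seg


def find_gaps(names):
    """Return (previous, next) pairs between which segments are missing."""
    ordered = sorted(names, key=segment_number)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if segment_number(cur) - segment_number(prev) > 1:
            gaps.append((prev, cur))
    return gaps
```

An empty result from find_gaps over the names in the wals directory (starting at a backup's begin WAL) would suggest that no segment is missing in between.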
I'm automating installation using a chef cookbook, and this unexpectedly breaks working backup functionality when upgrading from 1.6.0 to 1.6.1.
To be clear, barman show-server, barman check, and barman backup would all execute as expected on version 1.6.0. Now on 1.6.1, I cannot get past this check. The postgres server settings are as documented.
On server to be backed up:
postgres=# show wal_level;
wal_level
-------------
hot_standby
(1 row)
postgres=# show archive_mode;
archive_mode
--------------
on
(1 row)
postgres=# show archive_command;
archive_command
----------------------------------------------------------------------
rsync -a %p [email protected]:/var/lib/barman/master/incoming/%f
(1 row)
From barman show-server:
archive_command: rsync -a %p [email protected]:/var/lib/barman/master/incoming/%f
incoming_wals_directory: /var/lib/barman/master/incoming
I also found this thread:
https://groups.google.com/forum/#!topic/pgbarman/M-eFUCA1nHA
The barman check command still fails even after running:
barman switch-xlog --force <server>
barman cron
The WAL timeline history file is also compressed (is this intentional?), but show-backup doesn't decompress it:
$ barman show-backup servername 20161106T152222
EXCEPTION: /backup/barman/servername/wals/00000004.history
$ mv /backup/barman/servername/wals/00000004.history /backup/barman/servername/wals/00000004.history.gz
$ gunzip /backup/barman/servername/wals/00000004.history.gz
$ barman show-backup servername 20161106T152222
Backup 20161106T152222:
Server Name : servername
Status : DONE
PostgreSQL Version : 90409
...
log contents:
2016-11-07 00:39:41,205 [5707] barman.cli ERROR: /backup/barman/servername/wals/00000004.history
See log file for more details.
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/barman/cli.py", line 1022, in main
p.dispatch(pre_call=global_config)
File "/usr/lib/python2.7/site-packages/argh/helpers.py", line 55, in dispatch
return dispatch(self, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/argh/dispatching.py", line 174, in dispatch
for line in lines:
File "/usr/lib/python2.7/site-packages/argh/dispatching.py", line 277, in _execute_command
for line in result:
File "/usr/lib/python2.7/site-packages/argh/dispatching.py", line 231, in _call
result = function(namespace_obj)
File "/usr/lib/python2.7/site-packages/barman/cli.py", line 539, in show_backup
server.show_backup(backup_info)
File "/usr/lib/python2.7/site-packages/barman/server.py", line 1687, in show_backup
backup_ext_info = self.get_backup_ext_info(backup_info)
File "/usr/lib/python2.7/site-packages/barman/server.py", line 1673, in get_backup_ext_info
forked_after=backup_info.end_xlog)
File "/usr/lib/python2.7/site-packages/barman/server.py", line 1822, in get_children_timelines
history_info = xlog.decode_history_file(history_path)
File "/usr/lib/python2.7/site-packages/barman/xlog.py", line 354, in decode_history_file
raise BadHistoryFileContents(path)
BadHistoryFileContents: /backup/barman/servername/wals/00000004.history
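A minimal sketch of how a reader could cope with a history file that was compressed despite keeping its .history name: check for the gzip magic bytes and pick the opener accordingly. This assumes gzip compression, as in the gunzip workaround above; the function name read_history_text is hypothetical and this is not Barman's decode_history_file.

```python
import gzip

# The first two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def read_history_text(path):
    """Return the text of a timeline history file, gzip-compressed or not.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as raw:
        magic = raw.read(2)
    opener = gzip.open if magic == GZIP_MAGIC else open
    with opener(path, "rb") as handle:
        return handle.read().decode("utf-8")
```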
Hi.
I have 6 files in the barman error folder.
All of them have similar content. For example, 00000003.history.20160705T220101Z.unknown has the content "1 F9F/2861E770 before 2016-07-05 15:46:06.743379+03".
barman check gives the result:
archiver errors: FAILED (unknown: 6)
Where should I start looking for the error? pg 9.5.3, barman 1.6.1
Thanks,
Indrek
Hi,
I am using 1.6.0a1 to get the pg_receivexlog working with postgresql 9.3 on centos7. I have my configuration attached. I manually attempted the pg_receivexlog as the barman user with success. I can 'barman check server0' - I see a successful connection in the postgres log and the output shows: "pg_receivexlog: OK" and "pg_receivexlog compatible: OK".
'barman cron' does not make a connection, though it outputs: "Starting WAL archiving for server server0".
Thanks,
-dkw
In case of timeline switch between two base backups, Barman is unable to follow the new timeline while associating WAL files to a backup.
A notable symptom is that during "barman show-backup" the "Last available" WAL is not updated.
This issue might be related to #29
Hi,
is it intended behavior, that barman list-files only shows one timeline?
I have a master/slave postgres cluster running and thus after a failover the timeline changes. The wal shipping is working fine and is processed by barman cron correctly. However
barman list-files --target wal fsmpostgres 20160223T070002
lists just the wal files from the previous timeline, e.g.
/var/lib/barman/fsmpostgres/wals/0000000200000021/000000020000002100000089
/var/lib/barman/fsmpostgres/wals/0000000200000021/00000002000000210000008A
/var/lib/barman/fsmpostgres/wals/0000000200000021/00000002000000210000008B
/var/lib/barman/fsmpostgres/wals/0000000200000021/00000002000000210000008C
/var/lib/barman/fsmpostgres/wals/0000000200000021/00000002000000210000008D
/var/lib/barman/fsmpostgres/wals/0000000200000021/00000002000000210000008E
/var/lib/barman/fsmpostgres/wals/0000000200000021/00000002000000210000008F
/var/lib/barman/fsmpostgres/wals/00000003.history
The existing directory with new timelines /var/lib/barman/fsmpostgres/wals/0000000300000021 is obviously being ignored. I tried to rebuild the xlog, but this didn't change anything.
As a result the icinga check for monitoring the last-wal is failing, because wal files of the new timeline are not taken into consideration.
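To illustrate what taking every timeline into account could look like, here is a minimal sketch that splits segment names into timeline, log and segment and reports the newest segment per timeline. The function names are hypothetical and this is not Barman's own xlog code; the timeline 3 name used in the usage note is made up for the example.

```python
import re

# 24 hex digits: 8 for the timeline, 8 for the log, 8 for the segment.
_SEGMENT_RE = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")


def decode_segment(name):
    """Return (timeline, log, segment) as integers for a WAL segment name."""
    match = _SEGMENT_RE.match(name)
    if match is None:
        raise ValueError("not a WAL segment name: %r" % name)
    return tuple(int(part, 16) for part in match.groups())


def last_segment_per_timeline(names):
    """Return {timeline: newest segment name seen on that timeline}."""
    latest = {}
    for name in names:
        tli, log, seg = decode_segment(name)
        if tli not in latest or (log, seg) > latest[tli][0]:
            latest[tli] = ((log, seg), name)
    return {tli: name for tli, (_, name) in latest.items()}
```

Given 00000002000000210000008F and a hypothetical 000000030000002100000090, the result has one entry for timeline 2 and one for timeline 3, so a monitoring check could pick the entry for the highest timeline.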
Barman 1.6.0-1.pgdg14.04+1 fails to create file data/backup_label:
2016-04-20 16:25:54,556 [3939] barman.backup INFO: Starting backup for server standby in /var/lib/barman/standby/base/20160420T162554
2016-04-20 16:25:54,584 [3939] barman.backup_executor INFO: 1471071, tb_hdd_main, /var/lib/postgresql/tb_hdd
2016-04-20 16:30:24,061 [3939] barman.backup_executor INFO: Backup start at xlog location: 3B8/11410FC0 (00000001000003B800000011, 00410FC0)
2016-04-20 16:30:24,061 [3939] barman.backup_executor INFO: This is the first backup for server standby
2016-04-20 16:30:24,071 [3939] barman.backup_executor INFO: Copying files.
2016-04-20 16:30:24,072 [3939] barman.command_wrappers INFO: Smart copy: ':/var/lib/postgresql/tb_hdd/' -> '/var/lib/barman/standby/base/20160420T162554/147171' (ref: None, safe before None)
2016-04-20 16:30:24,072 [3939] barman.command_wrappers INFO: Smart copy step 1/4: preparation
2016-04-20 16:30:24,395 [3939] barman.command_wrappers INFO: Smart copy step 2/4: create directories and delete/copy unknown files
2016-04-20 16:30:24,559 [3939] barman.command_wrappers INFO: Smart copy step 3/4: safe copy
2016-04-20 16:34:17,863 [3939] barman.command_wrappers INFO: Smart copy finished: :/var/lib/postgresql/tb_hdd/ -> /var/lib/barman/standby/base/20160420T16255/1471071 (safe before None)
2016-04-20 16:34:17,883 [3939] barman.backup ERROR: Backup failed writing backup label.
DETAILS: [Errno 2] No such file or directory: '/var/lib/barman/standby/base/20160420T162554/data/backup_label'
Looks like mkdir /var/lib/barman/standby/base/20160420T162554/data would fix that.
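The same idea as the mkdir suggestion, sketched in Python: make sure the data directory exists before writing the label. The function name write_backup_label is hypothetical and this is only an illustration of the workaround, not the actual Barman code path.

```python
import os


def write_backup_label(backup_dir, content):
    """Write data/backup_label under backup_dir, creating data/ if needed.

    Hypothetical helper for illustration only.
    """
    data_dir = os.path.join(backup_dir, "data")
    if not os.path.isdir(data_dir):
        os.makedirs(data_dir)
    label_path = os.path.join(data_dir, "backup_label")
    with open(label_path, "w") as handle:
        handle.write(content)
    return label_path
```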
If README was renamed to README.md, the readme on github would render nicely.
Original: https://github.com/2ndquadrant-it/barman
Beautiful: https://github.com/ckeeney/barman/tree/69-rename-readme
I use a custom path for the configuration file. Everything works (backup, list-servers, ..) except the cron feature
$ barman -v
1.5.1
$ barman -c ~/etc/barman.conf list-server
# ok
$ barman -c ~/etc/barman.conf backup all
# fine, too
$ barman -c ~/etc/barman.conf cron
Starting WAL archiving for server example.net
$ Could not find any configuration file at default locations.
Check Barman's documentation for more help
It seems barman cron starts a sub process which doesn't correctly handle configuration file detection. As shown above, the error message comes after the shell prompt (the last $).
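A minimal sketch of the behaviour I would expect: when the parent was started with -c, the same option is passed on to the child command line. The function name subprocess_command is hypothetical, and which sub-command cron actually spawns is an assumption here (archive-wal is used only as an example).

```python
def subprocess_command(executable, config_path, *arguments):
    """Build the argv list for a child process, forwarding -c if given.

    Hypothetical helper for illustration only.
    """
    command = [executable]
    if config_path is not None:
        # Forward the custom configuration file to the child process.
        command.extend(["-c", config_path])
    command.extend(arguments)
    return command
```

For example, subprocess_command("barman", "~/etc/barman.conf", "archive-wal", "example.net") keeps the custom configuration path in the child's arguments, while passing None leaves the default lookup in place.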
If you use barman 2.0 to back up postgresql 9.6, the backup blocks for a long time and its status always remains "STARTED", even though barman check is OK.
barman archiver error
barman check output:
Server pg92:
PostgreSQL: OK
superuser: OK
PostgreSQL streaming: OK
wal_level: OK
directories: OK
retention policy settings: OK
backup maximum age: OK (no last_backup_maximum_age provided)
compression settings: OK
failed backups: OK (there are 0 failed backups)
minimum redundancy requirements: OK (have 2 backups, expected at least 2)
pg_basebackup: OK
pg_basebackup compatible: OK
pg_basebackup supports tablespaces mapping: OK (pg_basebackup can be used as long as tablespaces support is not required)
archive_mode: OK
archive_command: OK
pg_receivexlog: OK
pg_receivexlog compatible: OK
receive-wal running: OK
archiver errors: FAILED (unknown: 1)
diagnose info:
"pg92": {
"backups": {
"20161008T121110": {
"backup_id": "20161008T121110",
"backup_label": null,
"begin_offset": 32,
"begin_time": "Sat Oct 8 08:10:47 2016",
"begin_wal": "000000010000000A000000E3",
"begin_xlog": "A/E3000020",
"config_file": "/s2/postgres/data/postgresql.conf",
"deduplicated_size": 1890979321,
"end_offset": 16777216,
"end_time": "Sat Oct 8 16:10:43 2016",
"end_wal": "000000010000000A000000E3",
"end_xlog": "A/E4000000",
"error": null,
"hba_file": "/s2/postgres/data/pg_hba.conf",
"ident_file": "/s2/postgres/data/pg_ident.conf",
"included_files": null,
"mode": "postgres",
"pgdata": "/s2/postgres/data",
"server_name": "pg92",
"size": 1890979321,
"status": "DONE",
"tablespaces": null,
"timeline": 1,
"version": 90203
},
"20161012T105632": {
"backup_id": "20161012T105632",
"backup_label": null,
"begin_offset": 32,
"begin_time": "Wed Oct 12 02:56:35 2016",
"begin_wal": "000000010000000A000000ED",
"begin_xlog": "A/ED000020",
"config_file": "/s2/postgres/data/postgresql.conf",
"deduplicated_size": 1895742096,
"end_offset": 16777216,
"end_time": "Wed Oct 12 10:56:30 2016",
"end_wal": "000000010000000A000000ED",
"end_xlog": "A/EE000000",
"error": null,
"hba_file": "/s2/postgres/data/pg_hba.conf",
"ident_file": "/s2/postgres/data/pg_ident.conf",
"included_files": null,
"mode": "postgres",
"pgdata": "/s2/postgres/data",
"server_name": "pg92",
"size": 1895742096,
"status": "DONE",
"tablespaces": null,
"timeline": 1,
"version": 90203
}
},
"config": {
"active": true,
"archiver": true,
"archiver_batch_size": 0,
"backup_directory": "/backdisk/barman/pg92",
"backup_method": "postgres",
"backup_options": "concurrent_backup",
"bandwidth_limit": null,
"barman_home": "/backdisk/barman",
"barman_lock_directory": "/backdisk/barman",
"basebackup_retry_sleep": 300,
"basebackup_retry_times": 3,
"basebackups_directory": "/backdisk/barman/pg92/base",
"check_timeout": 30,
"compression": null,
"conninfo": "host=192.168.92.236 port=5432 user=lhs dbname=postgres password=pgpass",
"custom_compression_filter": null,
"custom_decompression_filter": null,
"description": "pg92 Postgresql Database (Streaming-Only)",
"disabled": false,
"errors_directory": "/backdisk/barman/pg92/errors",
"immediate_checkpoint": false,
"incoming_wals_directory": "/backdisk/barman/pg92/incoming",
"last_backup_maximum_age": null,
"minimum_redundancy": 2,
"msg_list": [],
"name": "pg92",
"network_compression": false,
"path_prefix": "/usr/lib/postgresql/9.2/bin",
"post_archive_retry_script": null,
"post_archive_script": null,
"post_backup_retry_script": null,
"post_backup_script": "/backdisk/barman_script/post_backup_script.sh",
"pre_archive_retry_script": null,
"pre_archive_script": null,
"pre_backup_retry_script": null,
"pre_backup_script": null,
"recovery_options": "",
"retention_policy": "window 7 w",
"retention_policy_mode": "auto",
"reuse_backup": null,
"slot_name": null,
"ssh_command": null,
"streaming_archiver": true,
"streaming_archiver_batch_size": 0,
"streaming_archiver_name": "barman_receive_wal",
"streaming_backup_name": "barman_streaming_backup",
"streaming_conninfo": "host=192.168.92.236 port=5432 user=lhs dbname=postgres password=pgpass",
"streaming_wals_directory": "/backdisk/barman/pg92/streaming",
"tablespace_bandwidth_limit": null,
"wal_retention_policy": "simple-wal 7 w",
"wals_directory": "/backdisk/barman/pg92/wals"
},
"status": {
"archive_command": "test ! -f /s2/postgres/archive/%f && cp %p /s2/postgres/archive/%f",
"archive_mode": "on",
"config_file": "/s2/postgres/data/postgresql.conf",
"connection_error": null,
"current_size": 1898866308.0,
"current_xlog": "000000010000000A000000F9",
"data_directory": "/s2/postgres/data",
"hba_file": "/s2/postgres/data/pg_hba.conf",
"ident_file": "/s2/postgres/data/pg_ident.conf",
"is_superuser": true,
"pg_basebackup_bwlimit": false,
"pg_basebackup_compatible": true,
"pg_basebackup_installed": true,
"pg_basebackup_path": "/usr/lib/postgresql/9.2/bin/pg_basebackup",
"pg_basebackup_tbls_mapping": false,
"pg_basebackup_version": "9.2.18",
"pg_receivexlog_compatible": true,
"pg_receivexlog_installed": true,
"pg_receivexlog_path": "/usr/lib/postgresql/9.2/bin/pg_receivexlog",
"pg_receivexlog_supports_slots": false,
"pg_receivexlog_synchronous": false,
"pg_receivexlog_version": "9.2.18",
"pgespresso_installed": false,
"replication_slot": null,
"replication_slot_support": false,
"server_txt_version": "9.2.3",
"streaming": true,
"streaming_supported": true,
"synchronous_standby_names": [
""
],
"systemid": "5851801377755352795",
"timeline": 1,
"wal_level": "hot_standby",
"xlogpos": "A/F92DE440"
}
},
In case of tablespaces, Barman 1.5.0 does not correctly calculate the backup size. Please review the backup_fsync_and_set_sizes() function.
Thanks to Matthew Oldham for pointing it out.
I've noticed that the "barman check" output includes a "failed backups" check. However, it doesn't seem to count correctly. I think the problem stems from barman list-backup and the fact that a backup that loses its PID never fails.
See backup:
[barman@em-vus-pgbuilder ~]$ barman list-backup ros_management
ros_management 20160531T165506 - STARTED
Even though I deliberately killed the PID here (I'm trying to write some backup failure alerts), the barman backup status remains "started" indefinitely and never turns into a "failed" state. I assume that this is because there's no longer any communication, but I'd expect a timeout or perhaps some kind of PID failure detection?
Thanks and keep up the great work
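For the PID failure detection idea, a minimal sketch of a liveness check on POSIX systems: sending signal 0 performs the existence and permission checks without delivering anything. The function name pid_is_running is hypothetical, and where Barman would record the backup's PID is not covered here.

```python
import errno
import os


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only).

    Hypothetical helper for illustration only.
    """
    try:
        # Signal 0 checks for existence without sending a signal.
        os.kill(pid, 0)
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            return False  # no such process
        if exc.errno == errno.EPERM:
            return True  # process exists but belongs to another user
        raise
    return True
```

A backup still marked STARTED whose recorded PID fails this check could then be flagged as failed. PIDs can be reused, so this is a heuristic rather than a guarantee.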
We had a backup reported as FAILED, with the following error in the logs:
rsync: read errors mapping "/our/pgdata/dir/base/16389/4452859": No data available (61)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1518) [generator=3.0.9]
As far as I can see, rsync can throw this error on files that are truncated during transfer. Our setup has a lot of volatile (bursts of) data, where it is likely that (auto)vacuum can truncate a file once in a while.
During the backup, the number of dead tuples decreased from 130K to about 30K, which makes it more likely it was actually a truncate due to a vacuum.
I don't have logs explicitly stating that it was a vacuum however.
To provide a more secure setup we changed from trust authentication to password-based authentication for our backup user, using a .pgpass file to store the credentials. This apparently isn't recognized by barman, or at least I have not configured barman to take advantage of it.
Running barman check dbhost returns an error:
Server dbhost:
ssh: OK
PostgreSQL: FAILED
directories: OK
retention policy settings: OK
backup maximum age: OK (no last_backup_maximum_age provided)
compression settings: OK
minimum redundancy requirements: OK (have 1 backups, expected at least 1)
barman backup dbhost states:
Starting backup for server dbhost in /home/barman/dbhost/base/20160822T162300
ERROR: Backup failed issuing start backup command.
DETAILS: Cannot connect to postgres: fe_sendauth: no password supplied
The documentation indicates support for using trust, but is password based authentication not at all supported?
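For reference, libpq looks for .pgpass in the home directory of the operating system user running the client, expects lines of the form hostname:port:database:username:password, and ignores the file if its permissions are looser than 0600. So the file has to belong to the user that runs barman, not to the postgres user on the database host. A minimal sketch that writes such a file with the right mode follows; the function name write_pgpass is hypothetical, and escaping of ":" and "\" inside fields is not handled.

```python
import os


def write_pgpass(path, host, port, database, user, password):
    """Write a single-entry password file readable only by its owner.

    Hypothetical helper. Fields containing ':' or a backslash would need
    escaping, which this sketch does not do.
    """
    line = ":".join([host, str(port), database, user, password]) + "\n"
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(line)
    # Enforce the mode even if the file already existed with other bits.
    os.chmod(path, 0o600)
```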
Barman does not work with the archive_mode = always setting. This setting is needed to archive WAL from a slave, e.g. for a cascading slave or a warm standby setup.
I had the following message with this setting:
2016-03-08 05:56:31 UTC [2181-1] postgres@postgres ERROR: invalid input syntax for type boolean: "always"
2016-03-08 05:56:31 UTC [2181-2] postgres@postgres STATEMENT: SELECT *, current_setting('archive_mode')::BOOLEAN AND (last_failed_wal IS NULL OR last_failed_wal LIKE '%.history' AND substring(last_failed_wal from 1 for 8) <= substring(last_archived_wal from 1 for 8) OR last_failed_wal <= last_archived_wal) AS is_archiving, CAST (archived_count AS NUMERIC) / EXTRACT (EPOCH FROM age(now(), stats_reset)) AS current_archived_wals_per_second FROM pg_stat_archiver
2016-03-08 05:56:31 UTC [2181-3] postgres@postgres ERROR: current transaction is aborted, commands ignored until end of transaction block
2016-03-08 05:56:31 UTC [2181-4] postgres@postgres STATEMENT: SELECT count(*) FROM pg_extension WHERE extname = 'pgespresso'
This happens because Barman tries to cast archive_mode to boolean. archive_mode changed from a boolean to an enum in PG 9.5.
I'd suggest correcting postgres.py, lines 380 to 391, as well as its caller.
It is similar to this issue: https://sourceforge.net/p/pgbarman/tickets/77/, which was resolved by this commit: dcb22e8.
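A minimal sketch of the check on the Python side instead of casting in SQL, treating both "on" and "always" as enabled. The function name archive_mode_enabled is hypothetical and this is not the content of postgres.py.

```python
def archive_mode_enabled(value):
    """Interpret the archive_mode setting as returned by current_setting().

    Hypothetical helper: 'on' and 'always' mean that archiving is enabled,
    anything else (normally 'off') means it is not.
    """
    return value.strip().lower() in ("on", "always")
```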
Environment:
Here is some info from my pg_stat_archiver:
postgres=# select last_archived_wal, last_archived_time, last_failed_wal, last_failed_time from pg_stat_archiver;
last_archived_wal | last_archived_time | last_failed_wal | last_failed_time
--------------------------+-------------------------------+--------------------------+-------------------------------
000000010000006D0000004D | 2016-05-31 02:04:20.222315+02 | 000000010000006700000094 | 2016-05-22 02:23:37.379004+02
But when I try part of your select (last_failed_wal <= last_archived_wal), it gives me FALSE:
postgres=# select last_failed_wal <= last_archived_wal as result from pg_stat_archiver;
result
--------
f
You can also try select '000000010000006700000094' <= '000000010000006D0000004D';
Same as select '7' < 'D';
I think this is the wrong comparison, because 000000010000006700000094 is in fact older than 000000010000006D0000004D. Should the comparison last_failed_wal <= last_archived_wal be replaced with last_failed_time <= last_archived_time?
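Text comparison in PostgreSQL follows the database collation, so the result of comparing segment names as strings may differ between locales. A minimal sketch of a locale-independent alternative, done on the client side, is to compare the names as hexadecimal numbers. The function names are hypothetical, and .history names would need separate handling.

```python
def wal_sort_key(name):
    """Interpret a 24-character WAL segment name as one hexadecimal number."""
    return int(name, 16)


def failed_not_after_archived(last_failed_wal, last_archived_wal):
    """True if the last failed segment is not newer than the last archived one."""
    return wal_sort_key(last_failed_wal) <= wal_sort_key(last_archived_wal)
```

With the values from the pg_stat_archiver output above, failed_not_after_archived("000000010000006700000094", "000000010000006D0000004D") is True, which matches the expectation that the failed segment is the older one.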
Hello,
We are trying to prepare PostgreSQL 9.5 as a new production environment. But we are encountering issues with barman, which we don't get on PostgreSQL 9.4.
After every backup we automatically restore the database and run some tests on it. Because we are running multiple database instances, all installed via puppet, we need to relink the tablespaces. (else multiple databases would try to deploy in the same directories).
To enable this we use the '--tablespace' option: --tablespace=audit_null:/pgdata/dwhdevonhw/audit/null
This seems to work correctly, because this is what we see when restoring for said tablespace:
16422, audit_null, /pgdata/dwhdevonhw/audit/null
The restore creates the correct dir, and restores the data correctly:
$ ls -la /pgdata/dwhdevonhw/audit/null/
total 0
drwx------. 3 barman barman 37 Jun 27 16:41 .
drwxrwxr-x. 7 barman barman 87 Sep 27 12:52 ..
drwx------. 3 barman barman 26 Jun 27 16:42 PG_9.5_201510051
But the relinking didn't happen:
ls -la /mnt/data/restoretests/dwhdevonhw/pg_tblspc/16422
lrwxrwxrwx. 1 barman barman 18 Sep 27 13:45 /mnt/data/restoretests/dwhdevonhw/pg_tblspc/16422 -> /pgdata/audit/null
The odd thing is that this relinking does work on PostgreSQL 9.4
Does anyone have an idea what is going wrong?
Bert
Another failing backup problem. We're seeing this pretty often, too. This is backing up a postgres 9.1.9 instance with barman 1.5.1 running with Python 2.6.6 (system python on Centos6) installed using pip.
2015-11-23 01:00:54,147 [8642] barman.backup_executor INFO: Copy done.
2015-11-23 01:00:54,151 [8642] barman.backup_executor INFO: Asking PostgreSQL server to finalize the backup.
2015-11-23 01:00:54,839 [8642] barman.backup ERROR: Backup failed issuing start backup command.
DETAILS: Cannot terminate exclusive backup. You might have to manually execute pg_stop_backup() on your PostgreSQL server
2015-11-23 01:00:55,455 [32660] barman.cli ERROR: 'NoneType' object has no attribute 'rfind'
See log file for more details.
Traceback (most recent call last):
File "/usr/local/barman/lib/python2.6/site-packages/barman-1.5.1-py2.6.egg/barman/cli.py", line 865, in main
p.dispatch(pre_call=global_config)
File "build/bdist.linux-x86_64/egg/argh/helpers.py", line 55, in dispatch
return dispatch(self, *args, **kwargs)
File "build/bdist.linux-x86_64/egg/argh/dispatching.py", line 174, in dispatch
for line in lines:
File "build/bdist.linux-x86_64/egg/argh/dispatching.py", line 277, in _execute_command
for line in result:
File "build/bdist.linux-x86_64/egg/argh/dispatching.py", line 231, in _call
result = function(namespace_obj)
File "/usr/local/barman/lib/python2.6/site-packages/barman-1.5.1-py2.6.egg/barman/cli.py", line 470, in list_files
for line in backup_id.get_list_of_files(args.target):
File "/usr/local/barman/lib/python2.6/site-packages/barman-1.5.1-py2.6.egg/barman/infofile.py", line 524, in get_list_of_files
for x in self.get_required_wal_segments():
File "/usr/local/barman/lib/python2.6/site-packages/barman-1.5.1-py2.6.egg/barman/xlog.py", line 163, in enumerate_segments
end_tli, end_log, end_seg = decode_segment_name(end)
File "/usr/local/barman/lib/python2.6/site-packages/barman-1.5.1-py2.6.egg/barman/xlog.py", line 125, in decode_segment_name
name = os.path.basename(path)
File "/usr/lib64/python2.6/posixpath.py", line 111, in basename
i = p.rfind('/') + 1
AttributeError: 'NoneType' object has no attribute 'rfind'
We've configured barman-wal-restore for a standby server (loosely coupled to the production system).
We noticed some tracebacks in barman.log because get-wal tries to decompress a WAL file which is not complete.
2015-10-25 12:22:03,265 [14064] barman.cli ERROR: {'err': u'\ngzip: stdin: unexpected end of file\n', 'ret': 1, 'out': u''}
See log file for more details.
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/barman/cli.py", line 828, in main
p.dispatch(pre_call=global_config)
File "/usr/lib/python2.7/dist-packages/argh/helpers.py", line 53, in dispatch
return dispatch(self, *args, **kwargs)
File "/usr/lib/python2.7/dist-packages/argh/dispatching.py", line 124, in dispatch
for line in lines:
File "/usr/lib/python2.7/dist-packages/argh/dispatching.py", line 200, in _execute_command
for line in result:
File "/usr/lib/python2.7/dist-packages/argh/dispatching.py", line 156, in _call
result = args.function(args)
File "/usr/lib/python2.7/dist-packages/barman/cli.py", line 530, in get_wal
output_directory=output_directory)
File "/usr/lib/python2.7/dist-packages/barman/server.py", line 1180, in get_wal
wal_compressor.decompress(source_file, uncompressed_file.name)
File "/usr/lib/python2.7/dist-packages/barman/command_wrappers.py", line 118, in __call__
self.getoutput(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/barman/command_wrappers.py", line 158, in getoutput
ret=self.ret, out=self.out, err=self.err))
CommandFailedException: {'err': u'\ngzip: stdin: unexpected end of file\n', 'ret': 1, 'out': u''}
The partial log files should be saved with a .tmp extension and moved (mv) to their final location once they are compressed.
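To illustrate the suggestion, here is a minimal sketch of the "write to .tmp, then rename" pattern. It is not Barman's code: the function name `compress_wal_atomically` and the use of Python's `gzip` module (instead of an external gzip command) are assumptions made for the example.

```python
import gzip
import os
import shutil


def compress_wal_atomically(source_path, dest_path):
    """Compress source_path into dest_path without exposing a partial file.

    Hypothetical helper: readers of dest_path either see no file at all
    or the complete compressed file, never a truncated one.
    """
    tmp_path = dest_path + ".tmp"
    try:
        with open(source_path, "rb") as src, gzip.open(tmp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Renaming is atomic when both paths are on the same filesystem.
        os.replace(tmp_path, dest_path)
    except Exception:
        # Do not leave a half-written temporary file behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dest_path
```

With this approach a concurrent reader such as get-wal would only ever find the final name once compression has finished.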
I know barman has list-files, which almost works for exporting (it doesn't list empty directories, so they don't get exported). But there is no way to import a backup back into barman.
It would be awesome if there were an easy way to export backups from barman and import them back.
WHY?
Maybe we want to keep yearly backups, or for some business purpose we need a snapshot of the DB as of some particular day. Currently, to do that you have to turn off the retention policy and manually delete backups.
It would be awesome if, aside from the retention policy, you could either mark a backup as 'special' (so the retention policy won't auto-delete it) or, as a much more generic solution, export and import backups (you could then dump them into S3, Amazon Glacier, or wherever, and keep them forever if desired).
Hi guys,
While testing the beta version of the 1.6.0 release, I've found that WALs shipped through a streaming connection are removed from the archive if WAL compression is activated (regardless of the chosen algorithm).
Here is what the Barman log reported when pigz was chosen as the compression algorithm:
2016-02-08 10:06:01,564 [26066] barman.wal_archiver INFO: Archiving master/00000001000000000000005C
2016-02-08 10:06:01,565 [26066] barman.command_wrappers DEBUG: Command: 'command(){ pigz -c > "$2" < "$1";}; command \'/srv/master/incoming/00000001000000000000005C\' \'/srv/master/wals/0000000100000000/00000001000000000000005C.tmp\''
2016-02-08 10:06:02,720 [26066] barman.command_wrappers DEBUG: Command return code: 0
2016-02-08 10:06:02,721 [26066] barman.command_wrappers DEBUG: Command stdout:
2016-02-08 10:06:02,721 [26066] barman.command_wrappers DEBUG: Command stderr:
2016-02-08 10:06:02,733 [26066] barman.wal_archiver INFO: Archiving master/00000001000000000000005C
2016-02-08 10:06:02,733 [26066] barman.command_wrappers DEBUG: Command: 'command(){ pigz -c -d > "$2" < "$1" && rm -f "$1";}; command \'/srv/master/wals/0000000100000000/00000001000000000000005C\' \'/srv/master/wals/0000000100000000/00000001000000000000005C.uncompressed\''
2016-02-08 10:06:02,844 [26066] barman.command_wrappers DEBUG: Command return code: 0
2016-02-08 10:06:02,845 [26066] barman.command_wrappers DEBUG: Command stdout:
2016-02-08 10:06:02,845 [26066] barman.command_wrappers DEBUG: Command stderr:
It looks like Barman compresses the WAL, then decompresses and removes it, each time a WAL is archived. We are investigating the issue.
When trying to remove a backup from a server, barman exits with the following exception:
EXCEPTION: 'str' object has no attribute 'backup_id'
How to replicate the error:
When running barman replication-status server-id, I get this error:
EXCEPTION: 'Record' object has no attribute 'slot_name'
See log file for more details.
The log:
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/barman/cli.py", line 1022, in main
p.dispatch(pre_call=global_config)
File "/usr/lib/python2.7/dist-packages/argh/helpers.py", line 53, in dispatch
return dispatch(self, *args, **kwargs)
File "/usr/lib/python2.7/dist-packages/argh/dispatching.py", line 125, in dispatch
for line in lines:
File "/usr/lib/python2.7/dist-packages/argh/dispatching.py", line 202, in _execute_command
for line in result:
File "/usr/lib/python2.7/dist-packages/argh/dispatching.py", line 158, in _call
result = args.function(args)
File "/usr/lib/python2.7/dist-packages/barman/cli.py", line 287, in replication_status
server.replication_status(args.target)
File "/usr/lib/python2.7/dist-packages/barman/server.py", line 1791, in replication_status
standby_info)
File "/usr/lib/python2.7/dist-packages/barman/output.py", line 254, in result
_dispatch(_writer, 'result', command, *args, **kwargs)
File "/usr/lib/python2.7/dist-packages/barman/output.py", line 127, in _dispatch
return handler(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/barman/output.py", line 839, in result_replication_status
if standby.slot_name:
AttributeError: 'Record' object has no attribute 'slot_name'
Fix the barman status command output: Last archived WAL appears twice
# barman status main
Server main:
Description: main PostgreSQL Database
Active: True
Disabled: False
PostgreSQL version: 9.3.10
pgespresso extension: Not available
PostgreSQL Data directory: /pgdata
PostgreSQL 'archive_command' setting: rsync -a %p barman@backup:/main/incoming/%f
Last archived WAL: 00000003.history
Current WAL segment: 00000004000038E0000000A5
Retention policies: enforced (mode: auto, retention: REDUNDANCY 1, WAL retention: MAIN)
No. of available backups: 1
First available backup: 20160120T000004
Last available backup: 20160120T000004
Minimum redundancy requirements: satisfied (1/1)
Last archived WAL: 00000003.history
Last archived WAL could be inaccurate, not showing the latest archived WAL but an older file.
barman 2.0 base backup not working for PostgreSQL 9.6
The server version is 9.6 (server pg9.6). Running:
barman backup pg9.6
blocks and never returns.
We copied our PostgreSQL server to a second machine for tests and troubleshooting.
Unfortunately, we forgot to turn off archive_mode before starting PostgreSQL.
The consequence is that both servers rsync their WAL files to the same /incoming/ directory, and the incremental archive becomes unusable.
Maybe barman could give more prominent warnings in such a case.
Alternatively, the documentation could suggest an archive_command which prevents such a mistake.
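One way such a guarded archive_command could look is sketched below: a small wrapper that only ships the WAL when the machine's hostname matches the one expected to archive. This is an illustration, not something Barman provides; the function names, the argument order, and the idea of comparing hostnames are all assumptions.

```python
import socket
import subprocess
import sys


def build_archive_command(expected_host, wal_path, wal_name, destination,
                          current_host=None):
    """Return the rsync argv, or None when this host must not archive.

    Hypothetical guard: a cloned server keeps the same configuration
    but has a different hostname, so it refuses to ship WAL files.
    """
    if current_host is None:
        current_host = socket.gethostname()
    if current_host != expected_host:
        return None
    target = "%s/%s" % (destination.rstrip("/"), wal_name)
    return ["rsync", "-a", wal_path, target]


def archive_main(argv):
    """Entry point sketch: argv = [prog, expected_host, %p, %f, destination]."""
    if len(argv) != 5:
        sys.stderr.write("usage: expected_host wal_path wal_name destination\n")
        return 2
    command = build_archive_command(*argv[1:5])
    if command is None:
        sys.stderr.write("refusing to archive: unexpected hostname\n")
        # A non-zero exit status tells PostgreSQL the segment was not archived.
        return 1
    return subprocess.call(command)
```

A script wrapping this would end with `sys.exit(archive_main(sys.argv))` and be referenced from archive_command with the expected hostname, %p, %f, and the incoming directory as arguments. On the cloned machine every archive attempt would then fail visibly in the PostgreSQL log instead of polluting the shared directory.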
Barman 1.5.0 does not work correctly with PostgreSQL 8.4, due to a check of the wal_level parameter.
The parameter does not exist on PostgreSQL 8.4.
PostgreSQL checks should be managed differently, always taking older versions into account.
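A version-aware check could be sketched as follows. This is not Barman's implementation: `check_wal_level` and the `get_setting` callback are hypothetical names, and the only facts relied upon are that wal_level first appeared in PostgreSQL 9.0 and that the value minimal does not allow WAL archiving.

```python
# wal_level was introduced in PostgreSQL 9.0 (server_version_num 90000).
WAL_LEVEL_MIN_VERSION = 90000


def check_wal_level(server_version_num, get_setting):
    """Return a (status, detail) tuple for the wal_level check.

    get_setting is a hypothetical callback that returns the value of a
    PostgreSQL setting; it is never called on servers older than 9.0.
    """
    if server_version_num < WAL_LEVEL_MIN_VERSION:
        return ("SKIPPED", "wal_level does not exist before PostgreSQL 9.0")
    value = get_setting("wal_level")
    if value == "minimal":
        return ("FAILED", "wal_level 'minimal' does not allow WAL archiving")
    return ("OK", value)
```

For an 8.4 server (for example server_version_num 80422) the setting is never queried, so the check cannot fail on the missing parameter.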
Hello.
I would like to discuss page-level incremental backups.
I've created a proof-of-concept fork of barman here
There are no docs or unit tests right now, but this will be fixed in the near future.
Motivation:
We have a large number of databases with a pgdata size of about 3 terabytes, where about 1% of the data changes per 24h.
Unfortunately, barman backups with hardlinks give us only about a 45% deduplication ratio (there are small changes in many data files, so many data files change between backups, but the ratio of changed pages is about 2%).
The solution to this problem seems simple: back up only the changed pages.
I've created a simple script named barman-incr (it is in the bin dir of the source code). It handles backup and restore operations. Barman runs it on the database host and passes the LSN, timestamp, and list of files from the previous backup. Then we open each data file and read every page in it (if the file we opened turns out not to be a data file, we take all of it). If a page's LSN is >= the provided LSN, we include that page in the backup.
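The page selection step can be sketched roughly like this. It is not the barman-incr script itself: the function names are made up for illustration, and the sketch assumes the default 8192-byte block size and a little-endian host. It relies on the PostgreSQL page header starting with pd_lsn, stored as two 32-bit integers (high part, then low part).

```python
import struct

PAGE_SIZE = 8192  # assumption: default PostgreSQL block size (BLCKSZ)


def parse_lsn(text):
    """Convert an LSN in 'X/Y' hexadecimal notation to an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def changed_pages(data_file, since_lsn, page_size=PAGE_SIZE, byte_order="<"):
    """Yield (page_number, page_bytes) for pages with LSN >= since_lsn.

    data_file is a binary file object; byte_order "<" assumes the data
    file was written on a little-endian machine.
    """
    page_number = 0
    while True:
        page = data_file.read(page_size)
        if not page:
            break
        if len(page) < page_size:
            # Trailing partial page: not a regular page, take it as is.
            yield page_number, page
            break
        # pd_lsn is the first 8 bytes of the page header.
        high, low = struct.unpack_from(byte_order + "II", page, 0)
        if ((high << 32) | low) >= since_lsn:
            yield page_number, page
        page_number += 1
```

A real tool would also have to record file lengths and deal with pages that are still all zeros (their LSN is 0), which this sketch does not attempt.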
Some tests:
Database with a pgdata size of 2.7T and 120G of WALs per 24h.
Full backup size is 537G (compressed with gzip -3); time to take the backup: 7h.
Incremental backup size is 14G (also compressed with gzip -3); time to take the backup: 30m.
I've also tested restore consistency (restored the database to some point in time and compared the pg_dump result with a paused replica).
A block change tracking implementation (Oracle DBAs should be familiar with this; here is a white paper about it) will require some changes in the WAL archiving process. I'll present some thoughts and test results on this in Q1 2016.
Backups are failing alarmingly often for us because of this error:
2015-11-23 21:40:19,148 [21291] barman.backup ERROR: Backup failed copying files.
DETAILS: data transfer failure on directory '/usr/local/pgsql/data'
rsync error:
... list of files here ...
rsync: link_stat "/usr/local/pgsql/data/base/pgsql_tmp/pgsql_tmp14179.115567" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1505) [generator=3.0.6]
We are using Postgres 9.4.5, rsync 3.0.6, and barman 1.5.1. I see in the code that this error is supposed to be ignored, yet our backup definitely failed according to barman:
postgres@xxx:/usr/lusers/plockaby$ barman list-backup xxx
xxx 20151124T211803 - Tue Nov 24 21:50:25 2015 - Size: 38.5 GiB - WAL Size: 897.7 MiB
xxx 20151123T211803 - FAILED
xxx 20151122T211803 - Sun Nov 22 21:39:20 2015 - Size: 33.8 GiB - WAL Size: 20.0 GiB
xxx 20151121T211804 - Sat Nov 21 21:38:43 2015 - Size: 33.4 GiB - WAL Size: 6.4 GiB
xxx 20151120T211803 - FAILED
xxx 20151119T211803 - FAILED
xxx 20151119T171220 - Thu Nov 19 17:44:21 2015 - Size: 32.7 GiB - WAL Size: 15.6 GiB
xxx 20151119T112806 - Thu Nov 19 11:44:58 2015 - Size: 32.5 GiB - WAL Size: 2.8 GiB
Could we have a F.A.Q. entry about PostgreSQL major upgrades?
We are about to upgrade some servers from 9.1 to 9.4, and I've found no hints on how to stay safe on the Barman side.
Found this similar question on SF: http://sourceforge.net/p/pgbarman/tickets/34/
When executing replication-status, I get a misleading exception message (unable to connect).
The reason is that the replication-status code uses pg_xlog_location_diff, which was introduced in 9.2.
2016-07-12 16:06:02,790 [7734] barman.postgres DEBUG: Error retrieving status of standby servers: function pg_xlog_location_diff(text, text) does not exist
LINE 1: ...te , CASE WHEN pg_is_in_recovery() THEN NULL ELSE pg_xlog_lo...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
2016-07-12 16:06:02,812 [7734] barman.server ERROR: Unable to connect to server XXXXX
Hello, is there a complete example setup for backing up Postgres with barman?
We use Kubernetes and would like to back up our Postgres with a barman sidecar container. Does anyone have hints or examples for us?
Regards, Josef
EDIT: we use Postgres 9.5
I ran the same backup recover command again, after 30 minutes, and I noticed that the rsync command transfers only the changed files, except for the pg_xlog/ content, which is purged and rewritten on the destination server.
The recovery could be faster if we preserved the WALs when launching the command again.
Maybe the pg_xlog content could be appended to the exclude_and_protect variable for rsync.
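As a rough illustration of the idea, the sketch below turns a list of patterns into rsync options that both skip the pattern on the source and protect matching files on the destination from deletion. The helper name `exclude_and_protect_options` and the `/pg_xlog/*` pattern are assumptions for the example; only the rsync `--exclude` option and the `P` (protect) filter rule are standard rsync features.

```python
def exclude_and_protect_options(patterns):
    """Build rsync options that exclude and protect each pattern.

    --exclude keeps rsync from copying matching source files, and the
    'P' filter rule protects matching destination files from deletion.
    """
    options = []
    for pattern in patterns:
        options.append("--exclude=%s" % pattern)
        options.append("--filter=P_%s" % pattern)
    return options


# Hypothetical usage: keep WAL files already present on the destination.
PG_XLOG_OPTIONS = exclude_and_protect_options(["/pg_xlog/*"])
```

With options like these, a second recovery run would leave WAL files that already exist in the destination pg_xlog directory in place, and only the missing ones would need to be copied by a separate step.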
For standard PostgreSQL connections, it might be useful to check if the specified user has superuser privileges, in order to improve output messages - as reported by Grzegorz Polek (https://sourceforge.net/p/pgbarman/tickets/79/).
We could use a query like the following to check that:
SELECT usesuper FROM pg_user WHERE usename = CURRENT_USER;
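A minimal sketch of how that query might be wrapped, assuming a DB-API 2.0 cursor (such as one obtained from psycopg2) is already available. The function name `is_superuser` is hypothetical and this is not Barman's code.

```python
SUPERUSER_QUERY = "SELECT usesuper FROM pg_user WHERE usename = CURRENT_USER"


def is_superuser(cursor):
    """Return True when the connected user has superuser privileges.

    cursor is any DB-API 2.0 cursor on an open PostgreSQL connection.
    """
    cursor.execute(SUPERUSER_QUERY)
    row = cursor.fetchone()
    # No row at all is treated as "not a superuser".
    return bool(row and row[0])
```

The result could then be used to print a clearer message when a command needs superuser access, instead of failing later with a permission error.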