pglookout's People

Contributors

aiqin-aiven, alexole, alxric, aris-aiven, carobme, docemmetbrown, egor-voynov-aiven, facetoe, ivanyu, jankatins, jlprat, kathia-barahona, kmichel-aiven, melor, narsimoes, nkchern-avn, oikarinen, ojarva, orange-kao, ormod, packi, peekeri, rdunklau, rikonen, rushidave, saaros, sjamgade

pglookout's Issues

No alert when pglookout fails to connect to postgres

It would be great if pglookout could create an alert file when it has problems connecting to postgres, whether because of a configuration error or a network problem. Maybe it could even raise separate alerts for a configuration error (such as an authentication error) and for a network problem?

Currently this only results in errors in logs that nobody may be looking at.

2015-01-19T10:06:29.330464+00:00 skyblue-fsio-ddsbe02a local2.warning<148> pglookout WARNING: No knowledge on if: u'1.2.3.4' {u'connection': False, u'fetch_time': u'2015-01-19T10:06:23.412502Z'} from observer: '1.2.3.5' is in recovery
2015-01-19T10:06:29.330472+00:00 skyblue-fsio-ddsbe02a local2.warning<148> pglookout WARNING: No known master node, disconnected masters: []
2015-01-19T10:06:29.330476+00:00 skyblue-fsio-ddsbe02a local2.warning<148> pglookout WARNING: No standby nodes set, master node: None
2015-01-19T10:06:29.993308+00:00 skyblue-fsio-ddsbe02a local2.warning<148> ClusterMonitor WARNING: Problem in connecting to DB at: '1.2.3.4'
2015-01-19T10:06:30.026400+00:00 skyblue-fsio-ddsbe02a local2.warning<148> ClusterMonitor WARNING: Problem in connecting to DB at: '1.2.3.6'
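
One way the requested behaviour could look is sketched below. This is only an illustration and not pglookout's implementation: the alert file names, the `classify_connection_error` heuristic and the `create_alert_file` helper are all assumptions made for the example, and the message matching relies on the wording of common PostgreSQL error messages.

```python
import os

# Hypothetical alert file names, chosen for this sketch only.
AUTH_ALERT = "postgres_authentication_error"
NETWORK_ALERT = "postgres_connection_error"

# Substrings that typically appear in PostgreSQL authentication failures.
AUTH_MARKERS = ("password authentication failed", "no pg_hba.conf entry")


def classify_connection_error(message: str) -> str:
    """Guess which alert applies from a driver error message (heuristic)."""
    lowered = message.lower()
    if any(marker in lowered for marker in AUTH_MARKERS):
        return AUTH_ALERT
    # Everything else is treated as a network-level problem.
    return NETWORK_ALERT


def create_alert_file(alert_dir: str, name: str) -> str:
    """Create (or overwrite) an alert file and return its path."""
    path = os.path.join(alert_dir, name)
    with open(path, "w") as fp:
        fp.write("alert")
    return path
```

A monitoring system could then watch the alert directory instead of parsing the logs.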

Slave promotion

I am currently evaluating pglookout for our needs. One scenario is as follows:
We have three sites, one with the master database (1), two with one slave database each: (2) and (3).
pglookout is running on all three sites. The pglookout on site (3) is also used as an observer for sites (1) and (2).

Starting from a stable situation, we make communication between the sites (1) and (2) impossible. Site (3) can still communicate with both site (1) and (2).

Now pglookout on site (2) promotes the slave to master, even though the observer reports that it can still see a master on site (1). It seems that pglookout does not take the information from the observer into account at all when deciding whether to promote.

Should the promotion decision not be based on observer data too?
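
To make the question concrete, here is a minimal sketch of a promotion decision that also consults observer data. It is not pglookout's actual logic; the function names and the shape of `observer_states` are assumptions for illustration. Only the `connection` key is taken from the log format shown in pglookout's warnings.

```python
from typing import Dict


def master_visible_to_observers(
    observer_states: Dict[str, Dict[str, dict]], master: str
) -> bool:
    """Return True if any observer reports a working connection to master.

    observer_states maps observer name -> {node name -> node state}.
    """
    for nodes in observer_states.values():
        state = nodes.get(master)
        if state is not None and state.get("connection"):
            return True
    return False


def should_promote(
    own_connection_to_master: bool,
    observer_states: Dict[str, Dict[str, dict]],
    master: str,
) -> bool:
    """Promote only if neither this node nor any observer sees the master."""
    if own_connection_to_master:
        return False
    return not master_visible_to_observers(observer_states, master)
```

In the scenario above, site (2) has lost its own connection to site (1), but the observer on site (3) still reports `connection: True` for it, so `should_promote` would return `False`.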

Unable to install pglookout-2.0.2.tar.gz on Rocky 9.2 Linux server with python39 version

Hi Team,

I am trying to install pglookout-2.0.2.tar.gz on Rocky 9.2 OS where we have python39 installed:

python3-3.9.16-1.el9_2.2.x86_64


> python --version
> Python 3.9.16

The setuptools version installed on the server is:

rpm -qa | grep python3-setuptools
python3-setuptools-wheel-53.0.0-12.el9.noarch
python3-setuptools-53.0.0-12.el9.noarch

While trying to install pglookout I get the error below:

/usr/bin/python3.9 -m pip install pglookout-2.0.2.tar.gz
Processing ./pglookout-2.0.2.tar.gz
Installing build dependencies ... error
ERROR: Command errored out with exit status 1:
command: /usr/bin/python3.9 /tmp/pip-standalone-pip-0ww8orgl/env_pip.zip/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-x41fhgec/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'setuptools>=40.8.0' wheel
cwd: None
Complete output (7 lines):
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f032e6d4c70>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /simple/setuptools/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f032e98aa30>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /simple/setuptools/
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f032e98a550>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /simple/setuptools/
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f032e6d4f40>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /simple/setuptools/
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f032e6d4970>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /simple/setuptools/
ERROR: Could not find a version that satisfies the requirement setuptools>=40.8.0 (from versions: none)
ERROR: No matching distribution found for setuptools>=40.8.0


WARNING: Discarding file:///var/pginstaller16/pglookout-2.0.2.tar.gz. Command errored out with exit status 1: /usr/bin/python3.9 /tmp/pip-standalone-pip-0ww8orgl/env_pip.zip/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-x41fhgec/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'setuptools>=40.8.0' wheel Check the logs for full command output.
ERROR: Command errored out with exit status 1: /usr/bin/python3.9 /tmp/pip-standalone-pip-0ww8orgl/env_pip.zip/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-x41fhgec/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'setuptools>=40.8.0' wheel Check the logs for full command output.

python3.9 -m pip list | grep setuptool
setuptools               53.0.0
python3.9 -m pip list | grep pip
pip                      21.2.3

I have a newer version of setuptools installed (53.0.0), but pip still tries to download setuptools>=40.8.0. Any suggestions would be helpful.

Cannot install on Debian bullseye

dpkg-checkbuilddeps: error: Unmet build dependencies: dh-systemd (>= 1.2.2)

dh-systemd is not available on Debian bullseye.

pglookout: OperationalError (could not connect to server: Connection refused

Hi Team,

I need your suggestions on why pglookout logs the OperationalError warning shown below.

pglookout logs:

Nov 18 09:14:49 vm-db1 pglookout: 2021-11-18 09:14:49,630#011ClusterMonitor#011Thread-1#011INFO#011Connecting to 'vm-db2.nokia.local' (dbname='postgres' host='vm-db2.nokia.local' user='pglookout'; hidden password)
Nov 18 09:14:49 vm-db1 pglookout: 2021-11-18 09:14:49,632#011ClusterMonitor#011Thread-1#011WARNING#011OperationalError (could not connect to server: Connection refused
Nov 18 09:14:49 vm-db1 pglookout: Is the server running on host "vm-db2.nokia.local" (192.9.0.12) and accepting
Nov 18 09:14:49 vm-db1 pglookout: TCP/IP connections on port 5432?) connecting to vm-db2.nokia.local ('vm-db2.nokia.local' (dbname='postgres' host='vm-db2.nokia.local' user='pglookout'; hidden password))
Nov 18 09:14:49 vm-db1 pglookout: 2021-11-18 09:14:49,634#011ClusterMonitor#011ThreadPoolExecutor-3_0#011WARNING#011OperationalError (server closed the connection unexpectedly

NoneType error on observer nodes

I'm constantly getting a NoneType error on observer nodes. It looks like it's trying to get the state of own_db, which is None for observers.

Here's the stack trace:
Jul 26 14:53:04 ip-10-0-152-46 pglookout MainThread ERROR: Failed to check cluster state
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/pglookout-1.3.1-py2.7.egg/pglookout/pglookout.py", line 583, in main_loop
    self.check_cluster_state()
  File "/usr/local/lib/python2.7/dist-packages/pglookout-1.3.1-py2.7.egg/pglookout/pglookout.py", line 303, in check_cluster_state
    self.emit_stats(own_state)
  File "/usr/local/lib/python2.7/dist-packages/pglookout-1.3.1-py2.7.egg/pglookout/pglookout.py", line 278, in emit_stats
    if self.is_restoring_or_catching_up_normally(state):
  File "/usr/local/lib/python2.7/dist-packages/pglookout-1.3.1-py2.7.egg/pglookout/pglookout.py", line 259, in is_restoring_or_catching_up_normally
    replication_start_time = state.get("replication_start_time")
AttributeError: 'NoneType' object has no attribute 'get'
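
The failing call is `state.get(...)` with `state` being None. A guard along the following lines would avoid the exception. This is a simplified stand-in written for illustration, not the actual pglookout function: the real method does more than check for a start time, and returning `False` for a missing state is an assumption about the desired behaviour.

```python
from typing import Optional


def is_restoring_or_catching_up_normally(state: Optional[dict]) -> bool:
    """Simplified stand-in showing a None guard before calling state.get()."""
    # Observers have no own_db, so no state is available for them.
    if state is None:
        return False
    replication_start_time = state.get("replication_start_time")
    # Placeholder for the real checks: only report whether a start time exists.
    return replication_start_time is not None
```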

pglookout instance getting killed due to " PermissionError: [Errno 13] Permission denied: '/usr/local/lib/python3.8/site-packages/pglookout-2.0.2-py3.8.egg/EGG-INFO/requires.txt'"

Hi,

I have installed the pglookout-2.0.2.egg file on Rocky 8.6 with Python 3.8. pglookout installs successfully, but when starting the service I get the error below. Any suggestions would be helpful.

Feb 14 11:38:28 pg1 systemd[1]: Started PostgreSQL streaming backup service.
Feb 14 11:38:28 pg1 pglookout[56197]: Traceback (most recent call last):
Feb 14 11:38:28 pg1 pglookout[56197]:  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 2700, in _dep_map
Feb 14 11:38:28 pg1 pglookout[56197]:    return self.__dep_map
Feb 14 11:38:28 pg1 pglookout[56197]:  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 2815, in __getattr__
Feb 14 11:38:28 pg1 pglookout[56197]:    raise AttributeError(attr)
Feb 14 11:38:28 pg1 pglookout[56197]: AttributeError: _Distribution__dep_map
Feb 14 11:38:28 pg1 pglookout[56197]: During handling of the above exception, another exception occurred:
Feb 14 11:38:28 pg1 pglookout[56197]: Traceback (most recent call last):
Feb 14 11:38:28 pg1 pglookout[56197]:  File "/usr/local/bin/pglookout", line 6, in <module>
Feb 14 11:38:28 pg1 pglookout[56197]:    from pkg_resources import load_entry_point
Feb 14 11:38:28 pg1 pglookout[56197]:  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3252, in <module>
Feb 14 11:38:28 pg1 pglookout[56197]:    def _initialize_master_working_set():
Feb 14 11:38:28 pg1 pglookout[56197]:  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3235, in _call_aside
Feb 14 11:38:28 pg1 pglookout[56197]:    f(*args, **kwargs)
Feb 14 11:38:28 pg1 pglookout[56197]:  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3264, in _initialize_master_working_set
Feb 14 11:38:28 pg1 pglookout[56197]:    working_set = WorkingSet._build_master()
Feb 14 11:38:28 pg1 pglookout[56197]:  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 583, in _build_master
Feb 14 11:38:28 pg1 pglookout[56197]:    ws.require(__requires__)
Feb 14 11:38:28 pg1 pglookout[56197]:  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 900, in require
Feb 14 11:38:28 pg1 pglookout[56197]:    needed = self.resolve(parse_requirements(requirements))
Feb 14 11:38:28 pg1 pglookout[56197]:  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 794, in resolve
Feb 14 11:38:28 pg1 pglookout[56197]:    new_requirements = dist.requires(req.extras)[::-1]
Feb 14 11:38:28 pg1 pglookout[56197]:  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 2736, in requires
Feb 14 11:38:28 pg1 pglookout[56197]:    dm = self._dep_map
Feb 14 11:38:28 pg1 pglookout[56197]:  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 2702, in _dep_map
Feb 14 11:38:28 pg1 pglookout[56197]:    self.__dep_map = self._filter_extras(self._build_dep_map())
Feb 14 11:38:28 pg1 pglookout[56197]:  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 2730, in _build_dep_map
Feb 14 11:38:28 pg1 pglookout[56197]:    for extra, reqs in split_sections(self._get_metadata(name)):
Feb 14 11:38:28 pg1 pglookout[56197]:  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3199, in split_sections
Feb 14 11:38:28 pg1 pglookout[56197]:    for line in yield_lines(s):
Feb 14 11:38:28 pg1 pglookout[56197]:  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 2387, in yield_lines
Feb 14 11:38:28 pg1 pglookout[56197]:    for ss in strs:
Feb 14 11:38:28 pg1 pglookout[56197]:  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 2767, in _get_metadata
Feb 14 11:38:28 pg1 pglookout[56197]:    for line in self.get_metadata_lines(name):
Feb 14 11:38:28 pg1 pglookout[56197]:  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1432, in get_metadata_lines
Feb 14 11:38:28 pg1 pglookout[56197]:    return yield_lines(self.get_metadata(name))
Feb 14 11:38:28 pg1 pglookout[56197]:  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1420, in get_metadata
Feb 14 11:38:28 pg1 pglookout[56197]:    value = self._get(path)
Feb 14 11:38:28 pg1 pglookout[56197]:  File "/usr/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1616, in _get
Feb 14 11:38:28 pg1 pglookout[56197]:    with open(path, 'rb') as stream:
Feb 14 11:38:28 pg1 pglookout[56197]: PermissionError: [Errno 13] Permission denied: '/usr/local/lib/python3.8/site-packages/pglookout-2.0.2-py3.8.egg/EGG-INFO/requires.txt'
Feb 14 11:38:28 pg1 systemd[1]: pglookout.service: Main process exited, code=exited, status=1/FAILURE
Feb 14 11:38:28 pg1 systemd[1]: pglookout.service: Failed with result 'exit-code'.
Feb 14 11:38:28 pg1 dbus-daemon[748]: [system] Activating service name='org.fedoraproject.Setroubleshootd' requested by ':1.1513' (uid=0 pid=708 comm="/usr/sbin/sedispatch " label="system_u:system_r:auditd_t:s0") (using servicehelper)
Feb 14 11:38:28 pg1 systemd[1]: postgresql-14.service: Killing process 56030 (postmaster) with signal SIGKILL.
Feb 14 11:38:28 pg1 systemd[1]: pglookout.service: Service RestartSec=100ms expired, scheduling restart.
Feb 14 11:38:28 pg1 systemd[1]: pglookout.service: Scheduled restart job, restart counter is at 1.
Feb 14 11:38:28 pg1 systemd[1]: Stopped PostgreSQL replication monitoring and failover daemon.

Stopping of isolated primary

A nice feature would be to automatically bring down a primary server to which no or too few standby servers are connected. This would help to minimize the impact of a split brain. This could be implemented by calling a script specified in the configuration. Repmgr from 2ndQuadrant has something like this.

An example setup for which this would be useful:

We have three sites. Site [1] with the primary database, site [2] with a standby, site [3] with another standby that is used as quorum/observer.
The primary [1] uses synchronous replication, preferably to standby [2] with the quorum/observer [3] as backup. After a failover, [2] will do the same.

If the primary [1] becomes isolated, pglookout will make the standby [2] (also) primary and make the quorum/observer [3] follow this new primary [2].
The old primary [1] will be unable to process write transactions because it uses synchronous replication and has no connected standby.

Applications use a JDBC-connector with both [1] and [2] in it and targetServerType=master, to connect only to primary servers. In this case, though, there are two, and the application may also connect to the old primary [1]. Write transactions will hang, but this is still not optimal.

So it would help if pglookout would see that there are no standby servers connected anymore to the old primary [1] and have it start a script that would bring it down.
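
A rough sketch of the proposed check follows. Nothing here exists in pglookout: the function name, the `min_standbys` threshold and the `stop_command` setting are hypothetical. The number of connected standbys is assumed to be obtained elsewhere, for example by counting rows in `pg_stat_replication` on the primary.

```python
import subprocess
import sys
from typing import List, Optional


def stop_if_isolated(
    connected_standbys: int, min_standbys: int, stop_command: List[str]
) -> Optional[int]:
    """Run stop_command when too few standbys are connected.

    Returns the command's exit code, or None if nothing was run.
    """
    if connected_standbys >= min_standbys:
        return None
    # check=False: let the caller decide how to handle a failing script.
    completed = subprocess.run(stop_command, check=False)
    return completed.returncode


# Harmless demo command standing in for a real "stop the primary" script.
demo_command = [sys.executable, "-c", "print('stopping isolated primary')"]
```

Calling `stop_if_isolated(0, 1, demo_command)` runs the demo command, while `stop_if_isolated(1, 1, demo_command)` does nothing. A real deployment would probably also require the condition to persist for some time before acting, so that a brief replication hiccup does not bring the primary down.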

make rpm fails on Red Hat Enterprise Linux 8.3

`/usr/bin/pylint-3 --rcfile .pylintrc pglookout/ test/
************* Module test.test_lookout
test/test_lookout.py:643:24: W0612: Unused variable 'standby_nodes' (unused-variable)


Your code has been rated at 9.99/10 (previous run: 9.99/10, +0.00)

make[1]: *** [Makefile:24: pylint] Error 4
make[1]: Leaving directory '/home/portalify/pglookout/rpm/BUILD/pglookout'
error: Bad exit status from /var/tmp/rpm-tmp.59pw2K (%check)

RPM build errors:
Bad exit status from /var/tmp/rpm-tmp.59pw2K (%check)
make: *** [Makefile:42: rpm] Error 1
`

permission denied for function pg_read_binary_file

For at least Postgres 15.4, the pglookout user needs permission to execute this function in the database it connects to, e.g.:

grant execute on function pg_read_binary_file(text) to pglookout;

This is not yet documented in the README.

ModuleNotFoundError: No module named 'version' error while making egg file

I am trying to create an egg file for pglookout with Python 3.9 on Rocky Linux 9.2.
However, when I run the command, I get the error below:

python3.9 setup.py bdist_egg
Traceback (most recent call last):
  File "/var/pginstaller16/rpms/setup.py", line 5, in <module>
    import version
ModuleNotFoundError: No module named 'version'

Please suggest a way forward.

pglookout high CPU when master missing

When the master goes missing, the pglookout processes on the other servers go into a loop without sleeps, using up to 100% CPU.

The problem seems to be that all waiting is done based on timeouts in reading queues.

  • The main_loop in pglookout.py waits for messages in failover_decision_queue.get after putting a "Master is missing" message on the cluster_monitor_check_queue.
  • The run loop in cluster_monitor.py waits for messages in the cluster_monitor_check_queue, gets the message and calls main_monitoring_loop with requested_check=True, which puts a "Completed requested monitoring loop" message on the failover_decision_queue.

So messages keep getting exchanged, no sleeping is done, and all CPU gets used.
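
One possible mitigation is to enforce a minimum interval between requested checks, so that the two queues cannot ping-pong without pause. The sketch below is an assumption-laden illustration, not a patch for pglookout: the `CheckThrottle` class and its `min_interval` parameter are invented for this example.

```python
import time
from typing import Callable, Optional


class CheckThrottle:
    """Allow a requested check at most once per min_interval seconds."""

    def __init__(
        self, min_interval: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self.min_interval = min_interval
        self.clock = clock
        # Time at which the most recent request was (or will be) made.
        self.last_request: Optional[float] = None

    def seconds_to_wait(self) -> float:
        """Return how long the caller should sleep before the next request."""
        now = self.clock()
        if self.last_request is None:
            wait = 0.0
        else:
            wait = max(0.0, self.min_interval - (now - self.last_request))
        # Record when the request happens, assuming the caller sleeps for `wait`.
        self.last_request = now + wait
        return wait
```

The main loop could call `time.sleep(throttle.seconds_to_wait())` before putting a "Master is missing" message on the check queue, which bounds the message rate regardless of how quickly the monitoring thread replies.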
