zalando / PGObserver
A battle-tested, flexible & comprehensive monitoring solution for your PostgreSQL databases
Home Page: https://zalando.github.io/PGObserver/
License: Other
Hi,
I tried to test PGObserver using the "Quick Test Run Using Vagrant" instructions, and I got an error at step 6:
default: Step 6 : RUN pip install -r /app/requirements.txt
default: ---> Running in b64ab7ca8a57
default: Downloading/unpacking CherryPy==3.2.4 (from -r /app/requirements.txt (line 1))
default: Cannot fetch index base URL http://pypi.python.org/simple/
default: Could not find any downloads that satisfy the requirement CherryPy==3.2.4 (from -r /app/requirements.txt (line 1))
default: No distributions at all found for CherryPy==3.2.4 (from -r /app/requirements.txt (line 1))
default: Storing complete log in /root/.pip/pip.log
default: The command '/bin/sh -c pip install -r /app/requirements.txt' returned a non-zero code: 1
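The "Cannot fetch index base URL http://pypi.python.org/simple/" line suggests the container's old pip is still pointed at the plain-HTTP PyPI endpoint, which PyPI stopped serving. One possible workaround (a sketch only; the exact path depends on the base image) is to add a pip config file inside the Docker image that forces the HTTPS index:

```ini
; /root/.pip/pip.conf -- illustrative path, forces pip to use the HTTPS index
; (the http://pypi.python.org/simple/ endpoint was retired)
[global]
index-url = https://pypi.python.org/simple/
```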
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.
It may be related to PyPI; if so, feel free to close this issue.
Thank you for your work!
for amount of WAL written
Hey there,
I tried this tool today and unfortunately I am having some problems. I am able to start the frontend and also the gatherer, and the frontend displays some information about my configured database, so I guess the basic configuration is okay. However, on several pages of the frontend no data is displayed. For example, the tables "Top 10 Sprocs last 1 hour by total run time" and "Top 10 Sprocs last 3 hours by total run time" are not displayed, while on the same page "Load Average 15min Sprocs" only shows a graph. Furthermore, the following tables don't show anything:
The Pg_Stat_Statements report in particular is very important for me, but it does not display any data. I have already configured the pg_stat_statements module in my cluster like this (added to postgresql.conf):
shared_preload_libraries = 'pg_stat_statements' # (change requires restart)
pg_stat_statements.max = 10000
pg_stat_statements.track = all
I restarted everything, but still no luck.
With this enabled, I am able to run
select * from pg_stat_statements
when connected to my database in psql and get the expected result, but not in the PGObserver web frontend.
Furthermore, I get the following error message when trying to access the "Top Tables" page:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 656, in respond
response.body = self.handler()
File "/usr/local/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 188, in __call__
self.body = self.oldhandler(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 34, in __call__
return self.callable(*self.args, **self.kwargs)
File "/home/ilexx/Downloads/PGObserver/frontend/src/tablesfrontend.py", line 132, in index
top_tables = tabledata.getTopTables(None, date_from, date_to, order, limit)
File "/home/ilexx/Downloads/PGObserver/frontend/src/tabledata.py", line 388, in getTopTables
pattern = '%' + pattern + '%'
TypeError: cannot concatenate 'str' and 'NoneType' objects
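The traceback indicates that `getTopTables` concatenates the LIKE wildcards onto a `pattern` that is `None` when no filter is supplied. A minimal sketch of a guard against the Python code in the traceback (the helper name here is made up for illustration):

```python
def build_like_pattern(pattern):
    """Wrap a search term in SQL LIKE wildcards, tolerating None.

    Passing None (no filter given) previously raised:
    TypeError: cannot concatenate 'str' and 'NoneType' objects
    """
    if pattern is None:
        return '%'              # match everything when no filter is supplied
    return '%' + pattern + '%'
```

With this, calling the page without a table-name filter would fall back to matching all tables instead of crashing.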
Are there some PostgreSQL configuration settings that I am missing?
Could you please give me some more advice?
Thanks so far...
The current OAuth2-related code in oauth.py
expects a scope parameter to be present in the authorization response. Otherwise it won't identify the authorization response as such.
This is not in accordance with RFC 6749 (Section 4.1.2, Authorization Code Grant - Authorization Response), could lead to trouble in the future, and should be removed.
I'll open a PR.
"Alltables view" could be extended also. Also there current "7days fixed" span should be made dynamic
Get rid of the hard-coded database used for data storage.
On one occasion, looking at the sprocs/all page of PGObserver, I couldn't find a sproc that was actively being called on the running database. The reason was that it was more than two API revisions old. The problem is that no newer versions of that sproc were being called (the backend was still at the N-3 API version), so the data for it was missing altogether.
The limitations, AFAIK, are there for the case when a sproc changes significantly over time, so that data for the older version becomes irrelevant. Another reason is to reduce the amount of data to process. I think we may still satisfy both requirements and include data for a sproc older than N-2 API revisions if the following conditions are true:
In other words, if the sproc A exists in R8, R9, R10, R11 and R12 APIs, the latest one in the database is R12 and it was called with R8 and R10 versions only during the last week, then we do include data for R10 only, because there were no calls for R11 and R12.
If, on the other hand, the same sproc gets called with R11 and R12, then only those 2 revisions are included, discarding the other ones.
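The selection rule sketched in the two examples above could look roughly like this (hypothetical function and variable names; the revision lists are illustrative, not PGObserver's actual data model):

```python
def revisions_to_show(all_revisions, called_revisions):
    """Pick which API revisions of a sproc to include in the view.

    all_revisions:     revision numbers the sproc exists in, ascending.
    called_revisions:  revisions actually called during the time window.

    Rule from the issue: prefer the latest two revisions, but only those
    that were called; if neither of the latest two was called, fall back
    to the newest older revision that was called.
    """
    latest_two = set(all_revisions[-2:])
    recent_called = sorted(latest_two & set(called_revisions))
    if recent_called:
        return recent_called
    older_called = sorted(set(called_revisions) & set(all_revisions))
    if older_called:
        return [older_called[-1]]   # newest older revision with calls
    return []
```

For the example in the issue, revisions R8..R12 with calls only for R8 and R10 would yield just R10, while calls for R11 and R12 would yield both of those.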
Hi there,
I tried to use the tool today.
It was not possible for me to get it running properly. At least for the Python part, I was missing documentation on how to get it running... Can you provide me with some more information?
Currently a graph mixing both results is created without a warning
The setup instructions are a little sparse.
Some additional steps I had to do:
$ easy_install cherrypy
$ easy_install Jinja2
$ easy_install argparse
Some additional things that could be included:
Additionally the setup requires directly inserting rows into the database. I think it would be worthwhile to make a simple install script to initialize the data.
Currently, when a sproc has no parameters, there is no way to tell in which schema the sproc was called.
The command gathering the Maven version in build.sh should read
MAVEN_VERSION=$(mvn --version | grep -o '3.' | head -n 1)
The original version (with head first and then grep) writes "3.\n3." into the variable, which is different from "3.".
Why are public-schema procedures dropped in SprocGatherer#getQuery?
I have some procedures in the public schema and I need to gather their statistics. Can I include the public schema in the SQL query, if there are no side effects?
I know there is a sample one in the gatherer directory, but I'm not sure about the Vagrant build process:
==> default: ldconfig deferred processing now taking place
==> default: ---> a4b7b19c1c01
==> default: Removing intermediate container 1f5d7dcf1b93
==> default: Step 2 : RUN mkdir -p /app
==> default: ---> Running in 430f78046c53
==> default: ---> 2d56fb5b4e81
==> default: Removing intermediate container 430f78046c53
==> default: Step 3 : COPY pgobserver.yaml /app/pgobserver.yaml
==> default: pgobserver.yaml: no such file or directory
During the build I have a problem: [INFO] 13 errors
Some packages cannot be found!
I have compilation errors; can someone help me?
Thanks
In the log files, the current database is in some cases not logged.
For instance:
SEVERE: Error during Load gathering java.sql.SQLException: Timed out waiting for a free available connection. at com.jolbox.bonecp.DefaultConnectionStrategy.getConnectionInternal(DefaultConnectionStrategy.java:88)
does not say which database was involved, which makes it impossible to investigate on the server side.
If the plot is zoomed in, recompute the top sprocs for that timeframe.
According to the code frequency graph, it looks like this project is no longer being developed:
https://github.com/zalando/PGObserver/graphs/code-frequency
Please tell the users the state of the project in the README.
Is there a new project that I should use instead of this one?
During a failover of the database being monitored, we observed that PGObserver resets its sp_calls (total number of calls) metric to 0, confusing our reporting script. This is a bug: it should instead continue from the previously collected values and possibly skip writing new data while those measurements are unavailable.
It should also be possible to configure whether one wants to see it or not.
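One common way to handle counter resets like the sp_calls drop described above (a sketch, not PGObserver's actual code; all names are hypothetical) is to treat a decrease in a monotonically increasing counter as a restart and carry an offset forward:

```python
class MonotonicCounter:
    """Accumulate a counter that may reset to 0 on failover/restart.

    Reported totals keep increasing across resets instead of dropping
    back to 0, which is what confused the reporting script.
    """

    def __init__(self):
        self.offset = 0     # calls accumulated before the last reset
        self.last_raw = 0   # last raw value seen from the stats view

    def observe(self, raw_value):
        if raw_value < self.last_raw:       # counter went backwards: a reset
            self.offset += self.last_raw    # bank what was counted so far
        self.last_raw = raw_value
        return self.offset + raw_value
```

After a failover that resets the raw counter to 0, the reported total simply continues climbing from the banked value.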
In addition to the min/max timestamps showing the period for which gathered data exists, we could also show when the insert/update/delete and call counts last changed.
It seems like the frontend has trouble reading some queries from the pg_stat_statements view:
500 Internal Server Error
...
perf_stat_statements.html, line 84, in block "body"
<tr class="hiddenrow" style="display:none" ><td colspan="8"><textarea readonly>{{ d['query'] }}</textarea></td>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 210: ordinal not in range(128)
This row seems to cause trouble in perf_stat_statements.html:
<tr class="hiddenrow" style="display:none" ><td colspan="8"><textarea readonly>{{ d['query'] }}</textarea></td>
Same problem with the perf_stat_statements_detailed.html by the way:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 210: ordinal not in range(128)
Hopefully you know how to fix it?
EDIT:
I guess I found the query which causes the problem:
UPDATE afg SET status = status_in, updated_at = now(), display = display_in, urls = urls_in, type = type_in, headline = headline_in, description_1 = description_1_in, description_2 = description_2_in, destination_url = destination (...)
I guess the (...)
is the problem? Because when I delete
<tr class="hiddenrow" style="display:none" ><td colspan="8"><textarea readonly>{{ d['query'] }}</textarea></td>
from perf_stat_statements.html, the /perfstatstatements page loads fine. And the "show graph" URL loads perfectly fine for all queries, except for the queries which start with
UPDATE afg SET status = status_in, updated_at = now(),
So I guess the (...) is causing the problems.
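Byte 0xe2 is the first byte of a UTF-8 multi-byte character (for example the "…" ellipsis that often appears where a long query text was truncated), so the error points at Python 2 implicitly decoding a UTF-8 byte string with the ASCII codec. A possible fix (a sketch only; the helper name is made up) is to decode query texts to unicode before handing them to Jinja2:

```python
def to_unicode(value, encoding='utf-8'):
    """Decode byte strings (e.g. query texts from pg_stat_statements)
    so the template engine never falls back to the implicit ASCII codec."""
    if isinstance(value, bytes):
        # replace undecodable bytes instead of raising UnicodeDecodeError
        return value.decode(encoding, 'replace')
    return value
```

Applying such a conversion to `d['query']` before rendering would let rows containing non-ASCII bytes display instead of producing a 500 error.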
Trying to set up PGObserver on Fedora 25, I ran into several problems.
Therefore I wonder: is there an all-in-one 'one-click' PGObserver Docker image planned or already existing?
If not, would it be something you would like to have, or do you think it's not a good idea, if I were to create one?
Best,
Oliver
Where is pgobserver.conf? I can't find it.
Also, how does one prepare the configuration files for the gatherer and the frontend?
Please add this to the documentation. Thanks!
Hi,
I'm trying to install PGObserver. I have cloned the project from git. However, I cannot find the pgobserver.conf file. I'm referring to the third line in the setup guide:
Copy pgobserver.conf to the home folder: ~/.pgobserver.conf
Running find and similar commands on the PGObserver directory didn't return anything useful.
Thanks and best regards,
Andreas
We're using pgObserver to monitor our PostgreSQL instances. It's really nice and has provided us with an easy way to view database activity.
As best as I can tell from looking at the gatherer code there is currently no collection of index statistics. It would be really useful if information about index use and bloat was available.
We're interested in the data that is available from postgres as described here:
https://wiki.postgresql.org/wiki/Index_Maintenance
http://www.postgresql.org/docs/current/static/monitoring-stats.html#MONITORING-STATS-SETUP
Is this feature planned?
Thanks for making pgObserver!
It's not very clear to me how to measure the sproc statistics.
So far I know I've got these 3 tables:
sproc_performance_data
sproc_schemas_monitoring_configuration
sprocs
This is my sproc_schemas_monitoring_configuration table:
Do I need to insert rows into those other two tables, or should this be enough? And should scmc_host_id be equal to the host id of the monitored host in the hosts table?
e.g. you want to see the graph for the last 2 weeks but only 1 week's data is available
The vertical scale's minimum precision is currently 0.00 s, which is not enough in some cases.
While it is reachable via the URL, a search form should be added somewhere.
Should give a nice size reduction, as the data is naturally ordered and insert-only.
We had an index that the PGObserver estimator showed as having a 0.3 bloat ratio (meaning 0.3%) and 0 bytes to be gained, while the estimate at https://github.com/ioguix/pgsql-bloat-estimation/blob/master/btree/btree_bloat.sql showed it as 50% bloated.
After creating a new index with the same parameters, it was indeed smaller than the original one (the original was 18 GB, and the replacement took only 7 GB). Therefore, it looks like the bloat estimator needs to be re-examined.
Currently, when load gathering fails (e.g. there is no plpython or no zz_utils.get_load_average sproc), the WAL activity info will also be missing.
Hi!!
I'm trying to start the frontend via run.sh and I get this error:
Traceback (most recent call last):
File "web.py", line 88, in
main()
File "web.py", line 76, in main
root.report = report.Report()
AttributeError: 'NoneType' object has no attribute 'report'
Can you guys help?
Thanks!
Currently a restart of the frontend is required.
especially I/O load
Hi,
I have a problem with the Pg_stat_statements report: I get no data in Host, but if I change the URL to one of my "host_ui_shortname" values, the data is displayed correctly.
It is strange that only on this page my hosts are not loaded.
Could you please give me some information to resolve this problem?
Thanks
luca
In the web interface, the stored procedure timings are not being collected. Is there an extension that needs to be installed? Am I missing any steps?
The only step I have not done is create the CPU load scripts, since I do not need it.
At the moment we have multiple databases in one cluster. We have even multiple clusters on one host.
If we want to use the Performance Analyses views, we can only select hosts. If you have multiple databases configured on one cluster and one host, you have multiple entries in the selection box but no way to tell which database you are actually looking at.
It would be really helpful to use the UI Longname or UI Shortname in the menu for the performance views, instead of the hostname. That way you can view all the other databases on the same host or cluster as well.
Hi, I've got a problem with logging pg_stat_statements ... I've got the module enabled in database (view pg_stat_statements exists in Schemas->public->Views) and there's of course some data. But when I try to access it on frontend, it doesn't show any statistics ... Also host selection is empty (see the picture).
The error log contains only this: ERROR: function zz_utils.get_load_average() does not exist
but I think that is not related to my problem. Any help would be appreciated :)
While trying to deploy the frontend, I receive the following error:
Exception: Invalid S3 url: https://s3.eu-central-1.amazonaws.com/laas-config/config.yaml
Most likely, the reason is the regex to check for valid S3 URLs in https://github.com/zalando/PGObserver/blob/149e01867d0be82acfe4fdf184782a53fe4a4c05/frontend/src/aws_s3_configreader.py:
m = re.match('https?://s3-(.*)\.amazonaws.com/(.*)', s3_url)
if not m:
raise Exception('Invalid S3 url: {}'.format(s3_url))
This regex works for all S3 URLs in eu-west-1 where at least our links look as follows:
https://s3-eu-west-1.amazonaws.com/zalando-stups-mint-123456-eu-west-1/app-credentials/client.json
However, in eu-central-1, the links seem to have a slightly different structure:
https://s3.eu-central-1.amazonaws.com/zalando-stups-mint-123456-eu-central-1/pgobserverfrontend/client.json
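A possible fix (a sketch only, not the actual PR) is to accept both the legacy `s3-<region>` and the newer `s3.<region>` endpoint styles in the regex:

```python
import re

# Accept both https://s3-<region>.amazonaws.com/... (e.g. eu-west-1)
# and https://s3.<region>.amazonaws.com/... (e.g. eu-central-1)
S3_URL_RE = re.compile(r'https?://s3[.-]([a-z0-9-]+)\.amazonaws\.com/(.*)')


def parse_s3_url(s3_url):
    """Return (region, key_path), or raise on an unrecognized URL."""
    m = S3_URL_RE.match(s3_url)
    if not m:
        raise Exception('Invalid S3 url: {}'.format(s3_url))
    return m.group(1), m.group(2)
```

The character class `[.-]` is the only change relative to the original pattern, so existing eu-west-1 URLs keep matching.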
Hi there,
I have problems displaying all the "Top 10 SProcs ..." tables. On the same page, Load Average 15min Sprocs shows a graph. I have already configured the pg_stat_statements module in my cluster:
shared_preload_libraries = 'pg_stat_statements' # (change requires restart)
pg_stat_statements.max = 10000
pg_stat_statements.track = all
With this enabled, I am able to run
select * from pg_stat_statements
when connected to my database in psql and get the expected result, but not in the PGObserver web frontend.
I get a warning message in pgmon_java_*****:
Nov 24, 2016 2:23:25 PM de.zalando.pgobserver.gatherer.SprocGatherer getSchemasToBeMonitored WARNING: NOT nspname LIKE ANY (array['public','pg\_%','information_schema','tmp%','temp%']::text[]) AND nspname LIKE ANY (array['%']::text[])
Nov 24, 2016 2:23:25 PM de.zalando.pgobserver.gatherer.SprocGatherer gatherData
Are there some PostgreSQL configuration settings that I am missing?
Could you please give me some more advice?
Thanks
Matthias
There is a typo in the schema.sql script, line 120:
it is "host_grather_group" and should be "host_gather_group".