Comments (17)
Can you register a custom typecaster for the text type, to return data.decode(conn.encoding, 'replace')
?
from pg_activity.
As discuss, let's:
- add the
register_type()
only for the cursors that need it; - keep the data in bytes until we display them to avoid doing work on stuff we don't need.
from pg_activity.
Thanks @dvarrazzo, I didn't see your post before posting mine.
I'll check with @dlax.
from pg_activity.
Can you register a custom typecaster for the text type, to return
data.decode(conn.encoding, 'replace')
?
As far as I understand, the problem is that the encoding we need to decode the query field from pg_stat_activity
is probably not the connection encoding in general; rather it's the encoding of the database the query is running in.
from pg_activity.
Ah sorry: creating a specific typecaster doesn't work in psycopg2 because the decode is done internally, before passing the data to the Python function, which receive a str
. This comes from the fact that psycopg2
is pretty much totally text-based. In psycopg 3 it can be done, because it's binary-based and the dumpers receive bytes, but let's first see if it's useful.
If you only need to survive the accident of receiving a badly encoded char, getting bytea from the query seems enough:
# Conn 1, in bad encoding
conn = psycopg2.connect("dbname=latin1 client_encoding=sql_ascii")
cur = conn.cursor()
cur.execute("select 'à'".encode("latin1"))
# Conn 2 in utf8
...
cur.execute("select query::bytea from pg_stat_activity where datname = 'latin1'");
In [20]: cur.fetchall()
Out[20]: [(<memory at 0x7fb89ed57dc0>,)]
In [21]: bytes(_20[0][0])
Out[21]: b"select '\xe0', pg_sleep(10)"
In [22]: bytes(_20[0][0]).decode("utf8", "replace")
Out[22]: "select '�', pg_sleep(10)"
do you need to know more precisely the content of the query?
from pg_activity.
I think we would still hit problem number 3 (which we unfortunately couldn't reproduce)
from pg_activity.
Yes, problem n°3 being query::bytea
producing invalid_text_representation: invalid input syntax for type bytea
on postgres side.
But, indeed, it might be worth moving to psycopg 3 while working on this issue.
from pg_activity.
Is it widely available or is it only available on recent distributions ?
from pg_activity.
In psycopg3 the problem can be solved easily, because the decoding step is in the loader:
In [1]: import psycopg
In [2]: conn1 = psycopg.connect("dbname=latin1 client_encoding=sql_ascii")
In [3]: conn1.execute("select 'à'".encode("latin1"))
Out[3]: <psycopg.Cursor [TUPLES_OK] [INTRANS] (user=piro database=latin1) at 0x7f378203b740>
In [4]: conn = psycopg.connect()
In [5]: conn.execute("select query from pg_stat_activity where datname = 'latin1'").fetchone()[0]
Traceback (most recent call last)
...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 8: invalid continuation byte
In [6]: class SafeText(psycopg.types.string.TextLoader):
...: def load(self, data):
...: try:
...: return super().load(data)
...: except Exception:
...: return bytes(data).decode(self._encoding, "replace")
...:
In [7]: conn.adapters.register_loader("text", SafeText)
In [8]: conn.execute("select query from pg_stat_activity where datname = 'latin1'").fetchone()[0]
Out[8]: "select '�'"
you can even use extra knowledge about the encoding of the database you are reading the query from, and use that one in the custom loader, instead of self._encoding
.
from pg_activity.
(rather than the above, subclassing the built-in TextLoader
, it would be wiser to write a Loader instance from scratch, starting by copying TextLoader
from the source code, because _encoding
is not part of the public interface).
from pg_activity.
from pg_activity.
Is pg_activity
packaged in distributions? Other programs are starting to be based on it (pgadmin, pgcli...), so I'm expecting it to land in distros soon.
from pg_activity.
Yes, pg_activity
packaged either directly in most distributions or through PGDG repository (which many postgres-people use obviously) and so recent versions are even available, through backports, in most stable or LTS distributions.
Psycopg 3 is packaged by Debian https://tracker.debian.org/pkg/psycopg3 1, but not yet available in stable. For RPM-based distributions, it's packaged by PGDG so that would work for us.
In pg_activity
I think we cannot really move faster than distributions... unless our users accept to stay on an old version which is not quite our policy.
Footnotes
-
strangely, it's named psycopg3! ↩
from pg_activity.
strangely, it's named psycopg3!
ahah, I forgot https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1016031
from pg_activity.
So, if psycopg 3 was packaged in apt pgdg too, pg_activity would be mostly covered, right?
from pg_activity.
from pg_activity.
So, the approach I'm taking to hopefully fix these issues is the following:
- load
TEXT
values asbytes
whenever there's aquery
column frompg_stat_activity
(or similar) involved - also load database encoding (not the one we're connected to, the one the query is running in)
- decode
bytes
values when loading each row with this database encoding (using a row factory, as this cannot be done with a psycopgLoader
-- as far as I understand -- because we need the whole row, not a single value) - use
.decode(..., errors="replace")
to prevent crash
from pg_activity.
Related Issues (20)
- ZeroDivisionError when server has no swap HOT 1
- [Rocky8] [Rocky9] Can't install because of python3.9 dependencies HOT 4
- Help for installation pg_activity 3.0.1 HOT 12
- packaged pg-activity not running on ubuntu 20.04 HOT 4
- AttributeError: module 'humanize' has no attribute 'precisedelta' HOT 2
- ModuleNotFoundError: No module named '_curses' HOT 1
- Ask for new release, including pid bug HOT 4
- 3.1.0 issue: AttributeError: 'NoneType' object has no attribute 'encode' HOT 4
- 3.1.1: TypeError: cannot use a string pattern on a bytes-like object HOT 17
- psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type bytea HOT 3
- Weird display issues inside GNU screen HOT 8
- erorr attrs!=21.1,>=17 on centos8 HOT 1
- The 'client' column only shows the first 16 characters of IPv6 addresses. HOT 2
- Any plans to allow for running pg_activity remotely with superuser privileges? HOT 2
- AttributeError: module 'attr.validators' has no attribute 'gt' HOT 1
- Dependency errors when updating RHEL 8 HOT 1
- Document how to run tests locally HOT 3
- psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type bytea HOT 1
- ValueError after connected HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pg_activity.