Giter Club home page Giter Club logo

Comments (17)

dvarrazzo avatar dvarrazzo commented on June 12, 2024

Can you register a custom typecaster for the text type, to return data.decode(conn.encoding, 'replace')?

from pg_activity.

blogh avatar blogh commented on June 12, 2024

As discuss, let's:

  • add the register_type() only for the cursors that need it;
  • keep the data in bytes until we display them to avoid doing work on stuff we don't need.

from pg_activity.

blogh avatar blogh commented on June 12, 2024

Thanks @dvarrazzo, I didn't see your post before posting mine.
I'll check with @dlax.

from pg_activity.

dlax avatar dlax commented on June 12, 2024

Can you register a custom typecaster for the text type, to return data.decode(conn.encoding, 'replace')?

As far as I understand, the problem is that the encoding we need to decode the query field from pg_stat_activity is probably not the connection encoding in general; rather it's the encoding of the database the query is running in.

from pg_activity.

dvarrazzo avatar dvarrazzo commented on June 12, 2024

Ah sorry: creating a specific typecaster doesn't work in psycopg2 because the decode is done internally, before passing the data to the Python function, which receive a str. This comes from the fact that psycopg2 is pretty much totally text-based. In psycopg 3 it can be done, because it's binary-based and the dumpers receive bytes, but let's first see if it's useful.

If you only need to survive the accident of receiving a badly encoded char, getting bytea from the query seems enough:

# Conn 1, in bad encoding

conn = psycopg2.connect("dbname=latin1 client_encoding=sql_ascii")
cur = conn.cursor()
cur.execute("select 'à'".encode("latin1"))
# Conn 2 in utf8
...
cur.execute("select query::bytea from pg_stat_activity where datname = 'latin1'");

In [20]: cur.fetchall()
Out[20]: [(<memory at 0x7fb89ed57dc0>,)]

In [21]: bytes(_20[0][0])
Out[21]: b"select '\xe0', pg_sleep(10)"

In [22]: bytes(_20[0][0]).decode("utf8", "replace")
Out[22]: "select '�', pg_sleep(10)"

do you need to know more precisely the content of the query?

from pg_activity.

blogh avatar blogh commented on June 12, 2024

I think we would still hit problem number 3 (which we unfortunately couldn't reproduce)

from pg_activity.

dlax avatar dlax commented on June 12, 2024

Yes, problem n°3 being query::bytea producing invalid_text_representation: invalid input syntax for type bytea on postgres side.

But, indeed, it might be worth moving to psycopg 3 while working on this issue.

from pg_activity.

blogh avatar blogh commented on June 12, 2024

Is it widely available or is it only available on recent distributions ?

from pg_activity.

dvarrazzo avatar dvarrazzo commented on June 12, 2024

In psycopg3 the problem can be solved easily, because the decoding step is in the loader:

In [1]: import psycopg

In [2]: conn1 = psycopg.connect("dbname=latin1 client_encoding=sql_ascii")

In [3]: conn1.execute("select 'à'".encode("latin1"))
Out[3]: <psycopg.Cursor [TUPLES_OK] [INTRANS] (user=piro database=latin1) at 0x7f378203b740>

In [4]: conn = psycopg.connect()

In [5]: conn.execute("select query from pg_stat_activity where datname = 'latin1'").fetchone()[0]
Traceback (most recent call last)
...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 8: invalid continuation byte

In [6]: class SafeText(psycopg.types.string.TextLoader):
   ...:     def load(self, data):
   ...:         try:
   ...:             return super().load(data)
   ...:         except Exception:
   ...:             return bytes(data).decode(self._encoding, "replace")
   ...: 

In [7]: conn.adapters.register_loader("text", SafeText)

In [8]: conn.execute("select query from pg_stat_activity where datname = 'latin1'").fetchone()[0]
Out[8]: "select '�'"

you can even use extra knowledge about the encoding of the database you are reading the query from, and use that one in the custom loader, instead of self._encoding.

from pg_activity.

dvarrazzo avatar dvarrazzo commented on June 12, 2024

(rather than the above, subclassing the built-in TextLoader, it would be wiser to write a Loader instance from scratch, starting by copying TextLoader from the source code, because _encoding is not part of the public interface).

from pg_activity.

dlax avatar dlax commented on June 12, 2024

from pg_activity.

dvarrazzo avatar dvarrazzo commented on June 12, 2024

Is pg_activity packaged in distributions? Other programs are starting to be based on it (pgadmin, pgcli...), so I'm expecting it to land in distros soon.

from pg_activity.

dlax avatar dlax commented on June 12, 2024

Yes, pg_activity packaged either directly in most distributions or through PGDG repository (which many postgres-people use obviously) and so recent versions are even available, through backports, in most stable or LTS distributions.

Psycopg 3 is packaged by Debian https://tracker.debian.org/pkg/psycopg3 1, but not yet available in stable. For RPM-based distributions, it's packaged by PGDG so that would work for us.

In pg_activity I think we cannot really move faster than distributions... unless our users accept to stay on an old version which is not quite our policy.

Footnotes

  1. strangely, it's named psycopg3!

from pg_activity.

dlax avatar dlax commented on June 12, 2024

strangely, it's named psycopg3!

ahah, I forgot https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1016031

from pg_activity.

dvarrazzo avatar dvarrazzo commented on June 12, 2024

So, if psycopg 3 was packaged in apt pgdg too, pg_activity would be mostly covered, right?

from pg_activity.

dlax avatar dlax commented on June 12, 2024

from pg_activity.

dlax avatar dlax commented on June 12, 2024

So, the approach I'm taking to hopefully fix these issues is the following:

  • load TEXT values as bytes whenever there's a query column from pg_stat_activity (or similar) involved
  • also load database encoding (not the one we're connected to, the one the query is running in)
  • decode bytes values when loading each row with this database encoding (using a row factory, as this cannot be done with a psycopg Loader -- as far as I understand -- because we need the whole row, not a single value)
  • use .decode(..., errors="replace") to prevent crash

from pg_activity.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.