Giter Club home page Giter Club logo

Comments (7)

jmafc avatar jmafc commented on August 17, 2024

It seems odd that psql would accept the input file since the third column values are missing. Have you tried adding a 0 at the end of each line to test?
@dvarrazzo Sorry to bother you again, but if SQL copy (I realize it's a Postgres invention) is supposed to accept missing data values (as it apparently does) and supply a default (I'm not sure it's a NULL or a 0 that is being injected), then it seems that psycopg copy_from() would have to abide by this (weird IMHO) rule.

from pyrseas.

jdearing-neudesic avatar jdearing-neudesic commented on August 17, 2024

The docks for the COPY STATEMENT which I fully acknowledge is distincy from the \copy instruction of the psql executable claims that the default null is "an unquoted empty string in CSV format". Therefor since the CSV in question has a trailing comma on each line, unlike the other CSVs in my project which don't end in a comma, I can conclude that at least by the standards of the server side COPY statement that CSV has encoded a NULL value for parentId column.

from pyrseas.

tbussmann avatar tbussmann commented on August 17, 2024

To track this down:

Docs on \copy:

The syntax of this command is similar to that of the SQL COPY command. All options other than the data source/destination are as specified for COPY.

Docs on COPY:

COPY table_name [ ( column_name [, ...] ) ]
    FROM { 'filename' | PROGRAM 'command' | STDIN }
    [ [ WITH ] ( option [, ...] ) ]
    [ WHERE condition ]

FORMAT
Selects the data format to be read or written: text, csv (Comma Separated Values), or binary. The default is text.
DELIMITER
Specifies the character that separates columns within each row (line) of the file. The default is a tab character in text format, a comma in CSV format. This must be a single one-byte character. This option is not allowed when using binary format.
NULL
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings. This option is not allowed when using binary format.

So everything seems to depend on the FORMAT parameter. Let's see how that is set:

# expected format: (\copy, table, from, path, csv)
db.dbconn.copy_from(stmt[3], stmt[1])
here we pass only two parameters (likely path and table name) to pgdbconn's copy_from. There, the separator is set to it's default , and passed to psycopg2's curs_copy_from. This adds a default value to NULL and constructs a COPY SQL command like COPY %s%s FROM stdin WITH DELIMITER AS %s NULL AS %s

To conclude: DELIMITER is fixed set to pgdbconn's default ,, NULL is fixed set to psycopg2's default \N. In the end we are doing a simple text import which is the Postgres default for FORMAT which is not set (and not setable with the given abstraction layers) by perseas/psycopg2. Let's try to find out why:

Remember the syntax of that COPY command constructed? This doesn't look like what we have leant from the docs in the beginning. So read them again to find:

The following syntax was used before PostgreSQL version 9.0 and is still supported:

COPY table_name [ ( column [, ...] ) ]
    FROM { 'filename' | STDIN }
    [ [ WITH ]
          [ BINARY ]
          [ OIDS ]
          [ DELIMITER [ AS ] 'delimiter' ]
          [ NULL [ AS ] 'null string' ]
          [ CSV [ HEADER ]
                [ QUOTE [ AS ] 'quote' ]
                [ ESCAPE [ AS ] 'escape' ]
                [ FORCE NOT NULL column [, ...] ] ] ]

Ha! FORMAT was not available back then!

from pyrseas.

jmafc avatar jmafc commented on August 17, 2024

Thanks for the detailed trace, Tobias. Aside: your postgresql.org links use latest in the path instead of current.
Although FORMAT may not have been available in 8.4 or earlier, CSV as a standalone keyword was supported (in fact, \copy even in PG 13 still accepts csv without a preceding format). In any case, going back to the commit a3fc1bc which implemented support for -u in processing \copy statements, it seems I had intended to add csv as an argument but I dropped the ball somewhere. Adding it now, would mean messing with pgdbconn, which IMO is another reason to reintegrate that code back into pyrseas.

from pyrseas.

tbussmann avatar tbussmann commented on August 17, 2024

Aside: your postgresql.org links use latest in the path instead of current.

thanks, fixed. Obviously a lack of QA ;)

Although FORMAT may not have been available in 8.4 or earlier, CSV as a standalone keyword was supported

right. And that does change NULL defaults, just like FORMAT CSV does. But the current defaults are supplied by the wrappers, not PostgreSQL itself.

Adding it now, would mean messing with pgdbconn, which IMO is another reason to reintegrate that code back into pyrseas.

actually - if I understand the code correctly - skipping pgdbconn would still not allow to specify 'CSV' as psycopg does not support this parameter at all. But it would be possible to pass an empty string for the 'null' parameter of it's curs_copy_from to mimic that behaviour.

from pyrseas.

jmafc avatar jmafc commented on August 17, 2024

Thanks Tobias for the clarification. I see now that psycopg2 copy_from and copy_to do not support the csv parameter in any way, only copy_expert does.
@jdearing-neudesic I'm afraid that at this point, the only thing we can do is document that if you're going to use yamltodb -u to load some tables, columns with SQL NULLs will have to be encoded as \N. Implementing a workaound with copy_expert seems to me as a step backward for very little gain, and adding an empty string in our code would only fix your situation but would break anyone using the standard default for NULLs, even those using dbtoyaml to export the data in the first place.

from pyrseas.

dvarrazzo avatar dvarrazzo commented on August 17, 2024

@dvarrazzo Sorry to bother you again, but if SQL copy (I realize it's a Postgres invention) is supposed to accept missing data values (as it apparently does) and supply a default (I'm not sure it's a NULL or a 0 that is being injected), then it seems that psycopg copy_from() would have to abide by this (weird IMHO) rule.

Hello,

copy_from and copy_to were a bad idea, for the same reason that there aren't cursor.select() or a cursor.insert() methods each one with a million of parameters and a few hundreds missing because they were introduced in a later Postgres release. After acknowledging this design error, the maintainer at the time added copy_expert(), which is the only thing that makes sense, because it accommodates any future Postgres feature and, like execute(), doesn't assume knowledge of the statement to send, but only of its protocol.

So no, there is no intention to support other parameters in copy_from(): the method is dropped from psycopg3, which only supports a cursor.copy() method (which supports not only file-like objects but also record streams and adaptations like normal queries).

The psycopg2.sql module, and its psycopg3.sql counterpart in the next release, allow to generate syntactically valid queries escaping table and field names correctly and allowing the user to use whatever feature is supported by the server they are talking to.

from pyrseas.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.