Giter Club home page Giter Club logo

Comments (8)

jmafc avatar jmafc commented on July 16, 2024

Indeed, I'm afraid all my tests with indexes were done with traditional indexes, i.e., not using a function. I'll investigate how difficult it'll be to fix that.

from pyrseas.

rhunwicks avatar rhunwicks commented on July 16, 2024

I think this might still be a problem if there is an function-based index on more than one column. See the dbtoyaml output for index pty_i_last_name_first_name below, which includes a columns section as well as the expression:

Postgresql definition:

CREATE TABLE ipc_owner.sys_parties
(
  pty_id bigint NOT NULL,
  organization_name character varying(80),
  last_name character varying(40),
  first_name character varying(40),
  other_names character varying(40),
  CONSTRAINT sys_parties_pkey PRIMARY KEY (pty_id )
)
WITH (
  OIDS=FALSE
);
CREATE INDEX pty_i_last_name_first_name
  ON ipc_owner.sys_parties
  USING btree
  (lower(last_name::text) COLLATE pg_catalog."default" , lower(first_name::text) COLLATE pg_catalog."default" , lower(other_names::text) COLLATE pg_catalog."default" );

CREATE INDEX pty_i_organization_name
  ON ipc_owner.sys_parties
  USING btree
  (lower(organization_name::text) COLLATE pg_catalog."default" );

ddbtoyaml output:

  table sys_parties:
    columns:
    - pty_id:
        not_null: true
        type: bigint
    - organization_name:
        type: character varying(80)
    - last_name:
        type: character varying(40)
    - first_name:
        type: character varying(40)
    - other_names:
        type: character varying(40)
    indexes:
      pty_i_last_name_first_name:
        access_method: btree
        columns:
        - other_names)::text)
        expression: lower((last_name)::text), lower((first_name)::text), lower((other_names)::text)
      pty_i_organization_name:
        access_method: btree
        expression: lower((organization_name)::text)
    primary_key:
      sys_parties_pkey:
        access_method: btree
        columns:
        - pty_id

from pyrseas.

rhunwicks avatar rhunwicks commented on July 16, 2024

The yaml I am expecting, and which seems to generate the correct index going in the other direction is:

    indexes:
      pty_i_last_name_first_name:
        access_method: btree
        expression: lower(last_name::text), lower(first_name::text), lower(other_names::text)

It also gets confused when there is more than column, but only one of them is function-based:

Actual SQL:

CREATE INDEX ead_i_address
  ON ipc_owner.sys_electronic_addresses
  USING btree
  (lower(address::text) COLLATE pg_catalog."default" , ead_type COLLATE pg_catalog."default" );

Expected yaml

    indexes:
      ead_i_address:
        access_method: btree
        expression: lower((address)::text), ead_type

Actual yaml

    indexes:
      ead_i_address:
        access_method: btree
        columns:
        - address)::text)
        - ead_type
        expression: lower((address)::text)

from pyrseas.

jmafc avatar jmafc commented on July 16, 2024

You're right. I've created two new test cases, one for expressions over multiple columns and another for "mixed" indexes (regular column plus function). I think we are getting all the information, that is we fetch pg_index.indkey, which has zeros for the parts that are actually expresssions, and pg_get_expr(indexprs, indrelid), which gives the representation of the expressions. I have to tweak how we deal with the info.

from pyrseas.

jmafc avatar jmafc commented on July 16, 2024

There are several ways we could show the info in YAML. Say we have a table with c1 int, c2 text, c3 text and for whatever reason we want to create an index on (lower(c2), c1, upper(c3). An option would be to show this as:

    indexes:
      idx:
        keys:
        - lower(c2):
            type:
              expression
        - c1
        - upper(c3):
             type:
               expression

We need to specify "type: expression" to read it back in (since with delimited identifiers someone could actually created a column named literally "lower(c2)", as crazy as that sounds). The second key could also have a "type: column" specification which of course would be the default. The downside is this is incompatible with the current output from dbtoyaml or the input expected by yamltodb. So we could accept "columns:" as substitute for "keys:".

The main alternative would be to use (similar to your example):

    indexes:
      idx:
        expression: (lower(c2), c1, upper(c3)

The downside of this that we'd have to parse and make sense of that expression (and it could get hairy with collations in 9.1+). The former format could be extended/modified to allow something like:

  keys:
    - lower(c2):
        type:
          expression
        collation:
          french

Comments, suggestions?

from pyrseas.

rhunwicks avatar rhunwicks commented on July 16, 2024

I like your first example, using keys with the ability for each key to be either a column or an expression.

Allowing columns as an alias for keys will ensure backwards compatibility for anyone using regular column-based indexes.

For people like me who are using function-based indexes it will be straightforward to use dbtoyaml to extract the correct syntax for the new version and then diff/meld/eclipse/etc. to compare the extract to the schema yaml file and update it to the new format.

from pyrseas.

jmafc avatar jmafc commented on July 16, 2024

This looks like a tough nut to crack. pg_index has three (and in 9.1 four) int2vector/oid2vector columns, which have to be split and analyzed. In SQL, we can do that with unnest(), but we'll fetch multiple rows. In Python, we get a string, e.g., "0 1 0", which is fairly easy to split. pg_index also has an indexprs column (text before 9.1, pg_node_tree in 9.1), which can be interpreted with pg_get_expr(), but what you get for my example above is "lower(c2), upper(c3)" (the regular column is missing). Although this particular case is easy to split in Python, with str.split(', '), if one of the functions has more than one argument separated by a comma (like btrim()), that technique will fail, which would then require more careful parsing. And then there is indoption whose individual int2 components are actually flag bits "defined by the index's access method". I had tried parsing the output of pg_get_indexdef(), but wasn't very successful (that's why you saw partial column specs like "address)::text)". Outputting the complete final parenthesized expression from pg_get_indexdef() would obviate all these issues for dbtoyaml, but may affect yamltodb. In any case, I think the broken down representation is preferable, so I'll try to work along those lines.

from pyrseas.

jmafc avatar jmafc commented on July 16, 2024

Made some progress on this. Two unit tests. First takes following statements:

CREATE TABLE t1 (c1 integer, c2 text, c3 text);
CREATE INDEX t1_idx ON t1 (lower(c2), lower(c3));

and outputs the following for the index:

indexes:
  t1_idx:
    access_method: btree
    keys:
    - lower(c2):
        type: expression
    - lower(c3):
        type: expression

Second test uses:

CREATE TABLE t1 (c1 integer, c2 text, c3 text);
CREATE INDEX t1_idx ON t1 (btrim(c3, 'x') NULLS FIRST, c1, lower(c2) DESC);

and emits the following for the index:

indexes:
  t1_idx:
    access_method: btree
    keys:
    - btrim(c3, 'x'::text):
        nulls: first
        type: expression
    - c1
    - lower(c2):
        order: desc
        type: expression

I'm not entirely happy with parsing the expressions, but at least it appears to do the job. If you have some really hairy examples, please let me know. I'll work on the YAML to SQL side now.

from pyrseas.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.