Currently database schema serialization format consist of single YAML file. The proble

Change <a class="commit-link" data-hovercard-type="commit" data-hovercard-url="https:/

database schema serialization format about pyrseas HOT 12 CLOSED

perseas commented on September 17, 2024

database schema serialization format

from pyrseas.

Comments (12)

jmafc commented on September 17, 2024

Agreed. Even before I created dbtoyaml, I preferred the "single object" model for storing, e.g., table definitions, in a VCS. You can do that, manually, by using "dbtoyaml -t" repeatedly, but

it is cumbersome when you have many tables, etc., (even if you script it, you could inadvertantly omit some), and
the individual tablename.yaml files can't be simply concatenated for processing by yamltodb without pre-processing to remove redundant "schema xxx" lines.

In addition, I believe programmers and DBAs would prefer to store the textual definition of views and function source as separate objects (non-redundantly) in a nicely indented format rather the output from pyyaml. And yes, the serialization enhacement should be optional.

Thinking out loud, perhaps this could be prototyped with a script that serialized the "all in one" format to separate files, and a converse tool to merge them back for input to yamltodb.

from pyrseas.

andreypopp commented on September 17, 2024

In addition, I believe programmers and DBAs would prefer to store the textual definition of views and function source as separate objects (non-redundantly) in a nicely indented format rather the output from pyyaml. And yes, the serialization enhacement should be optional.

Exactly! Same holds for tables — storing them in plain DDL would be nice too (this is subject for another ticket, but anyway... that would require additional effort for building SQL/DDL parser and maintaining its compatibility with original PostgreSQL's parser — maybe it would be nice to try to include Pyrseas as PostgresSQL's contrib module, what do you think?).

from pyrseas.

jmafc commented on September 17, 2024

I'm not keen on storing the SQL DDL for tables and particularly on parsing SQL, which is an inferior language IMHO (I prefer D, aka Tutorial D). Apgdiff already provides the capability of storing pg_dump output but it's hard to provide full coverage of PG features that way. As to contrib, Pyrseas is already in PGXN. I had started an attempt to embed Pyrseas (dbobject) code as a PL/Pythonu module(s) but didn't get too far.

from pyrseas.

andreypopp commented on September 17, 2024

I've done some experimental work on this, see commit 40698292cc5bc6793a545322b7268bf83ec2b43e:

View.definition and Function.source are now wrapped in pyrseas.yaml.MultiLineStr to be serialized using | style. That would make serialized definitions more readable.
dbtoyaml and yamltodb commands can now serialize/deserialize schema to/from repository — directory with schema.yaml and separate YAML file for each view with filename in form of schemaname.viewname.yaml.

I found it a bit difficult to store views as plain SQL files because COMMENT statement should be stored separately of CREATE VIEW... statement — that would require minimal parsing (actually just split two statements by ;), but if we do the same with functions — that would not be so easy. I think we could stick to some conventions like — "COMMENT statement should be the last one in file" or something like this, though I'm not a big fan of those non-intuitive implementation-dependent things.

Regarding functions — I also would like to store them in separate files, but besides all points above there is another difficulty — how to store functions with the same name but different arguments' types — a. store them in single file or b. store them separately but how we should encode filenames then?

Given all these questions I still found that storing database schema in multiple files feels more natural.

from pyrseas.

jmafc commented on September 17, 2024

Very interesting. It'll take me a while to digest it and see how it all works out. With regard to functions, if separate files are used, the names would have to follow the keylist attribute of class Proc, i.e., schema (name), (function) name, and arguments. For example, given public functions f(int, int) and f(int, text), the filenames could be public.f.integer_integer.yaml and public.f.integer_text.yaml. There are issues of course: f() stored as f..yaml? how to deal with data types with embedded spaces?

from pyrseas.

andreypopp commented on September 17, 2024

Just after thinking a bit more about functions — I think functions with the same name are closely related to each other so it might be a good idea to store them in single file.

from pyrseas.

jmafc commented on September 17, 2024

I cloned your repo to try things out. I'm not sure I understand what you did in pyrseas/yaml.py: importing yaml in a module named 'yaml' looks like looking for trouble. In any case, when I try to run dbtoyaml, I see:

$ export PYTHONPATH=$PWD
$ pyrseas/dbtoyaml.py --help
Traceback (most recent call last):
  File "pyrseas/dbtoyaml.py", line 12, in <module>
    from pyrseas import yaml
  File "/home/jma/src/andreypopp/pyrseas/yaml.py", line 5, in <module>
    import yaml
  File "/home/jma/src/andreypopp/pyrseas/yaml.py", line 14, in <module>
    yaml.add_representer(MultiLineStr, MultiLineStr_presenter)
AttributeError: 'module' object has no attribute 'add_representer'

If I print(dir(yaml)) before the line in error, I see:

['MultiLineStr', 'MultiLineStr_presenter', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', 'absolute_import', 'yaml']

My test environment is Python 2.7.3rc2 on Linux (Debian), but also saw the same behavior with Python 3.2.3rc2.

from pyrseas.

jmafc commented on September 17, 2024

This is an interesting thread about splitting db objects in pg_dump output, in particular, various observations on directory structures for the output and making unique names for functions:

http://archives.postgresql.org/pgsql-hackers/2010-12/threads.php#02285
http://archives.postgresql.org/pgsql-hackers/2011-01/thrd16.php#00047

Also, in trying to keep dbtoyaml options similar/parallel to pg_dump options, I think the --repository option could be specified as -Fd/--format=directory since the default format will be a single file (somewhat equivalent to -Fp/--format=plain). I realize that the 'directory' format of pg_dump doesn't output the same structure as we're suggesting here, so perhaps --format=tree or --format=split may be more appropriate.

from pyrseas.

jmafc commented on September 17, 2024

Andrey, I finally got around to looking a bit more at this (because Josep Martínez Vilà is also interested in this feature). Since I couldn't figure out how it was apparently working for you and not for me, I went with a simpler solution, pyrseas/yamlutil.py, as follows:

"""Pyrseas YAML utilities"""

from yaml import add_representer


class MultiLineStr(str):
    """ Marker for multiline strings"""


def MultiLineStr_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
add_representer(MultiLineStr, MultiLineStr_presenter)

With this and the changes to the __init__ methods of the Function and View classes, I was able to produce the function source and view definitions as multi-line strings, e.g.,

schema public:
  description: standard public schema
  function f1(integer, integer):
    language: sql
    returns: integer
    source: |-
      SELECT GREATEST($1, $2)
  ...
  view v1:
    definition: |2-
       SELECT now()::date AS today;
  view v2:
    definition: |2-
       SELECT t1.c1, t1.c3 * 2
         FROM t1;

However, this did not work properly with some actual views or functions (from production/staging databases) that had large text definitions. This will need some research and further testing.

I'll look at the repository changes next.

from pyrseas.

jmafc commented on September 17, 2024

Change f13980d, based on Andrey's experimental work above has been committed. Now view definitions, function source text and object descriptions (COMMENT text) are formatted in YAML block style.

from pyrseas.

jmafc commented on September 17, 2024

Change 94574eb is initial submission to address this new feature.

from pyrseas.

jmafc commented on September 17, 2024

Change 6a6565d essentially completes the work for this enhacement (although some refinements are still needed).

from pyrseas.

database schema serialization format about pyrseas HOT 12 CLOSED

Comments (12)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent