dbcat's People

Contributors

jhecking, marqueewinq, nicolepng, vrajat

dbcat's Issues

Outdated documentation: cannot import name <...>

Hi, I'm trying to understand how to use dbcat and data-lineage with Snowflake.
I was following this guide, which was the only one I found: https://tokern.io/docs/data-lineage/example

I ran into the following errors:

  • cannot import name 'catalog_connection' from 'data_lineage': fixed by replacing data_lineage with dbcat, as seen in another issue.
  • cannot import name 'visit_dml_queries' from 'data_lineage.parser': not fixed. I can't find a solution, although other users report being able to run this.

Finally, could you please point me to an example of how to properly set up dbcat? I managed it by trial and error, running dbcat commands until I discovered the config file. The documentation in the repository's README is a bit short.

Thanks and good job!

unable to find catalog details after pull

I can see that the catalog tables are created after pull, but I do not see any data in them. Please let me know if I'm missing something.

Catalog db: Postgres
Source db: Redshift

I'm also reviewing the code to figure out where the database connection is made and where data is inserted into the Postgres database.

postgres.public> SELECT t.*
                 FROM public.sources t
                 LIMIT 501
[2021-06-15 14:49:18] 0 rows retrieved in 65 ms

postgres.public> SELECT t.*
                 FROM public.tables t
                 LIMIT 501
[2021-06-15 14:52:40] 0 rows retrieved in 97 ms

Snowflake backend

Hello there! Are there any plans to add Snowflake as a backend storage option? If not, is there a reason Snowflake isn't being considered as a backend? Thanks!

Make schema changes using database migration

I tried the updated Dockerfile and received the following error when running the sample notebook:

requests.exceptions.HTTPError: 400 Client Error: BAD REQUEST for url: http://127.0.0.1:8000/api/v1/catalog/sources

It looks like one of the SQL statements references a column called source_id that does not exist, and compares it against a parameter:

sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn) column jobs.source_id does not exist
LINE 1: ....name AS jobs_name, jobs.context AS jobs_context, jobs.sourc...
[SQL: SELECT jobs.id AS jobs_id, jobs.name AS jobs_name, jobs.context AS jobs_context, jobs.source_id AS jobs_source_id
FROM jobs
WHERE %(param_1)s = jobs.source_id]
[parameters: {'param_1': 8}]

Originally posted by @siva-mudiyanur in tokern/data-lineage#57 (comment)
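One plausible reading of this error is that the existing catalog database was created before source_id was added to the jobs model, so the table on disk and the ORM query disagree. A versioned migration addresses that by applying only the schema changes a database has not yet seen. The sketch below is a minimal, hypothetical illustration of the idea, not dbcat's code: SQLite stands in for Postgres, PRAGMA user_version serves as the version counter, and the MIGRATIONS list and migrate helper are invented for the example. For a SQLAlchemy-based project, a dedicated tool such as Alembic would normally fill this role.

```python
import sqlite3

# Ordered schema changes; MIGRATIONS[i] brings the database to version i + 1.
# The jobs layout is a hypothetical simplification of the table in the error.
MIGRATIONS = [
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT, context TEXT)",
    "ALTER TABLE jobs ADD COLUMN source_id INTEGER",
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply every migration newer than the stored schema version; return the new version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(statement)
        # PRAGMA values cannot be bound as parameters; version is an int we control.
        conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


# Simulate a catalog created by an older release: jobs exists, but without source_id.
conn = sqlite3.connect(":memory:")
conn.execute(MIGRATIONS[0])
conn.execute("PRAGMA user_version = 1")

print("schema version:", migrate(conn))

# After migrating, a query shaped like the failing one runs because jobs.source_id exists.
conn.execute(
    "INSERT INTO jobs (name, context, source_id) VALUES (?, ?, ?)",
    ("load_orders", "{}", 8),
)
rows = conn.execute(
    "SELECT jobs.id, jobs.name, jobs.source_id FROM jobs WHERE ? = jobs.source_id",
    (8,),
).fetchall()
print(rows)
```

Running migrate a second time is a no-op, which is what makes this pattern safe to execute on every startup.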

Redshift catalog extract

The clause

WHERE TABLE_SCHEMA NOT IN ('information_schema', 'pg_catalog')

in dbcat.catalog.db, line 175, should be:

WHERE SCHEMA NOT IN ('information_schema', 'pg_catalog')

This supplemental WHERE clause is added after the extractor renames the table_schema column to schema, so it has to use the new name.
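Why the original clause fails can be sketched with SQLite standing in for Redshift. The sketch assumes, based on the report, that the extractor exposes table_schema under the name schema before the supplemental filter is applied; the columns table, its sample rows, and the extract helper are hypothetical and are not dbcat's code. Once the column is renamed, the outer filter can only see the new name, so filtering on TABLE_SCHEMA raises an error while filtering on schema returns just the user tables.

```python
import sqlite3

# SQLite stands in for Redshift; this table mimics a few columns of
# information_schema.columns, and the rows are made-up examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE columns (table_schema TEXT, table_name TEXT, column_name TEXT)")
conn.executemany(
    "INSERT INTO columns VALUES (?, ?, ?)",
    [
        ("public", "orders", "id"),
        ("pg_catalog", "pg_class", "oid"),
        ("information_schema", "tables", "table_name"),
    ],
)

# Assumed extractor step: table_schema is exposed under the new name "schema".
RENAMED = "SELECT table_schema AS schema, table_name, column_name FROM columns"


def extract(filter_column):
    """Apply the supplemental WHERE clause on top of the renamed result set."""
    sql = (
        f"SELECT * FROM ({RENAMED}) "
        f"WHERE {filter_column} NOT IN ('information_schema', 'pg_catalog')"
    )
    return conn.execute(sql).fetchall()


# Filtering on the old name fails: the outer query no longer sees table_schema.
try:
    extract("TABLE_SCHEMA")
except sqlite3.OperationalError as exc:
    print("error:", exc)

# Filtering on the renamed column works and keeps only user schemas.
print(extract("schema"))
```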

Error when using 'pull'

Hi,

First of all, thanks for building an open-source platform for data lineage. I set up the config file and ran the pull command in dbcat, but it gives me the following error:

dbcat pull
Traceback (most recent call last):
  File "d:\python\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "d:\python\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Python\Scripts\dbcat.exe\__main__.py", line 7, in <module>
  File "d:\python\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "d:\python\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "d:\python\lib\site-packages\click\core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "d:\python\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "d:\python\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "d:\python\lib\site-packages\click\decorators.py", line 33, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "d:\python\lib\site-packages\dbcat\__main__.py", line 129, in pull_cli
    catalog = Catalog(**config["catalog"])
TypeError: __init__() missing 1 required positional argument: 'database' 

Please let me know if I missed anything, thank you!
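The last frame shows the catalog section of the config being unpacked directly into the Catalog constructor (Catalog(**config["catalog"])), so a section without a database entry fails with exactly this TypeError. The hypothetical check below, which is not part of dbcat, inspects a constructor's signature and reports required arguments that are missing from a config section before the call is made. The Catalog class is a stand-in: only its required database argument comes from the traceback, while the other parameters, their defaults, and the sample values are assumptions.

```python
import inspect


class Catalog:
    """Hypothetical stand-in for the class constructed by `dbcat pull`.

    Only the required `database` argument is taken from the traceback;
    `host`, `port`, and their defaults are illustrative assumptions.
    """

    def __init__(self, database, host="127.0.0.1", port=5432):
        self.database = database
        self.host = host
        self.port = port


def missing_required_keys(cls, section):
    """Return constructor parameters that have no default and are absent from `section`."""
    required = [
        name
        for name, param in inspect.signature(cls).parameters.items()
        if param.default is inspect.Parameter.empty
        and param.kind
        in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    ]
    return [name for name in required if name not in section]


# A parsed "catalog" section that reproduces the error: there is no database key.
config = {"catalog": {"host": "127.0.0.1", "port": 5432}}

missing = missing_required_keys(Catalog, config["catalog"])
if missing:
    print("catalog section is missing:", ", ".join(missing))

# With the key added (hypothetical value), the same unpacking call succeeds.
config["catalog"]["database"] = "catalog_db"
catalog = Catalog(**config["catalog"])
print(catalog.database)
```

Checking the signature up front turns a bare TypeError deep in the CLI into a message that names the config key to add.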

Install errors with version v0.5.4

Hi @vrajat, since the issue was fixed I've been getting errors when I try to update the package. Am I missing something, or do the package versions need to be loosened in the code?

INFO: pip is looking at multiple versions of dbcat to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install amundsen-databuilder[athena,bigquery,glue,rds,snowflake]==5.2.0 and amundsen-databuilder[athena,bigquery,glue,rds,snowflake]==5.2.2 because these package versions have conflicting dependencies.

The conflict is caused by:
    amundsen-databuilder[athena,bigquery,glue,rds,snowflake] 5.2.2 depends on mysqlclient<3 and >=1.3.6; extra == "rds"
    amundsen-databuilder[athena,bigquery,glue,rds,snowflake] 5.2.0 depends on mysqlclient<3 and >=1.3.6; extra == "rds"

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict 

Originally posted by @siva-mudiyanur in #23 (comment)
