tokern / dbcat

Data Catalog for Databases and Data Warehouses

License: MIT License

Python 99.72% Mako 0.28%

dbcat's People

Contributors

vrajat chadhoyt vfisa nicolepng keenborder786 marqueewinq tommyboytech amarshinde150

dbcat's Issues

Outdated documentation: cannot import name <...>

Hi, I'm trying to understand how to use dbcat and data-lineage with Snowflake.
I was following this guide, which was the only one I found: https://tokern.io/docs/data-lineage/example

I ran into the following errors:

cannot import name 'catalog_connection' from 'data_lineage': fixed by replacing data_lineage with dbcat, as seen in another issue.
cannot import name 'visit_dml_queries' from 'data_lineage.parser': not fixed, can't find a solution although other users report being able to run this.

Finally, could you please point me to an example of how to properly set up dbcat? I was able to do it by trial and error, running dbcat commands until I discovered the config file. The documentation in the repository's README is a bit short.

Thanks and good job!
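
When an import fails after a package reorganization, one way to narrow it down is to probe which installed module actually exposes the name. The following is a minimal sketch using only the standard library; the helper name find_exporters is hypothetical, and the module names are simply the ones quoted in the error messages above. What it reports depends entirely on the versions installed locally.

```python
import importlib


def find_exporters(name, module_names):
    """Return the modules in module_names that import cleanly and expose `name`."""
    found = []
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module is not installed (or was renamed) in this environment.
            continue
        if hasattr(module, name):
            found.append(module_name)
    return found


if __name__ == "__main__":
    # Module names are taken from the error messages in this issue.
    print(find_exporters("catalog_connection", ["data_lineage", "dbcat"]))
    print(find_exporters("visit_dml_queries", ["data_lineage.parser"]))
```

An empty list for a name means none of the probed modules exports it in the installed version, which points to a documentation and package version mismatch rather than a local setup problem.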

unable to find catalog details after pull

I can see that the catalog tables are created after pull, but I do not see any data in them. Please let me know if I'm missing something here.

Catalog db: Postgres
Source db: Redshift

I'm also reviewing the code to figure out where the db connection is made and where data is inserted into the Postgres db.

postgres.public> SELECT t.*
                 FROM public.sources t
                 LIMIT 501
[2021-06-15 14:49:18] 0 rows retrieved in 65 ms

postgres.public> SELECT t.*
                 FROM public.tables t
                 LIMIT 501
[2021-06-15 14:52:40] 0 rows retrieved in 97 ms
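
The manual check above can be scripted so that it is easy to repeat after each pull. This is a rough sketch, not part of dbcat: row_counts is a hypothetical helper that works with any DB-API connection, and an in-memory SQLite database stands in for the Postgres catalog so the example is self-contained. The table names sources and tables come from the queries above.

```python
import sqlite3


def row_counts(conn, tables):
    """Return {table name: row count} using a DB-API connection."""
    counts = {}
    cur = conn.cursor()
    for table in tables:
        # Table names cannot be bound as parameters, so validate them first.
        if not table.isidentifier():
            raise ValueError(f"unexpected table name: {table!r}")
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        counts[table] = cur.fetchone()[0]
    return counts


if __name__ == "__main__":
    # Assumption: SQLite replaces the real Postgres catalog for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sources (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE tables (id INTEGER PRIMARY KEY, name TEXT)")
    print(row_counts(conn, ["sources", "tables"]))
```

Against the real catalog, the same function could be given a Postgres connection instead; zero counts for every table after a pull would confirm that the schema was created but nothing was written.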

Snowflake backend

Hello there! May I ask whether there are any plans to add Snowflake as backend storage? If not, is there a reason why Snowflake is not considered as a backend? Thanks!

Make schema changes using database migration

I tried the updated Docker file and received the following error when I executed the sample notebook:

requests.exceptions.HTTPError: 400 Client Error: BAD REQUEST for url: http://127.0.0.1:8000/api/v1/catalog/sources

It looks like one of the SQL statements compares its parameters against a column called source_id, which is missing:

sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn) column jobs.source_id does not exist
LINE 1: ....name AS jobs_name, jobs.context AS jobs_context, jobs.sourc...
[SQL: SELECT jobs.id AS jobs_id, jobs.name AS jobs_name, jobs.context AS jobs_context, jobs.source_id AS jobs_source_id
FROM jobs
WHERE %(param_1)s = jobs.source_id]
[parameters: {'param_1': 8}]

Originally posted by @siva-mudiyanur in tokern/data-lineage#57 (comment)
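
The error above is what happens when application code expects a column that an older catalog database does not have. A migration step closes that gap by altering the existing table instead of assuming a fresh schema. The sketch below illustrates the idea with SQLite from the standard library; it is not the project's migration code, the helper names are hypothetical, and a real deployment would more likely use a versioned migration tool against Postgres.

```python
import sqlite3


def column_exists(conn, table, column):
    """Check the table's column list (SQLite-specific PRAGMA)."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return any(row[1] == column for row in rows)


def add_column_if_missing(conn, table, column, ddl_type):
    """Add the column only when it is absent; return True if it was added."""
    if column_exists(conn, table, column):
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # An "old" jobs table without source_id, mirroring the failing query.
    conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT, context TEXT)")
    print(add_column_if_missing(conn, "jobs", "source_id", "INTEGER"))
    print(add_column_if_missing(conn, "jobs", "source_id", "INTEGER"))
```

Because the step checks before altering, it can run on every startup without failing on databases that already have the column.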

Redshift catalog extract

dbcat/dbcat/catalog/db.py

Line 175 in 938c5af

WHERE TABLE_SCHEMA NOT IN ('information_schema', 'pg_catalog')

Line 175 of dbcat.catalog.db should be: WHERE SCHEMA NOT IN ('information_schema', 'pg_catalog')
This supplemental WHERE clause is added after the extractor renames the table_schema columns to schema, so the filter has to reference the renamed column.
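
The effect described here can be reproduced in isolation: once an inner query aliases a column, an outer WHERE clause only sees the alias. The following sketch uses SQLite and a made-up table named columns_view purely for illustration; it does not reproduce the actual extractor query.

```python
import sqlite3

# Inner query that renames table_schema to "schema", as the issue describes.
INNER = 'SELECT table_schema AS "schema", table_name AS "name" FROM columns_view'


def user_tables(conn, filter_column):
    """Apply the supplemental WHERE clause on top of the renamed columns."""
    sql = (
        f"SELECT * FROM ({INNER}) "
        f"WHERE {filter_column} NOT IN ('information_schema', 'pg_catalog')"
    )
    return conn.execute(sql).fetchall()


def demo_connection():
    """Build a small in-memory table standing in for the source metadata."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE columns_view (table_schema TEXT, table_name TEXT)")
    conn.executemany(
        "INSERT INTO columns_view VALUES (?, ?)",
        [
            ("public", "users"),
            ("pg_catalog", "pg_class"),
            ("information_schema", "tables"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(user_tables(conn, '"schema"'))  # filters on the renamed column
    try:
        user_tables(conn, "table_schema")  # original name is no longer visible
    except sqlite3.OperationalError as exc:
        print("failed:", exc)
```

Filtering on the alias returns only the user schema, while filtering on the original column name fails because that name no longer exists at the outer level.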

Unique Constraint Failed

piicatcher-199

Error when using 'pull'

Hi,

First of all, thanks for setting up an open-source platform for data lineage. I set up the config file and ran the pull command on dbcat, but it gives me the following error:

dbcat pull
Traceback (most recent call last):
  File "d:\python\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "d:\python\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Python\Scripts\dbcat.exe\__main__.py", line 7, in <module>
  File "d:\python\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "d:\python\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "d:\python\lib\site-packages\click\core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "d:\python\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "d:\python\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "d:\python\lib\site-packages\click\decorators.py", line 33, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "d:\python\lib\site-packages\dbcat\__main__.py", line 129, in pull_cli
    catalog = Catalog(**config["catalog"])
TypeError: __init__() missing 1 required positional argument: 'database'

Please let me know if I missed anything, thank you!
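
The traceback shows that the catalog section of the config is unpacked directly into the Catalog constructor, so a missing key surfaces as a TypeError. A config can be checked against a constructor signature before the call to get a clearer message. The sketch below is an assumption-laden illustration: StandInCatalog is a hypothetical class whose parameters are invented for the example (only database is known from the traceback), and missing_arguments is a hypothetical helper.

```python
import inspect


class StandInCatalog:
    """Hypothetical stand-in; the real dbcat Catalog signature may differ."""

    def __init__(self, user, password, host, database):
        self.user = user
        self.password = password
        self.host = host
        self.database = database


def missing_arguments(func, kwargs):
    """List required parameters of func that are absent from kwargs."""
    missing = []
    for name, param in inspect.signature(func).parameters.items():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # *args / **kwargs never count as required
        if param.default is param.empty and name not in kwargs:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # A config section that lacks the 'database' key, as in the traceback.
    catalog_config = {"user": "catalog_user", "password": "secret", "host": "localhost"}
    print(missing_arguments(StandInCatalog, catalog_config))
```

If the returned list is non-empty, the config file is missing those keys in its catalog section.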

Install errors with version v0.5.4

Hi @vrajat, I've been getting errors when I try to update the package now that the issue is fixed. Am I missing something, or do the package versions need to be loosened in the code?

INFO: pip is looking at multiple versions of dbcat to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install amundsen-databuilder[athena,bigquery,glue,rds,snowflake]==5.2.0 and amundsen-databuilder[athena,bigquery,glue,rds,snowflake]==5.2.2 because these package versions have conflicting dependencies.

The conflict is caused by:
    amundsen-databuilder[athena,bigquery,glue,rds,snowflake] 5.2.2 depends on mysqlclient<3 and >=1.3.6; extra == "rds"
    amundsen-databuilder[athena,bigquery,glue,rds,snowflake] 5.2.0 depends on mysqlclient<3 and >=1.3.6; extra == "rds"

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

Originally posted by @siva-mudiyanur in #23 (comment)
