tokern / dbcat
Data Catalog for Databases and Data Warehouses
License: MIT License
Hi, I'm trying to understand how to use dbcat and data-lineage with Snowflake.
I was following this guide, which was the only one I found: https://tokern.io/docs/data-lineage/example
I ran into the following errors:
cannot import name 'catalog_connection' from 'data_lineage': fixed by replacing data_lineage with dbcat, as seen in another issue.
cannot import name 'visit_dml_queries' from 'data_lineage.parser': not fixed. I can't find a solution, although other users report being able to run this.
Finally, could you please point me to an example of how to properly set up dbcat? I was able to do it by trial and error, running dbcat commands until I discovered the config file. The documentation in the repository's README is a bit short.
Thanks and good job!
I can see that the catalog tables are created after pull, but I do not see any data in the tables. Please let me know if I'm missing something here.
Catalog db: Postgres
Source db: Redshift
I'm also reviewing the code to figure out where the db connection is made and where data is inserted into the Postgres db.
postgres.public> SELECT t.*
FROM public.sources t
LIMIT 501
[2021-06-15 14:49:18] 0 rows retrieved in 65 ms
postgres.public> SELECT t.*
FROM public.tables t
LIMIT 501
[2021-06-15 14:52:40] 0 rows retrieved in 97 ms
Hello there! May I ask whether there are any plans to add Snowflake as the backend storage? Or are there any reasons why Snowflake is not considered as a backend? Thanks!
I tried the updated docker file and received the following error when I executed the sample notebook:
requests.exceptions.HTTPError: 400 Client Error: BAD REQUEST for url: http://127.0.0.1:8000/api/v1/catalog/sources
Looks like one of the sql statements is looking for a missing column called source_id to compare against parameters:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn) column jobs.source_id does not exist
LINE 1: ....name AS jobs_name, jobs.context AS jobs_context, jobs.sourc...
[SQL: SELECT jobs.id AS jobs_id, jobs.name AS jobs_name, jobs.context AS jobs_context, jobs.source_id AS jobs_source_id
FROM jobs
WHERE %(param_1)s = jobs.source_id]
[parameters: {'param_1': 8}]
Originally posted by @siva-mudiyanur in tokern/data-lineage#57 (comment)
Line 175 in 938c5af
In dbcat.catalog.db, line 175 should be: WHERE SCHEMA NOT IN ('information_schema', 'pg_catalog')
This supplemental WHERE clause is added after the extractor renames the table_schema columns to schema, so the filter has to reference the renamed column.
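The point about filtering on the renamed column can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database rather than Postgres or Redshift, and the table name columns_info and the subquery layout are assumptions made for illustration; they are not taken from the dbcat source. It only shows that once an inner query aliases table_schema to schema, an outer WHERE clause sees the alias and not the original name.

```python
import sqlite3

# In-memory stand-in for a metadata table (hypothetical name and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE columns_info (table_schema TEXT, table_name TEXT)")
conn.executemany(
    "INSERT INTO columns_info VALUES (?, ?)",
    [
        ("public", "orders"),
        ("information_schema", "tables"),
        ("pg_catalog", "pg_class"),
    ],
)

# Inner query renames table_schema to "schema", as the issue describes.
inner = 'SELECT table_schema AS "schema", table_name AS name FROM columns_info'

# The supplemental filter wraps the inner query, so it must use the alias.
outer = (
    f"SELECT * FROM ({inner}) AS renamed "
    "WHERE \"schema\" NOT IN ('information_schema', 'pg_catalog')"
)
rows = conn.execute(outer).fetchall()


def filter_on_original_name_fails():
    """Return True if filtering on the pre-rename column name raises an error."""
    bad = (
        f"SELECT * FROM ({inner}) AS renamed "
        "WHERE table_schema NOT IN ('information_schema', 'pg_catalog')"
    )
    try:
        conn.execute(bad)
    except sqlite3.OperationalError:
        return True
    return False
```

Here rows holds only the non-system row, while the same filter written against table_schema is rejected because that name is no longer visible outside the inner query.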
Hi,
First of all, thanks for setting up an open-source platform for data lineage. I tried to set up the config file and ran the pull command on dbcat, but it gives me the following error:
dbcat pull
Traceback (most recent call last):
  File "d:\python\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "d:\python\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Python\Scripts\dbcat.exe\__main__.py", line 7, in <module>
  File "d:\python\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "d:\python\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "d:\python\lib\site-packages\click\core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "d:\python\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "d:\python\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "d:\python\lib\site-packages\click\decorators.py", line 33, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "d:\python\lib\site-packages\dbcat\__main__.py", line 129, in pull_cli
    catalog = Catalog(**config["catalog"])
TypeError: __init__() missing 1 required positional argument: 'database'
Please let me know if I missed anything, thank you!
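The traceback shows the catalog section of the config being unpacked directly into a constructor, so a missing key surfaces as a TypeError. A minimal sketch of checking the section before unpacking it is shown here. The Catalog class is a hypothetical stand-in and its parameter list is an assumption; the real dbcat signature may differ, and only the 'database' argument is confirmed by the error message. The helper missing_required is likewise invented for illustration.

```python
import inspect


class Catalog:
    """Hypothetical stand-in; the real dbcat Catalog signature may differ."""

    def __init__(self, host, port, user, password, database):
        self.host = host
        self.port = port
        self.user = user
        self.password = password
        self.database = database


def missing_required(cls, config):
    """List required constructor arguments that are absent from config."""
    params = inspect.signature(cls.__init__).parameters
    return [
        name
        for name, p in params.items()
        if name != "self"
        and p.default is inspect.Parameter.empty
        and p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
        and name not in config
    ]


# Assumed config contents: the catalog section lacks the 'database' key.
config = {
    "catalog": {"host": "localhost", "port": 5432, "user": "u", "password": "p"}
}
missing = missing_required(Catalog, config["catalog"])
```

With this config, missing contains 'database', which matches the argument named in the TypeError and points at the key to add to the catalog section.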
Hi @vrajat, I've been getting errors when I try to update the package now that the issue is fixed. Am I missing something, or do the package version constraints need to be loosened in the code?
INFO: pip is looking at multiple versions of dbcat to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install amundsen-databuilder[athena,bigquery,glue,rds,snowflake]==5.2.0 and amundsen-databuilder[athena,bigquery,glue,rds,snowflake]==5.2.2 because these package versions have conflicting dependencies.
The conflict is caused by:
amundsen-databuilder[athena,bigquery,glue,rds,snowflake] 5.2.2 depends on mysqlclient<3 and >=1.3.6; extra == "rds"
amundsen-databuilder[athena,bigquery,glue,rds,snowflake] 5.2.0 depends on mysqlclient<3 and >=1.3.6; extra == "rds"
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
Originally posted by @siva-mudiyanur in #23 (comment)