Comments (5)
Thanks so much for getting back to me @drewbanin - and happy to collaborate on this! Submitted a pull request with a stab at showing how it could work. I'm able to connect to Azure Databricks after the change, but I don't have an AWS version to regression test against.
The organization seems to be an internal number generated in Azure Databricks when you provision a workspace. All the Azure docs reference it when you want to connect to the spark cluster, and I wasn't able to connect before these changes.
It seems like 0 may be the default from Spark, and perhaps that's how AWS's version of Databricks works. But 0 doesn't work at all on the Azure side; perhaps there was an Azure-specific nuance they had to work around in how they provision clusters.
Happy to discuss further and react to any feedback! (I might be misunderstanding a few things in the code still as I'm learning dbt). Lastly, I get an SSL warning in the logs after successfully connecting.
There probably are some more tests on the model side of things to confirm Azure Databricks works as expected... I have a test cluster we can collaborate on if needed.
from dbt-spark.
Thanks for the report @akhildhingra! Would you recommend that we change something about how the dbt-spark
plugin works? Or is there something we should change in the documentation?
I'm not so familiar with Databricks and would appreciate it if you could explain further!
Hey @drewbanin - I'm not having the same level of luck as @akhildhingra, but since we didn't get this documented, I definitely wanted to chime in.
In dbt/adapters/spark/connection.py, I see this:
SPARK_CONNECTION_URL = "https://{host}:{port}/sql/protocolv1/o/0/{cluster}"
based on Databricks' documentation:
https://docs.azuredatabricks.net/user-guide/bi/jdbc-odbc-bi.html#http-path
You'd typically set it to this:
SPARK_CONNECTION_URL = "https://{host}:{port}/sql/protocolv1/o/{organization}/{cluster}"
You grab the organization from the Databricks service itself; I believe it's roughly equivalent to an id that uniquely identifies your instance of the service.
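To make the difference concrete, here's a minimal sketch of how the two URL shapes could be produced from one template. This is illustrative only, not the actual dbt-spark code; the hostnames, cluster id, and organization id below are made-up example values.

```python
# Illustrative sketch -- not the actual dbt-spark implementation.
URL_TEMPLATE = "https://{host}:{port}/sql/protocolv1/o/{organization}/{cluster}"

def build_connection_url(host, port, cluster, organization="0"):
    # "0" mirrors the currently hard-coded path segment; an Azure
    # workspace would pass its organization id here instead.
    return URL_TEMPLATE.format(
        host=host, port=port, organization=organization, cluster=cluster
    )

# Current (AWS-style) behavior, with the hard-coded "o/0" segment:
print(build_connection_url("example.cloud.databricks.com", 443, "my-cluster"))
# Azure-style, with a made-up example organization id:
print(build_connection_url("example.azuredatabricks.net", 443, "my-cluster",
                           organization="1234567890123456"))
```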
I'm not able to get this working by editing that file yet, but wanted to try and jumpstart the conversation since you guys have something really interesting going here with this package!
Hey @poidra02 - that's pretty interesting! We actually initially developed this plugin against Databricks Spark, but I remember following this guide for connecting to a cluster via JDBC. You can see the /sql/protocolv1/o/0/{cluster} string in the screenshot on that page.
Can you say just a tiny bit more about this?
You grab the organization from the Databricks service itself; I believe it's roughly equivalent to an id that uniquely identifies your instance of the service.
I'm super happy to add an `organization` config to the target config for Spark connections, and we can probably make the default value `0` for backwards compatibility. I would like to wrap my head around the difference between using `0` and the `organization` before making that change, though!
Let me know if you'd like a hand with making this change in a fork of dbt-spark. If you're able to get it working locally, then I'd happily accept a PR for the change :)
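The backwards-compatible default described above could be sketched like this. The class and field names here are hypothetical, not dbt-spark's actual credentials class, and the values are made-up examples.

```python
from dataclasses import dataclass

# Hypothetical sketch: an optional organization field defaulting to "0",
# so existing AWS-style profiles keep working unchanged.
@dataclass
class SparkCredentials:
    host: str
    cluster: str
    port: int = 443
    organization: str = "0"  # default preserves the legacy "o/0" path segment

# An existing AWS-style profile omits organization and still gets "0":
aws_creds = SparkCredentials(host="example.cloud.databricks.com",
                             cluster="my-cluster")
# An Azure profile supplies its workspace's organization id explicitly:
azure_creds = SparkCredentials(host="example.azuredatabricks.net",
                               cluster="my-cluster",
                               organization="1234567890123456")
print(aws_creds.organization)    # "0"
print(azure_creds.organization)  # "1234567890123456"
```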
Fixed in #34
Related Issues (20)
- [Bug] New test failure for `tests/functional/adapter/test_constraints.py::TestSparkIncrementalConstraintsRollback::test__constraints_enforcement_rollback`
- [Release Improvements] Refresh workflows for the `pyproject.toml` migration
- Import relevant pytest(s) for cross-database `cast` macro
- Cross-database `date` macro
- [Bug] New test failure for `tests/functional/adapter/test_python_model.py::TestPythonIncrementalModelSpark::test_incremental`
- [Bug] dbt run will fail if default namespace doesn't exist.
- [Bug] Cannot run unit tests against Spark/Hudi, receiving "NoneType is not iterable" error
- [Bug] `spark__list_relations_without_caching` expects legacy `schema` field
- [Feature] Lazy load agate
- [Feature] Support HTTP transport protocol for Thrift method
- [Feature] Support OCI Dataflow as a backend for dbt-spark
- `dbt-core` Dockerfile does not work for `dbt-spark` due to `PyHive`
- [Bug] CI is broken on `main` due to dependency resolution and timeout issues
- [Feature] Spike on supporting Py3.12 in dbt-spark
- [Bug] The tblproperties are not applied when using Python Model to create a table
- [Issue] sasl as a dependency
- [Feature] Livy connection support for Spark SQL models
- [Unit Testing] Add functional tests for unit testing
- [Bug] dbt docs generate does not include Data Type (data stored in iceberg format)
- [Bug] flake8 code smells