Hi, I tried connecting to Databricks via HTTP. The dbt debug was failing

Spark_connection_url do not contain workspace_id while connecting to databricks (dbt-spark issue, closed)

dbt-labs commented on July 22, 2024

Spark_connection_url do not contain workspace_id while connecting to databricks

from dbt-spark.

Comments (5)

poidra02 commented on July 22, 2024 1

Thanks so much for getting back to me @drewbanin - and happy to collaborate on this! Submitted a pull request with a stab at showing how it could work. I'm able to connect to Azure Databricks after the change, but I don't have an AWS version to regression test against.

The organization seems to be an internal number generated in Azure Databricks when you provision a workspace. All the Azure docs reference it when you want to connect to the spark cluster, and I wasn't able to connect before these changes.

It seems like 0 may be the default from spark, and perhaps how AWS' version of Databricks works? But 0 doesn't work at all on the Azure side... perhaps there was an Azure specific nuance they had to work around in how they provision clusters.

Happy to discuss further and react to any feedback! (I might be misunderstanding a few things in the code still as I'm learning dbt). Lastly, I get an SSL warning in the logs after successfully connecting.

There probably are some more tests on the model side of things to confirm Azure Databricks works as expected... I have a test cluster we can collaborate on if needed.

drewbanin commented on July 22, 2024

Thanks for the report @akhildhingra! Would you recommend that we change something about how the dbt-spark plugin works? Or is there something we should change in the documentation?

I'm not so familiar with Databricks and would appreciate it if you could explain further!

poidra02 commented on July 22, 2024

Hey @drewbanin - I'm not having the same level of luck as @akhildhingra, but since we didn't get this documented, I definitely wanted to chime in.

In dbt/adapters/spark/connection.py, I see this:
SPARK_CONNECTION_URL = "https://{host}:{port}/sql/protocolv1/o/0/{cluster}"

Based on Databricks' documentation:
https://docs.azuredatabricks.net/user-guide/bi/jdbc-odbc-bi.html#http-path

You'd typically be setting it to this:
SPARK_CONNECTION_URL = "https://{host}:{port}/sql/protocolv1/o/{organization}/{cluster}"

One grabs organization from the Databricks service itself, and I believe it's roughly equivalent to an id that uniquely represents your instance of the service.
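To make the difference concrete, here is a minimal sketch of how the two templates expand. The two template strings are the ones quoted above; `build_url` is a hypothetical helper (not something in dbt-spark), and the host, port, organization, and cluster values are made-up placeholders.

```python
# Templates quoted from the thread; everything else here is illustrative.
OLD_TEMPLATE = "https://{host}:{port}/sql/protocolv1/o/0/{cluster}"
NEW_TEMPLATE = "https://{host}:{port}/sql/protocolv1/o/{organization}/{cluster}"


def build_url(template: str, **params: str) -> str:
    """Fill a connection URL template (hypothetical helper)."""
    # str.format ignores keyword arguments the template does not use,
    # so the same params dict works for both templates.
    return template.format(**params)


# Placeholder values, not real workspace details.
params = {
    "host": "example.azuredatabricks.net",
    "port": "443",
    "organization": "1234567890123456",
    "cluster": "0123-456789-abc123",
}

old_url = build_url(OLD_TEMPLATE, **params)
new_url = build_url(NEW_TEMPLATE, **params)
print(old_url)
print(new_url)
```

The only difference between the two results is the path segment after `/o/`: the hard-coded `0` versus the organization id.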

I'm not able to get this working by editing that file yet, but wanted to try and jumpstart the conversation since you guys have something really interesting going here with this package!

drewbanin commented on July 22, 2024

Hey @poidra02 - that's pretty interesting! We actually initially developed this plugin against Databricks Spark, but I remember following this guide for connecting to a cluster via JDBC. You can see the /sql/protocolv1/o/0/{cluster} string in the screenshot on that page.

Can you say just a tiny bit more about this?

"One grabs organization from the Databricks service itself, and I believe it's roughly equivalent to an id that uniquely represents your instance of the service."

I'm super happy to add an organization config to the target config for Spark connections, and we can probably make the default value 0 for backwards compatibility. I would like to wrap my head around the difference between using 0 and the organization before making that change though!

Let me know if you'd like a hand with making this change in a fork of dbt-spark. If you're able to get it working locally, then I'd happily accept a PR for the change :)
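The backwards-compatible default floated above could look roughly like the sketch below. This is only an illustration under assumptions: `SparkTarget` is a hypothetical stand-in rather than dbt-spark's actual credentials class, the field names and the port default of 443 are guesses, and the host and cluster values are placeholders.

```python
from dataclasses import dataclass

URL_TEMPLATE = "https://{host}:{port}/sql/protocolv1/o/{organization}/{cluster}"


@dataclass
class SparkTarget:
    """Hypothetical stand-in for a Spark target config; not dbt-spark's real class."""

    host: str
    cluster: str
    port: int = 443  # assumed default, for illustration only
    # Defaulting to "0" reproduces the old hard-coded URL when the key is omitted.
    organization: str = "0"

    def connection_url(self) -> str:
        return URL_TEMPLATE.format(
            host=self.host,
            port=self.port,
            organization=self.organization,
            cluster=self.cluster,
        )


# Existing profiles that never set an organization keep the old URL shape.
legacy = SparkTarget(host="example.cloud.databricks.com", cluster="my-cluster")

# Azure Databricks users would supply their own organization id.
azure = SparkTarget(
    host="example.azuredatabricks.net",
    cluster="my-cluster",
    organization="1234567890123456",
)

print(legacy.connection_url())
print(azure.connection_url())
```

With this shape, targets that omit the new key behave exactly as before, and only users who need a non-zero organization have to change their config.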

jtcohen6 commented on July 22, 2024

Fixed in #34
