Comments (5)

poidra02 commented on July 22, 2024

Thanks so much for getting back to me @drewbanin - and happy to collaborate on this! Submitted a pull request with a stab at showing how it could work. I'm able to connect to Azure Databricks after the change, but I don't have an AWS version to regression test against.

The organization seems to be an internal number generated in Azure Databricks when you provision a workspace. All the Azure docs reference it when you want to connect to the spark cluster, and I wasn't able to connect before these changes.

It seems like 0 may be the default from Spark, and perhaps that's how AWS's version of Databricks works? But 0 doesn't work at all on the Azure side... perhaps there was an Azure-specific nuance they had to work around in how they provision clusters.

Happy to discuss further and react to any feedback! (I might be misunderstanding a few things in the code still as I'm learning dbt). Lastly, I get an SSL warning in the logs after successfully connecting.

There are probably some more tests needed on the model side of things to confirm Azure Databricks works as expected... I have a test cluster we can collaborate on if needed.

from dbt-spark.

drewbanin commented on July 22, 2024

Thanks for the report @akhildhingra! Would you recommend that we change something about how the dbt-spark plugin works? Or is there something we should change in the documentation?

I'm not so familiar with Databricks and would appreciate it if you could explain further!

poidra02 commented on July 22, 2024

Hey @drewbanin - I'm not having the same level of luck as @akhildhingra, but since we didn't get this documented, I definitely wanted to chime in.

In dbt/adapters/spark/connection.py, I see this:
SPARK_CONNECTION_URL = "https://{host}:{port}/sql/protocolv1/o/0/{cluster}"

Based on Databricks' documentation:
https://docs.azuredatabricks.net/user-guide/bi/jdbc-odbc-bi.html#http-path

You'd typically be setting it to this:
SPARK_CONNECTION_URL = "https://{host}:{port}/sql/protocolv1/o/{organization}/{cluster}"

One grabs organization from the Databricks service itself, and I believe it's roughly equivalent to an id that uniquely represents your instance of the service.
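
As a rough sketch of the difference between the two templates, the snippet below formats both with the same inputs. The host, port, organization, and cluster values are made-up placeholders for illustration, not real workspace details, and the variable names are hypothetical rather than taken from dbt-spark.

```python
# The template with organization hard-coded to 0, and the proposed variant
# that takes the organization id as a parameter.
CURRENT_URL = "https://{host}:{port}/sql/protocolv1/o/0/{cluster}"
PROPOSED_URL = "https://{host}:{port}/sql/protocolv1/o/{organization}/{cluster}"

# Placeholder values for illustration only (assumptions, not real ids).
params = {
    "host": "example.azuredatabricks.net",
    "port": 443,
    "organization": "1234567890123456",
    "cluster": "0000-000000-example1",
}

# str.format ignores unused keyword arguments, so one dict serves both templates.
current = CURRENT_URL.format(**params)
proposed = PROPOSED_URL.format(**params)

print(current)
print(proposed)
```

The only difference in the resulting URLs is the path segment after /o/, which is where the workspace's organization id would go.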

I'm not able to get this working by editing that file yet, but wanted to try and jumpstart the conversation since you guys have something really interesting going here with this package!

drewbanin commented on July 22, 2024

Hey @poidra02 - that's pretty interesting! We actually initially developed this plugin against Databricks Spark, but I remember following this guide for connecting to a cluster via JDBC. You can see the /sql/protocolv1/o/0/{cluster} string in the screenshot on that page.

Can you say just a tiny bit more about this?

"One grabs organization from the Databricks service itself, and I believe it's roughly equivalent to an id that uniquely represents your instance of the service."

I'm super happy to add an organization config to the target config for Spark connections, and we can probably make the default value 0 for backwards compatibility. I would like to wrap my head around the difference between using 0 and the organization before making that change though!
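
A minimal sketch of what an optional organization setting with a default of 0 might look like is below. This is not how dbt-spark implements it: the SparkTarget class and its connection_url method are hypothetical names, and the default port of 443 is an assumption made for the example.

```python
from dataclasses import dataclass

SPARK_CONNECTION_URL = (
    "https://{host}:{port}/sql/protocolv1/o/{organization}/{cluster}"
)


@dataclass
class SparkTarget:
    """Hypothetical target config; not the actual dbt-spark credentials class."""

    host: str
    cluster: str
    port: int = 443          # assumed default for this sketch
    organization: str = "0"  # default keeps the existing /o/0/ behavior

    def connection_url(self) -> str:
        # Fill the URL template from the configured fields.
        return SPARK_CONNECTION_URL.format(
            host=self.host,
            port=self.port,
            organization=self.organization,
            cluster=self.cluster,
        )


# Existing configs that omit organization still produce the /o/0/ path.
legacy = SparkTarget(host="example-host", cluster="example-cluster")
# Azure-style configs can supply the workspace's organization id (placeholder here).
azure = SparkTarget(
    host="example-host", cluster="example-cluster", organization="1234567890123456"
)

print(legacy.connection_url())
print(azure.connection_url())
```

Defaulting the field rather than requiring it means targets written before the change would keep resolving to the same URL.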

Let me know if you'd like a hand with making this change in a fork of dbt-spark. If you're able to get it working locally, then I'd happily accept a PR for the change :)

jtcohen6 commented on July 22, 2024

Fixed in #34
