pyspark-data-sources

This repository showcases custom Spark data sources built with the new Python Data Source API for the upcoming Apache Spark 4.0 release. For an in-depth understanding of the API, refer to the API source code. This repository is a demo only and is not intended for production use. Contributions and feedback that help improve the examples are welcome.

Installation

pip install "pyspark-data-sources[all]"

Usage

Install the pyspark 4.0 preview version: https://pypi.org/project/pyspark/4.0.0.dev1/

pip install "pyspark[connect]==4.0.0.dev1"

Or use Databricks Runtime 15.2 or above.

Try the data sources!

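from pyspark.sql import SparkSession
from pyspark_datasources.github import GithubDataSource

# Reuse the active Spark session, or create one if none exists
spark = SparkSession.builder.getOrCreate()

# Register the data source
spark.dataSource.register(GithubDataSource)

spark.read.format("github").load("apache/spark").show()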

See more here: https://allisonwang-db.github.io/pyspark-data-sources/.
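
For readers curious how such a data source is put together, here is a minimal sketch of a custom source written against the Python Data Source API. The names CounterDataSource, CounterReader, the "counter" format, and the "n" option are hypothetical and not part of this repository. The fallback base classes exist only so the sketch can be imported without PySpark 4.0 installed; with PySpark 4.0 the real classes from pyspark.sql.datasource are used.

```python
from typing import Iterator, Tuple

try:
    from pyspark.sql.datasource import DataSource, DataSourceReader
except ImportError:
    # Assumption: minimal stand-ins so the sketch stays importable
    # when PySpark 4.0 is not available. They are not the real API.
    class DataSource:
        def __init__(self, options):
            self.options = options

    class DataSourceReader:
        pass


class CounterDataSource(DataSource):
    """Hypothetical source that emits `n` rows of (id, squared)."""

    @classmethod
    def name(cls) -> str:
        # Short name passed to spark.read.format(...)
        return "counter"

    def schema(self) -> str:
        # Schema expressed as a DDL string
        return "id int, squared int"

    def reader(self, schema) -> "CounterReader":
        return CounterReader(self.options)


class CounterReader(DataSourceReader):
    def __init__(self, options):
        # Options arrive as strings; default to 3 rows
        self.n = int(options.get("n", 3))

    def read(self, partition) -> Iterator[Tuple[int, int]]:
        # Yield one tuple per row, matching the declared schema
        for i in range(self.n):
            yield (i, i * i)
```

With a Spark 4.0 session available, a source like this would be registered the same way as above, with spark.dataSource.register(CounterDataSource), and read with spark.read.format("counter").option("n", 5).load().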

Contributing

We welcome and appreciate any contributions to enhance and expand the custom data sources. If you're interested in contributing:

  • Add New Data Sources: Want to add a new data source using the Python Data Source API? Submit a pull request or open an issue.
  • Suggest Enhancements: If you have ideas to improve a data source or the API, we'd love to hear them!
  • Report Bugs: Found something that doesn't work as expected? Let us know by opening an issue.

Need help or have questions? Don't hesitate to open a new issue, and we'll do our best to assist you.

Development

Build and serve the docs locally

mkdocs serve
