Topic: pyspark
Something interesting about pyspark
pyspark,Focused on taking AI models into production and on model deployment
User: aipredict
pyspark,Implementing best practices for PySpark ETL jobs and applications.
User: alexioannides
pyspark,A comprehensive Spark guide collated from multiple sources that can be referred to learn more about Spark or as an interview refresher.
User: ankurchavda
pyspark,An open source, standard data file format for graph data storage and retrieval.
Organization: apache
Home Page: https://graphar.apache.org/
pyspark,Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
Organization: apache
Home Page: https://linkis.apache.org/
pyspark,A curated list of awesome Apache Spark packages and resources.
Organization: awesome-spark
pyspark,Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks
Organization: awesome-spark
pyspark,Apache Spark Connector for Azure Cosmos DB
Organization: azure
pyspark,t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark
User: camdavidsonpilon
pyspark,Pandas, Polars, and Spark DataFrame comparison for humans and more!
Organization: capitalone
Home Page: https://capitalone.github.io/datacompy/
pyspark,PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
User: cartershanklin
pyspark,Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. :zap:
Organization: cluster-apps-on-docker
pyspark,Process Common Crawl data with Python and Spark
Organization: commoncrawl
pyspark,Toolkit for Apache Spark ML for feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, model interpretability.
Organization: databrickslabs
pyspark,Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines
Organization: databrickslabs
Home Page: https://databrickslabs.github.io/dbldatagen
pyspark,A boilerplate for writing PySpark Jobs
User: ekampf
pyspark,This is a repo documenting the best practices in PySpark.
User: ericxiao251
Home Page: https://ericxiao251.github.io/spark-syntax/
pyspark,A library that provides useful extensions to Apache Spark and PySpark.
Organization: g-research
pyspark,Sparkling Water provides H2O functionality inside Spark cluster
Organization: h2oai
Home Page: https://docs.h2o.ai/sparkling-water/3.3/latest-stable/doc/index.html
pyspark,80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
User: harisekhon
Home Page: https://www.linkedin.com/in/HariSekhon
pyspark,:truck: Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Organization: hi-primus
Home Page: https://hi-optimus.com
pyspark,Gathers Python deployment, infrastructure and practices.
User: huseinzol05
pyspark,the portable Python dataframe library
Organization: ibis-project
Home Page: https://ibis-project.org
pyspark,Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
User: jadianes
Home Page: http://jadianes.github.io/spark-py-notebooks
pyspark,Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
User: jelmerk
pyspark,State of the Art Natural Language Processing
Organization: johnsnowlabs
Home Page: https://sparknlp.org/
pyspark,Code for "Efficient Data Processing in Spark" Course
User: josephmachado
Home Page: https://josephmachado.podia.com/efficient-data-processing-in-spark
pyspark,Jupyter magics and kernels for working with remote Spark clusters
Organization: jupyter-incubator
pyspark,🐍 Quick reference guide to common patterns & functions in PySpark.
User: kevinschaich
Home Page: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html
pyspark,Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We have set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we bring third-party data into data science models and products with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demographics data b) Points of Interest from OpenStreetMap c) Google Popular Times
Organization: kuwala-io
Home Page: https://kuwala.io
pyspark,Hopsworks - Data-Intensive AI platform with a Feature Store
Organization: logicalclocks
Home Page: https://hopsworks.ai
pyspark,pyspark🍒🥭 is delicious,just eat it!😋😋
User: lyhue1991
pyspark, MapReduce, Spark, Java, and Scala for Data Algorithms Book
User: mahmoudparsian
Home Page: http://mapreduce4hackers.com
pyspark,O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
User: mahmoudparsian
pyspark,PySpark-Tutorial provides basic algorithms using PySpark
User: mahmoudparsian
Home Page: http://mapreduce4hackers.com
pyspark,Simple and Distributed Machine Learning
Organization: microsoft
Home Page: http://aka.ms/spark
pyspark,MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
Organization: morphl-ai
Home Page: https://morphl.io
pyspark,pyspark methods to enhance developer productivity 📣 👯 🎉
User: mrpowers
Home Page: https://mrpowers.github.io/quinn/
pyspark,Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
Organization: nike-inc
Home Page: https://engineering.nike.com/koheesio/
pyspark,SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.
User: ptyadana
pyspark,A tool for building feature stores.
Organization: quintoandar
pyspark,LearningApacheSpark
User: runawayhorse001
Home Page: https://runawayhorse001.github.io/LearningApacheSpark/
pyspark,Fundamentals of Spark with Python (using PySpark), code examples
User: tirthajyoti
pyspark,Isolation Forest on Spark
User: titicaca
pyspark,Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Organization: uber
pyspark,Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Organization: webankfintech
pyspark,Apache Spark (PySpark) Practice on Real Data
User: xd-deng