Topic: data-lake
Something interesting about data-lake
data-lake,Personal Data Engineering Projects
User: alanchn31
data-lake,Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Organization: apache
Home Page: https://kyuubi.apache.org/
data-lake,Learn how to use Kinesis Firehose, AWS Glue, S3, and Amazon Athena by streaming and analyzing Reddit comments in real time. 100-200 level tutorial.
Organization: aws-samples
data-lake,Reference Architectures for Datalakes on AWS
Organization: aws-samples
data-lake,Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Organization: awslabs
data-lake,Enterprise-grade, production-hardened, serverless data lake on AWS
Organization: awslabs
Home Page: https://sdlf.workshop.aws/
data-lake,Samples and Docs for Azure Data Lake Store and Analytics
Organization: azure
Home Page: http://aka.ms/AzureDataLake
data-lake,BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.
Organization: bytedance
Home Page: https://bytedance.github.io/bitsail/
data-lake,GraphQL API for Zeebe data
Organization: camunda-community-hub
data-lake,Data API Framework for AI Agents and Data Apps
Organization: canner
Home Page: https://vulcansql.com
data-lake,🤖 The semantic engine for LLMs, bringing semantic context to AI agents. 🔥
Organization: canner
Home Page: https://getwren.ai/oss
data-lake,Use SQL to build ELT pipelines on a data lakehouse.
Organization: cuebook
Home Page: https://cuelake.cuebook.ai
data-lake,A K8s-based infrastructure for analytics
Organization: data-mill-cloud
data-lake,Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.
Organization: datamindedbe
Home Page: https://datamindedbe.github.io/lighthouse/
data-lake,Terraform module for an Azure Data Lake
Organization: datarootsio
data-lake,Cloudflare R2 bucket File Uploader with multipart upload enabled. Tested with files up to 10 GB size.
Organization: datopian
data-lake,data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Organization: dlt-hub
Home Page: https://dlthub.com/docs
data-lake,Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.
User: dominikhei
data-lake,Demonstration of a Hive Input Format for Iceberg
Organization: expediagroup
data-lake,Resources for video demonstrations and blog posts related to DataOps on AWS
User: garystafford
data-lake,A Git-like version control file system for data lineage & data collaboration.
Organization: gitdataai
Home Page: https://gitdata.ai
data-lake,A Rust implementation of the Iceberg REST Catalog specification.
Organization: hansetag
data-lake,This Python MySQL repo shows you how to use MySQL Connector Python to access MySQL databases. You will learn how to connect to a MySQL database and perform common database operations such as SELECT, INSERT, UPDATE, and DELETE in Python.
User: imsanjoykb
Home Page: https://imsanjoykb.github.io/
data-lake,Road to Azure Data Engineer Part-I: DP-200 - Implementing an Azure Data Solution
User: jayvardhan-reddy
data-lake,Real Time Big Data / IoT Machine Learning (Model Training and Inference) with HiveMQ (MQTT), TensorFlow IO and Apache Kafka - no additional data store like S3, HDFS or Spark required
User: kaiwaehner
data-lake,Udacity Data Engineering Nanodegree Program
User: kenthsu
data-lake,Apache Spark 3 - Structured Streaming Course Material
User: learningjournal
Home Page: https://www.learningjournal.guru
data-lake,Apache Spark Course Material
User: learningjournal
Home Page: https://www.learningjournal.guru
data-lake,The DBT of ML: Aligned describes data dependencies in ML systems and reduces technical data debt
User: matsmoll
Home Page: https://www.aligned.codes
data-lake,A Declarative framework for Building, Maintaining, and Analyzing Graph Data
Organization: nodestream-proj
Home Page: https://nodestream-proj.github.io/docs/
data-lake,Data Engineering - Metropolitan Transportation Authority (MTA) Subway Data Analysis
User: ozkary
Home Page: https://www.ozkary.com/2023/03/data-engineering-process-fundamentals.html
data-lake,An efficient storage and compute engine for both on-prem and cloud-native data analytics.
Organization: pixelsdb
data-lake,JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
User: rayyan17
data-lake,Portfolio of projects and studies conducted in data engineering.
User: razevedo1994
data-lake,rtdl makes it easy to build and maintain a real-time data lake
Organization: realtimedatalake
Home Page: https://rtdl.io
data-lake,An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
User: san089
data-lake,A few projects related to data engineering, including data modeling, infrastructure setup on cloud, data warehousing, and data lake development.
User: san089
data-lake,Smart Automation Tool for building modern Data Lakes and Data Pipelines
Organization: smart-data-lake
Home Page: https://www.smartdatalake.io
data-lake,Breaking Cloud Native Web APIs in their natural habitat.
Organization: suecodelabs
data-lake,Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Organization: teradata
Home Page: http://kylo.io
data-lake,lakeFS - Data version control for your data lake | Git for data
Organization: treeverse
Home Page: https://docs.lakefs.io
data-lake,Generic Data Ingestion & Dispersal Library for Hadoop
Organization: uber
Home Page: https://eng.uber.com/marmaray-hadoop-ingestion-open-source/
data-lake,Data Engineer with Python lecture notes from #datacamp.
User: wathon
data-lake,The spatial table format for the spatial lakehouse
Organization: wherobots
Home Page: https://docs.wherobots.com/latest/references/havasu/spec/
data-lake,📒(GitBook) A curated list of awesome Data Engineering resources
User: yahwang
Home Page: https://yahwang.gitbook.io/awesome-data-engineering/