bluishglc
Name: Laurence Geng
Type: User
Bio: Architect; author of the book Big Data Platform Architecture and Prototype Implementation. Sales page: https://item.jd.com/12677623.html
Location: Shanghai, China
Exercises for the book 《Spark高级数据分析》 (Advanced Analytics with Spark).
A set of notebooks to explore and explain core concepts of Apache Hudi, such as file layouts, file sizing, compaction, clustering and so on.
Universal Command Line Interface for Amazon Web Services
This command-line tool is a useful complement to aws-cli. It offers a suite of utilities that manage and operate EC2, EMR and other AWS services.
The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog as a central repository to store structural and operational metadata for their data. AWS Glue provides out-of-box integration with Amazon EMR that enables customers to use the AWS Glue Data Catalog as an external Hive Metastore. This is an open-source implementation of the Apache Hive Metastore client on Amazon EMR clusters that uses the AWS Glue Data Catalog as an external Hive Metastore. It serves as a reference implementation for building a Hive Metastore-compatible client that connects to the AWS Glue Data Catalog. It may be ported to other Hive Metastore-compatible platforms such as other Hadoop and Apache Spark distributions
Examples of consuming and producing events in Kafka (+ Schema Registry) using AWS Glue.
The Serverless Data Lake workshop gives customers hands-on experience with AWS big data and analytics services and helps them build a cloud-native, future-proof serverless data lake architecture. It covers Amazon Kinesis services for streaming data ingestion and analytics, AWS Glue for ETL and Data Catalog management, and Amazon Athena for querying the data lake.
This is a bash library for reading INI-style config files, which allows multiple [section] entries.
simple INI file parser
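A prototype big data platform project; the source code for the book Big Data Platform Architecture and Prototype
Big data ecosystem solution data platform: a big data solution built on big data, data platforms, microservices, machine learning, e-commerce, automated operations, DevOps, a container deployment platform, and data platform collection, storage, computing, development, and application building.
A big data knowledge repository covering data warehouse modeling, real-time computing, big data, the data middle platform, system design, Java, algorithms, and more.
Flink CDC whole-database synchronization & Flink code demos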
Scala wrapper for Typesafe config
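Routines and sample code that demonstrate framework usage and architectural thinking.
This tool makes it easy to build an EMR cluster edge node / client node / gateway node.
A utility library for Amazon EMR Serverless, e.g. a generic job class for executing SQL files, and so on.
:helicopter::rocket: A real-time product recommendation system built on Flink. Flink computes product popularity and stores it in a Redis cache, analyzes log data, and writes user-profile tags and real-time records to HBase. When a user issues a recommendation request, the popularity ranking is re-ordered according to the user profile, and the collaborative-filtering and tag-based recommendation modules add related products to each product in the newly generated list. Finally, the new list is returned for the user.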
Self-contained demo using Flink SQL and Debezium to build a CDC-based analytics pipeline. All you need is Docker! :whale:
This project was developed in 2011; I wrote it to learn the MVC pattern and the Java Swing library.
An example project that demonstrates how Glue reads and writes a Hudi dataset and syncs metadata to the Glue Catalog.
A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Spark RDD to read and write from HBase
HBase RDD example project
One-click data lake creation and incremental data lake ingestion solution.