
bigdata-tut's Introduction

A big data learning project

Basic project structure

Learned on the job:
    Java、Scala、SpringBoot、Maven
    Kafka、Spark、Flink
    Python、Pandas、Machine Learning 
    Git、GitLab CI/CD

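Big data topics I want to study systematically:
    Hadoop 2.x: Atguigu (尚硅谷) Hadoop 2.x tutorial (the Hadoop framework in depth)
    Hadoop 3.x: Atguigu | Big Data Hadoop 3.x (2021 upgrade: deployment + source code + hands-on)
    Spark: Atguigu 2021 edition, Big Data Spark from beginner to expert
    Flink: Atguigu 2021 Java edition of Flink (taught by Mr. Wu, Tsinghua master's graduate and former head of IBM-CDL)
    Zookeeper: Atguigu Zookeeper tutorial (the Zookeeper framework in depth)
    Hive: Atguigu 2021 Hive tutorial (based on Hive 3.1.2)
    HA: Atguigu HA tutorial (quick start to big data high availability)
    Flume: Atguigu Flume tutorial (quick start to the Flume framework)
    Kafka: Atguigu Kafka tutorial (quick start to the Kafka framework)
    HBase: Atguigu HBase tutorial (quick start to the HBase framework)
    Sqoop: Atguigu Sqoop tutorial (a standard tool for big data development)
    Azkaban: Atguigu Azkaban tutorial (quick start to Azkaban for big data)
    Oozie: Atguigu Oozie tutorial (a standard tool for big data development)
    Scala: Atguigu Scala tutorial (a standard language for big data development)
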
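    Hands-on big data projects
    Telecom customer service: Atguigu big data project tutorial (hands-on telecom customer service)
    Machine learning and recommender systems: Atguigu machine learning and recommender system hands-on tutorial, complete edition (quick start for beginners with no background)
    E-commerce recommender system: Atguigu big data project tutorial (hands-on e-commerce recommender system)
    E-commerce data warehouse V2.0: Atguigu big data project data warehouse, new e-commerce data warehouse V2.0
    E-commerce data warehouse V3.0: Atguigu big data e-commerce data warehouse V3.0 tutorial (hands-on data warehouse project development)
    Alibaba Cloud offline data warehouse (recorded on commission for Alibaba Cloud): Atguigu offline data warehouse project (Alibaba Cloud offline data warehouse)
    Alibaba Cloud real-time data warehouse (recorded on commission for Alibaba Cloud): Atguigu real-time data warehouse project (Alibaba Cloud real-time data warehouse)
    E-commerce project (real-time processing): e-commerce project, big data real-time processing (Spark Streaming edition)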

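The key components to focus on are:
    Flume: Atguigu Flume tutorial (quick start to the Flume framework)
    Hive: Atguigu 2021 Hive tutorial (based on Hive 3.1.2)
    HA: Atguigu HA tutorial (quick start to big data high availability)
    Sqoop: Atguigu Sqoop tutorial (a standard tool for big data development)
    Azkaban: Atguigu Azkaban tutorial (quick start to Azkaban for big data)
    Oozie: Atguigu Oozie tutorial (a standard tool for big data development)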

The material above is based mainly on:
    https://www.bilibili.com/video/BV1Qp4y1n7EN
    https://www.bilibili.com/read/cv5213600

Self-study contents

The three core problems of big data:
    storing, transporting, and computing on massive volumes of data

Flume

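Flume agent:
    source: receives data from web servers or local disk (web/disk -> source)
    channel: buffers events between the source and the sink
    sink: delivers events to HDFS or Kafka (sink -> hdfs/kafka)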
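The source -> channel -> sink flow is essentially a producer/consumer pipeline around a bounded buffer. The Python sketch below models that idea with the standard library only; it is not Flume's API. The event shape (headers plus a byte body) mirrors how Flume describes an event, while the function names, the sentinel, and the list used as a stand-in for HDFS/Kafka are assumptions for illustration. Flume's channel transactions are not modelled.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the input stream


def source(lines, channel):
    """Turn raw lines (e.g. web logs or a file on disk) into events."""
    for line in lines:
        # Model an event as headers plus a byte-array body.
        channel.put({"headers": {}, "body": line.encode("utf-8")})  # blocks if the channel is full
    channel.put(STOP)


def sink(channel, store):
    """Drain events from the channel and hand them to a destination."""
    while True:
        event = channel.get()
        if event is STOP:
            break
        # Stand-in for writing to HDFS or Kafka.
        store.append(event["body"].decode("utf-8"))


# Bounded buffer, loosely analogous to a memory channel's capacity setting.
channel = queue.Queue(maxsize=100)
store = []
workers = [
    threading.Thread(target=source, args=(["log line 1", "log line 2", "log line 3"], channel)),
    threading.Thread(target=sink, args=(channel, store)),
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(store)
```

Because the channel is bounded, a slow sink makes `put` block in the source, which is the back-pressure role a channel plays between the two sides.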
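Avro:
    a data serialization system with a lightweight RPC framework; Flume's Avro source and Avro sink use it to pass events between agents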
Practice:
    Use Flume to receive data sent to a specified port from the Linux command line and print it to the console
    nc localhost 4444
    Log collection:
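For the netcat exercise above, the test input can also be sent from Python when `nc` is not available. A minimal sketch: `send_lines` writes newline-terminated lines to a TCP port, which is what typing into `nc localhost 4444` does. So that the example runs without a Flume agent, `collect_lines` is a hypothetical stand-in listener on a free local port that prints each line, roughly as a console logger would; against a real agent you would call `send_lines("localhost", 4444, [...])` instead. Both function names are illustrative assumptions.

```python
import socket
import threading


def send_lines(host, port, lines):
    """Send newline-terminated text lines to host:port, as typing into `nc` does."""
    with socket.create_connection((host, port)) as sock:
        for line in lines:
            sock.sendall((line + "\n").encode("utf-8"))


def collect_lines(server_sock, received):
    """Hypothetical stand-in for a Flume agent: accept one client and print each line."""
    conn, _ = server_sock.accept()
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        for line in reader:  # ends when the client closes the connection
            event = line.rstrip("\n")
            print("Event:", event)  # console output, like a logger sink
            received.append(event)


# Self-contained demo: listen on a free local port instead of a real agent.
HOST = "127.0.0.1"
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind((HOST, 0))
server.listen(1)
port = server.getsockname()[1]

received = []
listener = threading.Thread(target=collect_lines, args=(server, received))
listener.start()
send_lines(HOST, port, ["hello", "flume"])
listener.join()
server.close()
```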

Sqoop

Sqoop solves the problem of transferring data from various relational databases into Hadoop.
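A Sqoop import runs as a map-only job: it looks up the MIN/MAX of a split column (the primary key, or the column given with `--split-by`), divides that range among the mappers (`-m`, 4 by default), and each mapper writes its slice to HDFS as a comma-delimited text file such as `part-m-00000`. The Python sketch below imitates this under stated assumptions: the standard-library `sqlite3` module stands in for the source database, a dict stands in for the HDFS target directory, and the `users` table and function names are hypothetical. Sqoop's real splitting logic differs in detail.

```python
import sqlite3


def split_ranges(lo, hi, num_mappers):
    """Divide the integer key range [lo, hi] into at most num_mappers half-open slices."""
    step = -(-(hi - lo + 1) // num_mappers)  # ceiling division
    return [(start, min(start + step, hi + 1)) for start in range(lo, hi + 1, step)]


def import_table(conn, table, split_col, num_mappers=4):
    """Dump a table as comma-delimited text, one 'part' file per mapper.

    Table and column names are interpolated directly, so they must be trusted.
    """
    lo, hi = conn.execute(f"SELECT MIN({split_col}), MAX({split_col}) FROM {table}").fetchone()
    if lo is None:  # empty table: nothing to import
        return {}
    parts = {}
    for i, (start, end) in enumerate(split_ranges(lo, hi, num_mappers)):
        # Each "mapper" reads only its own slice of the key range.
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {split_col} >= ? AND {split_col} < ? "
            f"ORDER BY {split_col}",
            (start, end),
        ).fetchall()
        parts[f"part-m-{i:05d}"] = "".join(",".join(map(str, row)) + "\n" for row in rows)
    return parts


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(i, f"user{i}") for i in range(1, 11)])
parts = import_table(conn, "users", "id", num_mappers=4)
for name, text in parts.items():
    print(name, repr(text))
```

With ids 1 to 10 and four mappers, the slices are 1-3, 4-6, 7-9 and 10, so the last part file holds a single row; uneven splits like this are why the choice of split column matters.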

Hive

HQL (HiveQL, Hive's SQL-like query language)
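HQL is declarative: Hive compiles a query such as `SELECT word, COUNT(*) FROM words GROUP BY word` into MapReduce, Tez, or Spark jobs, depending on `hive.execution.engine`. The single-process Python sketch below models the map, shuffle, and reduce stages that such a GROUP BY count roughly corresponds to. The `words` table, its `word` column, and the function names are hypothetical; this illustrates the execution model and is not Hive code.

```python
from collections import defaultdict


def map_phase(rows, key_col):
    """Map: emit (group key, 1) for every input row."""
    for row in rows:
        yield row[key_col], 1


def shuffle(pairs):
    """Shuffle: bring together all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: aggregate each key's values; COUNT(*) is a sum of ones."""
    return {key: sum(values) for key, values in sorted(groups.items())}


# Rows of a hypothetical table `words(word STRING)`.
words = [{"word": "hive"}, {"word": "hadoop"}, {"word": "hive"}]
result = reduce_phase(shuffle(map_phase(words, "word")))
print(result)  # {'hadoop': 1, 'hive': 2}
```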

Contributors

    epicbinlee
