Simplifying Data Engineering and Analytics with Delta

This is the code repository for Simplifying Data Engineering and Analytics with Delta, published by Packt.

Create analytics-ready data that fuels artificial intelligence and business intelligence

What is this book about?

Delta helps you generate reliable insights at scale and simplifies the architecture around data pipelines, so you can focus on refining the use cases you are working on. This is especially important because existing architecture is frequently reused for new use cases.

This book covers the following exciting features:

  • Explore the key challenges of traditional data lakes
  • Appreciate the unique features of Delta that come out of the box
  • Address reliability, performance, and governance concerns using Delta
  • Analyze the open data format for an extensible and pluggable architecture
  • Handle multiple use cases to support BI, AI, streaming, and data discovery
  • Discover how common data and machine learning design patterns are executed on Delta
  • Build and deploy data and machine learning pipelines at scale using Delta

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigations

All of the code is organized into folders.

The code will look like the following:

SELECT COUNT(*) FROM some_parquet_table

Following is what you need for this book: Data engineers, data scientists, ML practitioners, BI analysts, or anyone in the data domain working with big data will be able to put their knowledge to work with this practical guide to executing pipelines and supporting diverse use cases using the Delta protocol. Basic knowledge of SQL, Python programming, and Spark is required to get the most out of this book.

With the following software and hardware list, you can run all the code files present in the book (Chapters 1-13).

Software and Hardware List

Delta is open source and can be run both on-prem and in the cloud. Because of the rise in cloud data platforms, many of the descriptions and examples are in the context of cloud storage. Use the following GitHub link for the Delta Lake documentation and quickstart guide to help you set up your environment and become familiar with the necessary APIs: https://github.com/delta-io/delta. Databricks is the original creator of Delta, which was open sourced to the Linux Foundation and is supported by a large user community. Examples in this book cover some Databricks-specific features to provide a complete view of features and capabilities. Newer features continue to be ported from Databricks to open source Delta. Please refer to the proposed roadmap for the feature migration details: https://github.com/delta-io/delta/issues/920.
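As a rough starting point for the environment setup described above, the following sketch shows one way to start a local Spark session with open source Delta enabled and run a row-count query in the style used in this book. It assumes `pyspark` and a compatible `delta-spark` package are installed with pip; the function names, the `/tmp/delta-demo` path, and the `demo_table` view are hypothetical and are not taken from the book's code.

```python
# Spark settings that enable Delta Lake, as listed in the Delta Lake quickstart.
DELTA_SPARK_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def build_delta_session(app_name="delta-quickstart"):
    """Create a local SparkSession with Delta Lake enabled.

    Imports are deferred so this file still loads where pyspark is absent.
    Assumes `pip install pyspark delta-spark` with compatible versions.
    """
    from pyspark.sql import SparkSession
    from delta import configure_spark_with_delta_pip

    builder = SparkSession.builder.appName(app_name).master("local[*]")
    for key, value in DELTA_SPARK_CONF.items():
        builder = builder.config(key, value)
    # configure_spark_with_delta_pip adds the matching Delta JARs to the session.
    return configure_spark_with_delta_pip(builder).getOrCreate()


def count_rows_sql(table_name):
    """Build a row-count query for a trusted, hard-coded table name."""
    return f"SELECT COUNT(*) FROM {table_name}"


def demo(path="/tmp/delta-demo"):
    """Write five rows as a Delta table, then count them with SQL.

    The path and view name are hypothetical and used only for illustration.
    """
    spark = build_delta_session()
    spark.range(0, 5).write.format("delta").mode("overwrite").save(path)
    spark.read.format("delta").load(path).createOrReplaceTempView("demo_table")
    spark.sql(count_rows_sql("demo_table")).show()
    spark.stop()
```

Calling `demo()` requires a working Java and Spark installation. On Databricks, a `spark` session with Delta support is already provided, so the session-building step is not needed there.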

We also provide a downloadable PDF file that has color images of the screenshots/diagrams used in this book.

Get to Know the Author

Anindita Mahapatra is a lead solutions architect at Databricks in the data and AI space helping clients across all industry verticals reap value from their data infrastructure investments. She teaches a data engineering and analytics course at Harvard University as part of their extension school program. She has extensive big data and Hadoop consulting experience from Think Big/Teradata, prior to which she was managing the development of algorithmic app discovery and promotion for both Nokia and Microsoft stores. She holds a master’s degree in liberal arts and management from Harvard Extension School, a master’s in computer science from Boston University, and a bachelor’s in computer science from BITS Pilani, India.

Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.

https://packt.link/free-ebook/9781801814867
