chiphuyen / dmls-book

Summaries and resources for Designing Machine Learning Systems book (Chip Huyen, O'Reilly 2022)

Home Page: https://www.amazon.com/Designing-Machine-Learning-Systems-Production-Ready/dp/1098107969

dmls-book's Introduction

Designing Machine Learning Systems (Chip Huyen 2022)

Machine learning systems are both complex and unique. Complex because they consist of many different components and involve many different stakeholders. Unique because they're data dependent, with data varying wildly from one use case to the next. In this book, you'll learn a holistic approach to designing ML systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements.

The book has been translated into Spanish, Japanese, Korean, Polish, and Thai.

The book is available on:

and most places where technical books are sold.

Repo structure

This book focuses on the key design decisions when developing and deploying machine learning systems. This is NOT a tutorial book, so it doesn't have a lot of code snippets. In this repo, you won't find code examples, but you'll find:

Contributions

You're welcome to create issues or submit pull requests. Your feedback is much appreciated!

Who This Book Is For

This book is for anyone who wants to leverage ML to solve real-world problems. ML in this book refers to both deep learning and classical algorithms, with a leaning toward ML systems at scale, such as those seen at medium to large enterprises and fast-growing startups. Systems at a smaller scale tend to be less complex and might benefit less from the comprehensive approach laid out in this book.

Because my background is engineering, the language of this book is geared toward engineers, including ML engineers, data scientists, data engineers, ML platform engineers, and engineering managers.

You might be able to relate to one of the following scenarios:

You have been given a business problem and a lot of raw data. You want to engineer this data and choose the right metrics to solve this problem.
Your initial models perform well in offline experiments and you want to deploy them.
You have little feedback on how your models are performing after your models are deployed, and you want to figure out a way to quickly detect, debug, and address any issue your models might run into in production.
The process of developing, evaluating, deploying, and updating models for your team has been mostly manual, slow, and error-prone. You want to automate and improve this process.
Each ML use case in your organization has been deployed using its own workflow, and you want to lay down the foundation (e.g., model store, feature store, monitoring tools) that can be shared and reused across use cases.
You’re worried that there might be biases in your ML systems and you want to make your systems responsible!

You can also benefit from the book if you belong to one of the following groups:

Tool developers who want to identify underserved areas in ML production and figure out how to position their tools in the ecosystem.
Individuals looking for ML-related roles in the industry.
Technical and business leaders who are considering adopting ML solutions to improve their products and/or business processes. Readers without strong technical backgrounds might benefit the most from Chapters 1, 2, and 11.

Review

"This is, simply, the very best book you can read about how to build, deploy, and scale machine learning models at a company for maximum impact. Chip is a masterful teacher, and the breadth and depth of her knowledge is unparalleled." - Josh Wills, Software Engineer at WeaveGrid and former Director of Data Engineering, Slack
"There is so much information one needs to know to be an effective machine learning engineer. It's hard to cut through the chaff to get the most relevant information, but Chip has done that admirably with this book. If you are serious about ML in production, and care about how to design and implement ML systems end to end, this book is essential." - Laurence Moroney, AI and ML Lead, Google
"One of the best resources that focuses on the first principles behind designing ML systems for production. A must-read to navigate the ephemeral landscape of tooling and platform options." - Goku Mohandas, Founder of Made With ML

See what people are saying about the book on Twitter @designmlsys!

Chip Huyen, Designing Machine Learning Systems. O'Reilly Media, 2022.

@book{dmlsbook2022,  
    address = {USA},  
    author = {Chip Huyen},  
    isbn = {978-1-09-810796-3},
    publisher = {O'Reilly Media},  
    title = {{Designing Machine Learning Systems}},  
    year = {2022}  
}


dmls-book's Issues

Typo in chapter 7, page 205 of paperback

The last sentence on page 205 reads: "Some companies use feature stores to ensure the consistency between the batch features used during training and the streaming features used in prediction."
Is it supposed to be "... used in production" instead?

Incorrect ISBN in README

The ISBN in the README is incorrect: 978-1801819312 belongs to "Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python".

The correct ISBN is 978-1-09-810796-3.

Incorrect statement about the median on page 18

"If the median is 100 ms, half of the requests take longer than 100 ms, and half of the requests take less than 100 ms."

This formulation implies a strict lower/upper bound, which the median need not be.
Just consider a sequence where every single entry is exactly 100 ms.
The statement is easily corrected to reflect the intended meaning by inserting two negations:

"If the median is 100 ms, half of the requests take no longer than 100 ms, and half of the requests take no less than 100 ms."
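
The counterexample above can be checked with a quick sketch. The latency values are made up for illustration, and the only library call is the standard-library `statistics.median`.

```python
from statistics import median

# Hypothetical latencies in ms: every request takes exactly 100 ms.
latencies = [100, 100, 100, 100, 100]
m = median(latencies)

# Strict reading: requests strictly slower / strictly faster than the median.
strictly_longer = sum(x > m for x in latencies)
strictly_shorter = sum(x < m for x in latencies)

# Non-strict reading: requests no longer / no shorter than the median.
no_longer = sum(x <= m for x in latencies)
no_shorter = sum(x >= m for x in latencies)

half = len(latencies) / 2
print("median:", m)
print("strictly longer:", strictly_longer, "strictly shorter:", strictly_shorter)
print("at least half no longer:", no_longer >= half)
print("at least half no shorter:", no_shorter >= half)
```

With these values no request is strictly longer or strictly shorter than the median, so the strict wording fails, while the non-strict wording still holds, read as "at least half".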

Model parallelism mention in data parallelism section of Chapter 6

Hello,

Thank you for the very informative book and all your amazing work!

I was going through chapter 6, page 170 (of the paperback version), paragraph 2. I think the first sentence should mention data instead of model.
In other words, should it be "Another problem is that spreading your data on multiple machines can cause your batch size to be very big." instead of "Another problem is that spreading your model on multiple machines can cause your batch size to be very big."?
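
For context, here is a minimal sketch of why data parallelism in particular inflates the batch size. It assumes synchronous training in which each machine processes its own mini-batch and the gradients are combined once per step. The function name and the numbers are hypothetical and not taken from the book.

```python
def effective_batch_size(per_machine_batch: int, num_machines: int) -> int:
    """Examples consumed per synchronous update across all machines."""
    return per_machine_batch * num_machines


# Hypothetical setup: each machine always processes 512 examples per step.
for num_machines in (1, 8, 64):
    print(num_machines, effective_batch_size(512, num_machines))
```

The per-machine batch stays fixed, but the batch behind each update grows linearly with the number of machines. That effect comes from splitting the data, not the model.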

How to find links to the resources that are cited?

Hi,

Could someone help?

For instance, on page 26 (Business and ML Objectives), which post by Eugene Yan is cited? Where can I find the list of all the resources?

Cheers

Incorrect author for Spotify case study

I spotted what seems to be a minor mistake on the collection of case studies at https://github.com/chiphuyen/dmls-book/blob/main/resources.md

The author of the Spotify case study is listed as Umesh .A Bhat.

Spotify’s Discover Weekly: How machine learning finds your new music (Umesh .A Bhat, 2017)

But the blog article itself lists Sophia Ciocca as the author.

Graph on page 219 has kernel sizes wrong.

Hi Chip and team, thank you for the excellent book on ML systems design. While going through the book, I found a mistake (typo) in the graph optimization figure on page 219. The original graph and the optimized graphs have different kernel sizes, which is incorrect. I checked the cited reference at NVIDIA, and indeed the original graph has a mistake in the kernel sizes. Two of the 3x3 kernels taking input from the input node must be 1x1, and two of the three 3x3 kernels in the next layer must be 5x5 and 1x1. Because this is an optimization, I first thought the kernel sizes were also being optimized, but after thinking about it for a while, I realized that would change the weight dimensions, so the trained weights would no longer work.

Just wanted to make a note here in case somebody else also notices this, and in case you would like to fix it in future editions. Thank you for the amazing work once again.
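
The point about weight dimensions can be illustrated with a small sketch. It assumes a standard 2D convolution with a square kernel and no grouping, whose weight tensor has shape (out_channels, in_channels, kernel, kernel). The helper names and channel counts are hypothetical and do not come from the figure.

```python
def conv2d_weight_shape(out_channels: int, in_channels: int, kernel_size: int) -> tuple:
    """Weight tensor shape of a standard 2D convolution (square kernel, no groups)."""
    return (out_channels, in_channels, kernel_size, kernel_size)


def num_weights(shape: tuple) -> int:
    """Total number of scalar weights in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


# Hypothetical channel counts; only the kernel size varies.
shapes = {k: conv2d_weight_shape(64, 32, k) for k in (1, 3, 5)}
for k, shape in shapes.items():
    print(f"{k}x{k}: shape={shape}, weights={num_weights(shape)}")

# All three shapes differ, so weights trained with one kernel size
# cannot simply be loaded into a layer with another.
all_shapes_differ = len(set(shapes.values())) == 3
print("all shapes differ:", all_shapes_differ)
```

Under these assumptions, changing a kernel from 3x3 to 1x1 or 5x5 changes the weight tensor shape. So a graph optimization that preserves the trained weights would be expected to keep kernel sizes unchanged.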