I want to share my findings about an error I encountered while fine-tuning DistilBERT for sentiment classification (pages 575-582) and how I solved it, and it would be great to get feedback on whether I understood and resolved the problem correctly.
First of all, I installed the required packages, including transformers 4.9.1.
After that, I followed the code from ch16-part3-bert.ipynb with one small modification: because the server with the GPU has no internet access, I downloaded the model and tokenizer (with the additional required files) manually and loaded them from disk.
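For reference, this is roughly how I load them from disk (the directory name is just my local setup, not from the notebook):

from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

local_dir = './distilbert-base-uncased'  # local folder with the manually downloaded files
tokenizer = DistilBertTokenizerFast.from_pretrained(local_dir)
model = DistilBertForSequenceClassification.from_pretrained(local_dir)

After that, the tokenization followed the notebook: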
train_encodings = tokenizer(list(train_texts), truncation=True, padding=True)
valid_encodings = tokenizer(list(valid_texts), truncation=True, padding=True)
test_encodings = tokenizer(list(test_texts), truncation=True, padding=True)
Running this, the tokenizer printed the following warning:
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Inspecting train_encodings[0] also showed how long the resulting sequences can get:
Encoding(num_tokens=3157, attributes=[ids, type_ids, tokens, offsets, attention_mask, special_tokens_mask, overflowing])
I decided to follow the next cells anyway, and when I ran the training loop with device='cuda', I saw the following error:
...
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
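As an aside, the CUDA_LAUNCH_BLOCKING hint in the last line can be followed by setting the environment variable at the very top of the notebook, before anything touches CUDA, so kernel launches become synchronous and the stack trace points at the actual failing call:

import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'  # synchronous kernel launches -> accurate stack traces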
Instead, I simply switched to device='cpu' to get a more detailed description of the error, and got the following traceback:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Input In [40], in <cell line: 3>()
12 labels = batch['labels'].to(DEVICE)
14 ### Forward
---> 15 outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
16 loss, logits = outputs['loss'], outputs['logits']
18 ### Backward
File ~/venvs/machine_learning_book/lib/python3.8/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
1098 # If we don't have any hooks, we want to skip the rest of the logic in
1099 # this function, and just call forward.
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []
File ~/venvs/machine_learning_book/lib/python3.8/site-packages/transformers/models/distilbert/modeling_distilbert.py:625, in DistilBertForSequenceClassification.forward(self, input_ids, attention_mask, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
617 r"""
618 labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
619 Labels for computing the sequence classification/regression loss. Indices should be in :obj:`[0, ...,
620 config.num_labels - 1]`. If :obj:`config.num_labels == 1` a regression loss is computed (Mean-Square loss),
621 If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy).
622 """
623 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
--> 625 distilbert_output = self.distilbert(
626 input_ids=input_ids,
627 attention_mask=attention_mask,
628 head_mask=head_mask,
629 inputs_embeds=inputs_embeds,
630 output_attentions=output_attentions,
631 output_hidden_states=output_hidden_states,
632 return_dict=return_dict,
633 )
634 hidden_state = distilbert_output[0] # (bs, seq_len, dim)
635 pooled_output = hidden_state[:, 0] # (bs, dim)
File ~/venvs/machine_learning_book/lib/python3.8/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
1098 # If we don't have any hooks, we want to skip the rest of the logic in
1099 # this function, and just call forward.
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []
File ~/venvs/machine_learning_book/lib/python3.8/site-packages/transformers/models/distilbert/modeling_distilbert.py:488, in DistilBertModel.forward(self, input_ids, attention_mask, head_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict)
485 head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
487 if inputs_embeds is None:
--> 488 inputs_embeds = self.embeddings(input_ids) # (bs, seq_length, dim)
489 return self.transformer(
490 x=inputs_embeds,
491 attn_mask=attention_mask,
(...)
495 return_dict=return_dict,
496 )
File ~/venvs/machine_learning_book/lib/python3.8/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
1098 # If we don't have any hooks, we want to skip the rest of the logic in
1099 # this function, and just call forward.
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []
File ~/venvs/machine_learning_book/lib/python3.8/site-packages/transformers/models/distilbert/modeling_distilbert.py:118, in Embeddings.forward(self, input_ids)
115 position_ids = position_ids.unsqueeze(0).expand_as(input_ids) # (bs, max_seq_length)
117 word_embeddings = self.word_embeddings(input_ids) # (bs, max_seq_length, dim)
--> 118 position_embeddings = self.position_embeddings(position_ids) # (bs, max_seq_length, dim)
120 embeddings = word_embeddings + position_embeddings # (bs, max_seq_length, dim)
121 embeddings = self.LayerNorm(embeddings) # (bs, max_seq_length, dim)
File ~/venvs/machine_learning_book/lib/python3.8/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
1098 # If we don't have any hooks, we want to skip the rest of the logic in
1099 # this function, and just call forward.
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []
File ~/venvs/machine_learning_book/lib/python3.8/site-packages/torch/nn/modules/sparse.py:158, in Embedding.forward(self, input)
157 def forward(self, input: Tensor) -> Tensor:
--> 158 return F.embedding(
159 input, self.weight, self.padding_idx, self.max_norm,
160 self.norm_type, self.scale_grad_by_freq, self.sparse)
File ~/venvs/machine_learning_book/lib/python3.8/site-packages/torch/nn/functional.py:2044, in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
2038 # Note [embedding_renorm set_grad_enabled]
2039 # XXX: equivalent to
2040 # with torch.no_grad():
2041 # torch.embedding_renorm_
2042 # remove once script supports set_grad_enabled
2043 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2044 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
It seemed that an embedding lookup received an out-of-range index, so I decided to check the embedding dimensions of our pre-trained model:
DistilBertForSequenceClassification(
(distilbert): DistilBertModel(
(embeddings): Embeddings(
(word_embeddings): Embedding(30522, 768, padding_idx=0)
(position_embeddings): Embedding(512, 768)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
...
The model and tokenizer vocabulary sizes are both 30522, so the word-embedding lookup was not the culprit. Instead, I supposed the problem was with the positional embeddings, because we have samples longer than the number of rows in the positional embedding matrix (512); the train_encodings[0] example above (num_tokens=3157) shows exactly that. So I returned to the tokenizer's first warning and followed its recommendation to specify the max_length argument.
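For anyone who wants to reproduce these checks, here is a minimal sketch (assuming the model, tokenizer, and train_encodings variables from the notebook):

print(model.config.vocab_size, tokenizer.vocab_size)  # both 30522, so vocabulary indices are fine
longest = max(len(ids) for ids in train_encodings['input_ids'])
print(longest)  # 3157 in my case, well above the 512 positional embeddings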
So I changed the tokenization code to pass max_length explicitly:
train_encodings = tokenizer(list(train_texts), max_length=512, truncation=True, padding=True)
valid_encodings = tokenizer(list(valid_texts), max_length=512, truncation=True, padding=True)
test_encodings = tokenizer(list(test_texts), max_length=512, truncation=True, padding=True)
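As a side note, instead of hard-coding 512, the maximum length can presumably be read from the model configuration, which keeps the tokenization in sync with the positional embedding table (my assumption, based on the architecture printout above):

max_len = model.config.max_position_embeddings  # 512 for distilbert-base-uncased
train_encodings = tokenizer(list(train_texts), max_length=max_len, truncation=True, padding=True)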
Either way, with truncation in place, the training loop completed successfully.
Thank you.