apachecn / hands-on-ml-zh



[Translation] Hands-On Machine Learning with Scikit-Learn and TensorFlow (Chinese edition) [Website taken offline due to copyright issues!]

Python 2.17% JavaScript 10.51% CSS 54.12% HTML 32.56% Shell 0.64%

tensorflow sklearn python machine-learning deep-learning book

hands-on-ml-zh's Introduction

Hands-On Machine Learning with Scikit-Learn and TensorFlow, 2nd Edition (Chinese translation)

License: CC BY-NC-SA 4.0

Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it. (Linus Torvalds)

Build

npm install -g gitbook-cli          # install gitbook
gitbook fetch 3.2.3                 # install GitBook version 3.2.3
gitbook install                     # install the required plugins
gitbook <build|pdf|epub|mobi>       # build HTML/PDF/EPUB/MOBI

Download

Docker

docker pull apachecn0/hands-on-ml-2e-zh
docker run -tid -p <port>:80 apachecn0/hands-on-ml-2e-zh
# visit http://localhost:<port> to view the docs

PyPI

pip install hands-on-ml-2e-zh
hands-on-ml-2e-zh <port>
# visit http://localhost:<port> to view the docs

NPM

npm install -g handson-ml-2e-zh
handson-ml-2e-zh <port>
# visit http://localhost:<port> to view the docs

hands-on-ml-zh's People

Contributors

Stargazers

Watchers

Forkers

daicoolb fendaq lierpeng fusheng-ji chenyyx qiaoxie zhumengdetianshi pp528833 servant007 moezx mashiroarchive sdpku wilsonqu hanhanlixianji iamseancheney hustercn huanlinzhang mincore jzorrof halleyhaoyu mgbin088 johnjiangla nehcuh rickllyxu maxwellalan royaljava wzhxs softwarewin bearwilliamed a-li-peng luwill jiaofusen little1tow somtian rwzhao yuckfu shmct ituco xamxxxx jingwangfei qinchangping yanheluke kidkid168 wackyyang yizhixiaoguli zhongkailv zhuhd15 erfengwelink shengrui1994 xingbuyang htaiwan xinruili kobedeshow flynn-z empythy henry9709 lanthlove zhouqzzw bluemoonwencong czh-hw vodaka renjunxiang fuzhaohai glen9527 vine3401 lioneldong sxhfut joulemusic owalnuto ican73 binbinerices uraboer starrry22 tomwlf oilblue gonewithgt luyimin714 zhuhongda1114 maybefeicun justinzhu alexcheen peterho jsgr iraychou lifangzheng shankeai ozrm peizhe seele0101 zhouliyfly lfwjune beyond88888888 dansyu xhalliwell bianximo jame-zhang albertchen121 mihaiwong jiacheng-wei zhearing

hands-on-ml-zh's Issues

Chapter 3: formula images for the confusion matrix

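"这证明了提高阈值会降调召回率。" -> "这说明提高阈值会使召回率降低。" (降调 is a typo; the corrected sentence reads "This shows that raising the threshold lowers recall.")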
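A minimal sketch of why raising the decision threshold can only keep recall the same or lower it: the set of instances predicted positive shrinks, so the true-positive count cannot grow while the number of actual positives stays fixed. The labels and scores are made-up toy values, not the book's MNIST data.

```python
import numpy as np

# Toy ground truth (1 = positive class) and classifier scores; made-up values.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1])

def recall_at(threshold):
    """Recall = TP / (TP + FN) when predicting positive for score >= threshold."""
    y_pred = scores >= threshold
    tp = np.sum(y_pred & (y_true == 1))
    return tp / np.sum(y_true == 1)

thresholds = [0.0, 0.5, 0.85]
recalls = [recall_at(t) for t in thresholds]
print(recalls)  # recall never increases as the threshold goes up
```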

It gets sloppier the further you go; Chapter 4 is missing almost all of its formula images…

Chapter 5 (Support Vector Machines): small translation error

In the note under "Training Objective":

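"（因为最小化w值和b值，也是最小化该值一半的平方）" (literally: "because minimizing the values of w and b also minimizes the square of half of that value")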

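The original reads "(since the values of w and b that minimize a value also minimize half of its square)", so it should be 该值平方的一半 ("half of the square of the value"), not 该值一半的平方 ("the square of half of the value"), right?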
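The point of the note, written out: squaring is monotonically increasing on non-negative values, so the w and b that minimize ‖w‖ also minimize ½‖w‖² (half of the square), which is the form the SVM objective uses. That quantity differs from the square of half the value, which would be ¼‖w‖².

```latex
\arg\min_{\mathbf{w},\,b} \|\mathbf{w}\|
  = \arg\min_{\mathbf{w},\,b} \tfrac{1}{2}\|\mathbf{w}\|^{2}
  = \arg\min_{\mathbf{w},\,b} \tfrac{1}{2}\mathbf{w}^{\top}\mathbf{w}
  \quad \text{(under the same constraints)}, \qquad
\tfrac{1}{2}\|\mathbf{w}\|^{2} \neq \left(\tfrac{1}{2}\|\mathbf{w}\|\right)^{2} = \tfrac{1}{4}\|\mathbf{w}\|^{2}
  \ \text{ for } \mathbf{w} \neq \mathbf{0}
```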

A problem with a translator's note

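"In the original book, using the LabelEncoder transformer to convert the text feature column is wrong: this transformer is meant only for labels (as its name suggests). Using LabelEncoder here happens to work only because the data has a single text feature column; with several text feature columns it would fail."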

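The translator's note in Chapter 2:
LabelEncoder itself requires its input to be a Series rather than a DataFrame, so this note adds little.
Of course, such a hint is fine to keep as a reading note, but it does not mean the original author used it incorrectly.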
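A small sketch of the distinction both sides are describing: LabelEncoder accepts a single 1-D column (such as a Series) and rejects a multi-column frame, while OrdinalEncoder (available since scikit-learn 0.20) encodes several feature columns at once. The column "other_cat" and all values are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Toy frame: "ocean_proximity" mirrors the book's column; "other_cat" is a hypothetical second text column.
df = pd.DataFrame({
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY", "<1H OCEAN"],
    "other_cat": ["a", "b", "a", "c"],
})

# LabelEncoder works on one 1-D column, as it was designed for target labels.
label_codes = LabelEncoder().fit_transform(df["ocean_proximity"])

# A 2-D frame with several columns is rejected with a ValueError.
try:
    LabelEncoder().fit_transform(df)
    rejected_2d = False
except ValueError:
    rejected_2d = True

# OrdinalEncoder encodes each feature column independently (sorted category order).
ordinal_codes = OrdinalEncoder().fit_transform(df)
print(label_codes)
print(ordinal_codes)
```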

Small error in the linear regression section of Chapter 3

theta_best = np.linalg.inv(X_b.T.dot(X_B)).dot(X_b.T).dot(y)
This line should be:
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

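Also, since np.random was not seeded beforehand, the generated x and y differ from person to person, so the sentence "we hoped to get $\theta_0 = 4, \theta_1 = 3$ rather than $\theta_0 = 3.865, \theta_1 = 3.139$" will inevitably not match what you see when running the code yourself; there is no need to annotate it with the translator's own results.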
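A minimal sketch of the corrected normal-equation line with a fixed seed, so every reader regenerates the same X and y. The seed value 42 is an arbitrary choice; the estimates land near, but not exactly on, 4 and 3 because of the added noise.

```python
import numpy as np

np.random.seed(42)  # fix the seed so every run generates the same X and y
m = 100
X = 2 * np.random.rand(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)  # true parameters: theta_0 = 4, theta_1 = 3

X_b = np.c_[np.ones((m, 1)), X]  # add x0 = 1 to each instance
# Normal equation, using X_b (not X_B) in both places
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

# Cross-check against NumPy's least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print(theta_best.ravel())
```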

Chapter 2 code error?

The text has: from pandas.tools.plotting import scatter_matrix

pandas has since been updated, and the import is now: from pandas.plotting import scatter_matrix

Please confirm.
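pandas moved its plotting helpers from pandas.tools.plotting to pandas.plotting in version 0.20. A version-tolerant import, as a sketch, tries the current location first and falls back to the old one:

```python
# Import scatter_matrix from its current location, falling back to the
# pre-0.20 path for old pandas versions.
try:
    from pandas.plotting import scatter_matrix
except ImportError:
    from pandas.tools.plotting import scatter_matrix
```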

Typo

Chapter 3

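Confusion matrix
For a classifier, a much better performance metric is the confusion matrix. The general idea is to output the number of times class A is classified as class B. For example, to know how many times the classifier misclassified a 5 as a 3, you need to look at the 第五航第三列 (intended: fifth row, third column) of the confusion matrix.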

It should be: 第五航第三列 -> 第五行第三列 (航 is a typo for 行, "row").

Chapter 3 erratum

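For a classifier, a much better performance metric is the confusion matrix. The general idea is "输出类别A被分类成类别 B 的次数" (literally: "output the number of times class A is classified as class B").
This sentence should be translated as "类别为A的示例被错分类为类别B的次数" ("the number of times instances of class A are misclassified as class B").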
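A small sketch of what a confusion-matrix cell counts, using scikit-learn's confusion_matrix: rows are actual classes and columns are predicted classes, so the cell at row index 5, column index 3 counts actual 5s classified as 3. The labels are made-up toy values, and labels=range(10) is passed so that index i corresponds to digit i.

```python
from sklearn.metrics import confusion_matrix

# Toy labels (made-up): rows of the matrix are actual classes, columns are predicted classes.
y_true = [5, 5, 5, 3, 3, 1]
y_pred = [5, 3, 3, 3, 5, 1]

# labels=range(10) makes row/column index i correspond to digit i
cm = confusion_matrix(y_true, y_pred, labels=list(range(10)))
print(cm[5, 3])  # actual 5s classified as 3 → 2
print(cm[3, 5])  # actual 3s classified as 5 → 1
```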

Why are the formulas in the Markdown all images instead of being written directly in TeX?

Chapter 12 images not displaying in the PDF

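In the downloaded PDF, the images in Chapter 12 do not display. I downloaded it with Safari and opened it with the Mac's built-in Preview, as shown in the screenshot.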

Chapter 10: error when using the TensorFlow high-level API

I typed in the code from the tutorial:

import tensorflow as tf
import numpy as np
import os
from sklearn.metrics import accuracy_score
from tensorflow.examples.tutorials.mnist import input_data

### TensorFlow logging settings, to avoid the red warnings printed when running the script
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
old_v = tf.logging.get_verbosity()
tf.logging.set_verbosity(tf.logging.ERROR)

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.astype(np.float32).reshape(-1, 28 * 28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28 * 28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)

X_valid, X_train = X_train[:5000], X_train[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]

feature_cols = [tf.feature_column.numeric_column("X", shape=[28 * 28])]
# The code below trains a DNN with two hidden layers (one with 300 neurons, the other with 100) and a softmax output layer with 10 neurons
dnn_clf = tf.estimator.DNNClassifier(hidden_units=[300, 100], n_classes=10,
                                     feature_columns=feature_cols)

input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"X": X_train}, y=y_train, num_epochs=40, batch_size=50, shuffle=True)
dnn_clf.train(input_fn=input_fn)

y_pred = list(dnn_clf.predict(X_test))
accuracy=accuracy_score(y_test, y_pred)
print(accuracy)

Error:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\python36\lib\inspect.py", line 1119, in getfullargspec
    sigcls=Signature)
  File "C:\ProgramData\Anaconda3\envs\python36\lib\inspect.py", line 2186, in _signature_from_callable
    raise TypeError('{!r} is not a callable object'.format(obj))
TypeError: array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32) is not a callable object

The above exception was the direct cause of the following exception:
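The traceback comes from passing the NumPy array X_test straight to dnn_clf.predict(): in TensorFlow 1.x, Estimator.predict() expects an input_fn callable, which is why inspect complains that the array "is not a callable object". A minimal sketch of the fix, assuming TensorFlow 1.x as in the code above: wrap X_test in tf.estimator.inputs.numpy_input_fn with shuffle=False, then reduce each prediction dict to a class label before calling accuracy_score. The helper names make_test_input_fn and predictions_to_labels are hypothetical.

```python
def make_test_input_fn(X_test):
    """Hypothetical helper: build a TF 1.x input_fn for prediction (requires TensorFlow 1.x)."""
    import tensorflow as tf  # imported lazily so this module loads without TensorFlow
    # One pass over the data (num_epochs defaults to 1), in order, so predictions line up with y_test
    return tf.estimator.inputs.numpy_input_fn(x={"X": X_test}, shuffle=False)


def predictions_to_labels(pred_dicts):
    """Each DNNClassifier prediction is a dict; "class_ids" holds the predicted class."""
    return [int(p["class_ids"][0]) for p in pred_dicts]


# Usage with the objects from the issue (TensorFlow 1.x):
#   y_pred_iter = dnn_clf.predict(input_fn=make_test_input_fn(X_test))
#   y_pred = predictions_to_labels(y_pred_iter)
#   accuracy = accuracy_score(y_test, y_pred)
```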

Found a small error in Chapter 2, "End-to-End Machine Learning Project"

In the step "Training and Evaluating on the Training Set":

from sklearn.metrics import mean_squared_error
housing_predictions = lin_reg.predict(housing_prepared)
lin_mse = mean_squared_error(some_labels, housing_predictions)
lin_rmse = np.sqrt(lin_mse)
lin_rmse
68628.413493824875
some_labels should be housing_labels
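A sketch of why the label argument has to cover the whole training set: housing_predictions contains one prediction per training row, and mean_squared_error requires both arrays to have the same length, so a 5-element label array is rejected. housing_prepared and housing_labels here are hypothetical synthetic stand-ins for the book's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical stand-ins for the book's housing_prepared / housing_labels
rng = np.random.RandomState(42)
housing_prepared = rng.rand(100, 3)
housing_labels = housing_prepared.dot([1.0, 2.0, 3.0]) + 0.1 * rng.randn(100)
some_labels = housing_labels[:5]  # only the first 5 labels, as in the quick sanity check

lin_reg = LinearRegression().fit(housing_prepared, housing_labels)
housing_predictions = lin_reg.predict(housing_prepared)  # one prediction per training row

# Labels and predictions must have the same length
lin_rmse = np.sqrt(mean_squared_error(housing_labels, housing_predictions))

try:
    mean_squared_error(some_labels, housing_predictions)  # 5 labels vs 100 predictions
    mismatch_raises = False
except ValueError:
    mismatch_raises = True
print(lin_rmse, mismatch_raises)
```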

A code error in Chapter 9

In Chapter 9, in the section on manually computing the gradients, there is this line:
gradients = 2/m * tf.matmul(tf.transpose(X), error)
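When actually run, 2/m evaluates to 0 because m is an integer (integer division under Python 2), so gradients is always 0 and the algorithm does not converge.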
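A sketch of the same batch gradient descent update, written with NumPy instead of TensorFlow for brevity (a deliberate swap, not the chapter's code). Under Python 2, / between two ints is floor division, so 2/m is 0 for m = 100; writing 2.0 / m, adding from __future__ import division, or running Python 3 avoids that. The data, seed and learning rate are illustrative assumptions.

```python
import numpy as np

np.random.seed(42)
m = 100
X = 2 * np.random.rand(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)
X_b = np.c_[np.ones((m, 1)), X]

# What Python 2's 2/m yields for int m: floor division, i.e. 0 here
zero_factor = 2 // m

eta = 0.1  # learning rate
theta = np.random.randn(2, 1)
for iteration in range(1000):
    error = X_b.dot(theta) - y
    gradients = 2.0 / m * X_b.T.dot(error)  # float literal keeps the factor at 0.02
    theta = theta - eta * gradients

theta_exact = np.linalg.lstsq(X_b, y, rcond=None)[0]
print(theta.ravel(), theta_exact.ravel())
```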

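"gradients = 2 * xi.T.dot(xi,dot(theta)-yi)" should be "gradients = 2 * xi.T.dot(xi.dot(theta)-yi)"
"sgd_reg + SGDRregressor(n_iter=50, penalty=None, eta0=0.1)" should be "sgd_reg = SGDRegressor(n_iter=50, penalty=None, eta0=0.1)"
"训练过程使用的代价函数和测试过程使用的评价函数不一样样的。" should end with "评价函数是不一样的。" ("the cost function used for training and the evaluation function used for testing are different").
"如我定义" should be "如果定义" or "如果我们定义" ("if we define").
"去增加了模型的偏差" should be "却增加了模型的偏差" ("but increases the model's bias").
"对线性回归来说，对于岭回归，我们可以使用封闭方程去计算，也可以使用梯度下降去处理。" should be "就像进行线性回归那样，对于岭回归的处理，我们既可以使用封闭方程去计算，也可以使用梯度下降去处理。" ("As with linear regression, ridge regression can be solved either with a closed-form equation or with gradient descent.")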
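A sketch covering two of the corrected passages: stochastic gradient descent using the fixed gradient line xi.dot(theta) - yi, and ridge regression solved in closed form, checked against scikit-learn's Ridge with solver="cholesky". The data, seed, learning schedule (t0, t1) and alpha are illustrative assumptions. Separately, later scikit-learn releases replaced SGDRegressor's n_iter parameter with max_iter.

```python
import numpy as np
from sklearn.linear_model import Ridge

np.random.seed(42)
m = 100
X = 2 * np.random.rand(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)
X_b = np.c_[np.ones((m, 1)), X]

# Stochastic gradient descent with the corrected gradient line
n_epochs = 50
t0, t1 = 5, 50  # learning-schedule hyperparameters (illustrative)

def learning_schedule(t):
    return t0 / (t + t1)

theta = np.random.randn(2, 1)
for epoch in range(n_epochs):
    for i in range(m):
        random_index = np.random.randint(m)
        xi = X_b[random_index:random_index + 1]
        yi = y[random_index:random_index + 1]
        gradients = 2 * xi.T.dot(xi.dot(theta) - yi)  # xi.dot(...), not xi,dot(...)
        eta = learning_schedule(epoch * m + i)
        theta = theta - eta * gradients

# Ridge regression, closed form: (X^T X + alpha * A)^-1 X^T y, where A is the
# identity with a 0 in the top-left so the bias term is not regularized
alpha = 1.0
A = np.eye(2)
A[0, 0] = 0.0
theta_ridge = np.linalg.inv(X_b.T.dot(X_b) + alpha * A).dot(X_b.T).dot(y)

# scikit-learn's Ridge with its closed-form (Cholesky) solver, for comparison
ridge_reg = Ridge(alpha=alpha, solver="cholesky").fit(X, y)
print(theta.ravel(), theta_ridge.ravel())
```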

Chapter 4: "Normal Equation" would be better translated as 正规方程.
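For reference, the equation this term names is the closed-form least-squares solution, with X the design matrix (a leading column of ones for the bias term) and y the vector of targets:

```latex
\hat{\boldsymbol{\theta}} = \left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y}
```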

Problem in a demo

In Chapter 3 of the PDF, in the precision and recall section (page 86), y_pred in precision_score(y_train_5, y_pred) should be y_train_pred.
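A sketch of where y_train_pred comes from: cross_val_predict returns an out-of-fold prediction for every training instance, and those are what precision_score and recall_score compare against y_train_5. The data here is synthetic (make_classification) as a stand-in for the book's "is this digit a 5?" task; the confusion-matrix check shows precision = TP / (TP + FP) and recall = TP / (TP + FN).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import cross_val_predict

# Synthetic binary data as a stand-in for the book's "is this digit a 5?" task
X_train, y = make_classification(n_samples=500, n_features=20, random_state=42)
y_train_5 = (y == 1)

sgd_clf = SGDClassifier(random_state=42)
# Out-of-fold predictions for every training instance
y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)

precision = precision_score(y_train_5, y_train_pred)
recall = recall_score(y_train_5, y_train_pred)

# Same numbers from the confusion matrix (label order: False, True)
tn, fp, fn, tp = confusion_matrix(y_train_5, y_train_pred).ravel()
print(precision, tp / (tp + fp))
print(recall, tp / (tp + fn))
```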

Chapter 2, "Training and Evaluating on the Training Set": part of the code is wrong

Training and Evaluating on the Training Set

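It works, although the predictions are not very accurate (e.g., the second prediction is off by 50%!). Let's measure this regression model's RMSE on the whole training set using Scikit-Learn's mean_squared_error function: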

from sklearn.metrics import mean_squared_error
housing_predictions = lin_reg.predict(housing_prepared)
lin_mse = mean_squared_error(housing_labels, housing_predictions)

The last line should be lin_mse = mean_squared_error(some_labels, housing_predictions)

A bug in the training set size

In Chapter 2, the training set is sometimes 16512 instances and sometimes 16513; the numbers are inconsistent.
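A quick check of the expected size, assuming the full California housing dataset of 20,640 districts and an 80/20 split: scikit-learn's train_test_split with test_size=0.2 puts 4,128 rows in the test set and 16,512 in the training set. The index array is a stand-in for the DataFrame rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_districts = 20640  # rows in the California housing dataset used in Chapter 2
indices = np.arange(n_districts)  # stand-in for the housing DataFrame rows

train_idx, test_idx = train_test_split(indices, test_size=0.2, random_state=42)
print(len(train_idx), len(test_idx))  # → 16512 4128
```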

Chapter 2 TeX formula is missing a minus sign

It comes right after "预测误差是" ("the prediction error is"); the English original reads "The prediction error for this district is".
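In the book's notation, with h the prediction function, x⁽ⁱ⁾ the feature vector of the i-th district and y⁽ⁱ⁾ its label, the prediction error is the prediction minus the label; the minus sign is what the formula needs:

```latex
\hat{y}^{(i)} - y^{(i)} = h\left(\mathbf{x}^{(i)}\right) - y^{(i)}
```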

2.一个完整的机器学习项目.md (Chapter 2, End-to-End Machine Learning Project): a variable name is wrong

In the subsection "Better Evaluation Using Cross-Validation":

def display_scores(scores):
    print("Scores:", scores)
    print("Mean:", scores.mean())
    print("Standard deviation:", scores.std())

display_scores(tree_rmse_scores)

The "tree_rmse_scores" on the last line should be "rmse_scores".
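A sketch of the cross-validation step end to end: whatever the variable is named, display_scores must receive the array produced by np.sqrt(-scores). Synthetic regression data from make_regression stands in for the housing set, and the decision tree settings are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

def display_scores(scores):
    print("Scores:", scores)
    print("Mean:", scores.mean())
    print("Standard deviation:", scores.std())

# Synthetic stand-in for the prepared housing data
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=42)
tree_reg = DecisionTreeRegressor(random_state=42)

# scikit-learn maximizes scores, so MSE is reported as its negative
scores = cross_val_score(tree_reg, X, y, scoring="neg_mean_squared_error", cv=10)
tree_rmse_scores = np.sqrt(-scores)  # the name passed to display_scores must match this one
display_scores(tree_rmse_scores)
```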

from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
housing_cat = housing["ocean_proximity"]
housing_cat_encoded1 = encoder.fit_transform(housing_cat)
housing_cat_encoded2, housing_categories = housing_cat.factorize()
housing_cat_encoded1[:10]
housing_cat_encoded2[:10]

Why does housing_cat_encoded1 take values 0-4, while housing_cat_encoded2 only takes values 0-2?
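LabelEncoder numbers categories by their sorted order over the whole column, while factorize() numbers them in order of first appearance. If the first 10 rows contain only three distinct categories, factorize labels them 0-2, whereas LabelEncoder gives each its rank among all five ocean_proximity categories, so codes up to 4 can appear. The sample values below are made up to show this on the first four rows.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up sample of ocean_proximity values; the first four rows contain only three categories
housing_cat = pd.Series(["<1H OCEAN", "NEAR OCEAN", "INLAND", "<1H OCEAN", "NEAR BAY", "ISLAND"])

encoder = LabelEncoder()
housing_cat_encoded1 = encoder.fit_transform(housing_cat)  # codes follow sorted category order
housing_cat_encoded2, housing_categories = housing_cat.factorize()  # codes follow first appearance

print(list(encoder.classes_))    # sorted: ['<1H OCEAN', 'INLAND', 'ISLAND', 'NEAR BAY', 'NEAR OCEAN']
print(list(housing_categories))  # order of first appearance
print(housing_cat_encoded1[:4])  # → [0 4 1 0]
print(housing_cat_encoded2[:4])  # → [0 1 2 0]

# factorize(sort=True) reproduces LabelEncoder's numbering
sorted_codes, _ = housing_cat.factorize(sort=True)
```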

The 2nd edition is out

Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow: the English 2nd edition has been published. Shall we all start translating it too?

https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/

Chapter 2: small bug in the CategoricalEncoder code

if self.encoding not in ['onehot', 'onehot-dense', 'ordinal']:
    template = ("encoding should be either 'onehot', 'onehot-dense' "
                "or 'ordinal', got %s")
    raise ValueError(template % self.handle_unknown)

Here, self.handle_unknown should be self.encoding.
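A minimal sketch of the corrected check, pulled out into a hypothetical standalone helper (check_encoding is not part of the original class): the error message is formatted with the value actually being validated, so a bad encoding is reported as itself.

```python
VALID_ENCODINGS = ['onehot', 'onehot-dense', 'ordinal']

def check_encoding(encoding):
    """Hypothetical helper mirroring CategoricalEncoder's check, reporting the bad value."""
    if encoding not in VALID_ENCODINGS:
        template = ("encoding should be either 'onehot', 'onehot-dense' "
                    "or 'ordinal', got %s")
        # Format with the value being validated (the encoding), not handle_unknown
        raise ValueError(template % encoding)
    return encoding
```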

Chapter 1 code example: is prepare_country_stats a custom function?

Code snippet:

    # Prepare the data
    country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
    X = np.c_[country_stats["GDP per capita"]]
    y = np.c_[country_stats["Life satisfaction"]]

Where does prepare_country_stats come from? Or is it just an illustrative placeholder, and I need to stitch these two matrices together with np myself?
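prepare_country_stats is not a NumPy or pandas function; it is a helper that joins the OECD life-satisfaction table with the GDP-per-capita table by country (the book leaves its full definition to the chapter's notebook). A hypothetical, much-simplified version using pandas.merge is sketched below; the toy tables and country names are made up, and the real helper also does extra reshaping and filtering.

```python
import numpy as np
import pandas as pd

def prepare_country_stats(oecd_bli, gdp_per_capita):
    """Hypothetical, simplified stand-in: join the two tables on the country name."""
    merged = pd.merge(oecd_bli, gdp_per_capita, on="Country")  # inner join keeps countries in both
    return merged.set_index("Country").sort_values(by="GDP per capita")

# Made-up toy tables; column names follow the snippet above
oecd_bli = pd.DataFrame({"Country": ["A", "B", "C"],
                         "Life satisfaction": [5.0, 6.0, 7.0]})
gdp_per_capita = pd.DataFrame({"Country": ["A", "B", "C", "D"],
                               "GDP per capita": [10000.0, 20000.0, 30000.0, 40000.0]})

country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
X = np.c_[country_stats["GDP per capita"]]     # column vector of inputs
y = np.c_[country_stats["Life satisfaction"]]  # column vector of targets
```

Once the two tables are joined into one DataFrame, np.c_ only turns each column into a 2-D column vector; it is not what combines the two sources.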