Solutions to the exercises in *Statistical Learning Methods* (统计学习方法). Read online at: https://datawhalechina.github.io/statistical-learning-method-solutions-manual

Home Page: https://datawhalechina.github.io/statistical-learning-method-solutions-manual

License: Other

Jupyter Notebook 89.84% Python 10.16%

machine-learning statistical-learning-method

statistical-learning-method-solutions-manual's Issues

The online reading link does not open

The online reading link no longer works: clicking it just shows a loading state forever.

Chapter 1, Exercise 1.2: the general steps of Bayesian estimation are wrong

Step 2 should be:
$$P(D|\Theta) = \prod_{i=0}^n P(X_i|\Theta)$$
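Here is a minimal sketch of what step 2 expresses. It assumes i.i.d. Bernoulli observations, the model from Exercise 1.1. The likelihood $P(D|\Theta)$ has one factor $P(X_i|\Theta)$ per observed sample. For $n$ samples containing $m$ ones, it therefore collapses to $\theta^m(1-\theta)^{n-m}$. The sample `X` and the value of `theta` are made up for illustration.

```python
import numpy as np


def likelihood(theta, X):
    """P(D | theta) for i.i.d. Bernoulli samples: one factor P(X_i | theta) per sample."""
    return np.prod(theta ** X * (1 - theta) ** (1 - X))


def log_likelihood(theta, X):
    """The same quantity on the log scale: a sum of log-factors instead of a product."""
    return np.sum(X * np.log(theta) + (1 - X) * np.log(1 - theta))


X = np.array([1, 0, 1, 1, 0])  # hypothetical sample D with n = 5 observations
theta = 0.6                    # hypothetical parameter value
m, n = X.sum(), len(X)

# The product over samples equals theta^m * (1 - theta)^(n - m)
print(likelihood(theta, X), theta ** m * (1 - theta) ** (n - m))
```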

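Is the answer to Chapter 10, Exercise 10.2 wrong?

Hello, in Exercise 10.2, consider the line of code that computes $\frac{\alpha_4(3)\beta_4(3)}{P(O|\lambda)}$:

```python
result = (0.011096 * 0.043) / 0.003477
```

The 0.011096 in it is actually the value of $\alpha_5(3)$, not $\alpha_4(3)$.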

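The forward pass prints six values labelled $\alpha_1$. The first three are the initial values, so every label after them is one time step behind, and states are 0-indexed. For example, the printed `alpha3(2)` is the book's $\alpha_4(3)$: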

```text
alpha1(0) = p0b0b(o1) = 0.100000
alpha1(1) = p1b1b(o1) = 0.120000
alpha1(2) = p2b2b(o1) = 0.350000
alpha1(0) = [sigma alpha0(i)ai0]b0(o1) = 0.078000
alpha1(1) = [sigma alpha0(i)ai1]b1(o1) = 0.084000
alpha1(2) = [sigma alpha0(i)ai2]b2(o1) = 0.082200
alpha2(0) = [sigma alpha1(i)ai0]b0(o2) = 0.040320
alpha2(1) = [sigma alpha1(i)ai1]b1(o2) = 0.026496
alpha2(2) = [sigma alpha1(i)ai2]b2(o2) = 0.068124
alpha3(0) = [sigma alpha2(i)ai0]b0(o3) = 0.020867
alpha3(1) = [sigma alpha2(i)ai1]b1(o3) = 0.012362
alpha3(2) = [sigma alpha2(i)ai2]b2(o3) = 0.043611
alpha4(0) = [sigma alpha3(i)ai0]b0(o4) = 0.011432
alpha4(1) = [sigma alpha3(i)ai1]b1(o4) = 0.010194
alpha4(2) = [sigma alpha3(i)ai2]b2(o4) = 0.011096
```

According to this output, $\alpha_4(3) = 0.043611$. Should the line therefore be changed to:

```python
result = (0.043611 * 0.043) / 0.003477
```

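kd tree: the `next_branch is not None` problem

The original code backs off as soon as it reaches a node without a left or right subtree. However, the region of the branch that has no leaf node can still contain points closer to the target point.
Modified search code: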
```python
def _search(self, point, tree=None, k=1, k_neighbors_sets=None, depth=0):
    """Algorithm 3.3: k-nearest-neighbor search

    Args:
        point (np.ndarray): target point of shape (1, n), e.g. np.array([[3, 4.5]])
        tree (Node, optional): root of the subtree to search. Defaults to None.
        k (int, optional): number of nearest neighbors. Defaults to 1.
        k_neighbors_sets (list, optional): current k-nearest set of
            (distance, index, value) tuples, farthest first. Defaults to None.
        depth (int, optional): depth of `tree`; selects the split axis. Defaults to 0.

    Returns:
        list: the updated k-nearest set
    """
    n = point.shape[1]  # input format np.array([[3, 4.5]]), shape (1, 2)
    if k_neighbors_sets is None:
        k_neighbors_sets = []
    if tree is None:
        return k_neighbors_sets

    # (1) Find the leaf node containing the target point x
    if tree.left_child is None and tree.right_child is None:
        # Update the current k-nearest set
        return self._update_k_neighbor_sets(k_neighbors_sets, k, tree, point)

    # Recursively descend the kd tree
    if point[0][depth % n] < tree.value[depth % n]:
        direct = 'left'
        next_branch = tree.left_child
    else:
        direct = 'right'
        next_branch = tree.right_child

    if next_branch is not None:
        # Recurse into the subtree on the target's side
        k_neighbors_sets = self._search(point, tree=next_branch, k=k, depth=depth + 1,
                                        k_neighbors_sets=k_neighbors_sets)
    # If next_branch is None, do not back off: the other branch may still
    # contain points closer to the target

    # (3)(b) Distance from the target to the splitting hyperplane through this node
    temp_dist = abs(tree.value[depth % n] - point[0][depth % n])

    # Does the hypersphere around the target intersect the hyperplane?
    # (the same test is needed whether or not next_branch existed)
    if not (len(k_neighbors_sets) == k and k_neighbors_sets[0][0] < temp_dist):
        # It intersects: check the current node and update the k-nearest set,
        # then search the region of the other child
        k_neighbors_sets = self._update_k_neighbor_sets(k_neighbors_sets, k, tree, point)
        other_branch = tree.right_child if direct == 'left' else tree.left_child
        return self._search(point, tree=other_branch, k=k, depth=depth + 1,
                            k_neighbors_sets=k_neighbors_sets)

    return k_neighbors_sets
```

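Could the formula derivations include some comments?

Could the formula derivations include a few comments? Take the Bayesian estimation part of Chapter 1: an expression $\pi(p)$ suddenly appears without any definition, and I had to Google it. I eventually figured it out myself, but doing so cost me more time than reading the solution saved, which is really frustrating. Here is a link: http://www.statslab.cam.ac.uk/Dept/People/djsteaching/S1B-17-06-bayesian.pdf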
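For context, $\pi(p)$ in that solution is the prior density of the Bernoulli parameter $p$, i.e. the belief about $p$ before any data is seen. The posterior is proportional to $P(D|p)\,\pi(p)$. The sketch below assumes a Beta(a, b) prior, a common conjugate choice; the hyperparameters and the sample are hypothetical. With that prior, the posterior is again a Beta distribution, so its mode (the MAP estimate) and its mean have closed forms. A grid search double-checks the mode.

```python
import numpy as np

a, b = 2.0, 2.0                       # hypothetical Beta(a, b) prior pi(p)
X = np.array([1, 0, 1, 1, 0, 1, 1])   # hypothetical Bernoulli sample
m, n = X.sum(), len(X)

# Posterior ∝ p^m (1-p)^(n-m) * p^(a-1) (1-p)^(b-1), i.e. Beta(a + m, b + n - m)
post_a, post_b = a + m, b + n - m
p_map = (post_a - 1) / (post_a + post_b - 2)   # posterior mode (MAP estimate)
p_mean = post_a / (post_a + post_b)            # posterior mean

# Numerical check: maximize the unnormalized log-posterior on a fine grid
grid = np.linspace(1e-4, 1 - 1e-4, 100001)
log_post = (post_a - 1) * np.log(grid) + (post_b - 1) * np.log(1 - grid)
p_grid = grid[np.argmax(log_post)]
print(p_map, p_mean, p_grid)
```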

fix: change the support-vector indices in Exercise 7.2 to match the problem statement

Can the project merge PRs? Please enable PRs so that anyone who finds an error can submit a fix directly.

Answers for Chapter 19 are missing

The link to Figure 15.2 in Exercise 15.5 is broken

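Exercise 4.1: derivation of the log-likelihood function

Hello, I have a question about the derivation in the "Exercise 4.1, log-likelihood function" part.

Should it be changed to the following?

$$ \begin{aligned} \displaystyle \log L(p|Y) &= \log \left[ C_N^m p^m (1-p)^{N-m} \right] \\ &= \log(C_N^m) + \log(p^m) + \log[ (1-p)^{N-m} ] \\ &= \log(C_N^m) + m\log p + (N-m)\log (1-p) \end{aligned} $$

This does not affect the subsequent conclusion, though.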
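Here is a quick numerical check of the last line, using hypothetical counts `N` and `m`. Maximizing $\log C_N^m + m\log p + (N-m)\log(1-p)$ over a grid of $p$ values lands on $m/N$. The constant $\log C_N^m$ shifts the curve up or down but not the location of its maximum.

```python
import math

import numpy as np

N, m = 10, 3                          # hypothetical: m successes in N trials
p = np.linspace(0.001, 0.999, 999)    # grid of candidate p values, step 0.001

# log L(p|Y) = log C_N^m + m log p + (N - m) log(1 - p)
log_L = math.log(math.comb(N, m)) + m * np.log(p) + (N - m) * np.log(1 - p)
p_hat = p[np.argmax(log_L)]
print(p_hat, m / N)
```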

Exercise 9.3

In Exercise 9.3, why is line 50 of my_gmm.py `u = self.mean_[-1]`?

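Decision trees, Exercise 5.3

Chapter 5 (Decision Trees), Exercise 5.3: shouldn't we also consider the case where t2 and t3 are a child node and its parent? In that case, pruning directly at the parent node yields the unique optimal subtree.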

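Is a term missing in front of the multinomial distribution in Exercise 4.2?

In the proof of formula (4.11), the second step:

whereas the probability formula of the multinomial distribution should be:

This does not affect the final result, though.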
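For reference, here is a sketch in assumed notation, not necessarily the solution's symbols. Take $N$ samples, $K$ classes, $m_k$ samples in class $c_k$ and $\theta_k = P(Y=c_k)$. The multinomial probability then carries the coefficient $N!/(m_1!\cdots m_K!)$:

$$ \begin{aligned} P(m_1,\dots,m_K \mid \theta) &= \frac{N!}{m_1!\,m_2!\cdots m_K!}\prod_{k=1}^{K}\theta_k^{m_k} \\ \log P(m_1,\dots,m_K \mid \theta) &= \log\frac{N!}{m_1!\,m_2!\cdots m_K!} + \sum_{k=1}^{K} m_k\log\theta_k \end{aligned} $$

The coefficient does not depend on $\theta$, so it only adds a constant to the log-likelihood. This is consistent with the remark that omitting it does not change the final result.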

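Is there a formula behind the total classifier's error rate computed in Exercise 8.1?

In Exercise 8.1, I can't follow how the `fit` method of `class MyAdaBoost` computes the error of the total classifier. Could you explain it? See the figure below.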
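The screenshot is not available here, so the following sketches the textbook definition rather than the `MyAdaBoost` code. The combined classifier is $G(x) = \operatorname{sign}\left(\sum_{m=1}^{M} \alpha_m G_m(x)\right)$. Its error rate on the training set is the fraction of samples it misclassifies, $\frac{1}{N}\sum_{i=1}^{N} I(G(x_i) \neq y_i)$. The function name `ensemble_error` and the array layout are hypothetical.

```python
import numpy as np


def ensemble_error(alphas, base_preds, y):
    """Training error rate of G(x) = sign(sum_m alpha_m * G_m(x)).

    alphas:     shape (M,), the coefficient alpha_m of each base classifier
    base_preds: shape (M, N), row m holds G_m(x_i) in {-1, +1}
    y:          shape (N,), true labels in {-1, +1}
    """
    f = np.asarray(alphas) @ np.asarray(base_preds)  # f(x_i) = sum_m alpha_m G_m(x_i)
    G = np.sign(f)                                   # combined classifier's predictions
    return np.mean(G != np.asarray(y))               # (1/N) * sum_i I(G(x_i) != y_i)


print(ensemble_error([0.5, 1.0], [[1, 1, -1, -1], [1, -1, -1, 1]], [1, 1, -1, -1]))
```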

KDTree k-NN search code fix

The KDTree k-NN search has a small bug. Test the original code on the following example:

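```python
import numpy as np

X_train = np.array([[2, 3],
                    [5, 4],
                    [9, 6],
                    [4, 7],
                    [8, 4],
                    [7, 2]])
kd_tree = KDTree(X_train)
# Set k
k = 1
# Find the nearest neighbors
dists, indices = kd_tree.query(np.array([[7, 4]]), k=k)
# Print the neighbors (print_k_neighbor_sets is a printing helper defined elsewhere in the original code)
print_k_neighbor_sets(k, indices, dists)
# Output: the nearest neighbor of x is (9, 6), at distance 2.8284
```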

The modified version below gives the correct answer: the nearest neighbor of x is (8, 4), at distance 1.0000.

```python
from collections import namedtuple

import numpy as np

# Minimal stand-in for the repository's Node class (assumption: KDTree only
# uses these four fields)
Node = namedtuple("Node", ["value", "index", "left_child", "right_child"])


class KDTree:
    """kd tree class"""

    def __init__(self, data):
        # Dataset
        self.data = np.asarray(data)
        # kd tree
        self.kd_tree = None
        # Build a balanced kd tree (pass the array, not the raw input)
        self._create_kd_tree(self.data)

    def _split_sub_tree(self, data, depth=0):
        # Algorithm 3.2, step 3: stop when a sub-region contains no instances
        if len(data) == 0:
            return None
        # Algorithm 3.2, step 2: choose the split axis, counting from 0 (the book counts from 1)
        l = depth % data.shape[1]
        # Sort the data along that axis
        data = data[data[:, l].argsort()]
        # Algorithm 3.2, step 1: use the median of the instance coordinates as the split point
        median_index = data.shape[0] // 2
        # Position of this node in the dataset
        node_index = [i for i, v in enumerate(
            self.data) if list(v) == list(data[median_index])]
        return Node(
            # This node
            value=data[median_index],
            # Position of this node in the dataset
            index=node_index[0],
            # Left child
            left_child=self._split_sub_tree(data[:median_index], depth + 1),
            # Right child
            right_child=self._split_sub_tree(
                data[median_index + 1:], depth + 1)
        )

    def _create_kd_tree(self, X):
        self.kd_tree = self._split_sub_tree(X)

    def query(self, data, k=1):
        data = np.asarray(data)
        hits = self._search(data, self.kd_tree, k=k, k_neighbor_sets=list())
        dd = np.array([hit[0] for hit in hits])
        ii = np.array([hit[1] for hit in hits])
        return dd, ii

    def __repr__(self):
        return str(self.kd_tree)

    @staticmethod
    def _cal_node_distance(node1, node2):
        """Euclidean distance between two nodes"""
        return np.sqrt(np.sum(np.square(node1 - node2)))

    def _search(self, point, tree=None, k=1, k_neighbor_sets=None, depth=0):
        # point has shape (1, n): use the number of columns, not len(point), which is 1
        n = point.shape[1]
        if k_neighbor_sets is None:
            k_neighbor_sets = []
        if tree is None:
            return k_neighbor_sets

        # (1) Find the leaf node containing the target point x
        if tree.left_child is None and tree.right_child is None:
            # Update the current k-nearest set
            return self._update_k_neighbor_sets(k_neighbor_sets, k, tree, point)

        # Recursively descend the kd tree
        if point[0][depth % n] < tree.value[depth % n]:
            direct = 'left'
            next_branch = tree.left_child
        else:
            direct = 'right'
            next_branch = tree.right_child
        if next_branch is not None:
            k_neighbor_sets = self._search(point, tree=next_branch, k=k, depth=depth + 1,
                                           k_neighbor_sets=k_neighbor_sets)

        # Distance along axis s from the target point to the splitting hyperplane
        temp_dist = abs(tree.value[depth % n] - point[0][depth % n])

        # Does the hypersphere intersect the hyperplane? Check the length first so an
        # empty set (possible when next_branch was None) is never indexed
        if not (len(k_neighbor_sets) == k and k_neighbor_sets[0][0] < temp_dist):
            # It intersects: check the current node and update the k-nearest set
            k_neighbor_sets = self._update_k_neighbor_sets(k_neighbor_sets, k, tree, point)
            # Then recursively search the region of the other child
            other_branch = tree.right_child if direct == 'left' else tree.left_child
            return self._search(point, tree=other_branch, k=k, depth=depth + 1,
                                k_neighbor_sets=k_neighbor_sets)

        return k_neighbor_sets

    def _update_k_neighbor_sets(self, best, k, tree, point):
        # Distance between the target point and the current node
        node_distance = self._cal_node_distance(point, tree.value)
        if len(best) == 0:
            best.append((node_distance, tree.index, tree.value))
        elif len(best) < k:
            # The current k-nearest set has fewer than k elements
            self._insert_k_neighbor_sets(best, tree, node_distance)
        else:
            # The node is closer than the farthest point in the current k-nearest set
            if best[0][0] > node_distance:
                best = best[1:]
                self._insert_k_neighbor_sets(best, tree, node_distance)
        return best

    @staticmethod
    def _insert_k_neighbor_sets(best, tree, node_distance):
        """Keep the list ordered with the farthest node first"""
        n = len(best)
        for i, item in enumerate(best):
            if item[0] < node_distance:
                # Insert before the first node that is closer
                best.insert(i, (node_distance, tree.index, tree.value))
                break
        if len(best) == n:
            best.append((node_distance, tree.index, tree.value))
```
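For comparison, here is a compact k-NN search sketch. The names `Node`, `build` and `knn` are hypothetical, not the repository's API. It keeps the current k nearest points in a bounded max-heap instead of a sorted list. It always searches the child on the target's side first, and crosses a splitting hyperplane only when the ball around the target reaches it or fewer than k points are known. A missing child is simply skipped, so a node whose near-side child is `None` still has its far side checked; this is the case both issues above are about. Unlike Algorithm 3.3, it examines each node on the way down rather than after reaching a leaf. The order of visits differs, but the returned neighbors do not.

```python
import heapq

import numpy as np


class Node:
    """kd-tree node: a training point, its row index and its split axis."""

    def __init__(self, point, index, axis, left=None, right=None):
        self.point, self.index, self.axis = point, index, axis
        self.left, self.right = left, right


def build(X, indices, depth=0):
    """Build a balanced kd tree, splitting at the median along axis depth % d."""
    if len(indices) == 0:
        return None
    axis = depth % X.shape[1]
    indices = indices[np.argsort(X[indices, axis], kind="stable")]
    mid = len(indices) // 2
    return Node(X[indices[mid]], int(indices[mid]), axis,
                build(X, indices[:mid], depth + 1),
                build(X, indices[mid + 1:], depth + 1))


def knn(root, x, k):
    """Return [(distance, index), ...] for the k nearest points, closest first."""
    heap = []  # max-heap via negated distances: -heap[0][0] is the current k-th distance

    def visit(node):
        if node is None:  # a missing child is skipped, never treated as "stop"
            return
        d = float(np.linalg.norm(x - node.point))
        if len(heap) < k:
            heapq.heappush(heap, (-d, node.index))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, node.index))
        diff = x[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)  # search the side that contains x first
        # Cross the splitting hyperplane only if the ball around x reaches it
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far)

    visit(root)
    return sorted((-neg_d, i) for neg_d, i in heap)


X_train = np.array([[2, 3], [5, 4], [9, 6], [4, 7], [8, 4], [7, 2]], dtype=float)
root = build(X_train, np.arange(len(X_train)))
print(knn(root, np.array([7.0, 4.0]), k=1))
```

A cheap way to validate any of the `_search` variants discussed here is to compare them against a brute-force `np.linalg.norm` scan on random points.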

Some code questions about problem 3.3 in Chapter 3.

Some_code_errors.md

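Erratum for the solution to Exercise 1.1

In step 1 of the solution, "the probability distribution function of X, i.e. the Bernoulli model, can be written as" should read "the probability distribution of X, i.e. the Bernoulli model, can be written as".
That formula is not a probability distribution function.
Please correct it.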
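To make the distinction concrete, assume $X$ takes values in $\{0, 1\}$ with $P(X=1)=p$. The formula in the solution is the probability distribution (the probability mass function). The distribution function is the cumulative $F(x)=P(X\le x)$:

$$ P(X=x) = p^x(1-p)^{1-x},\quad x\in\{0,1\} \qquad F(x) = P(X\le x) = \begin{cases} 0, & x<0 \\ 1-p, & 0\le x<1 \\ 1, & x\ge 1 \end{cases} $$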

None of the formulas display correctly

Hello, I am using a Mac, and formulas do not render correctly in either Safari or Chrome.
$$ \displaystyle P(Y=c_k) = \frac{\displaystyle \sum......

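Exercise 14.3

In Exercise 14.3, the expression for the exponential generating function of T(n,k) is not obvious. This part is the real core of the exercise, whereas expanding the generating function is straightforward. Please add details for this part.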
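One way to fill the gap assumes that $T(n,k)$ counts the ways to partition $n$ labeled samples into $k$ non-empty clusters (a Stirling number of the second kind). A single non-empty cluster can hold any $j \ge 1$ of the labeled samples in exactly one way, so its exponential generating function is $\sum_{j\ge1} x^j/j! = e^x - 1$. An ordered sequence of $k$ such clusters has EGF $(e^x-1)^k$, because multiplying EGFs counts the ways to split the labels among the blocks. The clusters are unlabeled, so every partition is counted $k!$ times, giving

$$\sum_{n\ge0} T(n,k)\frac{x^n}{n!} = \frac{(e^x-1)^k}{k!}.$$

Expanding $(e^x-1)^k$ with the binomial theorem then gives $T(n,k) = \frac{1}{k!}\sum_{l=0}^{k}(-1)^{k-l}\binom{k}{l}l^n$. The code below uses hypothetical helper names and checks this closed form against the recurrence $T(n,k) = kT(n-1,k) + T(n-1,k-1)$ for small $n$.

```python
from math import comb, factorial


def stirling2_recurrence(n, k):
    """T(n, k) via T(n, k) = k*T(n-1, k) + T(n-1, k-1): the n-th sample either
    joins one of the k existing clusters or forms a new cluster on its own."""
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]


def stirling2_closed_form(n, k):
    """Coefficient of x^n / n! in (e^x - 1)^k / k!, expanded with the binomial theorem."""
    total = sum((-1) ** (k - l) * comb(k, l) * l ** n for l in range(k + 1))
    return total // factorial(k)  # exact: total is always divisible by k!


print(stirling2_recurrence(5, 3), stirling2_closed_form(5, 3))
```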

Cp7 - 3 solution

I think the solution of 7.3 may have some problems.

and my final result is

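Incorrect page numbers in some answers

In Exercise 21.3:

"According to the general definition of PageRank on page 422 of the book"

I am using *Machine Learning Methods* (机器学习方法), in which the actual page number is 355.
The project should now be the answer manual for *Machine Learning Methods* rather than *Statistical Learning Methods* (统计学习方法), because *Statistical Learning Methods* does not include the third part on deep learning.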

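The code in step 3 of the solution to Exercise 8.1 (implementing AdaBoost from scratch) is wrong.

In the `fit()` method, the `y_predict` argument passed to `update_w()` should be the output of the base classifier $G_m$, following Li Hang's book. In the code, that is the `sign` variable, not the combined classifier $G$'s output `y_predict`. After this change, the model converges in 6 iterations instead of 8.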
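Here is a minimal sketch of the update the issue describes, using 1-D threshold stumps as base classifiers. The toy data is, as far as I recall, the book's Example 8.1 (not Exercise 8.1), so it will not reproduce the round counts quoted above. `fit_stump` and `adaboost` are hypothetical names, not the solution's `MyAdaBoost` API. The key line is the weight update, which multiplies each weight by $\exp(-\alpha_m y_i G_m(x_i))$ using the current base classifier's predictions. The combined classifier's predictions are used only to decide when to stop.

```python
import numpy as np


def fit_stump(x, y, w):
    """Exhaustively search 1-D stumps G(x) = d if x < thr else -d and return
    (weighted error, thr, d) for the one with the smallest weighted error."""
    best = None
    for thr in np.arange(x.min() - 0.5, x.max() + 1.0, 1.0):
        for d in (1, -1):
            pred = np.where(x < thr, d, -d)
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, thr, d)
    return best


def adaboost(x, y, max_rounds=10):
    w = np.full(len(x), 1.0 / len(x))
    alphas, stumps = [], []
    for _ in range(max_rounds):
        err, thr, d = fit_stump(x, y, w)
        err = max(err, 1e-12)                  # guard against division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        g_m = np.where(x < thr, d, -d)         # predictions of the base classifier G_m
        # Weight update uses G_m(x_i), NOT the combined classifier's prediction
        w = w * np.exp(-alpha * y * g_m)
        w = w / w.sum()                        # normalize by Z_m
        alphas.append(alpha)
        stumps.append((thr, d))
        # Combined classifier G(x) = sign(sum_m alpha_m G_m(x)), used only for stopping
        f = sum(a * np.where(x < t, s, -s) for a, (t, s) in zip(alphas, stumps))
        train_error = np.mean(np.sign(f) != y)
        if train_error == 0:
            break
    return alphas, stumps, train_error


x = np.arange(10)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])
alphas, stumps, train_error = adaboost(x, y)
print(len(alphas), train_error, np.round(alphas, 4))
```

On this data the loop stops after three rounds with zero training error. The three $\alpha$ values come out at roughly 0.42, 0.65 and 0.75.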
