statistical-learning-method-solutions-manual's Issues

Chapter 10, Exercise 10.2: is the answer wrong?

Hello. In Exercise 10.2, the line of code used to compute $\frac{\alpha_4(3)\beta_4(3)}{P(O|\lambda)}$:

result = (0.011096 * 0.043) / 0.003477

uses 0.011096, which is actually the value of $\alpha_5(3)$, not $\alpha_4(3)$.

The forward pass prints six values labeled $\alpha_1$, so every later label is one time step behind:

alpha1(0) = p0b0b(o1) = 0.100000
alpha1(1) = p1b1b(o1) = 0.120000
alpha1(2) = p2b2b(o1) = 0.350000
alpha1(0) = [sigma alpha0(i)ai0]b0(o1) = 0.078000
alpha1(1) = [sigma alpha0(i)ai1]b1(o1) = 0.084000
alpha1(2) = [sigma alpha0(i)ai2]b2(o1) = 0.082200
alpha2(0) = [sigma alpha1(i)ai0]b0(o2) = 0.040320
alpha2(1) = [sigma alpha1(i)ai1]b1(o2) = 0.026496
alpha2(2) = [sigma alpha1(i)ai2]b2(o2) = 0.068124
alpha3(0) = [sigma alpha2(i)ai0]b0(o3) = 0.020867
alpha3(1) = [sigma alpha2(i)ai1]b1(o3) = 0.012362
alpha3(2) = [sigma alpha2(i)ai2]b2(o3) = 0.043611
alpha4(0) = [sigma alpha3(i)ai0]b0(o4) = 0.011432
alpha4(1) = [sigma alpha3(i)ai1]b1(o4) = 0.010194
alpha4(2) = [sigma alpha3(i)ai2]b2(o4) = 0.011096

According to this output, $\alpha_4(3) = 0.043611$, so should the line be changed to:

result = (0.043611 * 0.043) / 0.003477 
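
The relabeling can be checked with a short script. This is a minimal sketch that assumes the printed rows above are in time order, so that the k-th group of three values is $\alpha_k$, and that 0.043 and 0.003477 are $\beta_4(3)$ and $P(O|\lambda)$ as in the quoted line of code. The names `printed_alpha` and `gamma` are illustrative and do not come from the solutions manual.

```python
# Forward values in the order they were printed (three states per time step).
printed_alpha = [
    [0.100000, 0.120000, 0.350000],  # alpha_1
    [0.078000, 0.084000, 0.082200],  # alpha_2 (printed as "alpha1")
    [0.040320, 0.026496, 0.068124],  # alpha_3 (printed as "alpha2")
    [0.020867, 0.012362, 0.043611],  # alpha_4 (printed as "alpha3")
    [0.011432, 0.010194, 0.011096],  # alpha_5 (printed as "alpha4")
]

beta_4_3 = 0.043     # beta_4(3), taken from the quoted line of code
prob_obs = 0.003477  # P(O | lambda), taken from the quoted line of code


def gamma(alpha_value, beta_value, prob):
    """Return alpha * beta / P(O | lambda)."""
    return alpha_value * beta_value / prob


alpha_4_3 = printed_alpha[4 - 1][3 - 1]  # time step 4, state 3 (1-based)
alpha_5_3 = printed_alpha[5 - 1][3 - 1]  # the value used in the original line

print(f"with alpha_4(3) = {alpha_4_3}: {gamma(alpha_4_3, beta_4_3, prob_obs):.4f}")
print(f"with alpha_5(3) = {alpha_5_3}: {gamma(alpha_5_3, beta_4_3, prob_obs):.4f}")
```

Indexing by print order rather than by the printed label makes the off-by-one visible: the fourth row holds 0.043611 and the fifth row holds 0.011096.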

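kd tree: the `next_branch is not None` problem

The original code backtracks as soon as the search reaches a node whose left or right subtree is missing, but the region of the branch that has no leaf node can still contain a closer point.
The modified search code: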
def _search(self, point, tree=None, k=1, k_neighbors_sets=None, depth=0):
    """Algorithm 3.3: k-nearest-neighbor search on the kd tree.

    Args:
        point (np.ndarray): target point with shape (1, n), e.g. np.array([[3, 4.5]]).
        tree (Node, optional): root of the subtree to search. Defaults to None.
        k (int, optional): number of neighbors to find. Defaults to 1.
        k_neighbors_sets (list, optional): current k-nearest-neighbor set. Defaults to None.
        depth (int, optional): depth of `tree` in the kd tree. Defaults to 0.

    Returns:
        list: the updated k-nearest-neighbor set.
    """
    n = point.shape[1]  # input looks like np.array([[3, 4.5]]), shape (1, 2)
    if k_neighbors_sets is None:
        k_neighbors_sets = []
    if tree is None:
        return k_neighbors_sets

    # (1) Find the leaf node that contains the target point x
    if tree.left_child is None and tree.right_child is None:
        # Update the current k-nearest-neighbor set
        return self._update_k_neighbor_sets(k_neighbors_sets, k, tree, point)

    # Recursively descend the kd tree
    if point[0][depth % n] < tree.value[depth % n]:
        direct = 'left'
        next_branch = tree.left_child
    else:
        direct = 'right'
        next_branch = tree.right_child

    if next_branch is not None:
        # Recurse into the branch on the target's side first
        k_neighbors_sets = self._search(point, tree=next_branch, k=k, depth=depth + 1,
                                        k_neighbors_sets=k_neighbors_sets)
        # Distance from the target point to the splitting hyperplane of this node
        temp_dist = abs(tree.value[depth % n] - point[0][depth % n])

        # (3)(b) Check whether the hypersphere intersects the hyperplane
        if not (k_neighbors_sets[0][0] < temp_dist and len(k_neighbors_sets) == k):
            # If so, examine the current node, update the k-nearest-neighbor set,
            # and search the other side recursively
            k_neighbors_sets = self._update_k_neighbor_sets(k_neighbors_sets, k, tree, point)
            if direct == 'left':
                return self._search(point, tree=tree.right_child, k=k, depth=depth + 1,
                                    k_neighbors_sets=k_neighbors_sets)
            else:
                return self._search(point, tree=tree.left_child, k=k, depth=depth + 1,
                                    k_neighbors_sets=k_neighbors_sets)
    else:
        # The branch on the target's side is missing: do not backtrack blindly
        temp_dist = abs(tree.value[depth % n] - point[0][depth % n])

        # Continue if the set is not full yet, or if the hypersphere intersects
        # the hyperplane (checking len first avoids indexing an empty list)
        if not (len(k_neighbors_sets) == k and k_neighbors_sets[0][0] < temp_dist):
            # Examine the current node and update the k-nearest-neighbor set
            k_neighbors_sets = self._update_k_neighbor_sets(k_neighbors_sets, k, tree, point)
            if direct == 'left':
                return self._search(point, tree=tree.right_child, k=k, depth=depth + 1,
                                    k_neighbors_sets=k_neighbors_sets)
            else:
                return self._search(point, tree=tree.left_child, k=k, depth=depth + 1,
                                    k_neighbors_sets=k_neighbors_sets)

    return k_neighbors_sets
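
The point about missing branches can be illustrated with a stripped-down search. The following is a minimal sketch, not the manual's implementation: it handles only the single nearest neighbor, takes a 1-D target instead of a `(1, n)` array, and the names `SketchNode`, `build` and `nearest` are hypothetical. It examines every visited node and treats a missing child as an empty subtree, so no special case for `next_branch is None` is needed.

```python
import numpy as np


class SketchNode:
    """Hypothetical minimal kd-tree node (not the manual's Node class)."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def build(points, depth=0):
    """Build a balanced kd tree by splitting at the median of one axis."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]
    points = points[points[:, axis].argsort()]
    mid = len(points) // 2
    return SketchNode(
        points[mid],
        build(points[:mid], depth + 1),
        build(points[mid + 1:], depth + 1),
    )


def nearest(node, target, depth=0, best=None):
    """Return (distance, point) of the nearest neighbor of a 1-D `target`."""
    if node is None:
        return best
    # Always examine the current node, even when one of its children is missing.
    dist = float(np.linalg.norm(node.value - target))
    if best is None or dist < best[0]:
        best = (dist, node.value)
    axis = depth % len(target)
    if target[axis] < node.value[axis]:
        near, far = node.left, node.right
    else:
        near, far = node.right, node.left
    # Searching a missing branch is a no-op because of the None check above.
    best = nearest(near, target, depth + 1, best)
    # Visit the far side only if the ball around the target crosses the split.
    if abs(target[axis] - node.value[axis]) < best[0]:
        best = nearest(far, target, depth + 1, best)
    return best


points = np.array([[2, 3], [5, 4], [9, 6], [4, 7], [8, 4], [7, 2]], dtype=float)
root = build(points)
# In this tree the node (9, 6) has only a left child, so the query below
# descends into a missing branch at that node.
dist, neighbor = nearest(root, np.array([9.0, 7.0]))
print(neighbor, dist)
```

With this structure the node (9, 6) is still compared with the target although the branch on the target's side does not exist, which is the case the issue describes.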

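Exercise 4.1: derivation of the log-likelihood function

Hello, regarding the derivation in the "Exercise 4.1, log-likelihood function" part:

Original
Should it be changed to:
Mod
This does not affect the subsequent conclusion, though.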

$$ \begin{aligned} \displaystyle \log L(p|Y) &= \log C_N^m p^m (1-p)^{N-m} \\ &= \log(C_N^m) + \log(p^m) + \log[ (1-p)^{N-m} ] \\ &= \log(C_N^m) + m\log p + (N-m)\log (1-p) \end{aligned} $$
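
The constant term $\log(C_N^m)$ does not depend on $p$, so it drops out when the log-likelihood is maximized, which is why either form leads to the same estimate. A short worked step, assuming $0 < p < 1$:

$$ \begin{aligned} \frac{\partial \log L(p|Y)}{\partial p} &= \frac{m}{p} - \frac{N-m}{1-p} = 0 \\ \Rightarrow \quad m(1-p) &= (N-m)\,p \\ \Rightarrow \quad \hat{p} &= \frac{m}{N} \end{aligned} $$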

Exercise 9.3

In Exercise 9.3, why is line 50 of my_gmm.py `u = self.mean_[-1]`?

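Decision trees, Exercise 5.3

Chapter 5 (decision trees), Exercise 5.3: doesn't the solution also need to consider the case where t2 and t3 are a child node and its parent node? In that case, pruning directly at the parent node gives the unique optimal subtree.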

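Fix for the KDTree k-nearest-neighbor search code

The KDTree k-nearest-neighbor search has a small problem. Testing the original code on the following example: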

import numpy as np

X_train = np.array([[2, 3],
                    [5, 4],
                    [9, 6],
                    [4, 7],
                    [8, 4],
                    [7, 2]])
kd_tree = KDTree(X_train)
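# set the value of k
k = 1
# find the nearest neighbors
dists, indices = kd_tree.query(np.array([[7, 4]]), k=k)
# print the nearest neighbors
print_k_neighbor_sets(k, indices, dists)
# Output: the nearest neighbor of x is (9, 6), distance 2.8284

The modified version below gives the correct answer: the nearest neighbor of x is (8, 4), distance 1.0000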

class KDTree:
    """kd tree class"""

    def __init__(self, data):
        # dataset
        self.data = np.asarray(data)
        # kd tree
        self.kd_tree = None
        # build a balanced kd tree (pass the ndarray so that .shape is available)
        self._create_kd_tree(self.data)

    def _split_sub_tree(self, data, depth=0):
        # Algorithm 3.2, step 3: stop when the subregion contains no instances
        if len(data) == 0:
            return None
        # Algorithm 3.2, step 2: choose the splitting axis, counting from 0 (the book counts from 1)
        l = depth % data.shape[1]
        # sort the data along that axis
        data = data[data[:, l].argsort()]
        # Algorithm 3.2, step 1: use the median of the instance coordinates as the splitting point
        median_index = data.shape[0] // 2
        # position of this node in the dataset
        node_index = [i for i, v in enumerate(
            self.data) if list(v) == list(data[median_index])]
        return Node(
            # this node
            value=data[median_index],
            # position of this node in the dataset
            index=node_index[0],
            # left child
            left_child=self._split_sub_tree(data[:median_index], depth + 1),
            # right child
            right_child=self._split_sub_tree(
                data[median_index + 1:], depth + 1)
        )

    def _create_kd_tree(self, X):
        self.kd_tree = self._split_sub_tree(X)

    def query(self, data, k=1):
        data = np.asarray(data)
        hits = self._search(data, self.kd_tree, k=k, k_neighbor_sets=list())
        dd = np.array([hit[0] for hit in hits])
        ii = np.array([hit[1] for hit in hits])
        return dd, ii

    def __repr__(self):
        return str(self.kd_tree)

    @staticmethod
    def _cal_node_distance(node1, node2):
        """Compute the Euclidean distance between two nodes"""
        return np.sqrt(np.sum(np.square(node1 - node2)))

    def _search(self, point, tree=None, k=1, k_neighbor_sets=None, depth=0):
        # point has shape (1, n), e.g. np.array([[7, 4]]), so the dimension is
        # shape[1]; len(point) would be 1 and always select axis 0
        n = point.shape[1]
        if k_neighbor_sets is None:
            k_neighbor_sets = []
        if tree is None:
            return k_neighbor_sets

        # (1) Find the leaf node that contains the target point x
        if tree.left_child is None and tree.right_child is None:
            # Update the current k-nearest-neighbor set
            return self._update_k_neighbor_sets(k_neighbor_sets, k, tree, point)

        # Recursively descend the kd tree
        if point[0][depth % n] < tree.value[depth % n]:
            next_branch, other_branch = tree.left_child, tree.right_child
        else:
            next_branch, other_branch = tree.right_child, tree.left_child
        if next_branch is not None:
            k_neighbor_sets = self._search(point, tree=next_branch, k=k, depth=depth + 1,
                                           k_neighbor_sets=k_neighbor_sets)

            # Distance from the target point to the splitting hyperplane along this axis
            temp_dist = abs(tree.value[depth % n] - point[0][depth % n])

            # Check whether the hypersphere intersects the hyperplane
            if not (k_neighbor_sets[0][0] < temp_dist and len(k_neighbor_sets) == k):
                # If so, examine the current node, update the k-nearest-neighbor set,
                # and search the other side recursively
                k_neighbor_sets = self._update_k_neighbor_sets(k_neighbor_sets, k, tree, point)
                return self._search(point, tree=other_branch, k=k, depth=depth + 1,
                                    k_neighbor_sets=k_neighbor_sets)

        return k_neighbor_sets

    def _update_k_neighbor_sets(self, best, k, tree, point):
        # distance between the target point and the current node
        node_distance = self._cal_node_distance(point, tree.value)
        if len(best) == 0:
            best.append((node_distance, tree.index, tree.value))
        elif len(best) < k:
            # the current k-nearest-neighbor set holds fewer than k elements
            self._insert_k_neighbor_sets(best, tree, node_distance)
        else:
            # the node is closer than the farthest point in the current k-nearest-neighbor set
            if best[0][0] > node_distance:
                best = best[1:]
                self._insert_k_neighbor_sets(best, tree, node_distance)
        return best

    @staticmethod
    def _insert_k_neighbor_sets(best, tree, node_distance):
        """Insert a node so that the farthest node stays at the front"""
        n = len(best)
        for i, item in enumerate(best):
            if item[0] < node_distance:
                # insert before the first node that is closer than the new one
                best.insert(i, (node_distance, tree.index, tree.value))
                break
        if len(best) == n:
            best.append((node_distance, tree.index, tree.value))
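
The expected answer for the test case above can be confirmed independently of any tree code. This is a minimal brute-force sketch; the helper name `brute_force_knn` is hypothetical and not part of the manual. It returns neighbors from nearest to farthest, whereas the `KDTree` class above keeps the farthest neighbor first.

```python
import numpy as np

X_train = np.array([[2, 3],
                    [5, 4],
                    [9, 6],
                    [4, 7],
                    [8, 4],
                    [7, 2]])
query = np.array([[7, 4]])


def brute_force_knn(data, point, k=1):
    """Return (distances, indices) of the k rows of `data` closest to `point`."""
    # Broadcasting subtracts the (1, n) query from every row of the (m, n) data.
    dists = np.sqrt(np.sum(np.square(data - point), axis=1))
    order = np.argsort(dists, kind="stable")[:k]
    return dists[order], order


dists, indices = brute_force_knn(X_train, query, k=1)
print(X_train[indices[0]], dists[0])
```

A linear scan like this costs O(m) per query, so it is only a reference for checking small examples, not a replacement for the kd tree.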

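Erratum for the solution to Exercise 1.1

In step 1 of the solution, "the probability distribution function of X, i.e. the Bernoulli model, can be written as" should read "the probability distribution of X, i.e. the Bernoulli model, can be written as".
That formula is not a probability distribution function.
Please correct this.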
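
The distinction can be made concrete. Assuming the formula in question is the usual Bernoulli probability mass function with parameter $p = P(X=1)$ (the solution's exact notation is not quoted in this issue), it gives the probability of each value, whereas a distribution function in the cumulative sense is a step function:

$$ P(X = x) = p^x (1-p)^{1-x}, \quad x \in \{0, 1\} $$

$$ F(x) = P(X \leqslant x) = \begin{cases} 0, & x < 0 \\ 1-p, & 0 \leqslant x < 1 \\ 1, & x \geqslant 1 \end{cases} $$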

None of the formulas render correctly

Hello, I am using a Mac, and the formulas do not render correctly in either Safari or Chrome.
$$ \displaystyle P(Y=c_k) = \frac{\displaystyle \sum......

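Exercise 14.3

In Exercise 14.3, the expression for the exponential generating function of T(n,k) is not obvious. This part is the real core of the problem, whereas expanding the generating function is straightforward. Please add the details for this part.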

Cp7 - 3 solution

I think the solution to Exercise 7.3 may have some problems.
image
and my final result is
image

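Some answers cite the wrong page numbers

In Exercise 21.3,

According to the general definition of PageRank on page 422 of the book

I am using the book <机器学习方法> (Machine Learning Methods), where the actual page number should be 355.
Our project should now be described as the solutions for <机器学习方法> rather than <统计学习方法> (Statistical Learning Methods), because <统计学习方法> does not contain Part 3 on deep learning.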
