
machinelearning's Issues

policy_gradient does not learn

https://github.com/ljpzzz/machinelearning/blob/master/reinforcement-learning/policy_gradient.py

I copied the script and ran it unchanged; the evaluation reward stays flat:
episode: 0 Evaluation Average Reward: 15.0
episode: 100 Evaluation Average Reward: 10.2
episode: 200 Evaluation Average Reward: 9.2
episode: 300 Evaluation Average Reward: 9.3
episode: 400 Evaluation Average Reward: 9.4
episode: 500 Evaluation Average Reward: 9.0
episode: 600 Evaluation Average Reward: 9.6
episode: 700 Evaluation Average Reward: 9.4
episode: 800 Evaluation Average Reward: 9.7
episode: 900 Evaluation Average Reward: 9.5
episode: 1000 Evaluation Average Reward: 9.6
episode: 1100 Evaluation Average Reward: 9.7
episode: 1200 Evaluation Average Reward: 9.1
episode: 1300 Evaluation Average Reward: 9.2
episode: 1400 Evaluation Average Reward: 9.3
episode: 1500 Evaluation Average Reward: 9.3
episode: 1600 Evaluation Average Reward: 9.4
episode: 1700 Evaluation Average Reward: 9.3
episode: 1800 Evaluation Average Reward: 9.4
episode: 1900 Evaluation Average Reward: 8.8
episode: 2000 Evaluation Average Reward: 9.3
episode: 2100 Evaluation Average Reward: 9.4
episode: 2200 Evaluation Average Reward: 9.6
episode: 2300 Evaluation Average Reward: 9.6
episode: 2400 Evaluation Average Reward: 9.3
episode: 2500 Evaluation Average Reward: 9.3
episode: 2600 Evaluation Average Reward: 9.4
episode: 2700 Evaluation Average Reward: 9.7
episode: 2800 Evaluation Average Reward: 9.6
episode: 2900 Evaluation Average Reward: 9.8
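Not the repository's code, but for context: in a generic REINFORCE implementation, flat evaluation rewards near the random-policy level are usually investigated at the discounted-return step first. A minimal sketch with made-up rewards:

```python
import numpy as np

# Generic REINFORCE return computation (illustration only, not policy_gradient.py itself).
# Each step's raw reward is replaced by its discounted return before the policy update;
# normalizing the returns keeps the gradient estimate well scaled.
def discounted_returns(rewards, gamma=0.95):
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return (returns - returns.mean()) / (returns.std() + 1e-8)

print(discounted_returns([1.0, 1.0, 1.0, 1.0]))  # later steps get smaller normalized returns
```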

On backpropagation in convolutional networks

Very well written, and impressively so. Which reference did you mainly follow for the presentation of convolutional-network backpropagation? The references you list differ quite a bit from your own exposition, and I'd like to read the original treatment, thanks. If it's been too long and you no longer remember, never mind; I'll try to derive it by hand, since I want to implement convolutional backpropagation myself and understand the math and the principles in depth.
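For the hand derivation, the single-channel case boils down to two identities that are easy to check numerically: the kernel gradient is a valid cross-correlation of the input with the upstream gradient, and the input gradient is a full convolution of the upstream gradient with the kernel (i.e. cross-correlation with the kernel rotated 180°). A small SciPy sketch (my own illustration, not taken from the blog's references):

```python
import numpy as np
from scipy.signal import correlate2d, convolve2d

X = np.random.randn(5, 5)              # input feature map
W = np.random.randn(3, 3)              # kernel
Y = correlate2d(X, W, mode='valid')    # forward pass: "valid" cross-correlation, as in CNNs

dY = np.random.randn(*Y.shape)         # upstream gradient dL/dY

# Gradient w.r.t. the kernel: cross-correlate the input with the upstream gradient
dW = correlate2d(X, dY, mode='valid')  # same shape as W

# Gradient w.r.t. the input: "full" convolution of the upstream gradient with the kernel
dX = convolve2d(dY, W, mode='full')    # same shape as X
```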

Suggestion: a WeChat official account

Hello. I have long been reading your clear and accessible blog posts and have benefited greatly from them.
Do you happen to have a WeChat official account? It would make subscribing and day-to-day reading on mobile much easier.
If not, a small suggestion: please consider opening one, so the distilled knowledge you have organized can reach a wider audience.

A question about syntax

In the sarsa_windygrid file, line 83 contains a statement whose construction I don't quite follow, although I understand what it is meant to do: it should select the action with the largest Q value. Could you explain the syntax in detail? I know what enumerate does, but I don't understand how everything inside the choice(...) call fits together.
next_action = np.random.choice([action_ for action_, value_ in enumerate(values_) if value_ == np.max(values_)])
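For reference, the one-liner can be unpacked step by step; a small stand-alone example (the Q values are made up for illustration):

```python
import numpy as np

values_ = np.array([0.1, 0.7, 0.3, 0.7])   # Q values of the four actions in this state

# Step 1: enumerate pairs (action index, Q value): [(0, 0.1), (1, 0.7), (2, 0.3), (3, 0.7)]
# Step 2: keep only the indices whose Q value equals the maximum
best_actions = [action_ for action_, value_ in enumerate(values_) if value_ == np.max(values_)]
print(best_actions)                         # [1, 3]

# Step 3: np.random.choice breaks ties uniformly at random among the maximizing actions
next_action = np.random.choice(best_actions)
```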

Thanks for the upload, I've learned a lot, but my results differ from yours

```
print (docres)
[[0.00896713 0.99103287]
 [0.98510899 0.01489101]
 [0.98466115 0.01533885]]
```

I ran exactly your code and got the result above... Is there something I have misunderstood? The only two changes I made were removing `document_decode = document.decode('GBK')` and `# result = result.encode('utf-8')`, and adding `encoding="utf8"` to the `with open` call.
Thanks.
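One possibility to rule out (an assumption on my part, not verified against the repository): if the corpus file is GBK-encoded, as the original `decode('GBK')` line suggests, then opening it with `encoding="utf8"` will either raise an error or corrupt the Chinese text, which changes the tokenization and therefore the printed result. A minimal sketch (the file name is hypothetical):

```python
# Hypothetical file name; the point is only the encoding argument.
# Reading a GBK-encoded file with the matching codec preserves the original text,
# which keeps the downstream tokenization identical to the author's run.
with open('corpus.txt', encoding='gbk') as f:
    document = f.read()
```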

Question about the code in nature_dqn.py

Hello. In the target net code, why is target_Q_value computed from h_layer? Shouldn't it be h_layer_t?

```python
# hidden layer
h_layer_t = tf.nn.relu(tf.matmul(self.state_input, W1t) + b1t)

# Q Value layer
self.target_Q_value = tf.matmul(h_layer, W2t) + b2t
```
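If the intention is for the target network's output layer to consume the target network's own hidden layer, the second line would presumably read:

```python
self.target_Q_value = tf.matmul(h_layer_t, W2t) + b2t
```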

Sharing the package versions I used

The repository does not pin dependency versions, and the latest releases do not run out of the box. After some trial and error the following combination worked for me; listed here for reference only.
macOS
tensorflow 1.14
pyglet 1.5.11
gym 0.9.6

Question about the actor loss in DDPG

In your blog post I read that the actor's loss function is as follows.

[screenshot of the actor loss formula from the blog post]

My understanding is that integrating the gradient of that objective gives the expression below it (just without the minus sign), and that adding a minus sign to the objective then makes it usable as a loss function; is that understanding correct?
Also, my actor network takes a state s as input and outputs a continuous action value a. Under your formula, does that mean the loss is computed directly as the Q value that the critic network produces for the corresponding S and A?
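A minimal sketch of how that negated objective is typically wired up in code, assuming Keras-style `actor` (state → continuous action) and `critic` ((state, action) → Q) models; both names are placeholders, not the blog's code:

```python
import tensorflow as tf

def actor_loss(actor, critic, states):
    actions = actor(states)               # a = pi_theta(s), the deterministic continuous action
    q_values = critic([states, actions])  # Q(s, pi_theta(s)) from the critic
    return -tf.reduce_mean(q_values)      # maximizing Q over the batch == minimizing -Q
```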

xgbt

Looking forward to an article from you on xgboost with a systematic summary, along the lines of your post on using LR in sklearn: the loss functions xgboost mainly uses, its characteristics, and so on.
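For what it's worth, a minimal sketch of xgboost's scikit-learn style interface along those lines (my own example, not from the blog):

```python
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(
    n_estimators=100,              # number of boosting rounds
    max_depth=3,                   # depth of each tree
    learning_rate=0.1,             # shrinkage applied to each tree's contribution
    objective='binary:logistic',   # loss function: logistic loss for binary classification
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```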

DNN

Hello. I have recently been studying DNNs. I have a grasp of forward/backward propagation, activation and loss functions, and regularization. My question is: for DNNs, what are the key points to focus on? My current understanding is those four concepts plus network architectures (popular ones such as CNNs and RNNs), and that my job is to learn these architectures and apply them to problems at work. What I mean is: are there other key areas, for example how to diagnose and tune concrete situations such as slow training or poor results?

About updating the values

In the first reinforcement learning post, at line 218, when updating the estimations, why are the returns of exploratory moves filtered out? If they are, does the exploration rate epsilon still serve any purpose?

Feature handling in decision tree methods

Hi, a few questions, please:

  1. In ID3 and C4.5, when splitting on a discrete feature, does the node get one branch per distinct value? If the feature has 100 distinct values, does it really get 100 branches, or is there a cap on the number of branches?
  2. In CART, discrete features are split by subsets. In the example from the post, if feature A has three values, the candidate splits are {(A1,A2),A3}, {A1,(A2,A3)}, {(A1,A3),A2}, and the one with the best Gini index is chosen. When a feature has many values there are a huge number of subsets; isn't the computational cost prohibitive? (See the sketch after this list.)
  3. In all of the tree models, when we feed the samples in, how does the model tell discrete features from continuous ones? At input time we never seem to tell it which features are discrete and which are continuous.
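To make question 2 concrete, a small enumeration sketch (my own illustration): a categorical feature with K distinct values has 2^(K−1) − 1 candidate binary partitions, which grows quickly with K.

```python
from itertools import combinations

def cart_binary_splits(values):
    # enumerate the ways to split a categorical feature's values into two non-empty groups,
    # counting each partition {S, complement of S} only once
    values = list(values)
    splits = []
    for r in range(1, len(values)):
        for subset in combinations(values, r):
            complement = tuple(v for v in values if v not in subset)
            if (complement, subset) not in splits:
                splits.append((subset, complement))
    return splits

print(cart_binary_splits(['A1', 'A2', 'A3']))
# [(('A1',), ('A2', 'A3')), (('A2',), ('A1', 'A3')), (('A3',), ('A1', 'A2'))]
print(len(cart_binary_splits(list('ABCDEFGHIJ'))))   # 511 partitions for 10 values
```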

LSTM backpropagation

Hello, I have a question about the weight-gradient computation at the end.

1. The gate outputs in my implementation have shape (batchsize, hiddensize) and the input x has shape (batchsize, inputsize). Because of the batch dimension I wanted to handle each sample in the batch separately, but then the shapes don't line up: δC(t) ⊙ C(t−1) ⊙ f(t) ⊙ (1−f(t)) comes out with shape (hiddensize), h(t−1) also has shape (hiddensize), and its transpose (h(t−1))^T has shape (1, hiddensize), so the matrix product ends up with size 1, whereas my Wfh has shape (hiddensize, hiddensize). To get around this I used δC(t) ⊙ C(t−1) ⊙ f(t) ⊙ (1−f(t)) * h(t−1).unsqueeze(dim=1), which makes the shapes come out right. Is that acceptable?

2. When computing the gradient for Wfx, δC(t) ⊙ C(t−1) ⊙ f(t) ⊙ (1−f(t)) has shape (hiddensize) while x(t) has shape (inputsize), so the shapes mismatch again. I changed the matrix product to δC(t) ⊙ C(t−1) ⊙ f(t) ⊙ (1−f(t)).unsqueeze(dim=1) * x(t).unsqueeze(dim=0); is that rigorous?

I hope you can help.
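For reference, the shape bookkeeping in both questions can be checked with a short NumPy sketch (illustrative only, assuming one sample per row; whether the per-sample gradients are kept separate or summed over the batch depends on the convention):

```python
import numpy as np

batch, hidden, inp = 4, 8, 5
d_f    = np.random.randn(batch, hidden)   # delta_C(t) ⊙ C(t-1) ⊙ f(t) ⊙ (1 - f(t)), one row per sample
h_prev = np.random.randn(batch, hidden)   # h(t-1)
x_t    = np.random.randn(batch, inp)      # x(t)

# Per-sample outer products: this is what the unsqueeze trick achieves via broadcasting
dW_fh_per_sample = d_f[:, :, None] * h_prev[:, None, :]   # (batch, hidden, hidden)
dW_fx_per_sample = d_f[:, :, None] * x_t[:, None, :]      # (batch, hidden, inp)

# Summing those outer products over the batch collapses to a single matrix product
dW_fh = d_f.T @ h_prev                                     # (hidden, hidden), matches W_fh
dW_fx = d_f.T @ x_t                                        # (hidden, inp),   matches W_fx
assert np.allclose(dW_fh_per_sample.sum(axis=0), dW_fh)
assert np.allclose(dW_fx_per_sample.sum(axis=0), dW_fx)
```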

Error when running the ddqn in reinforcement learning, possibly a configuration problem; could you help?

I imported the numpy, gym, and tensorflow modules in PyCharm and ran the script there, and then the following error appeared:

```
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2018.2.4\helpers\pydev_pydev_bundle\pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
ImportError: numpy.core.multiarray failed to import
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2018.2.4\helpers\pydev_pydev_bundle\pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "", line 980, in _find_and_load
SystemError: <class '_frozen_importlib._ModuleLockManager'> returned a result with an error set
ImportError: numpy.core._multiarray_umath failed to import
ImportError: numpy.core.umath failed to import
2019-04-10 10:23:13.232651: F tensorflow/python/lib/core/bfloat16.cc:675] Check failed: PyBfloat16_Type.tp_base != nullptr
```

Hello, could you take a look at this Nature-DQN code for me? It runs extremely slowly

```python
import tensorflow as tf
import numpy as np
import gym
import random
from collections import deque
from keras.utils.np_utils import to_categorical

import tensorflow.keras.backend as K

class QNetwork(tf.keras.Model):

    def __init__(self):
        super().__init__()
        # dense1/dense2: online Q network, dense3/dense4: target Q network
        self.dense1 = tf.keras.layers.Dense(24, activation='relu')
        self.dense2 = tf.keras.layers.Dense(2)
        self.dense3 = tf.keras.layers.Dense(24, activation='relu')
        self.dense4 = tf.keras.layers.Dense(2)

    def call(self, inputs):
        # online network forward pass
        x = self.dense1(inputs)
        x = self.dense2(x)
        return x

    def tarNet_Q(self, inputs):
        # target network forward pass
        x = self.dense3(inputs)
        x = self.dense4(x)
        return x

    def get_action(self, inputs):
        q_values = self(inputs)
        return K.eval(tf.argmax(q_values, axis=-1))[0]

env = gym.make('CartPole-v0')

num_episodes = 300
num_exploration = 200
max_len = 400
batch_size = 32
lr = 1e-3
gamma = 0.9
initial_epsilon = 0.5
final_epsilon = 0.01
replay_buffer = deque(maxlen=10000)
tarNet_update_frequence = 10
optimizer = tf.train.AdamOptimizer(learning_rate=lr)
qNet = QNetwork()

for i in range(1, num_episodes + 1):
    state = env.reset()
    # linearly anneal epsilon over the exploration episodes
    epsilon = max(initial_epsilon * (num_exploration - i) / num_exploration, final_epsilon)
    for t in range(max_len):  # cap the score per episode
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = qNet.get_action(tf.constant(np.expand_dims(state, axis=0), dtype=tf.float32))
        next_state, reward, done, info = env.step(action)
        reward = -1. if done else reward
        replay_buffer.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            print('episode %d, epsilon %f, score %d' % (i, epsilon, t))
            break
        if len(replay_buffer) >= batch_size:
            # sample a minibatch and unpack it column-wise
            batch_state, batch_action, batch_reward, batch_next_state, batch_done = \
                [np.array(a, dtype=np.float32) for a in zip(*random.sample(replay_buffer, batch_size))]
            # TD target computed from the target network
            q_value = qNet.tarNet_Q(tf.constant(batch_next_state, dtype=tf.float32))
            y = batch_reward + (gamma * tf.reduce_max(q_value, axis=1)) * (1 - batch_done)
            with tf.GradientTape() as tape:
                loss = tf.losses.mean_squared_error(y, tf.reduce_max(
                    qNet(tf.constant(batch_state)) * to_categorical(batch_action, num_classes=2), axis=1))
            # only the online network's variables (dense1/dense2) are updated
            grads = tape.gradient(loss, qNet.variables[:4])
            optimizer.apply_gradients(grads_and_vars=zip(grads, qNet.variables[:4]))
    # periodically copy the online weights into the target network
    if i % tarNet_update_frequence == 0:
        for j in range(2):
            tf.assign(qNet.variables[4 + j], qNet.dense1.get_weights()[j])
            tf.assign(qNet.variables[6 + j], qNet.dense2.get_weights()[j])
env.close()
```
I suspect it runs slowly because the way I copy the network parameters is wrong; any suggestions from anyone who sees this would be appreciated.
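One alternative copy scheme to try (my own suggestion, assuming TensorFlow 2.x / eager execution, and restructured into two separate models rather than one model holding both heads): `set_weights`/`get_weights` copy NumPy arrays directly, so no new assign operations are created on every update.

```python
import numpy as np
import tensorflow as tf

def make_q_model(state_dim=4, num_actions=2):
    # same layer sizes as the QNetwork above, but one model per network
    return tf.keras.Sequential([
        tf.keras.layers.Dense(24, activation='relu', input_shape=(state_dim,)),
        tf.keras.layers.Dense(num_actions),
    ])

q_net = make_q_model()
target_net = make_q_model()

def update_target():
    # copy all kernels and biases from the online network to the target network
    target_net.set_weights(q_net.get_weights())

update_target()
state = np.random.randn(1, 4).astype(np.float32)
assert np.allclose(q_net(state).numpy(), target_net(state).numpy())  # identical Q values after the copy
```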

Datasets

Could you check whether the datasets for the classic machine learning algorithm implementations have gone missing?
