ljpzzz / machinelearning
My blogs and code for machine learning. http://cnblogs.com/pinard
License: MIT License
https://github.com/ljpzzz/machinelearning/blob/master/reinforcement-learning/policy_gradient.py
I copied and ran it directly; the output was:
episode: 0 Evaluation Average Reward: 15.0
episode: 100 Evaluation Average Reward: 10.2
episode: 200 Evaluation Average Reward: 9.2
episode: 300 Evaluation Average Reward: 9.3
episode: 400 Evaluation Average Reward: 9.4
episode: 500 Evaluation Average Reward: 9.0
episode: 600 Evaluation Average Reward: 9.6
episode: 700 Evaluation Average Reward: 9.4
episode: 800 Evaluation Average Reward: 9.7
episode: 900 Evaluation Average Reward: 9.5
episode: 1000 Evaluation Average Reward: 9.6
episode: 1100 Evaluation Average Reward: 9.7
episode: 1200 Evaluation Average Reward: 9.1
episode: 1300 Evaluation Average Reward: 9.2
episode: 1400 Evaluation Average Reward: 9.3
episode: 1500 Evaluation Average Reward: 9.3
episode: 1600 Evaluation Average Reward: 9.4
episode: 1700 Evaluation Average Reward: 9.3
episode: 1800 Evaluation Average Reward: 9.4
episode: 1900 Evaluation Average Reward: 8.8
episode: 2000 Evaluation Average Reward: 9.3
episode: 2100 Evaluation Average Reward: 9.4
episode: 2200 Evaluation Average Reward: 9.6
episode: 2300 Evaluation Average Reward: 9.6
episode: 2400 Evaluation Average Reward: 9.3
episode: 2500 Evaluation Average Reward: 9.3
episode: 2600 Evaluation Average Reward: 9.4
episode: 2700 Evaluation Average Reward: 9.7
episode: 2800 Evaluation Average Reward: 9.6
episode: 2900 Evaluation Average Reward: 9.8
Very well written; you're impressive. Which reference did you mainly follow for the introduction to backpropagation in convolutional networks? The references you list differ quite a bit from your exposition, and I'd like to read the original treatment. Thanks. If it's been too long and you no longer remember, never mind; I'll try deriving it by hand myself, since I want to implement CNN backpropagation and deeply understand the math and the principles.
Hello, I've long been a reader of your accessible, in-depth blog posts and have benefited greatly.
Do you have a WeChat official account? It would make subscribing and everyday mobile reading convenient.
If not, a small suggestion: open one, so the knowledge you've distilled can reach a wider audience.
Thank you for putting together such great resources.
In "word2vec原理(一) CBOW与Skip-Gram模型基础", the Huffman-coding figure seems to be wrong: the order of e and a in the child nodes is reversed.
In the sarsa_windygrid file, line 83, there's a construction I don't quite follow, though I understand its intent: it should select the action with the maximal Q value. I know what enumerate does, but I can't parse what's inside the choice(...) call. Could you explain the syntax in detail?
next_action = np.random.choice([action_ for action_, value_ in enumerate(values_) if value_ == np.max(values_)])
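For anyone else puzzling over that line, here is a step-by-step decomposition (a sketch with made-up Q values, not the repository's code):

```python
import numpy as np

values_ = np.array([0.1, 0.5, 0.5, 0.2])   # illustrative Q values per action
best = np.max(values_)                     # the maximum Q value

# enumerate(values_) yields (index, value) pairs, so the comprehension
# collects the indices of every action tied for the maximum Q value
best_actions = [action_ for action_, value_ in enumerate(values_)
                if value_ == best]         # -> [1, 2]

# np.random.choice breaks ties among the greedy actions uniformly at random
next_action = np.random.choice(best_actions)
```

In words: build the list of all argmax actions, then sample one of them, so ties are broken randomly rather than always favoring the lowest index.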
print(docres)
[[0.00896713 0.99103287]
 [0.98510899 0.01489101]
 [0.98466115 0.01533885]]
I ran exactly your code and got this result...
Is there something I'm misunderstanding?
The only two changes I made were removing
document_decode = document.decode('GBK') and # result = result.encode('utf-8'),
but I added encoding="utf8" inside the with open(...) call.
Thanks.
I've added a MATLAB implementation of the code from the post "卷积神经网络(CNN)反向传播算法"; if appropriate, it would be great if the author could cite that link.
Hello, in the target net code, why is target_Q_value computed from h_layer? Shouldn't it be h_layer_t?
h_layer_t = tf.nn.relu(tf.matmul(self.state_input,W1t) + b1t)
self.target_Q_value = tf.matmul(h_layer,W2t) + b2t
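For reference, the fix the comment points at would presumably be the following (a sketch based only on the two quoted lines, not a verified patch):

```python
# feed the target network's own hidden layer into its output layer
h_layer_t = tf.nn.relu(tf.matmul(self.state_input, W1t) + b1t)
self.target_Q_value = tf.matmul(h_layer_t, W2t) + b2t
```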
The repo doesn't pin dependency versions, and the code won't run on the latest releases. After some trial and error I got it running with the versions below, listed here for reference only.
mac os
tensorflow 1.14
pyglet 1.5.11
gym 0.9.6
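If it helps, a minimal way to pin those versions (a sketch, assuming the standard PyPI package names):

```python
# install the commenter's reported working versions via pip
import subprocess, sys

subprocess.check_call([sys.executable, "-m", "pip", "install",
                       "tensorflow==1.14", "pyglet==1.5.11", "gym==0.9.6"])
```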
Hoping the author will write an article on XGBoost, a distilled summary along the lines of the one on using LR in scikit-learn: the loss functions XGBoost mainly uses, its characteristics, and so on.
Hello, I've been studying DNNs recently. I have some grasp of forward/backpropagation, activation and loss functions, and regularization. My current puzzle: for DNNs, what are the key points? My present understanding is those four concepts plus network architectures (the popular ones such as CNN and RNN), and that what I need to do is learn these architectures to solve problems at work. What I mean is: are there other key topics, for example knowledge of how to optimize in concrete situations like slow training or poor results?
In the first reinforcement learning post, line 218: when updating estimations, why filter out the returns of exploratory moves? Doesn't that make the exploration rate epsilon meaningless?
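A minimal sketch of the distinction being asked about, assuming a tabular TD(0) backup like the tic-tac-toe example (the names estimations and greedy_flags are illustrative, not the repository's):

```python
def backup(estimations, states, greedy_flags, step_size=0.1):
    # Tabular TD(0) over one episode's state sequence, latest state first.
    for i in reversed(range(len(states) - 1)):
        state, nxt = states[i], states[i + 1]
        if not greedy_flags[i]:
            continue  # skip updates through exploratory (random) moves
        td_error = estimations[nxt] - estimations[state]
        estimations[state] += step_size * td_error
```

Exploration still matters with the filter in place: random moves change which states the agent visits, and therefore which entries get updated; the filter only prevents a state's value from being dragged toward the outcome of a deliberately non-greedy move.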
Hi, a question please.
I'd like to ask about the final weight-gradient computation.
1. The outputs of my gates have shape (batchsize, hiddensize) and the input x has shape (batchsize, inputsize). Because of the batch dimension, I compute each sample in the batch separately, but then the shapes don't line up. δ_C^{(t)} ⊙ C^{(t−1)} ⊙ f^{(t)} ⊙ (1−f^{(t)}) comes out with shape (hiddensize,), and h^{(t−1)} also has shape (hiddensize,); transposing gives (h^{(t−1)})^T with shape (1, hiddensize), so the matrix product has size 1, yet my W_fh has shape (hiddensize, hiddensize). I worked around this with δ_C^{(t)} ⊙ C^{(t−1)} ⊙ f^{(t)} ⊙ (1−f^{(t)}) * h^{(t−1)}.unsqueeze(dim=1), which makes the shapes match. Is that acceptable?
2. When computing the gradient for W_fx, δ_C^{(t)} ⊙ C^{(t−1)} ⊙ f^{(t)} ⊙ (1−f^{(t)}) again has shape (hiddensize,) while x^{(t)} has shape (inputsize,), so the shapes disagree once more. I changed the product to δ_C^{(t)} ⊙ C^{(t−1)} ⊙ f^{(t)} ⊙ (1−f^{(t)}).unsqueeze(dim=1) * x^{(t)}.unsqueeze(dim=0); is that rigorous?
Hoping for your help.
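For what it's worth, the unsqueeze construction described above is the per-sample outer product; a minimal numpy sketch (shapes taken from the comment, data random) of computing it for the whole batch at once:

```python
import numpy as np

batch, hidden, inp = 32, 64, 16           # illustrative sizes
delta_f = np.random.randn(batch, hidden)  # δ_C^{(t)} ⊙ C^{(t-1)} ⊙ f^{(t)} ⊙ (1 - f^{(t)})
h_prev  = np.random.randn(batch, hidden)  # h^{(t-1)}
x_t     = np.random.randn(batch, inp)     # x^{(t)}

# Sum the per-sample outer products over the batch: for each sample b,
# the outer product delta_f[b] ⊗ h_prev[b] has shape (hidden, hidden).
dW_fh = np.einsum('bi,bj->ij', delta_f, h_prev)  # (hidden, hidden), matches W_fh
dW_fx = np.einsum('bi,bj->ij', delta_f, x_t)     # (hidden, inp),    matches W_fx
```

The unsqueeze(dim=1)/unsqueeze(dim=0) trick computes exactly one of these outer products for a single sample; summing (or averaging) them over the batch yields a gradient with the same shape as the weight matrix, so the shapes are indeed supposed to work out that way.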
https://www.cnblogs.com/pinard/p/5976811.html
Hello, what is the difference between least squares and using the squared loss as the loss function? Generally the squared loss is minimized iteratively by gradient descent, whereas least squares solves analytically in closed form; is gradient descent the numerical method?
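To make the distinction concrete, here is a small sketch minimizing the same squared loss two ways, analytically (ordinary least squares via the normal equation) and numerically (gradient descent); the data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.standard_normal((100, 2))])
w_true = np.array([1.0, 2.0, -0.5])
y = X @ w_true + 0.01 * rng.standard_normal(100)

# Least squares: closed-form solution of the normal equation X^T X w = X^T y
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the identical squared loss, solved iteratively
w_gd = np.zeros(3)
for _ in range(5000):
    grad = X.T @ (X @ w_gd - y) / len(y)   # gradient of (1/2n)·||Xw − y||²
    w_gd -= 0.1 * grad

print(w_ols, w_gd)  # both approach w_true
```

Same objective, two solvers: the squared loss defines what is being minimized, while least squares (analytic) and gradient descent (numerical, iterative) are just two ways of minimizing it.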
Is the loss function in the DDPG post inconsistent with the original paper?
I imported the numpy, gym, and tensorflow modules in PyCharm and ran it there, and the following error appeared.
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm 2018.2.4\helpers\pydev_pydev_bundle\pydev_import_hook.py", line 20, in do_import
module = self._system_import(name, *args, **kwargs)
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
ImportError: numpy.core.multiarray failed to import
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm 2018.2.4\helpers\pydev_pydev_bundle\pydev_import_hook.py", line 20, in do_import
module = self._system_import(name, *args, **kwargs)
File "", line 980, in _find_and_load
SystemError: <class '_frozen_importlib._ModuleLockManager'> returned a result with an error set
ImportError: numpy.core._multiarray_umath failed to import
ImportError: numpy.core.umath failed to import
2019-04-10 10:23:13.232651: F tensorflow/python/lib/core/bfloat16.cc:675] Check failed: PyBfloat16_Type.tp_base != nullptr
```python
import tensorflow as tf
import numpy as np
import gym
import random
from collections import deque
from keras.utils.np_utils import to_categorical
import tensorflow.keras.backend as K

class QNetwork(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # main network (dense1/dense2) and target network (dense3/dense4)
        self.dense1 = tf.keras.layers.Dense(24, activation='relu')
        self.dense2 = tf.keras.layers.Dense(2)
        self.dense3 = tf.keras.layers.Dense(24, activation='relu')
        self.dense4 = tf.keras.layers.Dense(2)

    def call(self, inputs):        # main network Q(s, ·)
        x = self.dense1(inputs)
        x = self.dense2(x)
        return x

    def tarNet_Q(self, inputs):    # target network Q'(s, ·)
        x = self.dense3(inputs)
        x = self.dense4(x)
        return x

    def get_action(self, inputs):  # greedy action from the main network
        q_values = self(inputs)
        return K.eval(tf.argmax(q_values, axis=-1))[0]

env = gym.make('CartPole-v0')
num_episodes = 300
num_exploration = 200
max_len = 400
batch_size = 32
lr = 1e-3
gamma = 0.9
initial_epsilon = 0.5
final_epsilon = 0.01
replay_buffer = deque(maxlen=10000)
tarNet_update_frequence = 10
optimizer = tf.train.AdamOptimizer(learning_rate=lr)
qNet = QNetwork()

for i in range(1, num_episodes + 1):
    state = env.reset()
    # linearly decay epsilon over the exploration phase
    epsilon = max(initial_epsilon * (num_exploration - i) / num_exploration, final_epsilon)
    for t in range(max_len):  # cap episode length at max_len steps
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = qNet.get_action(tf.constant(np.expand_dims(state, axis=0), dtype=tf.float32))
        next_state, reward, done, info = env.step(action)
        reward = -1. if done else reward
        replay_buffer.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            print('episode %d, epsilon %f, score %d' % (i, epsilon, t))
            break
        if len(replay_buffer) >= batch_size:
            batch_state, batch_action, batch_reward, batch_next_state, batch_done = \
                [np.array(a, dtype=np.float32) for a in zip(*random.sample(replay_buffer, batch_size))]
            # TD target from the target network
            q_value = qNet.tarNet_Q(tf.constant(batch_next_state, dtype=tf.float32))
            y = batch_reward + (gamma * tf.reduce_max(q_value, axis=1)) * (1 - batch_done)
            with tf.GradientTape() as tape:
                loss = tf.losses.mean_squared_error(y, tf.reduce_max(
                    qNet(tf.constant(batch_state)) * to_categorical(batch_action, num_classes=2), axis=1))
            grads = tape.gradient(loss, qNet.variables[:4])
            optimizer.apply_gradients(grads_and_vars=zip(grads, qNet.variables[:4]))
    if i % tarNet_update_frequence == 0:
        # copy main-network weights into the target network layer by layer
        for j in range(2):
            tf.assign(qNet.variables[4 + j], qNet.dense1.get_weights()[j])
            tf.assign(qNet.variables[6 + j], qNet.dense2.get_weights()[j])
env.close()
```
I think it runs slowly because the way the network parameters are copied is wrong; any suggestions from anyone reading this would be appreciated.
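One possibility (a sketch, not a verified fix, assuming dense3/dense4 above are the target network): copy the weights with Keras's set_weights instead of issuing tf.assign on the variables list.

```python
# copy main-network weights into the target network via numpy arrays,
# so no new graph operations accumulate across episodes
if i % tarNet_update_frequence == 0:
    qNet.dense3.set_weights(qNet.dense1.get_weights())
    qNet.dense4.set_weights(qNet.dense2.get_weights())
```

This also sidesteps any dependence on the ordering of qNet.variables, which is determined by when each layer is first built.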
In the DQN example, line 140: why overwrite the reward with reward = -1 if done else 0.1 instead of using the reward the gym environment returns? The structure in https://zhuanlan.zhihu.com/p/21477488 is similar, but it doesn't overwrite the reward; it introduces a new variable, reward_agent = -1 if done else 0.1. The other DQN-variant examples all do the same.
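The pattern from the linked article, as a runnable mini-sketch (the random policy is a placeholder): the environment's reward keeps scoring the episode, while a shaped copy, reward_agent, is what goes into the replay buffer.

```python
import gym

env = gym.make('CartPole-v0')
state = env.reset()
transitions, total_score = [], 0.0
for t in range(200):
    action = env.action_space.sample()     # placeholder policy
    next_state, reward, done, info = env.step(action)
    reward_agent = -1 if done else 0.1     # shaped signal used only for learning
    transitions.append((state, action, reward_agent, next_state, done))
    total_score += reward                  # unmodified env reward for evaluation
    state = next_state
    if done:
        break
env.close()
```

Overwriting reward in place, as the repository does, trains on the same shaped signal but discards the original reward for reporting; keeping the two apart is mostly a bookkeeping choice.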
May I ask whether the datasets for the classic machine-learning algorithm implementations have gone missing?