peterroelants.github.io's Issues

Wrong sigma for functions

"This softmax function $\varsigma$ takes as input a $C$-dimensional vector $\mathbf{z}$ and outputs a $C$-dimensional vector $\mathbf{y}$ of real values between $0$ and $1$. This function is a normalized exponential and is defined as:"

The Greek letter ς is only used as the last letter of words that end with an s sound (a weird rule, I know). In math, functions are denoted with the regular sigma σ (`\sigma` in LaTeX), not the final sigma ς.

Error in GP example with noise

I believe there is an error in the GP example with noise. More precisely, in

Σ11 = kernel_func(X1, X1) + σ_noise * np.eye(n1)

one should add σ_noise**2 instead, because Σ11 is a covariance matrix and the noise contributes a variance, not a standard deviation.
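For what it's worth, a minimal sketch of the corrected line (assuming an exponentiated-quadratic `kernel_func`; the notebook's actual kernel may differ):

```python
import numpy as np

def kernel_func(X1, X2, length_scale=1.0):
    # Exponentiated-quadratic (RBF) kernel; a stand-in for the
    # kernel_func used in the notebook.
    sq_dists = np.sum(X1**2, 1).reshape(-1, 1) + np.sum(X2**2, 1) - 2 * X1 @ X2.T
    return np.exp(-0.5 * sq_dists / length_scale**2)

X1 = np.linspace(0, 1, 5).reshape(-1, 1)
n1 = X1.shape[0]
sigma_noise = 0.1

# The noise enters the covariance as a variance, so it is sigma_noise
# SQUARED that goes on the diagonal:
Sigma11 = kernel_func(X1, X1) + sigma_noise**2 * np.eye(n1)
```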

RNN part1

Hi, I think there is a mistake in the equation for dS_k/dS_{k-m}: in the last factor you have dS_{k-m+1}/dS_{k-1}, but I think it should be dS_{k-m+1}/dS_{k-m}.

Clarification with notes on "Understanding Gaussian Processes"

Hi Peter, first thanks so much for putting your notes on machine learning online - I found the article "Understanding Gaussian processes" particularly rigorous and helpful.

Can I please clarify two things in that particular post?

  1. In the section "Predictions from posterior", could you please verify whether the computations for the conditional distribution are correct? Specifically,

\mu_{2 | 1} = \mu_{2} + \Sigma_{21} \Sigma_{11}^{-1}\left(\mathbf{y}_{1} - \mu_{1}\right)

\Sigma_{2 | 1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}

should be

\mu_{2 | 1} = \mu_{2} + \Sigma_{12} \Sigma_{22}^{-1}\left(\mathbf{y}_{1} - \mu_{1}\right)

\Sigma_{2 | 1} = \Sigma_{22} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}

I've derived the computation based on your post on conditional distribution here.
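For reference, a sketch of the textbook Gaussian conditioning formulas in code (my own helper, not code from the post; note that Σ21 = Σ12ᵀ, which is why the two index conventions are easy to mix up):

```python
import numpy as np

def gp_conditional(mu1, mu2, Sigma11, Sigma12, Sigma22, y1):
    # Condition a jointly Gaussian [y1; y2] on observed y1, with
    # Sigma11 = Cov(y1), Sigma12 = Cov(y1, y2), Sigma21 = Sigma12.T.
    solved = np.linalg.solve(Sigma11, Sigma12)   # Sigma11^{-1} Sigma12
    mu_cond = mu2 + solved.T @ (y1 - mu1)        # mu2 + Sigma21 Sigma11^{-1} (y1 - mu1)
    Sigma_cond = Sigma22 - Sigma12.T @ solved    # Sigma22 - Sigma21 Sigma11^{-1} Sigma12
    return mu_cond, Sigma_cond
```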

  2. In the section "Predictions from posterior", you state: "Keep in mind that y_1 and y_2 are jointly Gaussian since they both should come from the same function." Can I please clarify that "same function" means that both y_1 and y_2 come from the same Gaussian distribution over functions f(x)?

Thanks for your time!

Neural Network Intercept Bias

I am going through an NN tutorial from this website, and I am confused about one particular paragraph on this page (screenshot below).

[screenshot of the paragraph in question omitted]

  1. Is the choice of the intercept bias of -1 purely arbitrary? I don't quite understand his explanation.

  2. It says in the screenshot that the RBF function maps all values to a range of [0, +infinity]. However, the RBF function only maps to a range of (0, 1]. Is this a mistake? And how does this positive range lead to the choice of a -1 intercept bias?

From: http://stackoverflow.com/q/41989488/1387612

Link to the IPYNB notebook mentioned in https://peterroelants.github.io/posts/rnn-implementation-part02/ is broken

Problem when running gaussian-process-kernel-fitting.ipynb

Thank you very much for the great repo!
When I try to run the code in the notebook "gaussian-process-kernel-fitting.ipynb" in the "Tuning the hyperparameters" section I get the exception
"RuntimeError: loss passed to Optimizer.compute_gradients should be a function when eager execution is enabled."

It seems to be related to the TensorFlow version, but I could not solve it myself.
I tried the solution mentioned here:
https://stackoverflow.com/questions/57858219/loss-passed-to-optimizer-compute-gradients-should-be-a-function-when-eager-exe
However, that creates another problem.

My environment:
python 3.6.9
tensorflow==2.1.0
tensorflow-estimator==2.1.0
tensorflow-probability==0.9.0

convert ipython notebook to jekyll pages

This is not an issue per se, but I am wondering what method you used to convert an IPython notebook into GitHub Pages. Could you briefly share your experience?

Thank you. This is a great project.
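Not the author, but a common approach is `jupyter nbconvert --to markdown` plus a script that prepends Jekyll front matter. A stdlib-only sketch of the idea (hypothetical helper; nbconvert handles the real cases such as cell outputs and images):

```python
import json
import pathlib

def notebook_to_jekyll(ipynb_path, title):
    # Pull cells out of the notebook JSON and wrap them in a Jekyll page.
    # Markdown cells pass through; code cells become fenced blocks.
    nb = json.loads(pathlib.Path(ipynb_path).read_text())
    fence = "`" * 3
    parts = ["---", f"title: {title}", "layout: post", "---", ""]
    for cell in nb["cells"]:
        src = "".join(cell["source"])
        if cell["cell_type"] == "markdown":
            parts.append(src)
        elif cell["cell_type"] == "code":
            parts.append(fence + "python\n" + src + "\n" + fence)
        parts.append("")
    return "\n".join(parts)
```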

Gaussian process tutorial notebook not working on local machine

Heya!

I tried to run the GP tutorial notebook on my local machine, but got the following error pop up:

NotJSONError('Notebook does not appear to be JSON: \'{\\n "cells": [\\n {\\n "cell_type": "m...')

The other notebooks in the directory work just fine. Tried to run the JSON through an online validator and it passed. Any ideas?

Thanks!
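A quick stdlib check that can narrow down this kind of NotJSONError (hypothetical helper; the usual culprits when the content otherwise looks like valid JSON are a UTF-8 BOM or leftover git merge-conflict markers):

```python
import json

def diagnose_notebook(path):
    # Report common reasons a notebook fails to parse as JSON.
    raw = open(path, "rb").read()
    if raw.startswith(b"\xef\xbb\xbf"):
        return "UTF-8 BOM at start of file -- strip it"
    if b"<<<<<<<" in raw or b">>>>>>>" in raw:
        return "git merge-conflict markers present"
    try:
        json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError) as exc:
        return f"JSON/encoding error: {exc}"
    return "file parses fine as JSON"
```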

Part 1: weights diverge when using more input samples

First, thank you for these articles!
However, when playing with the code, if I change the number of input samples to 40 I get this result:

w(0): 0.1000     cost: 46.1816
w(1): 4.7754     cost: 92.1105
w(2): -1.8647    cost: 184.7509
w(3): 7.5657     cost: 371.6103
w(4): -5.8276    cost: 748.5129

I solved this by using a learning rate inversely proportional to the number of samples, i.e.
learning_rate = 2 / nb_of_samples
instead of a fixed 0.1.

I tested it with sample sizes from 5 to 10 million, and it seems to always converge now.
I don't know if this makes any mathematical sense, just want to let you know.
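This does make mathematical sense: with a sum-of-squares cost the gradient magnitude grows linearly with the number of samples, so the stable step size shrinks roughly as 1/n. A small sketch of the effect (my reconstruction of the Part 1 setup, not the exact tutorial code, with a deterministic perturbation standing in for the noise):

```python
import numpy as np

def train(n_samples, learning_rate, n_steps=20):
    # Linear model y = x * w fitted by gradient descent on the SUM of
    # squared errors over all samples.
    x = np.linspace(0, 1, n_samples)
    t = 2 * x + 0.1 * np.sin(7 * x)         # targets around w = 2
    w = 0.1
    for _ in range(n_steps):
        grad = 2 * np.sum(x * (x * w - t))  # magnitude grows with n_samples
        w -= learning_rate * grad
    return w

w_fixed = train(40, 0.1)       # fixed rate: the updates overshoot and diverge
w_scaled = train(40, 2 / 40)   # rate ~ 1/n keeps the update a contraction
```

Equivalently, defining the cost as a mean over samples instead of a sum makes the fixed learning rate scale-free, which is the more common convention.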

missing terms in partial derivatives?

Peter,

Thank you so much for the great RNN tutorial post. This might seem long, but it is very quick.

1 - For Part 1, you defined the state array S to be 1x1. How would your example change if one decided to use, say, 2 hidden states? The obvious final solution is that one of them will be turned off, but how would you define it? In that case your wRec would be 2x1, right?

  • In the example you provided for this section, you assume that the weight between the last state and the final output is already given and equals 1, right? All the RNN examples talk about a Wx, a Wrec, and a Wy that goes from hidden to output.
  • Also, how would you arrange the data if you want multi-dimensional input and multi-dimensional output at the same time? For example, each time step has a vector input and a vector output.

2 - In the same part, section "Compute the gradients with the backward step", you explain BPTT briefly, and it is not clear to me how you came up with the partial derivatives. I worked out a small 3-time-step example.

Questions:

  • Why does your summation start from 0?
  • Why is there a dc/dS_k term? (dc being the partial of the cost)
  • I found that there should be Wrec factors in the derivatives. Did you miss those, or are they included somewhere?

My Example,

dc/dwx = dc/dy * dy/dwx
dc/dy = 2(y - t)

but y in this example is nothing but (S3 * 1), so:

y = S3
y = x3 * wx + S2 * Wrec                                          ... substitute for S2
y = x3 * wx + (x2 * wx + S1 * Wrec) * Wrec                       ... expand
y = x3 * wx + x2 * wx * Wrec + S1 * Wrec^2                       ... substitute for S1
y = x3 * wx + x2 * wx * Wrec + (x1 * wx + S0 * Wrec) * Wrec^2    ... expand
y = x3 * wx + x2 * wx * Wrec + x1 * wx * Wrec^2 + S0 * Wrec^3

then,

dy/dwx = x3 + x2 * Wrec + x1 * Wrec^2
       = sum(xi * Wrec^(3-i)) where i = {1, 2, 3}

Best,
-M
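The closed-form gradient derived above can be checked numerically for the linear RNN S_k = x_k * wx + S_{k-1} * wrec (a sketch with made-up numbers, not the tutorial's code):

```python
import numpy as np

def forward(x, wx, wrec, s0=0.0):
    # Linear RNN: S_k = x_k * wx + S_{k-1} * wrec, output y = S_K
    # (the output weight is fixed to 1, as in the example).
    s = s0
    for xk in x:
        s = xk * wx + s * wrec
    return s

x = np.array([0.5, -1.0, 2.0])   # x1, x2, x3 (made up)
wx, wrec = 0.7, 1.3

# Closed form from the derivation: dy/dwx = sum_i x_i * wrec**(3 - i)
analytic = sum(xi * wrec**(3 - i) for i, xi in enumerate(x, start=1))

# Numerical check by central differences
eps = 1e-6
numeric = (forward(x, wx + eps, wrec) - forward(x, wx - eps, wrec)) / (2 * eps)
```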
