gpapamak / maf
Masked Autoregressive Flow
License: Other
I'm not an expert, but I've been working hard to run this code on Google Colab. It looks like it does not work with the latest Python version. Please make the small changes required to run on Python 3 so that people like me can run this code. Nice paper, by the way!
Hello,
Thank you for the code.
Could you specify the preprocessing methods you apply to the original datasets (e.g. MNIST), apart from dequantization, the logit transform, and the other functions that are already in the code?
Line 234 in 3239e80
The tf.sum term should be tf.sum(self.u ** 2 * tf.exp(self.logp) - self.logp)
Hi!
Thanks for sharing amazing work!
I'm trying to port your code to PyTorch (for further use in my research).
I have a question regarding your implementation of Batch Norm. As you mention in the paper, it's implemented using global batch statistics. Could you please provide pointers to the lines where it is implemented exactly? My knowledge of Theano is a little bit rusty.
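For anyone porting this, a generic sketch of batch normalization used as an invertible layer may help frame the question. It is not a pointer to the repository's lines and not the author's implementation: the function name, the parameter names `log_gamma` and `beta`, and the value of `eps` are assumptions made for illustration. The mean and variance are computed from the batch that is passed in and are treated as constants when computing the log-determinant.

```python
import numpy as np

def batch_norm_forward(x, log_gamma, beta, eps=1e-5):
    """Normalize x with the statistics of the given batch.

    Returns the transformed batch u and the log absolute determinant of the
    Jacobian du/dx, which is the same for every row because the statistics
    are treated as constants.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    u = (x - mean) / np.sqrt(var + eps) * np.exp(log_gamma) + beta
    log_det = np.sum(log_gamma - 0.5 * np.log(var + eps))
    return u, log_det

# Toy batch with a different scale in each of its three dimensions.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3)) * np.array([1.0, 2.0, 3.0]) + 5.0
u, log_det = batch_norm_forward(x, log_gamma=np.zeros(3), beta=np.zeros(3))
```

Passing the whole training set as `x` at evaluation time is one way to use statistics of the entire dataset rather than of a mini-batch.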
When doing density estimation on the UCI datasets HEPMASS and MiniBooNE, I saw in Appendix D.2 of the article that several dimensions of the raw data were removed because certain real values recur too frequently. This makes sense to me, since such densities would involve Dirac delta distributions, which are problematic to estimate with continuous densities. However, when I checked the code I stumbled upon the following lines:
Line 91 in ea057bf
Line 52 in ea057bf
The sorted function sorts the array based on the first entry, which is the real value corresponding to the count and not the count itself. I demonstrate this problem in the following notebook: max_count is computed correctly, i.e. by using
max_count = np.max(np.unique(feature, return_counts=True)[1])
On the other hand, for MiniBooNE some dimensions are dropped although max_count is only moderately high, e.g. 6, while dimensions with values recurring 3434 times are kept.
This might be a minor issue, but since the version of the MiniBooNE dataset you made publicly available has been used numerous times by others as a benchmark for density estimation, I think it is an issue that requires our attention.
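The sorting behaviour described above can be reproduced on a toy feature. This is a minimal sketch, not the repository's code: both function names and the toy data are made up for illustration. The second function uses collections.Counter and should agree with the np.unique expression quoted in the issue.

```python
from collections import Counter

def first_count_after_sorting_items(feature):
    # sorted() orders the (value, count) pairs by value, so index 0 holds
    # the count of the smallest value, not the largest count.
    return [count for _, count in sorted(Counter(feature).items())][0]

def max_count(feature):
    # Largest number of times any single value occurs in the feature.
    return max(Counter(feature).values())

# Toy feature: the value 0.5 occurs three times, the others once.
feature = [0.1, 0.5, 0.5, 0.5, 0.9]
sorted_result = first_count_after_sorting_items(feature)
correct_result = max_count(feature)
```

On this toy feature the two functions disagree, because the most frequent value is not the smallest one.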
It's unclear how every attribute with a Pearson correlation coefficient greater than 0.98 is eliminated. Since correlation is computed between pairs of attributes, how do you decide which attribute of a pair to eliminate?
Thanks.
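One possible way to resolve the pairwise ambiguity is a greedy rule: repeatedly drop the first column that is highly correlated with at least one other column, then recompute the correlations. The sketch below is an assumption about how such a rule could look, not a confirmed description of the author's procedure; the function name and the toy data are hypothetical.

```python
import numpy as np

def drop_correlated_columns(data, threshold=0.98):
    """Greedily remove columns until no pair exceeds the threshold.

    Returns the reduced array and the indices of the columns that were kept.
    """
    data = np.asarray(data, dtype=float)
    keep = list(range(data.shape[1]))
    while True:
        corr = np.atleast_2d(np.corrcoef(data[:, keep], rowvar=False))
        # For each column, count the columns (itself included) whose
        # correlation with it exceeds the threshold.
        counts = (corr > threshold).sum(axis=1)
        offenders = np.where(counts > 1)[0]
        if offenders.size == 0:
            return data[:, keep], keep
        # Tie-breaking rule (an assumption): drop the first offending column.
        del keep[offenders[0]]

# Toy data: column 2 is almost a rescaled copy of column 0.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = 2.0 * a + 1e-3 * rng.normal(size=200)
reduced, kept = drop_correlated_columns(np.column_stack([a, b, c]))
```

With this tie-breaking rule the outcome depends on column order, which is exactly the ambiguity raised in the question.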
Hello @gpapamak,
Due to API changes in pandas, the GAS and HEPMASS datasets are not usable anymore. Notably, the DataFrame.as_matrix method has been deprecated since pandas=0.23.0 and the DataFrame pickling format of pandas<2.0 is not compatible with pandas>=2.0. There is also an issue with Counter.iteritems, which was removed in Python 3.0.
I don't think modifying this repository to fix these issues is a good idea as it could break the code. Instead, I made a lightweight fork (francois-rozet/uci-datasets) of the repo's UCI datasets and wrote instructions to generate environment-agnostic .npy files containing the processed data. These .npy files can then be used without relying on the original code and its dependencies. I hope that's OK with you.
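For readers who hit the same errors, the two API changes mentioned above have direct replacements in current versions. This is a small sketch on a made-up DataFrame, assuming pandas 0.24 or later (where DataFrame.to_numpy is available); it is not taken from the fork.

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for one of the processed datasets.
df = pd.DataFrame({'a': [1.0, 2.0, 2.0], 'b': [0.5, 0.5, 1.5]})

# DataFrame.to_numpy() replaces the removed DataFrame.as_matrix().
values = df.to_numpy()

# Counter.items() replaces the Python 2 only Counter.iteritems().
counts = Counter(df['a'].tolist())
pairs = sorted(counts.items())
```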
The README links to the datasets at https://zenodo.org/record/1161203#.Wmtf_XVl8eN, but the link returns a 404.
Hi,
I am not sure I understand the log-likelihood expression for the Gaussian MADE:
https://github.com/gpapamak/maf/blob/master/ml/models/mades.py#L234
Is this correct?
I assumed that the log-likelihood would be that of a univariate Gaussian with mean mu and variance alpha. Is my understanding wrong?
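A quick numerical check can help with this kind of question. The sketch below assumes that logp denotes a log-precision (so the variance is exp(-logp)) and that u is defined as exp(0.5 * logp) * (x - m); both are assumptions about the referenced code, and the function names and numbers are made up. Under those assumptions, the expression -0.5 * (D * log(2 * pi) + sum(u ** 2 - logp)) equals the sum of univariate Gaussian log-densities.

```python
import math

def gaussian_logpdf(x, mean, var):
    # Log-density of a univariate Gaussian with the given mean and variance.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def made_style_loglik(x, m, logp):
    # Assumption: logp is a log-precision and u = exp(0.5 * logp) * (x - m).
    u = [math.exp(0.5 * lp) * (xi - mi) for xi, mi, lp in zip(x, m, logp)]
    total = sum(ui ** 2 - lp for ui, lp in zip(u, logp))
    return -0.5 * (len(x) * math.log(2 * math.pi) + total)

# Arbitrary toy values for a three-dimensional input.
x = [0.3, -1.2, 2.0]
m = [0.0, -1.0, 1.5]
logp = [0.5, -0.7, 1.1]

direct = sum(gaussian_logpdf(xi, mi, math.exp(-lp))
             for xi, mi, lp in zip(x, m, logp))
compact = made_style_loglik(x, m, logp)
```

If the network output were instead a log-variance or a log-standard-deviation, the signs and factors on the logp term would change accordingly.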
Trying to understand the code, I realized that it runs on Python 2, so I decided to try a mini migration. From what I've seen, there are serious compatibility problems with Python > 3.6. I've managed to get the code working on version 3.6.4 after a few minor changes. (The changes would need to be tested exhaustively.) It would be good if you could give me access to push my migration branch.
You can find my code in the fork on my GitHub.
I am trying to run your code on the POWER and GAS datasets.
The data I downloaded from the link are 'txt' files.
However, your code reads from a file called 'data.npy'.
def load_data():
    return np.load(datasets.root + 'power/data.npy')
Could you please provide the code to preprocess the data and generate npy files?
Thanks.
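Until the original preprocessing is available, a delimited text file can be converted to a .npy file along the following lines. This is only a generic sketch: the function name, the comma delimiter, the '?' marker for missing values, and the file names are assumptions, and it does not reproduce whatever column selection, normalization, or splitting the author applied.

```python
import os
import tempfile

import numpy as np

def text_to_npy(txt_path, npy_path, delimiter=','):
    """Parse a delimited numeric text file and store it as a .npy array."""
    data = np.genfromtxt(txt_path, delimiter=delimiter)
    # Entries that cannot be parsed as floats are read as NaN; drop those rows.
    data = data[~np.isnan(data).any(axis=1)]
    np.save(npy_path, data)
    return data

# Toy demonstration with a hypothetical three-column file.
tmp_dir = tempfile.mkdtemp()
txt_path = os.path.join(tmp_dir, 'data.txt')
npy_path = os.path.join(tmp_dir, 'data.npy')
with open(txt_path, 'w') as f:
    f.write('1.0,2.0,3.0\n4.0,?,6.0\n7.0,8.0,9.0\n')

text_to_npy(txt_path, npy_path)
loaded = np.load(npy_path)
```

The resulting file can then be read with np.load, as load_data does above.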
I have tried running your code but got the following error message (MNIST experiments):
theano.gof.fg.MissingInputError: A variable that is an input to the graph was neither provided as an input to the function nor given a value. A chain of variables leading from this input to an output is [x, dot.0, Elemwise{add,no_inplace}.0, Elemwise{add,no_inplace}.0, Elemwise{add,no_inplace}.0, h1, dot.0, Elemwise{add,no_inplace}.0, Elemwise{add,no_inplace}.0, h2, dot.0, logp, Elemwise{mul,no_inplace}.0, Elemwise{exp,no_inplace}.0, Elemwise{mul,no_inplace}.0, Sum{axis=[0], acc_dtype=float64}.0, mean]. This chain may not be unique
Backtrace when the variable is created:
  File "run_experiments.py", line 245, in <module>
    main()
  File "run_experiments.py", line 241, in main
    methods[name]()
  File "run_experiments.py", line 184, in run_experiments_mnist
    ex.train_maf_cond([n_hiddens]*2, act_fun, n_layers*i, mode)
  File "/u/home/maf/experiments.py", line 248, in train_maf_cond
    model = mafs.ConditionalMaskedAutoregressiveFlow(data.n_labels, data.n_dims, n_hiddens, act_fun, n_mades, mode=mode)
  File "/u/home/maf/ml/models/mafs.py", line 172, in __init__
    self.input = tt.matrix('x', dtype=dtype) if input is None else input
It looks like the model is not getting the data properly. Could this be caused by changes in the Theano version?