Comments (9)
I am not sure what you are trying to accomplish; I assume some sort of regression optimized with backprop, using the continuous entropy estimate as a cost function. If that is the case, I am not sure it is a theoretically sound approach. The Kozachenko-Leonenko estimator implemented by get_h
is non-differentiable, since it uses k-nearest-neighbour distances between data points to estimate the entropy. A small change in the data values can turn the k-th nearest neighbour of a data point into its (k+1)-th or (k-1)-th nearest neighbour, so the entropy estimate is not a smooth, differentiable function of the data.
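To see the discontinuity concretely, here is a toy sketch (my own illustration, not the library's implementation) of the k-th nearest-neighbour distance that the estimator is built on; nudging one point past the midpoint between its neighbours makes the distances jump:

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_distance(points, k):
    # distance from each point to its k-th nearest neighbour (excluding itself)
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)  # first hit is the point itself
    return dists[:, -1]

x = np.array([[0.0], [1.0], [2.1]])
print(knn_distance(x, k=1))   # [1.0, 1.0, 1.1]

# move the middle point: its nearest neighbour flips from x[0] to x[2],
# so the k-NN distances change discontinuously in the neighbour assignment
x2 = np.array([[0.0], [1.6], [2.1]])
print(knn_distance(x2, k=1))  # [1.6, 0.5, 0.5]
```

The estimate is piecewise-smooth between these neighbour-swap events, but the swaps themselves give it kinks, so backprop through it is not well defined.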
from entropy_estimators.
Yes, what I am trying to achieve is similar to what you have described. I had realised that the get_h
function was non-differentiable because of the k-nearest-neighbour distances, so I was wondering if you knew of a way to estimate those distances such that the result is differentiable. Are you saying this is theoretically impossible? That would be unfortunate, because this was the only accurate way of estimating the entropy that I could find, and I could convert the rest of the function to torch to make it differentiable.
Yeah, I don't think it's a valid approach as I can't see a way to approximate the estimator with a differentiable function.
There are, however, analytic solutions for the entropy of probability distributions from the exponential family. If your data points can be reasonably approximated with any of those distributions, then you will have yourself a differentiable function. I have implemented the multivariate normal case (get_h_mvn
, IIRC), but I am reasonably sure there are solutions for all distributions in that family. If you can't find the formulae online, you will probably find them in Cover & Thomas, Elements of Information Theory.
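For the Gaussian case, the closed form is H = ½·(d·log(2πe) + log det Σ). A minimal differentiable PyTorch sketch of that formula (my own illustration, not the package's get_h_mvn):

```python
import math
import torch

def mvn_entropy(x):
    # x: (n, d) tensor of samples; fit a Gaussian and return its
    # differential entropy H = 0.5 * (d * log(2*pi*e) + log det(cov))
    n, d = x.shape
    xc = x - x.mean(dim=0, keepdim=True)
    cov = xc.T @ xc / (n - 1)
    _, logdet = torch.linalg.slogdet(cov)  # log-determinant, computed in log space
    return 0.5 * (d * math.log(2 * math.pi * math.e) + logdet)

# sanity check: for a standard normal in d dimensions,
# the true entropy is d/2 * log(2*pi*e)
torch.manual_seed(0)
x = torch.randn(50000, 2)
h = mvn_entropy(x)
```

Because every operation here (mean, matmul, slogdet) is differentiable in torch, gradients flow back to x, which is exactly what the k-NN estimator cannot give you.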
All of that being said, if it's a regression problem, what is wrong with RMS as a cost function?
The data points could potentially be approximated with a multivariate normal, so I implemented a differentiable version of get_h_mvn
. However, the data points in question are close to zero, so the determinant of the covariance tends to 0, which means the log calculation returns nan.
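A common workaround for that underflow (a sketch under the assumption that the covariance is nearly singular, not something the library provides): compute the log-determinant with torch.linalg.slogdet instead of log(det(...)), and add a small diagonal jitter so the matrix stays positive definite:

```python
import math
import torch

def mvn_entropy_stable(x, jitter=1e-6):
    # Avoid log(det) -> log(0) -> nan by (a) adding a small diagonal jitter
    # so the covariance stays positive definite, and (b) using slogdet,
    # which returns the log-determinant without forming a tiny determinant first.
    n, d = x.shape
    xc = x - x.mean(dim=0, keepdim=True)
    cov = xc.T @ xc / (n - 1) + jitter * torch.eye(d)
    _, logdet = torch.linalg.slogdet(cov)
    return 0.5 * (d * math.log(2 * math.pi * math.e) + logdet)

# points tightly clustered near zero: a naive log(det(cov)) underflows,
# but the jittered slogdet stays finite (large and negative, as expected)
x = 1e-6 * torch.randn(100, 3)
h = mvn_entropy_stable(x)
```

The jitter value is a knob I picked for illustration; it biases the estimate, so it should be kept as small as the floating-point precision allows.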
It's not actually a regression problem, more of an unsupervised clustering problem, so RMS isn't appropriate.
Entropy is a measure of dispersion, but note that for a continuous distribution the differential entropy diverges to minus infinity as the spread vanishes (the log-determinant term blows up), which is exactly why you are seeing nans.
How do you have floating-point targets if it is a clustering problem? Also, how is the algorithm unsupervised if you are using backprop to train it? If you want any more help, you will have to explain the problem in more detail.
So my starting point was this paper: 'Unsupervised Deep Embedding for Clustering Analysis', but it relies on initial high-confidence targets obtained with k-means. My problem is that the cluster centres my model learns do not disperse enough, hence the original paper's reliance on those high-confidence targets. My idea was to add a term that maximizes the entropy across the centroids (i.e. minimizes its inverse) alongside the clustering losses, to force the centroids to disperse.
Is this enough of an explanation? Thank you for the advice.
Yeah, that paper and your sketch of the idea help a lot. The fix is not a bad one, but I don't understand why the problem occurs in the first place. Specifically, I am a bit confused about why there is no dispersion in the initialized cluster centers: it suggests that the representations in the last layer of the autoencoder are not very distinct. Have you checked that the autoencoder works well? If it does, maybe you need to reduce the number of units in the last layer to force each unit to have a wider range of activities.
Also, enforcing sparsity in the output of the last layer should help.
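To illustrate that suggestion (the penalty form and weight here are my own choices, not something prescribed in the thread), an L1 penalty on the last-layer activations is the usual way to encourage sparsity:

```python
import torch

def l1_sparsity_penalty(embeddings, weight=1e-3):
    # embeddings: (batch, d) activations of the last encoder layer;
    # penalising the mean absolute activation pushes most units towards zero
    return weight * embeddings.abs().mean()

z = torch.randn(32, 16, requires_grad=True)
loss = l1_sparsity_penalty(z)
loss.backward()  # add this term to the reconstruction loss during training
```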
Personally, I would start troubleshooting there. If you think you have exhausted everything on that front, then maximising the entropy seems like a reasonable approach. However, I am not sure the estimators we have discussed so far are appropriate, as they all assume a large number of samples. If I understand correctly, you are only interested in the dispersion of the cluster centers, which will presumably be few (how many clusters do you have?). I would use a simpler measure of dispersion: maybe something like the square root of the sum of distances between cluster centers.
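A sketch of such a dispersion penalty (my own formulation of the suggestion, using the negative mean rather than the root of the sum of pairwise distances) that stays fully differentiable:

```python
import torch

def dispersion_penalty(centroids):
    # centroids: (k, d) tensor of cluster centres
    dists = torch.cdist(centroids, centroids)          # (k, k) pairwise distances
    k = centroids.shape[0]
    off_diag = dists[~torch.eye(k, dtype=torch.bool)]  # drop zero self-distances
    # minimising the negative mean distance pushes the centres apart
    return -off_diag.mean()

centroids = torch.randn(5, 8, requires_grad=True)
loss = dispersion_penalty(centroids)
loss.backward()  # gradients flow, unlike with the k-NN entropy estimate
```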
Thank you for the troubleshooting ideas; I had been following similar lines myself and am in the process of implementing a beta-VAE to get better representations. The sparsity suggestion is something I had not thought of either.
Yes, you understand perfectly: I am only interested in the dispersion of the cluster centres. I think using a simpler measure of dispersion is the correct solution here. I was going to use cosine similarity, but maybe Euclidean distance would be better suited. I will try these out, but I am confident this resolves the original issue.
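For what it's worth, a toy comparison of the two choices (my own sketch): a cosine-based measure only sees angular spread and is insensitive to scale, whereas the Euclidean measure also grows when the centres move apart in magnitude:

```python
import torch
import torch.nn.functional as F

def cosine_dispersion(centroids):
    # mean pairwise cosine similarity between centres (lower = more angular spread)
    c = F.normalize(centroids, dim=1)
    sim = c @ c.T
    k = centroids.shape[0]
    return sim[~torch.eye(k, dtype=torch.bool)].mean()

def euclidean_dispersion(centroids):
    # mean pairwise Euclidean distance between centres (higher = more spread)
    d = torch.cdist(centroids, centroids)
    k = centroids.shape[0]
    return d[~torch.eye(k, dtype=torch.bool)].mean()

c = torch.randn(4, 6)
# scaling all centres leaves the cosine measure unchanged,
# but scales the Euclidean measure proportionally
a, b = cosine_dispersion(c), cosine_dispersion(10 * c)
p, q = euclidean_dispersion(c), euclidean_dispersion(10 * c)
```

So if you want the centroids to actually spread out in the embedding space rather than just point in different directions, the Euclidean version seems the safer default.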
Thank you so much for your help; it has been very useful to discuss this with you!
Anytime, and good luck. Let me know if you get it to work!