
Comments (9)

paulbrodersen commented on June 7, 2024

I am not sure what you are trying to accomplish; I am assuming some sort of regression optimized with backprop, using the continuous entropy estimate as a cost function. If that is the case, I am not sure it is a theoretically sound approach. The Kozachenko-Leonenko estimator implemented by get_h is non-differentiable because it uses k-nearest-neighbour distances between data points to estimate the entropy: an arbitrarily small change in the data values can turn the k-th nearest neighbour of a data point into its (k+1)-th or (k-1)-th nearest neighbour, so the entropy estimate is not a smooth, differentiable function of the data.
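
For illustration, here is a textbook form of the estimator, which makes the problem visible: the k-th-neighbour distances come out of a hard ranking of the points. This is a minimal sketch using scipy, not necessarily identical to what get_h does internally:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(x: np.ndarray, k: int = 4) -> float:
    """Kozachenko-Leonenko estimate of differential entropy in nats.

    x has shape (n_samples, n_dims). The k-th nearest-neighbour
    distances enter through a hard ranking of the data points, which
    is why the estimate is not differentiable in the data.
    """
    n, d = x.shape
    # distance to the k-th nearest neighbour; k + 1 because the query
    # returns each point as its own nearest neighbour at distance zero
    eps = cKDTree(x).query(x, k=k + 1)[0][:, -1]
    # log-volume of the d-dimensional unit ball
    log_c_d = (d / 2.0) * np.log(np.pi) - gammaln(d / 2.0 + 1)
    return digamma(n) - digamma(k) + log_c_d + d * np.mean(np.log(eps))
```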


willleeney commented on June 7, 2024

Yes, what I am trying to achieve is similar to what you have described. I had realised that the get_h function is non-differentiable due to the k-nearest-neighbour distances, so I was wondering if you knew of a way to approximate those distances such that the estimator becomes differentiable. Are you saying that this is theoretically impossible? Unfortunately, this was the only accurate way of estimating the entropy that I could find, which is very annoying, because I could convert the rest of the function to torch to make it differentiable.


paulbrodersen commented on June 7, 2024

Yeah, I don't think it is a valid approach, as I can't see a way to approximate the estimator with a differentiable function.
There are, however, analytic solutions for the entropy of probability distributions from the exponential family. If your data points can be reasonably approximated by one of those distributions, then you have yourself a differentiable function. I have implemented the multivariate normal case (get_h_mvn, IIRC), and I am reasonably sure that there are solutions for all distributions in that family. If you can't find the formulae online, you will probably find them in Cover & Thomas, Elements of Information Theory.
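
For reference, the multivariate normal case has the closed form H = 0.5 * (d * log(2πe) + log det Σ), which is differentiable with respect to the data. A minimal torch sketch (illustrative only, not necessarily the same as the repository's get_h_mvn):

```python
import math
import torch

def mvn_entropy(x: torch.Tensor) -> torch.Tensor:
    """Entropy (in nats) of a multivariate normal fitted to x (n_samples x dim).

    Closed form: H = 0.5 * (d * log(2 * pi * e) + log det(cov)),
    which is differentiable with respect to x.
    """
    n, d = x.shape
    centered = x - x.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (n - 1)  # sample covariance matrix
    return 0.5 * (d * math.log(2.0 * math.pi * math.e) + torch.logdet(cov))
```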

All of that being said, if it's a regression problem, what is wrong with RMS as a cost function?


willleeney commented on June 7, 2024

The data points could potentially be approximated with a multivariate normal, so I implemented a differentiable version of get_h_mvn. However, the data points in question are close to zero, so the determinant of the covariance matrix underflows towards 0, which means the log-determinant evaluates to -inf or nan.
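
One possible workaround, assuming the nan really does come from the determinant underflowing: compute the log-determinant in log space with torch.linalg.slogdet instead of forming det() first, optionally regularising the diagonal. The jitter term eps below is an illustrative stabiliser, not part of get_h_mvn:

```python
import torch

def stable_logdet(cov: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Log-determinant that avoids underflow for near-degenerate data.

    Adding eps to the diagonal keeps the covariance positive definite;
    slogdet works in log space, so the determinant is never formed.
    """
    d = cov.shape[0]
    jitter = eps * torch.eye(d, dtype=cov.dtype, device=cov.device)
    sign, logabsdet = torch.linalg.slogdet(cov + jitter)
    return logabsdet
```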

It's not actually a regression problem, more of an unsupervised clustering problem, so RMS isn't appropriate.


paulbrodersen commented on June 7, 2024

Entropy is a measure of dispersion, so if there is (almost) no spread in the data, the differential entropy diverges towards minus infinity; that is exactly why the log-determinant blows up.

How do you have floating-point targets if it is a clustering problem? Also, how is the algorithm unsupervised if you are using backprop to train it? If you want any more help, you will have to explain the problem in more detail.


willleeney commented on June 7, 2024

So my starting point was the paper 'Unsupervised Deep Embedding for Clustering Analysis', but it relies on initial high-confidence targets obtained with k-means as a starting point. My problem is that the cluster centres my model learns do not disperse enough, hence the original paper's reliance on the high-confidence targets. My idea was to minimize the inverse of the entropy across the centroids, alongside the clustering objective, to force the centroids to disperse.

Is this enough information, or should I explain further? Thank you for the advice.


paulbrodersen commented on June 7, 2024

Yeah, that paper and your sketch of the idea help a lot. The fix is not a bad idea, but I don't understand why the problem occurs in the first place. Specifically, I am a bit confused as to why there is no dispersion in the initialized cluster centers; it suggests that the representations in the last layer of the autoencoder are not very distinct. Have you checked that the autoencoder works well? If it does, maybe you need to reduce the number of units in the last layer to force each unit to have a wider range of activities. Enforcing sparsity in the output of the last layer should also help, as sketched below.
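
For illustration, a minimal sketch of such a sparsity penalty; encoder, batch, and recon_loss in the usage note are placeholders for your own model, and lam would need tuning:

```python
import torch

def sparsity_penalty(codes: torch.Tensor, lam: float = 1e-3) -> torch.Tensor:
    """L1 penalty on the latent codes; drives most activations towards zero."""
    return lam * codes.abs().mean()

# usage inside a training step (encoder, batch, recon_loss are placeholders):
#     codes = encoder(batch)
#     loss = recon_loss + sparsity_penalty(codes)
```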

Personally, I would start troubleshooting there. However, if you think you have exhausted everything on that front, then maximising the entropy seems like a reasonable approach. That said, I am not sure that the estimators we have discussed so far are appropriate, as they all assume a large number of samples. If I understand correctly, you are only interested in the dispersion of the cluster centers, of which there will presumably be few (how many clusters do you have?). I would use a simpler measure of dispersion instead: maybe something like the square root of the sum of distances between cluster centers.
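
For concreteness, a minimal torch sketch of that measure as a loss term; dispersion_loss is an illustrative name, not part of entropy_estimators:

```python
import torch

def dispersion_loss(centroids: torch.Tensor) -> torch.Tensor:
    """Negative dispersion of the centroids (shape: n_clusters x dim).

    Minimising this term maximises the square root of the sum of
    pairwise Euclidean distances between the cluster centers.
    """
    pairwise = torch.pdist(centroids)  # all pairwise distances
    return -torch.sqrt(pairwise.sum())
```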


willleeney commented on June 7, 2024

Thank you for the troubleshooting ideas. I had been thinking along these lines myself, and I am in the process of implementing a beta-VAE to get better representations. The sparsity suggestion is something I had not thought of, though.

Yes, you understand perfectly: I am only interested in the dispersion of the cluster centres, as you say. I think that using a simpler measure of dispersion is the correct solution to the problem here. I was going to use cosine similarity, although perhaps Euclidean distance would be better suited. I will try both, but I am sure that this is the solution to the original issue.
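
For completeness, a sketch of the cosine-similarity variant (illustrative only): penalising the mean pairwise cosine similarity pushes the centroid directions apart, but unlike the Euclidean version it is blind to the centroids' magnitudes:

```python
import torch
import torch.nn.functional as F

def cosine_dispersion_loss(centroids: torch.Tensor) -> torch.Tensor:
    """Mean pairwise cosine similarity between centroids (n_clusters x dim)."""
    normed = F.normalize(centroids, dim=1)  # unit-length rows
    sims = normed @ normed.T                # cosine similarity matrix
    n = centroids.shape[0]
    off_diagonal = sims.sum() - sims.diagonal().sum()  # drop self-similarity
    return off_diagonal / (n * (n - 1))
```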

Thank you so much for your help; it has been very useful to discuss this with you!


paulbrodersen commented on June 7, 2024

Anytime, and good luck. Let me know if you get it to work!

