
Comments (5)

spirosChv avatar spirosChv commented on May 18, 2024

> Hi @spirosChv, thanks for your comment, and @GaganaB for the great insight and explanation. I think I understand the train vs. test case, although in practice I've previously only seen both being shuffled. Is there any advantage to that, @GaganaB?

In practice, there is no advantage to shuffling the test set. During the test phase, the inputs pass through a static network, so the order does not matter. During training, however, you want to shuffle to get a less biased estimate of the gradient. Imagine that you have collected some images where the first half is clean and the second half is blurry. If you do not shuffle, each batch contains only one type of image, which biases every gradient update.
On the other hand, when you test, nothing is being learned, so the order plays no role. In the end, you report a loss and an accuracy score as an average across all testing inputs, and an average does not depend on the order of its terms.
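To see why order cannot affect the reported test metric, here is a minimal sketch with made-up per-example loss values (the numbers and names are illustrative only):

```python
import random

# Hypothetical per-example test losses.
losses = [0.9, 0.1, 0.4, 0.7, 0.2]

mean_in_order = sum(losses) / len(losses)

shuffled = losses[:]
random.shuffle(shuffled)
mean_shuffled = sum(shuffled) / len(shuffled)

# The reported average is the same regardless of evaluation order.
assert abs(mean_in_order - mean_shuffled) < 1e-12
```

The same argument applies to accuracy: it is a count of correct predictions divided by the total, and neither quantity depends on ordering.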

PS. I suggest continuing this conversation on Discord to increase visibility for other TAs/students/etc.

Thank you.

from course-content-dl.

spirosChv avatar spirosChv commented on May 18, 2024

@wizofe, thank you for contributing to our repo. Although I do not remember by heart where this is used, shuffle is set to False during the test phase because the order of the images does not matter there. During training, however, the order matters because we split the dataset into batches. Does this make sense?

The drop_last argument is unnecessary during the test phase, as we do not care if one batch is smaller. During training, however, we may want to drop the last batch if it is smaller (although I am not quite sure it makes much difference).
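As a rough sketch of the arithmetic behind drop_last (the helper name is hypothetical, not part of any library):

```python
def num_batches(n_examples, batch_size, drop_last):
    """How many batches a loader yields for a dataset of n_examples."""
    full = n_examples // batch_size  # number of complete batches
    if drop_last or n_examples % batch_size == 0:
        return full
    return full + 1  # keep the final, smaller batch

# 100 examples, batch size 32: three full batches of 32 plus a partial batch of 4.
assert num_batches(100, 32, drop_last=False) == 4
assert num_batches(100, 32, drop_last=True) == 3
```

With drop_last=True at test time you would silently skip those last 4 examples, which is exactly why it matters more there than during training.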


GaganaB avatar GaganaB commented on May 18, 2024

Hi @wizofe, I agree with Spiros here. I'll elaborate below just to clarify a few things (hopefully).

  • Shuffling: The test and train sets are generated by a probabilistic distribution over the entire data, called the data-generating process. This rests on the i.i.d. assumption, i.e., that examples are independent and identically distributed. We shuffle the data to overcome catastrophic forgetting and to ensure representative samples across the train/validation/test sets.
    Mathematically speaking: assume the network has P weights, collected in W; the loss L then defines a surface in a (P+1)-dimensional space. This arises from the fact that for any given weights W, the loss function can be evaluated on the data X, and that value becomes the elevation of the surface.
    But there is the problem of non-convexity: the surface described above has numerous local minima, so gradient descent algorithms are susceptible to getting "stuck" in one of them while a deeper/lower/better solution may lie nearby. This is likely to occur if X is unchanged over all training iterations, because the surface is fixed for a given X; all its features are static, including its various minima. Shuffling the training data changes which examples land in each batch, so the per-batch loss surface changes between iterations and gradient descent is less likely to get stuck. And since we follow the i.i.d. assumption anyway, shuffling the test set is not necessary.

  • Drop_Last: The drop_last parameter signals to the sampler to drop the tail of the data so that it is evenly divisible across batches (or, in distributed settings, across the number of replicas). Since we shuffle the training data anyway, we can afford to drop the last non-full batch: a different subset is dropped each epoch. We do not have the same luxury with the test set, where every example should contribute to the reported metrics.
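Putting both points together, here is a minimal pure-Python sketch of DataLoader-style batching over example indices (the batches helper is illustrative only, not the PyTorch API):

```python
import random

def batches(indices, batch_size, shuffle, drop_last, seed=None):
    """Minimal sketch of DataLoader-style batching over example indices."""
    idx = list(indices)
    if shuffle:
        random.Random(seed).shuffle(idx)  # a fresh permutation each epoch
    out = [idx[i:i + batch_size] for i in range(0, len(idx), batch_size)]
    if drop_last and out and len(out[-1]) < batch_size:
        out.pop()  # discard the final, smaller batch
    return out

# Typical settings: shuffle + drop_last for training, neither for testing.
train = batches(range(10), batch_size=4, shuffle=True, drop_last=True, seed=0)
test = batches(range(10), batch_size=4, shuffle=False, drop_last=False)

assert all(len(b) == 4 for b in train)   # training yields only full batches
assert test[0] == [0, 1, 2, 3]           # test order is preserved
assert len(test[-1]) == 2                # the partial test batch is kept
```

The real torch.utils.data.DataLoader does more (workers, collation, samplers), but the shuffle/drop_last semantics discussed above follow this pattern.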

I hope that helps. Feel free to comment below if we can be of further assistance. :)


wizofe avatar wizofe commented on May 18, 2024

Hi @spirosChv, thanks for your comment, and @GaganaB for the great insight and explanation. I think I understand the train vs. test case, although in practice I've previously only seen both being shuffled. Is there any advantage to that, @GaganaB?


wizofe avatar wizofe commented on May 18, 2024

Thank you both!

