
Comments (3)

SaMnCo avatar SaMnCo commented on August 24, 2024

Quick additional questions in the same spirit:

  • There are many options in train.lua. Any advice on the BEST setting (assuming I have unlimited compute power, what would be the best settings?), and on how variables influence the quality of the training?
  • What is the recommended way to minimize the size of the model while keeping an acceptable performance?
  • Does the size of images impact the model?
  • I see a "-start_from" option, which makes me think I can improve models and/or build the model iteratively. If I split my training set into subsets and train on them separately, can I aggregate the results somehow? (Note that this would clearly indicate it's possible to scale out.) What would be the potential downsides of this approach?

Many thanks,

from neuraltalk2.

dazoulay avatar dazoulay commented on August 24, 2024

Hi SaMnCo, did you figure any of these questions out? Your input would be greatly appreciated. Thank you.


SaMnCo avatar SaMnCo commented on August 24, 2024

Hi @dazoulay, sorry for the slow reply, I've been OOO for a little while with poor net access. Anyway...

I didn't move a lot on these, but I have some new inputs:
For the training parameters, I see more and more people using a model to actually learn what the best settings would be. Imagine you orchestrate training with various settings, collect results at different points in time, compare them, then learn from that to adjust and converge towards the best settings. It's another layer of ML/DL on top. This seems to be a successful approach, but I didn't test it myself.
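To make the "try settings, compare, adjust, converge" loop a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: `train_and_score` is a hypothetical toy function standing in for a real training run, the two parameters (`learning_rate`, `rnn_size`) and their ranges are picked for the example, and the "learning" step is just narrowing the sampling range around the best result so far rather than a real model on top.

```python
import random


def train_and_score(learning_rate, rnn_size):
    """Hypothetical stand-in for a full training run.

    Returns a validation score (higher is better). This toy function
    peaks at learning_rate=4e-4 and rnn_size=512; a real run would
    train a model and evaluate it on held-out data instead.
    """
    lr_term = -((learning_rate - 4e-4) / 4e-4) ** 2
    size_term = -((rnn_size - 512) / 512) ** 2
    return lr_term + size_term


def search(rounds=5, trials_per_round=8, seed=0):
    """Random search that halves its sampling ranges after each round."""
    rng = random.Random(seed)
    lr_lo, lr_hi = 1e-5, 1e-2        # assumed starting ranges
    size_lo, size_hi = 64, 2048
    best = None                      # (score, learning_rate, rnn_size)
    history = []

    for _ in range(rounds):
        # Try a batch of settings drawn from the current ranges.
        for _ in range(trials_per_round):
            lr = rng.uniform(lr_lo, lr_hi)
            size = rng.randint(size_lo, size_hi)
            score = train_and_score(lr, size)
            history.append((score, lr, size))
            if best is None or score > best[0]:
                best = (score, lr, size)

        # "Learn" from the results: re-centre on the best setting so far
        # and shrink each range to half its previous width.
        _, best_lr, best_size = best
        lr_span = (lr_hi - lr_lo) / 4
        size_span = (size_hi - size_lo) // 4
        lr_lo, lr_hi = max(1e-6, best_lr - lr_span), best_lr + lr_span
        size_lo, size_hi = max(1, best_size - size_span), best_size + size_span

    return best, history


if __name__ == "__main__":
    best, history = search()
    print("trials run:", len(history))
    print("best (score, learning_rate, rnn_size):", best)
```

In a real setup each call to the scoring function would be a complete training job, so the number of rounds and trials is what you trade against compute.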

For the 4th item: essentially, the -start_from option lets you pass in an existing model as the starting point and keep improving it.
Regarding scale out, as far as I went, you can consider 2 types of scaling:

  1. Train several models in parallel, compare results, keep the best model: this amounts to data scaling, as the various models trained on different machines do not communicate
  2. Use a network of machines to train on the same set. AFAIK, the only frameworks allowing that are Tensorflow, DL4j and Caffe, all using Spark as the underlying engine to scale. The main drawback is that Spark is sort of a "star network", with a central orchestrator making many decisions. That means evaluation and communicating back to the orchestration node can (and will!) become the bottleneck. I submitted the idea of using SDNs to improve communication between nodes, which could help, but again it would be up to the orchestrator to "predict" the best network and set it up. Nevertheless this seems the most promising for now, until Google releases more of the scale out aspects of Tensorflow.
    Note: the bottleneck here is related to velocity. If you have all the time in the world, it will still fix the "size" issue and allow you to go beyond the size of the RAM of your video cards.
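
The first type of scaling above (independent runs, keep the best) can be sketched roughly as follows. This is an assumption-laden toy, not anything from neuraltalk2: `train_on_shard` and `validation_loss` are hypothetical helpers, the "model" is just the mean of a shard, and threads stand in for separate machines. The point is only that the runs never talk to each other and are compared once at the end on a shared validation set.

```python
from concurrent.futures import ThreadPoolExecutor


def train_on_shard(shard):
    """Hypothetical stand-in for one independent training run.

    The "model" here is simply the mean of the shard; a real run would
    train a network on that subset of the data.
    """
    return sum(shard) / len(shard)


def validation_loss(model, validation_set):
    """Mean squared error of the constant prediction `model`."""
    return sum((x - model) ** 2 for x in validation_set) / len(validation_set)


def train_independently(shards, validation_set):
    """Train one model per shard in parallel and keep the best one.

    The workers never communicate; the only shared step is the final
    comparison on the validation set.
    """
    with ThreadPoolExecutor() as pool:
        models = list(pool.map(train_on_shard, shards))
    scored = [(validation_loss(m, validation_set), m) for m in models]
    return min(scored)  # (lowest loss, corresponding model)


if __name__ == "__main__":
    shards = [[1, 2, 3], [10, 11, 12], [4, 5, 6]]
    validation_set = [4, 5, 6, 5]
    loss, model = train_independently(shards, validation_set)
    print("best model:", model, "validation loss:", loss)
```

Note that the other models are simply discarded here, so each model only ever sees its own subset of the data.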

I am involved in several DL projects ATM, but moving away from Torch. I may get more info in the upcoming weeks, but won't necessarily update here. Check out my account for DL projects.

