Comments (9)

pnoga commented on June 13, 2024

For a big batch you need more RAM, so make sure you have enough memory available while your process runs.
Also, if you want to train a network, make sure your hyperparameters allow it to train fully.
We currently recommend a batch size in the range 32 - 128. Those are the configurations under our validation; others are experimental, and we plan to extend our support in the future.
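
As a rough illustration of why larger batches need more memory, here is a minimal sketch that estimates the size of a single activation blob as a function of batch size. The function name `blob_bytes`, the example layer shape (64 x 112 x 112) and the 4-byte float assumption are hypothetical choices for illustration, not values taken from Caffe or from this thread. A real network holds many such blobs, and training typically keeps gradients as well, so treat the result as a lower bound for one layer only.

```python
def blob_bytes(batch_size, channels, height, width, bytes_per_value=4):
    """Estimate the bytes needed for one N x C x H x W blob of 4-byte floats."""
    return batch_size * channels * height * width * bytes_per_value


if __name__ == "__main__":
    # Hypothetical layer shape; the estimate grows linearly with batch size.
    for batch_size in (32, 128, 512):
        mib = blob_bytes(batch_size, 64, 112, 112) / 2**20
        print(f"batch {batch_size}: {mib:.0f} MiB for this one blob")
```

Because the estimate is linear in the batch size, going from 128 to 512 multiplies this per-blob footprint by four.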

from caffe.

mycarrar commented on June 13, 2024

Oh okay. I see that one of the main differences in using the Intel Phi over a GPU is that you have more memory and thus can achieve a larger batch size. I'll keep experimenting, but I think you are right. The job was being killed because it was consuming too much memory.

While I have you here: should hyperthreading speed up training? I have found that when I use hyperthreading and go from OMP_NUM_CORES=64 to OMP_NUM_CORES=256, Caffe actually slows down.

pnoga commented on June 13, 2024

Try to use HT and don't use OMP_NUM_CORES.
The HT solution is still being optimized (mostly for multinode). We are working to provide the best performance out of the box, without any additional commands or core restrictions. Those changes should be released in one month.

inJeans commented on June 13, 2024

Hmmm. Okay. If I don't use OMP_NUM_THREADS then my job runs serially. Is HT another environment variable? There is a flag to enable hyperthreading when submitting the job but it doesn't let you specify the number of threads to use. I was following the guidelines here https://github.com/intel/caffe/wiki/Recommendations-to-achieve-best-performance that is why I was using OMP_NUM_THREADS.

Is this the right place to get help with this sort of thing, or is there a forum somewhere else I should be using?

P.S. I was accidentally logged in as my friend earlier.

pnoga commented on June 13, 2024

HT is an environment variable, like the configuration of MCDRAM.
We are focusing on researching those variables so we can provide best-known configurations, and on developing new features that will use other configurations and give better performance, so our recommendations might change over time.
There is no forum. We use this GitHub repository for user support.

jdukat commented on June 13, 2024

Is there still anything to be discussed here? Can this issue be closed?

inJeans commented on June 13, 2024

Can I just confirm one thing before you do? What do you think may be causing the serial execution when I don't specify OMP_NUM_THREADS?

pnoga commented on June 13, 2024

Could you describe this issue? We are moving from one topic to another here. The best way would be to open a separate thread for each issue.

inJeans commented on June 13, 2024

Of course. Above you suggested "don't use OMP_NUM_CORES." So I tried running without setting OMP_NUM_THREADS and, as I said above, when I do so Caffe seems to run serially, even when I have set the HT option on.
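
One way to narrow down a report like this is to print what the job actually sees before Caffe starts. The following is a minimal sketch using only the Python standard library; the helper name `describe_threading` is hypothetical, and checking `KMP_AFFINITY` (an Intel OpenMP runtime variable) is an assumption about what might be relevant, not something confirmed in this thread. If `usable_cpus` comes back as 1, the scheduler may have pinned the job to a single CPU, which would look like serial execution regardless of the OpenMP settings.

```python
import os


def describe_threading(env=None):
    """Collect the thread-related settings visible to the current process."""
    env = os.environ if env is None else env
    logical = os.cpu_count() or 1
    try:
        # Linux only: the CPUs this process is actually allowed to run on.
        usable = len(os.sched_getaffinity(0))
    except AttributeError:
        usable = logical
    return {
        "OMP_NUM_THREADS": env.get("OMP_NUM_THREADS"),  # None means unset
        "KMP_AFFINITY": env.get("KMP_AFFINITY"),        # assumption: Intel OpenMP runtime
        "logical_cpus": logical,
        "usable_cpus": usable,
    }


if __name__ == "__main__":
    for name, value in describe_threading().items():
        print(f"{name} = {value}")
```

Running this inside the same job script, with and without OMP_NUM_THREADS set, shows whether the difference comes from the environment or from the CPUs the scheduler grants.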
