Comments (3)
Quick additional questions in the same spirit:
- There are many options in train.lua. Any advice on the best settings (assuming I have unlimited compute power), and on how the variables influence the quality of the training?
- What is the recommended way to minimize the size of the model while keeping acceptable performance?
- Does the size of the input images impact the model?
- I see a "-start_from" option, which makes me think I can improve models and/or build them iteratively. If I split my training set into subsets and train on them separately, can I aggregate the results somehow? (Note this would clearly indicate it's possible to scale out.) What would be the potential downsides of this approach?
Many thanks,
from neuraltalk2.
Hi SamnCo, did you figure any of these questions out? Your input would be greatly appreciated. Thank you.
Hi @dazoulay, sorry for the delay in answering; I've been OOO for a little while with poor net access. Anyway...
I haven't moved much on these, but I have some new input:
For the training parameters, I see more and more people using a model to actually learn what the best settings would be. Imagine you orchestrate training runs with various settings, collect results at different points in time, compare them, then learn from that to adjust and converge towards the best settings. It's another layer of ML/DL on top. This seems to be a successful approach, but I didn't test it myself.
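To make that outer search loop concrete, here is a minimal random-search sketch in Python. Everything in it is illustrative: the search-space keys are hypothetical stand-ins for train.lua options, and `toy_validation_loss` is a placeholder for a real "train with these settings, then evaluate on held-out data" run.

```python
import random

# Hypothetical search space -- the real option names and ranges live in
# train.lua; these are illustrative only.
SPACE = {
    "learning_rate": [1e-4, 4e-4, 1e-3],
    "batch_size": [16, 32, 64],
    "rnn_size": [256, 512, 1024],
}

def sample_config(rng):
    """Draw one random configuration from the search space."""
    return {k: rng.choice(v) for k, v in SPACE.items()}

def toy_validation_loss(cfg):
    """Placeholder for 'train with cfg, then score on a validation set'."""
    return cfg["learning_rate"] * cfg["batch_size"] / cfg["rnn_size"]

def random_search(trials=10, seed=0):
    """Try several configurations and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_cfg, best_loss = None, float("inf")
    for _ in range(trials):
        cfg = sample_config(rng)
        loss = toy_validation_loss(cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss

best_cfg, best_loss = random_search(trials=10)
print(best_cfg, best_loss)
```

The "another layer of ML on top" idea replaces the random sampler with a model that proposes the next configuration based on the results collected so far, but the orchestrate/collect/compare skeleton stays the same.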
For the 4th item: essentially, -start_from lets you give an existing model as a starting point and improve it from there.
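As a toy sketch of that checkpoint-resume pattern (all names here are illustrative; the actual workflow would be repeated runs of train.lua passing the previous checkpoint via -start_from):

```python
import json, os, tempfile

def train_round(weights, data):
    """Toy 'training': nudge each weight toward the mean of this data chunk."""
    mean = sum(data) / len(data)
    return [0.9 * w + 0.1 * mean for w in weights]

def save_checkpoint(path, weights):
    with open(path, "w") as f:
        json.dump(weights, f)

def load_checkpoint(path):
    with open(path) as f:
        return json.load(f)

# Iterative improvement: each round loads the previous checkpoint, trains a
# bit more, and saves again -- the same shape as resuming with -start_from.
ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
save_checkpoint(ckpt, [0.0, 0.0])
for chunk in ([1.0, 1.0], [2.0, 2.0]):
    w = load_checkpoint(ckpt)
    w = train_round(w, chunk)
    save_checkpoint(ckpt, w)
print(load_checkpoint(ckpt))
```

Note the caveat for the "split the training set" idea: resuming is sequential refinement of one model, not a way to merge independently trained models.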
Regarding scale-out, as far as I've gone, you can consider 2 types of scaling:
- Train several models in parallel, compare results, keep the best model. This amounts to data scaling, as the models trained on different machines do not communicate.
- Use a network of machines to train on the same set. AFAIK, the only frameworks allowing that are TensorFlow, DL4J and Caffe, all using Spark as the underlying engine to scale. The main drawback is that Spark is sort of a "star network", with a central orchestrator making many decisions. That means evaluating and communicating back to the orchestration node can (and will!) become the bottleneck. I submitted the idea of using SDNs to improve communication between nodes, which could help, but again it would be up to the orchestrator to "predict" the best network and set it up. Nevertheless this seems the most promising approach for now, until Google releases more of the scale-out aspects of TensorFlow.
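The first flavor above (independent runs, keep the winner) can be sketched as follows. `train_one` is a toy stand-in for training a full model with one configuration on one machine; in this sketch threads model the parallel workers, whereas a real setup would use separate machines or processes.

```python
from concurrent.futures import ThreadPoolExecutor

def train_one(seed):
    """Toy stand-in for training one full model on one machine.
    Returns (validation_loss, seed) so the caller can pick a winner."""
    # Deterministic pseudo-loss derived from the seed; a real run would
    # train with these settings and score on a held-out set.
    return ((seed * 37) % 11) / 10.0, seed

def pick_best(seeds, workers=4):
    """Run the trainings in parallel and keep the lowest-loss model.
    The runs never communicate -- this is the 'data scaling' flavor."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(train_one, seeds))
    return min(results)

best_loss, best_seed = pick_best(range(8))
print("best seed:", best_seed, "loss:", best_loss)
```

Because the workers are fully independent, this approach scales trivially but never produces a model better than the best single run; the second flavor (machines cooperating on one model) is what actually needs the orchestrator.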
Note: the bottleneck here relates to velocity. If you have all the time in the world, this approach still fixes the "size" issue and lets you go beyond the RAM of your video cards.
I am involved in several DL projects at the moment, but I'm moving away from Torch. I may get more info in the upcoming weeks, but won't necessarily update here. Check out my account for DL projects.