Comments (9)
Yes, it saves the best weights if the performance on validation improves. I can think about adding a --skip_save_best_model parameter, but then the model would never be saved. There could be some use cases where that could be useful (hyperparameter search, for instance). I may also try to find a way to keep the best model weights in memory and save them to disk only at the end. Will add this to the list of enhancements.
from ludwig.
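The "keep the best model weights in memory and save only at the end" idea can be sketched roughly like this. This is a hypothetical illustration, not Ludwig's actual implementation; the `BestWeightsKeeper` class and its method names are made up for the example.

```python
import copy

class BestWeightsKeeper:
    """Hypothetical helper: hold the best weights in memory during training
    and write them to disk once at the end, instead of saving on every
    validation improvement."""

    def __init__(self):
        self.best_score = float("inf")
        self.best_weights = None

    def update(self, validation_loss, weights):
        # Copy weights into memory only when validation improves.
        if validation_loss < self.best_score:
            self.best_score = validation_loss
            self.best_weights = copy.deepcopy(weights)

    def final_weights(self):
        # Caller serializes this once, after the training loop finishes.
        return self.best_weights

# Toy training loop: (validation_loss, weights) per epoch.
keeper = BestWeightsKeeper()
for epoch_loss, epoch_weights in [(0.9, {"w": 1}), (0.5, {"w": 2}), (0.7, {"w": 3})]:
    keeper.update(epoch_loss, epoch_weights)
```

The trade-off is memory (a full extra copy of the weights) against disk I/O per epoch, which is exactly why it helps most on small models trained for many epochs.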
Ludwig saves both the latest weights and the best weights. You can turn off saving the latest weights with skip_save_progress_weights, but you can't turn off saving the best weights, because, well, that's the whole purpose of training. Why would you not want to save the best weights?
I meant saving the best weights to disk. Isn't that what it does? @w4nderlust
Thanks, this will improve the performance for small training problems. Also, I've sent you a Jupyter notebook and test data set for a regression problem showing the performance issue. What matters more is that the regression results are so different between Ludwig and Keras with the same parameter settings, and the Keras results make more sense. If you want me to open a new issue to describe it, I can. But I suspect it could be due to the way biases are initialized. @w4nderlust
Those are two separate issues (speed and predictions), but will look at both of them, no need to open another issue. Thanks so much for the effort.
Thanks a lot. Just a bit more information: I have also tried to fake a linear regression problem with Y = a * X + b + c * normal(0, 1) and train a single-neuron, single-layer network on it. The results are very sensitive to how I initialize the network.
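The synthetic setup described above can be reproduced in a few lines of plain NumPy gradient descent (a sketch of the experiment, not the notebook that was actually sent; the constants `a`, `b`, `c` and the learning-rate/epoch values are arbitrary choices for illustration). Since this loss is convex, different initializations should reach the same solution; large divergence between runs would point at the initialization or update code rather than the problem itself.

```python
import numpy as np

# Synthetic data: y = a*x + b + c*N(0, 1), as described above.
rng = np.random.default_rng(0)
a, b, c = 3.0, -1.0, 0.1
x = rng.uniform(-1, 1, size=1000)
y = a * x + b + c * rng.normal(size=x.shape)

def fit(w0, b0, lr=0.3, epochs=500):
    """Fit a single linear neuron with plain gradient descent on MSE,
    starting from the given initial weight and bias."""
    w, bias = w0, b0
    for _ in range(epochs):
        err = w * x + bias - y
        w -= lr * np.mean(err * x)   # dMSE/dw (up to a constant factor)
        bias -= lr * np.mean(err)    # dMSE/db
    return w, bias

print(fit(0.0, 0.0))
print(fit(5.0, 5.0))
```

With enough epochs both initializations should land on essentially the same estimates, close to (a, b) = (3.0, -1.0) up to sampling noise.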
So with the pushed fix the speed issue is in large part solved. There are now parameters to turn off saving of intermediate models, progress and logs. On small models the speed is now comparable to native TF (still slightly slower because of keeping track of statistics and generating placeholders, but pretty close), while for bigger models, bigger datasets and bigger batch sizes the difference is negligible.
Still working on the prediction issue.
The regression instability issue should now be solved. Please reopen if it is still an issue.