Comments (7)
Hi @savi8sant8s! I was able to verify that Ludwig 0.9.3 fixes things. I also made a few changes to your notebook that I believe are important in ensuring good learning/output. Here's the notebook: https://colab.research.google.com/drive/1QwojspiXKVULZ1xsuoUSWDonVS1Ig8JM?usp=sharing
The main thing you'll notice is that I added a code block to profile your data and figure out the distribution of the number of tokens in each of your columns. From this, I learned that the maximum combined length of your instruction, input, and output was 202 tokens. If we also add in the tokens for the prompt template, it's probably closer to 256 tokens. However, you had set global_max_sequence_length to 128 instead of 256, meaning the model would only learn from examples in your dataset where the number of tokens in your prompt + instruction + input was < 128, which wasn't always the case.
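The profiling step can be sketched roughly like this. The toy dataframe and the whitespace-based `approx_token_count` helper are stand-ins of my own (not from the notebook); for exact counts you'd tokenize with the model's actual tokenizer, e.g. `transformers.AutoTokenizer`:

```python
import pandas as pd

# Hypothetical toy rows standing in for the Portuguese correction dataset.
df = pd.DataFrame({
    "instruction": ["corrija o texto a seguir", "corrija a capitalização"],
    "input": ["ola mundo", "eu gosto de programar em python"],
    "output": ["Olá, mundo.", "Eu gosto de programar em Python."],
})

def approx_token_count(text: str) -> int:
    # Whitespace split is only a rough approximation; swap in the
    # base model's tokenizer for real sequence-length profiling.
    return len(text.split())

cols = ["instruction", "input", "output"]
token_counts = df[cols].apply(lambda col: col.map(approx_token_count))
per_example_total = token_counts.sum(axis=1)

print(token_counts.max().to_dict())  # longest value per column
print(per_example_total.max())       # longest combined example
```

The maximum of the combined per-example total (plus the prompt template's tokens) is what `global_max_sequence_length` needs to cover; anything longer gets truncated before the model sees it.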
The other thing I added was a new trainer parameter, enable_gradient_checkpointing: true, which helps reduce memory usage for longer sequences.
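For context, here is a minimal sketch of where these two settings sit in a Ludwig LLM config, passed as a dict. The base_model and feature names are placeholders I chose for illustration, not necessarily the notebook's exact values:

```python
# Sketch of the relevant pieces of a Ludwig LLM fine-tuning config.
config = {
    "model_type": "llm",
    "base_model": "meta-llama/Llama-2-7b-hf",  # placeholder base model
    "input_features": [{"name": "instruction", "type": "text"}],
    "output_features": [{"name": "output", "type": "text"}],
    "preprocessing": {
        # Raised from 128 so prompt + instruction + input fits the
        # longest examples in the dataset.
        "global_max_sequence_length": 256,
    },
    "trainer": {
        "type": "finetune",
        # Recomputes activations during the backward pass instead of
        # storing them, trading compute for lower memory usage.
        "enable_gradient_checkpointing": True,
    },
}

print(config["trainer"]["enable_gradient_checkpointing"])
```

This dict is what you'd hand to `LudwigModel(config=config)` before calling `train`.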
Let me know if the output predictions in this notebook match your expectations - it looks like the fix restored correct capitalization and eliminated the repetition you were seeing before.
from ludwig.
@savi8sant8s Ah I see, will let them know I responded here!
If this issue is resolved, is it okay if I mark it as closed?
Hi @savi8sant8s, thanks for reporting the issue and sorry you're running into it.
Are you able to share which version of Ludwig you were using before downgrading to Ludwig 0.8.6? We actually introduced some regressions in Ludwig 0.9.1 and 0.9.2 that were fixed in Ludwig 0.9.3, released in the last week, specifically related to fine-tuning outputs not looking as good as expected for a variety of models, including Llama, Mistral, Mixtral, and Phi.
If you can share your dataset, I'm happy to test it for you with the latest Ludwig version and see if I can reproduce the error and then look into a fix.
I was using version 0.9.1, @arnavgarg1.
Below are my notebook and prompts. I was fine-tuning Llama2-7b to create a text corrector for Portuguese.
project.zip
Thank you for reaching out.
Worked perfectly, @arnavgarg1. Thank you so much. The issue exists in ludwig-docs too; if you can resolve it there, its author will also learn that the new update solved the problem. Thanks again for the help.
@savi8sant8s I'm glad to hear that it worked perfectly!
Could you explain the issue in ludwig-docs that you're referring to? Based on what you said, my understanding is that there was no notice in the Ludwig docs explaining that this issue exists in Ludwig 0.9/0.9.1/0.9.2, that we were working on a fix, and that it is now fixed. Is that understanding correct?
@arnavgarg1 In fact, an issue was mistakenly created in ludwig-docs about this: ludwig-ai/ludwig-docs#337.