
Comments (7)

arnavgarg1 commented on June 2, 2024

Hi @savi8sant8s! I was able to verify that Ludwig 0.9.3 fixes the problem. I also made a few changes to your notebook that I believe are important for good learning and output quality. Here's the notebook: https://colab.research.google.com/drive/1QwojspiXKVULZ1xsuoUSWDonVS1Ig8JM?usp=sharing

The main thing you'll notice is that I added a code block to profile your data and find the distribution of the number of tokens in each of your columns. From this, I learned that the maximum combined length of your instruction, input and output was 202 tokens. Adding the tokens for the prompt template brings it closer to 256 tokens. However, you had set global_max_sequence_length to 128 instead of 256, meaning that the model would only learn from examples in your dataset where the number of tokens in your prompt + instruction + input was under 128, which wasn't always the case.
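The profiling step described above can be sketched roughly as follows. This is not the code from the linked notebook; it is a minimal illustration under stated assumptions: whitespace splitting is a hypothetical stand-in for the Llama 2 tokenizer (a real profile should use the base model's own tokenizer, which produces more tokens), and the helper names and toy rows are made up. It totals the tokens per example across the text columns, counts how many examples exceed a given limit, and rounds a required length up to a power of two.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical row type: one training example keyed by column name.
Row = Dict[str, Optional[str]]
Tokenizer = Callable[[str], List[str]]


def whitespace_tokenize(text: str) -> List[str]:
    # Assumed stand-in for the base model's tokenizer; subword tokenizers such
    # as Llama 2's usually produce more tokens than whitespace splitting.
    return text.split()


def row_token_count(row: Row, columns: Sequence[str],
                    tokenize: Tokenizer = whitespace_tokenize) -> int:
    # Total tokens across the selected text columns of a single example.
    return sum(len(tokenize(row.get(col) or "")) for col in columns)


def profile_lengths(rows: Sequence[Row], columns: Sequence[str], limit: int,
                    tokenize: Tokenizer = whitespace_tokenize) -> Dict[str, int]:
    # Longest example, and how many examples have a total above `limit`.
    counts = [row_token_count(r, columns, tokenize) for r in rows]
    return {
        "max_tokens": max(counts, default=0),
        "rows_over_limit": sum(1 for c in counts if c > limit),
    }


def next_power_of_two(n: int) -> int:
    # Smallest power of two that is >= n (returns 1 for n <= 1).
    length = 1
    while length < n:
        length *= 2
    return length


if __name__ == "__main__":
    # Toy rows, made up for illustration only.
    rows = [
        {"instruction": "Fix the text", "input": "i went home", "output": "I went home."},
        {"instruction": "Fix the text", "input": "", "output": None},
    ]
    cols = ["instruction", "input", "output"]
    print(profile_lengths(rows, cols, limit=5))  # {'max_tokens': 9, 'rows_over_limit': 1}
    # 202 tokens of data (from the comment) plus an assumed ~50-token prompt.
    print(next_power_of_two(202 + 50))  # 256
```

Using the figures from the comment, 202 tokens of data plus an assumed prompt overhead of about 50 tokens gives 252, which rounds up to 256; a limit of 128 falls well short of that.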

The other thing I added was a new trainer parameter, enable_gradient_checkpointing: true, which helps reduce memory usage for longer sequences.
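For orientation, here is a sketch of where the two settings discussed above might sit in a Ludwig LLM fine-tuning config, written as a plain Python dict mirroring the YAML form. Apart from global_max_sequence_length and enable_gradient_checkpointing, every key and value here (the feature names, the base model ID, the trainer type) is an assumption for illustration; check the Ludwig 0.9 configuration docs before relying on it.

```python
# Sketch of a Ludwig LLM fine-tuning config as a plain dict (mirrors the YAML form).
# Assumptions: the feature names "prompt"/"output", the base model ID and the
# trainer type are illustrative only; they are not taken from the notebook.
config = {
    "model_type": "llm",
    "base_model": "meta-llama/Llama-2-7b-hf",
    "input_features": [{"name": "prompt", "type": "text"}],
    "output_features": [{"name": "output", "type": "text"}],
    "preprocessing": {
        # Raised from 128 so prompt + instruction + input + output (~256 tokens) fit.
        "global_max_sequence_length": 256,
    },
    "trainer": {
        "type": "finetune",
        # Recompute activations during the backward pass to cut memory use
        # on longer sequences, at the cost of extra compute.
        "enable_gradient_checkpointing": True,
    },
}

# With Ludwig installed, this could be passed to the Python API, e.g.:
# from ludwig.api import LudwigModel
# model = LudwigModel(config=config)
```

Gradient checkpointing trades compute for memory, which is why it pairs naturally with raising the sequence length from 128 to 256.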

Let me know if the output predictions in this notebook match your expectations; it seems to correctly fix the capitalization and no longer produces the repetition you were seeing before.


arnavgarg1 commented on June 2, 2024

@savi8sant8s Ah, I see. I'll let them know I responded here!

If this issue is resolved, is it okay if I mark it as closed?


arnavgarg1 commented on June 2, 2024

Hi @savi8sant8s, thanks for reporting the issue, and sorry you're running into it.

Are you able to share which version of Ludwig you were using before downgrading to Ludwig 0.8.6? We introduced some regressions in Ludwig 0.9.1 and 0.9.2 that were fixed in Ludwig 0.9.3, released in the last week. They specifically affected fine-tuning outputs, which did not look as good as expected for a variety of models, including Llama, Mistral, Mixtral and Phi.
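A quick way to check whether an environment already has the release mentioned here is sketched below, assuming Python 3.8+ and that Ludwig is installed under the distribution name "ludwig". The helper names are hypothetical and the version parsing is deliberately simplified; the packaging library's version handling would treat pre-release and local tags more rigorously.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Release that, per this thread, contains the fine-tuning regression fix.
FIXED_IN: Tuple[int, ...] = (0, 9, 3)


def parse_version(v: str) -> Tuple[int, ...]:
    # Keep the leading numeric part of up to three components:
    # "0.9.3" -> (0, 9, 3), "0.10.0rc1" -> (0, 10, 0).
    parts = []
    for piece in v.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_at_least(v: str, minimum: Tuple[int, ...] = FIXED_IN) -> bool:
    # Tuple comparison: (0, 9, 2) < (0, 9, 3) < (0, 10, 0).
    return parse_version(v) >= minimum


def installed_ludwig_version() -> Optional[str]:
    # Version string of the installed "ludwig" distribution, or None if absent.
    try:
        return version("ludwig")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_ludwig_version()
    if v is None:
        print("ludwig is not installed")
    elif is_at_least(v):
        print(f"ludwig {v} includes the 0.9.3 fix")
    else:
        print(f"ludwig {v} predates 0.9.3; consider upgrading")
```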

If you can share your dataset, I'm happy to test it for you with the latest Ludwig version and see if I can reproduce the error and then look into a fix.


savi8sant8s commented on June 2, 2024

I was using version 0.9.1 @arnavgarg1.
Below are my notebook and prompts. I was fine-tuning Llama-2-7b to create a text corrector for Portuguese.
project.zip
Thank you for reaching out.


savi8sant8s commented on June 2, 2024

It worked perfectly, @arnavgarg1. Thank you so much. The issue was also reported in ludwig-docs. If you can resolve it there too, its author will also know that the new update solved the problem. Thanks again for the help.


arnavgarg1 commented on June 2, 2024

@savi8sant8s I'm glad to hear that it worked perfectly!

Could you explain the issue in ludwig-docs that you are referring to? Based on what you said, my understanding is that there was no notice in the Ludwig docs explaining that this issue exists in Ludwig 0.9/0.9.1/0.9.2, that we were working on a fix, and that it is now fixed. Is that right?


savi8sant8s commented on June 2, 2024

@arnavgarg1 Actually, an issue about this was mistakenly created in ludwig-docs: ludwig-ai/ludwig-docs#337.
