
Comments (7)

NickNickGo avatar NickNickGo commented on May 18, 2024

Hi @sshleifer ,

Could you list the steps to reproduce this error? Also please provide environment info.

Thanks,

from fastseq.

sshleifer avatar sshleifer commented on May 18, 2024

Hard to explain the cluster setup, but we fixed it by running export TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0" before building the extension.
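A minimal sketch of that fix, assuming the build is driven from Python rather than from the shell. TORCH_CUDA_ARCH_LIST is the environment variable that PyTorch's extension builder reads; the helper name format_arch_list is hypothetical, and the capability values are the ones quoted in this comment.

```python
import os


def format_arch_list(capabilities):
    """Join (major, minor) compute capabilities into a TORCH_CUDA_ARCH_LIST value."""
    unique = sorted(set(capabilities))  # drop duplicates and sort ascending
    return ";".join(f"{major}.{minor}" for major, minor in unique)


# Compute capabilities of every GPU type the extension should run on
# (values taken from this thread; adjust for your own cluster).
target_capabilities = [(6, 0), (6, 1), (7, 0)]

# The variable must be set before the build starts, for example before
# running `python setup.py build_ext --inplace` in the same environment.
os.environ["TORCH_CUDA_ARCH_LIST"] = format_arch_list(target_capabilities)
print(os.environ["TORCH_CUDA_ARCH_LIST"])
```

Building for several architectures makes the compiled extension larger and the build slower, so it is usually worth listing only the GPU types actually present.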

Another question: is there an advantage to NGramRepeatBlock inheriting from nn.Module?


yuyan2do avatar yuyan2do commented on May 18, 2024

@sshleifer We are open to pulling some changes back into 'fairseq'.

I am trying to use your repeat ngram extension, but when I switch GPUs (without rebuilding the extension) it breaks with RuntimeError: CUDA error: no kernel image is available for execution on the device. If I rerun python setup.py build_ext --inplace, it works again. Any clues on how to build the extension so that it works on a different GPU (same CUDA version, same Python version, same torch) than the one where it was built?
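One way to reason about this error is to compare the GPU's compute capability with the architectures the extension was compiled for. The sketch below uses a simplified, assumed model of CUDA binary compatibility: a kernel compiled for X.Y runs on devices with the same major version and a minor version of at least Y, and a "+PTX" entry can be JIT-compiled for newer devices. The function names are hypothetical and only numeric entries are parsed. On a real machine the capability would come from torch.cuda.get_device_capability().

```python
def parse_arch_list(arch_list):
    """Parse a TORCH_CUDA_ARCH_LIST-style string into (major, minor, has_ptx) tuples.

    Only numeric entries such as "7.0" or "7.0+PTX" are handled here.
    """
    archs = []
    # Entries may be separated by semicolons or spaces.
    for entry in arch_list.replace(" ", ";").split(";"):
        if not entry:
            continue
        has_ptx = entry.endswith("+PTX")
        major, minor = entry.replace("+PTX", "").split(".")
        archs.append((int(major), int(minor), has_ptx))
    return archs


def is_device_covered(arch_list, capability):
    """Return True if a device with `capability` can run code built for `arch_list`."""
    dev_major, dev_minor = capability
    for major, minor, has_ptx in parse_arch_list(arch_list):
        # Compiled binary: same major version, device minor at least the built minor.
        if major == dev_major and minor <= dev_minor:
            return True
        # Embedded PTX: can be JIT-compiled for any equal or newer capability.
        if has_ptx and (major, minor) <= (dev_major, dev_minor):
            return True
    return False


# Built on a 6.x GPU only, then moved to a 7.0 GPU: not covered.
print(is_device_covered("6.0;6.1", (7, 0)))
# Built with the list from this thread: covered.
print(is_device_covered("6.0;6.1;7.0", (7, 0)))
```

If the check fails for the target GPU, rebuilding with that GPU's capability added to the list is the likely remedy.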

Also, we're considering pulling some of these changes back into fairseq, if that's alright with you guys!


sshleifer avatar sshleifer commented on May 18, 2024

Awesome! If you guys tell me your Twitter handles or some other link, I will make sure to credit you when I tweet. The speedup for ngram blocking is really impressive; it will get merged into fairseq/master soon.

I'm also trying to prioritize including the other changes:

  • MultiheadAttention: einsum
  • SequenceGenerator: parallel post-processing
  • BeamSearch: ?
  • TransformerEncoder, TransformerModel: delete reorder_encoder_out

Are the last two changes important? Do you guys have a sense of why?
Is the MultiheadAttention change just to save memory, or is it also faster?

Thanks and sorry for all the questions.


yuyan2do avatar yuyan2do commented on May 18, 2024

Thanks Sam, it would be great if you mentioned our project https://github.com/microsoft/fastseq and Twitter @fastseq.

  • MultiheadAttention: einsum combined with reorder_incremental_state is both faster and saves memory at the same batch size. Memory copies take a lot of time, especially when the input is long. We removed reorder_encoder_out because there is no need to duplicate the encoder output beam-size times. There is some analysis here and here.
  • SequenceGenerator: parallel post-processing.
  • BeamSearch: a compatibility change for fairseq v0.9.0 only; it just replaces torch.div with torch.floor_divide.
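The torch.div to torch.floor_divide swap in the BeamSearch item matters because beam search needs integer beam indices, while newer PyTorch releases make torch.div perform true division on integer tensors. As an illustration only, assuming candidates are picked from scores flattened over beam × vocabulary (a common beam search layout, not something this thread confirms), plain Python shows the difference. The function split_candidate is hypothetical.

```python
def split_candidate(flat_index, vocab_size):
    """Recover (beam index, token index) from an index into flattened beam x vocab scores."""
    beam_index = flat_index // vocab_size  # floor division keeps an integer beam id
    token_index = flat_index % vocab_size  # remainder is the token id within that beam
    return beam_index, token_index


vocab_size = 5
flat_index = 13

print(split_candidate(flat_index, vocab_size))  # (2, 3)
# True division yields a float, which cannot be used to index a beam.
print(flat_index / vocab_size)  # 2.6
```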


feihugis avatar feihugis commented on May 18, 2024

@sshleifer Thanks for your interest! I think this issue has been resolved. I will close it, but feel free to reopen it if you have more questions.


yuyan2do avatar yuyan2do commented on May 18, 2024

@sshleifer I saw that ngram blocking has been merged into fairseq/main. Did you get a chance to try the other changes? We now have papers (FastSeq and EL-Attention) that describe the changes. They may give a better sense of where the speedup comes from.

