Comments (10)

BSalita commented on April 29, 2024

I've done some research. In Python 3, int is unlimited [sic] in magnitude on all platforms. NumPy has a different set of numeric types which correspond to C sizes. Most are fixed in size, but a few differ in size by implementation (e.g. np.intc). The size of a Tensor type is the same on all platforms.

As you've stated, the issue is that the dataset returns a numpy data type (np.intc?) whose size is implementation dependent. The size can differ according to machine architecture, OS, C compiler, and other factors, so you can't make any assumptions about the size of an np.intc. The size could even differ on the same system with the same C compiler. Using np.int32 would make processing consistent across all platforms, but only in the sense that every platform would raise an error, because a LongTensor is expected. LongTensors are always 64 bits.
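
A quick way to check the sizes discussed above on a given machine is to print the item size of a few NumPy integer types. This is only a sketch: the choice of types to inspect is an assumption for illustration, and the thread does not confirm which type the dataset actually returns.

```python
import numpy as np

# Types whose width follows the platform's C types (may vary by platform)
platform_types = {"intc": np.intc, "int_": np.int_}
# Types whose width is fixed by definition
fixed_types = {"int32": np.int32, "int64": np.int64}

# Map each type name to its size in bytes on this machine
sizes = {name: np.dtype(t).itemsize
         for name, t in {**platform_types, **fixed_types}.items()}

for name, nbytes in sizes.items():
    print(f"np.{name}: {nbytes * 8} bits")
```

Only the np.int32 and np.int64 lines are guaranteed to print the same values everywhere; the other two depend on the platform.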

There are at least four possible solutions:

  1. Have the dataset return np.int64.
  2. For places where LongTensor is expected, such as in model.py, force the type to 64-bit. Note that variable.int64() isn't an implemented attribute. variable.long() is implemented (works for both CPU and GPU), but I'm unsure whether it guarantees 64-bit. I'll post this question on Stack Overflow.
  3. Change the type earlier in the call sequence. This would help a static type checker. It would be more efficient if the variable undergoes multiple type changes.
  4. Change the type later in the call sequence, at the point where C is called. Change calls to torch._C.* (e.g. torch._C._nn.nll_loss) to coerce to the required C data type.
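
Option 1 could look roughly like the sketch below: cast the labels to a fixed 64-bit type before they leave the dataset. The helper name `to_int64_labels` and the sample labels are hypothetical and are not part of fastai.

```python
import numpy as np

def to_int64_labels(labels):
    """Return labels as a NumPy array with a fixed 64-bit integer dtype."""
    arr = np.asarray(labels)
    if not np.issubdtype(arr.dtype, np.integer):
        raise TypeError(f"expected integer labels, got {arr.dtype}")
    # copy=False avoids a copy when the array is already int64
    return arr.astype(np.int64, copy=False)

# np.intc stands in for whatever platform-dependent type the dataset produces
raw = np.array([0, 2, 1], dtype=np.intc)
fixed = to_int64_labels(raw)
print(raw.dtype, "->", fixed.dtype)
```

An int64 array passed to torch.from_numpy yields a LongTensor, so no later cast should be needed if the dataset does this once.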

I'll continue working on this issue over the next few days.

http://pytorch-zh.readthedocs.io/en/latest/tensors.html
https://docs.scipy.org/doc/numpy-1.12.0/user/basics.types.html

from fastai.

jph00 commented on April 29, 2024

I think the .long() solution is a good one - I can't see any reason that this should cause problems on Linux or CPU. I'll try it out.

BTW, I'm well aware of the status of pytorch on Windows - the issue is whether I'm ready to support fastai on Windows :) . I suspect I'll endeavor to support it officially after 0.4 is out and Windows CI is done for pytorch, but where we have simple clear solutions to problems in the meantime I'll include them.

jph00 commented on April 29, 2024

The solution you proposed assumes we're on CUDA, which may not be the case. I'll see if I can think of something...

jph00 commented on April 29, 2024

I think the right fix is to have the dataset return the correct type (np.int64) in the first place. Closing this issue since Windows isn't something I'm ready to officially support just yet. But if you create a fix that works on Linux and Windows, with and without CUDA, then I'll certainly consider merging it.

BSalita commented on April 29, 2024

I'm wondering why the issue is showing up at all. It seems like it should show up everywhere or nowhere. I'm guessing the difference comes from some recent change to pytorch that has not yet reached the Windows version.

jph00 commented on April 29, 2024

It's because on Windows the int sizes are different.

BSalita commented on April 29, 2024

The maintainer of pytorch for Windows, peterjc123, says "The Windows PRs are actively merged. The official Windows CI is near to be setup, and the official package is planned for 0.4.0." at pytorch/pytorch#494 (comment)

pytorch 0.3 + CUDA 9.0 for Windows is available and works for me. tensorflow-gpu 1.4 for Windows also works but requires CUDA 8.0; 9.0 is not yet supported. I'm running CUDA 8.0 and CUDA 9.0 side by side without issue.

davideboschetto commented on April 29, 2024

Just to let you know, was playing with this on Win7 and ended up with the same problem! Going with y.long() fixes it for the moment.
I'm really hoping I'll be able to use Windows for Part2-2018!

nikos-h commented on April 29, 2024

@davideboschetto Where did you change to y.long()?

    cat, cont, y = next(iter(md.trn_dl))
    cat, cont, y = Variable(cat), Variable(cont), Variable(y).long()
    pred = model(cat, cont)
    for p, true in zip(pred.data.numpy(), torch.max(y, 1)[0].data):
        print('pred log probs: {} -- True: {}'.format(p, true))

Running lr_find() after this still yields the error. y is already cast to long, so I don't know where else to change it.

Running on Bash Ubuntu on Win 10.

JoshuaC3 commented on April 29, 2024

I had this issue on Linux; however, my install was a little unusual (I have an old GPU and built from source myself).

@nikos-h
I fixed it with

    if dim == 2:
        return torch._C._nn.nll_loss(
            input, target.long(),  # cast the target to a LongTensor
            weight, size_average,
            ignore_index, reduce
        )

in place of,

    if dim == 2:
        return torch._C._nn.nll_loss(input, target, weight, size_average, ignore_index, reduce)

The Python traceback should be clear enough to show you the exact place where this fails. You may not hit the dim == 2 branch, for example.
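
The same cast can also be made at the call site through the public API rather than by editing PyTorch's source. A minimal sketch, assuming a recent PyTorch release in which tensors no longer need to be wrapped in Variable; the tensor shapes and values are made up for illustration:

```python
import torch
import torch.nn.functional as F

# Made-up log-probabilities for a batch of 4 samples and 3 classes
log_probs = F.log_softmax(torch.randn(4, 3), dim=1)

# Targets stored as 32-bit ints, standing in for labels from a 32-bit source
target = torch.tensor([0, 2, 1, 1], dtype=torch.int32)

# .long() converts to torch.int64, the dtype used here for class indices
loss = F.nll_loss(log_probs, target.long())
print(target.long().dtype, loss.item())
```

In current PyTorch, torch.long is an alias for torch.int64, so .long() gives a 64-bit tensor on CPU and GPU alike.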
