Running on Windows 10 appears to expose a type mismatch issue in model.py.


Expected object of type Variable[torch.cuda.LongTensor] in model.py about fastai HOT 10 CLOSED

BSalita commented on April 29, 2024

Expected object of type Variable[torch.cuda.LongTensor] in model.py


Comments (10)

BSalita commented on April 29, 2024

I've done some research. In Python 3, int has arbitrary precision (it is unbounded in magnitude) on all platforms. NumPy has its own set of numeric types that correspond to C types. Most have fixed sizes, but a few vary in size by implementation (e.g. np.intc). The size of each Tensor type is the same on all platforms.
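A quick way to see the distinction on any machine is to ask NumPy for each type's item size. This is a minimal sketch: only the fixed-width types have guaranteed sizes, while the size of `np.intc` is whatever the C compiler's `int` is on that platform.

```python
import numpy as np

# Fixed-width NumPy types have the same size on every platform.
fixed = {name: np.dtype(t).itemsize for name, t in [("int32", np.int32), ("int64", np.int64)]}

# np.intc mirrors the C compiler's `int`, so its size is platform-dependent
# (commonly 4 bytes, but not guaranteed).
intc_size = np.dtype(np.intc).itemsize

# Python's built-in int has arbitrary precision: it does not overflow at 64 bits.
big = 2 ** 100

print("fixed-width sizes:", fixed)
print("np.intc size:", intc_size)
print("2**100 as a Python int:", big)
```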

As you've stated, the issue is that the dataset returns a NumPy data type (np.intc?) whose size is implementation-dependent. The size can differ with machine architecture, OS, C compiler, and other factors, so you can't make any assumptions about the size of an np.intc; it could even differ on the same system with the same C compiler. Using np.int32 would at least make processing consistent across all platforms: every platform would raise the error, because a LongTensor is expected. LongTensors are always 64-bit.
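The claim that a LongTensor is always 64-bit can be checked directly. The sketch below uses the current PyTorch API (`torch.from_numpy`, `Tensor.dtype`, `Tensor.element_size()`), in which `Variable` has since been merged into `Tensor`. It shows that the tensor dtype simply follows the NumPy dtype, so an `np.int32` label array becomes a 32-bit integer tensor rather than a LongTensor.

```python
import numpy as np
import torch

labels32 = np.array([0, 1, 2], dtype=np.int32)
labels64 = np.array([0, 1, 2], dtype=np.int64)

# torch.from_numpy keeps the NumPy dtype; it does not widen 32-bit ints.
t32 = torch.from_numpy(labels32)
t64 = torch.from_numpy(labels64)

# 32-bit integers: not what an op expecting a LongTensor will accept.
print(t32.dtype, t32.element_size())
# torch.int64, also spelled torch.long: the LongTensor element type.
print(t64.dtype, t64.element_size())
```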

There are at least four possible solutions:

1. Have the dataset return np.int64.
2. Wherever a LongTensor is expected, such as in model.py, force the type to 64-bit. Note that variable.int64() isn't implemented. variable.long() is implemented (it works on both CPU and GPU), but I'm unsure whether it guarantees 64 bits. I'll post this question on Stack Overflow.
3. Change the type earlier in the call sequence. This would help a static type checker, and it would be more efficient if the variable undergoes multiple type changes.
4. Change the type later in the call sequence, at the point where C is called: change calls to torch._C.* (e.g. torch._C._nn.nll_loss) to coerce to the required C data type.
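The first option could look roughly like this. `LabelDataset` is a hypothetical stand-in for whatever dataset class produces the labels (it is not fastai's API); the point is only that the labels are cast to `np.int64` once, so downstream code receives 64-bit integers regardless of the platform's default integer size.

```python
import numpy as np


class LabelDataset:
    """Hypothetical dataset that always yields 64-bit integer labels."""

    def __init__(self, x, y):
        self.x = np.asarray(x, dtype=np.float32)
        # Cast once, up front: astype(np.int64) makes the width explicit
        # instead of relying on the platform's default integer type.
        self.y = np.asarray(y).astype(np.int64)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, i):
        return self.x[i], self.y[i]


# Labels arriving as 32-bit ints (as can happen on Windows) come out as int64.
ds = LabelDataset([[0.1, 0.2], [0.3, 0.4]], np.array([1, 0], dtype=np.int32))
x0, y0 = ds[0]
print(type(y0), ds.y.dtype)
```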

I'll continue working on this issue over the next few days.

http://pytorch-zh.readthedocs.io/en/latest/tensors.html
https://docs.scipy.org/doc/numpy-1.12.0/user/basics.types.html


jph00 commented on April 29, 2024

I think the .long() solution is a good one; I can't see any reason it should cause problems on Linux or on CPU. I'll try it out.

BTW, I'm well aware of the status of PyTorch on Windows; the issue is whether I'm ready to support fastai on Windows :). I suspect I'll officially support it after 0.4 is out and Windows CI is in place for PyTorch, but where we have simple, clear solutions to problems in the meantime, I'll include them.


jph00 commented on April 29, 2024

The solution you proposed assumes we're on CUDA, which may not be the case. I'll see if I can think of something...
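One device-agnostic option is to convert with `.long()` (shorthand for `.to(torch.int64)`), which changes only the dtype and leaves the tensor on whatever device it already lives on, rather than naming a CUDA-specific type such as `torch.cuda.LongTensor`. A minimal sketch with the current PyTorch API; `as_long` is a hypothetical helper name.

```python
import torch


def as_long(t: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: return `t` as 64-bit integers on its current device."""
    # .long() is shorthand for .to(torch.int64); it does not move the tensor.
    return t.long()


y_cpu = torch.tensor([0, 2, 1], dtype=torch.int32)
out = as_long(y_cpu)
print(out.dtype, out.device)

if torch.cuda.is_available():
    y_gpu = y_cpu.cuda()
    # Same call, no CUDA-specific type: the result stays on the GPU.
    print(as_long(y_gpu).dtype, as_long(y_gpu).device)
```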


jph00 commented on April 29, 2024

I think the right fix is to have the dataset return the correct type (np.int32) in the first place. Closing this issue since Windows isn't something I'm ready to officially support just yet. But if you create a fix that works on Linux and Windows with and without CUDA then I'll certainly consider merging it.


BSalita commented on April 29, 2024

I'm wondering why the issue shows up at all. It seems like it should show up everywhere or nowhere. My guess is that the difference comes from some recent change to PyTorch that the Windows version hasn't caught up with yet.


jph00 commented on April 29, 2024

It's because on Windows the int sizes are different.
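Concretely, the integer type NumPy picks by default depends on the platform: an array built from plain Python ints uses `np.int_`, which in NumPy releases before 2.0 mapped to the C `long` and was therefore 32-bit on Windows but 64-bit on 64-bit Linux and macOS (NumPy 2.0 changed the Windows default to 64-bit). A small sketch that reports the default on the current machine and makes the width explicit:

```python
import numpy as np

# Whatever dtype NumPy chooses for plain Python ints on this machine/version.
default_dtype = np.array([1, 2, 3]).dtype

# Requesting int64 explicitly gives the same width everywhere.
explicit = np.array([1, 2, 3], dtype=np.int64)

print("default integer dtype:", default_dtype)
print("explicit dtype:", explicit.dtype)
```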


BSalita commented on April 29, 2024

The maintainer of PyTorch for Windows, peterjc123, says "The Windows PRs are actively merged. The official Windows CI is near to be setup, and the official package is planned for 0.4.0." in pytorch/pytorch#494 (comment).

PyTorch 0.3 + CUDA 9.0 for Windows is available and works for me. tensorflow-gpu 1.4 for Windows also works but requires CUDA 8.0; 9.0 is not yet supported. I'm running CUDA 8.0 and CUDA 9.0 side by side without issue.


davideboschetto commented on April 29, 2024

Just to let you know, I was playing with this on Win7 and ran into the same problem! Using y.long() fixes it for the moment.
I'm really hoping I'll be able to use Windows for Part2-2018!


nikos-h commented on April 29, 2024

@davideboschetto Where did you change to y.long()?

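    import torch
    from torch.autograd import Variable  # PyTorch 0.3-era API

    # md (model data) and model are assumed to be defined earlier in the notebook.
    cat, cont, y = next(iter(md.trn_dl))
    cat, cont, y = Variable(cat), Variable(cont), Variable(y).long()
    pred = model(cat, cont)
    for p, true in zip(pred.data.numpy(), torch.max(y, 1)[0].data):
        print('pred log probs: {} -- True: {}'.format(p, true))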

Then running lr_find() yields the error. y is already converted to a long, so I don't know where else to change it.

Running on Bash on Ubuntu on Windows 10.


JoshuaC3 commented on April 29, 2024

I had this issue on Linux; however, my install was a little unusual (I have an old GPU and my own build from source).

@nikos-h
I fixed it with

    if dim == 2:
        return torch._C._nn.nll_loss(
            input, Variable.long(target),  # changed: coerce target to a 64-bit LongTensor
            weight, size_average,
            ignore_index, reduce
        )

in place of the original:

    if dim == 2:
        return torch._C._nn.nll_loss(input, target, weight, size_average, ignore_index, reduce)

But the Python traceback should tell you exactly where this fails. You may not have dim == 2, for example.
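A less invasive variant of the same fix is to coerce the target in your own code rather than editing PyTorch's source. The sketch below uses the current functional API, `torch.nn.functional.nll_loss`; `nll_loss_long` is a hypothetical wrapper name, and it assumes `input` already holds log-probabilities (e.g. from `log_softmax`).

```python
import torch
import torch.nn.functional as F


def nll_loss_long(input: torch.Tensor, target: torch.Tensor, **kwargs) -> torch.Tensor:
    """Hypothetical wrapper: call F.nll_loss with the target coerced to int64."""
    # nll_loss expects class indices as a Long (int64) tensor; .long() fixes
    # 32-bit targets without touching the library's own nll_loss code.
    return F.nll_loss(input, target.long(), **kwargs)


log_probs = F.log_softmax(torch.tensor([[1.0, 2.0, 3.0], [1.0, 0.0, -1.0]]), dim=1)
# A 32-bit target: the kind of input that triggered the type error.
target = torch.tensor([2, 0], dtype=torch.int32)
loss = nll_loss_long(log_probs, target)
print(loss.item())
```

Wrapping the call keeps the coercion in one place in your own code, so it survives PyTorch upgrades that would overwrite a patched library file.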
