pytorch-best-practice's Issues

--max-epoch = 20 TypeError: 'str' object cannot be interpreted as an integer

$ CUDA_VISIBLE_DEVICES='2,3' python main.py train --train-data-root=data/train/ --lr=0.005 --batch-size=32 --model='ResNet34' --max-epoch = 20 --use-gpu --env=classifier

TypeError: 'str' object cannot be interpreted as an integer

user config:
env classifier
vis_port 8097
model ResNet34
train_data_root data/train/
test_data_root ./data/test1
load_model_path None
batch_size 32
use_gpu True
num_workers 4
print_freq 20
debug_file /tmp/debug
result_file result.csv
max_epoch =
lr 0.005
lr_decay 0.5
weight_decay 0.0
WARNING:root:Setting up a new session...
WARNING:visdom:Without the incoming socket you cannot receive events from the server or register event handlers to your Visdom client.
Traceback (most recent call last):
  File "main.py", line 168, in <module>
    fire.Fire()
  File "/home/deepliver4/.conda/envs/py36/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/deepliver4/.conda/envs/py36/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/deepliver4/.conda/envs/py36/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "main.py", line 79, in train
    for epoch in range(opt.max_epoch):
TypeError: 'str' object cannot be interpreted as an integer
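
The `user config` dump above prints `max_epoch =`, which suggests the value that reached the program was the literal string `=` rather than `20`. A minimal sketch of why, assuming the shell splits the command line on whitespace the way `shlex.split` does (how `fire` then pairs flags with values is inferred from the dump, not taken from its documentation):

```python
import shlex

# With spaces around '=', the shell hands over three separate tokens.
spaced = shlex.split("--max-epoch = 20")
# Without spaces, the flag and its value stay in one token.
joined = shlex.split("--max-epoch=20")

print(spaced)  # ['--max-epoch', '=', '20']
print(joined)  # ['--max-epoch=20']

# If the token after the flag ('=') is taken as the value, range() receives a
# str and fails with the same message as in the traceback.
try:
    range(spaced[1])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Under that assumption, writing `--max-epoch=20` without spaces should avoid the error.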

iteritems error

  File "main.py", line 171, in <module>
    fire.Fire()
  File "/home/thinkjoy/anaconda3/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/thinkjoy/anaconda3/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/thinkjoy/anaconda3/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "main.py", line 49, in train
    opt.parse(kwargs)
  File "/home/thinkjoy/PycharmProjects/pytorch-best-practice/config.py", line 30, in parse
    for k,v in kwargs.iteritems():
AttributeError: 'dict' object has no attribute 'iteritems'
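
`dict.iteritems()` exists only in Python 2; in Python 3 the equivalent is `dict.items()`. A minimal sketch of a `parse` method written that way, assuming a config class shaped like the one in the traceback (the class body and attribute names here are illustrative, not copied from the repository's `config.py`):

```python
import warnings


class DefaultConfig:
    # Illustrative defaults; the real config.py may define different ones.
    max_epoch = 10
    lr = 0.1

    def parse(self, kwargs):
        """Override attributes from a dict of keyword arguments."""
        # dict.items() works on both Python 2 and Python 3;
        # dict.iteritems() was removed in Python 3.
        for k, v in kwargs.items():
            if not hasattr(self, k):
                warnings.warn("opt has no attribute %s" % k)
            setattr(self, k, v)


opt = DefaultConfig()
opt.parse({"max_epoch": 20, "lr": 0.005})
print(opt.max_epoch, opt.lr)  # 20 0.005
```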

Error in `python': munmap_chunk(): invalid pointer: 0x0000000002a22030

The following appears while the program is running:
"please use transforms.Resize instead.")
/usr/local/lib/python2.7/dist-packages/torchvision/transforms/transforms.py:563: UserWarning: The use of the transforms.RandomSizedCrop transform is deprecated, please use transforms.RandomResizedCrop instead.
"please use transforms.RandomResizedCrop instead.")
1%| | 137/17500 [01:50<3:33:34, 1.35it/s]
1%| | 137/17500 [01:49<3:34:13, 1.35it/s]
1%| | 137/17500 [01:49<3:33:45, 1.35it/s]
1%| | 137/17500 [01:49<3:34:31, 1.35it/s]
1%| | 137/17500 [01:49<3:33:46, 1.35it/s]
1%| | 137/17500 [01:49<3:33:40, 1.35it/s]
1%| | 137/17500 [01:49<3:33:45, 1.35it/s]
1%| | 137/17500 [01:49<3:32:45, 1.36it/s]
1%| | 137/17500 [01:49<3:32:46, 1.36it/s]
1%| | 137/17500 [01:49<3:32:01, 1.36it/s]
*** Error in `python': munmap_chunk(): invalid pointer: 0x0000000002a22030 ***
======= Backtrace: =========
(a lot more output follows)
7f17a776c000-7f17a796b000 ---p 0021b000 08:06 92012725 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7f17a796b000-7f17a7987000 r--p 0021a000 08:06 92012725 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
Aborted (core dumped)
How can this problem be solved?

RuntimeError: cuDNN error: CUDNN_STATUS_ARCH_MISMATCH

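The error above appears when running python main.py train. The environment is Ubuntu 16.04 + CUDA 9.0 + cuDNN 7.0.5. After searching on Baidu, I found that the cause may be insufficient CUDA compute capability: cuDNN needs a compute capability of at least 3.0, but the compute capability of CUDA 9.0 is 2.1, which is not enough. However, many setup tutorials online use exactly Ubuntu 16.04 + CUDA 9.0 + cuDNN 7.0.5. Is this really a compute capability problem, or is it something else?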
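
Compute capability is a property of the GPU itself, not of the installed CUDA toolkit version. A small sketch of the comparison described above, assuming the 3.0 minimum quoted in the question; `meets_minimum` is a hypothetical helper, and in PyTorch the capability tuple for the current device can be read with `torch.cuda.get_device_capability()`:

```python
def meets_minimum(capability, minimum=(3, 0)):
    """Return True if a (major, minor) compute capability is at least `minimum`."""
    # Tuples compare element by element, so (2, 1) < (3, 0) < (6, 1).
    return tuple(capability) >= tuple(minimum)


# In PyTorch the tuple could come from torch.cuda.get_device_capability();
# the values below are hard-coded so the sketch runs without a GPU.
print(meets_minimum((2, 1)))  # False
print(meets_minimum((6, 1)))  # True
```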

NameError

python main.py train --data-root=./data/train --use-gpu=True --env=classifier

Traceback (most recent call last):
  File "main.py", line 170, in <module>
    import fire
  File "C:\Users---\Anaconda2\envs\py36\lib\site-packages\fire\core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "C:\Users---\Anaconda2\envs\py36\lib\site-packages\fire\core.py", line 366, in _Fire
    component, remaining_args)
  File "C:\Users---\Anaconda2\envs\py36\lib\site-packages\fire\core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "main.py", line 48, in train
    def train(**kwargs):
NameError: name 'opt' is not defined

Why does val_accuracy stay at around 50%, with the validation confusion matrix showing values for essentially only one class?

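@chenyuntc Hello, I followed the tutorial and implemented the code myself. During training, the val_accuracy shown in visdom stays at around 50%, and the validation confusion matrix has values for essentially only one class. I assumed I had made a mistake somewhere, so I ran the original code again and saw the same behavior. The visualization from the training run is shown in the image below. val_accuracy should keep rising as training progresses, so I don't know what is wrong. If anyone has run into a similar problem, please share any advice. Thanks in advance!
[image: visualization of the training run]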
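
An accuracy near 50% together with a confusion matrix that is filled in for only one predicted class is what a model produces when it predicts the same class for every input on a balanced two-class set. A short illustration with made-up counts (the numbers are hypothetical, not taken from the run above):

```python
def accuracy(confusion):
    """Accuracy from a square confusion matrix: rows = true class, columns = predicted class."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical balanced validation set where every image is predicted as class 0.
collapsed = [
    [500, 0],  # true class 0: all correct
    [500, 0],  # true class 1: all predicted as class 0
]
print(accuracy(collapsed))  # 0.5
```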

Floating-point overflow problem

A numeric overflow occurred during execution. The output from the run is below:

python main.py train --train-data-root=/home/linux_fhb/data/cat_vs_dog/train --use-gpu --env=classifier
user config:
env classifier
model ResNet34
train_data_root /home/linux_fhb/data/cat_vs_dog/train
test_data_root ./data/test1
load_model_path None
batch_size 32
use_gpu True
num_workers 4
print_freq 20
debug_file /tmp/debug
result_file result.csv
max_epoch 10
lr 0.1
lr_decay 0.95
weight_decay 0.0001
parse <bound method parse of <config.DefaultConfig object at 0x7f3e4a85b400>>
/home/linux_fhb/anaconda3/lib/python3.6/site-packages/torchvision/transforms/transforms.py:188: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
  "please use transforms.Resize instead.")
/home/linux_fhb/anaconda3/lib/python3.6/site-packages/torchvision/transforms/transforms.py:563: UserWarning: The use of the transforms.RandomSizedCrop transform is deprecated, please use transforms.RandomResizedCrop instead.
  "please use transforms.RandomResizedCrop instead.")
  0%|                                                 | 0/17500 [00:00<?, ?it/s]main.py:99: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  loss_meter.add(loss.data[0])
  3%|█▏                                   | 547/17500 [02:09<1:05:07,  4.34it/s]
main.py:138: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  val_input = Variable(input, volatile=True)
main.py:139: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  val_label = Variable(label.type(t.LongTensor), volatile=True)
Traceback (most recent call last):
  File "main.py", line 171, in <module>
    fire.Fire()
  File "/home/linux_fhb/anaconda3/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/linux_fhb/anaconda3/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/linux_fhb/anaconda3/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "main.py", line 121, in train
    if loss_meter.value()[0] > previous_loss:          
RuntimeError: value cannot be converted to type float without overflow: 10000000000000000159028911097599180468360808563945281389781327557747838772170381060813469985856815104.000000

The environment versions are:

Python 3.6.5 :: Anaconda, Inc.
fire                               0.1.3    
numpy                              1.14.3   
numpydoc                           0.8.0    
torch                              0.4.1    
torchfile                          0.1.0    
torchnet                           0.0.4    
torchvision                        0.2.1    
visdom                             0.1.8.5  

GPU: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1), 11 GB of memory.

Has anyone run into the same problem? How did you solve it?
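
The value in the `RuntimeError` above is an averaged loss that has grown far too large to convert, which suggests that training diverged. One way to surface this earlier is to check each loss value before accumulating it. A hedged sketch, where `check_loss` and its threshold are hypothetical and not part of the repository:

```python
import math


def check_loss(value, limit=1e4):
    """Raise early if a scalar loss is NaN, infinite, or implausibly large."""
    # `limit` is an arbitrary illustrative threshold, not a recommended setting.
    if not math.isfinite(value) or abs(value) > limit:
        raise ValueError("loss diverged: %r" % value)
    return value


print(check_loss(0.7035))  # 0.7035

try:
    check_loss(1e100)
except ValueError as exc:
    print(exc)  # loss diverged: 1e+100
```

In the training loop, this check would wrap the scalar passed to `loss_meter.add(...)`.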

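Training loss does not decrease on Windows

Because I am running under Python 3, a few small changes were needed.
Environment: Windows 10 64-bit, CPU only.
1. utils/visualize.py line 44: win=unicode(name) --> win=str(name)
2. main.py line 22: add import config
3. main.py line 108: loss_meter.add(loss.data[0]) --> loss_meter.add(loss.item())
4. config.py line 10: load_model_path = 'checkpoints/model.pth' --> load_model_path = None
5. config.py line 12: batch_size = 128 --> batch_size = 8
6. config.py line 21: lr = 0.1 --> lr = 0.001
7. config.py line 31: for k,v in kwargs.iteritems() --> for k,v in kwargs.items()
8. I did not run python -m visdom.server; after setting up the paths I ran python main.py train directly.
The loss is printed in the following format, and it keeps fluctuating between 0.6 and 1.5:
loss: tensor(0.7035, grad_fn=)
I also see what other users reported: the accuracy stays at around 50%, which means the model is no better after training than before.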
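
For a two-class problem, a cross-entropy loss near 0.69 is what a model scores when it assigns probability 0.5 to each class, so the printed 0.7035 is consistent with the roughly 50% accuracy reported. A quick check of that number, assuming the loss is the standard cross-entropy with the natural logarithm:

```python
import math

# Cross-entropy of predicting probability 0.5 for the true class.
chance_loss = -math.log(0.5)
print(round(chance_loss, 4))  # 0.6931
```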
