deeplung's People

Contributors

wentaozhu

deeplung's Issues

about LabelMapping

Hi,
When I run main.py with my own volume (969648) and labels, I get this error:

dz = (target[0] - oz[pos[0]]) / anchors[pos[3]]
IndexError: index 4 is out of bounds for axis 0 with size 0

The index in the message may also be 5, 6, or 7. I tried to print target, oz, and anchors just before the line dz = (target[0] - oz[pos[0]]) / anchors[pos[3]], but did not succeed.
I also can't follow the details of the LabelMapping function. Could you please explain it? Thanks very much!
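
For readers hitting the same IndexError: the message says an array of size 0 was indexed, so one of the arrays in that expression is empty. The following is a minimal sketch, assuming the grid of anchor centres along each axis is built with np.arange from the crop size and the network stride. The helper name grid_centres and the stride of 4 are illustrative assumptions, not taken from the repository. It shows how a crop smaller than the stride would yield an empty grid:

```python
import numpy as np

def grid_centres(input_size, stride=4):
    """Centres of the output cells along one axis (hypothetical helper)."""
    n_out = input_size // stride      # number of output cells on this axis
    offset = (stride - 1) / 2.0       # centre of the first cell
    return np.arange(offset, offset + stride * (n_out - 1) + 1, stride)

print(len(grid_centres(96)))  # one centre per output cell
print(len(grid_centres(2)))   # crop smaller than the stride: empty grid
```

If the real code works along these lines, printing the crop size passed to LabelMapping and the lengths of oz, oh, and ow would be a reasonable first check.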

Problems in nodules classification.

Hello, can I run only the code in "nodcls" to classify nodules as benign or malignant, without the detection results? I only want to accomplish the diagnosis task.

frocwrtdetpepchluna16.py

Hello, I have finished training and testing the model, and the next step is to run ./evaluationScript/frocwrtdetpepchluna16.py. What data processing is needed before running this file? After I changed the paths to the CSV paths contained in the project, I got the error shown in the attached screenshot. Do I need to generate this file myself?
Thanks for comment!

about path

Excuse me, does the data preprocessing path need to be written separately? Which of the following two ways is correct?
NO.1
image
NO.2
image
Thank you very much for your help!

Problem about nodule classification using DPN

Recently I tried to run the DPN nodule classification code you provided, "main_nodcls.py". Strangely, the accuracy and the loss (~0.693) stop changing around epoch 100, and the accuracy is low.
The config is:

neptime = 2
def get_lr(epoch):
    if epoch < 150 * neptime:
        lr = 0.1
    elif epoch < 250 * neptime:
        lr = 0.01
    else:
        lr = 0.001
    return lr

The training log is:

Epoch: 136
INFO:root:ep 136 tracc 0.560175054705 lr 0.1 gbtacc 0.890590809628
INFO:root:Saving..
INFO:root:teacc 46.6666666667 bestacc 57.7777777778 ccgbt 0.844444444444 bestgbt 0.888888888889
INFO:root:
Epoch: 137
INFO:root:ep 137 tracc 0.560175054705 lr 0.1 gbtacc 0.890590809628
INFO:root:Saving..
INFO:root:teacc 46.6666666667 bestacc 57.7777777778 ccgbt 0.844444444444 bestgbt 0.888888888889
INFO:root:
Epoch: 138
INFO:root:ep 138 tracc 0.560175054705 lr 0.1 gbtacc 0.890590809628
INFO:root:Saving..
INFO:root:teacc 46.6666666667 bestacc 57.7777777778 ccgbt 0.844444444444 bestgbt 0.888888888889
INFO:root:
Epoch: 139
INFO:root:ep 139 tracc 0.560175054705 lr 0.1 gbtacc 0.890590809628
INFO:root:Saving..
INFO:root:teacc 46.6666666667 bestacc 57.7777777778 ccgbt 0.844444444444 bestgbt 0.888888888889
INFO:root:
Epoch: 140
INFO:root:ep 140 tracc 0.560175054705 lr 0.1 gbtacc 0.890590809628
INFO:root:Saving..
INFO:root:teacc 46.6666666667 bestacc 57.7777777778 ccgbt 0.844444444444 bestgbt 0.888888888889
INFO:root:
Epoch: 141
INFO:root:ep 141 tracc 0.560175054705 lr 0.1 gbtacc 0.890590809628
INFO:root:Saving..
INFO:root:teacc 46.6666666667 bestacc 57.7777777778 ccgbt 0.844444444444 bestgbt 0.888888888889
INFO:root:
Epoch: 142
INFO:root:ep 142 tracc 0.560175054705 lr 0.1 gbtacc 0.890590809628
INFO:root:Saving..
INFO:root:teacc 46.6666666667 bestacc 57.7777777778 ccgbt 0.844444444444 bestgbt 0.888888888889
INFO:root:
Epoch: 143
INFO:root:ep 143 tracc 0.560175054705 lr 0.1 gbtacc 0.890590809628
INFO:root:Saving..
INFO:root:teacc 46.6666666667 bestacc 57.7777777778 ccgbt 0.844444444444 bestgbt 0.888888888889

Could you spare some time to help with this? What can I do to solve the problem?
Additionally, "CROPSIZE" is set to 17 in the code but 32 in the paper. Did I misunderstand something?
The training log is attached:
log-9.txt
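
A note on the number itself: ln 2 ≈ 0.693 is the cross-entropy of a two-class model that assigns probability 0.5 to the true class, so a loss stuck at that value suggests the network is predicting at chance level (the quoted log still shows lr 0.1 at that point). A short check of the arithmetic, with an illustrative function name:

```python
import math

def binary_cross_entropy(p_true_class):
    """Cross-entropy loss for one sample, given the probability assigned to its true class."""
    return -math.log(p_true_class)

chance_loss = binary_cross_entropy(0.5)
print(round(chance_loss, 3))
```

Whether the cause is the learning rate, the input normalisation, or something else cannot be told from the log alone.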

Regarding humanclassification.py and nodclsgbt.py.

Hi, I need your help with these:

  1. I get 85.04% accuracy for nodule classification using GBM with nodule diameter and pixel features. According to the paper, you achieve 86.12%.
  2. I am not able to reproduce the doctors' performance using humanclassification.py. The values of ntot and nacc do not change (they stay at 0).

Can you list the probable reasons for these results?

main.py :TypeError: 'float' object cannot be interpreted as an integer

When I run ./detector/main.py, it reports an error at line 178. I prepared the data using the script prepare.py. Could you explain why this happens? Thanks!

Traceback (most recent call last):
  File "main.py", line 388, in <module>
    main()
  File "main.py", line 178, in main
    for i, (data, target, coord) in enumerate(train_loader): # check data consistency
  File "C:\Program Files\Anaconda3\envs\python36\lib\site-packages\torch\utils\data\dataloader.py", line 417, in __iter__
    return DataLoaderIter(self)
  File "C:\Program Files\Anaconda3\envs\python36\lib\site-packages\torch\utils\data\dataloader.py", line 242, in __init__
    self._put_indices()
  File "C:\Program Files\Anaconda3\envs\python36\lib\site-packages\torch\utils\data\dataloader.py", line 290, in _put_indices
    indices = next(self.sample_iter, None)
  File "C:\Program Files\Anaconda3\envs\python36\lib\site-packages\torch\utils\data\sampler.py", line 119, in __iter__
    for idx in self.sampler:
  File "C:\Program Files\Anaconda3\envs\python36\lib\site-packages\torch\utils\data\sampler.py", line 50, in __iter__
    return iter(torch.randperm(len(self.data_source)).long())
TypeError: 'float' object cannot be interpreted as an integer
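
A likely cause, stated as an assumption since the dataset code is not shown in this report: under Python 3 the / operator returns a float, so a __len__ that divides (as in the data.py snippet quoted in another issue on this page) hands a float to the sampler. The two classes below are hypothetical stand-ins that reproduce the message without PyTorch and show the cast that avoids it:

```python
class BrokenDataset:
    """Mimics a dataset whose __len__ uses true division (a float in Python 3)."""
    def __init__(self, bboxes, r_rand):
        self.bboxes, self.r_rand = bboxes, r_rand

    def __len__(self):
        return len(self.bboxes) / (1 - self.r_rand)

class FixedDataset(BrokenDataset):
    """Same length formula, cast to int so len() accepts it."""
    def __len__(self):
        return int(len(self.bboxes) / (1 - self.r_rand))

boxes = [0, 1, 2, 3]
try:
    len(BrokenDataset(boxes, 0.5))
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
print(len(FixedDataset(boxes, 0.5)))
```

The repository appears to target Python 2.7, where the same division on integers and floats did not trip len() in the same way, so running it under Python 3.6 may surface other differences of this kind.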

The problem with _label.npy

In the file

annos = np.array(pandas.read_csv(luna_label))

the column order in LUNA's annotations.csv is x, y, z, diam,
but in the file
DeepLung/DeepLungDetectionDemo/detector.ipynb

ctdat = np.load('./CT/'+srslst[showid]+'_clean.npy')
ctlab = np.load('./CT/'+srslst[showid]+'_label.npy')
print('Groundtruth')
print(ctdat.shape, ctlab.shape)
for idx in xrange(ctlab.shape[0]):
    if abs(ctlab[idx,0])+abs(ctlab[idx,1])+abs(ctlab[idx,2])+abs(ctlab[idx,3])==0: continue
    fig = plt.figure()
    z, x, y = int(ctlab[idx,0]), int(ctlab[idx,1]), int(ctlab[idx,2])

the order is z,x,y.
Could you tell me where the order has been changed?

The reported total recall rate of 94.6

Dear Team,
In your report, you provide the following details for each fold:

"Method    Deep 3D Res18    Deep 3D DPN26
Fold 0    0.8610           0.8750
Fold 1    0.8538           0.8783
Fold 2    0.7902           0.8170
Fold 3    0.7863           0.7731
Fold 4    0.8795           0.8850
Fold 5    0.8360           0.8095
Fold 6    0.8959           0.8649
Fold 7    0.8700           0.8816
Fold 8    0.8886           0.8668
Fold 9    0.8041           0.8122"

Each of these numbers is evidently less than 0.9.

Later in the paper, you state that:
"R-CNN has a total recall rate 94.6% for all the detected nodules, while 3D DPN26 Faster R-CNN has a recall rate 95.8%."

How were the quoted total recall rates of 94.6% and 95.8% calculated?
If the result on each fold is less than 90%, how can the average be more than 90%?

Many thanks,
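
One possible explanation, offered as an assumption rather than the authors' answer: the per-fold numbers look like FROC scores, which in the LUNA16 evaluation are the average sensitivity at seven false-positive rates (1/8, 1/4, 1/2, 1, 2, 4, and 8 per scan). A total recall rate, in contrast, counts every detected nodule regardless of how many false positives come with it, so the two metrics need not agree. The values below are made up for illustration and are not results from the paper:

```python
# Hypothetical sensitivities at 1/8, 1/4, 1/2, 1, 2, 4, 8 false positives per scan.
sensitivities = [0.70, 0.78, 0.84, 0.88, 0.91, 0.93, 0.94]

# FROC score: average sensitivity over the seven operating points.
froc_score = sum(sensitivities) / len(sensitivities)

# Hypothetical recall over all candidates, ignoring the false-positive budget.
total_recall = 0.95

print(round(froc_score, 4), total_recall)
```

Under this reading, a fold can score below 0.9 on FROC while the detector still recalls more than 90% of nodules when all candidates are kept.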

What should I do to use your excellent work in CPU mode?

Recently I have been trying to use part of your excellent work for nodule detection. To be more exact, I use the DeepLung detection net to test on my own CT data. But I need to switch from GPU mode to CPU mode. What should I modify in the test part of "main.py" in the "detector" folder?
I hope to receive your reply. Thanks.

Execution time for frocwrtdetpepchluna16.py

How long should frocwrtdetpepchluna16.py take to execute for a single fold and about 100 eps?
I have set detp to -1. With detp at -1.5, a single fold, a single eps, and 12 nprocesses, it ran for about 15 hours without producing a single result. Is this normal?

Train on the Tianchi dataset

When I use the Tianchi dataset to train the detector model, I often get "RuntimeError: inconsistent tensor sizes" or "LabelMapping print input_size[2] dimmesion not ok".

Hope to get your help.

When I run ./detector/main.py I get this error

The terminal output:
CUDA_VISIBLE_DEVICES=0,1 python main.py --model dpn3d26 -b 32 --save-dir res18/retrft969/ --epochs 150 --config config_training9

using gpu 0,1
/home/amax/.local/lib/python2.7/site-packages/torch/cuda/__init__.py:114: UserWarning:
Found GPU0 TITAN V which requires CUDA_VERSION >= 9000 for
optimal performance and fast startup time, but your PyTorch was compiled
with CUDA_VERSION 8000. Please install the correct PyTorch binary
using instructions from http://pytorch.org

warnings.warn(incorrect_binary_warn % (d, name, 9000, CUDA_VERSION))
/home/amax/.local/lib/python2.7/site-packages/torch/cuda/__init__.py:114: UserWarning:
Found GPU1 TITAN V which requires CUDA_VERSION >= 9000 for
optimal performance and fast startup time, but your PyTorch was compiled
with CUDA_VERSION 8000. Please install the correct PyTorch binary
using instructions from http://pytorch.org

warnings.warn(incorrect_binary_warn % (d, name, 9000, CUDA_VERSION))
/home/amax/.local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py:24: UserWarning:
There is an imbalance between your GPUs. You may want to exclude GPU 0 which
has less than 75% of the memory or cores of GPU 1. You can do so by setting
the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
environment variable.
warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
['/media/amax/BC1259BD12597D78/sunhaotian/luna16/dataset/subset0/', '/media/amax/BC1259BD12597D78/sunhaotian/luna16/dataset/subset1/', '/media/amax/BC1259BD12597D78/sunhaotian/luna16/dataset/subset2/', '/media/amax/BC1259BD12597D78/sunhaotian/luna16/dataset/subset3/', '/media/amax/BC1259BD12597D78/sunhaotian/luna16/dataset/subset4/', '/media/amax/BC1259BD12597D78/sunhaotian/luna16/dataset/subset5/', '/media/amax/BC1259BD12597D78/sunhaotian/luna16/dataset/subset6/', '/media/amax/BC1259BD12597D78/sunhaotian/luna16/dataset/subset7/', '/media/amax/BC1259BD12597D78/sunhaotian/luna16/dataset/subset8/']
/media/amax/BC1259BD12597D78/sunhaotian/luna16/dataset/subset0/
/media/amax/BC1259BD12597D78/sunhaotian/luna16/dataset/subset1/
/media/amax/BC1259BD12597D78/sunhaotian/luna16/dataset/subset2/
/media/amax/BC1259BD12597D78/sunhaotian/luna16/dataset/subset3/
/media/amax/BC1259BD12597D78/sunhaotian/luna16/dataset/subset4/
/media/amax/BC1259BD12597D78/sunhaotian/luna16/dataset/subset5/
/media/amax/BC1259BD12597D78/sunhaotian/luna16/dataset/subset6/
/media/amax/BC1259BD12597D78/sunhaotian/luna16/dataset/subset7/
/media/amax/BC1259BD12597D78/sunhaotian/luna16/dataset/subset8/
800
800
88
terminate called after throwing an instance of 'std::runtime_error'
what(): NCCL Error 1: unhandled cuda error
Aborted (core dumped)

My GPUs are two NVIDIA TITAN V cards with 24 GB of memory in total. Should I install CUDA 9?

How to run frocwrtdetpepchluna16.py?

As we see in frocwrtdetpepchluna16.py, results_path is the path of a CSV that looks like 3DRes18FasterR-CNN.csv in your directory (./evaluationScript/annotations).
But how do we get this CSV ourselves?
I think it can be created by the function getcsv() in frocwrtdetpepchluna16.py,
but in your code the call to this function is commented out.

Best wishes

Here is the code in frocwrtdetpepchluna16.py:

annotations_filename = './annotations/annotations.csv'# path for ground truth annotations for the fold
annotations_excluded_filename = './annotations/annotations_excluded.csv'# path for excluded annotations for the fold
seriesuids_filename = './annotations/seriesuids.csv'# path for seriesuid for the fold
results_path = './'#val' #val' ft96'+'/val'#
sideinfopath = '/media/data1/wentao/tianchi/luna16/preprocess/lunaall/'#subset'+str(fold)+'/' +str(fold)
datapath = '/media/data1/wentao/tianchi/luna16/lunaall/'#subset'+str(fold)+'/'

# getcsv(detp, eps)
def getfroc(detp, eps):

Question on detection performance evaluation with pre-trained models

I was trying to understand how the detection performance, i.e. FROC, was calculated. So I used pre-trained DPN models to generate detection results. For example, the subset0 results were generated with the following bash script:

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --gpu '0,1,2,3' -b 16 --model dpn3d26 --resume fd0066.ckpt --test 1 --save-dir results/dpn3d26/retrft960/ --config config_training0

After obtaining detection results for all 10 folds, I used getcsv() in frocwrtdetpepchluna16.py to generate result csv files for each fold. Here I set detp = [-2] as suggested in the paper, and maxeps = 1. I obtained 39,576 candidates in total.

After concatenating the 10 CSV files, I used noduleCADEvaluationLUNA16.py to calculate FROC. Here annotations_filename is the annotations.csv downloaded from the LUNA16 website, and seriesuids_filename includes the 888 UIDs for all 10 subsets.

I'm not sure how to set up annotations_excluded_filename, so I simply set it to the 3 cases in the black_list item of config_training2.py.

The FROC I obtained was around 0.74, which is much lower than the result reported in the paper. I guess I must have missed something, please take a look. I also attached CADAnalysis.txt for your reference. Thank you!

CADAnalysis.txt

Inference produces millions of bounding boxes

Dear team,
I am trying to run inference. I did NOT train the detector; I used the checkpoint files that you provided. For instance, chk064 or res18fd9020.ckpt, which are under detector/dpnmodel/ and detector/resmodel/, were used like so:

#!/bin/bash
set -e

# python prepare.py
cd detector
maxeps=2
f=9
CUDA_VISIBLE_DEVICES=0,1

/usr/bin/python3.5 main.py --model res18 -b 4 --resume res18fd9020.ckpt --test 1 --save-dir res18/retrft96$f/ --epochs $maxeps --config config_training$f

  1. Is this the correct way to run inference? If not, can you please indicate how to do it properly?
  2. Why do the resulting PBB and LBB contain millions of bounding boxes? Is that because the checkpoint files mentioned above are not good enough?

Many thanks,
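
A hedged observation for this question: the raw *_pbb.npy output appears to hold every candidate the network emits, and other issues on this page mention a score threshold (detp, with values such as -1.5 and -2) and an NMS step applied later in frocwrtdetpepchluna16.py. The negative thresholds suggest column 0 is a raw score rather than a probability, which is an assumption here. The sketch below only illustrates thresholding on column 0; the function name and the sample rows are hypothetical:

```python
import numpy as np

def filter_candidates(pbb, score_threshold):
    """Keep rows of a candidate array whose column-0 score exceeds the threshold."""
    pbb = np.asarray(pbb, dtype=float)
    return pbb[pbb[:, 0] > score_threshold]

# Hypothetical candidates: [score, coord, coord, coord, diameter]
pbb = np.array([
    [-3.2, 10, 20, 30, 5.0],
    [-0.4, 40, 50, 60, 7.5],
    [ 1.8, 70, 80, 90, 6.0],
])
kept = filter_candidates(pbb, score_threshold=-1.5)
print(kept.shape)
```

Thresholding before NMS also shrinks the number of boxes NMS has to compare, which may matter for the long run times reported elsewhere on this page.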

RuntimeError: cuda runtime error (77) : an illegal memory access was encountered

Dear Zhu,
When I run run_training.sh, I encounter this error and have no idea how to deal with it. Could you give me some advice?
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1518238409320/work/torch/lib/THC/generic/THCStorage.cu line=58 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
  File "main.py", line 389, in <module>
    main()
  File "main.py", line 206, in main
    train(train_loader, net, loss, epoch, optimizer, get_lr, args.save_freq, save_dir)
  File "main.py", line 224, in train
    output = net(data, coord)
  File "/home/qyan0710/software/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/share/qyan0710/work/DeepLung/detector/res18.py", line 97, in forward
    out = self.preBlock(x)#16
  File "/home/qyan0710/software/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qyan0710/software/anaconda2/lib/python2.7/site-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/home/qyan0710/software/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qyan0710/software/anaconda2/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 388, in forward
    self.padding, self.dilation, self.groups)
  File "/home/qyan0710/software/anaconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 126, in conv3d
    return f(input, weight, bias)
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /opt/conda/conda-bld/pytorch_1518238409320/work/torch/lib/THC/generic/THCStorage.cu:58

Could you please explain the PATH in those config files clearly?

In config_training0(~9).py under the master folder, some data paths are given. But I'm a little confused about the differences among the 10 files. For example,
in config_training0.py, 'train_preprocess_result_path' is:

'/media/data1/wentao/tianchi/luna16/preprocess/';

but in config_training1.py,

'train_preprocess_result_path':'/media/data1/wentao/tianchi/luna16/preprocess/lunaall/'.

I wonder why the paths differ when the subsets of the LUNA16 dataset all sit at the same level.
Moreover, why do the val and test bbox paths differ from train_bbox_path?
For example, in config_training4.py:

'train_bbox_path':'/media/data1/wentao/tianchi/bbox/train/',
'val_bbox_path':'/media/data1/wentao/tianchi/val/',
'test_bbox_path':'/media/data1/wentao/tianchi/test/',

These paths are important for training. Would you please spare some time to explain your reasoning?

runtime error

My computer has a single GPU with 5 GB of memory. I have already set the batch size, the number of workers, and the number of epochs to 1, but after running for a while a runtime error appears that reports a memory error. Is the graphics card not powerful enough, or is there another cause?
Thanks~

about preprocess

Why do the scan's width, height, and number of slices all change after preprocessing? I am very confused about the preprocessing. Is there any tutorial? Please help!
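
A common reason, assumed here rather than confirmed from prepare.py: lung CT preprocessing pipelines usually resample each scan to a fixed voxel spacing (often 1 mm in every direction) and then crop to the lung region, and both steps change the array shape. The sketch below covers only the resampling arithmetic; the function name and the example scan are hypothetical:

```python
def resampled_shape(shape, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Voxel grid size after resampling from `spacing` to `new_spacing` (mm per voxel)."""
    return tuple(int(round(n * s / t)) for n, s, t in zip(shape, spacing, new_spacing))

# Hypothetical scan: 130 slices of 512 x 512, 2.5 mm apart, 0.7 mm in-plane.
print(resampled_shape((130, 512, 512), (2.5, 0.7, 0.7)))
```

With these example numbers the slice count grows and the in-plane size shrinks, which matches the kind of change described in the question.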

Empty paths in frocwrtdetpepchluna16.py

Dear Team,
I am trying to run frocwrtdetpepchluna16.py. Do I have to generate these files for each fold or did you already do that?

annotations_filename = # path for ground truth annotations for the fold
annotations_excluded_filename = # path for excluded annotations for the fold
seriesuids_filename = # path for seriesuid for the fold

Best,

about data extract in nodclsgbt.py

cropdata = np.ones((CROPSIZE, CROPSIZE, CROPSIZE))*170
cropdatatmp = np.array(data[0, bgx:bgx+CROPSIZE, bgy:bgy+CROPSIZE, bgz:bgz+CROPSIZE])
cropdata[CROPSIZE/2-cropdatatmp.shape[0]/2:CROPSIZE/2-cropdatatmp.shape[0]/2+cropdatatmp.shape[0],
CROPSIZE/2-cropdatatmp.shape[1]/2:CROPSIZE/2-cropdatatmp.shape[1]/2+cropdatatmp.shape[1],
CROPSIZE/2-cropdatatmp.shape[2]/2:CROPSIZE/2-cropdatatmp.shape[2]/2+cropdatatmp.shape[2]] = np.array(2-cropdatatmp)
I have two questions:
1. Why 2-cropdatatmp?
2. Doesn't CROPSIZE/2-cropdatatmp.shape[0]/2 obviously equal zero?
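
On the second question, one thing that can be said with some confidence: NumPy slicing silently truncates at the array boundary, so a crop taken near the edge of the volume can be smaller than CROPSIZE, and the offset is then non-zero. The sketch below uses a hypothetical helper and volume; the size 17 and fill value 170 are taken from the snippets on this page. It does not address the first question about 2-cropdatatmp.

```python
import numpy as np

def centre_in_cube(crop, size, fill):
    """Place `crop` (possibly truncated at a volume edge) in the middle of a size^3 cube."""
    cube = np.full((size, size, size), fill, dtype=float)
    starts = [size // 2 - n // 2 for n in crop.shape]
    region = tuple(slice(s, s + n) for s, n in zip(starts, crop.shape))
    cube[region] = crop
    return cube, starts

volume = np.zeros((20, 20, 20))
size = 17
crop = volume[10:10 + size, 0:size, 0:size]  # only 10 voxels remain along axis 0
cube, starts = centre_in_cube(crop, size, fill=170)
print(crop.shape, starts)
```

For a full-size crop every start is 0, which is the case the question describes; the padding only matters at the borders.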

Confused about Faster R-CNN

This issue is also posted in uci-cbcl/DeepLung#1

I have read the code and the paper. The paper mentions that Faster R-CNN is used in the detection part, but I don't see Faster R-CNN in the folder ./detector. I searched for RPN and ROI Pooling but found nothing.

Did you implement the Faster R-CNN functionality in the data preprocessing part?
Or is it just using R-CNN? (If my understanding of the code is correct, the original image is partitioned and augmented during preprocessing and then used as input to the model, which looks like R-CNN.)

Questions about training batches

Hello, teacher:

While running the DeepLung project recently, the following problem occurred:
RuntimeError: cuda runtime error (2) : out of memory at /b/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.cu:66
From what I have read, this means the GPU memory is too small, and the recommendation is to reduce the batch size. Where are these variables set?

The problem about ./detector/main.py

@wentaozhu

When I ran ./detector/main.py, the following error was reported:
Traceback (most recent call last):
  File "main.py", line 390, in <module>
    main()
  File "main.py", line 70, in main
    model = import_module(args.model)
  File "/home/user/anaconda3/envs/mhqtf/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 985, in _gcd_import
  File "<frozen importlib._bootstrap>", line 968, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
ImportError: No module named 'base'

My cudn version is 7.5, is it related to the cudn version?

Out of memory when test with CUDA_VISIBLE_DEVICES=0,1 python main.py --model res18 -b 2 --resume results/wzy_res18/retrft96$f/00$i.ckpt --test 1 --save-dir wzy_res18/retrft96$f/ --config config_training$f

process 1 epoch
using gpu 0,1
/home/ustc/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py:24: UserWarning:
There is an imbalance between your GPUs. You may want to exclude GPU 1 which
has less than 75% of the memory or cores of GPU 0. You can do so by setting
the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
environment variable.
warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
/home/ustc/lclin/dataset/LUNA16/subset9/
/home/ustc/lclin/dataset/LUNA16/subset1/
/home/ustc/lclin/dataset/LUNA16/subset2/
/home/ustc/lclin/dataset/LUNA16/subset3/
/home/ustc/lclin/dataset/LUNA16/subset4/
/home/ustc/lclin/dataset/LUNA16/subset5/
/home/ustc/lclin/dataset/LUNA16/subset6/
/home/ustc/lclin/dataset/LUNA16/subset7/
/home/ustc/lclin/dataset/LUNA16/subset8/
89
results/wzy_res18/retrft960/bbox
(18, 1, 208, 208, 208)
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7fb5965d14d0>> ignored
Traceback (most recent call last):
  File "main.py", line 389, in <module>
    main()
  File "main.py", line 148, in main
    test(test_loader, net, get_pbb, save_dir, config)
  File "main.py", line 334, in test
    output = net(input,inputcoord)
  File "/home/ustc/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ustc/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/ustc/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/ustc/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
    raise output
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58

about detection result

Hi @wentaozhu, you detected more than 130 thousand nodules, but I only got 4 thousand. What is going wrong with the detection? The number of workers is 0 and the batch size is set to 4. The other parameters are the same as yours.

problem about frocwrtdetpepchluna16.py

I uncommented getcsv(), and the program has now been running for about one day. I see that _pbb.npy has millions of bboxes and the program is still doing NMS. Is this normal? Should I modify detp in frocwrtdetpepchluna16.py and make it higher?

The output looks like this:

(torch04) ustc@ustc:~/lclin/code/DeepLung/evaluationScript$ python frocwrtdetpepchluna16.py
write pbb to csv
ep 1 detp -1.5
88
1.3.6.1.4.1.14519.5.2.1.6279.6001.182192086929819295877506541021_pbb.npy

run frocwrtdetpepchluna16.py with tianchi dataset

prepare.py did not generate *_extendbox.npy for the Tianchi dataset.
In frocwrtdetpepchluna16.py, the convertcsv function needs to load *_extendbox.npy and do some matrix operations like this:

pbb[:, 1:] = np.array(pbb[:, 1:] + np.expand_dims(extendbox[:,0], 1).T)
pbb[:, 1:] = np.array(pbb[:, 1:] * np.expand_dims(resolution, 1).T / np.expand_dims(spacing, 1).T)

How should I deal with this step? Thanks.

os.popen('stty size', 'r').read().split()

Hello, line 45 of DeepLung/nodcls/utils.py reads: _, term_width = os.popen('stty size', 'r').read().split(). My Python version is 3.6, and running it raises: ValueError: not enough values to unpack (expected 2, got 0). I tried print(os.popen('stty size', 'r').read()), but the output begins with 'stty' followed by unreadable characters. Can you help me? Many thanks.
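
The stty command is not available on every platform, and it prints nothing usable when no terminal is attached, which would leave split() with zero values to unpack. A portable alternative, shown here as a sketch and not as a patch from the repository, is the standard-library shutil.get_terminal_size (Python 3.3+), which accepts a fallback size:

```python
import shutil

# Falls back to 80 columns x 24 lines when no terminal is attached.
term_size = shutil.get_terminal_size(fallback=(80, 24))
term_width = term_size.columns
print(term_width)
```

Only term_width is used by the quoted line, so replacing that one assignment would likely be enough.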

About the running time

During the detection phase, I set up 50 training batches and ran run_training.sh, and the program got stuck at the screen shown in the screenshots.

I would like to ask what the program is doing at this point. I only use two GPUs: does the program simply need more time, or is it stuck here?

Out of memory for training with GPU

Hi,

While trying to reproduce the training, I used 4 TITAN X GPUs with the original parameters from the code. The error message shows: "RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1513363039688/work/torch/lib/THC/generic/THCStorage.cu:58 "
How many GPUs did you use for training? Do you have an estimate of how much memory it will occupy?

Thanks,
Jingya

AttributeError: type object 'object' has no attribute '__getattr__'

When I run run_train.sh, get the following error:
Traceback (most recent call last):
  File "main.py", line 393, in <module>
    main()
  File "main.py", line 210, in main
    train(train_loader, net, loss, epoch, optimizer, get_lr, args.save_freq, save_dir)
  File "main.py", line 238, in train
    state_dict = net.module.state_dict()
  File "/home/jin/anaconda2/envs/lung/lib/python2.7/site-packages/torch/nn/modules/module.py", line 237, in __getattr__
    return object.__getattr__(self, name)
AttributeError: type object 'object' has no attribute '__getattr__'

How to fix it?

RuntimeError: inconsistent tensor sizes

Hi,
When I run "main.py", I get this error:

  File "main.py", line 230, in train
    for i, (data, target, coord) in enumerate(data_loader):
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 196, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 230, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 42, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 119, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 96, in default_collate
    return torch.stack(batch, 0, out=out)
  File "/usr/local/lib/python2.7/dist-packages/torch/functional.py", line 66, in stack
    return torch.cat(inputs, dim, out=out)
RuntimeError: inconsistent tensor sizes at /pytorch/torch/lib/TH/generic/THTensorMath.c:2864

It works well on the LUNA16 dataset, but I get this error when I run on another dataset.
Could you please explain it? Thanks very much!
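
The traceback ends in torch.stack inside default_collate, which requires every sample in a batch to have the same shape, so at least one crop from the new dataset presumably has a different size. A small diagnostic sketch follows, using NumPy arrays as stand-ins for the samples; the helper name and the example shapes are hypothetical:

```python
import numpy as np

def find_shape_mismatches(samples, expected_shape):
    """Return (index, shape) for every sample whose shape differs from expected_shape."""
    expected = tuple(expected_shape)
    return [(i, s.shape) for i, s in enumerate(samples) if s.shape != expected]

# Hypothetical crops: the second one came out one voxel short on the last axis.
samples = [np.zeros((1, 8, 8, 8)), np.zeros((1, 8, 8, 7)), np.zeros((1, 8, 8, 8))]
print(find_shape_mismatches(samples, (1, 8, 8, 8)))
```

Running a check like this over the dataset with a single worker would point to the scans whose crops come out short, for example volumes smaller than the crop size along one axis.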

How to visualize the detected nodules?

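@wentaozhu Hello! Sorry to bother you.
Suppose I want to feed in one patient's lung CT data and visualize the model's output, that is, produce an image showing the nodules the model predicted.
Can I take "ID_pbb.npy" and output the position with the highest probability (the largest p) in that file, which would be the location the model considers most likely to be a nodule? Is that right?

The data in ID_pbb.npy are [p, x, y, z, r].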
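
Taking the [p, x, y, z, r] layout stated in the question at face value, selecting the highest-scoring candidate could look like the sketch below. The helper name and the sample rows are hypothetical; a real file would be read with np.load, and the coordinate order should be checked against the volume before plotting.

```python
import numpy as np

def top_candidate(pbb):
    """Return the row with the largest value in column 0 (the score p)."""
    pbb = np.asarray(pbb, dtype=float)
    return pbb[np.argmax(pbb[:, 0])]

# Hypothetical rows in the [p, x, y, z, r] layout described in the question.
pbb = np.array([
    [0.20, 12.0, 40.0, 33.0, 4.0],
    [0.90, 55.0, 61.0, 20.0, 6.5],
    [0.50, 70.0, 15.0, 48.0, 3.0],
])
p, x, y, z, r = top_candidate(pbb)
print(p, x, y, z, r)
```

A single best row shows only one nodule; keeping every row above a score threshold would show all confident detections.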

Out of memory issue on a single GPU

RuntimeError: CUDA error: out of memory
How to handle this error for running test in run_training.sh on a single gpu, with batch size set to 1?

Questions about ./DeepLung/detector/data.py

In the file ./DeepLung/detector/data.py:

def __len__(self):
    if self.phase == 'train':
        return len(self.bboxes)/(1-self.r_rand)

Why is the dataset length allowed to be greater than len(self.bboxes)?
And when idx >= len(self.bboxes), isRand=True:

idx = idx%len(self.bboxes)
bbox = self.bboxes[idx]
filename = self.filenames[int(bbox[0])]
imgs = np.load(filename)
self.crop(imgs, bbox[1:], bboxes,isScale,isRandom)

In the class Crop, why is target set to nan?

target = np.array([np.nan,np.nan,np.nan,np.nan])

I really do not understand this part.
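
Reading only the snippet quoted above (an interpretation, not the author's explanation): the length is inflated so that a fraction r_rand of all indices fall past the end of self.bboxes. Those indices are flagged as random crops and wrapped back with the modulo. A random crop need not contain a nodule, which may be why its target is filled with NaN. The function names below are hypothetical:

```python
def extended_length(n_bboxes, r_rand):
    """Dataset length when a fraction r_rand of samples should be random crops."""
    return int(n_bboxes / (1 - r_rand))

def resolve_index(idx, n_bboxes):
    """Map a dataset index to (bbox index, is_random_crop)."""
    is_random = idx >= n_bboxes  # indices past the real boxes become random crops
    return idx % n_bboxes, is_random

n = 4
print(extended_length(n, r_rand=0.5))
print(resolve_index(2, n), resolve_index(5, n))
```

With 4 boxes and r_rand = 0.5 the dataset reports 8 samples, of which indices 4 to 7 are random crops, so half of each epoch consists of random crops.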

about gpu

Hi, Dr. Zhu. When I run your code, there is not enough memory. I want to change the size of the original images and then carry out pre-training, but I have been unable to write MHD files. Can you give me some help?

Hi, I have two questions: (1) Why are 'val_data_path' and 'test_data_path' in config_training.py the same? Generally, they should be different. (2) Due to limited GPU memory, I used a for loop to replace '_=pool.map(partial_savenpy_luna,range(N)' with 'savenpy_luna(id=0, annos=annos, filelist=[single_file],....)' in prepare.py. However, I got two '_clean.npy' results with the same filename but different image shapes (e.g., [1,342,275,315] and [1,342,278,315]). Do you have any advice about this?
