
Comments (12)

Cory-M commented:

Hey guys @kartikgupta-at-anu @sumo8291 @TsungWeiTsai,
The log file I attached above was from the previous repository; some modifications were made to the code later, so there are some differences in format. For instance, the "wd" argument that appears in that log file was actually not used in training, so we removed it in a later version.
For the released code, we re-ran cifar10/100 yesterday; here are the current results:
test_cifar10_041.log
test_cifar100.log
Please kindly refer to these files for this repo. The model is trained on PyTorch 0.4.1. We tried running it on PyTorch 0.4.0; the results were indeed not optimal and required hyperparameter tuning (though we were still able to find hyperparameters that recover the reported results). We don't know the reason for this observation yet, but we strongly encourage you to use PyTorch 0.4.1, and perhaps explore the reasons for the instability.


sKamiJ commented:

(Quoting @TsungWeiTsai's comment, reproduced in full below.)

@TsungWeiTsai This is the code of ImageDataGenerator in Keras 2.3.1:

def __init__(self,
             ...
             data_format='channels_last',
             ...):
    if data_format is None:
        data_format = backend.image_data_format()

Note that the default value of data_format is 'channels_last' rather than None, so it never calls backend.image_data_format() to pick up the image_data_format you set in keras.json.
Just set data_format='channels_first' manually when you create the ImageDataGenerator, as sketched below.
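
For illustration, a minimal sketch of that workaround (the array here is a hypothetical CIFAR-shaped batch, not the repo's actual loader):

  import numpy as np
  from keras.preprocessing.image import ImageDataGenerator

  # Pass data_format explicitly; with the 'channels_last' default,
  # the setting in keras.json is silently ignored.
  datagen = ImageDataGenerator(data_format='channels_first')

  x = np.zeros((32, 3, 32, 32), dtype='float32')  # hypothetical NCHW batch
  batch = next(datagen.flow(x, batch_size=32, shuffle=False))
  print(batch.shape)  # (32, 3, 32, 32), with no channels_last warning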


kartikgupta-at-anu commented:

All the libraries in my installation have the correct versions, and I used the exact same seed you mentioned; such a large variation should not come from randomness. Are you yourself able to reproduce your results now? Are you sure you put the correct hyperparameters in the config file? And what is the "alpha" parameter in your log file name?


sumo8291 commented:

After going through your log files, there are some differences I could observe:
(1) When the architecture arguments are printed in your log files, 'arch' has 'dac' assigned to it, while in the yaml file the value of 'arch' is 'cifar_C4_L2'.
(2) There is no argument called "arch_dim" in either main.py or the yaml files.
(3) Your namespace printout also lacks "c_layer", "layers", and "classifier" as arguments.

Further, I assume "alpha" and "coeff[label]" are equivalent according to your yaml files. However, does "coeff[local]" correspond to the hyperparameter beta in Equation 15 of the paper? If so, could you please clarify its intended value: in the paper you set beta to 0.1, while in the yaml files "coeff[local]" is set to 0.05.

Next, the log file you provided reports a weight_decay 'wd' of 1e-05, while in main.py, torch.optim.RMSprop uses its default weight_decay of 0.
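
For reference, a minimal sketch of that discrepancy (the learning rate and parameter are placeholders, not the repo's actual values):

  import torch

  params = [torch.nn.Parameter(torch.zeros(1))]  # placeholder parameter

  # As in the released main.py: weight_decay not passed, so the default 0 applies.
  opt_repo = torch.optim.RMSprop(params, lr=1e-3)

  # As in the provided log file: wd = 1e-05 would have to be passed explicitly.
  opt_log = torch.optim.RMSprop(params, lr=1e-3, weight_decay=1e-5)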

So it seems there is something wrong with either the code or the log files you provided.

(Cory-M's reply with the CIFAR10/100 logs is reproduced in full below.)


Cory-M commented:

(Quoting @kartikgupta-at-anu's request for a pip list and the dataset, reproduced in full below.)

Hey @kartikgupta-at-anu, we uploaded the data to:
https://drive.google.com/file/d/1gP67avGLl5zpeeX1kvkspQdJ3pBwfBNl/view?usp=sharing
and here's how we tried to set up everything from scratch:

  1. Install anaconda3 and create a virtual env.
  2. Install PyTorch 0.4.1 via 'conda install pytorch==0.4.1 torchvision==0.2.1 cuda80 -c pytorch'.
  3. pip install tensorboardX
  4. pip install keras==2.0.2
  5. Revise ~/.keras/keras.json to the following form:
    {
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "theano",
    "image_dim_ordering": "th",
    "image_data_format": "channels_first"
    }
  6. Install the other dependencies.
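
A quick sanity check after step 5 (just a sketch, assuming Keras 2.0.2's backend API):

  import keras.backend as K

  print(K.backend())            # expected: 'theano'
  print(K.image_data_format())  # expected: 'channels_first'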

Here is the pip list:
pip_list.txt

Hope that helps.


Cory-M commented:

Hi, attached is the CIFAR10 log at epoch 50 (the whole logs for cifar10/100 are attached below):
test_cifar10_1k_32_alpha005_thres95.log
test_cifar100_1k_32_alpha005_thres95.log

[2019-03-19 16:40:36,527][main_cifar10_bs_1k_32_alpha005_thres95.py][line:385][INFO][rank:0] Epoch: [51][0/60] Time: 3.951 (3.951) Data: 3.921 (3.921)
[2019-03-19 16:40:39,241][main_cifar10_bs_1k_32_alpha005_thres95.py][line:385][INFO][rank:0] Epoch: [51][10/60] Time: 0.079 (0.601) Data: 0.055 (0.576)
[2019-03-19 16:40:40,252][main_cifar10_bs_1k_32_alpha005_thres95.py][line:385][INFO][rank:0] Epoch: [51][20/60] Time: 0.118 (0.361) Data: 0.094 (0.337)
[2019-03-19 16:40:41,783][main_cifar10_bs_1k_32_alpha005_thres95.py][line:385][INFO][rank:0] Epoch: [51][30/60] Time: 0.178 (0.292) Data: 0.155 (0.268)
[2019-03-19 16:40:43,759][main_cifar10_bs_1k_32_alpha005_thres95.py][line:385][INFO][rank:0] Epoch: [51][40/60] Time: 0.212 (0.268) Data: 0.189 (0.244)
[2019-03-19 16:40:46,195][main_cifar10_bs_1k_32_alpha005_thres95.py][line:385][INFO][rank:0] Epoch: [51][50/60] Time: 0.257 (0.262) Data: 0.234 (0.238)
[2019-03-19 16:40:49,632][main_cifar10_bs_1k_32_alpha005_thres95.py][line:227][INFO][rank:0] ARI against ground truth label: 0.407
[2019-03-19 16:40:49,633][main_cifar10_bs_1k_32_alpha005_thres95.py][line:228][INFO][rank:0] NMI against ground truth label: 0.496
[2019-03-19 16:40:49,633][main_cifar10_bs_1k_32_alpha005_thres95.py][line:229][INFO][rank:0] ACC against ground truth label: 0.621
[2019-03-19 16:40:49,633][main_cifar10_bs_1k_32_alpha005_thres95.py][line:230][INFO][rank:0] #######################

It doesn't take that many epochs to reach the reported results. I suggest you check the following: the PyTorch version, the CUDA version, and the NumPy and PyTorch seeds. It's true that the results are quite sensitive; we don't know why yet, but they change a lot under tiny modifications to the configuration, which could be left as potential future work. A lot of clustering-based methods suffer from the same instability. Please let me know if you are still struggling to reproduce the result; maybe I can help check with you.
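
If it helps, here is a minimal sketch of pinning the seeds and printing the versions (standard PyTorch/NumPy calls; the seed value is a placeholder, not the repo's configured seed):

  import random
  import numpy as np
  import torch

  SEED = 0  # placeholder; use the seed from the released config
  random.seed(SEED)
  np.random.seed(SEED)
  torch.manual_seed(SEED)
  torch.cuda.manual_seed_all(SEED)
  torch.backends.cudnn.deterministic = True  # trade speed for repeatability
  torch.backends.cudnn.benchmark = False

  print(torch.__version__, torch.version.cuda, torch.backends.cudnn.version())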


Cory-M commented:

Yes, we can; we tested it many times before releasing...


TsungWeiTsai commented:

I also got a different result. I am using newer versions of PyTorch (1.3) and Keras (2.3.1), but the result seems similar to the one from Kartik. The training speed is also slow: it took me around 1 hour per epoch on a single GeForce RTX 2080 Ti. Besides, I have changed the Keras data format to "channels_first" but still got the warning below; could that be the reason the numbers don't match? Thank you.

Some log for CIFAR10:

[2020-04-16 07:52:12,916][main.py][line:276][INFO][rank:0] Epoch: [10/200] ARI against ground truth label: 0.057
[2020-04-16 07:52:12,916][main.py][line:277][INFO][rank:0] Epoch: [10/200] NMI against ground truth label: 0.112
[2020-04-16 07:52:12,916][main.py][line:278][INFO][rank:0] Epoch: [10/200] ACC against ground truth label: 0.219
/data/anaconda3/envs/torch/lib/python3.7/site-packages/keras_preprocessing/image/numpy_array_iterator.py:127: UserWarning: NumpyArrayIterator is set to use the data format convention "channels_last" (channels on axis 3), i.e. expected either 1, 3, or 4 channels on axis 3. However, it was passed an array with shape (32, 3, 32, 32) (32 channels).
  str(self.x.shape[channels_axis]) + ' channels).')
[2020-04-16 07:53:46,243][main.py][line:238][INFO][rank:0] Epoch: [10/200][0/50] Time: 0.299 (0.306) Data: 1.662 (1.662) Loss: 0.4154 graph: 0.2718 label: 0.0156 local: 1.3074
[2020-04-16 08:09:45,322][main.py][line:238][INFO][rank:0] Epoch: [10/200][10/50] Time: 0.409 (0.319) Data: 0.115 (0.139) Loss: 0.4600 graph: 0.2831 label: 0.0222 local: 1.3135
[2020-04-16 08:25:46,428][main.py][line:238][INFO][rank:0] Epoch: [10/200][20/50] Time: 0.267 (0.331) Data: 0.114 (0.133) Loss: 0.3914 graph: 0.2567 label: 0.0139 local: 1.3009
[2020-04-16 08:41:44,206][main.py][line:238][INFO][rank:0] Epoch: [10/200][30/50] Time: 0.395 (0.305) Data: 0.133 (0.121) Loss: 0.4266 graph: 0.2700 label: 0.0187 local: 1.2642
[2020-04-16 08:57:34,701][main.py][line:238][INFO][rank:0] Epoch: [10/200][40/50] Time: 0.227 (0.286) Data: 0.131 (0.135) Loss: 0.3876 graph: 0.2355 label: 0.0178 local: 1.2629


Cory-M commented:

(Quoting @kartikgupta-at-anu's questions above about reproducibility, the config hyperparameters, and the "alpha" in the log file name.)

@kartikgupta-at-anu Most clustering methods suffer from this problem of instability and large variance in performance, which is one of their known defects. They are commonly very sensitive to initialization and hyperparameters, especially on smaller datasets like CIFAR: the accuracy can vary a lot when part of the samples is misclassified because of a bad initialization.


Cory-M commented:

(Quoting @TsungWeiTsai's comment above.)

@TsungWeiTsai Hi, I checked my log and found that we never encountered this warning before. I am not sure, but it could well be the reason the numbers don't match. I think you could set a breakpoint and check whether the images are augmented correctly by the Keras dataloader; a sketch of such a check is below.
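
A sketch of such a check (the random array stands in for a real CIFAR batch; assumes channels-first Keras preprocessing as in our setup):

  import numpy as np
  from keras.preprocessing.image import ImageDataGenerator

  # Hypothetical stand-in for a real CIFAR batch, in NCHW layout.
  x = np.random.rand(8, 3, 32, 32).astype('float32')

  datagen = ImageDataGenerator(horizontal_flip=True, data_format='channels_first')
  batch = next(datagen.flow(x, batch_size=8, shuffle=False))

  print(batch.shape)               # should stay (8, 3, 32, 32)
  print(np.abs(batch - x).mean())  # > 0 means some images really were flipped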


kartikgupta-at-anu commented:

@Cory-M Since your latest log files do seem to show the correct results, this looks related to some library version mismatch. I have tried using the exact same versions of the libraries. It would be best if you could provide a copy of "pip list" so that I can try the exact same versions of all the other libraries you have not mentioned. Also, let me know which CUDA and cuDNN versions you are using. I am currently using Python 3.6.5, torch 0.4.1, and keras 2.0.2. For reference, here is my pip list:
absl-py 0.9.0
bleach 1.5.0
certifi 2020.4.5.1
cycler 0.10.0
decorator 4.4.2
easydict 1.9
html5lib 0.9999999
imageio 2.8.0
joblib 0.14.1
Keras 2.0.2
kiwisolver 1.2.0
lmdb 0.94
Markdown 3.2.1
matplotlib 3.2.1
networkx 2.4
numpy 1.16.2
opencv-python 4.0.0.21
Pillow 7.1.1
pip 20.0.2
plyfile 0.7
protobuf 3.11.3
pyparsing 2.4.7
python-dateutil 2.8.1
PyWavelets 1.1.1
PyYAML 5.3.1
scikit-image 0.16.2
scikit-learn 0.22.2.post1
scipy 1.4.1
setuptools 46.1.3.post20200330
six 1.14.0
sklearn 0.0
tensorboardX 1.2
tensorflow-gpu 1.5.0
tensorflow-tensorboard 1.5.1
Theano 1.0.4
torch 0.4.1
torchvision 0.2.1
Werkzeug 1.0.1
wheel 0.34.2

Another thing you could do is upload the data directory, at least for cifar10/cifar100, to Google Drive or Dropbox, so that we can copy the exact same setup to reproduce.


kartikgupta-at-anu commented:

I tried with the exact same set of libraries and your dataset, but still could not reproduce the results. If anyone other than @Cory-M manages to reproduce them, please let me know.
I have attached my log file so that others can check whether they get inferior results similar to mine.
cifar10_log (copy).txt

My pip list:
certifi 2020.4.5.1
cffi 1.14.0
cycler 0.10.0
decorator 4.4.2
easydict 1.9
imageio 2.8.0
joblib 0.14.1
Keras 2.0.2
kiwisolver 1.2.0
lmdb 0.94
matplotlib 3.2.1
mkl-fft 1.0.15
mkl-random 1.1.0
mkl-service 2.3.0
networkx 2.4
numpy 1.18.1
olefile 0.46
opencv-python 4.0.0.21
Pillow 6.2.2
pip 20.0.2
plyfile 0.7
protobuf 3.11.3
pycparser 2.20
pyparsing 2.4.7
python-dateutil 2.8.1
PyWavelets 1.1.1
PyYAML 5.3.1
scikit-image 0.16.2
scikit-learn 0.22.2.post1
scipy 1.4.1
setuptools 46.1.3.post20200330
six 1.14.0
sklearn 0.0
tensorboardX 2.0
Theano 1.0.4
torch 0.4.1
torchvision 0.2.1
wheel 0.34.2

