
Comments (4)

dlmdlmacy commented on July 21, 2024

OMG!! Congratulations!!!! And thank you for your detailed explanations!!!!!

from mind-vis.

zjc062 commented on July 21, 2024

Thank you for your question and kind words! We're glad that you find our work helpful.

Regarding your question about SC-MBM, we did not specifically test pretraining the model using only the paired dataset of 6360 fMRI samples. However, we did find that using a large dataset was crucial for achieving good results. In our experiments, not pretraining the model at all (as shown in Table 1, Models 1 & 2) resulted in poor performance.

In our experience, the number of training samples is important for effective representation learning, and using a much smaller dataset may limit the ability of the model to learn meaningful representations. However, if you do decide to use the 6360 fMRI samples for pretraining, data augmentation may help to increase the effective size of the dataset. Additionally, you may need to adjust the size of the encoder/decoder in SC-MBM.
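To make the data-augmentation suggestion concrete, here is a minimal sketch of expanding a small paired fMRI set with simple perturbations. This is illustrative only, not code from the mind-vis repository: the function names, the additive-Gaussian-noise and random voxel-masking augmentations, and all hyperparameter values are assumptions for demonstration.

```python
import numpy as np

def augment_fmri(sample, noise_std=0.05, mask_ratio=0.1, rng=None):
    """Return an augmented copy of a 1-D fMRI voxel vector.

    Two simple augmentations: additive Gaussian noise and random
    voxel dropout (zero-masking). Both hyperparameters are
    illustrative, not values from the paper.
    """
    rng = rng or np.random.default_rng()
    out = sample + rng.normal(0.0, noise_std, size=sample.shape)
    mask = rng.random(sample.shape) < mask_ratio  # voxels to zero out
    out[mask] = 0.0
    return out

def expand_dataset(samples, k=4, rng=None):
    """Grow (n, v) paired data by k augmented copies per sample."""
    rng = rng or np.random.default_rng(0)
    augmented = [augment_fmri(s, rng=rng) for s in samples for _ in range(k)]
    return np.concatenate([samples, np.stack(augmented)], axis=0)
```

With k=4 this turns 6360 paired samples into 31800 training vectors, though augmented copies are of course less diverse than genuinely new subjects.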

Good luck with your project!


dlmdlmacy commented on July 21, 2024


Thank you so much for your detailed answer and useful suggestions! I have two more questions, thank you for your time!

  1. If I want to cite this paper, are there any citations available besides the arXiv link?
  2. I found that this line of code is executed when I ran stageA2_mbm_finetune.py without changing anything:
    test_set.fmri = test_set.fmri[:, :num_voxels]

    I assume this is because the provided pretrains/GOD/fmri_encoder was pretrained solely on the HCP dataset, whose data length is 4192, while GOD sub-3's data length is 4656. However, since Appendix E demonstrates that trimming may not be the best way to handle unequal lengths, I was wondering whether executing this line could harm the quality of fine-tuning.
    Thank you again! It would be a more beautiful world if every research team could be as professional and patient as yours!


zjc062 commented on July 21, 2024

Hahaha, thank you for the compliment!

Yes, we updated the citation as below:

@InProceedings{Chen_2023_CVPR,
author = {Chen, Zijiao and Qing, Jiaxin and Xiang, Tiange and Yue, Wan Lin and Zhou, Juan Helen},
title = {Seeing Beyond the Brain: Masked Modeling Conditioned Diffusion Model for Human Vision Decoding},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2023}
}

For the second question, most subjects have fewer voxels than the HCP data. That's why we trim the end slightly to handle the unequal lengths. Although there may be some differences, we do not think it will be harmful to fine-tuning. In fact, we found that the number of voxels per subject does not have a significant impact on performance: the number of voxels in the BOLD5000 dataset ranges from 1k to 2k, yet the performance is still quite good.
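The length-matching step under discussion can be sketched as follows. This is an illustrative helper, not the repository's exact code: it assumes fMRI data as a 2-D array of shape (n_samples, n_voxels), trims the voxel tail when a subject has more voxels than the pretrained encoder expects (e.g. GOD sub-3's 4656 vs. HCP's 4192), and zero-pads when it has fewer.

```python
import numpy as np

def match_num_voxels(fmri, num_voxels):
    """Trim or zero-pad the voxel axis so every sample has num_voxels.

    fmri: array of shape (n_samples, n_subject_voxels).
    Trimming mirrors the quoted line from stageA2_mbm_finetune.py;
    zero-padding is the alternative for subjects with fewer voxels.
    """
    n_samples, n = fmri.shape
    if n >= num_voxels:
        return fmri[:, :num_voxels]  # drop the tail voxels
    pad = np.zeros((n_samples, num_voxels - n), dtype=fmri.dtype)
    return np.concatenate([fmri, pad], axis=1)
```

Either branch yields a fixed-width input for the encoder; the answer above suggests the exact voxel count matters less than one might expect.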

Please let us know if there's any further issue :)

