
vpt's People

Contributors

kmnp, tsingularity


vpt's Issues

vtab1k dataset splits

Hi, for the VTAB-1k benchmark we need to use the TensorFlow API to get the exact dataset splits, which is quite hard for people in mainland China.

I was wondering if you could upload the split txt files to this repo? Thanks!

Will the prompts in layer L enter layer L+1?

The forward process is shown in the code below. It seems the prompts will enter the next layer. If that is true, it does not seem consistent with Fig. 2 of the original paper.

```python
def forward_deep_prompt(self, embedding_output):
    attn_weights = []
    hidden_states = None
    weights = None
    B = embedding_output.shape[0]
    num_layers = self.vit_config.transformer["num_layers"]

    for i in range(num_layers):
        if i == 0:
            # Layer 0 consumes the input sequence [CLS, prompts_0, patches].
            hidden_states, weights = self.encoder.layer[i](embedding_output)
        else:
            if i <= self.deep_prompt_embeddings.shape[0]:
                deep_prompt_emb = self.prompt_dropout(self.prompt_proj(
                    self.deep_prompt_embeddings[i-1]).expand(B, -1, -1))

                # Replace positions 1 .. 1+num_tokens (the prompt outputs of
                # layer i-1) with layer i's own prompt embeddings; the CLS and
                # patch outputs from layer i-1 are kept.
                hidden_states = torch.cat((
                    hidden_states[:, :1, :],
                    deep_prompt_emb,
                    hidden_states[:, (1 + self.num_tokens):, :]
                ), dim=1)

            hidden_states, weights = self.encoder.layer[i](hidden_states)

        if self.encoder.vis:
            attn_weights.append(weights)

    encoded = self.encoder.encoder_norm(hidden_states)
    return encoded, attn_weights
```
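
For reference, a minimal sketch (with hypothetical shapes and tensor names) of what the concatenation above does before each layer i > 0: the prompt outputs from layer i-1 are dropped and replaced with freshly projected prompt embeddings for layer i, while the CLS and patch outputs are kept.

```python
import torch

# Hypothetical shapes: B=2 images, 1 CLS token, P=5 prompt tokens, N=196 patch tokens, D=768.
B, P, N, D = 2, 5, 196, 768
hidden_states = torch.randn(B, 1 + P + N, D)   # output of layer i-1
layer_i_prompts = torch.randn(P, D)            # learned prompts for layer i (before expand)

new_input = torch.cat((
    hidden_states[:, :1, :],                   # CLS output from layer i-1
    layer_i_prompts.expand(B, -1, -1),         # fresh prompts for layer i
    hidden_states[:, 1 + P:, :],               # patch outputs from layer i-1
), dim=1)
assert new_input.shape == hidden_states.shape
```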

Core dumped?

[screenshot of the error]
What does this error mean, and how can I solve it? Thanks.

GPU issue

Can you provide a solution for initializing the model with multiple GPUs?
When using the code with multiple GPUs I get this error:
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
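
Not the repo's official fix, just a minimal sketch of what the error message asks for: DistributedSampler (used when NUM_GPUS > 1) needs the default process group, which is created by torch.distributed.init_process_group before the data loaders are built (RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT are assumed to be set by a launcher such as torchrun):

```python
import os
import torch
import torch.distributed as dist

def setup_distributed():
    # Assumes the process was started by a launcher (e.g. torchrun) that sets
    # RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    return local_rank

# After this, DistributedSampler(dataset) and
# torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank]) can be used.
```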

Several warnings from TensorFlow during VTAB experiments

Thanks for the wonderful code and paper!

I am trying to run the VTAB experiments, but I get the messages below.

The code runs on the server, but it seems to mostly use the CPU and the performance is poor (I follow vtab-dmlab in your demo.md, but the accuracy is around 20%).

Do these messages affect the performance?

Thanks for your help in advance :)

```
[01/24 22:36:29 visual_prompt]: Loading training data (final training data for vtab)..

2023-01-24 22:36:29.146201: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA

To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

2023-01-24 22:36:29.319868: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDN N_OPTS=0.

2023-01-24 22:36:30.140945: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory;

2023-01-24 22:36:30.141141: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory;

2023-01-24 22:36:30.141192: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

2023-01-24 22:36:32.136349: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory;

2023-01-24 22:36:32.136564: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory;

2023-01-24 22:36:32.136631: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
```

All results for each dataset in VTAB-1k

Hello, I want to know whether you have per-dataset results with CNN backbones, because I am researching prompt learning with CNNs. If you do, I would be thankful.

Ensembling seems to degrade the performance on CIFAR-100

Hi, thanks for your great work and thorough experiments. In Fig. 15 the results show that ensembling can improve the performance of VPT. I reimplemented the ensemble method in a quick way: I trained two sets of visual prompts in parallel on the CIFAR-100 dataset, and during testing I directly ensembled those prompts, i.e., used 2x prompts to perform inference. However, the ensemble actually degraded the performance: the two sets of visual prompts (6 tokens each) achieved accuracies of 80.36 and 80.85 respectively, but after ensembling them (6*2 = 12 tokens) the accuracy was 74.75. That is a little strange to me because the ensemble result is much worse than the separate results. Could you please share some ideas on this phenomenon? Thanks a lot!

By the way, the best number of tokens for VPT differs across datasets, and it seems that too many prompts can sometimes degrade performance. I also noticed that 5 sets of different prompts were ensembled in the paper, which means the number of prompts might be up to 500. Would that degrade the final performance?
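
For context, a minimal sketch of the concatenation-style ensemble described in the question (random tensors stand in for the two trained prompt sets; this is the questioner's re-implementation, not necessarily the paper's ensembling protocol):

```python
import torch

# Stand-ins for two independently trained shallow prompt sets (6 tokens each, 768-d).
prompts_a = torch.randn(6, 768)
prompts_b = torch.randn(6, 768)

# The "quick" ensemble described above: concatenate along the token dimension,
# so the frozen backbone sees 6 + 6 = 12 prompt tokens at inference time.
ensembled_prompts = torch.cat([prompts_a, prompts_b], dim=0)
print(ensembled_prompts.shape)  # torch.Size([12, 768])
```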

Tune with scaled batch size

Hi Dr. Jia,

I noticed that in tune_fgvc.py / tune_vtab.py the learning rate is scaled as `lr = lr / 256 * cfg.DATA.BATCH_SIZE` when choosing the best learning rate, while in train.py this operation is not used (lr is kept as [5, 10, 50, etc.]). I wonder what the reason for this is. Are the reported results based on the unscaled learning rate? (lr is given in your fgvc excel file)
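
For concreteness, a small numeric illustration of the scaling rule quoted above (the learning-rate list and batch size are just the example values from the question):

```python
base_lrs = [5, 10, 50]     # unscaled values, as swept in train.py
batch_size = 128           # example batch size

# Linear scaling rule used in tune_fgvc.py / tune_vtab.py:
scaled = [lr / 256 * batch_size for lr in base_lrs]
print(scaled)              # [2.5, 5.0, 25.0]
```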

Best,

Metric for the total parameters

Thanks for the great work!

I noticed that you are using "Total params" as a metric in your paper to measure the trainable parameters, as below.

[screenshot of the table]

However, I am quite confused about how these numbers are derived. For example, LINEAR only tunes the classification heads, so the total trainable parameters should be (sum_of_classes * 768 + sum_of_classes = 0.72M) for the 19 VTAB datasets added together, while FULL should have (85.8 * 19 = 1630.2M) trainable parameters. That seems quite far from 1.01x for LINEAR and 19.01x for FULL fine-tuning.
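
To make the arithmetic in the question explicit, here is a small sketch of how head-only trainable parameters would be counted under the question's assumptions (768-d ViT-B features, bias included; per-task class counts are not listed here):

```python
def linear_head_params(num_classes, feat_dim=768):
    # Classification head: weight (num_classes x feat_dim) + bias (num_classes).
    return num_classes * feat_dim + num_classes

# Summing linear_head_params(c) over the class counts of the 19 VTAB-1k tasks gives
# the ~0.72M figure from the question; full fine-tuning of ViT-B would instead tune
# roughly 85.8M * 19 = 1630.2M parameters across the 19 tasks.
print(linear_head_params(100))  # e.g. a 100-class head: 76,900 parameters
```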

Besides, is it possible to share the number of prompts used for each task to obtain the results in Table 4?

Kind regards,
Charles

Training with concat-channel method

Thank you for your good work; I like how the paper validated many ideas, particularly for transformers.

I was also intrigued by some of the ablation studies, especially the various locations for the prompt. I was trying to replicate all of them, but it is not obvious to me how to train the version with an additional channel concatenated to the input image while keeping the embedding layer unfrozen. Could the team provide some advice on this?

What is the relationship between train.py and tune*.py?

Hi, in my understanding train.py trains the prompts and heads. I think this is a fine-tuning process, so what is the role of tune*.py? I have read the explanation in README.md ("call this one for tuning learning rate and weight decay for a model with a specified transfer type"), but I am still a bit confused. Is tune*.py necessary for every task? And is there an order between train.py and tune*.py?

How to use multiple GPUs?

If I set NUM_GPUS = 2, I get the following error.
Could you please tell me how to use multiple GPUs?

Traceback (most recent call last):
  File "train.py", line 132, in <module>
    main(args)
  File "train.py", line 127, in main
    train(cfg, args)
  File "train.py", line 102, in train
    train_loader, val_loader, test_loader = get_loaders(cfg, logger)
  File "train.py", line 69, in get_loaders
    train_loader = data_loader.construct_trainval_loader(cfg)
  File "/home/haoc/wangyidong/vpt/src/data/loader.py", line 79, in construct_trainval_loader
    drop_last=drop_last,
  File "/home/haoc/wangyidong/vpt/src/data/loader.py", line 39, in _construct_loader
    sampler = DistributedSampler(dataset) if cfg.NUM_GPUS > 1 else None
  File "/home/haoc/miniconda3/envs/prompt/lib/python3.7/site-packages/torch/utils/data/distributed.py", line 65, in __init__
    num_replicas = dist.get_world_size()
  File "/home/haoc/miniconda3/envs/prompt/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 638, in get_world_size
    return _get_group_size(group)
  File "/home/haoc/miniconda3/envs/prompt/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 220, in _get_group_size
    _check_default_pg()
  File "/home/haoc/miniconda3/envs/prompt/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 211, in _check_default_pg
    "Default process group is not initialized"
AssertionError: Default process group is not initialized

What is the use of the grid search?

Hi,

Thanks for the great work!

[screenshot of the code]

In tune_vtab.py, I noticed the code above. I am confused about its purpose, since the best lr and wd are not saved for the final 5 runs.
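
For context, a generic sketch of the usual lr/wd grid-search pattern being discussed (an illustration only, not the repo's tune_vtab.py logic): each (lr, wd) pair is evaluated on validation, the best pair is kept, and the final runs reuse it.

```python
import itertools

def grid_search(train_and_eval, lrs, wds):
    """train_and_eval(lr, wd) -> validation accuracy. Returns the best (lr, wd) pair."""
    best, best_acc = None, float("-inf")
    for lr, wd in itertools.product(lrs, wds):
        acc = train_and_eval(lr, wd)
        if acc > best_acc:
            best, best_acc = (lr, wd), acc
    return best

# The selected (lr, wd) would then be reused for the final runs (e.g. 5 seeds).
```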

Kind regards,

Total tunable parameters

[screenshot of the table]
Hello, I'd like to ask a question: what is the unit of "Total params" in the experiment table? Is it "M"? For example, the "Total params" of VPT-deep is 1.18x; does that mean 1.18M?

Question about the training speed acceleration.

Dear authors,

I sincerely thank you for your impressive work and invaluable contributions. I recently had the opportunity to read your paper, and I have a question regarding the practical implementation of VPT: specifically, whether VPT actually trains faster in practice.

As you know, VPT relies on a learnable shallow visual prompt, which necessitates a full gradient backward pass through the backbone during training. However, this full backward pass can be more time-consuming than fine-tuning only the head layer. Consequently, I am curious whether VPT's training time is comparable to that of full fine-tuning.
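
A minimal runnable sketch of the point above (a toy frozen "backbone" stands in for the ViT): because the prompts sit at the input, loss.backward() still has to traverse every frozen block to reach them, whereas head-only tuning could stop at the extracted features.

```python
import torch
import torch.nn as nn

# Toy frozen "backbone": no weight gradients are stored for it...
backbone = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 768))
for p in backbone.parameters():
    p.requires_grad = False

prompts = nn.Parameter(torch.zeros(1, 768))   # ...but the learnable prompt is at the input,
head = nn.Linear(768, 10)

x = torch.randn(1, 768)
loss = head(backbone(x + prompts)).sum()
loss.backward()                                # so backward still runs through the backbone
print(prompts.grad is not None)                # True
```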

I would greatly appreciate it if you could kindly confirm whether my understanding is correct. Thank you again for your time and contribution to the field.

Best regards,
Shun Lu

Is train.py not for pretraining?

I have seen the issue, so I wonder: does this repo not provide any script or code for pretraining (supervised pretraining, or MAE/MoCo pretraining on imagenet22k)?

Instead, does this repo directly load the pretrained weights and apply the various kinds of tuning?
So, is train.py the script for tuning on the VTAB-Structured, VTAB-Natural, and VTAB-Specialized subsets?
And are tune_vtab.py and tune_fgvc.py also for tuning on the vtab-caltech101 subset and the CUB dataset, respectively?

Any clarification will be appreciated!!

Optimal hyperparameters

Hello!
The paper contains information that the optimal hyperparameter values for each experiment can be found in Appendix C.
However, there is no such information in Appendix C.
Could you share the optimal hyperparameter values for each experiment to save compute power required for grid search?

How to train successfully on CIFAR-100

I have tried for a long time to train on CIFAR-100, but it still does not work. Please help me run it successfully, thank you very much!

1. I have downloaded the CIFAR-100 dataset: [screenshot]

2. I have downloaded the pretrained model and renamed it: [screenshot]

3. Config files: vpt/configs/prompt/cifar100.yaml [screenshot] and vpt/src/configs/config.py [screenshot]

4. Train the model: /vpt/run.sh [screenshot]

5. Error log: [screenshot]

How to process VTAB

Hi, I am seeking help with processing the VTAB datasets.

Since it is not convenient for me to directly run your code to download the VTAB datasets, I borrowed the extracted VTAB datasets from someone else, where each folder (19 datasets in total) contains the following files:

images
--train800
----000000.jpg
----000001.jpg
...
--train800val200
--val200
--test
test.txt
train800.txt
train800val200.txt
val200.txt

where each *.txt file contains an image path and a label, such as 'images/train800/000000.jpg 307'. Is this the correct format for the data?

Further, even though I have the data, when I run your code it still tries to download the data using TF, which is intractable for me. Hence, is there any way to run your code when I already have the data (assuming it is in the correct format)?
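
Not part of the repo, just a sketch (under the assumption that the borrowed folders really contain images/ plus train800.txt-style files of "path label" lines) of how such splits could be read directly, bypassing the TFDS download:

```python
import os
from PIL import Image
from torch.utils.data import Dataset

class VtabTxtSplit(Dataset):
    """Reads 'relative/path.jpg label' lines, e.g. 'images/train800/000000.jpg 307'."""

    def __init__(self, root, split_file, transform=None):
        self.root, self.transform = root, transform
        with open(os.path.join(root, split_file)) as f:
            lines = [ln.strip().split() for ln in f if ln.strip()]
        self.samples = [(path, int(label)) for path, label in lines]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        img = Image.open(os.path.join(self.root, path)).convert("RGB")
        return (self.transform(img) if self.transform else img), label
```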

Thanks!

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

When I follow demo.ipynb and run the following:

```python
for seed in ["42", "44", "82", "100", "800"]:
    model_type = f"adapter_{r}"
    files = glob.glob(f"{root}/seed{seed}/*/sup_vitb16_imagenet21k/*/*/{LOG_NAME}")
    for f in files:
        df = get_df(files, f"seed{seed}", root, is_best=False, is_last=True)
        if df is None:
            continue
        df["seed"] = seed
    df_list.append(df)
```

there was an error:

```
Traceback (most recent call last):
  File "/home/isalab303/.conda/envs/vpt/lib/python3.7/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 4, in <module>
  File "/home/isalab303/GY/vpt-main/src/utils/vis_utils.py", line 161, in get_df
    for job_path in tqdm(files, desc=model_type):
  File "/home/isalab303/GY/vpt-main/src/utils/vis_utils.py", line 120, in get_training_data
    # "lr": float(lr) * 256 / int(batch_size),
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
```

What should be done to fix this problem?
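
Not the official fix, only a sketch of the kind of guard that avoids the crash when the parsed log is missing the batch size (hypothetical helper; the real issue may simply be that the log files under root do not contain the expected training info):

```python
def safe_scaled_lr(lr, batch_size):
    """Return the de-scaled lr, or None if the parsed log was missing either field."""
    if lr is None or batch_size is None:
        return None
    return float(lr) * 256 / int(batch_size)

print(safe_scaled_lr("0.25", "64"))   # 1.0
print(safe_scaled_lr("0.25", None))   # None instead of a TypeError
```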

About the pre-trained ViT model

Hello, why do you pre-train the ViT yourself rather than use the ViT pre-trained by timm or Google? It seems that using their ViT would be more convenient.

How to apply data augmentation for VTAB

Hi, I want to apply some data augmentation to the VTAB datasets using torchvision.transforms.
Does your repo support that? If yes, how should it be implemented?
Thanks a lot!
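
Not an answer about the repo's own pipeline, just a minimal torchvision sketch of the kind of augmentation being asked about (the normalization statistics are example values); it would have to be wired into wherever the VTAB loader builds its transforms:

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # example stats
])
```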

AssertionError: Datasets/Stanford_Dogs/train.json dir not found

I ran this command and got an error:

Tune CUB with VPT:

```
python tune_fgvc.py \
    --train-type "prompt" \
    --config-file configs/prompt/dogs.yaml \
    MODEL.TYPE "vit" \
    DATA.BATCH_SIZE "16" \
    MODEL.PROMPT.DEEP "True" \
    MODEL.PROMPT.DROPOUT "0.1" \
    MODEL.PROMPT.NUM_TOKENS "10" \
    DATA.FEATURE "sup_vitb16_imagenet21k" \
    DATA.DATAPATH "Datasets/Stanford_Dogs" \
    MODEL.MODEL_ROOT "models/" \
    OUTPUT_DIR "./output/"
```

[06/22 15:18:50 visual_prompt]: Loading training data (final training data for vtab)...
[06/22 15:18:50 visual_prompt]: Constructing StanfordDogs dataset train...
anno_path Datasets/Stanford_Dogs/train.json
```
Traceback (most recent call last):
  File "tune_fgvc.py", line 208, in
    main(args)
  File "tune_fgvc.py", line 199, in main
    prompt_main(args)
  File "tune_fgvc.py", line 165, in prompt_main
    train_main(cfg, args)
  File "/home/multiai3/Jiuqing/VPT/train.py", line 102, in train
    train_loader, val_loader, test_loader = get_loaders(cfg, logger)
  File "/home/multiai3/Jiuqing/VPT/train.py", line 71, in get_loaders
    train_loader = data_loader.construct_train_loader(cfg)
  File "/home/multiai3/Jiuqing/VPT/src/data/loader.py", line 64, in construct_train_loader
    drop_last=drop_last,
  File "/home/multiai3/Jiuqing/VPT/src/data/loader.py", line 36, in _construct_loader
    dataset = _DATASET_CATALOG[dataset_name](cfg, split)
  File "/home/multiai3/Jiuqing/VPT/src/data/datasets/json_dataset.py", line 151, in __init__
    super(DogsDataset, self).__init__(cfg, split)
  File "/home/multiai3/Jiuqing/VPT/src/data/datasets/json_dataset.py", line 34, in __init__
    self._construct_imdb(cfg)
  File "/home/multiai3/Jiuqing/VPT/src/data/datasets/json_dataset.py", line 59, in _construct_imdb
    anno = self.get_anno()
  File "/home/multiai3/Jiuqing/VPT/src/data/datasets/json_dataset.py", line 46, in get_anno
    assert os.path.exists(anno_path), "{} dir not found".format(anno_path)
AssertionError: Datasets/Stanford_Dogs/train.json dir not found
```

I searched for this error in the Issues but found no result. If you are available, could you tell me why I get this error?

value definition for prompts

val = math.sqrt(6. / float(3 * reduce(mul, self.patch_embed.patch_size, 1) + self.embed_dim)) # noqa

Hello! The value defined in the line above does not seem to be clearly explained in the paper. Could you explain why you define the value this way? Many thanks in advance.
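
For reference, this looks like Xavier (Glorot) uniform initialization with the patch-embedding filter treated as the layer: fan_in = 3 * patch_height * patch_width (an RGB patch) and fan_out = embed_dim, giving the bound sqrt(6 / (fan_in + fan_out)). A small numeric check under that reading (ViT-B/16 values assumed; the prompt embeddings then appear to be drawn from U(-val, val)):

```python
import math

patch_size = (16, 16)   # ViT-B/16 patches
embed_dim = 768

fan_in = 3 * patch_size[0] * patch_size[1]      # 3 * 16 * 16 = 768
bound = math.sqrt(6.0 / (fan_in + embed_dim))   # Xavier-uniform bound
# The prompt embeddings would then be sampled uniformly from (-bound, bound),
# e.g. nn.init.uniform_(prompt_embeddings, -bound, bound).
print(round(bound, 4))                          # 0.0625
```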

The tunable parameters of VPT+Bias for Semantic Segmentation

Can you provide more details of different methods and their tunable parameters in Table 4 for semantic segmentation?

Except for header parameters:
1) the tunable parameter number of BIAS is 13.46 - 13.18 = 0.28M
2) the tunable parameter number of VPT is 13.43 - 13.18 = 0.25M
3) why is the tunable parameter number of VPT+BIAS 15.79 - 13.18 = 2.61M, rather than 0.28 + 0.25 = 0.53M?

It seems to me that BIAS was reimplemented based on the paper [5] (fine-tunes only the bias terms).
However, was VPT+BIAS reimplemented based on the paper [8] (fine-tunes the bias terms and introduces some lightweight residual layers)?

[screenshots]

About when to add prompt parameters

I have a question: when training the VPT model, are the prompt parameters added from scratch during pre-training, or are they added during fine-tuning on top of the pre-trained ViT model? I'm looking forward to your response, thanks.

How did you set the random seed for Table 1?

Hi,

I'm reproducing your results in Table 1 with the hyper-parameters reported in the shared CSV file on Google Drive.
I train all models on an A100 GPU, but there is a small gap from the reported performance.

  1. So, I'm curious how you set the random seed for Table 1.

Also, when I ran the vtab-svhn dataset with random seeds [42, 44, 82, 100, 800], I got unstable top-1 results as follows:

seed42 : 79.78
seed44 : 65.91
seed82 : 80.37
seed100 : 81.75
seed800 : 81.47

and the command is as follows:

```
for seed in "42" "44" "82" "100" "800"; do
    CUDA_VISIBLE_DEVICES=3 python3 train.py \
        --config-file configs/prompt/cub.yaml \
        MODEL.TYPE "vit" \
        DATA.BATCH_SIZE "128" \
        MODEL.PROMPT.NUM_TOKENS "50" \
        MODEL.PROMPT.DEEP "True" \
        MODEL.PROMPT.DROPOUT "0.1" \
        DATA.FEATURE "sup_vitb16_imagenet21k" \
        DATA.NAME 'vtab-svhn' \
        DATA.NUMBER_CLASSES "10" \
        SOLVER.BASE_LR "1.25" \
        SOLVER.WEIGHT_DECAY "0.0" \
        SEED ${seed} \
        MODEL.MODEL_ROOT "./weights" \
        DATA.DATAPATH "./dataset" \
        OUTPUT_DIR "output/seed${seed}"
done
```

  2. So, I'm curious whether visual prompt tuning is sensitive to hyper-parameters (e.g. the random seed), or is my command wrong?

Stanford Cars dataset split

I am unable to download the Stanford Cars dataset from the official source. The provided download link (http://ai.stanford.edu/~jkrause/cars/car_dataset.html) is currently not accessible; it seems the URL is either outdated or the dataset is no longer publicly available. Could you kindly provide an alternative download source?

Additionally, I have come across a shared version of the Stanford Cars dataset from another user. However, I noticed that the name of each image in this shared version does not match the splits (train, test, validation) in the dataset documentation. It would be greatly appreciated if the split information could be provided, or instructions on how to build the splits.

Thank you for your understanding and support.

How can I tune the FGVC or VTAB datasets as reported in Table 1?

Thanks for the great work. I am trying to rerun some of the experiments in Table 1. As I have already gone through the FGVC datasets separately, how can I tune the FGVC datasets as a whole, as in Table 1? The same question applies to VTAB-1k.

Plus: I am reproducing some of your work with a command like:
CUDA_VISIBLE_DEVICES=1 PORT=20000 python train.py --config-file /vpt/configs/prompt/cars.yaml MODEL.TRANSFER_TYPE "prompt" MODEL.PROMPT.DEEP "True" MODEL.PROMPT.NUM_TOKENS "10" MODEL.PROMPT.DROPOUT "0.0"
(For FGVC stanford-Cars as an example)

Can you provide the config lines, or did I miss something important?

Thanks.

BTW, there seems to be a typo in demo.ipynb for tune*.py: when tuning vtab-caltech101, is there a reason the config file is still cub.yaml? I am confused about that.

VTAB-1k dataset accuracy

Hello,
I have run the VTAB-1k experiment on three datasets, but the results differ considerably from the paper: CIFAR-100 gives 72.4, smallnorb/azimuth gives 15.7, and smallnorb/elevation gives 22.6.
I don't know why. Is my config file wrong? My config file is as follows:

```yaml
NUM_GPUS: 1
NUM_SHARDS: 1
OUTPUT_DIR: ""
RUN_N_TIMES: 1
MODEL:
  TRANSFER_TYPE: "prompt"
  TYPE: "vit"
  LINEAR:
    MLP_SIZES: []
SOLVER:
  SCHEDULER: "cosine"
  PATIENCE: 300
  LOSS: "softmax"
  OPTIMIZER: "sgd"
  MOMENTUM: 0.9
  WEIGHT_DECAY: 0.0001
  LOG_EVERY_N: 100
  WARMUP_EPOCH: 10
  TOTAL_EPOCH: 100
DATA:
  NAME: "vtab-cifar(num_classes=100)"
  NUMBER_CLASSES: 100
  DATAPATH: "/home/vpt/dataset"
  FEATURE: "sup_vitb16_224"
  BATCH_SIZE: 128
```

About Stanford Cars

[screenshot]

Hi, the length of the image ID in Stanford Cars is 6, but in Kaggle it is 5. Besides, there are two folders in Kaggle, namely cars_train and cars_test, which confuses me. In your '.json' split you refer to car_ims. What does car_ims mean?

Why can visual prompts outperform full fine-tuning?

Thanks for this wonderful work. The paper contains a lot of details, but I still want to know why the learned visual prompts can achieve such good performance, even outperforming full fine-tuning. I am confused about it and wondering if you can help resolve this.

