hadamard-matrix-for-hashing's Introduction

Code for the paper: Central Similarity Quantization for Efficient Image and Video Retrieval (arXiv)

We release all code and configurations for image hashing.

Update: video hashing code has been released here

Prerequisites

Ubuntu 16.04

NVIDIA GPU + CUDA and the corresponding PyTorch framework (v0.4.1)

Python 3.6

Datasets

  1. Download the database file for the ImageNet retrieval list from the anonymous link here, and put database.txt in 'data/imagenet/'.

  2. Download MS COCO, ImageNet2012, and NUS_WIDE from their official websites: COCO, ImageNet, NUS_WIDE. Unzip all data and put it in 'data/dataset_name/'.

Hash center (target)

We provide the hash centers used for ImageNet in 'data/imagenet/hash_centers'. The method to generate hash centers is given in the tutorial: Tutorial_ hash_center_generation.ipynb
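For a quick picture of the idea (a minimal sketch, not the notebook's exact code): the rows of a K×K Hadamard matrix are mutually orthogonal ±1 vectors whose pairwise Hamming distance is exactly K/2, so they and their negations give up to 2K well-separated hash centers:

```python
# Minimal sketch of Hadamard-based hash centers. Assumes bit is a power of 2
# and n_class <= 2 * bit; the notebook additionally handles random sampling
# when there are more classes than Hadamard rows.
import numpy as np
from scipy.linalg import hadamard

def get_hash_centers(n_class, bit):
    H_K = hadamard(bit)                    # bit x bit matrix of +-1 entries
    H_2K = np.concatenate((H_K, -H_K), 0)  # 2*bit candidate centers
    assert n_class <= H_2K.shape[0]
    return H_2K[:n_class]                  # one +-1 center per class

centers = get_hash_centers(100, 64)        # e.g. 100 ImageNet classes, 64 bits
```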

Test

Pretrained models are available on Google Drive, or you can directly download them from the release.

Generating hash codes for the database will take a long time because of its large size.

Test for imagenet:

Download the pre-trained model 'imagenet_64bit_0.8734_resnet50.pkl' for ImageNet, put it in 'data/imagenet/', then run:

python test.py --data_name imagenet --gpus 0,1  --R 1000  --model_name 'imagenet_64bit_0.8734_resnet50.pkl' 

Test for coco:

Download the pre-trained model 'coco_64bit_0.8612_resnet50.pkl' for COCO, put it in 'data/coco/', then run:

python test.py --data_name coco --gpus 0,1  --R 5000  --model_name 'coco_64bit_0.8612_resnet50.pkl' 

Test for nus_wide:

Download the pre-trained model 'nus_wide_64bit_0.8391_resnet50.pkl' for NUS_WIDE, put it in 'data/nus_wide/', then run:

python test.py --data_name nus_wide --gpus 0,1  --R 5000  --model_name 'nus_wide_64bit_0.8391_resnet50.pkl' 
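The --R flag is the cut-off for MAP@R (1000 for ImageNet, 5000 for COCO and NUS_WIDE). For reference, here is a minimal sketch of MAP@R over Hamming ranking, assuming ±1 codes and multi-hot label arrays; this is assumed evaluation logic, not the repo's exact test.py code:

```python
import numpy as np

def map_at_r(q_codes, q_labels, db_codes, db_labels, R):
    """Mean average precision over the top-R database items by Hamming rank."""
    APs = []
    for code, label in zip(q_codes, q_labels):
        # Hamming distance from the inner product of +-1 codes
        dist = 0.5 * (db_codes.shape[1] - db_codes @ code)
        topR = np.argsort(dist)[:R]
        rel = (db_labels[topR] @ label) > 0   # relevant = shares >= 1 label
        if rel.sum() == 0:
            APs.append(0.0)
            continue
        prec = np.cumsum(rel) / np.arange(1, R + 1)
        APs.append(float((prec * rel).sum() / rel.sum()))
    return float(np.mean(APs))
```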

The retrieval MAP on the three datasets is shown below:

Dataset    MAP(16bit)  MAP(32bit)  MAP(64bit)
ImageNet   0.851       0.865       0.873
MS COCO    0.796       0.838       0.861
NUS_WIDE   0.810       0.825       0.839

Train

Train on imagenet, hash bits: 64

The trained model will be saved in 'data/imagenet/models/'.

python train.py --data_name imagenet --hash_bit 64 --gpus 0,1 --model_type resnet50 --lambda1 0  --lambda2 0.05  --R 1000

Train on coco, hash bits: 64

The trained model will be saved in 'data/coco/models/'.

python train.py --data_name coco --hash_bit 64 --gpus 0,1 --model_type resnet50 --lambda1 0  --lambda2 0.05 --multi_lr 0.05  --R 5000

Train on nus_wide, hash bits: 64

The trained model will be saved in 'data/nus_wide/models/'.

python train.py --data_name nus_wide --hash_bit 64 --gpus 0,1 --model_type resnet50 --lambda1 0  --lambda2 0.05  --multi_lr 0.05 --R 5000

AlexNet as backbone.

Pretrained models for AlexNet are available here. Pre-trained models for COCO will be provided in the future.

The retrieval MAP on ImageNet and NUS_WIDE is shown below:

Dataset    MAP(16bit)  MAP(32bit)  MAP(64bit)
ImageNet   0.601       0.653       0.695
NUS_WIDE   0.744       0.785       0.789

Train on ImageNet, 16bit

python train.py --data_name imagenet --hash_bit 16 --gpus 2 --model_type Alexnet --lambda1 0  --lambda2 0.001  --R 1000 --eval_frequency 1 --lr 0.0001

Train on ImageNet, 32bit

python train.py --data_name imagenet --hash_bit 32 --gpus 2 --model_type Alexnet --lambda1 0  --lambda2 0.001  --R 1000 --eval_frequency 1 --lr 0.0001

Train on ImageNet, 64bit

python train.py --data_name imagenet --hash_bit 64 --gpus 2 --model_type Alexnet --lambda1 0  --lambda2 0.0001  --R 1000 --eval_frequency 1 --lr 0.0001

Train on NUS_WIDE, 16bit

python train.py --data_name nus_wide --hash_bit 16 --gpus 2 --model_type Alexnet --lambda1 0  --lambda2 0.001  --R 5000 --eval_frequency 1 --lr 0.0001

Train on NUS_WIDE, 32bit

python train.py --data_name nus_wide --hash_bit 32 --gpus 2 --model_type Alexnet --lambda1 0  --lambda2 0.001  --R 5000 --eval_frequency 1 --lr 0.0001

Train on NUS_WIDE, 64bit

python train.py --data_name nus_wide --hash_bit 64 --gpus 2 --model_type Alexnet --lambda1 0  --lambda2 0.001  --R 5000 --eval_frequency 1 --lr 0.0001

Reference

If you find this repo useful, please consider citing:

@inproceedings{yuan2020central,
  title={Central Similarity Quantization for Efficient Image and Video Retrieval},
  author={Yuan, Li and Wang, Tao and Zhang, Xiaopeng and Tay, Francis EH and Jie, Zequn and Liu, Wei and Feng, Jiashi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3083--3092},
  year={2020}
}


hadamard-matrix-for-hashing's Issues

About loss function

Why is the code different from the loss function in the paper? Q_loss is completely different from L_Q in the paper.
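For context, the paper's objective is L_C + λ·L_Q, where L_C is a binary cross-entropy pulling each output toward its class's hash center and L_Q is a log-cosh quantization penalty. Below is a minimal sketch of the paper's formulation, which, as this issue notes, may differ from what the repo's code actually computes; λ is presumably what --lambda2 sets in the training commands above:

```python
# Hedged sketch of the CSQ objective as written in the paper; not
# necessarily the repo's implementation (which is what this issue asks about).
import torch
import torch.nn.functional as F

def csq_loss(h, centers, lam=0.05):
    # h: network outputs in (-1, 1); centers: matching rows of +-1 hash centers
    p = 0.5 * (h + 1)        # map outputs to (0, 1) for the BCE term
    t = 0.5 * (centers + 1)  # binary targets in {0, 1}
    L_C = F.binary_cross_entropy(p.clamp(1e-6, 1 - 1e-6), t)
    L_Q = torch.log(torch.cosh(h.abs() - 1)).mean()  # quantization penalty
    return L_C + lam * L_Q
```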

pairwise loss

Hi,

The paper doesn't use any pairwise loss, so why does the implementation differ from what the paper claims?

Thanks in advance.

Video Hashing

Thanks for the great work!
I'm interested in your video hashing work. Can you release the video hashing code?
My email is [email protected]. I hope we can communicate.

class num more than 40,000

Hi, I want to train CSQ on a person re-ID task. The number of classes is more than 40,000, and the code used to set the hash target centers (below) can be extremely time-consuming. Do you have any suggestions?
@yuanli2333

```python
# excerpt from the hash-center generation code (indentation fixed, imports
# added); H_2K, hash_targets, n_class and bit come from the surrounding script
import random
import numpy as np
import torch

if H_2K.shape[0] < n_class:
    hash_targets.resize_(n_class, bit)
    for k in range(20):
        for index in range(H_2K.shape[0], n_class):
            ones = torch.ones(bit)
            # flip half of the bits at random (Bernoulli-style sampling)
            sa = random.sample(list(range(bit)), bit // 2)
            ones[sa] = -1
            hash_targets[index] = ones
        # find the average/min pairwise Hamming distance
        c = []
        for i in range(n_class):
            for j in range(n_class):
                if i < j:
                    TF = sum(hash_targets[i] != hash_targets[j])
                    c.append(TF)
        c = np.array(c)

        # choose min(c) in the range of K/4 to K/3
        # see https://github.com/yuanli2333/Hadamard-Matrix-for-hashing/issues/1
        # but it is hard when bit is small
        if c.min() > bit / 4 and c.mean() >= bit / 2:
            print(c.min(), c.mean())
            break
```
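Not an answer from the authors, but one plausible workaround for very large class counts: accept random ±1 candidates greedily, checking each new candidate only against the centers already accepted instead of recomputing all pairwise distances every round. A hedged sketch:

```python
# Assumed alternative, not the repo's method: greedy incremental sampling
# scales roughly linearly in the number of accepted centers per candidate.
import random
import torch

def sample_extra_centers(n_class, bit, min_dist=None, max_tries=200):
    min_dist = min_dist if min_dist is not None else bit // 4
    centers = []
    while len(centers) < n_class:
        for _ in range(max_tries):
            c = torch.ones(bit)
            c[random.sample(range(bit), bit // 2)] = -1  # flip half the bits
            if all(int((c != z).sum()) > min_dist for z in centers):
                centers.append(c)
                break
        else:
            raise RuntimeError("could not place a center; lower min_dist")
    return torch.stack(centers)
```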

How to create the train.txt and test.txt?

I have a question about how to create train.txt and test.txt, whose format differs from my own dataset's. Also, what is the meaning of the hyper-parameters "lambda0", "lambda1", and "lambda2"? Thank you!

give an image to retrieve its similar images

I ran the source code (train.py and test.py), and the results are great. The paper shows an example of giving an image to find other similar images, but the code doesn't seem to include it. Could you tell me how to achieve it? Does it just take an image as input, or does it need a label?
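For what it's worth, a minimal sketch of that use case (an assumed helper, not code from this repo): binarize the model output for the query image, then rank the precomputed database codes by Hamming distance. No label is needed at query time.

```python
import numpy as np

def retrieve(query_code, db_codes, db_paths, topk=10):
    # query_code: (bit,), db_codes: (N, bit), both with +-1 entries
    dist = 0.5 * (db_codes.shape[1] - db_codes @ query_code)
    order = np.argsort(dist)[:topk]          # closest codes first
    return [(db_paths[i], int(dist[i])) for i in order]
```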

Does the model use pretrained weights for ImageNet training?

Hi, thank you for the good work.

I have a question about the ImageNet model training.
I think the model uses ImageNet-pretrained weights when training on ImageNet. [The code line]

If there are options for training on ImageNet that I am missing, or this is wrong information, please let me know.

Anyway, thank you for your wonderful work again.

parameters for cifar10

Hello!
Thanks for the great work!
I'm interested in your hashing work. I tried to run this code with AlexNet on CIFAR-10, but it didn't work well.
Can you release your parameters for CIFAR-10 or give some advice for running CIFAR-10 with AlexNet?

Additional Files for Kinetics400

Hi,

First and foremost, amazing work! Thanks for having the code and models publicly available.

I was wondering if you could have the dataset/Kinetics directory and train_kinetics.py file uploaded as well?

Regards
Arun George

crop size

The crop size is set to be 224x224, which is not the same crop size as in HashNet (227x227). Is there a reason for that?

nus_wide MAP?

May I know your specific parameters for NUS_WIDE? Using your repository's parameters with a ResNet model and a 64-bit hash code on the NUS_WIDE dataset, I only get 0.8188.

About ablation study and L_Q loss

Hi, I'm wondering why the ablation study in the paper doesn't give the result of using the L_Q loss only. If that result were obviously worse than using L_C only, it would demonstrate what role L_C plays in this task.
I'm also curious why you use a loss that forcibly pulls predictions to a predefined hash center, which is a singular point that is hard to converge to (I know that part is discussed in the paper). I just want to know why such a loss function was created and how important it is.
Your reply will be highly appreciated.

Video hashing

Thanks for the great work!

Super excited to play around with video hashing. Are there any plans for releasing the video code and configurations?

AlexNet on COCO

Can you please share the training parameters for AlexNet on COCO?

about comparing similar results

Thanks for your work. I tested a pair of images with the model imagenet_64bit_0.8734_resnet50.pkl and got a Hamming distance of 0.
This is my test code:

```python
import torch
from PIL import Image
import pre_process as prep  # the repo's preprocessing module (assumed import name)

class CSQ(object):
    def __init__(self):
        self.model_path = 'checkpoint/imagenet_64bit_0.8734_resnet50.pkl'
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.transform = prep.image_test(resize_size=255, crop_size=224)
        self.load_model()

    def load_model(self):
        self.model = torch.load(self.model_path)
        self.model = self.model.module  # unwrap DataParallel
        self.model.to(device=self.device)
        self.model.eval()

    def forward(self, img_path):
        tensor_img = self.transform(Image.open(img_path).convert('RGB')).unsqueeze(0).to(self.device)
        with torch.no_grad():
            out = self.model(tensor_img)

        # binarize the network output to a +-1 code
        hash_code = out.cpu().numpy()
        hash_code[hash_code < 0] = -1
        hash_code[hash_code >= 0] = 1

        # return the code as a 0/1 string
        code_list = ''.join(['1' if item == 1.0 else '0' for item in hash_code[0].tolist()])
        return code_list
```
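A small usage note for the snippet above (with hypothetical image paths): since the returned strings are 0/1 codes, the Hamming distance between two images is just the number of differing characters, so a distance of 0 means the two images map to identical codes.

```python
csq = CSQ()
c1 = csq.forward('query1.jpg')  # hypothetical paths
c2 = csq.forward('query2.jpg')
print(sum(a != b for a, b in zip(c1, c2)))  # Hamming distance between codes
```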


About training my own dataset

How can I get my own 'database.txt', 'train.txt' and 'test.txt' when training on my own dataset? What is the relationship between them? For example, are 'train.txt' and 'test.txt' included in 'database.txt'? And what do the 0s and 1s after the file name in these three files represent?

I really need your answer, thank you very much!
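As far as I can tell (an assumption, not the authors' answer), the repo follows the HashNet-style list format: each line is an image path followed by a space-separated multi-hot label vector, one 0/1 entry per class; database.txt is the full retrieval set, and test.txt holds the held-out queries. A minimal parser sketch:

```python
def parse_list(path):
    """Parse lines of the form 'image_path l0 l1 ... l(C-1)' into (path, labels)."""
    items = []
    with open(path) as f:
        for line in f:
            tokens = line.split()
            if not tokens:
                continue
            items.append((tokens[0], [int(t) for t in tokens[1:]]))
    return items
```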

number of training images for imagenet

Hi, first, I really appreciate you sharing the code. I have a question about the experimental settings.
In your paper, the experimental setting for ImageNet is

ImageNet image 10,000 5,000 128,495 100:1

where the number of training images is 10,000. However, the number of training images in this repo is 13,000.
Which one is right for reproducing the results in the paper?
