
aptm's Issues

Zero-shot evaluation for downstream datasets?

Firstly, I would like to express my appreciation for the insightful research presented in the paper.

The paper shows the results:

  • baseline: training the model without pretraining on MALS
  • APTM: training the model with pretraining on MALS

I am interested in the zero-shot performance, where no further training is performed after pretraining on MALS. Specifically, I would like to see evaluation results on the downstream datasets (CUHK-PEDES, ICFG-PEDES, and RSTPReid):

  • zero-shot: no fine-tuning, just the model pre-trained on MALS, then tested on the downstream datasets
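
If the released run.py interface supports it, a zero-shot check could presumably be obtained by passing the MALS-pretrained weights directly to the evaluation command (the flags below mirror the evaluation command quoted in a later issue on this page; the checkpoint path is a placeholder):

python3 run.py --task "itr_cuhk" --evaluate --dist "f4" --output_dir "output/zero_shot_cuhk" --checkpoint "<path-to-MALS-pretrained-checkpoint>"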

Zero-shot capability of the pre-trained model

How well does the pre-trained APTM model perform zero-shot when applied directly to downstream domain-generalization ReID tasks? Is there a large domain gap between the generated pedestrian data used for pre-training and the real pedestrian data of the downstream tasks?

APL loss settings

Hello authors,
Thank you very much for your team's excellent work on APTM. While trying to pre-train with the APTM framework on the MALS dataset, I noticed that the default configuration file you provide, Retrieval_gene.yaml, has the APL loss and its corresponding weight β commented out (the attr and t entries in the config), i.e. the default configuration does not compute the APL loss. My observations so far:
(1) With your default configuration (APL loss disabled), training runs normally, but I cannot compare my pre-training results against yours because the shared link to your pre-trained model weights has expired, so I cannot evaluate the APTM framework pre-trained on MALS. Could you please update the shared link to the model weights? Thank you very much.
(2) If I enable the APL loss as in the paper's setting, the model cannot train: the APL loss becomes NaN. Is there some additional setting needed to compute the APL loss that I may have missed, or something I should change in how I run it? I am reproducing everything exactly with your code and the configuration file on GitHub. Or is a NaN here expected behaviour for the APTM framework? (A generic NaN-localization sketch follows below.)
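
For what it is worth, a generic way to localize where a NaN loss first appears (not specific to APTM) is sketched below:

import torch

# Make the backward pass report the operation that produced a NaN/Inf.
# This slows training, so enable it only while debugging.
torch.autograd.set_detect_anomaly(True)

def check_losses(loss_dict):
    """Raise as soon as any individual loss term (e.g. the APL term) becomes non-finite."""
    for name, value in loss_dict.items():
        if not torch.isfinite(value).all():
            raise RuntimeError(f"loss '{name}' became non-finite: {value}")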

非常感谢抽空阅读这个issue。

Hello authors, how can I obtain good feature vectors from the model?

I am testing with my own Evaluator below:

import torch


class Evaluator():
    def __init__(self, img_loader, txt_loader):
        self.img_loader = img_loader  # gallery
        self.txt_loader = txt_loader  # query

    def _compute_embedding(self, model):
        model = model.eval()
        device = next(model.parameters()).device

        qids, gids, qfeats, gfeats = [], [], [], []
        # text (query) features
        for pid, caption in self.txt_loader:
            for k, v in caption.items():
                caption[k] = v.to(device)
            with torch.no_grad():
                caption["input_ids"] = caption["input_ids"].squeeze(1)
                text_embeds = model.get_text_embeds(caption["input_ids"], caption["attention_mask"])
                text_feat = model.text_proj(text_embeds[:, 0, :])  # project the [CLS] token
            qids.append(pid.view(-1))  # flatten
            qfeats.append(text_feat)
        qids = torch.cat(qids, 0)
        qfeats = torch.cat(qfeats, 0)
        # image (gallery) features
        for pid, img in self.img_loader:
            img = img.to(device)
            with torch.no_grad():
                image_embeds, image_atts = model.get_vision_embeds(img)
                img_feat = model.vision_proj(image_embeds[:, 0, :])  # project the [CLS] token
            gids.append(pid.view(-1))  # flatten
            gfeats.append(img_feat)
        gids = torch.cat(gids, 0)
        gfeats = torch.cat(gfeats, 0)
        return qfeats, gfeats, qids, gids

The resulting Rank-1 on CUHK-PEDES is only about 55; I am using the BERT tokenizer.
Your code does not test directly on this similarity matrix; instead it performs some kind of iteration over it. How should I understand that process, and how can I obtain good text and image feature vectors?
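
For reference, here is a minimal sketch of computing Rank-1 from the projected features above. It assumes retrieval is scored by cosine similarity, so both sides must be L2-normalised first; note also that ALBEF/X-VLM-style codebases (which APTM appears to follow) typically re-rank the top candidates of this similarity matrix with the cross-modal ITM head, which is likely the "iteration" referred to above:

import torch
import torch.nn.functional as F

def rank1_from_features(qfeats, gfeats, qids, gids):
    # Cosine similarity needs L2-normalised features on both sides.
    qfeats = F.normalize(qfeats, dim=-1)
    gfeats = F.normalize(gfeats, dim=-1)
    sims = qfeats @ gfeats.t()        # (num_queries, num_gallery)
    top1 = sims.argmax(dim=1)         # best-matching gallery index per query
    return (gids[top1] == qids).float().mean().item()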

Issues with running the evaluation script run.py

After downloading the required datasets and checkpoints, I tried to run evaluation only, but the evaluation Python script fails. Does anyone else have the same problem, and is there a solution?
To Reproduce the error:
python3 run.py --task "itr_cuhk" --evaluate --dist "f4" --output_dir "output/ft_cuhk/test" --checkpoint "output/ft_cuhk/checkpoint_best.pth"

I am wondering whether this is a version issue. I am using PyYAML 6.0.1, PyTorch 2.2.1, ruamel.yaml 0.18.6, and ruamel.yaml.clib 0.2.8.

Error

NNODES, 1
NPROC_PER_NODE, 4
MASTER_ADDR, 127.0.0.1
MASTER_PORT, 3000
NODE_RANK, 0
/home/default/miniconda3/envs/aptm/lib/python3.8/site-packages/torch/distributed/launch.py:183: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects --local-rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

warnings.warn(
[2024-03-25 15:15:00,947] torch.distributed.run: [WARNING]
[2024-03-25 15:15:00,947] torch.distributed.run: [WARNING] *****************************************
[2024-03-25 15:15:00,947] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-03-25 15:15:00,947] torch.distributed.run: [WARNING] *****************************************
Traceback (most recent call last):
File "Retrieval.py", line 296, in
config = yaml.load(open(args.config, 'r'), Loader=yaml.Loader)
File "/home/default/miniconda3/envs/aptm/lib/python3.8/site-packages/ruamel/yaml/main.py", line 1085, in load
error_deprecation('load', 'load', arg=_error_dep_arg, comment=_error_dep_comment)
File "/home/default/miniconda3/envs/aptm/lib/python3.8/site-packages/ruamel/yaml/main.py", line 1037, in error_deprecation
raise AttributeError(s)
AttributeError:
"load()" has been removed, use

yaml = YAML(typ='rt')
yaml.load(...)

and register any classes that you use, or check the tag attribute on the loaded data,
instead of file "Retrieval.py", line 296

config = yaml.load(open(args.config, 'r'), Loader=yaml.Loader)

(the same traceback is repeated by each of the remaining three worker processes)

[2024-03-25 15:15:11,033] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 142176) of binary: /home/default/miniconda3/envs/aptm/bin/python3
Traceback (most recent call last):
File "/home/default/miniconda3/envs/aptm/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/default/miniconda3/envs/aptm/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/default/miniconda3/envs/aptm/lib/python3.8/site-packages/torch/distributed/launch.py", line 198, in
main()
File "/home/default/miniconda3/envs/aptm/lib/python3.8/site-packages/torch/distributed/launch.py", line 194, in main
launch(args)
File "/home/default/miniconda3/envs/aptm/lib/python3.8/site-packages/torch/distributed/launch.py", line 179, in launch
run(args)
File "/home/default/miniconda3/envs/aptm/lib/python3.8/site-packages/torch/distributed/run.py", line 803, in run
elastic_launch(
File "/home/default/miniconda3/envs/aptm/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 135, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/default/miniconda3/envs/aptm/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

Retrieval.py FAILED

Failures:
[1]:
time : 2024-03-25_15:15:11
host : default-Pulse-15-B13VFK
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 142177)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2024-03-25_15:15:11
host : default-Pulse-15-B13VFK
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 142178)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2024-03-25_15:15:11
host : default-Pulse-15-B13VFK
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 142179)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2024-03-25_15:15:11
host : default-Pulse-15-B13VFK
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 142176)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
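
The root cause is the failing yaml.load call: ruamel.yaml 0.18 removed the PyYAML-style load() used at Retrieval.py line 296. One possible workaround, following the replacement suggested by the error message itself, is sketched below; alternatively, installing a ruamel.yaml release earlier than 0.18 restores the old API:

# Possible patch for Retrieval.py line 296 (args.config as in the original line).
from ruamel.yaml import YAML

yaml = YAML(typ='rt')
with open(args.config, 'r') as f:
    config = yaml.load(f)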

[Some questions about the evaluation code]

1. What is the difference between the three evaluation functions in the evaluation code: evaluation_attr_only_img_classifier(), evaluation_attr(), and evaluation()?
2. In the evaluation() function, what is the difference between score_matrix_t2i and score_sim_t2i?

Fine-tuning with the pre-trained model

Why do I keep getting out-of-memory errors when fine-tuning the pre-trained model on datasets such as CUHK-PEDES? My hardware is shown below.
[screenshot of hardware specifications]

Thanks for your great work; I have a question about generating captions for the images

You previously replied that you directly used a pre-trained version of BLIP to generate the captions. Which pre-trained BLIP checkpoint did you use exactly? The captions I generate with BLIP seem very brief and lack fine-grained details such as clothing colours. Could you explain how you did this, and whether you fine-tuned BLIP to produce such fine-grained descriptions?
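
For anyone wanting to reproduce the captioning step, here is a minimal sketch of generating a caption with an off-the-shelf BLIP captioning checkpoint via Hugging Face transformers; the checkpoint name and generation settings are assumptions, not necessarily what the authors used:

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Assumed checkpoint; the authors' exact BLIP version is what this issue asks about.
name = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(name)
model = BlipForConditionalGeneration.from_pretrained(name)

image = Image.open("person.jpg").convert("RGB")   # a cropped pedestrian image
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(out[0], skip_special_tokens=True))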

incorrect predictions in attribute recognition for pa100k images

Thanks for your great work.
I used the "checkpoint_best.pth" model released on Google Drive under checkpoints/ft_pa100k.zip to recognize attributes of PA-100K images. I changed three keys in the Retrieval_pa100k.yaml file as below:

pa100k: False
pa100k_only_img_classifier: True
dop: 0.1

Everything seems correct, but I don't know why I don't get correct results in the output. The model returns the same probability (0.5) for all attributes:

[0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]

Additionally, when I loaded the model I got these logs:

missing_keys: ['img_cls.1.weight', 'img_cls.1.bias', 'img_cls.2.weight', 'img_cls.2.bias', 'img_cls.2.running_mean', 'img_cls.2.running_var', 'img_cls.4.weight', 'img_cls.4.bias']
vision_encoder missing_keys: []
unexpected_keys: ['temp', 'text_encoder.bert.embeddings.position_ids', 'text_encoder.bert.embeddings.word_embeddings.weight', 'text_encoder.bert.embeddings.position_embeddings.weight', 'text_encoder.bert.embeddings.token_type_embeddings.weight', 'text_encoder.bert.embeddings.LayerNorm.weight', 'text_encoder.bert.embeddings.LayerNorm.bias', 'text_encoder.bert.encoder.layer.0.attention.self.query.weight', 'text_encoder.bert.encoder.layer.0.attention.self.query.bias', 'text_encoder.bert.encoder.layer.0.attention.self.key.weight', 'text_encoder.bert.encoder.layer.0.attention.self.key.bias', 'text_encoder.bert.encoder.layer.0.attention.self.value.weight', 'text_encoder.bert.encoder.layer.0.attention.self.value.bias', 'text_encoder.bert.encoder.layer.0.attention.output.dense.weight', 'text_encoder.bert.encoder.layer.0.attention.output.dense.bias', 'text_encoder.bert.encoder.layer.0.attention.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.0.attention.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.0.intermediate.dense.weight', 'text_encoder.bert.encoder.layer.0.intermediate.dense.bias', 'text_encoder.bert.encoder.layer.0.output.dense.weight', 'text_encoder.bert.encoder.layer.0.output.dense.bias', 'text_encoder.bert.encoder.layer.0.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.0.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.1.attention.self.query.weight', 'text_encoder.bert.encoder.layer.1.attention.self.query.bias', 'text_encoder.bert.encoder.layer.1.attention.self.key.weight', 'text_encoder.bert.encoder.layer.1.attention.self.key.bias', 'text_encoder.bert.encoder.layer.1.attention.self.value.weight', 'text_encoder.bert.encoder.layer.1.attention.self.value.bias', 'text_encoder.bert.encoder.layer.1.attention.output.dense.weight', 'text_encoder.bert.encoder.layer.1.attention.output.dense.bias', 'text_encoder.bert.encoder.layer.1.attention.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.1.attention.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.1.intermediate.dense.weight', 'text_encoder.bert.encoder.layer.1.intermediate.dense.bias', 'text_encoder.bert.encoder.layer.1.output.dense.weight', 'text_encoder.bert.encoder.layer.1.output.dense.bias', 'text_encoder.bert.encoder.layer.1.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.1.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.2.attention.self.query.weight', 'text_encoder.bert.encoder.layer.2.attention.self.query.bias', 'text_encoder.bert.encoder.layer.2.attention.self.key.weight', 'text_encoder.bert.encoder.layer.2.attention.self.key.bias', 'text_encoder.bert.encoder.layer.2.attention.self.value.weight', 'text_encoder.bert.encoder.layer.2.attention.self.value.bias', 'text_encoder.bert.encoder.layer.2.attention.output.dense.weight', 'text_encoder.bert.encoder.layer.2.attention.output.dense.bias', 'text_encoder.bert.encoder.layer.2.attention.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.2.attention.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.2.intermediate.dense.weight', 'text_encoder.bert.encoder.layer.2.intermediate.dense.bias', 'text_encoder.bert.encoder.layer.2.output.dense.weight', 'text_encoder.bert.encoder.layer.2.output.dense.bias', 'text_encoder.bert.encoder.layer.2.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.2.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.3.attention.self.query.weight', 'text_encoder.bert.encoder.layer.3.attention.self.query.bias', 
'text_encoder.bert.encoder.layer.3.attention.self.key.weight', 'text_encoder.bert.encoder.layer.3.attention.self.key.bias', 'text_encoder.bert.encoder.layer.3.attention.self.value.weight', 'text_encoder.bert.encoder.layer.3.attention.self.value.bias', 'text_encoder.bert.encoder.layer.3.attention.output.dense.weight', 'text_encoder.bert.encoder.layer.3.attention.output.dense.bias', 'text_encoder.bert.encoder.layer.3.attention.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.3.attention.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.3.intermediate.dense.weight', 'text_encoder.bert.encoder.layer.3.intermediate.dense.bias', 'text_encoder.bert.encoder.layer.3.output.dense.weight', 'text_encoder.bert.encoder.layer.3.output.dense.bias', 'text_encoder.bert.encoder.layer.3.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.3.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.4.attention.self.query.weight', 'text_encoder.bert.encoder.layer.4.attention.self.query.bias', 'text_encoder.bert.encoder.layer.4.attention.self.key.weight', 'text_encoder.bert.encoder.layer.4.attention.self.key.bias', 'text_encoder.bert.encoder.layer.4.attention.self.value.weight', 'text_encoder.bert.encoder.layer.4.attention.self.value.bias', 'text_encoder.bert.encoder.layer.4.attention.output.dense.weight', 'text_encoder.bert.encoder.layer.4.attention.output.dense.bias', 'text_encoder.bert.encoder.layer.4.attention.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.4.attention.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.4.intermediate.dense.weight', 'text_encoder.bert.encoder.layer.4.intermediate.dense.bias', 'text_encoder.bert.encoder.layer.4.output.dense.weight', 'text_encoder.bert.encoder.layer.4.output.dense.bias', 'text_encoder.bert.encoder.layer.4.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.4.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.5.attention.self.query.weight', 'text_encoder.bert.encoder.layer.5.attention.self.query.bias', 'text_encoder.bert.encoder.layer.5.attention.self.key.weight', 'text_encoder.bert.encoder.layer.5.attention.self.key.bias', 'text_encoder.bert.encoder.layer.5.attention.self.value.weight', 'text_encoder.bert.encoder.layer.5.attention.self.value.bias', 'text_encoder.bert.encoder.layer.5.attention.output.dense.weight', 'text_encoder.bert.encoder.layer.5.attention.output.dense.bias', 'text_encoder.bert.encoder.layer.5.attention.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.5.attention.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.5.intermediate.dense.weight', 'text_encoder.bert.encoder.layer.5.intermediate.dense.bias', 'text_encoder.bert.encoder.layer.5.output.dense.weight', 'text_encoder.bert.encoder.layer.5.output.dense.bias', 'text_encoder.bert.encoder.layer.5.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.5.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.6.attention.self.query.weight', 'text_encoder.bert.encoder.layer.6.attention.self.query.bias', 'text_encoder.bert.encoder.layer.6.attention.self.key.weight', 'text_encoder.bert.encoder.layer.6.attention.self.key.bias', 'text_encoder.bert.encoder.layer.6.attention.self.value.weight', 'text_encoder.bert.encoder.layer.6.attention.self.value.bias', 'text_encoder.bert.encoder.layer.6.attention.output.dense.weight', 'text_encoder.bert.encoder.layer.6.attention.output.dense.bias', 'text_encoder.bert.encoder.layer.6.attention.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.6.attention.output.LayerNorm.bias', 
'text_encoder.bert.encoder.layer.6.crossattention.self.query.weight', 'text_encoder.bert.encoder.layer.6.crossattention.self.query.bias', 'text_encoder.bert.encoder.layer.6.crossattention.self.key.weight', 'text_encoder.bert.encoder.layer.6.crossattention.self.key.bias', 'text_encoder.bert.encoder.layer.6.crossattention.self.value.weight', 'text_encoder.bert.encoder.layer.6.crossattention.self.value.bias', 'text_encoder.bert.encoder.layer.6.crossattention.output.dense.weight', 'text_encoder.bert.encoder.layer.6.crossattention.output.dense.bias', 'text_encoder.bert.encoder.layer.6.crossattention.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.6.crossattention.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.6.intermediate.dense.weight', 'text_encoder.bert.encoder.layer.6.intermediate.dense.bias', 'text_encoder.bert.encoder.layer.6.output.dense.weight', 'text_encoder.bert.encoder.layer.6.output.dense.bias', 'text_encoder.bert.encoder.layer.6.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.6.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.7.attention.self.query.weight', 'text_encoder.bert.encoder.layer.7.attention.self.query.bias', 'text_encoder.bert.encoder.layer.7.attention.self.key.weight', 'text_encoder.bert.encoder.layer.7.attention.self.key.bias', 'text_encoder.bert.encoder.layer.7.attention.self.value.weight', 'text_encoder.bert.encoder.layer.7.attention.self.value.bias', 'text_encoder.bert.encoder.layer.7.attention.output.dense.weight', 'text_encoder.bert.encoder.layer.7.attention.output.dense.bias', 'text_encoder.bert.encoder.layer.7.attention.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.7.attention.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.7.crossattention.self.query.weight', 'text_encoder.bert.encoder.layer.7.crossattention.self.query.bias', 'text_encoder.bert.encoder.layer.7.crossattention.self.key.weight', 'text_encoder.bert.encoder.layer.7.crossattention.self.key.bias', 'text_encoder.bert.encoder.layer.7.crossattention.self.value.weight', 'text_encoder.bert.encoder.layer.7.crossattention.self.value.bias', 'text_encoder.bert.encoder.layer.7.crossattention.output.dense.weight', 'text_encoder.bert.encoder.layer.7.crossattention.output.dense.bias', 'text_encoder.bert.encoder.layer.7.crossattention.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.7.crossattention.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.7.intermediate.dense.weight', 'text_encoder.bert.encoder.layer.7.intermediate.dense.bias', 'text_encoder.bert.encoder.layer.7.output.dense.weight', 'text_encoder.bert.encoder.layer.7.output.dense.bias', 'text_encoder.bert.encoder.layer.7.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.7.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.8.attention.self.query.weight', 'text_encoder.bert.encoder.layer.8.attention.self.query.bias', 'text_encoder.bert.encoder.layer.8.attention.self.key.weight', 'text_encoder.bert.encoder.layer.8.attention.self.key.bias', 'text_encoder.bert.encoder.layer.8.attention.self.value.weight', 'text_encoder.bert.encoder.layer.8.attention.self.value.bias', 'text_encoder.bert.encoder.layer.8.attention.output.dense.weight', 'text_encoder.bert.encoder.layer.8.attention.output.dense.bias', 'text_encoder.bert.encoder.layer.8.attention.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.8.attention.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.8.crossattention.self.query.weight', 
'text_encoder.bert.encoder.layer.8.crossattention.self.query.bias', 'text_encoder.bert.encoder.layer.8.crossattention.self.key.weight', 'text_encoder.bert.encoder.layer.8.crossattention.self.key.bias', 'text_encoder.bert.encoder.layer.8.crossattention.self.value.weight', 'text_encoder.bert.encoder.layer.8.crossattention.self.value.bias', 'text_encoder.bert.encoder.layer.8.crossattention.output.dense.weight', 'text_encoder.bert.encoder.layer.8.crossattention.output.dense.bias', 'text_encoder.bert.encoder.layer.8.crossattention.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.8.crossattention.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.8.intermediate.dense.weight', 'text_encoder.bert.encoder.layer.8.intermediate.dense.bias', 'text_encoder.bert.encoder.layer.8.output.dense.weight', 'text_encoder.bert.encoder.layer.8.output.dense.bias', 'text_encoder.bert.encoder.layer.8.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.8.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.9.attention.self.query.weight', 'text_encoder.bert.encoder.layer.9.attention.self.query.bias', 'text_encoder.bert.encoder.layer.9.attention.self.key.weight', 'text_encoder.bert.encoder.layer.9.attention.self.key.bias', 'text_encoder.bert.encoder.layer.9.attention.self.value.weight', 'text_encoder.bert.encoder.layer.9.attention.self.value.bias', 'text_encoder.bert.encoder.layer.9.attention.output.dense.weight', 'text_encoder.bert.encoder.layer.9.attention.output.dense.bias', 'text_encoder.bert.encoder.layer.9.attention.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.9.attention.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.9.crossattention.self.query.weight', 'text_encoder.bert.encoder.layer.9.crossattention.self.query.bias', 'text_encoder.bert.encoder.layer.9.crossattention.self.key.weight', 'text_encoder.bert.encoder.layer.9.crossattention.self.key.bias', 'text_encoder.bert.encoder.layer.9.crossattention.self.value.weight', 'text_encoder.bert.encoder.layer.9.crossattention.self.value.bias', 'text_encoder.bert.encoder.layer.9.crossattention.output.dense.weight', 'text_encoder.bert.encoder.layer.9.crossattention.output.dense.bias', 'text_encoder.bert.encoder.layer.9.crossattention.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.9.crossattention.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.9.intermediate.dense.weight', 'text_encoder.bert.encoder.layer.9.intermediate.dense.bias', 'text_encoder.bert.encoder.layer.9.output.dense.weight', 'text_encoder.bert.encoder.layer.9.output.dense.bias', 'text_encoder.bert.encoder.layer.9.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.9.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.10.attention.self.query.weight', 'text_encoder.bert.encoder.layer.10.attention.self.query.bias', 'text_encoder.bert.encoder.layer.10.attention.self.key.weight', 'text_encoder.bert.encoder.layer.10.attention.self.key.bias', 'text_encoder.bert.encoder.layer.10.attention.self.value.weight', 'text_encoder.bert.encoder.layer.10.attention.self.value.bias', 'text_encoder.bert.encoder.layer.10.attention.output.dense.weight', 'text_encoder.bert.encoder.layer.10.attention.output.dense.bias', 'text_encoder.bert.encoder.layer.10.attention.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.10.attention.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.10.crossattention.self.query.weight', 'text_encoder.bert.encoder.layer.10.crossattention.self.query.bias', 
'text_encoder.bert.encoder.layer.10.crossattention.self.key.weight', 'text_encoder.bert.encoder.layer.10.crossattention.self.key.bias', 'text_encoder.bert.encoder.layer.10.crossattention.self.value.weight', 'text_encoder.bert.encoder.layer.10.crossattention.self.value.bias', 'text_encoder.bert.encoder.layer.10.crossattention.output.dense.weight', 'text_encoder.bert.encoder.layer.10.crossattention.output.dense.bias', 'text_encoder.bert.encoder.layer.10.crossattention.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.10.crossattention.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.10.intermediate.dense.weight', 'text_encoder.bert.encoder.layer.10.intermediate.dense.bias', 'text_encoder.bert.encoder.layer.10.output.dense.weight', 'text_encoder.bert.encoder.layer.10.output.dense.bias', 'text_encoder.bert.encoder.layer.10.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.10.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.11.attention.self.query.weight', 'text_encoder.bert.encoder.layer.11.attention.self.query.bias', 'text_encoder.bert.encoder.layer.11.attention.self.key.weight', 'text_encoder.bert.encoder.layer.11.attention.self.key.bias', 'text_encoder.bert.encoder.layer.11.attention.self.value.weight', 'text_encoder.bert.encoder.layer.11.attention.self.value.bias', 'text_encoder.bert.encoder.layer.11.attention.output.dense.weight', 'text_encoder.bert.encoder.layer.11.attention.output.dense.bias', 'text_encoder.bert.encoder.layer.11.attention.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.11.attention.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.11.crossattention.self.query.weight', 'text_encoder.bert.encoder.layer.11.crossattention.self.query.bias', 'text_encoder.bert.encoder.layer.11.crossattention.self.key.weight', 'text_encoder.bert.encoder.layer.11.crossattention.self.key.bias', 'text_encoder.bert.encoder.layer.11.crossattention.self.value.weight', 'text_encoder.bert.encoder.layer.11.crossattention.self.value.bias', 'text_encoder.bert.encoder.layer.11.crossattention.output.dense.weight', 'text_encoder.bert.encoder.layer.11.crossattention.output.dense.bias', 'text_encoder.bert.encoder.layer.11.crossattention.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.11.crossattention.output.LayerNorm.bias', 'text_encoder.bert.encoder.layer.11.intermediate.dense.weight', 'text_encoder.bert.encoder.layer.11.intermediate.dense.bias', 'text_encoder.bert.encoder.layer.11.output.dense.weight', 'text_encoder.bert.encoder.layer.11.output.dense.bias', 'text_encoder.bert.encoder.layer.11.output.LayerNorm.weight', 'text_encoder.bert.encoder.layer.11.output.LayerNorm.bias', 'text_encoder.cls.predictions.bias', 'text_encoder.cls.predictions.transform.dense.weight', 'text_encoder.cls.predictions.transform.dense.bias', 'text_encoder.cls.predictions.transform.LayerNorm.weight', 'text_encoder.cls.predictions.transform.LayerNorm.bias', 'text_encoder.cls.predictions.decoder.weight', 'text_encoder.cls.predictions.decoder.bias', 'vision_proj.weight', 'vision_proj.bias', 'text_proj.weight', 'text_proj.bias', 'itm_head.0.weight', 'itm_head.0.bias', 'itm_head.1.weight', 'itm_head.1.bias', 'itm_head.3.weight', 'itm_head.3.bias']

Total Params: 87022610

Would you mind helping me with this?
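
For context, a constant output of about 0.5 for every attribute is what a sigmoid over near-zero logits produces, and the missing_keys above show that none of the img_cls.* weights were found in the checkpoint, i.e. the attribute head is still randomly initialised. A minimal check along these lines (assuming the model has already been built from the yaml config; the checkpoint layout is an assumption) would be:

import torch

checkpoint = torch.load("checkpoint_best.pth", map_location="cpu")
state_dict = checkpoint.get("model", checkpoint)  # assumed layout: weights may be nested under "model"
result = model.load_state_dict(state_dict, strict=False)  # `model` built beforehand
# If the classifier head appears here, it was left randomly initialised,
# which would explain the constant ~0.5 sigmoid outputs.
print([k for k in result.missing_keys if k.startswith("img_cls")])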

itr_pa100k task Test Result

I modified the 'batch_size_train' parameter in the 'Retrieval_pa100k.yaml' file to 12. After running the script, I obtained the test results shown below; however, the mAP reported in the paper is 82.58.
[screenshot of the obtained test results]

CUHK-PEDES

How can I download the CUHK-PEDES dataset? The link leads to a page with nothing on it.

Several images are missing from the gene_crop/c_g_a_1/ folder

Hello, thank you very much for open-sourcing the code and dataset! During pre-training I found that multiple images are missing under the gene_crop/c_g_a_1/ path. Could it be that an uncleaned version was uploaded along with the JSON file?

Dataset Drive Link

Hello,
Thanks for your awesome study!
I was wondering if there is any chance you could share a Google Drive link to the dataset with the ReID community?

Fine-tuning on a mixture of multiple datasets

Hello authors! If I fine-tune on a mixture of different datasets, e.g. CUHK + ICFG, I believe the following steps are needed (a sketch of the first two steps follows the list):

  1. Preprocess the ICFG captions offline into the CUHK format (all lowercase, punctuation replaced with spaces).
  2. Offset the image_id values of the ICFG dataset by the maximum id of CUHK.
  3. (Do new image means and variances need to be computed?)

Are any other changes needed besides these? Would a model fine-tuned on multiple datasets in this way generalize better? Looking forward to your reply, thanks!
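
A rough sketch of steps 1 and 2 (the field names image_id and caption are assumptions about the annotation JSON format):

import re

def to_cuhk_style(caption):
    """Step 1: lowercase, replace punctuation with spaces, and collapse whitespace."""
    caption = re.sub(r"[^\w\s]", " ", caption.lower())
    return re.sub(r"\s+", " ", caption).strip()

def merge_annotations(cuhk_items, icfg_items):
    """Step 2: offset ICFG image_ids by the maximum CUHK id, then concatenate."""
    offset = max(item["image_id"] for item in cuhk_items)
    merged = list(cuhk_items)
    for item in icfg_items:
        merged.append({**item,
                       "image_id": item["image_id"] + offset,
                       "caption": to_cuhk_style(item["caption"])})
    return merged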

Comparison with BLIP and importance of Attribute Learning

Hi, I have tried APTM it is awesome!

Now, I want to finetune APTM for text-based person retrieval task over custom data which contains more person attributes, like different poses and more objects. I'm trying to understand how important attribute learning is, and whether I can finetune BLIP instead of APTM.

In the paper, it is mentioned that "BLIP is used to produce more fitting captions for every synthetic image and form the final image-text pairs" in the MALS dataset. I have the following questions regarding this approach:

  1. Did you use the pretrained version of BLIP, or did you fine-tune it on some person image-text pairs before labelling the MALS dataset?
  2. If you're using BLIP-generated captions as ground truth and then pretraining APTM, doesn't that mean APTM is essentially trying to match BLIP's performance during the pretraining phase? Perhaps in the finetuning phase APTM may be better than BLIP. Is this thought process correct?
  3. Do you have APTM comparison results with BLIP (either pretrained or fine-tuned over some person image-text pairs)?

What approach would you suggest if I want APTM to understand more person attributes, such as different poses and more objects?

  1. Finetune for ITC + ITM + MLM. This would require only image-text pairs (and could be done with BLIP too).
  2. Finetune for IAC + IAM + MAM + ITC + ITM + MLM. This would require attribute labels to be prepared.

Thanks! Please correct me if I understood anything incorrectly.
