:robot: PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+
Home Page: https://github.com/BR-IDL/PaddleViT
License: Apache License 2.0
Describe your feature request
Add and debug the multi-scale sampler
Describe the reference code or paper
Refer to the source code listed in the original paper, from here
Describe the possible solution
Now has an initial version, needs debug and test
Additional context
N/A
Describe your feature request
Currently, validation mode (config.EVAL is True) still creates and loads the training dataset and dataloader, which is inflexible when users only have the validation set.
The main method should therefore support a validation mode that does not touch the training set.
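A minimal sketch of the requested behavior (the names Config, build_dataloaders, and the loader strings are illustrative, not PaddleViT's actual API):

```python
class Config:
    EVAL = True  # validation-only run

def build_dataloaders(config, make_loader):
    # Always build the validation loader; only touch the training split
    # when we are actually training.
    loaders = {'val': make_loader('val')}
    if not config.EVAL:
        loaders['train'] = make_loader('train')
    return loaders

loaders = build_dataloaders(Config(), lambda split: f"<{split} loader>")
print(sorted(loaders))  # ['val'] -- the train set is never opened
```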
Describe the reference code or paper
N/A
Describe the possible solution
I have a fix in the ViT model which can be applied to the other classification models.
Please refer to commit 9a7c105 for details.
In the ViT transformer implementation (ViT Transformer Attention), the multi-head attention's attn_head_size is computed from the given embed_dim and num_heads:
self.attn_head_size = int(embed_dim / self.num_heads)
I see at least two problems with this implementation:
1. There is no check that embed_dim is divisible by num_heads. When embed_dim is not divisible by num_heads, or when num_heads > embed_dim, the transpose_multihead operation raises an exception:
def transpose_multihead(self, x):
    new_shape = x.shape[:-1] + [self.num_heads, self.attn_head_size]
    x = x.reshape(new_shape)
    x = x.transpose([0, 2, 1, 3])
    return x
2. attn_head_size is constrained by embed_dim and num_heads, so when pretraining a model attn_head_size cannot be chosen freely; the code is not flexible enough.
Both problems can be solved by adding an attn_head_size parameter to Attention's __init__ method. This does not affect loading existing pretrained weights, and it allows attn_head_size to be set freely at pretraining time. Since attn_head_size is then independent of the input dimension embed_dim, there is also no need to check that embed_dim is divisible by num_heads.
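The failure mode in problem 1 can be illustrated with plain shape arithmetic (numpy here is only a stand-in for the paddle tensors):

```python
import numpy as np

embed_dim, num_heads = 90, 8
attn_head_size = int(embed_dim / num_heads)   # 11, but 11 * 8 = 88 != 90
x = np.zeros((4, 16, embed_dim))              # [batch, num_patches, embed_dim]
try:
    x.reshape(4, 16, num_heads, attn_head_size)
except ValueError as e:
    print('reshape failed:', e)
# A constructor-time check would surface the bad config immediately:
assert embed_dim % num_heads != 0, 'this (embed_dim, num_heads) pair is invalid'
```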
Both implementations exist in today's mainstream frameworks.
The first, computing attn_head_size from embed_dim and num_heads, includes:
PaddlePaddle: https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/nn/layer/transformer.py#L109
PyTorch: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/transformer.py
transformers: https://github.com/huggingface/transformers/blob/master/src/transformers/models/bert/modeling_bert.py#L226
The second, passing attn_head_size in as a parameter, includes:
TensorFlow: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/layers/multi_head_attention.py#L126
TensorFlow Addons: https://github.com/tensorflow/addons/blob/master/tensorflow_addons/layers/multihead_attention.py
I strongly recommend the second approach: the API is more flexible to use, and the code reads more smoothly and makes more sense.
For example, the definition of all_head_size in the original implementation:
self.all_head_size = self.attn_head_size * self.num_heads
Since all_head_size == embed_dim, there is no need to define this variable at all. It is used only in __init__:
self.qkv = nn.Linear(embed_dim,
                     self.all_head_size*3,  # weights for q, k, and v
                     weight_attr=w_attr_1,
                     bias_attr=b_attr_1 if qkv_bias else False)
and in forward:
new_shape = z.shape[:-2] + [self.all_head_size]
The output dimension self.all_head_size*3 of the qkv projection in __init__ can be changed to embed_dim*3, and the self.all_head_size used for new_shape in forward can be taken from the input x at the start of the method:
embed_dim = x.shape[-1]
...
new_shape = z.shape[:-2] + [embed_dim]
That is my objection to the definition of self.all_head_size in the source code.
There is also the question of whether the final output Linear layer is necessary:
self.out = nn.Linear(embed_dim,
                     embed_dim,
                     weight_attr=w_attr_2,
                     bias_attr=b_attr_2)
In forward, the final linear projection is preceded by a line carrying the comment "reshape":
z = z.reshape(new_shape)
# reshape
z = self.out(z)
The intent is presumably to map the features back to the input dimension embed_dim so that the following residual connection works. But since all_head_size == embed_dim, what is there to reshape?
So I believe this output projection is unnecessary as the code stands.
However, if we adopt the second approach and pass attn_head_size in as a parameter, no longer deriving it from embed_dim and num_heads, the code above becomes much more natural and reasonable.
The second implementation, which takes attn_head_size as a parameter, only requires changing a few lines of the original code:
from typing import Tuple, Union
import paddle
import paddle.nn as nn
from paddle import ParamAttr
from paddle import Tensor

class Attention(nn.Layer):
    """Attention module

    Attention module for ViT, here q, k, v are assumed the same.
    The qkv mappings are stored as one single param.

    Attributes:
        num_heads: number of heads
        attn_head_size: feature dim of single head
        all_head_size: feature dim of all heads
        qkv: a nn.Linear for q, k, v mapping
        scales: 1 / sqrt(single_head_feature_dim)
        out: projection of multi-head attention
        attn_dropout: dropout for attention
        proj_dropout: final dropout before output
        softmax: softmax op for attention
    """
    def __init__(self,
                 embed_dim: int,
                 num_heads: int,
                 attn_head_size: int,
                 qkv_bias: Union[bool, ParamAttr],
                 dropout: float = 0.,
                 attention_dropout: float = 0.):
        super().__init__()
        # New attn_head_size parameter: attn_head_size and num_heads are no
        # longer constrained by embed_dim, which makes the API more flexible.
        self.num_heads = num_heads
        # self.attn_head_size = int(embed_dim / self.num_heads)
        self.attn_head_size = attn_head_size
        self.all_head_size = self.attn_head_size * self.num_heads  # attention layer's hidden size
        w_attr_1, b_attr_1 = self._init_weights()
        self.qkv = nn.Linear(embed_dim,
                             self.all_head_size*3,  # weights for q, k, and v
                             weight_attr=w_attr_1,
                             bias_attr=b_attr_1 if qkv_bias else False)
        self.scales = self.attn_head_size ** -0.5
        w_attr_2, b_attr_2 = self._init_weights()
        # self.out = nn.Linear(embed_dim,
        #                      embed_dim,
        #                      weight_attr=w_attr_2,
        #                      bias_attr=b_attr_2)
        # Aggregate the multi-head outputs and project back to the input
        # dimension embed_dim, so the residual connection can be applied.
        self.out = nn.Linear(self.all_head_size,
                             embed_dim,
                             weight_attr=w_attr_2,
                             bias_attr=b_attr_2)
        self.attn_dropout = nn.Dropout(attention_dropout)
        self.proj_dropout = nn.Dropout(dropout)
        self.softmax = nn.Softmax(axis=-1)

    def _init_weights(self) -> Tuple[ParamAttr, ParamAttr]:
        weight_attr = paddle.ParamAttr(initializer=nn.initializer.KaimingUniform())
        bias_attr = paddle.ParamAttr(initializer=nn.initializer.KaimingUniform())
        return weight_attr, bias_attr

    def transpose_multihead(self, x: Tensor) -> Tensor:
        new_shape = x.shape[:-1] + [self.num_heads, self.attn_head_size]
        x = x.reshape(new_shape)
        x = x.transpose([0, 2, 1, 3])
        return x

    def forward(self, x: Tensor) -> Tuple[Tensor, Tensor]:
        qkv = self.qkv(x).chunk(3, axis=-1)
        q, k, v = map(self.transpose_multihead, qkv)
        attn = paddle.matmul(q, k, transpose_y=True)
        attn = attn * self.scales
        attn = self.softmax(attn)
        attn_weights = attn
        attn = self.attn_dropout(attn)
        z = paddle.matmul(attn, v)
        z = z.transpose([0, 2, 1, 3])
        new_shape = z.shape[:-2] + [self.all_head_size]
        z = z.reshape(new_shape)
        # Aggregate the multi-head outputs and project back to the input
        # dimension embed_dim, so the residual connection can be applied.
        z = self.out(z)
        z = self.proj_dropout(z)
        return z, attn_weights
Test:
def main():
    t = paddle.randn([4, 16, 96])  # [batch_size, num_patches, embed_dim]
    print('input shape = ', t.shape)
    model = Attention(embed_dim=96,
                      num_heads=8,
                      attn_head_size=128,
                      qkv_bias=False,
                      dropout=0.,
                      attention_dropout=0.)
    print(model)
    out, attn_weights = model(t)
    print(out.shape)
    print(attn_weights.shape)
    for name, param in model.named_parameters():
        print(f'param name: {name},\tparam shape: {param.shape} ')

if __name__ == "__main__":
    main()
Output:
input shape = [4, 16, 96]
Attention(
(qkv): Linear(in_features=96, out_features=3072, dtype=float32)
(out): Linear(in_features=1024, out_features=96, dtype=float32)
(attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
(proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
(softmax): Softmax(axis=-1)
)
[4, 16, 96]
[4, 8, 16, 16]
param name: qkv.weight, param shape: [96, 3072]
param name: out.weight, param shape: [1024, 96]
param name: out.bias, param shape: [96]
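The printed parameter shapes are consistent with the decoupled head size; a quick arithmetic check (plain Python, mirroring the shapes above):

```python
embed_dim, num_heads, attn_head_size = 96, 8, 128
all_head_size = num_heads * attn_head_size    # 1024, independent of embed_dim
print([embed_dim, all_head_size * 3])         # qkv.weight: [96, 3072]
print([all_head_size, embed_dim])             # out.weight: [1024, 96]
print([embed_dim])                            # out.bias:   [96]
```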
These are just a few tentative suggestions of mine; I hope the maintainers will evaluate them and consider adopting them~
Describe the bug
For classification models trained with EMA,
an error occurs when resuming training:
the EMA model name and loading logic are incorrect.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Model EMA should be loaded without error
Additional context
Describe your feature request
TransGAN: training from scratch
Describe the reference code or paper
official repo: https://github.com/VITA-Group/TransGAN
Describe the possible solution
TODO
Additional context
N/A
Lines 155 to 162 in 9877f36
I can't find the packages detail and yaml on PyPI.
Describe the bug
resume training error
AttributeError: 'Momentum' object has no attribute 'set_dict'
To Reproduce
Steps to reproduce the behavior:
1. Go to 'PaddleViT/object_detection/Swin/'
2. Run python main_single_gpu.py -resume=./output/train-20211210-09-50-43/Swin-Epoch-45
Restoring the model weights succeeds; the error occurs when restoring the optimizer state.
Screenshots
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.2.2\plugins\python-ce\helpers\pydev\pydevd.py", line 1483, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.2.2\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "F:/***/pp_swin/main_single_gpu.py", line 400, in <module>
    main()
  File "F:/***/pp_swin/main_single_gpu.py", line 313, in main
    optimizer.set_dict(opt_state)
AttributeError: 'Momentum' object has no attribute 'set_dict'
Version (please complete the following information):
In PaddleViT/image_classification/ViT/transformer.py, when Encoder's __init__ creates each encoder_layer, what is the purpose of the deep copy?
class Encoder(nn.Layer):
    def __init__(self,
                 embed_dim,
                 num_heads,
                 depth,
                 qkv_bias=True,
                 mlp_ratio=4.0,
                 dropout=0.,
                 attention_dropout=0.,
                 droppath=0.):
        super(Encoder, self).__init__()
        # stochastic depth decay
        depth_decay = [x.item() for x in paddle.linspace(0, droppath, depth)]
        layer_list = []
        for i in range(depth):
            encoder_layer = EncoderLayer(embed_dim,
                                         num_heads,
                                         qkv_bias=qkv_bias,
                                         mlp_ratio=mlp_ratio,
                                         dropout=dropout,
                                         attention_dropout=attention_dropout,
                                         droppath=depth_decay[i])
            layer_list.append(copy.deepcopy(encoder_layer))  # what is the purpose of deep-copying encoder_layer here?
        self.layers = nn.LayerList(layer_list)
        ...
The for loop creates a new encoder_layer on every iteration; each one is a distinct object, so there is no parameter-sharing issue. What is the rationale for deep-copying encoder_layer here?
I personally think the deep copy of encoder_layer is unnecessary:
layer_list.append(encoder_layer)
Perhaps I have not thought this through deeply enough; I look forward to the maintainers' explanation~
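The point about distinct objects can be checked without paddle at all; a small sketch with a stand-in Layer class:

```python
import copy

class Layer:
    def __init__(self):
        self.weight = [0.0]  # stand-in for a parameter tensor

# Constructing inside the loop already yields independent layers:
layers = [Layer() for _ in range(3)]
assert len({id(l.weight) for l in layers}) == 3  # no sharing to break

# deepcopy would only matter if a SINGLE instance were reused:
proto = Layer()
shared = [proto] * 3                                # all alias proto.weight
cloned = [copy.deepcopy(proto) for _ in range(3)]   # independent copies
print(len({id(l.weight) for l in shared}),          # 1
      len({id(l.weight) for l in cloned}))          # 3
```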
Describe your feature request
Cascade Mask R-CNN
Describe the reference code or paper
Swin detection official code (from mmdet) here
Describe the possible solution
Mask R-CNN is already implemented in PaddleViT here
Additional context
Add any other context or screenshots about the feature request here.
At L300 of PaddleViT/image_classification/ViT/transformer.py, when EncoderLayer is created, the parameters qkv_bias, mlp_ratio, dropout, and attention_dropout are hard-coded, so the corresponding parameters of Encoder's __init__ method have no effect:
class Encoder(nn.Layer):
    """Transformer encoder

    Encoder contains a list of EncoderLayer, and a LayerNorm.

    Attributes:
        layers: nn.LayerList contains multiple EncoderLayers
        encoder_norm: nn.LayerNorm which is applied after last encoder layer
    """
    def __init__(self,
                 embed_dim,
                 num_heads,
                 depth,
                 qkv_bias=True,
                 mlp_ratio=4.0,
                 dropout=0.,
                 attention_dropout=0.,
                 droppath=0.):
        super(Encoder, self).__init__()
        # stochastic depth decay
        depth_decay = [x.item() for x in paddle.linspace(0, droppath, depth)]
        layer_list = []
        for i in range(depth):
            encoder_layer = EncoderLayer(embed_dim,
                                         num_heads,
                                         qkv_bias=True,
                                         mlp_ratio=4.,
                                         dropout=0.,
                                         attention_dropout=0.,
                                         droppath=depth_decay[i])
            layer_list.append(copy.deepcopy(encoder_layer))
        self.layers = nn.LayerList(layer_list)
        ...
It should be changed to:
class Encoder(nn.Layer):
    """Transformer encoder

    Encoder contains a list of EncoderLayer, and a LayerNorm.

    Attributes:
        layers: nn.LayerList contains multiple EncoderLayers
        encoder_norm: nn.LayerNorm which is applied after last encoder layer
    """
    def __init__(self,
                 embed_dim,
                 num_heads,
                 depth,
                 qkv_bias=True,
                 mlp_ratio=4.0,
                 dropout=0.,
                 attention_dropout=0.,
                 droppath=0.):
        super(Encoder, self).__init__()
        # stochastic depth decay
        depth_decay = [x.item() for x in paddle.linspace(0, droppath, depth)]
        layer_list = []
        for i in range(depth):
            encoder_layer = EncoderLayer(embed_dim,
                                         num_heads,
                                         qkv_bias=qkv_bias,
                                         mlp_ratio=mlp_ratio,
                                         dropout=dropout,
                                         attention_dropout=attention_dropout,
                                         droppath=depth_decay[i])
            layer_list.append(copy.deepcopy(encoder_layer))
        self.layers = nn.LayerList(layer_list)
        ...
That's all~
Has this model been reproduced so far?
BEiT-L (ViT+UperNet, ImageNet-22k pretrain) mIoU-57.0%
https://paperswithcode.com/paper/beit-bert-pre-training-of-image-transformers
Describe your feature request
Reproduce the CvT model
Describe the reference code or paper
Paper: https://arxiv.org/pdf/2103.15808.pdf
official repo: https://github.com/VITA-Group/TransGAN
Describe the possible solution
Additional context
N/A
Describe your feature request
dataset.py currently hard-codes the ImageNet mean and std, which is inflexible and not easy for new users to find.
These should be settable through the config file, with the current ImageNet statistics as the default values.
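A minimal sketch of the suggestion (the config keys MEAN/STD are illustrative; the defaults are the usual ImageNet statistics):

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def get_norm_stats(cfg):
    # Fall back to the ImageNet defaults when the config does not override them.
    return cfg.get('MEAN', IMAGENET_MEAN), cfg.get('STD', IMAGENET_STD)

print(get_norm_stats({}))                                     # ImageNet defaults
print(get_norm_stats({'MEAN': [0.5] * 3, 'STD': [0.5] * 3}))  # user override
```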
Describe your feature request
Token labeling is used in VOLO model training, implement related classes and methods.
Describe the reference code or paper
Describe the possible solution
Implemented according to the official code
Additional context
N/A
Describe your feature request
Add a linear_scale_lr argument in the config and use the batch size to control linear learning-rate scaling
Describe the reference code or paper
N/A
Describe the possible solution
Add the argument in config.py
Add an if-condition in main_single_gpu.py and main_multi_gpu.py
Additional context
N/A
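A sketch of what the if-condition could compute; the base batch size of 512 is an assumption for illustration, not PaddleViT's actual constant:

```python
def scale_lr(base_lr, total_batch_size, base_batch_size=512, linear_scale=True):
    # Linear scaling rule: lr grows proportionally with the global batch size.
    if not linear_scale:
        return base_lr
    return base_lr * total_batch_size / base_batch_size

print(scale_lr(1e-3, 1024))                      # 0.002
print(scale_lr(1e-3, 1024, linear_scale=False))  # 0.001
```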
Trained with 8 cards, but the result is much lower than expected:
#--------------------------------------------------
2021-11-18 14:47:19 [INFO] [EVAL] #Images: 2000 mIoU: 0.0108 Acc: 0.3663 Kappa: 0.2675
2021-11-18 14:47:19 [INFO] [EVAL] Class IoU:
[2.903e-01 3.102e-01 6.999e-01 0.000e+00 3.243e-01 0.000e+00 2.000e-04
 ... (all remaining classes are 0.000e+00)]
2021-11-18 14:47:19 [INFO] [EVAL] Class Acc:
[0.3054 0.3275 0.7344 0. 0.3642 0. 1. 0. 0. 0.
 ... (all remaining classes are 0.)]
#--------------------------------------------------
Update the ViT training script according to the new version:
https://github.com/BR-IDL/PaddleViT/blob/develop/image_classification/SwinTransformer/main_multi_gpu.py
Describe the question
In the classification directory, when Mixup is set up in main_single_gpu.py / main_multi_gpu.py, the NUM_CLASSES parameter is not passed. As a result, loading a non-ImageNet-1k dataset with the existing main_single_gpu.py / main_multi_gpu.py raises an error: the current model's number of classes is not equal to 1000, but Mixup's default number of classes is still 1000, so the loss computation cannot be performed.
Expected behavior
Pass the class-number parameter through, so that the training code is easier to use.
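The shape mismatch can be seen with a numpy stand-in for Mixup's one-hot target construction (names are illustrative):

```python
import numpy as np

def mixup_target(labels, num_classes):
    # Mixup builds soft targets via one-hot encoding; its width must match
    # the model's output width or the loss computation fails.
    return np.eye(num_classes)[labels]

model_logits = np.zeros((4, 10))                    # a 10-class model
bad = mixup_target(np.array([0, 1, 2, 3]), 1000)    # Mixup default: 1000 classes
good = mixup_target(np.array([0, 1, 2, 3]), 10)     # NUM_CLASSES passed through
print(bad.shape[1] == model_logits.shape[1])        # False -> loss would fail
print(good.shape[1] == model_logits.shape[1])       # True
```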
When I ran the example from "Quick Demo for Image Classification" in README.md on AIStudio (classic version):
%cd PaddleViT/image_classification/ViT/
import paddle
from config import get_config
from transformer import build_vit as build_model
# config files in ./configs/
config = get_config('./configs/vit_base_patch16_224.yaml')
# build model
model = build_model(config)
# load pretrained weights, .pdparams is NOT needed
model_state_dict = paddle.load('./vit_base_patch16_224')
model.set_dict(model_state_dict)
the following error occurred while loading the pretrained weights:
/home/aistudio/PaddleViT/image_classification/ViT
merging config from ./configs/vit_base_patch16_224.yaml
W1123 07:14:56.871081 9894 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W1123 07:14:56.875849 9894 device_context.cc:465] device: 0, cuDNN Version: 7.6.
---------------------------------------------------------------------------ValueError Traceback (most recent call last)/tmp/ipykernel_9894/1201834454.py in <module>
12 model = build_model(config)
13 # load pretrained weights, .pdparams is NOT needed
---> 14 model_state_dict = paddle.load('./vit_base_patch16_224')
15 model.set_dict(model_state_dict)
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/framework/io.py in load(path, **configs)
983
984 else:
--> 985 load_result = _legacy_load(path, **configs)
986
987 return load_result
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/framework/io.py in _legacy_load(path, **configs)
1001 else:
1002 # file prefix and directory are compatible cases
-> 1003 model_path, config = _build_load_path_and_config(path, config)
1004 # check whether model file exists
1005 if config.model_filename is None:
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/framework/io.py in _build_load_path_and_config(path, config)
159 "example, it should be written as `paddle.load('model.pdparams')` instead of " \
160 "`paddle.load('model')`."
--> 161 raise ValueError(error_msg % path)
162 else:
163 if prefix_format_exist:
ValueError: The ``path`` (./vit_base_patch16_224) to load model not exists. If you want to load the results saved by `fluid.save_dygraph`, please specify the full file name, not just the file name prefix. For example, it should be written as `paddle.load('model.pdparams')` instead of `paddle.load('model')`.
The error message shows that loading pretrained weights requires the full file name, i.e. the .pdparams suffix IS required.
Adding the .pdparams suffix when loading the pretrained weights resolves the problem:
model_state_dict = paddle.load('./vit_base_patch16_224.pdparams')
I found that the README.md of every model in PaddleViT has the same two problems (the examples below all use the README.md of the BEiT model, PaddleViT/image_classification/BEiT/):
1. In the Usage sample code, the .pdparams suffix is missing when loading the pretrained weights, and the comment ".pdparams is NOT needed" is also wrong: it is the value of the -pretrained command-line argument below that does not need .pdparams. The two cases were mixed up.
from config import get_config
from beit import build_beit as build_model
# config files in ./configs/
config = get_config('./configs/beit_base_patch16_224.yaml')
# build model
model = build_model(config)
# load pretrained weights, .pdparams is NOT needed
model_state_dict = paddle.load('./beit_base_patch16_224_ft22kto1k')
model.set_dict(model_state_dict)
The comment ", .pdparams is NOT needed" should be deleted, and the .pdparams suffix added when loading the model:
from config import get_config
from beit import build_beit as build_model
# config files in ./configs/
config = get_config('./configs/beit_base_patch16_224.yaml')
# build model
model = build_model(config)
# load pretrained weights
model_state_dict = paddle.load('./beit_base_patch16_224_ft22kto1k.pdparams')
model.set_dict(model_state_dict)
2. The command-line argument values in the Evaluation and Training sections are wrapped in an extra pair of single quotes; if executed directly in a terminal, they can raise a FileNotFoundError:
FileNotFoundError: [Errno 2] No such file or directory: "'./configs/beit_base_patch16_224.yaml'"
I ran into this error before when running the training and validation commands, and other students in the group hit it as well. The cause is that when the quote characters survive shell processing, argparse keeps them as literal characters in the string value. So the argument values should not be wrapped in quotes; the single quotes in the Evaluation and Training commands should be removed.
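A minimal repro of the quoting problem, assuming the quote characters reach argparse unstripped (e.g. when the command is pasted into a context where the shell does not process them):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-cfg', type=str)
# The quotes become part of the value, so any later open() looks for a
# file whose name literally starts with a quote character:
args = parser.parse_args(["-cfg='./configs/beit_base_patch16_224.yaml'"])
print(args.cfg)  # './configs/beit_base_patch16_224.yaml' -- quotes included
assert args.cfg.startswith("'") and args.cfg.endswith("'")
```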
Single-GPU evaluation:
CUDA_VISIBLE_DEVICES=0 \
python main_single_gpu.py \
-cfg='./configs/beit_base_patch16_224.yaml' \
-dataset='imagenet2012' \
-batch_size=16 \
-data_path='/dataset/imagenet' \
-eval \
-pretrained='./beit_base_patch16_224_ft22kto1k'
I changed it to:
CUDA_VISIBLE_DEVICES=0 \
python main_single_gpu.py \
-cfg=./configs/beit_base_patch16_224.yaml \
-dataset=imagenet2012 \
-batch_size=16 \
-data_path=/path/to/dataset/imagenet/val \
-eval \
-pretrained=/path/to/pretrained/model/beit_base_patch16_224_ft22kto1k # .pdparams is NOT needed
Multi-GPU evaluation:
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python main_multi_gpu.py \
-cfg='./configs/beit_base_patch16_224.yaml' \
-dataset='imagenet2012' \
-batch_size=16 \
-data_path='/dataset/imagenet' \
-eval \
-pretrained='./beit_base_patch16_224_ft22kto1k'
I changed it to:
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python main_multi_gpu.py \
-cfg=./configs/beit_base_patch16_224.yaml \
-dataset=imagenet2012 \
-batch_size=16 \
-data_path=/path/to/dataset/imagenet/val \
-eval \
-pretrained=/path/to/pretrained/model/beit_base_patch16_224_ft22kto1k # .pdparams is NOT needed
Single-GPU training:
CUDA_VISIBLE_DEVICES=0 \
python main_single_gpu.py \
-cfg='./configs/beit_base_patch16_224.yaml' \
-dataset='imagenet2012' \
-batch_size=32 \
-data_path='/dataset/imagenet' \
I changed it to:
CUDA_VISIBLE_DEVICES=0 \
python main_single_gpu.py \
-cfg=./configs/beit_base_patch16_224.yaml \
-dataset=imagenet2012 \
-batch_size=32 \
-data_path=/path/to/dataset/imagenet/train \
Multi-GPU training:
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python main_multi_gpu.py \
-cfg='./configs/beit_base_patch16_224.yaml' \
-dataset='imagenet2012' \
-batch_size=16 \
-data_path='/dataset/imagenet' \
I changed it to:
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python main_multi_gpu.py \
-cfg=./configs/beit_base_patch16_224.yaml \
-dataset=imagenet2012 \
-batch_size=16 \
-data_path=/path/to/dataset/imagenet/train \
I will submit a PR shortly for the maintainers to review~
Is the output of the VisualTransformer model in PaddleViT/image_classification/ViT/transformer.py missing attn?
class VisualTransformer(nn.Layer):
    ...
    def forward(self, x):
        x = self.patch_embedding(x)
        x, attn = self.encoder(x)
        logits = self.classifier(x[:, 0])  # take only cls_token as classifier
        return logits
I personally think the model should also return attn in its output:
class VisualTransformer(nn.Layer):
    ...
    def forward(self, x):
        x = self.patch_embedding(x)
        x, attn = self.encoder(x)
        logits = self.classifier(x[:, 0])  # take only cls_token as classifier
        return logits, attn
My reasons:
1. attn is returned all the way up through Attention, EncoderLayer, and Encoder; if the model output does not return attn, those earlier return values become redundant and possibly meaningless.
2. attn may be needed later for visualization. I suspect the earlier classes return the per-layer attention weights precisely with visualization in mind.
In summary, I suggest the model output also return the per-layer attention weights attn~
The source code is empty. I want a reference. Please check.
Describe the bug
There is a file named 'stat.py', which has the same name as a Python standard-library module. stat.py
needs to be renamed, or the ViT code will not run.
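The shadowing can be demonstrated in a few lines (written to a temp directory so the real stat module is untouched afterwards):

```python
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, 'stat.py'), 'w') as f:
        f.write('SHADOW = True\n')   # stands in for a project-local stat.py
    sys.path.insert(0, d)
    sys.modules.pop('stat', None)    # forget any cached stdlib copy
    import stat                      # resolves to the local file, not the stdlib
    print(getattr(stat, 'SHADOW', False))   # True
    sys.path.pop(0)
    sys.modules.pop('stat', None)    # let later imports find the real module
```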
Describe your feature request
Add the multi-scale sampler for multi gpu batch sampling
Describe the reference code or paper
Refer to MobileViT paper: https://arxiv.org/pdf/2110.02178.pdf
Page 18.
Describe the possible solution
Refer to original paper.
Additional context
N/A
Describe the bug
When running ViT model validation on the ImageNet2012 dataset, I downloaded only the validation set because the full dataset is large. Running the validation command fails with an error saying the training data is missing.
On investigation, main_single_gpu.py loads both the training set and the validation set before running validation.
I suggest adding a separate script for model validation only.
Describe your feature request
Add optional parameters to LabelSmoothingCrossEntropyLoss
In losses.py (taking DeiT as an example), LabelSmoothingCrossEntropyLoss has few optional parameters.
PaddleViT/image_classification/DeiT/losses.py
Lines 21 to 46 in dd437d4
Describe the reference code or paper
Describe the possible solution
Compute the loss directly with paddle.nn.functional.cross_entropy, which supports more parameters.
Before the call, convert the labels to one-hot form with paddle.nn.functional.one_hot, then smooth them with paddle.nn.functional.label_smooth, and finally set soft_label=True in paddle.nn.functional.cross_entropy.
The code is as follows:
class LabelSmoothingCrossEntropyLoss(nn.Layer):
    def __init__(self,
                 smoothing=0.1,
                 weight=None,
                 ignore_index=-100,
                 reduction='mean',
                 soft_label=True,
                 axis=-1,
                 use_softmax=True,
                 name=None):
        super(LabelSmoothingCrossEntropyLoss, self).__init__()
        assert 0 <= smoothing < 1.0
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction
        self.ignore_index = ignore_index
        self.soft_label = soft_label
        self.axis = axis
        self.use_softmax = use_softmax
        self.name = name

    def forward(self, input, label):
        label = paddle.nn.functional.one_hot(label, num_classes=input.shape[1])
        label = paddle.nn.functional.label_smooth(label, epsilon=self.smoothing)
        ret = paddle.nn.functional.cross_entropy(
            input,
            label,
            weight=self.weight,
            ignore_index=self.ignore_index,
            reduction=self.reduction,
            soft_label=self.soft_label,
            axis=self.axis,
            use_softmax=self.use_softmax,
            name=self.name)
        return ret
So far, simple tests show that its results match those of the existing implementation.
Additional context
In the Paddle ViT course's ResNet18 training assignment I found fairly severe overfitting, so I wanted to try label smoothing to mitigate it. Searching the Paddle API docs, I found no dedicated LabelSmoothingCrossEntropyLoss, so I implemented one with Paddle's existing one_hot and label_smooth functions.
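The pipeline's arithmetic can be checked without Paddle; a numpy sketch mirroring one_hot, then label_smooth ((1 − ε)·y + ε/C, Paddle's formula with no prior distribution), then soft-label cross entropy:

```python
import numpy as np

def label_smooth_ce(logits, labels, eps=0.1):
    n, c = logits.shape
    onehot = np.eye(c)[labels]
    soft = onehot * (1 - eps) + eps / c     # label_smooth with no prior_dist
    # log-softmax, then soft-label cross entropy
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -(soft * logp).sum(axis=1).mean()

logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]])
labels = np.array([0, 1])
plain = label_smooth_ce(logits, labels, eps=0.0)   # reduces to ordinary CE
smooth = label_smooth_ce(logits, labels, eps=0.1)
print(float(plain) < float(smooth))  # smoothing raises loss on confident targets
```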
Describe the bug
I found that the code was ported from the original author's code, and some of the PyTorch code was converted to Paddle incorrectly.
To Reproduce
1:
VIT_custom.py line59
return input * paddle.rsqrt(paddle.mean(input ** 2, dim=2, keepdim=True) + 1e-8)
should be
return input * paddle.rsqrt(paddle.mean(input ** 2, axis=2, keepdim=True) + 1e-8)
2:
VIT_custom.py line 88:
class CustomAct(nn.Layer):
    """ CustomAct layer

    Custom act method set, default "gelu"
    """
    def __init__(self, act_layer):
        super().__init__()
        if act_layer == "gelu":
            self.act_layer = gelu
        elif act_layer == "leakyrelu":
            self.act_layer = leakyrelu
        else:
            self.act_layer = gelu
in which leakyrelu has not been defined.
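A minimal definition that would make that branch usable; this is a sketch only (numpy standing in for the tensor ops; the real fix would call paddle.nn.functional.leaky_relu, and the 0.2 slope is an assumption):

```python
import numpy as np

def leakyrelu(x, negative_slope=0.2):
    # leaky_relu: identity for x >= 0, a small linear slope for x < 0
    return np.where(x >= 0, x, negative_slope * x)

out = leakyrelu(np.array([-1.0, 0.0, 2.0]))
print(out.tolist())  # [-0.2, 0.0, 2.0]
```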
Describe your feature request
Styleformer training from scratch
Describe the reference code or paper
official repo: https://github.com/Jeeseung-Park/Styleformer
Describe the possible solution
TODO
Additional context
Add any other context or screenshots about the feature request here.
Add HaloNet and align ported weights performance.
Reference paper: https://arxiv.org/pdf/2103.12731.pdf
Timm impl: https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/byoanet.py
Cross-entropy loss should measure the distance between the predicted probability distribution and the true distribution, so in principle the loss should be computed after softmax. But the training code computes the loss directly as loss = criterion(output, label) and only applies softmax afterwards. Shouldn't softmax be computed before the loss, or does the order actually make no difference to training?
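For context: criteria in the CrossEntropyLoss family (Paddle's included, via its use_softmax=True default) expect raw logits and apply softmax/log-softmax internally, so applying softmax before the loss would run it twice. A numpy illustration:

```python
import numpy as np

logits = np.array([[2.0, 0.5, -1.0]])
label = 0
p = np.exp(logits) / np.exp(logits).sum()
loss_from_logits = -np.log(p[0, label])   # what criterion(output, label) computes
print(round(float(loss_from_logits), 4))  # 0.2413

# Softmax-then-criterion would instead take CE of softmax(softmax(logits)),
# a different (flatter) objective -- not merely a reordering of the same math.
pp = np.exp(p) / np.exp(p).sum()
print(float(-np.log(pp[0, label])) == float(loss_from_logits))  # False
```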
Describe your feature request
Add a OneCycle scheduler, which is used in ConvMixer
Describe the reference code or paper
N/A
Describe the possible solution
Refer to the OneCycle implementation in timm.
Additional context
N/A
Describe the bug
Some of the main_multi_gpu.py scripts in the image classification models are missing the paddle.DataParallel(model)
wrapping, which may cause multi-GPU training to perform poorly.
E.g., in DeiT:
PaddleViT/image_classification/DeiT/main_multi_gpu.py
Lines 326 to 329 in 0573849
Expected behavior
Add paddle.DataParallel(model)
Describe your feature request
Check and modify the training settings for CrossViT models, add missing processing and training methods.
Describe the reference code or paper
Describe the possible solution
Additional context
Add any other context or screenshots about the feature request here.
Do you have a model for the paper "Line Segment Detection Using Transformers without Edges"?
Error:
Training fails when using a larger batch:
SystemError: (Fatal) Operator set_value raises an thrust::system::system_error exception. The exception content is :parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument. (at /paddle/paddle/fluid/imperative/tracer.cc:192)
Reason:
The reason is explained by the following issues from PaddlePaddle:
PaddlePaddle/Paddle#33057 (comment)
In short, this error is raised by a CUDA thrust bug, which is ignored in newer CUDA versions.
Solution:
Installing the Paddle dev version will fix the problem.
Instructions for installing it can be found here:
https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html
In detail, the problem is fixed by the following patch:
https://github.com/PaddlePaddle/Paddle/pull/33748/files/617e3eda9dfcd76cb6a7ebaa1535340f1023d3f1
Describe your feature request
Will PaddleViT add Mobile-Former soon and release pretrained weights on ImageNet?
Describe the reference code or paper
Paper -> Mobile-Former: Bridging MobileNet and Transformer
Describe the possible solution
Additional context
Add any other context or screenshots about the feature request here.
if nranks > 1:
    # Initialize parallel environment if not done.
    if not paddle.distributed.parallel.parallel_helper._is_parallel_ctx_initialized():
        logger.info("using dist training")
        # Initialize the dynamic-graph parallel training environment; currently
        # this initializes both the NCCL and GLOO contexts for communication.
        paddle.distributed.init_parallel_env()
        ddp_model = paddle.DataParallel(model)
    else:
        ddp_model = paddle.DataParallel(model)
I don't understand what the check `if not paddle.distributed.parallel.parallel_helper._is_parallel_ctx_initialized():` means. Shouldn't paddle.distributed.init_parallel_env() always have to be called?
Also, does any code need to be changed to run multi-node multi-GPU training?
Thanks.
https://github.com/BR-IDL/PaddleViT/blob/develop/image_classification/SwinTransformer/swin_transformer.py line 224
The size of the tensor should be '2, window_h * window_w, window_h * window_w', not '2, window_h * window_w, window_h * window_h'.
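The expected shape can be verified with a few lines of numpy (the standard Swin relative-coordinates construction):

```python
import numpy as np

window_h, window_w = 7, 5
coords = np.stack(np.meshgrid(np.arange(window_h), np.arange(window_w), indexing='ij'))
coords_flat = coords.reshape(2, -1)                          # [2, Wh*Ww]
relative = coords_flat[:, :, None] - coords_flat[:, None, :]
print(relative.shape)  # (2, 35, 35) == [2, window_h*window_w, window_h*window_w]
```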
The code says "Dataset related classes and methods for ViT training and validation. Cifar10, Cifar100 and ImageNet2012 are supported."
My question: if I want to train on my own dataset, do I need to write a new dataset class myself?
Steps to reproduce the behavior:
Additional context
Issue
In github PaddleViT/object_detection/Swin/:
main_single_gpu.py: line 391 is missing the definition of train_loss
utils.py: missing import math
At the last step,
cd PaddleViT/semantic_segmentation
pip3 install -r requirements.txt
fails with the following error:
(paddlevit) D:\PyWorkspace\PaddleViT\PaddleViT\semantic_segmentation>pip3 install -r requirements.txt
Collecting cityscapesScripts==2.2.0
Using cached cityscapesScripts-2.2.0-py3-none-any.whl (472 kB)
ERROR: Could not find a version that satisfies the requirement detail==4.0 (from versions: none)
ERROR: No matching distribution found for detail==4.0
Trying to install the detail package directly:
(paddlevit) D:\PyWorkspace\PaddleViT\PaddleViT\semantic_segmentation>pip3 install detail==4.0
ERROR: Could not find a version that satisfies the requirement detail==4.0 (from versions: none)
ERROR: No matching distribution found for detail==4.0
(paddlevit) D:\PyWorkspace\PaddleViT\PaddleViT\semantic_segmentation>pip3 install detail
ERROR: Could not find a version that satisfies the requirement detail (from versions: none)
ERROR: No matching distribution found for detail
Upgrading pip:
(paddlevit) D:\PyWorkspace\PaddleViT\PaddleViT\semantic_segmentation>pip install --user --upgrade pip
Requirement already satisfied: pip in d:\anaconda\envs\paddlevit\lib\site-packages (21.3.1)
It still cannot be installed:
(paddlevit) D:\PyWorkspace\PaddleViT\PaddleViT\semantic_segmentation>pip3 install detail
ERROR: Could not find a version that satisfies the requirement detail (from versions: none)
ERROR: No matching distribution found for detail
Searching Baidu turned up no such package. In the end I installed the details-0.2.0 package, but I don't know how to use it.
Describe your feature request
Add recompute for dygraph model training, which aims to enlarge the batch size during training by freeing intermediate activation memory.
Describe the reference code or paper
N/A
@jarygrace Let's add this feature asap, thanks!