
movenet.pytorch's Introduction

movenet.pytorch's People

Contributors

fire717, superbayes


movenet.pytorch's Issues

About fine tuning

Hi, I want to use your pre-trained model to fine-tune my own model with my own dataset.
My dataset only has 9 keypoints, which is not the same as COCO. Is it possible to do this?

The model for thunder

Hi Author,

I have read through the relevant code. It looks like the current version only supports Lightning, so I made some modifications on top of it so that a Thunder version can be trained. The important changes are:

  1. Used COCO 2017 and filtered it with make_coco_data_17keypooints.py to produce a new dataset.
  2. Modified movenet_mobilenetv2.py, adjusting the network parameters for width mult 1.75 together with the corresponding upsample parameters.
  3. Modified movenet_loss.py to load Thunder's weight matrix. I wrote my own code to generate this weight matrix; the core of it is:

    import numpy as np

    ft_size = 64                    # Thunder feature map is 64x64 instead of 48x48
    delta = 1.8
    # inverse distance of every cell to the feature-map center
    y, x = np.ogrid[0:ft_size, 0:ft_size]
    center_y, center_x = ft_size / 2.0, ft_size / 2.0
    y = y - center_y
    x = x - center_x
    weight_to_center = 1 / (np.sqrt(y * y + x * x) + delta)
    weight_to_center = weight_to_center.astype(np.float32).reshape(ft_size, ft_size, 1)

  4. Modified other related code, mainly the places hard-coded for Lightning, e.g. changing 48 to 64 and 192 to 256.

Training has not finished yet; the loss keeps going down, and at epoch 27 the validation accuracy reached 72%. It is still training.

The problem I am running into now is that when I run prediction with the checkpoint above, not a single keypoint is found; the values in the keypoint heatmap are all very small.
I would like to ask the author whether there is anything else important that needs to be adjusted. Thanks!!

Confidence values

Hey @fire717 ,

Thanks for building this model. It has been very helpful to my work.
Do you know how to get the confidence values out of the model, in the way that the original Movenet does it?

Thanks
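
For reference, a minimal sketch of one way to get a per-joint score from this model's raw heads, under the assumption that the peak value of each keypoint heatmap can serve as the confidence (the official MoveNet output packs [y, x, score] per joint):

import numpy as np

def keypoints_with_scores(kpt_heatmaps):
    # kpt_heatmaps: [17, H, W] array from the keypoint-heatmap head
    n, h, w = kpt_heatmaps.shape
    results = []
    for i in range(n):
        idx = int(np.argmax(kpt_heatmaps[i]))
        cy, cx = divmod(idx, w)
        score = float(kpt_heatmaps[i, cy, cx])    # heatmap peak used as the confidence
        results.append((cy / h, cx / w, score))   # normalized y, x, score
    return results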

Finetune official models

Hi, can this repo be used to fine-tune the official TFLite models of Lightning and Thunder? Please guide me through the process if this is possible. Thanks

Question about "center_weight_origin.npy"

A question about decoding: what is the purpose of this weight?
_center_weight = np.load('lib/data/center_weight_origin.npy').reshape(48,48)
Is it meant to emphasize the value at the center point?
And how was this .npy generated?
Thanks
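
For what it's worth, a sketch of how such a weight could be generated, following the inverse-distance-to-center recipe quoted in the Thunder issue above; whether these are the exact parameters behind the original .npy is an assumption:

import numpy as np

ft_size, delta = 48, 1.8                                # Lightning feature map is 48x48
y, x = np.ogrid[0:ft_size, 0:ft_size]
y = y - ft_size / 2.0
x = x - ft_size / 2.0
center_weight = 1 / (np.sqrt(y * y + x * x) + delta)    # larger near the center
np.save('center_weight_origin.npy', center_weight.astype(np.float32))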

Question about the structure of the mobilenetv2 backbone

Thank you very much for your work. As a beginner I would like to ask: the overall network is a MobileNetV2 backbone plus an FPN, but MobileNetV2 (in the MATLAB reference version, and in the most common version) takes a 224 input, and according to the original paper there are 1+3+1 = 5 stages with stride 1 (where residual connections are possible), whereas in your MobileNetV2 forward I only see 3 residual connections. Also, the original MobileNetV2 has no upsample layers. What led you to change MobileNetV2 in this way, or am I misunderstanding something? Or is it that any network built from MobileNet's characteristic Inverted Residual blocks can be said to use MobileNet as its backbone?

Question about the training set

When trying to train on another dataset, such as Human3.6M, where occlusion is not annotated, how should this be handled?
If everything is labeled as visible, does that have a big impact on the results?

python predict.py produces no results

After running python predict.py, only an empty output/predict folder is produced and there are no prediction results. Is there something I need to modify?

Found two problems, please take a look

Hello, to save time I will write in Chinese.
First of all, thank you for your work; it has saved us a lot of time.
I only discovered today that you had open-sourced the code. Before that I had also reproduced the approach following your article, but I have not yet tuned it to a good result.
I was quite excited to find the release today, immediately reproduced it with your code, and read through the core implementation. I found two problems:

  1. The generation of the other_keypoints annotation is wrong, which makes all of the other_keypoints Gaussians pile up on the 17th channel when the heatmaps are generated. It is caused by the line below (it should be kid2 rather than kid; a one-line sketch of the fix follows this list):
    other_keypoints[kid].append([kx,ky])
  2. When generating the keypoint regression target, you assign values to the whole ring around the center, which is consistent with your article, but when computing the loss it seems only the value at the center point is used. Also, when computing the loss you use the network's predicted center and keypoint positions for the regression loss and offset loss, which differs from CenterNet, where the ground-truth positions are used. Are all of these intentional?
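
For reference, a one-line sketch of the fix suggested in point 1 (assuming kid2 is the loop variable over the other person's keypoint indices at that spot in the data script):

# buggy: every other-person keypoint is appended to the channel indexed by kid
# other_keypoints[kid].append([kx, ky])
# suggested fix: index by the other person's keypoint id instead
other_keypoints[kid2].append([kx, ky])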

Questions about modifying the model for multi-class output, converting the dataset format, and deployment

@fire717 Hi, I am a sophomore on the RoboMaster team at North University of China, responsible for using neural networks to detect armor plates for auto-aiming. The bounding boxes from the YOLO-family models I trained earlier do not fit the armor-plate outline well, which introduces a large error in the subsequent PnP pose solving, so I switched from the traditional YOLO dataset format to the normalized coordinates of the four corner points. My current label format looks like this: 1 0.673029 0.373564 0.678429 0.426232 0.830433 0.401262 0.824525 0.351212, where the first number is the class id and the following eight numbers are the normalized coordinates of the armor plate's four corners. I have already trained a model with yolov5-face that directly locates the four corner points; the result looks like this:
[attached image: ca84c03809b033d4-1.jpg]
So I would like to ask how to convert my current annotation format into the COCO-style format you use. Also, since we need to recognize both the digit and the color, I want to decouple them by adding a 1x1 conv to the head that outputs the color separately; I have made a similar modification on top of YOLOX before, but I do not know where to start with your model. Finally, I previously deployed models through OpenVINO's C++ interface, so could you share some ideas on implementing the post-processing in C++? I am asking a lot, but I hope you can offer some advice :-)
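
As a rough illustration (not the repo's own tooling), here is a sketch of turning one such annotation line into the per-sample label dict that appears in the Mirror docstring further down in these issues (img_name / keypoints / center / other_centers / other_keypoints); treating every corner as visible (flag 1) is my assumption:

def corners_to_label(line, img_name):
    # line: "cls x1 y1 x2 y2 x3 y3 x4 y4" with normalized coordinates
    vals = line.split()
    xs = [float(v) for v in vals[1::2]]   # x1..x4
    ys = [float(v) for v in vals[2::2]]   # y1..y4
    keypoints = []
    for x, y in zip(xs, ys):
        keypoints += [x, y, 1]            # assumption: every corner is visible
    return {
        "img_name": img_name,
        "keypoints": keypoints,           # 4 corner points instead of 17 joints
        "center": [sum(xs) / 4, sum(ys) / 4],
        "other_centers": [],
        "other_keypoints": [[] for _ in range(4)],
    }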

Error when evaluating prediction accuracy (acc)

FileNotFoundError: [Errno 2] No such file or directory: '../data/eval/mypc.json'
Hi, is this caused by how I processed the data? How is this json file supposed to be generated?

Running evaluate.py raises an error about mypc.json

Following the readme, I finished training and then started testing. Running python evaluate.py gives this error:
Traceback (most recent call last):
  File "evaluate.py", line 57, in <module>
    main(cfg)
  File "evaluate.py", line 28, in main
    data_loader = data.getEvalDataloader()
  File "/movenet.pytorch-master/lib/data/data.py", line 159, in getEvalDataloader
    with open(self.cfg['eval_label_path'], 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '../data/eval/mypc.json'
I searched every folder and there is no mypc.json file anywhere.

About the reg loss

Hi fire717, thank you for sharing this so generously. I see that in the dataset you assign the reg target not only at the center cell but also around it, which you said is to speed up convergence. But in the actual reg loss you only use the value at the center. Why is that?

代码如下:

def regsLoss(self, pred, target, cx0, cy0, kps_mask, batch_size, num_joints):
    # [64, 14, 48, 48]
    _dim0 = torch.arange(0, batch_size).long()
    _dim1 = torch.zeros(batch_size).long()
    loss = 0
    for idx in range(num_joints):
        # only the value at the center cell (cy0, cx0) is read
        gt_x = target[_dim0, _dim1+idx*2, cy0, cx0]
        gt_y = target[_dim0, _dim1+idx*2+1, cy0, cx0]

        pre_x = pred[_dim0, _dim1+idx*2, cy0, cx0]
        pre_y = pred[_dim0, _dim1+idx*2+1, cy0, cx0]

        loss += self.l1(gt_x, pre_x, kps_mask[:, idx])
        loss += self.l1(gt_y, pre_y, kps_mask[:, idx])
    return loss / num_joints

Question regarding InvertedResidual block implementation

Thank you, fire717, for sharing the code and the accompanying write-up; I learned a lot from reading them.

I have a small question about a detail of the mobilenet_v2 implementation. Could you help clarify it for me?
This is this repo's implementation of the InvertedResidual block:

def forward(self, x):
    x = self.conv1(x)
    for _ in range(self.n):
        x = x + self.conv2(x)
    return x

Below is the corresponding part of the torchvision mobilenet_v2 implementation for reference:

# building inverted residual blocks
for t, c, n, s in inverted_residual_setting:
    output_channel = _make_divisible(c * width_mult, round_nearest)
    for i in range(n):
        stride = s if i == 0 else 1
        features.append(block(input_channel, output_channel, stride, expand_ratio=t, norm_layer=norm_layer))
        input_channel = output_channel

My current understanding of the difference is: in this repo's version, conv2 is a single InvertedResidual block, so after passing through conv1 the input goes through (n-1) modules that have the same structure and share the same weights; in the torchvision version, because block calls the InvertedResidual constructor each time, the resulting blocks have the same structure but do not share weights. Is this understanding correct? And does the original MoveNet implementation also use a similar design?

Question about the ACC computation

# points that do not exist are set to -1 and are not counted in the later acc computation

Hello, sorry to bother you, and thank you for sharing. About the points that, as the comment above says, are set to -1 and excluded from the acc computation: are they actually still included when acc is computed, or do you drop the -1 points somewhere else? When I print the values, these -1 points still seem to take part in the computation:
res = np.power(pre[:,:,0]-labels[:,:,0],2)+np.power(pre[:,:,1]-labels[:,:,1],2)

I hope you can explain. Thanks!
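
A sketch of one way to exclude those points, assuming the label uses -1 in the coordinate columns for missing keypoints (this masking is my own, not the repo's code):

import numpy as np

def masked_dist(pre, labels):
    # pre, labels: [batch, num_joints, 2]
    res = np.power(pre[:, :, 0] - labels[:, :, 0], 2) + np.power(pre[:, :, 1] - labels[:, :, 1], 2)
    mask = labels[:, :, 0] >= 0     # drop keypoints marked as -1
    return res[mask]                # squared distances for valid keypoints only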

weights for the movenet_mobilenetv3.py

Hi, I am not able to match the pre-trained weights in the output path to the model file above. Could you guide me on how I can use the v3 model with its pre-trained weights? Thanks

reg label generation

for j in range(cy-2, cy+3):
    if j < 0 or j > img_size//4-1:
        continue
    for k in range(cx-2, cx+3):
        if k < 0 or k > img_size//4-1:
            continue
        if cx < img_size//4/2-1:
            heatmaps[i*2][j][k] = reg_x-(cx-k)  #/(img_size//4)
        else:
            heatmaps[i*2][j][k] = reg_x+(cx-k)  #/(img_size//4)
        if cy < img_size//4/2-1:
            heatmaps[i*2+1][j][k] = reg_y-(cy-j)  #/(img_size//4)
        else:
            heatmaps[i*2+1][j][k] = reg_y+(cy-j)

Hello, and thank you for sharing. Looking at this code that writes reg values into the heatmap, it assigns reg values in a neighborhood around (cx, cy) (2 pixels to the left and top, 3 to the right and bottom), but the sign of the offset adjustment depends on comparing against img_size//4/2-1, i.e. half of the feature map. Doesn't that mean that, for example, when (cx, cy) lies in the upper half of the map, the reg values around (cx, cy) end up larger or smaller than they should be?

movenet tensorflowjs uint8 quantization

Hi, may I ask whether you have tried quantizing the MoveNet model? For example, the tensorflowjs model provided on TF Hub is half precision (float16); have you quantized it further to uint8? If so, how did you do it and how well does it work?
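
This is not a tensorflowjs answer, but for reference, a sketch of full-integer (uint8) post-training quantization with the TFLite Python converter; saved_model_dir and rep_images() are hypothetical placeholders for an exported SavedModel and a generator of sample input images:

import tensorflow as tf

def representative_data():
    for img in rep_images():                  # hypothetical: yields [1, H, W, 3] float32 arrays
        yield [img]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8     # uint8 inputs/outputs instead of float
converter.inference_output_type = tf.uint8
open("movenet_uint8.tflite", "wb").write(converter.convert())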

About the label shift caused by regs decoding

May I ask why, when decoding regs, regs_origin is offset by +0.5 and then cast to int32? I verified this on one image: at encoding time, on the 48*48 map, the label of the last keypoint is H W = 42 16, but after decoding it becomes 42 17. The reason is that cx cy are 24 24 and the original regs_x is -8; adding 0.5 and casting to int32 shifts it, so every point with x<cx or y<cy gets shifted in this way. Is this intentional, or is it a bug?
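
A tiny check of the arithmetic described above (my own illustration): casting to int truncates toward zero, so the +0.5 trick only behaves like rounding for non-negative offsets.

import numpy as np

cx, reg_x = 24, -8.0
print(np.int32(reg_x + 0.5))        # -7: truncation toward zero
print(int(np.floor(reg_x + 0.5)))   # -8: floor-based rounding recovers the original offset
print(cx + np.int32(reg_x + 0.5))   # 17 instead of the expected 16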

Implement into Real Time

Hi, I currently want to run this in real time with a webcam. Is that possible, and how?
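
As a rough sketch, a webcam loop with OpenCV; run_inference() and draw_keypoints() are hypothetical stand-ins for whatever predict.py does per image, not functions from this repo:

import cv2

cap = cv2.VideoCapture(0)                  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    img = cv2.resize(frame, (192, 192))    # Lightning input size
    keypoints = run_inference(img)         # hypothetical: model forward + decode
    draw_keypoints(frame, keypoints)       # hypothetical drawing helper
    cv2.imshow("movenet", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()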

About the Mirror method for key_points

Hi author, thank you very much for open-sourcing this code.
Background:
I want to change the total number of keypoints being detected, and while making that change I do not fully understand the mirror step.

Question:
While modifying data_augment I found that I do not quite understand how the Mirror method works.
The code is located in lib/data/data_augment.py

def Mirror(src, label=None):
    """
    item = {
        "img_name": save_name,
        "keypoints": save_keypoints,   # relative positions
        "center": save_center,
        "other_centers": other_centers,
        "other_keypoints": other_keypoints,
    }
    # after mirroring, the left/right keypoint order is swapped!
    """
    keypoints = label['keypoints']
    center = label['center']
    other_centers = label['other_centers']
    other_keypoints = label['other_keypoints']

    img = cv2.flip(src, 1)
    if label is None:
        return img, label

    for i in range(len(keypoints)):
        if i % 3 == 0:
            keypoints[i] = 1 - keypoints[i]
    keypoints = [
        keypoints[0], keypoints[1], keypoints[2],
        keypoints[6], keypoints[7], keypoints[8],
        keypoints[3], keypoints[4], keypoints[5],
        keypoints[12], keypoints[13], keypoints[14],
        keypoints[9], keypoints[10], keypoints[11],
        keypoints[18], keypoints[19], keypoints[20],
        keypoints[15], keypoints[16], keypoints[17],
        keypoints[24], keypoints[25], keypoints[26],
        keypoints[21], keypoints[22], keypoints[23],
        keypoints[30], keypoints[31], keypoints[32],
        keypoints[27], keypoints[28], keypoints[29],
        keypoints[36], keypoints[37], keypoints[38],
        keypoints[33], keypoints[34], keypoints[35],
        keypoints[42], keypoints[43], keypoints[44],
        keypoints[39], keypoints[40], keypoints[41],
        keypoints[48], keypoints[49], keypoints[50],
        keypoints[45], keypoints[46], keypoints[47]]

As you can see, the keypoints reordering in the code is hard-coded, and I have not been able to find the pattern behind it.
If I have 4 keypoints, or 60 keypoints, all distributed left/right, how should I modify this? And why was it written this way in the first place?
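
For what it's worth, a sketch of a generic way to express the same swap for any skeleton, using an explicit list of (left, right) index pairs instead of a hard-coded reordering; the pair list below is for the 17 COCO joints and reflects my reading of what the hard-coded order above encodes:

# COCO-17 left/right pairs; for 4 or 60 keypoints, just list that skeleton's own pairs
FLIP_PAIRS = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12), (13, 14), (15, 16)]

def mirror_keypoints(keypoints, flip_pairs=FLIP_PAIRS):
    # keypoints: flat list [x0, y0, v0, x1, y1, v1, ...] with normalized x
    kps = list(keypoints)
    for i in range(0, len(kps), 3):
        kps[i] = 1 - kps[i]                            # flip the x coordinate
    for a, b in flip_pairs:                            # swap each left/right triplet
        kps[a*3:a*3+3], kps[b*3:b*3+3] = kps[b*3:b*3+3], kps[a*3:a*3+3]
    return kps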

About decoding the ground truth

A question about the following ground-truth line:

gt = movenetDecode(labels, kps_mask,mode='label')

Why does the ground truth need to go through decode?
My understanding is that the positions can be read directly from the json file, but here, just like the prediction, the labels seem to be passed through the same decode. Is there a particular reason for this? Won't it introduce error when computing the loss?
Thanks

Comparing MoveNet with the HRNet family

Hello, have you compared the accuracy of MoveNet and the HRNet family on COCO? I want to do gait recognition based on skeletal keypoints and need a lightweight pose estimator with good accuracy. Is MoveNet much worse than a large HRNet?

About post processing of multipose

Hi fire, do you know how the multipose post-processing works? Looking at the official multipose model, the post-processing seems quite different from singlepose, but there are some details I still do not understand.

How the reg operation is reflected in training

reg_x = np.reshape(reg_x, (reg_x.shape[0], 1, 1))
reg_y = np.reshape(reg_y, (reg_y.shape[0], 1, 1))
reg_x = reg_x.repeat(48, 1).repeat(48, 2)
reg_y = reg_y.repeat(48, 1).repeat(48, 2)

#### use the center's reg to get the regressed keypoint position, then weight the heatmap
range_weight_x = np.reshape(_range_weight_x, (1, 48, 48)).repeat(reg_x.shape[0], 0)
range_weight_y = np.reshape(_range_weight_y, (1, 48, 48)).repeat(reg_x.shape[0], 0)
tmp_reg_x = (range_weight_x - reg_x)**2
tmp_reg_y = (range_weight_y - reg_y)**2
tmp_reg = (tmp_reg_x + tmp_reg_y)**0.5 + 1.8   # origin 1.8
tmp_reg = heatmaps[:, n, ...] / tmp_reg

Hello. At the prediction stage, reg goes through the following processing, which is also how the TF version handles it:
a 48x48 coordinate grid is generated along the x and y axes with values 0 to 47, then
x' = (range_weight_x - reg_x)^2
y' = (range_weight_y - reg_y)^2
tmp_reg = (x' + y')^0.5 + 1.8
keypoint_heatmap / tmp_reg -> then take reg_x, reg_y at the maximum point

Where is this operation reflected during training?
Also, how do the reg_x, reg_y obtained this way differ from the coordinates read directly from the reg heatmap?
Many thanks

Edit: would it be convenient to discuss some other details over WeChat?

Error after changing img_size

After changing img_size in the config to 384, I first ran out of CUDA memory:
RuntimeError: CUDA out of memory. Tried to allocate 82.00 MiB (GPU 0; 7.93 GiB total capacity; 6.47 GiB already allocated; 63.56 MiB free; 6.50 GiB reserved in total by PyTorch)

Then, after changing the batch size to 32, I got the following error:
File "/home/alan/Downloads/movenet.pytorch/lib/loss/movenet_loss.py", line 304, in maxPointPth
    heatmap = heatmap*self.center_weight[:heatmap.shape[0],...]
RuntimeError: The size of tensor a (96) must match the size of tensor b (48) at non-singleton dimension 3

Can img_size be increased, or is only 192 supported?
If it can be increased, what else needs to be changed? Thanks
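
For reference, the size mismatch suggests the cached center weight is still 48x48 (192//4) while the new feature map is 96x96 (384//4). A sketch of regenerating it at the new size, reusing the inverse-distance recipe from the Thunder issue above; whether this matches how the original .npy was produced is an assumption:

import numpy as np

img_size = 384
ft_size, delta = img_size // 4, 1.8              # 96 for a 384 input
y, x = np.ogrid[0:ft_size, 0:ft_size]
y = y - ft_size / 2.0
x = x - ft_size / 2.0
center_weight = (1 / (np.sqrt(y * y + x * x) + delta)).astype(np.float32)
np.save('lib/data/center_weight_96.npy', center_weight)   # then point the loss/decode code at this file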

Cannot reproduce the COCO metrics

With this codebase I simply cannot reproduce the 90+ acc metric on COCO17, and the visualized results also look poor.

Wrong predictions on a single-color background / Single color background predict error

Has anyone run into keypoints that should lie on the person being pulled onto a single-color background (such as red)?
How can this be solved? Thanks.
I met a situation with a single-color background such as red where the predicted keypoints are located on the red background, not on the human.
Is there any solution for this situation? Thanks.
[attached image: 699pic_2ao1f7_xy]

Support for Apple Silicon / M1

Support for GPU-accelerated training on Apple Silicon is apparently only available as of PyTorch 1.12.

I also have some other problems running it, so before I spend hours trying to fix it for my setup: has anybody already done this?

Sorry for filing an issue right away; if there were a Q/A section I would have used that instead.

thanks in advance
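
As a starting point, a sketch of device selection on PyTorch >= 1.12 that prefers the Apple MPS backend when available; as far as I can tell the repo does not do this out of the box, and model stands for whatever network train.py builds:

import torch

if torch.backends.mps.is_available():
    device = torch.device("mps")      # Apple Silicon GPU
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")
model = model.to(device)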

How to fine tune the pre-trained model?

Thank you for creating this project. I am looking into fine-tuning the pretrained model and wanted to ask how I should approach this. Is it enough to just load the pretrained model in train.py and train it on my custom dataset, or are there more steps involved?
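
In generic PyTorch terms, a minimal sketch of what loading the pretrained weights before training on a custom dataset could look like; build_model() is a hypothetical stand-in for however train.py constructs the network, and the smaller learning rate is a common fine-tuning choice rather than something the repo prescribes:

import torch

model = build_model()                                       # hypothetical: the repo's MoveNet network
state = torch.load("output/e91_valacc0.79763.pth", map_location="cpu")
model.load_state_dict(state, strict=False)                  # strict=False if the head shape changed
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # smaller LR than training from scratch
# then run the usual training loop on the custom dataset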

Output rendering inconsistency with API

Hi,

While using the tensorflow api,
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
interpreter.set_tensor(input_details[0]['index'], np.array(input_image))
interpreter.invoke()
print( interpreter.get_tensor(output_details[0]['index']).shape)

The output shape is 1,1,17,3, i.e. 17 keypoints, each with its y-x coordinates and confidence score.

I used the pretrained model from this repository's output folder, e91_valacc0.79763.pth,
converted it to ONNX, then to TF, and finally to TFLite. The only change I made in pth2onnx.py is opset_version=10:

torch.onnx.export(run_task.model, dummy_input1, "output/pose.onnx",
                  verbose=True, input_names=input_names, output_names=output_names,
                  do_constant_folding=True, opset_version=10)

Now I did the same for this TFLite model, i.e. the first 5 lines of code above.
The output shape is 1,34,48,48.

How can I obtain the output in the same format as the one we receive when using the API?

Is there a pre-trained model?

Dear author, could you provide your human-pose model as a pre-trained base model?
That way, when training on our own specific data, such as hand keypoint detection, we could train on top of the pre-trained model and speed up training.
thanks a lot!!!

Question regarding the pre-trained models

Hey Fire,

Thanks for the repository. It looks well made and to the point.

About the pre-trained models, are they the same ones that google has uploaded, converted into pytorch-compatible format or are they different ones? If they are different, what are they pre-trained on?
