- Dive into Computer Vision
- WeChat official account: FireAICV
- Multiple algorithm projects shipped to production
- AI algorithm competition enthusiast
fire717 / movenet.pytorch
A PyTorch implementation of MoveNet from Google. Includes training code and a pre-trained model.
License: MIT License
I generated my own new dataset (32 keypoints), then converted it by following:
movenet.pytorch/scripts/make_coco_data_17keypoints.py
Training then raised this exception:
RuntimeError: stack expects each tensor to be equal size, but got [161, 48, 48] at entry 0 and [86, 48, 48] at entry 1
I haven't been able to resolve it yet. Any ideas on how to fix it?
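The error above is raised by PyTorch's default collate when label tensors inside one batch have different shapes, so the root cause is usually a dataset sample whose generated label has the wrong channel count. A minimal diagnostic sketch (the `(img, label)` tuple layout is an assumption, not necessarily this repo's actual dataset output):

```python
import torch
from torch.utils.data import default_collate

def diagnostic_collate(batch):
    # default_collate calls torch.stack, which fails when label tensors in a
    # batch differ in shape; report the shapes so bad samples can be traced.
    label_shapes = [tuple(label.shape) for _, label in batch]  # assumes (img, label)
    if len(set(label_shapes)) > 1:
        raise RuntimeError(f"inconsistent label shapes in batch: {label_shapes}")
    return default_collate(batch)
```

Passing such a function as `collate_fn` to the DataLoader turns the opaque stack error into a message that identifies the offending shapes.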
What is the eval AP?
Hi, I want to use your pre-trained model to fine-tune my own model on my own dataset.
My dataset has only 9 keypoints, unlike COCO; is that possible?
Hello author, I trained with your model, and it produces the heatmaps and keypoint detections, but I couldn't find the script that draws the pose-estimation results on the test images. Could you point me to it? Many thanks!
Hi Author,
I read through the code. The current version seems to support only Lightning, so I made some changes to train a Thunder version; the important modifications are as follows:
Training has not finished yet; the loss keeps decreasing, val accuracy reached 72% at epoch 27, and training is still running.
The problem I'm facing: when predicting with the checkpoint above, not a single keypoint is found; the kpt heatmap values are all very small.
May I ask whether there is anything else critical that needs adjusting? Thanks!!
Hey @fire717 ,
Thanks for building this model. It has been very helpful to my work.
Do you know how to get the confidence values out of the model, the way the original MoveNet does?
Thanks
Hi, Can this repo be used to fine-tune the official tflite models of lightning and thunder? Please guide me through the process if this is possible. Thanks
movenet.pytorch/lib/data/data_tools.py
Lines 152 to 153 in bbc8140
Hello, should the line below be small_y = int(regs[i*2+1,cy,cx]+cy) instead?
A question about decode: in this line
_center_weight = np.load('lib/data/center_weight_origin.npy').reshape(48,48)
what is the purpose of the weight? To emphasize the value at the center point?
And how was this .npy file generated?
Thanks
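The repo does not ship the script that produced center_weight_origin.npy, so the following is only a plausible reconstruction: a 2-D Gaussian peaked at the grid centre which, multiplied into the centre heatmap before the argmax, biases decoding toward the person closest to the frame centre. The sigma value here is a guess, not the original setting:

```python
import numpy as np

def make_center_weight(size=48, sigma=None):
    # 2-D Gaussian peaked at the grid centre; multiplying the centre heatmap
    # by this before argmax favours the person closest to the image centre.
    if sigma is None:
        sigma = size / 4  # assumed width, not the repo's original value
    ys, xs = np.mgrid[0:size, 0:size]
    c = (size - 1) / 2
    w = np.exp(-((xs - c) ** 2 + (ys - c) ** 2) / (2 * sigma ** 2))
    return w.astype(np.float32)

# np.save('lib/data/center_weight_origin.npy', make_center_weight(48))
```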
Thank you very much for your work. As a beginner I'd like to ask about the architecture: overall it is a MobileNetV2 backbone plus FPN, but MobileNetV2 (in the MATLAB version, and in the most common version) uses a 224 input, and according to the original paper there are 1+3+1=5 blocks with stride 1 (which can use residual connections), whereas in your MobileNetV2 forward I only see 3 residual connections. Also, the original MobileNetV2 has no upsampling layers. What led you to modify MobileNetV2 this way, or am I misunderstanding something? Or is it that any network built from MobileNet's characteristic Inverted Residual blocks can conventionally be said to use a MobileNet backbone?
When trying to train on other datasets, such as Human3.6M, that don't annotate whether a keypoint is occluded, how should that be handled?
Does labeling everything as visible hurt the results much?
After running python predict.py I only get an empty output/predict folder with no prediction results. Is there something I need to modify?
Hello. To save time I'll write in Chinese.
First of all, thank you for your work; it has saved us a lot of time.
I only discovered today that you open-sourced the code. Before that I had reproduced it myself based on your article, but I haven't tuned it to a good result yet.
I was excited to find the release today, immediately reproduced it with your code, and read through the core implementation. I found two issues:
@fire717 Hello! I'm a sophomore on the RoboMaster team at North University of China, responsible for using neural networks to detect armor plates for auto-aiming. Previously I trained YOLO-series models, but at test time the bbox doesn't fit the armor plate's contour well, which causes large errors in the subsequent PnP pose solving. So I want to replace the conventional YOLO dataset format with the normalized coordinates of the four corner points. My current label format looks like this: 1 0.673029 0.373564 0.678429 0.426232 0.830433 0.401262 0.824525 0.351212, where the first number is the class id and the following eight numbers are the normalized coordinates of the armor plate's four corners. I have already trained a model with yolov5-face that directly locates the four corners; the result is shown below:
So I'd like to ask how to convert my current annotation format into the COCO format you use. Also, since we need to recognize both the digit and the color, I'd like to decouple them by adding a 1x1 conv to the head that outputs color separately; I've made a similar modification on top of YOLOX before, but I don't know where to start with your model. Finally, I previously deployed models through OpenVINO's C++ interface, so could you share any ideas for implementing the post-processing in C++? Sorry for asking so much; I hope you can advise. :-)
FileNotFoundError: [Errno 2] No such file or directory: '../data/eval/mype.json'
Could this be caused by my data processing? How is this json file supposed to be generated?
I followed the README through training, but when I start testing with python evaluate.py I get this error:
Traceback (most recent call last):
File "evaluate.py", line 57, in
main(cfg)
File "evaluate.py", line 28, in main
data_loader = data.getEvalDataloader()
File "/movenet.pytorch-master/lib/data/data.py", line 159, in getEvalDataloader
with open(self.cfg['eval_label_path'], 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '../data/eval/mypc.json'
I searched every folder and there is no mypc.json file anywhere.
Hello, which paper is your model architecture mainly based on? Could you share the paper? I'd like to study it, thanks.
Hi fire717, thanks for the generous sharing. I see that in the dataset you assign reg values not only at the current center but also around it, which you said speeds up convergence. But the actual reg loss only uses the value at the current center. Why is that?
The code is as follows:
```python
def regsLoss(self, pred, target, cx0, cy0, kps_mask, batch_size, num_joints):
    # pred/target: [64, 14, 48, 48]
    _dim0 = torch.arange(0, batch_size).long()
    _dim1 = torch.zeros(batch_size).long()
    loss = 0
    for idx in range(num_joints):
        # read the x/y regression values only at the person center (cy0, cx0)
        gt_x = target[_dim0, _dim1 + idx * 2, cy0, cx0]
        gt_y = target[_dim0, _dim1 + idx * 2 + 1, cy0, cx0]
        pre_x = pred[_dim0, _dim1 + idx * 2, cy0, cx0]
        pre_y = pred[_dim0, _dim1 + idx * 2 + 1, cy0, cx0]
        loss += self.l1(gt_x, pre_x, kps_mask[:, idx])
        loss += self.l1(gt_y, pre_y, kps_mask[:, idx])
    return loss / num_joints
```
Even though it's already 2024, I still wrote an ncnn example: https://github.com/zhouweigogogo/movenet-ncnn
Thank you, fire717, for sharing the code implementation and the accompanying write-ups; I benefited a lot from reading them.
I have a small question about a detail of the mobilenet_v2 implementation; could you clarify it for me?
This is this repo's implementation of the InvertedResidual module:
```python
def forward(self, x):
    x = self.conv1(x)
    for _ in range(self.n):
        x = x + self.conv2(x)
    return x
```
And here is the corresponding part of the torchvision mobilenet_v2 implementation:
```python
# building inverted residual blocks
for t, c, n, s in inverted_residual_setting:
    output_channel = _make_divisible(c * width_mult, round_nearest)
    for i in range(n):
        stride = s if i == 0 else 1
        features.append(block(input_channel, output_channel, stride,
                              expand_ratio=t, norm_layer=norm_layer))
        input_channel = output_channel
```
My understanding of the difference: in this repo's version, conv2 is a single InvertedResidual block with one set of weights, so after conv1 the input passes through (n-1) modules that share the same weights; in the torchvision version, block invokes the InvertedResidual constructor each time, producing structurally identical InvertedResidual blocks that do not share weights. Is this understanding correct? And does the original MoveNet implementation use a similar (weight-sharing) design?
movenet.pytorch/lib/task/task_tools.py
Line 261 in f248899
movenet.pytorch/lib/utils/metrics.py
Line 19 in f248899
Hoping for your explanation, thanks!
Hi, I am not able to find the pre-trained weights for the above model file at the output path. Could you guide me on how I can use the v3 model with its pre-trained weights? Thanks.
movenet.pytorch/lib/data/data_tools.py
Lines 123 to 136 in bbc8140
Hello, and thanks for sharing. Looking at this code that writes reg values into the heatmap: reg values are also assigned around (cx, cy) (2 pixels to the left/top, 3 pixels to the right/bottom), but the offset adjustment at other positions is based on img_size//4/2-1, i.e. half the map. Doesn't that mean that when (cx, cy) is, say, in the upper half of the image, the reg values around it end up too large or too small?
A question: have you tried quantizing the MoveNet model? For example, the TensorFlow.js models on the hub are half precision (float16); have you quantized further down to uint8? If so, how did you do it, and how are the results?
Thanks for all your work. Have you considered moving the existing pre-processing pre() into the MoveNet model itself?
(Porting the exported ONNX model to an Android app currently requires a lot of work to reimplement pre(...).)
My goal is to deploy the model in an Android app.
https://github.com/tensorflow/examples/tree/master/lite/examples/pose_estimation/android
When decoding regs, why add 0.5 to regs_origin before casting to int32? I verified with one image: at 48*48, the encoded label for the last keypoint is H W = 42 16, but after decoding it becomes 42 17. The reason is that cx cy is 24 24 and the original regs x is -8, but +0.5 followed by int32 shifts it, so every point with x<cx or y<cy gets shifted this way. Is this intentional or a bug?
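The one-pixel shift described above is the usual behaviour of casting x + 0.5 to int32: the cast truncates toward zero, which rounds negative offsets the wrong way, while floor(x + 0.5) rounds half-up consistently for both signs. A minimal reproduction of the example in the report:

```python
import numpy as np

cx = 24
reg_x = -8.0  # ground-truth offset encoded for a keypoint at x = 16

# int32 truncates toward zero, so negative offsets shift by one pixel:
decoded_trunc = int(np.int32(reg_x + 0.5)) + cx   # -> 17 (off by one)

# floor(x + 0.5) rounds half-up consistently for both signs:
decoded_floor = int(np.floor(reg_x + 0.5)) + cx   # -> 16
print(decoded_trunc, decoded_floor)
```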
pth2onnx.py
mentions a weights file output/test/e100_valacc0.98349.pth
but this doesn't exist in the output folder. Could this please be made available?
Hi, I'd like to run this in real time with a webcam. Is that possible, and how?
Hello author, thank you very much for open-sourcing the code.
Background:
I want to change the total number of detected keypoints, and while making that change I don't quite understand the mirror step.
Problem:
While modifying data_augment, I found I don't understand the Mirror method.
The code is in lib/data/data_augment.py:
```python
def Mirror(src, label=None):
    """
    item = {
        "img_name": save_name,
        "keypoints": save_keypoints,  # relative position
        "center": save_center,
        "other_centers": other_centers,
        "other_keypoints": other_keypoints,
    }
    # After mirroring, the left/right keypoint order is swapped!
    """
    img = cv2.flip(src, 1)
    # the None check must run before label is accessed
    if label is None:
        return img, label
    keypoints = label['keypoints']
    center = label['center']
    other_centers = label['other_centers']
    other_keypoints = label['other_keypoints']
    # flip x coordinates (every 3rd value, starting at index 0, is x)
    for i in range(len(keypoints)):
        if i % 3 == 0:
            keypoints[i] = 1 - keypoints[i]
    # swap each left/right keypoint (3 values per keypoint: x, y, v)
    keypoints = [
        keypoints[0], keypoints[1], keypoints[2],
        keypoints[6], keypoints[7], keypoints[8],
        keypoints[3], keypoints[4], keypoints[5],
        keypoints[12], keypoints[13], keypoints[14],
        keypoints[9], keypoints[10], keypoints[11],
        keypoints[18], keypoints[19], keypoints[20],
        keypoints[15], keypoints[16], keypoints[17],
        keypoints[24], keypoints[25], keypoints[26],
        keypoints[21], keypoints[22], keypoints[23],
        keypoints[30], keypoints[31], keypoints[32],
        keypoints[27], keypoints[28], keypoints[29],
        keypoints[36], keypoints[37], keypoints[38],
        keypoints[33], keypoints[34], keypoints[35],
        keypoints[42], keypoints[43], keypoints[44],
        keypoints[39], keypoints[40], keypoints[41],
        keypoints[48], keypoints[49], keypoints[50],
        keypoints[45], keypoints[46], keypoints[47]]
```
As you can see, the keypoints list in the code is hard-coded, and I couldn't work out the pattern.
If I have 4 keypoints, or 60, all distributed left/right, how should I modify this? And why was it written this way in the first place?
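The hard-coded list above keeps keypoint 0 (the nose) in place and swaps each COCO left/right pair, (1,2), (3,4), …, (15,16), with 3 values (x, y, visibility) per keypoint. For an arbitrary keypoint count, the swap generalizes to a table of index pairs; a sketch (the function name and pair table are mine, not the repo's):

```python
def mirror_keypoints(keypoints, flip_pairs):
    """Mirror flattened [x, y, v] keypoints horizontally.

    keypoints: flat list [x0, y0, v0, x1, y1, v1, ...] with x normalized to [0, 1].
    flip_pairs: list of (left_idx, right_idx) keypoint index pairs to swap,
                e.g. [(1, 2), (3, 4), ..., (15, 16)] for COCO-17.
    """
    kps = list(keypoints)
    # flip the x coordinate of every keypoint
    for i in range(0, len(kps), 3):
        kps[i] = 1 - kps[i]
    # swap each left/right pair (3 values per keypoint)
    for a, b in flip_pairs:
        kps[a*3:a*3+3], kps[b*3:b*3+3] = kps[b*3:b*3+3], kps[a*3:a*3+3]
    return kps
```

With this form, supporting 4 or 60 keypoints only means supplying the matching flip_pairs table.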
Hey, I'd like your trained weights, e.g. e100_valacc0.98349.pth. Please help me.
A question about the ground-truth line below:
movenet.pytorch/lib/task/task.py
Line 187 in 95ec853
Why does it need to go through decode?
My understanding is that the positions can be read directly from the JSON file, but here, just like in predict, it seems to be fed into the model's decode.
Is there a particular reason? Won't this introduce error when computing the loss?
Thanks
Hello, have you compared MoveNet against the HRNet family on COCO? I want to build gait recognition based on skeleton keypoints and need a lightweight pose estimator with good accuracy. Is MoveNet much worse than a large HRNet?
fire, do you know about the multipose post-processing? Looking at the official multipose model, its post-processing seems quite different from singlepose, and some details are still unclear to me.
movenet.pytorch/lib/task/task_tools.py
Lines 124 to 144 in 95ec853
Hello. In the prediction stage, reg goes through the following processing (which is also how TF does it):
for a 48x48 map, coordinates 0..47 are generated along the x and y axes, then
x' = (rangeweight - reg)^2
tmp_reg = (x' + y')^0.5 + 1.8
keypoint_heatmap / tmp_reg -> then take reg_x, reg_y at the max point
Where is this operation reflected in training?
Also, how do the reg_x, reg_y obtained this way differ from the coordinates read directly from the reg heatmap?
Thanks
edit: Would it be convenient to discuss other details over WeChat?
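A numpy sketch of the step described above, as I read it: the keypoint heatmap is divided by the distance from each grid cell to the regressed position, so among several peaks the one nearest the regression target wins. This is inference-time post-processing only; the function and variable names are mine:

```python
import numpy as np

def decode_keypoint(kpt_heatmap, reg_x, reg_y, size=48, eps=1.8):
    # Divide the keypoint heatmap by the distance to the regressed position
    # (reg_x, reg_y), so among several peaks the one nearest the regression
    # wins; eps keeps the divisor bounded away from zero near the target.
    xs = np.arange(size, dtype=np.float32)
    ys = np.arange(size, dtype=np.float32)
    dist = np.sqrt((xs[None, :] - reg_x) ** 2 + (ys[:, None] - reg_y) ** 2) + eps
    weighted = kpt_heatmap / dist
    cy, cx = np.unravel_index(weighted.argmax(), weighted.shape)
    return int(cx), int(cy)
```

The effect is only visible with multiple people: with a single clean peak, the plain heatmap argmax and this distance-weighted argmax return the same cell.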
After changing img_size in the config to 384, I first hit CUDA out of memory:
RuntimeError: CUDA out of memory. Tried to allocate 82.00 MiB (GPU 0; 7.93 GiB total capacity; 6.47 GiB already allocated; 63.56 MiB free; 6.50 GiB reserved in total by PyTorch)
Then, after changing batch size to 32, I get the following error:
File "/home/alan/Downloads/movenet.pytorch/lib/loss/movenet_loss.py", line 304, in maxPointPth
heatmap = heatmap*self.center_weight[:heatmap.shape[0],...]
RuntimeError: The size of tensor a (96) must match the size of tensor b (48) at non-singleton dimension 3
Can img_size be increased, or is only 192 supported? If it can, what else needs changing? Thanks.
I simply cannot reproduce the reported 90+ acc on COCO17 with this codebase, and the visualization results are also poor.
Hi, I want to train MoveNet with more than 17 keypoints. What changes do I have to make for training?
Hello, could you provide inference code for mobile, e.g. for MNN?
Support for GPU-accelerated training on Apple Silicon is apparently only available as of PyTorch 1.12.
I also have other problems running it, and before I spend hours trying to fix it for my setup: has anybody already done this?
Sorry for filing an issue right away; if there were a Q&A section I would have used that first.
Thanks in advance
Thank you for creating this project. I am looking into fine-tuning the pretrained model and wanted to ask how I should approach this. Is it enough to just load the pretrained model in train.py and use my custom dataset to adapt to it? Or are there more steps involved?
Hi,
While using the TensorFlow API:
```python
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
interpreter.set_tensor(input_details[0]['index'], np.array(input_image))
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]['index']).shape)
```
the output is (1, 1, 17, 3), i.e. 17 keypoints with their y-x coordinates and confidence scores.
I used the pretrained model e91_valacc0.79763.pth from this repository's output folder, converted it to ONNX, then to TF, and finally to tflite. The only change I made in pth2onnx.py is opset_version=10:
```python
torch.onnx.export(run_task.model, dummy_input1, "output/pose.onnx",
                  verbose=True, input_names=input_names, output_names=output_names,
                  do_constant_folding=True, opset_version=10)
```
Now I ran the same first 5 lines of code on this tflite model.
The output shape is 1,34,48,48.
How can I obtain the output in the same format as that we receive while using the API?
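The official tflite model bakes the decode into the graph, so with the raw exported heads that step has to be replicated in Python. As a starting point, here is a minimal argmax decode assuming a [1, 17, H, W] keypoint-heatmap tensor (the 34-channel tensor above is presumably a different head, e.g. the regression maps; a faithful MoveNet decode also uses the center/reg/offset heads):

```python
import numpy as np

def heatmaps_to_keypoints(heatmaps):
    # heatmaps: [1, K, H, W] keypoint score maps. Returns [1, 1, K, 3] as
    # (y, x, score) normalized to [0, 1], matching the tflite API layout.
    # This is only the final argmax step, not the full MoveNet decode.
    _, K, H, W = heatmaps.shape
    out = np.zeros((1, 1, K, 3), dtype=np.float32)
    for k in range(K):
        hm = heatmaps[0, k]
        y, x = np.unravel_index(hm.argmax(), hm.shape)
        out[0, 0, k] = (y / H, x / W, hm[y, x])
    return out
```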
Dear author, could you provide your human-pose model as a pretrained base?
That way, when training on our own specific data, e.g. hand keypoint detection, we could start from the pretrained model and speed up training.
Thanks a lot!!!
Hey Fire,
Thanks for the repository. It looks well made and to the point.
About the pre-trained models: are they the same ones Google uploaded, converted into a PyTorch-compatible format, or different ones? If they are different, what were they pre-trained on?
Hello, what mAP does this MoveNet reach on the COCO 2017 validation set?
My journey from September until now closely matches what you describe in your column, but limited by my skills I spent two weeks failing to reproduce it. Thank you for open-sourcing this.