Website: http://dk-liang.github.io/
Google Scholar: https://scholar.google.com/dk-liang
[IEEE TMM] Focal Inverse Distance Transform Maps for Crowd Localization
License: MIT License
Hi,
Thank you for sharing the code. I tried to do a quick test after following all the data preparations. However, the output results are a bit strange, especially the counts, as you can see below (e.g., IMG_1.jpg Gt 172.00 Pred 180049).
Am I missing something?
P.S.: I am testing the model on CPU.
Best,
(pytorch_env) D:\Project1\FIDTM>python test.py --dataset ShanghaiA --pre ./model/ShanghaiA/model_best.pth --gpu_id 0
{'dataset': 'ShanghaiA', 'save_path': 'save_file/A_baseline', 'workers': 16, 'print_freq': 200, 'start_epoch': 0, 'epochs': 3000, 'pre': './model/ShanghaiA/model_best.pth', 'batch_size': 16, 'crop_size': 256, 'seed': 1, 'best_pred': 100000.0, 'gpu_id': '0', 'lr': 0.0001, 'weight_decay': 0.0005, 'preload_data': True, 'visual': False, 'video_path': None}
Using cpu
./model/ShanghaiA/model_best.pth
=> loading checkpoint './model/ShanghaiA/model_best.pth'
57.0989010989011 921
Pre_load dataset ......
begin test
IMG_1.jpg Gt 172.00 Pred 180049
IMG_10.jpg Gt 502.00 Pred 196417
IMG_100.jpg Gt 391.00 Pred 92455
IMG_101.jpg Gt 211.00 Pred 184704
IMG_102.jpg Gt 223.00 Pred 31672
IMG_103.jpg Gt 430.00 Pred 170330
I think the proposed RDTM is almost the same as the Inverse k-Nearest Neighbor Maps in [1], except for the name and the experiments on localization.
[1] Improving Dense Crowd Counting Convolutional Neural Networks using Inverse k-Nearest Neighbor Maps and Multiscale Upsampling
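For readers comparing the two representations, the map itself can be sketched in a few lines. This is my own reading of the FIDT definition; the constants alpha=0.02, beta=0.75, C=1 are what I believe the paper uses, so treat them as assumptions:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def fidt_map(points, h, w, alpha=0.02, beta=0.75, c=1.0):
    """Sketch of a Focal Inverse Distance Transform map: compute the
    Euclidean distance to the nearest annotated head, then apply a
    focal inverse mapping so responses peak at heads and decay fast."""
    mask = np.ones((h, w), dtype=bool)
    for x, y in points:                  # annotated head coordinates
        mask[int(y), int(x)] = False     # zero distance at head pixels
    d = distance_transform_edt(mask)     # distance to nearest head
    return 1.0 / (d ** (alpha * d + beta) + c)
```

At an annotated pixel the distance is 0, so the map value is exactly 1/C; an inverse kNN map with k=1 would use the same distance field but a different decay.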
Did anyone try this with a camera for real-time tracking? What's the FPS like? Is this implementation viable for a real-time scenario, or is the FPS really low?
Hi, thank you for your code. I tried to run test.py with the code you provided, but the results are quite different.
The details are as follows:
python test.py --test_dataset ShanghaiA --pre ./model/ShanghaiA/model_best_57.pth
{'train_dataset': 'ShanghaiA', 'task_id': 'save_file/A_baseline', 'workers': 16, 'print_freq': 200, 'start_epoch': 0, 'test_dataset': 'ShanghaiA', 'pre': './model/ShanghaiA/model_best_57.pth', 'batch_size': 16, 'seed': 1, 'best_pred': 100000.0, 'lr': '1e-4', 'preload_data': True, 'visual': False}
[2021-04-02 14:00:06] INFO (Networks.HR_Net.seg_hrnet/MainThread) => init weights from normal distribution
./model/ShanghaiA/model_best_57.pth
=> loading checkpoint './model/ShanghaiA/model_best_57.pth'
57.0989010989011 921
Pre_load dataset ......
begin test
args['task_id'] = save_file/A_baseline
IMG_1.jpg Gt 172.00 Pred 180049
IMG_10.jpg Gt 502.00 Pred 196417
IMG_100.jpg Gt 391.00 Pred 92455
IMG_101.jpg Gt 211.00 Pred 184704
IMG_102.jpg Gt 223.00 Pred 31672
IMG_103.jpg Gt 430.00 Pred 170330
IMG_104.jpg Gt 1175.00 Pred 174422
IMG_105.jpg Gt 265.00 Pred 169307
I don't know why this is happening. Can you guide me?
Hi, thank you for your elegant code. But I'm still confused about how to generate the bounding boxes.
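I can't speak for the authors, but a common heuristic is to size a square box per detected point by the distance to its nearest neighboring point; here is a sketch under that assumption (scipy KDTree; the scale and min_half parameters are hypothetical, and the repo's own rule may differ):

```python
import numpy as np
from scipy.spatial import KDTree

def points_to_boxes(points, scale=0.5, min_half=4.0):
    """Sketch: one square box (x1, y1, x2, y2) per predicted head point,
    with half-size proportional to the distance to the nearest other
    point (a heuristic, not necessarily the repo's rule)."""
    pts = np.asarray(points, dtype=float)
    d, _ = KDTree(pts).query(pts, k=2)            # d[:, 1] = nearest-neighbor distance
    half = np.maximum(d[:, 1] * scale, min_half)  # floor very small boxes
    return np.stack([pts[:, 0] - half, pts[:, 1] - half,
                     pts[:, 0] + half, pts[:, 1] + half], axis=1)
```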
How can I train on the NWPU-Crowd dataset? Can you provide the data preprocessing script? Thank you.
Thanks for your work and sharing the code. It seems like for all your experiments you're using the HRNet architecture, which is a much more advanced model compared to VGG-16 that is used in most other works. From my perspective it's hard to judge how much improvement comes from the loss function you introduce and how much comes from the backbone alone.
What are your thoughts on this? Did you also run experiments with a VGG-16 backbone?
Thanks,
Paul
I get these values for the loss while training the model from scratch on the JHU dataset. It started from ~10,000 and is dropping slowly, but I feel this number is not right. In the code, train_baseline uses just MSE, not the loss proposed in the paper. Shouldn't this loss be in the range of 0 to 1?
If I use my private dataset for training, what are the requirements on image size?
I saw that fidt_generate_xx.py treats image sizes differently for different datasets.
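From reading the scripts, my understanding (an assumption, not an official statement) is that they mainly differ in the resolution cap: the common rule limits the longest side, e.g. to 2048 for the larger datasets. A minimal sketch of that rule:

```python
def cap_longest_side(w, h, max_side=2048):
    """Scale (w, h) so the longest side is at most max_side,
    preserving aspect ratio (the cap value is assumed from the scripts)."""
    longest = max(w, h)
    if longest <= max_side:
        return w, h
    scale = max_side / longest
    return round(w * scale), round(h * scale)
```

For example, a 3840x2160 frame would come out as 2048x1152 under this rule, which matches the resized shapes visible in the JHU logs quoted below in this thread collection.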
Hi
Is there a lightweight version for crowd counting?
Hello @dk-liang, thank you for the great work.
Can you provide the I-SSIM loss validation value for the last epoch on the SHA dataset?
I can normalize the loss in several ways; knowing your validation value would help me choose the right one.
It would also tell me your loss balance between MSE and I-SSIM.
Mr. Liang, could you provide the code for the local 1-SSIM loss mentioned in the paper for reference? Training with only the global MSE does not seem to give good results.
Has anyone deployed FIDTM in C++?
As the title says.
I have not found the I-SSIM loss in the code; I only find nn.MSELoss. So, I want to know where the I-SSIM loss is. Thanks!
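For anyone looking for a starting point, a generic SSIM-based loss can be sketched as below. This is my own sketch, not the authors' I-SSIM (which, per the paper, evaluates SSIM on local instance regions rather than globally); the constants c1 and c2 follow the standard SSIM defaults:

```python
import torch
import torch.nn.functional as F

def ssim_loss(pred, gt, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - mean local SSIM between predicted and ground-truth maps
    (shape: N x C x H x W). A plain SSIM loss sketch; the paper's I-SSIM
    additionally restricts windows to individual head instances."""
    pad = window // 2
    mu_p = F.avg_pool2d(pred, window, stride=1, padding=pad)
    mu_g = F.avg_pool2d(gt, window, stride=1, padding=pad)
    var_p = F.avg_pool2d(pred * pred, window, stride=1, padding=pad) - mu_p ** 2
    var_g = F.avg_pool2d(gt * gt, window, stride=1, padding=pad) - mu_g ** 2
    cov = F.avg_pool2d(pred * gt, window, stride=1, padding=pad) - mu_p * mu_g
    ssim = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) / \
           ((mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2))
    return 1.0 - ssim.mean()
```

The loss is 0 when the maps are identical and approaches 1 as they decorrelate, so it sits in the [0, 1] range that several threads here expect.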
It seems that only a single HRNet is utilized to regress the rdt_map end-to-end.
How do I test an image without ground truth and an FIDT map?
The code seems to use just the L2 loss, without adding the SSIM loss from the paper.
Is there any bug in fidt_generate_jhu.py? I'm trying to train this model on the JHU dataset. I noticed that the size of img is different from fidt_map. I simply added a print in image.py like:
import scipy.spatial
from PIL import Image
import scipy.io as io
import scipy
import numpy as np
import h5py
import cv2

def load_data_fidt(img_path, args, train=True):
    gt_path = img_path.replace('.jpg', '.h5').replace('images', 'gt_fidt_map_2048')
    img = Image.open(img_path).convert('RGB')

    while True:
        try:
            gt_file = h5py.File(gt_path)
            k = np.asarray(gt_file['kpoint'])
            fidt_map = np.asarray(gt_file['fidt_map'])
            break
        except OSError:
            print("path is wrong, can not load ", img_path)
            cv2.waitKey(1000)  # wait a bit before retrying

    img = img.copy()
    fidt_map = fidt_map.copy()
    k = k.copy()

    print(img.size, fidt_map.shape)  # here
    return img, fidt_map, k
The output shows some differences, like:
(968, 681) (681, 968)
(1023, 575) (575, 1023)
(2048, 1365) (1365, 2048)
(1280, 720) (720, 1280)
(2048, 1356) (1356, 2048)
(852, 480) (512, 909) #difference
(2250, 1500) (1365, 2048) #difference
(2692, 3297) (2048, 1672) #difference
(1023, 575) (575, 1023)
(2000, 1115) (1115, 2000)
(3840, 2160) (1152, 2048) #difference
(1000, 600) (600, 1000)
(1637, 1070) (1070, 1637)
(653, 282) (512, 1186)
(1280, 853) (853, 1280)
(1200, 600) (600, 1200)
I tried to train the model on the ShanghaiA dataset and it works fine.
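If anyone else hits this, a possible workaround (my own assumption, not an official fix) is to resize the loaded image to match the stored map's shape, since the JHU generation script appears to cap the map's resolution while image.py loads the original-size jpg:

```python
from PIL import Image

def resize_to_match(img, fidt_shape):
    """Resize a PIL image to a stored fidt_map's (H, W) shape so the
    image/map pair is consistent again (workaround sketch only)."""
    h, w = fidt_shape
    if img.size != (w, h):                 # note: PIL size is (W, H)
        img = img.resize((w, h), Image.BILINEAR)
    return img
```

Keypoint coordinates stored alongside the map would need the same rescaling, which this sketch does not handle.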
Hello, Mr. Liang. I am replicating your experimental results based on the I-SSIM loss function given by Denis Rybalchenko, but it seems that I can't get equally good results on Part A.
First, thanks for the amazing work. I cannot download the model files from Baidu since I am outside China (I cannot even sign up; it's all in Chinese even after Google Translate).
Could you please upload the model files to Google Drive or share a direct download link?
Thanks !
I've converted the FIDTM model to ONNX using the following logic, but the output from ONNX is wrong.
import torch
import torch.nn as nn
from Networks.HR_Net.seg_hrnet import get_seg_model

class MyNet(nn.Module):
    """Add maxpool for postprocessing"""
    def __init__(self):
        super().__init__()

    def forward(self, x):
        output = nn.functional.max_pool2d(x, (3, 3), stride=1, padding=1)
        return x, output

model = get_seg_model()
model = nn.Sequential(model, MyNet())
model = nn.DataParallel(model, device_ids=[0])
# ... load model weights (logic similar to video_demo.py)

batch_size = 1  # just take a random number
dummy_input = torch.randn(batch_size, 3, 540, 960)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print('Using', device)
dummy_input = dummy_input.to(device)

model.eval()
model = model.cuda()

torch.onnx.export(model.module,               # model being run
                  dummy_input,                # model input (or a tuple for multiple inputs)
                  "crowd_fidtm_model.onnx",   # where to save the model
                  export_params=True,         # store the trained parameter weights inside the model file
                  opset_version=11,           # the ONNX version to export the model to
                  do_constant_folding=True,   # whether to execute constant folding for optimization
                  input_names=['input_1'],    # the model's input names
                  output_names=['output_1', 'output_2'],  # the model's output names
                  dynamic_axes={'input_1': {0: 'batch_size'},   # variable-length axes
                                'output_1': {0: 'batch_size'},
                                'output_2': {0: 'batch_size'}})
But after loading this ONNX model, the output is wrong. In fact, the ONNX model gives almost the same values for every input image. Is this happening because of if/else blocks in the model? I'm not sure whether the model is being converted correctly!
Hello,
If I want to train this model on my own custom dataset, do I just need to change fidt_generate_xx.py, make_npydata.py, and train_baseline.py?
Also, my dataset contains bounding boxes, but that shouldn't be a big issue, right?
The data-preparation code for the NWPU dataset is not provided at the data link.
@RawanLaz @dk-liang @ZeeRizvee
Hi, thank you for the code, but I didn't find the Local-Maxima-Detection-Strategy (LMDS) in your code. Can you tell me where this algorithm is?
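As far as I can tell, the idea appears inline in the test/demo scripts rather than as a named function. A minimal sketch of the strategy as I understand it (3x3 max-pooling to keep local maxima, then a fixed confidence threshold; the threshold value is an assumption, and the real code also adapts it for near-empty images, which I omit):

```python
import torch
import torch.nn.functional as F

def lmds(fidt_map, threshold=100.0 / 255.0):
    """Local-Maxima-Detection-Strategy sketch: a pixel counts as a head
    if it is a 3x3 local maximum of the predicted FIDT map and its
    response exceeds a confidence threshold (assumed value)."""
    # fidt_map: (1, 1, H, W) tensor with values roughly in [0, 1]
    pooled = F.max_pool2d(fidt_map, 3, stride=1, padding=1)
    maxima = (pooled == fidt_map).float() * fidt_map   # keep only local maxima
    keep = maxima > threshold                          # drop weak responses
    count = int(keep.sum().item())
    coords = keep.squeeze(0).squeeze(0).nonzero()      # (y, x) head positions
    return count, coords
```

The ONNX question above wraps the network with exactly this kind of max-pool so the comparison `pooled == fidt_map` can be done outside the graph.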
Hi there, thanks for this impressive work!
I look forward to your work
How do I test an image without ground truth and an FIDT map?
Parameters of the computer:
GPU: 1 x 1080Ti, 12GB
Parameters of the training code:
crop_size: 128x128
batch_size: 16
Thank you for your inspiring work. However, I don't understand the motivation for FIDT maps. In the visualizations, the FIDT map you propose seems similar to a traditional Gaussian density map with a small enough kernel size. What is the difference between these two maps? Have you compared their counting or localization performance?
Hello authors,
Thank you for the great work. I have the following queries in which I need your inputs, as I am trying to work on this crowd localization task.
May I check whether an alternate backbone, say VGG, has been released in this repository that can be used instead of the current one? The current backbone is slow when training on a large number of images from my own custom dataset. If so, where do I find it?
If my custom dataset has different scenes and different resolutions, can I just use the data-preparation scripts (say, the NWPU one) to generate the FIDT maps and feed them to training, so that the longest side is not more than 2048?
Has the loss function mentioned in the paper been released already? Right now it's still the plain MSE loss.
Also, is there a minimum validation loss to look out for, in general, to decide the number of epochs or for early stopping?
Thank you in advance for the response.
HRNet (pretrained), MSE loss | MAE 66.049 | MSE 105.703
How can we get the "xxxxx_gt.txt" files, such as "A_gt.txt" and "B_gt.txt"?
Thanks!
Hello,
I wanted to check whether the way the testing and demo scripts process a video/image can give different results for the same image and the same model. Say one of the images in the dataset is picked along with its h5 file and running it gives one pred value, while the same image fed to the demo script gives a different prediction count, and the difference is quite large.
Is anything being missed here, or anything to take note of?
Thank you!
Thank you for this great work! I am curious about this work and am now trying to understand your paper. In my study, I will run some experiments using your source code. If possible, could you add a license file to this repository? Thank you in advance!
My current dataset is only annotated with head bounding boxes. Can I take the center point of each head bounding box as the training annotation for your model? Consider this scenario: bounding box A is partially covered by bounding box B. When the center point of bounding box A is taken, it is likely to fall on the head of the person inside bounding box B. Does this affect training?
I look forward to your reply. Thank you.
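For what it's worth, converting boxes to point annotations is straightforward; this is just my own helper sketch, with the (x1, y1, x2, y2) box format assumed:

```python
def boxes_to_points(boxes):
    """Turn head bounding boxes (x1, y1, x2, y2) into center points,
    the point-annotation format this kind of training expects."""
    return [((x1 + x2) / 2.0, (y1 + y2) / 2.0) for x1, y1, x2, y2 in boxes]
```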
Sorry, I didn't find your implementation of the I-SSIM loss.
Could you provide the ground-truth map conversion code for the UCF-QNRF dataset? I have tried many methods, but none of them converts it correctly.
Is there only the MSE loss in the code, but no I-SSIM loss?
Will it affect my training?