Website: http://dk-liang.github.io/
Google Scholar: https://scholar.google.com/dk-liang
[IEEE TMM] Focal Inverse Distance Transform Maps for Crowd Localization
License: MIT License
Hi,
Thank you for sharing the code. I tried to do a quick test after following all the data preparations. However, the output results are a bit strange, especially the counts, as you can see below (e.g., IMG_1.jpg Gt 172.00 Pred 180049).
Am I missing something?
P.S.: I am testing the model on CPU.
Best,
(pytorch_env) D:\Project1\FIDTM>python test.py --dataset ShanghaiA --pre ./model/ShanghaiA/model_best.pth --gpu_id 0
{'dataset': 'ShanghaiA', 'save_path': 'save_file/A_baseline', 'workers': 16, 'print_freq': 200, 'start_epoch': 0, 'epochs': 3000, 'pre': './model/ShanghaiA/model_best.pth', 'batch_size': 16, 'crop_size': 256, 'seed': 1, 'best_pred': 100000.0, 'gpu_id': '0', 'lr': 0.0001, 'weight_decay': 0.0005, 'preload_data': True, 'visual': False, 'video_path': None}
Using cpu
./model/ShanghaiA/model_best.pth
=> loading checkpoint './model/ShanghaiA/model_best.pth'
57.0989010989011 921
Pre_load dataset ......
begin test
IMG_1.jpg Gt 172.00 Pred 180049
IMG_10.jpg Gt 502.00 Pred 196417
IMG_100.jpg Gt 391.00 Pred 92455
IMG_101.jpg Gt 211.00 Pred 184704
IMG_102.jpg Gt 223.00 Pred 31672
IMG_103.jpg Gt 430.00 Pred 170330
I think the proposed RDTM is almost the same as the Inverse k-Nearest Neighbor Maps in [1], except for the name and the experiments on localization.
[1] Improving Dense Crowd Counting Convolutional Neural Networks using Inverse k-Nearest Neighbor Maps and Multiscale Upsampling
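For readers comparing the two representations, the map itself can be sketched in a few lines. This is my own reading of the FIDT definition; the constants alpha=0.02, beta=0.75, C=1 are what I believe the paper uses, so treat them as assumptions:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def fidt_map(points, h, w, alpha=0.02, beta=0.75, c=1.0):
    """Sketch of a Focal Inverse Distance Transform map: compute the
    Euclidean distance to the nearest annotated head, then apply a
    focal inverse mapping so responses peak at heads and decay fast."""
    mask = np.ones((h, w), dtype=bool)
    for x, y in points:                  # annotated head coordinates
        mask[int(y), int(x)] = False     # zero distance at head pixels
    d = distance_transform_edt(mask)     # distance to nearest head
    return 1.0 / (d ** (alpha * d + beta) + c)
```

At an annotated pixel the distance is 0, so the map value is exactly 1/C; an inverse kNN map with k=1 would use the same distance field but a different decay.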
Did anyone try this with a camera for real-time tracking? What's the FPS like? Is this implementation viable for a real-time scenario, or is the FPS really low?
Hi, thank you for your code. I tried to run test.py with the code you provided, but the results are quite different.
The details are as follows:
python test.py --test_dataset ShanghaiA --pre ./model/ShanghaiA/model_best_57.pth
{'train_dataset': 'ShanghaiA', 'task_id': 'save_file/A_baseline', 'workers': 16, 'print_freq': 200, 'start_epoch': 0, 'test_dataset': 'ShanghaiA', 'pre': './model/ShanghaiA/model_best_57.pth', 'batch_size': 16, 'seed': 1, 'best_pred': 100000.0, 'lr': '1e-4', 'preload_data': True, 'visual': False}
[2021-04-02 14:00:06] INFO (Networks.HR_Net.seg_hrnet/MainThread) => init weights from normal distribution
./model/ShanghaiA/model_best_57.pth
=> loading checkpoint './model/ShanghaiA/model_best_57.pth'
57.0989010989011 921
Pre_load dataset ......
begin test
args['task_id'] = save_file/A_baseline
IMG_1.jpg Gt 172.00 Pred 180049
IMG_10.jpg Gt 502.00 Pred 196417
IMG_100.jpg Gt 391.00 Pred 92455
IMG_101.jpg Gt 211.00 Pred 184704
IMG_102.jpg Gt 223.00 Pred 31672
IMG_103.jpg Gt 430.00 Pred 170330
IMG_104.jpg Gt 1175.00 Pred 174422
IMG_105.jpg Gt 265.00 Pred 169307
I don't know why this is happening. Can you guide me?
Hi, thank you for your elegant code. But I'm still confused about how to generate the bounding boxes.
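I can't speak for the authors, but a common heuristic is to size a square box per detected point by the distance to its nearest neighboring point; here is a sketch under that assumption (scipy KDTree; the scale and min_half parameters are hypothetical, and the repo's own rule may differ):

```python
import numpy as np
from scipy.spatial import KDTree

def points_to_boxes(points, scale=0.5, min_half=4.0):
    """Sketch: one square box (x1, y1, x2, y2) per predicted head point,
    with half-size proportional to the distance to the nearest other
    point (a heuristic, not necessarily the repo's rule)."""
    pts = np.asarray(points, dtype=float)
    d, _ = KDTree(pts).query(pts, k=2)            # d[:, 1] = nearest-neighbor distance
    half = np.maximum(d[:, 1] * scale, min_half)  # floor very small boxes
    return np.stack([pts[:, 0] - half, pts[:, 1] - half,
                     pts[:, 0] + half, pts[:, 1] + half], axis=1)
```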
How can I train on the NWPU-Crowd dataset? Can you provide the data preprocessing script? Thank you.
Thanks for your work and sharing the code. It seems like for all your experiments you're using the HRNet architecture, which is a much more advanced model compared to VGG-16 that is used in most other works. From my perspective it's hard to judge how much improvement comes from the loss function you introduce and how much comes from the backbone alone.
What are your thoughts on this? Did you also run experiments with a VGG-16 backbone?
Thanks,
Paul
I get these values for the loss while training the model from scratch on the JHU dataset. It started from ~10,000 and is dropping slowly, but I feel this number is not right. In the code, train_baseline uses just MSE, not the loss proposed in the paper. Shouldn't this loss be in the range of 0 to 1?
If I use my private dataset for training, what are the requirements on image size?
I saw that fidt_generate_xx.py treats image sizes differently for different datasets.
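From reading the scripts, my understanding (an assumption, not an official statement) is that they mainly differ in the resolution cap: the common rule limits the longest side, e.g. to 2048 for the larger datasets. A minimal sketch of that rule:

```python
def cap_longest_side(w, h, max_side=2048):
    """Scale (w, h) so the longest side is at most max_side,
    preserving aspect ratio (the cap value is assumed from the scripts)."""
    longest = max(w, h)
    if longest <= max_side:
        return w, h
    scale = max_side / longest
    return round(w * scale), round(h * scale)
```

For example, a 3840x2160 frame would come out as 2048x1152 under this rule, which matches the resized shapes visible in the JHU logs quoted below in this thread collection.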
Hi
Is there a lightweight version for crowd counting?
Hello @dk-liang, thank you for the great work.
Can you provide the I-SSIM loss validation value for the last epoch on the SHA dataset?
I can normalize the loss in several ways; knowing your validation value would help me choose the right one.
It would also tell me your loss balance between MSE and I-SSIM.
Mr. Liang, could you provide the code for the local 1-SSIM loss mentioned in the paper for reference? Training with only the global MSE does not seem to give good results.
Has anyone deployed FIDTM in C++?
As the title says.
I have not found the I-SSIM loss in the code; I only find nn.MSELoss. So, I want to know where the I-SSIM loss is. Thanks!
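For anyone looking for a starting point, a generic SSIM-based loss can be sketched as below. This is my own sketch, not the authors' I-SSIM (which, per the paper, evaluates SSIM on local instance regions rather than globally); the constants c1 and c2 follow the standard SSIM defaults:

```python
import torch
import torch.nn.functional as F

def ssim_loss(pred, gt, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - mean local SSIM between predicted and ground-truth maps
    (shape: N x C x H x W). A plain SSIM loss sketch; the paper's I-SSIM
    additionally restricts windows to individual head instances."""
    pad = window // 2
    mu_p = F.avg_pool2d(pred, window, stride=1, padding=pad)
    mu_g = F.avg_pool2d(gt, window, stride=1, padding=pad)
    var_p = F.avg_pool2d(pred * pred, window, stride=1, padding=pad) - mu_p ** 2
    var_g = F.avg_pool2d(gt * gt, window, stride=1, padding=pad) - mu_g ** 2
    cov = F.avg_pool2d(pred * gt, window, stride=1, padding=pad) - mu_p * mu_g
    ssim = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) / \
           ((mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2))
    return 1.0 - ssim.mean()
```

The loss is 0 when the maps are identical and approaches 1 as they decorrelate, so it sits in the [0, 1] range that several threads here expect.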
It seems that only a single HRNet is utilized to regress the rdt_map end-to-end.
How do I test an image without ground truth and an FIDT map?
The code seems to use just the L2 loss, without adding the SSIM loss from the paper.
Is there any bug in fidt_generate_jhu.py? I'm trying to train this model on the JHU dataset. I noticed that the size of img is different from fidt_map. I simply added a print in image.py like:
import scipy.spatial
from PIL import Image
import scipy.io as io
import scipy
import numpy as np
import h5py
import cv2

def load_data_fidt(img_path, args, train=True):
    gt_path = img_path.replace('.jpg', '.h5').replace('images', 'gt_fidt_map_2048')
    img = Image.open(img_path).convert('RGB')

    while True:
        try:
            gt_file = h5py.File(gt_path)
            k = np.asarray(gt_file['kpoint'])
            fidt_map = np.asarray(gt_file['fidt_map'])
            break
        except OSError:
            print("path is wrong, can not load ", img_path)
            cv2.waitKey(1000)  # wait a bit before retrying

    img = img.copy()
    fidt_map = fidt_map.copy()
    k = k.copy()

    print(img.size, fidt_map.shape)  # here
    return img, fidt_map, k
The output shows some differences, like:
(968, 681) (681, 968)
(1023, 575) (575, 1023)
(2048, 1365) (1365, 2048)
(1280, 720) (720, 1280)
(2048, 1356) (1356, 2048)
(852, 480) (512, 909) #difference
(2250, 1500) (1365, 2048) #difference
(2692, 3297) (2048, 1672) #difference
(1023, 575) (575, 1023)
(2000, 1115) (1115, 2000)
(3840, 2160) (1152, 2048) #difference
(1000, 600) (600, 1000)
(1637, 1070) (1070, 1637)
(653, 282) (512, 1186)
(1280, 853) (853, 1280)
(1200, 600) (600, 1200)
I tried to train the model on the ShanghaiA dataset and it works fine.
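If anyone else hits this, a possible workaround (my own assumption, not an official fix) is to resize the loaded image to match the stored map's shape, since the JHU generation script appears to cap the map's resolution while image.py loads the original-size jpg:

```python
from PIL import Image

def resize_to_match(img, fidt_shape):
    """Resize a PIL image to a stored fidt_map's (H, W) shape so the
    image/map pair is consistent again (workaround sketch only)."""
    h, w = fidt_shape
    if img.size != (w, h):                 # note: PIL size is (W, H)
        img = img.resize((w, h), Image.BILINEAR)
    return img
```

Keypoint coordinates stored alongside the map would need the same rescaling, which this sketch does not handle.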
Hello, Mr. Liang. I am replicating your experimental results based on the I-SSIM loss function given by Denis Rybalchenko, but it seems that I can't get equally good results on Part A.
First, thanks for the amazing work. I cannot download the model files from Baidu since I am outside China (I cannot even sign up; it's all in Chinese even after Google Translate).
Could you please upload the model files to Google Drive or share a direct download link?
Thanks !
I've converted the FIDTM model to ONNX using the following logic, but the output from ONNX is wrong.
import torch
import torch.nn as nn
from Networks.HR_Net.seg_hrnet import get_seg_model

class MyNet(nn.Module):
    """Add maxpool for postprocessing"""
    def __init__(self):
        super().__init__()

    def forward(self, x):
        output = nn.functional.max_pool2d(x, (3, 3), stride=1, padding=1)
        return x, output

model = get_seg_model()
model = nn.Sequential(model, MyNet())
model = nn.DataParallel(model, device_ids=[0])
# ... load model weights (logic similar to video_demo.py)

batch_size = 1  # just take a random number
dummy_input = torch.randn(batch_size, 3, 540, 960)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print('Using', device)
dummy_input = dummy_input.to(device)

model.eval()
model = model.cuda()

torch.onnx.export(model.module,               # model being run
                  dummy_input,                # model input (or a tuple for multiple inputs)
                  "crowd_fidtm_model.onnx",   # where to save the model
                  export_params=True,         # store the trained parameter weights inside the model file
                  opset_version=11,           # the ONNX version to export the model to
                  do_constant_folding=True,   # whether to execute constant folding for optimization
                  input_names=['input_1'],    # the model's input names
                  output_names=['output_1', 'output_2'],  # the model's output names
                  dynamic_axes={'input_1': {0: 'batch_size'},   # variable-length axes
                                'output_1': {0: 'batch_size'},
                                'output_2': {0: 'batch_size'}})
But after loading this ONNX model, the output is wrong. In fact, the ONNX model gives almost the same values for every input image. Is this happening because of if/else blocks in the model? I'm not sure whether the model is being converted correctly!
Hello,
If I want to train this model on my own custom dataset, do I just need to change fidt_generate_xx.py, make_npydata.py, and train_baseline.py?
Also, my dataset contains bounding boxes, but that shouldn't be a big issue, right?
The data-preparation code for the NWPU dataset is not provided at the data link.
@RawanLaz @dk-liang @ZeeRizvee
Hi, thank you for the code, but I didn't find the Local-Maxima-Detection-Strategy (LMDS) in your code. Can you tell me where this algorithm is?
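As far as I can tell, the idea appears inline in the test/demo scripts rather than as a named function. A minimal sketch of the strategy as I understand it (3x3 max-pooling to keep local maxima, then a fixed confidence threshold; the threshold value is an assumption, and the real code also adapts it for near-empty images, which I omit):

```python
import torch
import torch.nn.functional as F

def lmds(fidt_map, threshold=100.0 / 255.0):
    """Local-Maxima-Detection-Strategy sketch: a pixel counts as a head
    if it is a 3x3 local maximum of the predicted FIDT map and its
    response exceeds a confidence threshold (assumed value)."""
    # fidt_map: (1, 1, H, W) tensor with values roughly in [0, 1]
    pooled = F.max_pool2d(fidt_map, 3, stride=1, padding=1)
    maxima = (pooled == fidt_map).float() * fidt_map   # keep only local maxima
    keep = maxima > threshold                          # drop weak responses
    count = int(keep.sum().item())
    coords = keep.squeeze(0).squeeze(0).nonzero()      # (y, x) head positions
    return count, coords
```

The ONNX question above wraps the network with exactly this kind of max-pool so the comparison `pooled == fidt_map` can be done outside the graph.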
Hi there, thanks for this impressive work!
I look forward to your work
How do I test an image without ground truth and an FIDT map?
Parameters of the computer:
GPU: 1 x 1080Ti, 12GB
Parameters of the training code:
crop_size: 128x128
batch_size: 16
Thank you for your inspiring work. However, I don't understand the motivation for FIDT maps. In the visualizations, the FIDT map you propose seems similar to a traditional Gaussian density map with a small enough kernel size. What is the difference between these two maps? Have you compared their counting or localization performance?
Hello authors,
Thank you for the great work. I have the following queries in which I need your inputs, as I am trying to work on this crowd localization task.
May I check whether an alternate backbone, say VGG, has been released in this repository that can be used instead of the current one? The current backbone is slow when training on a large number of images from my own custom dataset. If so, where do I find it?
If my custom dataset has different scenes and different resolutions, can I just use the data-preparation scripts (say, the NWPU one) to generate the FIDT maps and feed them to training, so that the longest side is not more than 2048?
Has the loss function mentioned in the paper been released already? Right now it's still the plain MSE loss.
Also, is there a minimum validation loss to look out for, in general, to decide the number of epochs or for early stopping?
Thank you in advance for the response.
HRNet (pretrained), MSE loss | MAE 66.049 | MSE 105.703
How can we get the "xxxxx_gt.txt" files, such as "A_gt.txt" and "B_gt.txt"?
Thanks!
Hello,
I wanted to check whether the way the testing and demo scripts process a video/image can give different results for the same image and the same model. Say one of the images in the dataset is picked along with its h5 file and running it gives one pred value, while the same image fed to the demo script gives a different prediction count, and the difference is quite large.
Is anything being missed here, or anything to take note of?
Thank you!
Thank you for this great work! I am curious about this work and am now trying to understand your paper. In my study, I will run some experiments using your source code. If possible, could you add a license file to this repository? Thank you in advance!
My current dataset is only annotated with head bounding boxes. Can I take the center point of each head bounding box as the training annotation for your model? Consider this scenario: bounding box A is partially covered by bounding box B. When the center point of bounding box A is taken, it is likely to fall on the head of the person inside bounding box B. Does this affect training?
I look forward to your reply. Thank you.
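For what it's worth, converting boxes to point annotations is straightforward; this is just my own helper sketch, with the (x1, y1, x2, y2) box format assumed:

```python
def boxes_to_points(boxes):
    """Turn head bounding boxes (x1, y1, x2, y2) into center points,
    the point-annotation format this kind of training expects."""
    return [((x1 + x2) / 2.0, (y1 + y2) / 2.0) for x1, y1, x2, y2 in boxes]
```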
Sorry, I didn't find your implementation of the I-SSIM loss.
Could you provide the ground-truth map conversion code for the UCF-QNRF dataset? I have tried many methods, but none of them converts it correctly.
Is there only the MSE loss in the code, but no I-SSIM loss?
Will it affect my training?