
fidtm's Issues

Testing issue

Hi,

Thank you for sharing the code. I tried to do a quick test after following all the data preparations. However, the output results are a bit strange, e.g. IMG_1.jpg Gt 172.00 Pred 180049, especially in the count, as you can see below.

Am I missing something?

P.S: I am testing the model on CPU.

Best,

(pytorch_env) D:\Project1\FIDTM>python test.py --dataset ShanghaiA --pre ./model/ShanghaiA/model_best.pth --gpu_id 0
{'dataset': 'ShanghaiA', 'save_path': 'save_file/A_baseline', 'workers': 16, 'print_freq': 200, 'start_epoch': 0, 'epochs': 3000, 'pre': './model/ShanghaiA/model_best.pth', 'batch_size': 16, 'crop_size': 256, 'seed': 1, 'best_pred': 100000.0, 'gpu_id': '0', 'lr': 0.0001, 'weight_decay': 0.0005, 'preload_data': True, 'visual': False, 'video_path': None}
Using cpu
./model/ShanghaiA/model_best.pth
=> loading checkpoint './model/ShanghaiA/model_best.pth'
57.0989010989011 921
Pre_load dataset ......
begin test
IMG_1.jpg Gt 172.00 Pred 180049
IMG_10.jpg Gt 502.00 Pred 196417
IMG_100.jpg Gt 391.00 Pred 92455
IMG_101.jpg Gt 211.00 Pred 184704
IMG_102.jpg Gt 223.00 Pred 31672
IMG_103.jpg Gt 430.00 Pred 170330
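
For context, methods in this family usually do not obtain the count by summing the predicted map; they count the local maxima of the predicted FIDT map. The sketch below is a minimal illustration of that idea, not the repository's exact post-processing: the 3x3 max-pool mirrors the MyNet wrapper quoted in the ONNX issue further down, and the 100/255 relative threshold is an assumption.

import torch
import torch.nn.functional as F

def count_from_fidt_map(fidt_map, rel_threshold=100.0 / 255.0):
    # fidt_map: (1, 1, H, W) network output
    # a pixel is a local maximum if it equals the 3x3 max around it
    pooled = F.max_pool2d(fidt_map, (3, 3), stride=1, padding=1)
    maxima = (pooled == fidt_map).float() * fidt_map
    # suppress weak responses relative to the global maximum (the threshold value is an assumption)
    keep = maxima >= rel_threshold * maxima.max()
    maxima = torch.where(keep, maxima, torch.zeros_like(maxima))
    return int((maxima > 0).sum().item())

If a ground truth of 172 comes out as a prediction in the hundreds of thousands, it may mean the map is being summed directly or that the localization/threshold step is not being applied.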


RDTM is similar to Inverse k-Nearest Neighbor Maps

I think the proposed RDTM is almost the same as the Inverse k-Nearest Neighbor Maps in [1], except for the name and the localization experiments.

[1] Improving Dense Crowd Counting Convolutional Neural Networks using Inverse k-Nearest Neighbor Maps and Multiscale Upsampling

Real Time tracking / counting?

Did anyone try this using a camera for real time tracking? What's the FPS like? Is this implementation viable for a real time scenario or is the FPS really low?
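
One rough way to answer the FPS question is to time forward passes on the target hardware. The sketch below is a generic PyTorch timing loop, not a benchmark from this repository; the 960x540 input (the size used in the ONNX issue further down) and the iteration counts are arbitrary choices.

import time
import torch

def measure_fps(model, height=540, width=960, iters=50, device='cuda'):
    model = model.to(device).eval()
    x = torch.randn(1, 3, height, width, device=device)
    with torch.no_grad():
        for _ in range(5):                # warm-up iterations
            model(x)
        if device == 'cuda':
            torch.cuda.synchronize()      # wait for warm-up kernels to finish
        start = time.time()
        for _ in range(iters):
            model(x)
        if device == 'cuda':
            torch.cuda.synchronize()
    return iters / (time.time() - start)

Whether the resulting FPS is viable for real time also depends on the post-processing (local-maximum extraction) and the video decoding pipeline, which this loop does not measure.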

Pred and GT are too different

Hi, thank you for your code. I tried to run test.py with the code you provided, but the results are quite different from the reported ones.
The details are as follows:

python test.py --test_dataset ShanghaiA --pre ./model/ShanghaiA/model_best_57.pth
{'train_dataset': 'ShanghaiA', 'task_id': 'save_file/A_baseline', 'workers': 16, 'print_freq': 200, 'start_epoch': 0, 'test_dataset': 'ShanghaiA', 'pre': './model/ShanghaiA/model_best_57.pth', 'batch_size': 16, 'seed': 1, 'best_pred': 100000.0, 'lr': '1e-4', 'preload_data': True, 'visual': False}
[2021-04-02 14:00:06] INFO (Networks.HR_Net.seg_hrnet/MainThread) => init weights from normal distribution
./model/ShanghaiA/model_best_57.pth
=> loading checkpoint './model/ShanghaiA/model_best_57.pth'
57.0989010989011 921
Pre_load dataset ......
begin test
args['task_id'] = save_file/A_baseline
IMG_1.jpg Gt 172.00 Pred 180049
IMG_10.jpg Gt 502.00 Pred 196417
IMG_100.jpg Gt 391.00 Pred 92455
IMG_101.jpg Gt 211.00 Pred 184704
IMG_102.jpg Gt 223.00 Pred 31672
IMG_103.jpg Gt 430.00 Pred 170330
IMG_104.jpg Gt 1175.00 Pred 174422
IMG_105.jpg Gt 265.00 Pred 169307

I don't know why this happens; can you guide me?

HRNet vs. VGG-16

Thanks for your work and for sharing the code. It seems that for all your experiments you use the HRNet architecture, which is a much more advanced model than the VGG-16 used in most other works. From my perspective, it is hard to judge how much of the improvement comes from the loss function you introduce and how much comes from the backbone alone.

What are your thoughts on this? Did you also run experiments with a VGG-16 backbone?

Thanks,
Paul

Huge or normal loss while training?

[Screenshot of training loss values]

I get these values for the loss while training the model from scratch on the JHU dataset. It started from ~10,000 and is dropping slowly, but I feel this number is not right. In the code, the loss used for the training baseline is just MSE and not the one proposed in the paper; shouldn't this loss be in the range of 0 to 1?
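
For what it's worth, the absolute scale of an MSE loss depends heavily on the reduction mode and on whether the FIDT maps are normalized, so values far above 1 are not necessarily a bug. A small, self-contained illustration (the map size and value range here are made up for demonstration):

import torch
import torch.nn as nn

pred = torch.rand(1, 1, 512, 512)    # hypothetical prediction in [0, 1]
target = torch.rand(1, 1, 512, 512)  # hypothetical FIDT ground truth in [0, 1]

mse_mean = nn.MSELoss(reduction='mean')(pred, target)  # averaged over pixels, roughly 0.17 here
mse_sum = nn.MSELoss(reduction='sum')(pred, target)    # summed over pixels, roughly 4e4 here
print(mse_mean.item(), mse_sum.item())

So a loss in the thousands can simply reflect a summed (or unnormalized) objective; the trend over epochs matters more than the absolute value.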

Training set image size

If I use my private dataset for training, what are the requirements on image size?
I saw that fidt_generate_xx.py treats image sizes differently for different datasets.
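
For reference, the convention suggested by the gt_fidt_map_2048 path in image.py and by the JHU size listing in the issue further down appears to be capping the longest image side (e.g. at 2048 px) and scaling the point annotations by the same factor. A minimal sketch, assuming point annotations as an (N, 2) array of (x, y) coordinates:

import numpy as np
from PIL import Image

def cap_longest_side(img, points, max_side=2048):
    # img: PIL.Image, points: (N, 2) array of (x, y) head coordinates
    w, h = img.size
    scale = max_side / float(max(w, h))
    if scale >= 1.0:
        return img, points               # already small enough, leave untouched
    new_w, new_h = int(round(w * scale)), int(round(h * scale))
    img = img.resize((new_w, new_h), Image.BILINEAR)
    points = np.asarray(points, dtype=np.float32) * scale   # scale annotations with the image
    return img, points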

ISSIM-Loss

Hello @dk-liang, thank you for the great work.

Can you provide the I-SSIM loss validation value for the last epoch on the SHA dataset?
I can normalize the loss in several ways, and knowing your validation value would help me choose the right one.
It would also tell me your loss balance between MSE and I-SSIM.

1-SSIM Loss

Mr. Liang, could you provide the code for the local 1-SSIM loss mentioned in the paper as a reference? Training with only the global MSE does not seem to give good results.
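
Since the repository's I-SSIM implementation is not included here, a generic local SSIM loss can serve as a starting point. The sketch below is the standard SSIM formulation with a uniform window via average pooling, not the authors' Independent SSIM loss; the window size and constants are the usual defaults and should be treated as assumptions.

import torch
import torch.nn.functional as F

def ssim_loss(pred, target, window=11, C1=0.01 ** 2, C2=0.03 ** 2):
    # pred, target: (N, 1, H, W), assumed to be normalized to [0, 1]
    pad = window // 2
    mu_x = F.avg_pool2d(pred, window, stride=1, padding=pad)
    mu_y = F.avg_pool2d(target, window, stride=1, padding=pad)
    sigma_x = F.avg_pool2d(pred * pred, window, stride=1, padding=pad) - mu_x ** 2
    sigma_y = F.avg_pool2d(target * target, window, stride=1, padding=pad) - mu_y ** 2
    sigma_xy = F.avg_pool2d(pred * target, window, stride=1, padding=pad) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)) / \
           ((mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2))
    return 1.0 - ssim.mean()  # "1 - SSIM": zero when the two maps are identical

How to weight this term against the MSE term is exactly the balance the ISSIM-Loss issue above asks the authors about.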

Very large loss when training the model

Hello! When training with your model, I found that the loss is very large: in the first epoch the loss reaches several hundred thousand, and even as the training epochs increase it is still several thousand. What could be the reason? Thank you!

Question on JHU dataset

Is there any bug in the file fidt_generate_jhu.py? I'm trying to train this model on the JHU dataset, and I noticed that the size of the image differs from that of the fidt_map. I simply added a print in image.py like:

import scipy.spatial
from PIL import Image
import scipy.io as io
import scipy
import numpy as np
import h5py
import cv2


def load_data_fidt(img_path, args, train=True):
    gt_path = img_path.replace('.jpg', '.h5').replace('images', 'gt_fidt_map_2048')
    img = Image.open(img_path).convert('RGB')

    while True:
        try:
            gt_file = h5py.File(gt_path, 'r')
            k = np.asarray(gt_file['kpoint'])
            fidt_map = np.asarray(gt_file['fidt_map'])
            break
        except OSError:
            print("path is wrong, can not load ", img_path)
            cv2.waitKey(1000)  # Wait a bit

    img = img.copy()
    fidt_map = fidt_map.copy()
    k = k.copy()
    print(img.size, fidt_map.shape)  # added print: PIL image size is (W, H), FIDT map shape is (H, W)
    return img, fidt_map, k

The output shows some differences, for example:

(968, 681) (681, 968)
(1023, 575) (575, 1023)
(2048, 1365) (1365, 2048)
(1280, 720) (720, 1280)
(2048, 1356) (1356, 2048)
(852, 480) (512, 909) #difference
(2250, 1500) (1365, 2048) #difference
(2692, 3297) (2048, 1672) #difference
(1023, 575) (575, 1023)
(2000, 1115) (1115, 2000)
(3840, 2160) (1152, 2048) #difference
(1000, 600) (600, 1000)
(1637, 1070) (1070, 1637)
(653, 282) (512, 1186)
(1280, 853) (853, 1280)
(1200, 600) (600, 1200)

I tried to train the model on the ShanghaiA dataset and it works fine.
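
One possible workaround for the mismatched pairs (my own assumption, not a confirmed fix for fidt_generate_jhu.py) is to resize the image to the stored FIDT map size before cropping, so that the image, fidt_map and kpoint stay aligned:

from PIL import Image

def align_to_fidt_map(img, fidt_map):
    # fidt_map has shape (H, W); PIL sizes are (W, H)
    h, w = fidt_map.shape
    if img.size != (w, h):
        img = img.resize((w, h), Image.BILINEAR)
    return img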

Experimental result problem

Hello, Mr. Liang. I am reproducing your experimental results based on the I-SSIM loss function provided by Denis Rybalchenko, but it seems I cannot get equally good results on Part A.

Unable to download the model files from Baidu

First, thanks for the amazing work. I cannot download the model files from Baidu since I am outside China (I cannot even sign up, and the site is entirely in Chinese even after Google Translate).
Could you please upload the model files to Google Drive or share a direct download link?

Thanks !

ONNX giving wrong output

I've converted the FIDTM model to ONNX using the following logic, but the output from ONNX is wrong.

import torch
import torch.nn as nn
from Networks.HR_Net.seg_hrnet import get_seg_model  # module path as it appears in the repo's logs


class MyNet(nn.Module):
    """Add maxpool for postprocessing"""

    def __init__(self):
        super().__init__()

    def forward(self, x):
        output = nn.functional.max_pool2d(x, (3, 3), stride=1, padding=1)
        return x, output

model = get_seg_model()
model = nn.Sequential(model, MyNet())
model = nn.DataParallel(model, device_ids=[0])
# ... load model weights (logic similar to video_demo.py)
batch_size = 1  # just take random number
dummy_input = torch.randn(batch_size, 3, 540, 960)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print('Using', device)
dummy_input = dummy_input.to(device)
model.eval()
model = model.cuda()

torch.onnx.export(model.module,               # model being run
                dummy_input,                         # model input (or a tuple for multiple inputs)
                "crowd_fidtm_model.onnx",   # where to save the model (can be a file or file-like object)
                export_params=True,        # store the trained parameter weights inside the model file
                opset_version=11,          # the ONNX version to export the model to
                do_constant_folding=True,  # whether to execute constant folding for optimization
                input_names = ['input_1'],   # the model's input names
                output_names = ['output_1', 'output_2'], # the model's output names
                dynamic_axes={'input_1' : {0 : 'batch_size'},    # variable length axes
                                'output_1' : {0 : 'batch_size'},
                                'output_2' : {0 : 'batch_size'}})

But after loading this ONNX model, the output is wrong.

In fact, the ONNX model gives almost the same values for every input image. Is this happening because of the if ... else blocks in the model? I'm not sure whether the model is being converted correctly.
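
A quick way to check whether the export itself is the problem is to compare ONNX Runtime outputs against the PyTorch outputs on the same tensor; if they match, the issue is more likely in the pre/post-processing around the ONNX model than in the conversion. A minimal sketch, assuming onnxruntime is installed and reusing model and dummy_input from the snippet above:

import numpy as np
import onnxruntime as ort
import torch

sess = ort.InferenceSession("crowd_fidtm_model.onnx", providers=["CPUExecutionProvider"])
onnx_outputs = sess.run(None, {"input_1": dummy_input.cpu().numpy()})

with torch.no_grad():
    torch_outputs = model.module(dummy_input)      # (features, max-pooled features)

for onnx_out, torch_out in zip(onnx_outputs, torch_outputs):
    print("max abs diff:", np.abs(onnx_out - torch_out.cpu().numpy()).max())

Also note that torch.onnx.export traces the model with the dummy input, so any Python-level if/else whose outcome depends on the input is frozen into whichever branch the dummy input took; exporting with an input of the size used at inference time avoids part of that problem.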

Training on custom dataset

Hello

If I want to train this model on my own custom dataset, do I just need to change fidt_generate_xx.py, make_npydata.py and train_baseline.py?

Also, my dataset contains bounding boxes rather than point annotations, but that shouldn't be a big issue, right?
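
Regarding the bounding boxes: the FIDT maps are generated from head points, so the boxes would need to be reduced to one point each before running fidt_generate_xx.py. A minimal sketch, assuming boxes in (x1, y1, x2, y2) format and using the box center (my assumption, not the authors' recommendation):

import numpy as np

def boxes_to_head_points(boxes):
    # boxes: (N, 4) array of (x1, y1, x2, y2) head bounding boxes
    boxes = np.asarray(boxes, dtype=np.float32)
    cx = (boxes[:, 0] + boxes[:, 2]) / 2.0   # horizontal center
    cy = (boxes[:, 1] + boxes[:, 3]) / 2.0   # vertical center
    return np.stack([cx, cy], axis=1)        # (N, 2) points, same layout as the point annotations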

What's the difference between the proposed maps and small-kernel Gaussian maps?

Thank you for your inspiring work. However, I don't understand the motivation behind the FIDT maps. Visually, the proposed FIDT map looks similar to a traditional Gaussian density map with a small enough kernel size. What is the difference between these two maps? Have you compared the counting or localization performance of the two?
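
To make the comparison concrete, the two maps are built quite differently from the same point annotations: a Gaussian density map convolves a delta map with a Gaussian kernel and (approximately) sums to the head count, while the FIDT map is a transformed inverse of the distance to the nearest annotated point. The sketch below illustrates both; the FIDT constants alpha, beta and c follow the general form of the paper's formula, but the exact values here are assumptions.

import numpy as np
from scipy.ndimage import distance_transform_edt, gaussian_filter

def gaussian_density_map(points, h, w, sigma=4.0):
    density = np.zeros((h, w), dtype=np.float32)
    for x, y in points:
        density[int(y), int(x)] = 1.0
    return gaussian_filter(density, sigma)      # integral stays (approximately) equal to the count

def fidt_map(points, h, w, alpha=0.02, beta=0.75, c=1.0):
    mask = np.ones((h, w), dtype=np.uint8)
    for x, y in points:
        mask[int(y), int(x)] = 0
    dist = distance_transform_edt(mask)         # distance to the nearest head point
    # very large distances overflow to inf in the power term, which harmlessly maps to ~0
    return 1.0 / (np.power(dist, alpha * dist + beta) + c)

The difference usually cited is that the FIDT map keeps a sharp, well-separated peak at every head even in very dense regions, whereas a small-kernel Gaussian map's peaks start to merge; whether that translates into better counting or localization is exactly what this question asks the authors to quantify.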

Queries on custom dataset

Hello authors,

Thank you for the great work. I have the following queries on which I need your input, as I am trying to work on this crowd localization task.

Has an alternate backbone, say VGG, been released in this repository that can be used instead of the current one? The current backbone is slow when training on a large number of images from my own custom dataset. If not, where should I look?

If my custom dataset contains different scenes at different resolutions, can I just use the data-preparation scripts (say, the NWPU one) to generate the FIDT maps and feed them to training, so that the longest side is no more than 2048?

Has the loss function mentioned in the paper been released already? Right now it is still the plain MSE loss, as in the usual approach.

Also, is there a minimum validation loss to look out for, in general, to decide the number of epochs or to stop training early? (A minimal early-stopping sketch follows below.)

Thank you in advance for the response.
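
On the early-stopping question: there is no universal target loss value, so a common pattern is to track validation MAE and stop when it has not improved for a while. A minimal, generic sketch; train_one_epoch, evaluate, the data loaders and the patience value are placeholders, not part of this repository.

import torch

best_mae = float('inf')
epochs_without_improvement = 0
patience = 30                                         # assumed patience, tune per dataset

for epoch in range(3000):                             # 3000 epochs as in the repo's default config
    train_one_epoch(model, train_loader, optimizer)   # placeholder training step
    mae = evaluate(model, val_loader)                 # placeholder: mean absolute count error on validation
    if mae < best_mae:
        best_mae = mae
        epochs_without_improvement = 0
        torch.save(model.state_dict(), 'model_best.pth')
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f'Stopping early at epoch {epoch}, best MAE {best_mae:.2f}')
            break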

Testing & demo counts discrepancy

Hello,

I wanted to check whether the way the test and demo scripts process a video/image can give rise to different results for the same image and the same model. Say one of the images in the dataset is picked along with its h5 file: running it through the test script gives one predicted count, while feeding the same image to the demo script gives a different count, and the difference is quite large.

Is anything being missed here, or is there something to take note of?

Thank you!

Too many False positives

I tried to run the video demo code on a video using either model_best_57.pth or model_best.pth, and I got many false positives. What can I do to be able to run the model on other data? (Note that I didn't scale down the frames in the video.)
[Screenshot from 2021-05-20 showing the false positives]

License

Thank you for this great work! I am curious about it and am now trying to understand your paper. In my study, I will run some experiments using your source code. If possible, could you add a license file to this repository? Thank you in advance!

About the training data set

My current dataset is only annotated with head bounding boxes. Can I take the center point of each head bounding box as the training annotation for your model? Consider this scenario: bounding box A is partially covered by bounding box B. When the center point of bounding box A is taken, that point is likely to fall on the head of the person inside bounding box B. Does this affect training?
I look forward to your reply. Thank you.

I-SSIM Loss

Sorry, I didn't find your implementation of the I-SSIM loss.

UCF-QNRF

Could you provide the ground-truth map conversion code for the UCF-QNRF dataset? I have tried many approaches but cannot get the conversion to work correctly.
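
For what it's worth, the conversion follows the same pattern as the other fidt_generate_xx.py scripts: build a kpoint map from the annotations, take a distance transform, turn it into a FIDT map and save both to an .h5 file with the keys that image.py reads. The sketch below assumes the usual UCF-QNRF layout, where each img_xxxx.jpg has an img_xxxx_ann.mat file whose 'annPoints' array holds the (x, y) head coordinates; the constants and the output path are assumptions.

import h5py
import numpy as np
import scipy.io as io
from PIL import Image
from scipy.ndimage import distance_transform_edt

def generate_qnrf_fidt(img_path, alpha=0.02, beta=0.75, c=1.0):
    points = io.loadmat(img_path.replace('.jpg', '_ann.mat'))['annPoints']
    w, h = Image.open(img_path).size

    kpoint = np.zeros((h, w), dtype=np.uint8)
    for x, y in points:
        x = min(max(int(x), 0), w - 1)               # clip annotations falling outside the image
        y = min(max(int(y), 0), h - 1)
        kpoint[y, x] = 1

    dist = distance_transform_edt(1 - kpoint)        # distance to the nearest head
    fidt = 1.0 / (np.power(dist, alpha * dist + beta) + c)

    with h5py.File(img_path.replace('.jpg', '.h5'), 'w') as f:
        f['kpoint'] = kpoint                         # same keys that load_data_fidt reads
        f['fidt_map'] = fidt

Because UCF-QNRF images are very large, the real preparation would presumably also cap the longest side (e.g. at 2048, as the other scripts appear to do) before computing the map.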

loss

Is there only the MSE loss in the code, and no I-SSIM loss?
Will that affect my training?
