
LFD: A Light and Fast Detector

Update History

  • 2021.06.25 Some new experiments are added for your reference.
  • 2021.03.30 We maintain an Experiments wiki page showing ablation studies for your reference. These experiments may help you make informed decisions.
  • 2021.03.16 INT8 inference is updated. Check timing_inference_latency.py and predict_tensorrt.py for reference.
  • 2021.03.09 LFD is now formally released! Questions and problem reports are welcome.

1. Introduction

In this repo, we release a new one-stage anchor-free detector named LFD. LFD surpasses the previous LFFD in most aspects. We aim to make object detection easier, more explainable, and more applicable. With LFD, you can train and deploy a desired model without bells and whistles. Eventually, we hope LFD can become as popular as the YOLO series in the industrial community.

1.1 New Features

Compared to LFFD, LFD has the following features:

  • implemented in PyTorch, which is familiar to most users (LFFD is implemented in MXNet)
  • supports multi-class detection rather than only single-class detection (LFFD is single-class only)
  • higher precision and lower inference latency
  • we maintain a wiki (highly recommended) to help you fully understand LFD and master the code
  • the performance of LFD has been proven in many real-world applications
  • you can create models with a satisfactory model size and inference latency, and train them from scratch on your own datasets

1.2 Performance Highlights

Before diving into the code, we present some performance results on two datasets, covering precision and inference latency.

Dataset 1: WIDERFACE (single-class)

Accuracy on the val set under the SIO evaluation schema proposed in LFFD:

| Model Version | Easy Set | Medium Set | Hard Set |
|---------------|----------|------------|----------|
| v2            | 0.875    | 0.863      | 0.754    |
| WIDERFACE-L   | 0.887    | 0.896      | 0.863    |
| WIDERFACE-M   | 0.874    | 0.888      | 0.855    |
| WIDERFACE-S   | 0.873    | 0.885      | 0.849    |
| WIDERFACE-XS  | 0.866    | 0.877      | 0.839    |
  • v2 is from LFFD; you can check it in the LFFD repo.
  • for a fair comparison, WIDERFACE-L/M/S/XS have a detection range similar to v2 ([4, 320] vs [10, 320]), but different network structures.
  • note the great improvement on the Hard Set.
Inference latency

Platform: RTX 2080Ti, CUDA 10.2, CUDNN 8.0.4, TensorRT 7.2.2.3

  • batchsize=1, weight precision mode=FP32
| Model Version | 640×480 | 1280×720 | 1920×1080 | 3840×2160 |
|---------------|---------|----------|-----------|-----------|
| v2            | 2.12ms (472.04FPS) | 5.02ms (199.10FPS) | 10.80ms (92.63FPS) | 42.41ms (23.58FPS) |
| WIDERFACE-L   | 2.67ms (374.19FPS) | 6.31ms (158.38FPS) | 13.51ms (74.04FPS) | 94.61ms (10.57FPS) |
| WIDERFACE-M   | 2.47ms (404.23FPS) | 5.70ms (175.38FPS) | 12.28ms (81.43FPS) | 87.90ms (11.38FPS) |
| WIDERFACE-S   | 1.82ms (548.42FPS) | 3.57ms (280.00FPS) | 7.35ms (136.02FPS) | 27.93ms (35.81FPS) |
| WIDERFACE-XS  | 1.58ms (633.06FPS) | 3.03ms (330.36FPS) | 6.14ms (163.00FPS) | 23.26ms (43.00FPS) |

The v2 results are taken directly from LFFD; its platform conditions differ slightly from ours.

  • batchsize=1, weight precision mode=FP16
| Model Version | 640×480 | 1280×720 | 1920×1080 | 3840×2160 |
|---------------|---------|----------|-----------|-----------|
| WIDERFACE-L   | 1.68ms (594.12FPS) | 3.69ms (270.78FPS) | 7.66ms (130.51FPS) | 28.65ms (34.90FPS) |
| WIDERFACE-M   | 1.61ms (622.42FPS) | 3.51ms (285.13FPS) | 7.31ms (136.79FPS) | 27.32ms (36.60FPS) |
| WIDERFACE-S   | 1.26ms (793.97FPS) | 2.39ms (418.68FPS) | 4.88ms (205.09FPS) | 18.46ms (54.18FPS) |
| WIDERFACE-XS  | 1.23ms (813.01FPS) | 2.18ms (459.17FPS) | 4.57ms (218.62FPS) | 17.35ms (57.65FPS) |

It can be observed that FP16 mode is evidently faster than FP32 mode, so FP16 is highly recommended in deployment whenever possible.
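For reference, a minimal sketch of how FP16 can be requested with the raw TensorRT 7 Python API (the repo wraps engine building in its own deployment code, e.g. predict_tensorrt.py, so this is illustrative rather than the repo's API):

    import tensorrt as trt

    # Minimal sketch: build an engine with FP16 kernels allowed (TensorRT 7).
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30        # 1 GiB of builder workspace
    if builder.platform_has_fast_fp16:         # check hardware support first
        config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 kernel selection
    # ... populate `network` (e.g. via an ONNX parser), then:
    engine = builder.build_engine(network, config)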

  • batchsize=1, weight precision mode=INT8
| Model Version | 640×480 | 1280×720 | 1920×1080 | 3840×2160 |
|---------------|---------|----------|-----------|-----------|
| WIDERFACE-L   | 1.50ms (667.95FPS) | 3.24ms (308.43FPS) | 6.83ms (146.41FPS) | - |
| WIDERFACE-M   | 1.45ms (689.00FPS) | 3.15ms (317.60FPS) | 6.61ms (151.20FPS) | - |
| WIDERFACE-S   | 1.17ms (855.29FPS) | 2.14ms (466.86FPS) | 4.40ms (227.18FPS) | - |
| WIDERFACE-XS  | 1.09ms (920.91FPS) | 2.03ms (493.54FPS) | 4.11ms (243.15FPS) | - |

CAUTION: '-' means the result is not available due to out-of-memory errors during calibration.

Dataset 2: TT100K (multi-class, 45 classes)

Precision & Recall on the test set of TT100K [1]:

| Model Version          | Precision | Recall |
|------------------------|-----------|--------|
| FastRCNN in [1]        | 0.5014    | 0.5554 |
| Method proposed in [1] | 0.8773    | 0.9065 |
| LFD_L                  | 0.9205    | 0.9129 |
| LFD_S                  | 0.9202    | 0.9042 |

We use only the train split (6105 images) for model training, and test our models on the test split (3071 images). In [1], the authors extended the training set: classes with between 100 and 1000 instances in the training set were augmented to give them 1000 instances, but the augmented data is not released. This means we use much less training data than [1]. However, as you can see, our models still achieve better performance. The Precision & Recall results of [1] can be found in its released code folder: code/results/report_xxxx.txt.

Inference latency

Platform: RTX 2080Ti, CUDA 10.2, CUDNN 8.0.4, TensorRT 7.2.2.3

  • batchsize=1, weight precision mode=FP32
| Model Version | 1280×720 | 1920×1080 | 3840×2160 |
|---------------|----------|-----------|-----------|
| LFD_L         | 9.87ms (101.35FPS) | 21.56ms (46.38FPS) | 166.66ms (6.00FPS) |
| LFD_S         | 4.31ms (232.27FPS) | 8.96ms (111.64FPS) | 34.01ms (29.36FPS) |
  • batchsize=1, weight precision mode=FP16
| Model Version | 1280×720 | 1920×1080 | 3840×2160 |
|---------------|----------|-----------|-----------|
| LFD_L         | 6.28ms (159.27FPS) | 13.09ms (76.38FPS) | 49.79ms (20.09FPS) |
| LFD_S         | 3.03ms (329.68FPS) | 6.27ms (159.54FPS) | 23.41ms (42.72FPS) |
  • batchsize=1, weight precision mode=INT8
| Model Version | 1280×720 | 1920×1080 | 3840×2160 |
|---------------|----------|-----------|-----------|
| LFD_L         | 5.96ms (167.89FPS) | 12.68ms (78.86FPS) | - |
| LFD_S         | 2.90ms (345.33FPS) | 5.89ms (169.86FPS) | - |

CAUTION: '-' means the result is not available due to out-of-memory errors during calibration.

2. Get Started

2.1 Install

Prerequisites

  • python >= 3.6
  • albumentations >= 0.4.6
  • torch >= 1.5
  • torchvision >= 0.6.0
  • cv2 >= 4.0
  • numpy >= 1.16
  • pycocotools >= 2.0.1
  • pycuda == 2020.1
  • tensorrt == 7.2.2.3 (with the corresponding cudnn == 8.0)

All versions above are tested; newer versions may work as well but have not been fully tested.
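To quickly verify which versions are actually installed, a small convenience snippet (not part of the repo) can be run:

    # Print the versions of the tested stack as reported by each package.
    import albumentations, cv2, numpy, tensorrt, torch, torchvision
    print('torch         ', torch.__version__)
    print('torchvision   ', torchvision.__version__)
    print('cv2           ', cv2.__version__)
    print('numpy         ', numpy.__version__)
    print('albumentations', albumentations.__version__)
    print('tensorrt      ', tensorrt.__version__)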

Build Internal Libs

In the repo root, run the code below:

python setup.py build_ext

Once successful, you will see: ----> build and copy successfully!

If you want to know which libs are built and where they are copied, read the file setup.py.

Build External Libs

  • Build libjpeg-turbo
    1. download the source code v2.0.5

    2. decompress and compile:

      cd [source code]
      mkdir build
      cd build
      cmake ..
      make

      make sure the cmake configuration completes properly

    3. copy build/libturbojpeg.so.x.x.x to lfd/data_pipeline/dataset/utils/libs
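As a quick sanity check after step 3, you can confirm the copied shared object is found and loadable; this hedged snippet mirrors the lookup that lfd/data_pipeline/dataset/utils/turbojpeg.py performs (run it from the repo root):

    import ctypes
    import glob

    # turbojpeg.py searches this folder for files starting with 'libturbojpeg'.
    libs = glob.glob('lfd/data_pipeline/dataset/utils/libs/libturbojpeg*')
    assert libs, 'no libturbojpeg found -- did you copy build/libturbojpeg.so.x.x.x?'
    ctypes.CDLL(libs[0])  # raises OSError if the library cannot be loaded
    print('loaded', libs[0])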

Add PYTHONPATH

The last step is to add the repo root to PYTHONPATH. You have two choices:

  1. permanent way: append export PYTHONPATH=[repo root]:$PYTHONPATH to the file ~/.bashrc
  2. temporary way: whenever you want to code with the repo, add the following lines first (see the full example below):
    1. import sys
    2. sys.path.append('path to the repo')
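A complete example of the temporary way (the repo path is a placeholder):

    import sys

    # Make the repo importable for this process only; adjust the path to your clone.
    sys.path.append('/path/to/LFD-A-Light-and-Fast-Detector')

    from lfd.data_pipeline.augmentation import *  # should now resolve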

At this point, the repo is ready for use. By the way, we do not install the repo into the default Python libs location (like /python3.x/site-packages/) so that modification and development stay easy.

Docker Installation

Please check here for more details; thanks to @ashuezy.

2.2 Play with the code

We present the details of how to use the code in two specific tasks.

Besides, we describe the structure of the code in the wiki.
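As a taste of the API, here is a minimal single-image prediction sketch pieced together from the WIDERFACE_train/predict.py fragments quoted in the issues below; config_dict and simple_widerface_val_pipeline come from the corresponding config script, so treat the exact names as assumptions:

    import cv2

    # Run single-image prediction with the thresholds used in predict.py.
    image = cv2.imread('test.jpg')
    results = config_dict['model'].predict_for_single_image(
        image,
        aug_pipeline=simple_widerface_val_pipeline,
        classification_threshold=0.5,  # raise/lower to trade recall for precision
        nms_threshold=0.3)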

Acknowledgement

Citation

If you find the repo useful, please cite the repo website directly.


Issues

Errors when running ~/WIDERFACE_train/predict.py

PyTorch 1.7.1, CUDA 11.2, GeForce RTX 3080
When I run ~/WIDERFACE_train/predict.py, this error occurs:

File "predict.py", line 22, in
results = config_dict['model'].predict_for_single_image(image, aug_pipeline=simple_widerface_val_pipeline, classification_threshold=0.5, nms_threshold=0.3)
File "/root/zkx/LFD-A-Light-and-Fast-Detector-master/lfd/model/lfd.py", line 634, in predict_for_single_image
nms_bboxes, nms_labels = multiclass_nms(
File "/root/zkx/LFD-A-Light-and-Fast-Detector-master/lfd/model/utils/nms.py", line 214, in multiclass_nms
dets, keep = batched_nms(bboxes, scores, labels, nms_cfg)
File "/root/zkx/LFD-A-Light-and-Fast-Detector-master/lfd/model/utils/nms.py", line 153, in batched_nms
nms_bboxes, kept_indexes = nms_op(torch.cat([bboxes_for_nms, scores[:, None]], -1), **nms_cfg_)
File "/root/zkx/LFD-A-Light-and-Fast-Detector-master/lfd/model/utils/nms.py", line 53, in nms
inds = nms_ext.nms(dets_th, iou_thr)
RuntimeError: CUDA error: no kernel image is available for execution on the device

Could you help? Thank you very much!

Warning messages that appear when I use the model to make predictions

Hello, when I use the model for prediction, this warning always appears, though it does not affect the final result. How should I deal with this warning, or can I simply ignore it?
[WARNING: ResNet pretrained weights load] unexpected keys:
_neck.neck0.0.weight _neck.neck0.1.weight _neck.neck0.1.bias _neck.neck0.1.running_mean _neck.neck0.1.running_var _neck.neck0.1.num_batches_tracked _neck.neck1.0.weight _neck.neck1.1.weight _neck.neck1.1.bias _neck.neck1.1.running_mean _neck.neck1.1.running_var _neck.neck1.1.num_batches_tracked _neck.neck2.0.weight _neck.neck2.1.weight _neck.neck2.1.bias _neck.neck2.1.running_mean _neck.neck2.1.running_var _neck.neck2.1.num_batches_tracked _neck.neck3.0.weight _neck.neck3.1.weight _neck.neck3.1.bias _neck.neck3.1.running_mean _neck.neck3.1.running_var _neck.neck3.1.num_batches_tracked _neck.neck4.0.weight _neck.neck4.1.weight _neck.neck4.1.bias _neck.neck4.1.running_mean _neck.neck4.1.running_var _neck.neck4.1.num_batches_tracked _head._scales.0._scale _head._scales.1._scale _head._scales.2._scale _head._scales.3._scale _head._scales.4._scale _head.head0_classification_path.0.weight _head.head0_classification_path.0.bias _head.head0_regression_path.0.weight _head.head0_regression_path.0.bias _head.head0_merge_path.0.weight _head.head0_merge_path.1.weight _head.head0_merge_path.1.bias _head.head0_merge_path.3.weight _head.head0_merge_path.4.weight _head.head0_merge_path.4.bias _head.head1_classification_path.0.weight _head.head1_classification_path.0.bias _head.head1_regression_path.0.weight _head.head1_regression_path.0.bias _head.head1_merge_path.0.weight _head.head1_merge_path.1.weight _head.head1_merge_path.1.bias _head.head1_merge_path.3.weight _head.head1_merge_path.4.weight _head.head1_merge_path.4.bias _head.head2_classification_path.0.weight _head.head2_classification_path.0.bias _head.head2_regression_path.0.weight _head.head2_regression_path.0.bias _head.head2_merge_path.0.weight _head.head2_merge_path.1.weight _head.head2_merge_path.1.bias _head.head2_merge_path.3.weight _head.head2_merge_path.4.weight _head.head2_merge_path.4.bias _head.head3_classification_path.0.weight _head.head3_classification_path.0.bias _head.head3_regression_path.0.weight _head.head3_regression_path.0.bias _head.head3_merge_path.0.weight _head.head3_merge_path.1.weight _head.head3_merge_path.1.bias _head.head3_merge_path.3.weight _head.head3_merge_path.4.weight _head.head3_merge_path.4.bias _head.head4_classification_path.0.weight _head.head4_classification_path.0.bias _head.head4_regression_path.0.weight _head.head4_regression_path.0.bias _head.head4_merge_path.0.weight _head.head4_merge_path.1.weight _head.head4_merge_path.1.bias _head.head4_merge_path.3.weight _head.head4_merge_path.4.weight _head.head4_merge_path.4.bias
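For context, the listed keys all belong to the neck and head, which ResNet pretrained weights do not cover; those parts are trained from scratch anyway, so a partial-load warning like this is usually harmless. A hedged, generic sketch of loading only the overlapping keys to keep the load silent (the _backbone attribute follows the tracebacks elsewhere on this page; this is not the repo's actual loader):

    import torch

    # Keep only the checkpoint keys the target module actually has, so the
    # partial load produces no unexpected/missing-key warnings.
    state = torch.load('resnet_pretrained.pth', map_location='cpu')
    target_keys = set(model._backbone.state_dict().keys())
    filtered = {k: v for k, v in state.items() if k in target_keys}
    model._backbone.load_state_dict(filtered, strict=False)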

Accuracy Discrepancy

I tried to get the result with the WIDERFACE evaluation for LFD_L (Large).

Below is the score
==================== Results ====================
Easy Val AP: 0.5960946420764421
Medium Val AP: 0.6995630658106545
Hard Val AP: 0.7579042623709009

These values are quite different from the results given in the repo.

Am I missing something?

Are negative samples required?

Most detection algorithms don't require negative samples because everything outside bounding box regions is considered "negative". However, looking at some of the example datasets for LFD, generating negative samples explicitly seems to be part of the training data generation process. Can you provide some guidance on the proper way to put together a dataset for training the detector on a set of objects?

Cannot detect large objects

Hi, I found that if I set crop_size to 400, I cannot detect large objects. Even after I increased crop_size to 600, I still cannot detect large objects.

Do you know why?

How to improve the Hard Set quality?

You mentioned a great improvement over LFFD v2 on the Hard Set, and I want to know:

  • What is different between V2 and V1 (tips and tricks about the training process, backbone, data, ...) that achieves this quality?
    Thank you so much! Nice research.

BUG #1: RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

root@1192704b450d:/opt/github/LFD-A-Light-and-Fast-Detector/WIDERFACE_train# python3 predict.py 
<class 'lfd.model.lfd.LFD'>
Traceback (most recent call last):
  File "predict.py", line 26, in <module>
    results = config_dict['model'].predict_for_single_image(image, aug_pipeline=simple_widerface_val_pipeline, classification_threshold=0.5, nms_threshold=0.3)
  File "../lfd/model/lfd.py", line 553, in predict_for_single_image
    predicted_classification, predicted_regression = self.forward(data_batch)
  File "../lfd/model/lfd.py", line 493, in forward
    backbone_outputs = self._backbone(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "../lfd/model/backbone/lfd_resnet.py", line 479, in forward
    x = self._stem(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 396, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

Too many FP results with classification_threshold=0.01

Hi, thank you very much for your brilliant work.
In WIDERFACE_train/evaluation.py, classification_threshold was set to 0.01. If I evaluate with this value, I get the exact mAP 0.866 | 0.877 | 0.839 you mention in the README, but there seem to be too many FP results. You may check the following image for reference.
[image: detection result with many false-positive boxes]
In WIDERFACE_train/predict.py, classification_threshold was set to 0.5, which gives correct-looking results, but when I use this value for evaluation with the LS model, it only reaches mAP 0.812/0.794/0.637.
So I think the mAP results of LFD on WIDERFACE may have been reached with an improper parameter. I would appreciate it if you could save the image results in evaluation.py with classification_threshold=0.01 and check them.
Looking forward to your reply.

Build/Compile libjpeg-turbo

Win10-x64, powershell

I'm trying to use the WIDERFACE predict.py, but I have the following error:

ImportError: cannot import name 'nms_ext' from 'lfd.model.utils.libs'

I think this is because of libjpeg-turbo.

Can anyone help me on how to build and compile libjpeg-turbo on windows?

Errors when building the TensorRT engine

Hello, when building the TensorRT engine I ran the following command:
python predict_tensorrt.py
and hit the following error:
......
[TensorRT] VERBOSE: Layer: 501 copy Weights: 0 HostPersistent: 0 DevicePersistent: 0
[TensorRT] VERBOSE: Layer: 539 copy Weights: 0 HostPersistent: 0 DevicePersistent: 0
[TensorRT] VERBOSE: Layer: 577 copy Weights: 0 HostPersistent: 0 DevicePersistent: 0
[TensorRT] VERBOSE: Layer: 615 copy Weights: 0 HostPersistent: 0 DevicePersistent: 0
[TensorRT] VERBOSE: Layer: 653 copy Weights: 0 HostPersistent: 0 DevicePersistent: 0
[TensorRT] VERBOSE: Layer: 520 copy Weights: 0 HostPersistent: 0 DevicePersistent: 0
[TensorRT] VERBOSE: Layer: 558 copy Weights: 0 HostPersistent: 0 DevicePersistent: 0
[TensorRT] VERBOSE: Layer: 596 copy Weights: 0 HostPersistent: 0 DevicePersistent: 0
[TensorRT] VERBOSE: Layer: 634 copy Weights: 0 HostPersistent: 0 DevicePersistent: 0
[TensorRT] VERBOSE: Layer: 672 copy Weights: 0 HostPersistent: 0 DevicePersistent: 0
[TensorRT] VERBOSE: Total Host Persistent Memory: 153184
[TensorRT] VERBOSE: Total Device Persistent Memory: 12235264
[TensorRT] VERBOSE: Total Weight Memory: 0
[TensorRT] VERBOSE: Builder timing cache: created 94 entries, 231 hit(s)
[TensorRT] VERBOSE: Engine generation completed in 5.63893 seconds.
[TensorRT] VERBOSE: Calculating Maxima
[TensorRT] INFO: Starting Calibration.
[TensorRT] INFO: Post Processing Calibration data in 9.89e-07 seconds.
Engine build unsuccessfully!
Traceback (most recent call last):
  File "predict_tensorrt.py", line 110, in <module>
    int8_calibrator=int8_calibrator)
AssertionError

LFD-resnet

What's the difference between LFD-ResNet and ResNet?

Ubuntu, CentOS, or Windows?

Hello, thank you for your work!
Could you tell me which system you use? When I install pycuda on CentOS, I get lots of installation errors, like this:
fatal error: boost/numeric/conversion/detail/preprocessed/numeric_cast_traits_common.hpp: No such file or directory
When I search on the internet, it seems there is no pycuda build for CentOS.
I am confused, so could you tell me your operating system, or add it to the wiki?
Thank you very much!

Error: onnx2tensorRT in NVIDIA TX2

Assertion failed: !isDynamic(tensor_ptr->getDimensions()) && "InstanceNormalization does not support dynamic inputs!".

Do you have a solution?

Output the Quality Focal Loss

To use the quality focal loss, did you pass it the quality scores, like [0.1, 0.6, 0.2, 0.4], or the one-hot target, like [0, 1, 0, 0]?
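For reference, in the Generalized Focal Loss paper that introduces QFL, the target is the soft quality score (e.g. the IoU between the predicted box and its ground truth), not a one-hot vector. A minimal sketch of that formulation, which is not necessarily this repo's exact implementation:

    import torch.nn.functional as F

    def quality_focal_loss(logits, quality_targets, beta=2.0):
        # QFL(p) = -|y - p|^beta * ((1 - y) * log(1 - p) + y * log(p)),
        # where y in [0, 1] is a soft quality score such as the IoU.
        p = logits.sigmoid()
        # BCE with soft targets supplies the ((1-y)log(1-p) + y log(p)) term.
        ce = F.binary_cross_entropy_with_logits(logits, quality_targets,
                                                reduction='none')
        return ((quality_targets - p).abs().pow(beta) * ce).sum()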

TensorRT error on second and subsequent execute_async calls

Hello,

I'm using a small modification of predict_tensorrt.py from the WIDERFACE example (on my own data), which works fine on my x64 Linux PC. However, I'm having an issue on Jetson Xavier. After the first execute_async call, I get the error below on subsequent calls:

python predict_tensorrt.py
Engine info:
	max batch size:  1
	device memory_size:  2417152
before memcpy_htod_async
after memcpy_htod_async
before memcpy_dtod_async
after memcpy_dtod_async
after stream.synchronize()
before memcpy_htod_async
after memcpy_htod_async
[TensorRT] ERROR: 1: [reformat.cu::NCHWToNCQHW4::826] Error Code 1: Cuda Runtime (invalid resource handle)
before memcpy_dtod_async
after memcpy_dtod_async
after stream.synchronize()
before memcpy_htod_async
after memcpy_htod_async
[TensorRT] ERROR: 1: [reformat.cu::NCHWToNCQHW4::826] Error Code 1: Cuda Runtime (invalid resource handle)
before memcpy_dtod_async
after memcpy_dtod_async
after stream.synchronize()

Code from lfd/model/lfd.py

        print("before memcpy_htod_async")
        [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in input_buffers]
        print("after memcpy_htod_async")
        tensorrt_engine_context.execute_async(batch_size=1, bindings=bindings, stream_handle=stream.handle)
        print("before memcpy_dtod_async")
        [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in output_buffers]
        print("after memcpy_dtod_async")
        stream.synchronize()
        print("after stream.synchronize()")

Any insight helpful.

Thanks,
Marc

Errors when running predict.py

Win10-x64, PyCharm
Errors at line 455 of turbojpeg.py:

lib_path = os.path.join(os.path.dirname(__file__), 'libs')
libturtojpeg_path_list = [os.path.join(lib_path, file_name) for file_name in os.listdir(lib_path) if file_name.lower().startswith('libturbojpeg')]

# multiple paths may be found; choose the one with the longest name
target_libturbojpeg_path = max(libturtojpeg_path_list)
turbojpeg = TurboJPEG(lib_path=target_libturbojpeg_path)

error message:

target_libturbojpeg_path = max(libturtojpeg_path_list)
ValueError: max() arg is an empty sequence

It seems some files are missing?

[BUG] RandomBBoxCropRegionSampler cannot handle image sizes smaller than 512

/lfd/data_pipeline/sampler/region_sampler.py", line 300, in crop_from_image
    image[max(0, crop_y):min(im_h, crop_h + crop_y), max(0, crop_x):min(im_w, crop_w + crop_x)]
ValueError: could not broadcast input array from shape (0,400,3) into shape (191,400,3)


The default behavior seems to use (crop_size, crop_size) as the target size; if I set crop_size to 512, I eventually get this error (here I set it to 400).

Is this a bug?
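For context, the error suggests the sampled crop window partially falls outside the image, so the destination and source slices disagree in shape. A hypothetical padded-crop helper that avoids this (safe_crop is illustrative, not the repo's code):

    import numpy as np

    def safe_crop(image, crop_x, crop_y, crop_w, crop_h, pad_value=0):
        # Fill out-of-image regions with pad_value instead of slicing past borders.
        im_h, im_w = image.shape[:2]
        out = np.full((crop_h, crop_w) + image.shape[2:], pad_value, dtype=image.dtype)
        x0, y0 = max(0, crop_x), max(0, crop_y)
        x1, y1 = min(im_w, crop_x + crop_w), min(im_h, crop_y + crop_h)
        if x1 > x0 and y1 > y0:
            out[y0 - crop_y:y1 - crop_y, x0 - crop_x:x1 - crop_x] = image[y0:y1, x0:x1]
        return out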

Are gen_neg images important?

How much does this part contribute to overall model performance? If we don't generate negative images, how many points of precision and recall will drop?

Build problems on an RTX 3090 with CUDA 11.0

Hello, my GPU is an RTX 3090 and I use CUDA 11.0. When I run python3 setup.py build_ext, I hit this problem:

/home/dddzz/worksapce/Codes/FaceDetection2/LFD-A-Light-and-Fast-Detector-master/venv/lib/python3.8/site-packages/torch/cuda/init.py:104: UserWarning:

GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.

The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75.

If you want to use the GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))

:/usr/local/cuda-11.0/bin/nvcc -DWITH_CUDA -I/home/dddzz/worksapce/Codes/FaceDetection2/LFD-A-Light-and-Fast-Detector-master/venv/lib/python3.8/site-packages/torch/include -I/home/dddzz/worksapce/Codes/FaceDetection2/LFD-A-Light-and-Fast-Detector-master/venv/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/dddzz/worksapce/Codes/FaceDetection2/LFD-A-Light-and-Fast-Detector-master/venv/lib/python3.8/site-packages/torch/include/TH -I/home/dddzz/worksapce/Codes/FaceDetection2/LFD-A-Light-and-Fast-Detector-master/venv/lib/python3.8/site-packages/torch/include/THC -I:/usr/local/cuda-11.0/include -I/home/dddzz/worksapce/Codes/FaceDetection2/LFD-A-Light-and-Fast-Detector-master/venv/include -I/usr/include/python3.8 -c lfd/model/utils/build/nms/src/cuda/nms_kernel.cu -o build/temp.linux-x86_64-3.8/lfd/model/utils/build/nms/src/cuda/nms_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=nms_ext -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=sm_86 -std=c++14

unable to execute ':/usr/local/cuda-11.0/bin/nvcc': No such file or directory

error: command ':/usr/local/cuda-11.0/bin/nvcc' failed with exit status 1

Which parts of the files should I modify to match my hardware? Thank you very much!

Why does TRT mode consume more GPU memory?

Thanks for this awesome work! I want to deploy it in practice, but I find that TensorRT mode consumes more GPU memory. The GPU referred to above is a Tesla T4. I'd appreciate a prompt reply.

Evaluation accuracy problem

Hello, I directly ran the WIDERFACE_LFD_XS.py script; training for 1000 epochs took about a day and a half.
Then I ran the evaluation.py script to get the test results.

Evaluating with the scripts under the widerface_evaluate folder provided by Retinaface.pytorch, the results are only:
Easy Val AP: 0.5808434018537905
Medium Val AP: 0.6870650160504126
Hard Val AP: 0.7353618031863371

These results differ considerably from the ones you published. After running the official MATLAB script, I found that it only drew 3 figures, and they contained no curve or numeric metric for LFD.

What should I do? I hope to get your help, thank you very much!

WIDERFACE_train predict error

lib_path: F:\Desktop\LFD-A-Light-and-Fast-Detector-master\lfd\data_pipeline\dataset\utils\libs
libturtojpeg_path_list: []
Traceback (most recent call last):
  File "F:/Desktop/LFD-A-Light-and-Fast-Detector-master/WIDERFACE_train/predict.py", line 4, in <module>
    from lfd.data_pipeline.augmentation import *
  File "F:\Desktop\LFD-A-Light-and-Fast-Detector-master\lfd\data_pipeline\__init__.py", line 4, in <module>
    from .data_loader import *
  File "F:\Desktop\LFD-A-Light-and-Fast-Detector-master\lfd\data_pipeline\data_loader\__init__.py", line 3, in <module>
    from .data_loader import DataLoader
  File "F:\Desktop\LFD-A-Light-and-Fast-Detector-master\lfd\data_pipeline\data_loader\data_loader.py", line 8, in <module>
    from ..dataset import turbojpeg, reserved_keys
  File "F:\Desktop\LFD-A-Light-and-Fast-Detector-master\lfd\data_pipeline\dataset\__init__.py", line 7, in <module>
    from .utils import *
  File "F:\Desktop\LFD-A-Light-and-Fast-Detector-master\lfd\data_pipeline\dataset\utils\__init__.py", line 3, in <module>
    from .turbojpeg import turbojpeg
  File "F:\Desktop\LFD-A-Light-and-Fast-Detector-master\lfd\data_pipeline\dataset\utils\turbojpeg.py", line 457, in <module>
    target_libturbojpeg_path = max(libturtojpeg_path_list)
ValueError: max() arg is an empty sequence

Process finished with exit code 1

Code runs fine but getting a 'Segmentation fault' when exiting the predict program

All the predictions run fine, but when predict.py terminates it generates a 'Segmentation fault' error message.

Is the program supposed to call some proper termination routine so that PyTorch shuts down cleanly and in order, e.g. freeing up memory?
Everything seems to work, and memory is ultimately released (I can see that), but it seems to be reclaimed in a brutal way by the OS.

OCR with LFD

Hi @YonghaoHe, thanks for your work!
I'm trying to train a character detection model using LFD, with the traffic sign detector as an example. Which parameters and configs do you suggest changing to get better results?

Enhance predict.py flexibility and functionalities

hi

thanks for your great project, really cool.

I just got it to work, and I was thinking it deserves slightly more flexible inference code, e.g. adding command-line parameters rather than modifying the .py file directly to change the model used, or adding batch image-file processing, webcam support, tweaking threshold/NMS values, etc.

I'll add some of those in the next few days, but I am not a good programmer, so just FYI. Happy to share stuff though if that is of any help.

I am on a Jetson Xavier NX btw; I'll share some performance stats for reference once I get a chance to measure them.

Some notes on the installation for Jetson users:
Note 1: I successfully installed and ran the code with pycuda 2019.1.2.
Note 2: watch out for the installation of albumentations >= 0.4.6 with pip3 install albumentations; this package somehow forced an update of the opencv and numpy modules to recent versions which were incompatible with JetPack. So I had to manually uninstall the installed python-opencv and numpy packages and re-install the correct ones! Not the end of the world, but definitely a killer for someone not used to these kinds of hurdles.
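In the spirit of this suggestion, a hypothetical argparse wrapper around the prediction call might look as follows (flag names are illustrative, not an existing CLI; config_dict and the pipeline come from the config script as in predict.py):

    import argparse
    import cv2

    parser = argparse.ArgumentParser(description='LFD single-image prediction')
    parser.add_argument('image_path')
    parser.add_argument('--classification-threshold', type=float, default=0.5)
    parser.add_argument('--nms-threshold', type=float, default=0.3)
    args = parser.parse_args()

    image = cv2.imread(args.image_path)
    results = config_dict['model'].predict_for_single_image(
        image,
        aug_pipeline=simple_widerface_val_pipeline,
        classification_threshold=args.classification_threshold,
        nms_threshold=args.nms_threshold)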

INT8 calibrator

I have made the following changes in the file lfd/deployment/tensorrt/build_engine.py:

#assert int8_calibrator is not None, 'calibrator is not provided!'

if precision_mode == 'int8':
    config.int8_calibrator = INT8Calibrator(data='',cache_file='int8_calibrator_cache')#int8_calibrator

I don't understand the data parameter. Can you please help with how to pass images via this data attribute?
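The repo's INT8Calibrator signature is not documented here, but in the standard TensorRT Python pattern calibration data is fed batch by batch through an IInt8EntropyCalibrator2 subclass, so data presumably points at (or contains) preprocessed input batches. A generic, hedged sketch of that pattern:

    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    import pycuda.driver as cuda
    import tensorrt as trt

    class FolderCalibrator(trt.IInt8EntropyCalibrator2):
        # Hypothetical calibrator, not the repo's class: `batches` is a list of
        # preprocessed float32 NCHW arrays shaped like the network input.
        def __init__(self, batches, cache_file):
            super().__init__()
            self.batches = batches
            self.index = 0
            self.cache_file = cache_file
            self.device_input = cuda.mem_alloc(batches[0].nbytes)

        def get_batch_size(self):
            return self.batches[0].shape[0]

        def get_batch(self, names):
            if self.index >= len(self.batches):
                return None  # tells TensorRT the calibration data is exhausted
            cuda.memcpy_htod(self.device_input,
                             np.ascontiguousarray(self.batches[self.index]))
            self.index += 1
            return [int(self.device_input)]

        def read_calibration_cache(self):
            try:
                with open(self.cache_file, 'rb') as f:
                    return f.read()
            except FileNotFoundError:
                return None

        def write_calibration_cache(self, cache):
            with open(self.cache_file, 'wb') as f:
                f.write(cache)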

Image normalization can be skipped to improve performance

During inference, the Normalize preprocessing step consumes a lot of CPU; TensorRT inference then runs at only about 0.5x the speed estimated by timing_inference_latency.
When the network contains BN, the preprocessing Normalize can be omitted, i.e. remove Normalize at both training and inference time. The speedup is clearly visible, and precision/recall barely change.
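An alternative that keeps the trained weights unchanged is to fold the per-channel mean/std into the first convolution at export time, since conv((x - mean) / std) equals a convolution with rescaled weights and a shifted bias. A minimal sketch (fold_normalization_into_conv is a hypothetical helper, not part of the repo):

    import torch

    @torch.no_grad()
    def fold_normalization_into_conv(conv, mean, std):
        # After folding, conv(x) on raw images equals the original
        # conv((x - mean) / std): W' = W / std per input channel,
        # b' = b - sum(W * mean / std).
        mean = torch.as_tensor(mean, dtype=conv.weight.dtype).view(1, -1, 1, 1)
        std = torch.as_tensor(std, dtype=conv.weight.dtype).view(1, -1, 1, 1)
        if conv.bias is None:
            conv.bias = torch.nn.Parameter(torch.zeros(conv.out_channels))
        conv.bias -= (conv.weight * (mean / std)).sum(dim=(1, 2, 3))
        conv.weight /= std
        return conv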

Could you release a CPU-based inference version?

As is known to all, CUDA is well suited to training. However, it is not always practical for inference, since many edge devices have no GPU/CUDA hardware, such as my Surface.
In your README, you say you wish to make LFD as popular as the YOLO series. So I suggest you release a CPU-based (or even ARM-based) inference version for edge devices, just like libfacedetection.
