
aten's Introduction

Adaptive Temporal Encoding Network for Video Instance-level Human Parsing

By Qixian Zhou, Xiaodan Liang, Ke Gong, and Liang Lin (ACM MM 2018). A video demo is available.

Requirements

Python 3, TensorFlow 1.3+, Keras 2.0.8+

Dataset

The model is trained and evaluated on our proposed VIP dataset for video instance-level human parsing; please see the dataset page for more details. The VIP dataset contains 404 video sequences: 304 for training, 50 for validation, and 50 for testing. For every 25 consecutive frames in each video, one frame is densely annotated with pixel-wise semantic part categories and instance-level identification. We release the source videos, the frames, and the fine annotations for the training and validation sets. You can evaluate your model on the validation set with our released evaluation code.
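As a rough illustration of that one-in-25 annotation rate, the sketch below picks one frame out of every 25 from an ordered frame directory. This is illustrative only: frame naming and ordering are assumptions, and the released list files should be treated as authoritative.

# Illustrative sketch: pick one candidate frame per 25 consecutive frames of a sequence.
# Assumes lexicographic file order matches temporal order; the released lists are authoritative.
import os

def annotated_candidates(sequence_dir, step=25):
    frames = sorted(os.listdir(sequence_dir))
    return frames[::step]

# Hypothetical usage:
# print(annotated_candidates('VIP/Images/videos1'))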

The VIP dataset is available on both OneDrive and Baidu drive.

The Baidu drive share link is:

link: https://pan.baidu.com/s/1rt9wmRf6o8HoBzj7EscyeQ

pwd: cpbt

Models

Models are released on OneDrive and Baidu drive:

  • Parsing-RCNN (frame-level) weights (parsing_rcnn.h5).

  • ATEN (p=2, l=3) weights (aten_p2l3.h5).

Installation

  1. Clone this repository
  2. Install the bundled Keras with ConvGRU2D support:
cd keras_convGRU
python setup.py install
  3. Compile the flow_warp ops (optional). A prebuilt flow_warp.so is included (generated on Ubuntu 14.04, gcc 4.8.4, Python 3.6, TF 1.4). To compile the flow_warp ops yourself, execute:
cd ops
make
  4. Dataset setup. Download the VIP dataset (both VIP_Fine and VIP_Sequence) and decompress it. The directory structure of VIP should be as follows:

VIP
----Images
--------videos1
--------...
--------videos404
----adjacent_frames
--------videos1
--------...
--------videos404
----front_frame_list
----Category_ids
----Human_ids
----Instance_ids
----lists
........

  5. Model setup. Download the released weights and place them in the models folder. (A small sanity-check sketch for steps 3-5 follows this list.)
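After completing the steps above, a quick sanity check can catch missing files early. This is a sketch only; the VIP root, the models folder, and the ops/flow_warp.so path are assumptions based on the layout described in this README.

# Sanity-check sketch (paths are assumptions based on the layout described above).
import os
import tensorflow as tf

VIP_ROOT = 'VIP'       # adjust to where the dataset was decompressed
MODEL_DIR = 'models'   # folder holding the released weights

for d in ['Images', 'adjacent_frames', 'front_frame_list',
          'Category_ids', 'Human_ids', 'Instance_ids', 'lists']:
    print(d, 'found' if os.path.isdir(os.path.join(VIP_ROOT, d)) else 'MISSING')

for w in ['parsing_rcnn.h5', 'aten_p2l3.h5']:
    print(w, 'found' if os.path.isfile(os.path.join(MODEL_DIR, w)) else 'MISSING')

# Optional: check that the compiled flow_warp op loads (path assumed from `cd ops && make`).
try:
    tf.load_op_library(os.path.join('ops', 'flow_warp.so'))
    print('flow_warp.so loaded')
except Exception as e:
    print('flow_warp.so not loaded:', e)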

Training

# ATEN training on VIP
python scripts/vip/train_aten.py

# Parsing-RCNN(frame-level) training on VIP
python scripts/vip/train_parsingrcnn.py

Inference

# ATEN inference on VIP
python scripts/vip/test_aten.py

# Parsing-RCNN(frame-level) inference on VIP
python scripts/vip/test_parsingrcnn.py

The results are stored in ./vis.
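A minimal way to inspect what was written (a sketch; it only assumes results land under ./vis as stated above):

# List inference outputs under ./vis (sketch only).
import os

for root, _, files in os.walk('./vis'):
    for name in sorted(files):
        print(os.path.join(root, name))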

Evaluate

  1. Modify the paths in evaluate/*.py.
  2. Run the commands below to evaluate the results generated by visualize.py (a minimal metric sketch follows after these commands).
# for human parsing
python evaluate/test_parsing.py

# for instance segmentation
python evaluate/test_ap.py

# for instance-level human parsing
python evaluate/test_inst_part_ap.py
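For reference, per-frame human parsing quality is typically reported as mean IoU over the part categories. The snippet below is a minimal, self-contained sketch of that metric, not the repository's evaluation code; use the scripts above for official numbers.

# Minimal mean-IoU sketch over predicted/ground-truth label maps (illustrative only).
import numpy as np

def mean_iou(pred, gt, num_classes):
    """pred, gt: integer label maps of identical shape; labels in [0, num_classes)."""
    ious = []
    for c in range(num_classes):
        p, g = (pred == c), (gt == c)
        union = np.logical_or(p, g).sum()
        if union == 0:          # category absent from both maps; skip it
            continue
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else 0.0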

Reference

@inproceedings{zhou2018,
    Author = {Qixian Zhou and Xiaodan Liang and Ke Gong and Liang Lin},
    Title = {Adaptive Temporal Encoding Network for Video Instance-level Human Parsing},
    Booktitle = {Proc. of ACM International Conference on Multimedia (ACM MM)},
    Year = {2018}
} 

Acknowledgements

This code builds on the following open-source projects on GitHub:

  1. matterport/Mask_RCNN(https://github.com/matterport/Mask_RCNN), an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow.
  2. KingMV/ConvGRU(https://github.com/KingMV/ConvGRU), an implementation of ConvGRU2D on Keras.


aten's Issues

VIP Dataset on Google Drive

Any chance you can upload the VIP dataset to Google Drive? I'm unable to download it from Baidu drive here in the states.

Thanks!

Dataset download link does not work

Dear Authors,
Thank you very much for your contribution. Your OneDrive link is not working, and I cannot download the dataset from the Baidu link either. Could you please share a working dataset link with me? Thanks in advance.

About FPS and dataset annotation toolboxes

Hi, thanks for sharing your work. I want to know whether this model can run prediction online, and also which annotation toolboxes you used for the human parsing dataset.

flow_warp.so make error

ERROR:

/ATEN/ops> make
Makefile:5: /usr/local/lib/python3.6/site-packages/tensorflow/include
nvcc -std=c++11 -c --expt-relaxed-constexpr --gpu-architecture=sm_61
-o ./build/flow_warp_gpu.o flow_warp/flow_warp.cu.cc
-I /usr/local/lib/python3.6/site-packages/tensorflow/include -I/usr/local/lib/python3.6/site-packages/tensorflow/include/external/nsync/public -I/usr/local/cuda/include -I/usr/local -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
/usr/local/lib/python3.6/site-packages/tensorflow/include/absl/strings/string_view.h(501): error: constexpr function return is non-constant

/usr/local/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(55): warning: integer conversion resulted in a change of sign

/usr/local/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(309): warning: integer conversion resulted in a change of sign

/usr/local/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(310): warning: integer conversion resulted in a change of sign

/usr/local/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h(651): warning: missing return statement at end of non-void function "Eigen::internal::igammac_cf_impl<Scalar, mode>::run [with Scalar=float, mode=Eigen::internal::VALUE]"
detected during:
instantiation of "Scalar Eigen::internal::igammac_cf_impl<Scalar, mode>::run(Scalar, Scalar) [with Scalar=float, mode=Eigen::internal::VALUE]"
(855): here
instantiation of "Scalar Eigen::internal::igamma_generic_impl<Scalar, mode>::run(Scalar, Scalar) [with Scalar=float, mode=Eigen::internal::VALUE]"
(2096): here
instantiation of "Eigen::internal::igamma_retval<Eigen::internal::global_math_functions_filtering_base<Scalar, void>::type>::type Eigen::numext::igamma(const Scalar &, const Scalar &) [with Scalar=float]"
/usr/local/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsHalf.h(34): here

/usr/local/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h(712): warning: missing return statement at end of non-void function "Eigen::internal::igamma_series_impl<Scalar, mode>::run [with Scalar=float, mode=Eigen::internal::VALUE]"
detected during:
instantiation of "Scalar Eigen::internal::igamma_series_impl<Scalar, mode>::run(Scalar, Scalar) [with Scalar=float, mode=Eigen::internal::VALUE]"
(863): here
instantiation of "Scalar Eigen::internal::igamma_generic_impl<Scalar, mode>::run(Scalar, Scalar) [with Scalar=float, mode=Eigen::internal::VALUE]"
(2096): here
instantiation of "Eigen::internal::igamma_retval<Eigen::internal::global_math_functions_filtering_base<Scalar, void>::type>::type Eigen::numext::igamma(const Scalar &, const Scalar &) [with Scalar=float]"
/usr/local/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsHalf.h(34): here
......
instantiation of "Eigen::internal::gamma_sample_der_alpha_retval<Eigen::internal::global_math_functions_filtering_base<Scalar, void>::type>::type Eigen::numext::gamma_sample_der_alpha(const Scalar &, const Scalar &) [with Scalar=double]"
/usr/local/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/CUDA/CudaSpecialFunctions.h(154): here

1 error detected in the compilation of "/tmp/tmpxft_00004c83_00000000-6_flow_warp.cu.cpp1.ii".
make: *** [flow_warp_gpu.o] Error 1


System information:

Python 3.6
TensorFlow 1.13
Keras 2.2.4
CUDA 9.2
cuDNN 9.2
nvcc -V:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148

Attempts to solve the problem:

1. In the Makefile, changed --gpu-architecture=sm_52 to --gpu-architecture=sm_61; this did not solve it.
2. Following tensorflow/tensorflow#22766, added -DNDEBUG; after that, the error output became:
/ATEN/ops> make
Makefile:5: /usr/local/lib/python3.6/site-packages/tensorflow/include
nvcc -std=c++11 -c --expt-relaxed-constexpr --gpu-architecture=sm_61
-o ./build/flow_warp_gpu.o flow_warp/flow_warp.cu.cc
-I /usr/local/lib/python3.6/site-packages/tensorflow/include -I/usr/local/lib/python3.6/site-packages/tensorflow/include/external/nsync/public -I/usr/local/cuda/include -I/usr/local -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -DNDEBUG
/usr/local/lib/python3.6/site-packages/tensorflow/include/absl/strings/string_view.h(501): warning: expression has no effect

/usr/local/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(55): warning: integer conversion resulted in a change of sign

/usr/local/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(309): warning: integer conversion resulted in a change of sign

/usr/local/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(310): warning: integer conversion resulted in a change of sign

/usr/local/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h(651): warning: missing return statement at end of non-void function "Eigen::internal::igammac_cf_impl<Scalar, mode>::run [with Scalar=float, mode=Eigen::internal::VALUE]"
detected during:
instantiation of "Scalar Eigen::internal::igammac_cf_impl<Scalar, mode>::run(Scalar, Scalar) [with Scalar=float, mode=Eigen::internal::VALUE]"
(855): here
instantiation of "Scalar Eigen::internal::igamma_generic_impl<Scalar, mode>::run(Scalar, Scalar) [with Scalar=float, mode=Eigen::internal::VALUE]"
(2096): here
instantiation of "Eigen::internal::igamma_retval<Eigen::internal::global_math_functions_filtering_base<Scalar, void>::type>::type Eigen::numext::igamma(const Scalar &, const Scalar &) [with Scalar=float]"
/usr/local/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsHalf.h(34): here
......
Assembler messages:
Fatal error: can't create ./build/flow_warp_gpu.o: No such file or directory
make: *** [flow_warp_gpu.o] Error 1
How can I solve this problem? Thanks.

module 'keras.layers' has no attribute 'ConvGRU2D'

Hi, I have already installed the requirements as specified in the README and the compilation was successful, but I get the following error when running python scripts/vip/train_aten.py:

/data/APL/ATEN/aten_model.py(676)conv_gru_unit() x = KL.ConvGRU2D(filters=256, kernel_size=(3, 3), name="gru_recurrent_unit",
AttributeError: module 'keras.layers' has no attribute 'ConvGRU2D'
The output of dir(keras.layers) is:
['Activation', 'ActivityRegularization', 'Add', 'AlphaDropout', 'AtrousConv1D', 'AtrousConv2D', 'AtrousConvolution1D', 'AtrousConvolution2D', 'Average', 'AveragePooling1D', 'AveragePooling2D', 'AveragePooling3D', 'AvgPool1D', 'AvgPool2D', 'AvgPool3D', 'BatchNormalization', 'Bidirectional', 'Concatenate', 'Conv1D', 'Conv2D', 'Conv2DTranspose', 'Conv3D', 'Conv3DTranspose', 'ConvLSTM2D', 'ConvLSTM2DCell', 'ConvRNN2D', 'ConvRecurrent2D', 'Convolution1D', 'Convolution2D', 'Convolution2DTranspose', 'Convolution3D', 'Cropping1D', 'Cropping2D', 'Cropping3D', 'CuDNNGRU', 'CuDNNLSTM', 'Deconv2D', 'Deconv3D', 'Deconvolution2D', 'Deconvolution3D', 'Dense', 'DepthwiseConv2D', 'Dot', 'Dropout', 'ELU', 'Embedding', 'Flatten', 'GRU', 'GRUCell', 'GaussianDropout', 'GaussianNoise', 'GlobalAveragePooling1D', 'GlobalAveragePooling2D', 'GlobalAveragePooling3D', 'GlobalAvgPool1D', 'GlobalAvgPool2D', 'GlobalAvgPool3D', 'GlobalMaxPool1D', 'GlobalMaxPool2D', 'GlobalMaxPool3D', 'GlobalMaxPooling1D', 'GlobalMaxPooling2D', 'GlobalMaxPooling3D', 'Highway', 'Input', 'InputLayer', 'InputSpec', 'K', 'LSTM', 'LSTMCell', 'Lambda', 'Layer', 'LeakyReLU', 'LocallyConnected1D', 'LocallyConnected2D', 'Masking', 'MaxPool1D', 'MaxPool2D', 'MaxPool3D', 'MaxPooling1D', 'MaxPooling2D', 'MaxPooling3D', 'Maximum', 'MaxoutDense', 'Minimum', 'Multiply', 'PReLU', 'Permute', 'RNN', 'ReLU', 'Recurrent', 'RepeatVector', 'Reshape', 'SeparableConv1D', 'SeparableConv2D', 'SeparableConvolution1D', 'SeparableConvolution2D', 'SimpleRNN', 'SimpleRNNCell', 'Softmax', 'SpatialDropout1D', 'SpatialDropout2D', 'SpatialDropout3D', 'StackedRNNCells', 'Subtract', 'ThresholdedReLU', 'TimeDistributed', 'UpSampling1D', 'UpSampling2D', 'UpSampling3D', 'Wrapper', 'ZeroPadding1D', 'ZeroPadding2D', 'ZeroPadding3D', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'absolute_import', 'activations', 'add', 'advanced_activations', 'average', 'concatenate', 'constraints', 'conv_utils', 'convolutional', 'convolutional_recurrent', 'copy', 'core', 'cudnn_recurrent', 'deserialize', 'deserialize_keras_object', 'division', 'dot', 'embeddings', 'func_dump', 'func_load', 'has_arg', 'initializers', 'interfaces', 'local', 'maximum', 'merge', 'minimum', 'multiply', 'namedtuple', 'noise', 'normalization', 'np', 'object_list_uid', 'pooling', 'print_function', 'python_types', 'recurrent', 'regularizers', 'serialize', 'subtract', 'to_list', 'transpose_shape', 'warnings', 'wrappers']
I'm using the TF 1.10 Docker image.
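A quick diagnostic (not part of the repository) to confirm that the Keras build being imported is the custom keras_convGRU one that registers ConvGRU2D:

# Diagnostic sketch: verify the imported Keras is the custom build that registers ConvGRU2D.
import keras
import keras.layers as KL

print(keras.__version__)         # version of the Keras actually on sys.path
print(keras.__file__)            # should point at the keras_convGRU installation, not a stock Keras
print(hasattr(KL, 'ConvGRU2D'))  # must be True for the KL.ConvGRU2D call in aten_model.py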

How does this work compare to CIHP_PGN?

Hi, I'm looking at both your work and CIHP_PGN. The conclusions from both your papers are very similar.

CIHP_PGN Paper

In this paper, we presented a novel detection-free Part Grouping Network to investigate instance-level human parsing, which is a more pioneering and challenging work in analyzing human in the wild. To push the research boundary of human parsing to match real-world scenarios much better, we further introduce a new large-scale (...)
Experimental results on PASCAL-Person-Part [6] and our CIHP dataset demonstrate the superiority of our proposed approach, which surpasses previous methods for both semantic part segmentation and edge detection tasks, and achieves state-of-the-art performance for instance-level human parsing.

Your Paper

In this work, we investigate video instance-level human parsing that is a more pioneering and realistic task in analyzing human in the wild. To fill the blank of video human parsing data resources, we further introduce a large-scale (...)
Experimental results on DAVIS [36] and our VIP dataset demonstrate the superiority of our proposed approach, which achieves state-of-the-art performance on both video instance-level human parsing and video segmentation tasks.

I'm wondering - which produces better accuracy, this work or CIHP_PGN? Considering that both claim "more pioneering", "demonstrate the superiority of our proposed approach", and "achieve state-of-the-art", can you help explain the differences? I'm not clear which I should use. Thanks!
