
attentionpipeline's Introduction

Fast and accurate object detection in high resolution 4K and 8K video using GPUs [arXiv:1810.10551] [media: 1, 2] [video]

Internship project at CMU, 2017-2018

Illustration image

Video: https://www.youtube.com/watch?v=07wCxSItnAk

The pipeline works with high-resolution videos and locates objects of interest in them. A first pass runs on a low-resolution version of each frame as a fast attention-selection check; a second pass then evaluates the selected areas more thoroughly at full resolution.
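
The two-pass idea can be sketched in a few lines of Python. This is a minimal illustration only, not the project's actual code: detect_objects stands in for any detector callable, and the downscale/padding values are arbitrary placeholders.

    # Minimal sketch of the two-pass attention idea (hypothetical helper
    # names; not the project's actual API).
    from PIL import Image

    def two_pass_detect(frame_path, detect_objects, downscale=4, pad=32):
        """Pass 1 selects regions on a downscaled frame; pass 2 re-checks
        those regions at full resolution."""
        full = Image.open(frame_path)
        w, h = full.size

        # Pass 1: fast attention selection on a low-resolution copy.
        low = full.resize((w // downscale, h // downscale))
        coarse = detect_objects(low)  # [(x, y, bw, bh), ...] in low-res coords

        # Pass 2: thorough check on the selected full-resolution crops.
        final = []
        for (x, y, bw, bh) in coarse:
            box = (max(0, x * downscale - pad),
                   max(0, y * downscale - pad),
                   min(w, (x + bw) * downscale + pad),
                   min(h, (y + bh) * downscale + pad))
            for det in detect_objects(full.crop(box)):
                final.append((det, box))  # detection plus its crop offset
        return final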

Instructions

Installation

Sample workstation:

  • Ubuntu 16.04.3
  • CUDA 8.0 (cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb)
  • CUDNN 5.1 (cudnn-8.0-linux-x64-v5.1.tgz).

Anaconda3 was used with these modules (newer versions will likely work as well):

  • matplotlib 2.0.0
  • tensorflow-gpu 1.0.1
  • Theano 0.9.0
  • Keras 2.0.3
  • hdf5 1.8.17
  • pillow 4.1.0
  • additionally, depending on the version - either YAD2K or darkflow (see instructions below)

Tip: to install on a server with restricted access, use python setup.py install --user.

Install

  • python 3.6.1, tensorflow with gpu and cuda support, keras (see list above)

  • version 1: YAD2K, python YOLO v2 implementation: https://github.com/allanzelener/YAD2K (commit hash a42c760ef868bc115e596b56863dc25624d2e756)

    • put the files from "__to-be-put-with-YAD2K" into the YAD2K folder
    • make sure the path to the YAD2K folder is correct in "yolo_handler.py" on the line yolo_paths = ["/home/<whatever>/YAD2K/","<more possible paths>"]
  • version 2: darkflow, another tensorflow YOLO v2 implementation, worked better with server deployment: https://github.com/thtrieu/darkflow

    • See this gist to test out darkflow on its own
  • prepare data (see the ffmpeg commands below) so it follows this hierarchy (a small helper sketch follows the list):

    • VideoName (whatever name, for example PL_Pizza sample)
      • input
        • frames (whatever name again; for example, separate folders for different fps)
          • 0001.jpg
          • 0002.jpg
          • ...
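
For reference, this layout can be prepared with a couple of lines of Python; the names below are just the example names from the list above.

    # Create the expected input hierarchy for one video (example names).
    import os

    video_root = "PL_Pizza sample"                            # any video name
    frames_dir = os.path.join(video_root, "input", "frames")  # any frames-folder name
    os.makedirs(frames_dir, exist_ok=True)
    # Extracted frames then go into frames_dir as 0001.jpg, 0002.jpg, ...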

Data preparation

The pipeline works with high-resolution videos, or more precisely with their individual frames saved into the input folder. The resulting annotated output frames can be converted back into a video.

[Video to frames] Extract 30 images per second (30 fps, can be changed), named frames/0001.jpg, frames/0002.jpg, ...

  • ffmpeg -i VIDEO.mp4 -qscale:v 5 -vf fps=30 frames/%04d.jpg

[Frames to video] Keep the same framerate

  • ffmpeg -r 30/1 -pattern_type glob -i 'frames/*.jpg' -c:v libx264 -vf fps=30 -pix_fmt yuv420p out_30fps.mp4
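
If you prefer to drive these conversions from Python, thin wrappers around the same two ffmpeg calls could look like this (a sketch; it assumes ffmpeg is on the PATH).

    # Thin wrappers around the ffmpeg commands above (ffmpeg must be on PATH).
    import subprocess

    def video_to_frames(video, frames_dir="frames", fps=30):
        subprocess.run(["ffmpeg", "-i", video, "-qscale:v", "5",
                        "-vf", f"fps={fps}", f"{frames_dir}/%04d.jpg"],
                       check=True)

    def frames_to_video(frames_dir="frames", out="out_30fps.mp4", fps=30):
        # ffmpeg expands the glob itself thanks to -pattern_type glob.
        subprocess.run(["ffmpeg", "-r", str(fps), "-pattern_type", "glob",
                        "-i", f"{frames_dir}/*.jpg", "-c:v", "libx264",
                        "-vf", f"fps={fps}", "-pix_fmt", "yuv420p", out],
                       check=True)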

Running v1

  • Go through proper installation of everything required.
  • cd /<path to project>/video_parser_v1
  • python run_fast_sketch.py -horizontal_splits 2 -attention_horizontal_splits 1 -input "/<custom path>/PL_Pizza sample/input/frames/" -name "_ExampleRunNameHere"
  • See the results in /<custom path>/PL_Pizza sample/output_ExampleRunNameHere

Running v2

  • Go through proper installation of everything required.
  • cd /<path to project>/video_parser_v2
  • python run_serverside.py -horizontal_splits 2 -atthorizontal_splits 1 -input "/<custom path>/PL_Pizza sample/input/frames/" -name "_ExampleRunNameHere"
  • See the results in /<custom path>/__Renders/<_ExampleRunNameHere>/

To add support for additional server workers:

Client - server illustration

  • (optionally) prepare server workers with either this setup:

    • run python Server.py (useful: CUDA_VISIBLE_DEVICES=0 python Server.py) on several server nodes; each of these binds port :8123
    • connect via an ssh tunnel from the client; use python ssh_server_connect.py as guidance on which calls you need to run. It will be something like ssh -N -f -L 9000:<server_name>:8123 <user>@<server_name>.pvt.bridges.psc.edu
    • (the main client code will try to connect to ports 9000 .. 9099 locally - through this tunnel it reaches the server workers on their individual servers on port :8123)
  • (optionally) alternatively, run on one server with multiple GPUs with this setup:

    • CUDA_VISIBLE_DEVICES=0 python Server_gpuN.py 1
    • CUDA_VISIBLE_DEVICES=1 python Server_gpuN.py 2
    • etc. CUDA_VISIBLE_DEVICES=<id-1> python Server_gpuN.py <id>
    • Each one will again listen on a local port, this time :8000 + id*11 (where id is 1, 2, ...)
    • Finally run the client on yet another GPU; it will try to locally connect to all ports between 8000 ... 8099
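
To make the port scheme concrete, here is a minimal sketch of how a client could probe the expected local ports. The port arithmetic mirrors the description above; the probing code itself is illustrative, not the project's actual client logic.

    # Sketch: probe local ports where workers may be listening.
    import socket

    def worker_port(worker_id):
        # A Server_gpuN.py instance with a given id listens on :8000 + id*11.
        return 8000 + worker_id * 11

    def find_workers(ports):
        live = []
        for port in ports:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                s.settimeout(0.2)
                if s.connect_ex(("127.0.0.1", port)) == 0:  # 0 = port open
                    live.append(port)
        return live

    # Multi-GPU setup: ids 1, 2, ... map to ports 8011, 8022, ...
    print(find_workers(range(8000, 8100)))
    # ssh-tunnel setup: workers appear on local ports 9000 .. 9099
    print(find_workers(range(9000, 9100)))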

(optional) Annotation

When the Python code is run with -annotategt 'True', the model checks which frames have ground-truth annotations accompanying them (a VOC-style .xml file next to the .jpg). For these frames it then saves results into the output folder (into the files annotbboxes.txt and annotnames.txt).
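
For orientation, bounding boxes in a VOC-style .xml file can be read with the standard library alone; this is a generic sketch of the format, not the project's own loader.

    # Generic reader for VOC-style annotation .xml files (sketch only).
    import xml.etree.ElementTree as ET

    def read_voc_boxes(xml_path):
        boxes = []
        root = ET.parse(xml_path).getroot()
        for obj in root.iter("object"):
            name = obj.findtext("name")
            bb = obj.find("bndbox")
            boxes.append((name,
                          int(float(bb.findtext("xmin"))),
                          int(float(bb.findtext("ymin"))),
                          int(float(bb.findtext("xmax"))),
                          int(float(bb.findtext("ymax")))))
        return boxes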

The visualization tool can then be run (with the paths to the input and output folders set correctly). For example, set these paths in the file "visualize_gt_prediction.py":

gt_path_folder = "/<path>/intership_project/_side_projects/annotation_conversion/annotated examples/input/auto_annot/"
output_model_predictions_folder = "/<path>/intership_project/_side_projects/annotation_conversion/annotated examples/output_annotation_results/"

As a result we should see something like the image in _side_projects/annotation_conversion/annotated examples/example_visualization_of_ap_measurement.jpg.

  • hand annotation is possible with labelImg: https://github.com/tzutalin/labelImg (install with pip install labelImg)
  • automatic annotation of the "PNNL ParkingLot Pizza" dataset is done with _side_projects/annotation_conversion/convert_parking_lot_to_voc.py

(optional) Time profiling

I used kernprof from https://github.com/rkern/line_profiler#kernprof. Follow the installation instructions mentioned there (pip install line_profiler).

  • Put @profile before each function you want to profile
  • Run kernprof -l -v run_fast_sketch.py
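
A minimal example of the decoration step (slow_sum is just a placeholder function):

    # kernprof injects the @profile decorator at run time, so this script is
    # meant to be launched via "kernprof -l -v script.py", not plain python.
    @profile
    def slow_sum(n):
        total = 0
        for i in range(n):
            total += i * i
        return total

    if __name__ == "__main__":
        slow_sum(10_000_000)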

attentionpipeline's People

Contributors

previtus


attentionpipeline's Issues

Missing files

Hello,
I hit a problem when running your code:
python run_fast_sketch.py -horizontal_splits 2 -atthorizontal_splits 1 -name "_ExampleRunNameHere"
import yad2k, eval_yolo, eval_yolo_direct_images
ModuleNotFoundError: No module named 'eval_yolo'

Could you provide the eval_yolo and eval_yolo_direct_images .py files?
I am not sure why these files are missing; I downloaded YOLOv2 from https://github.com/allanzelener/YAD2K and inference works there.

A question about the PEViD-UHD dataset

Thanks for your research!

I want to use the PEViD-UHD dataset to run object detection on 4K-8K images.
But the video I downloaded from the FTP server at "tremplin.epfl.ch/PEViD-UHD/Original_UHD_MP4/" is 30 fps with about 1060 frames in total.
Strangely, the annotation files (e.g. Exchanging_bags_day_indoor_1_4K.xgtf) only cover 390 frames, so the frame counts differ.

Have I downloaded the wrong files?

I would really appreciate it if you could tell me what to do.

Two attached images showed the differing frame counts.

Regarding Dataset

Hey, I am trying to fetch the PEViD-UHD dataset, but the server is not responding. Do you have a mirror of the dataset, or can you help me get it?

Furthermore, can you show how you arranged your code and dataset (a directory snapshot) when you worked on this project?

I am also facing problems running the code with newer versions of the libraries. Can you share points to keep in mind when implementing with the latest versions?
