
Scalable Neural Video Representations with Learnable Positional Features (NVP)

Official PyTorch implementation of "Scalable Neural Video Representations with Learnable Positional Features" (NeurIPS 2022) by Subin Kim*¹, Sihyun Yu*¹, Jaeho Lee², and Jinwoo Shin¹.

¹KAIST, ²POSTECH

TL;DR: We propose a novel neural representation for videos that achieves the best of both worlds: high-quality encoding and compute-/parameter-efficiency at the same time.

1. Requirements

Environments

Required packages are listed in environment.yaml. Also, you should install the following packages:

conda install pytorch torchvision cudatoolkit=11.3 -c pytorch

pip install git+https://github.com/subin-kim-cv/tiny-cuda-nn/#subdirectory=bindings/torch
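
For reference, the base environment itself can typically be created from the provided file before running the installs above. The environment name used here is an assumption; the actual name is defined inside environment.yaml:

conda env create -f environment.yaml
conda activate nvp   # replace "nvp" with the environment name defined in environment.yaml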

Dataset

Download the UVG-HD dataset from the following link:

Then, extract RGB sequences from the original YUV videos of UVG-HD using ffmpeg. Here, INPUT is the input file name, and OUTPUT is the directory in which to save the extracted RGB frames.

ffmpeg -f rawvideo -vcodec rawvideo -s 1920x1080 -r 120 -pix_fmt yuv420p -i INPUT.yuv OUTPUT/f%05d.png
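
As a convenience, a small loop of the following form can extract all seven UVG-HD sequences at once. The .yuv file names below are an assumption and should be adjusted to match the downloaded files:

for v in Beauty Bosphorus Honeybee Jockey ReadySetGo ShakeNDry Yachtride; do
    mkdir -p ~/data/$v
    # decode the raw 1080p YUV sequence into per-frame PNGs
    ffmpeg -f rawvideo -vcodec rawvideo -s 1920x1080 -r 120 -pix_fmt yuv420p \
        -i ${v}_1920x1080_120fps_420_8bit_YUV.yuv ~/data/$v/f%05d.png
done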

2. Training

Run the following script with a single GPU.

CUDA_VISIBLE_DEVICES=0 python experiment_scripts/train_video.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --dataset <DATASET> --num_frames <NUM_FRAMES> --config ./config/config_nvp_s.json 
  • Option --logging_root denotes the path where experiment logs are saved.
  • Option --experiment_name denotes the subdirectory under --logging_root where the log files (results, checkpoints, configuration, etc.) are saved.
  • Option --dataset denotes the path to the RGB sequences (e.g., ~/data/Jockey).
  • Option --num_frames denotes the number of frames to reconstruct (300 for the ShakeNDry video and 600 for the other videos in UVG-HD); see the example invocation below.
  • To reconstruct videos with 300 frames, please change the value of t_resolution in the configuration file to 300.
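
For instance, encoding the Jockey sequence (600 frames) might look like the following, where the experiment name and dataset path are placeholders:

CUDA_VISIBLE_DEVICES=0 python experiment_scripts/train_video.py \
    --logging_root ./logs_nvp \
    --experiment_name jockey_nvp_s \
    --dataset ~/data/Jockey \
    --num_frames 600 \
    --config ./config/config_nvp_s.json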

3. Evaluation

Evaluation without compression of parameters (i.e., quantization only).

CUDA_VISIBLE_DEVICES=0 python experiment_scripts/eval.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --dataset <DATASET> --num_frames <NUM_FRAMES> --config ./logs_nvp/<EXPERIMENT_NAME>/config_nvp_s.json   
  • Option --save denotes whether to save the reconstructed frames.
  • One can specify the option --s_interp for video super-resolution results; it denotes the super-resolution scale (e.g., 8). An example invocation is sketched below.
  • One can specify the option --t_interp for video frame interpolation results; it denotes the temporal interpolation scale (e.g., 8).
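
As a sketch, evaluating an 8x super-resolution result of a trained model might look like this (the experiment name, dataset path, and frame count are placeholders; replace --s_interp with --t_interp for frame interpolation):

CUDA_VISIBLE_DEVICES=0 python experiment_scripts/eval.py \
    --logging_root ./logs_nvp \
    --experiment_name jockey_nvp_s \
    --dataset ~/data/Jockey \
    --num_frames 600 \
    --config ./logs_nvp/jockey_nvp_s/config_nvp_s.json \
    --s_interp 8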

Evaluation with compression of parameters using well-known image and video codecs.

  1. Save the quantized parameters.

    CUDA_VISIBLE_DEVICES=0 python experiment_scripts/compression.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --config ./logs_nvp/<EXPERIMENT_NAME>/config_nvp_s.json  
    
  2. Compress the saved sparse positional image-/video-like features using codecs.

    • Execute compression.ipynb.
    • Please change the logging_root and experiment_name in compression.ipynb appropriately.
    • One can change qscale, crf, and framerate, which control the compression ratio of the sparse positional features (an illustrative sketch of the corresponding ffmpeg commands is given after this procedure).
      • qscale ranges from 1 to 31, where larger values mean worse quality (2~5 recommended).
      • crf ranges from 0 to 51, where larger values mean worse quality (20~25 recommended).
      • framerate (25 or 40 recommended).
  3. Evaluation with the compressed parameters.

    CUDA_VISIBLE_DEVICES=0 python experiment_scripts/eval_compression.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --dataset <DATASET> --num_frames <NUM_FRAMES>  --config ./logs_nvp/<EXPERIMENT_NAME>/config_nvp_s.json --qscale 2 3 3 --crf 21 --framerate 25
    
    • Option --save denotes whether to save the reconstructed frames.
    • Please specify the options --qscale, --crf, and --framerate with the same values used in compression.ipynb.
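
For intuition only, the codec step in compression.ipynb roughly corresponds to ffmpeg commands of the following form; the file names here are assumptions, and the exact commands used in the notebook may differ:

# image-like features: JPEG-style compression, quality controlled by -qscale:v (1~31, lower = better)
ffmpeg -y -i latent_keyframe_xy.png -qscale:v 2 latent_keyframe_xy.jpg

# video-like sparse positional features: HEVC (libx265), quality controlled by -crf (0~51, lower = better)
ffmpeg -y -framerate 25 -i sparse_feature_%05d.png -c:v libx265 -crf 21 sparse_features.mp4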

4. Results

Reconstructed video results of NVP on UVG-HD, as well as other 4K, long, and temporally dynamic videos, are available at the following project page.

Our model achieves the following performance on UVG-HD with a single NVIDIA V100 32GB GPU:

Encoding Time | BPP   | PSNR (↑)         | FLIP (↓)          | LPIPS (↓)
~5 minutes    | 0.901 | 34.57 $\pm$ 2.62 | 0.075 $\pm$ 0.021 | 0.190 $\pm$ 0.100
~10 minutes   | 0.901 | 35.79 $\pm$ 2.31 | 0.065 $\pm$ 0.016 | 0.160 $\pm$ 0.098
~1 hour       | 0.901 | 37.61 $\pm$ 2.20 | 0.052 $\pm$ 0.011 | 0.145 $\pm$ 0.106
~8 hours      | 0.210 | 36.46 $\pm$ 2.18 | 0.067 $\pm$ 0.017 | 0.135 $\pm$ 0.083
  • The reported values are averaged over the Beauty, Bosphorus, Honeybee, Jockey, ReadySetGo, ShakeNDry, and Yachtride videos in UVG-HD, and are measured using the LPIPS and FLIP repositories.

One can download the pretrained checkpoints from the following link.

Citation

@inproceedings{
    kim2022scalable,
    title={Scalable Neural Video Representations with Learnable Positional Features},
    author={Kim, Subin and Yu, Sihyun and Lee, Jaeho and Shin, Jinwoo},
    booktitle={Advances in Neural Information Processing Systems},
    year={2022},
}

References

We used code from the following repositories: SIREN, Modulation, tiny-cuda-nn.


nvp's Issues

Converting encoding time to iterations

Thanks so much for sharing the code. Could you point me to where you provide the number of training iterations that correspond to each encoding time? I don't have access to V100s, and depending on conditions on the GPUs I can use, I get significant speed differences even between GPUs of the same type, so figuring out how to reproduce the results given only the approximate V100 encoding time has been challenging.

Is `temporal_interp` used at training time?

Hi, thank you for the amazing work.

I'm interested in the temporal interpolation (frame interpolation) function of NVP. But I found that the quality of interpolated frames is very low.

I noted that if temporal interpolation is required (i.e., opt.t_interp != -1), then temporal_interp is set to True at test time, and the model calls self.sparse_grid.forward_inter. At training time, however, temporal_interp is set to False by default.

So, do I need to turn on temporal_interp at training time if I want temporal interpolation, or is sparse_grid.forward_inter only used at inference time?
