
surp-loss-recovery-bit-rate-control's Introduction

Goal of the project

To reproduce Reparo: Loss-Resilient Generative Codec for Video Conferencing (https://arxiv.org/abs/2305.14135).

Introduction to Reparo

Proposed innovation: use a generative network to avoid the video freezes caused by packet loss and to improve video quality in real-time video conferencing.

Network structure: tokenizer (VQGAN) + bitrate controller + packetizer + loss recovery module + decoder
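To make the data flow concrete, here is a minimal sketch of what could happen to one frame's tokens between the tokenizer and the loss recovery module. It is not this project's code: the function names (`packetize`, `depacketize`), the `MASK` marker and the round-robin packet layout are all assumptions chosen for illustration, and the Reparo paper should be consulted for the actual packetization scheme. The sketch only shows that a lost packet removes a subset of tokens rather than the whole frame.

```python
MASK = -1  # hypothetical marker for a token whose packet never arrived


def packetize(tokens, num_packets):
    """Spread (position, token) pairs over packets in round-robin order."""
    packets = [[] for _ in range(num_packets)]
    for pos, tok in enumerate(tokens):
        packets[pos % num_packets].append((pos, tok))
    return packets


def depacketize(received, num_tokens):
    """Rebuild the token sequence; positions from lost packets stay MASK."""
    tokens = [MASK] * num_tokens
    for packet in received:
        for pos, tok in packet:
            tokens[pos] = tok
    return tokens


# Made-up token indices standing in for one frame's VQGAN codebook indices.
frame_tokens = [17, 3, 42, 8, 99, 5, 61, 24]
packets = packetize(frame_tokens, num_packets=4)

# Simulate losing packet 1 in transit.
received = [p for i, p in enumerate(packets) if i != 1]
recovered_input = depacketize(received, len(frame_tokens))
print(recovered_input)  # [17, -1, 42, 8, 99, -1, 61, 24]
```

The masked positions are what a loss recovery module would be asked to predict from the surrounding tokens before the decoder reconstructs the frame.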

Progress

Implemented Part 1: Tokenizer/Codec (VQGAN)

Structure and explanations

graph LR
L((Data\nLoader))-->A
A((Encoder)) -- map to --> B((Codebook))--return-->F[token]-->C
M[Input]-->L
C((decoder)) --> D[Output]
B -- return --> E
E[Quan_Loss]

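mingpt.py: defines the transformer (minGPT)

lpips.py: loads a pretrained VGG16 model for the perceptual (LPIPS) loss

encoder.py: encodes an image into latent vectors

decoder.py: decodes quantized latent vectors back into an image

codebook.py: quantizes each latent vector (maps it to the closest codebook vector); returns the quantized vector and the encoding index

utils.py: loads and preprocesses the data for later use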

vqgan.py: pull the encoder, codebook and decoder together to build a VQGAN model; return the codebook mapping, the encoding index and the loss; calculate and return $\lambda$ (balance the contribution of the perceptual loss and the GAN loss)
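For reference, the adaptive weight as defined in the Taming Transformers paper (see the citation at the end of this README) is:

$$
\lambda = \frac{\nabla_{G_L}\left[\mathcal{L}_{\mathrm{rec}}\right]}{\nabla_{G_L}\left[\mathcal{L}_{\mathrm{GAN}}\right] + \delta}
$$

Here $\mathcal{L}_{\mathrm{rec}}$ is the perceptual reconstruction loss, $\nabla_{G_L}[\cdot]$ is the gradient with respect to the last layer of the decoder, and $\delta$ is a small constant for numerical stability. This is the paper's definition; whether vqgan.py applies extra steps such as clamping is not stated here.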

vtoi.py: transform videos to images

train_vqgan1.py: train and resume the model

video_reconstruct.py: reconstruct the video
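The quantization step described for codebook.py can be sketched as a nearest-neighbour lookup. This is a minimal NumPy illustration, not the repository's implementation: the function name `quantize`, the array shapes and the toy values are assumptions, and the training-time parts of a VQGAN codebook (the straight-through gradient and the codebook/commitment loss) are left out.

```python
import numpy as np


def quantize(z, codebook):
    """Map each row of z to its nearest codebook vector.

    z:        (N, D) array of encoder outputs
    codebook: (K, D) array of code vectors
    Returns the quantized vectors (N, D) and their indices (N,).
    """
    # Squared Euclidean distance between every latent and every code: (N, K)
    dists = (
        (z ** 2).sum(axis=1, keepdims=True)
        - 2.0 * z @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    indices = dists.argmin(axis=1)
    return codebook[indices], indices


# Toy values for illustration only.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])
z = np.array([[0.9, 1.2], [3.5, 0.1], [0.1, -0.2]])
z_q, idx = quantize(z, codebook)
print(idx)  # [1 2 0]
```

The returned indices are the tokens that would be transmitted; the decoder only needs the indices and a copy of the codebook to look the vectors up again.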

How to use

1. train the model with the data:

step1: video to image

python vtoi.py --path PATH_TO_VIDEO

step2: input the data and train the model

python training_vqgan1.py --dataset-path DATASET_PATH --epochs NUM_OF_EPOCH

2. reconstruct the input video with a pretrained model

DATASET_PATH: the path to the images produced in step1 of part 1

python video_reconstruct.py --dataset-path DATASET_PATH --num-images NUM_OF_IMAGE --checkpoint-path CHECKPOINT/XXX.pt --image-output IMAGE_OUTPUT_DIR --video-output VIDEO_OUTPUT_DIR

Sample Input & Output

INPUT: Input & Output/wave.mp4

OUTPUT: Input & Output/testvideo2.avi

Citation

   @misc{esser2021taming,
      title={Taming Transformers for High-Resolution Image Synthesis},
      author={Patrick Esser and Robin Rombach and Björn Ommer},
      year={2021},
      eprint={2012.09841},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
   }

   https://github.com/dome272/VQGAN-pytorch
   https://arxiv.org/abs/2305.14135
