Endora: Video Generation Models as Endoscopy Simulators

Chenxin Li1* Hengyu Liu1* Yifan Liu1* Brandon Y. Feng2 Wuyang Li1 Xinyu Liu1 Zhen Chen3 Jing Shao4 Yixuan Yuan1✉

1CUHK   2MIT CSAIL   3CAS CAIR   4Shanghai AI Lab  

* Equal Contributions. ✉ Corresponding Author.



💡Key Features

  • A high-fidelity medical video generation framework, tested on endoscopy scenes, laying the groundwork for further advancements in the field.
  • The first public benchmark for endoscopy video generation, featuring a comprehensive collection of clinical videos and adapting existing general-purpose generative video models for this purpose.
  • A novel technique to infuse generative models with features distilled from a 2D visual foundation model, ensuring consistency and quality across different scales.
  • Versatility demonstrated through successful applications in video-based disease diagnosis and 3D surgical scene reconstruction, highlighting its potential for downstream medical tasks.

🛠Setup

git clone https://github.com/XGGNet/Endora.git
cd Endora
conda create -n Endora python=3.10
conda activate Endora

pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118

pip install -r requirements.txt

Tip A: We tested the framework with PyTorch 2.1.2 compiled against CUDA 11.8. Other versions will likely work but are not guaranteed.

Tip B: A GPU with 24 GB of memory (or more) is recommended for sampling videos with Endora, and 48 GB (or more) for training Endora.
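
As a rough illustration of Tip B, the sketch below maps a GPU's total memory to the tasks it is recommended for. The 24 GB and 48 GB thresholds come from the tip above; the function names and the 1 GiB slack (added because drivers usually report slightly less than the nominal card size) are assumptions, not part of the Endora code base.

```python
from typing import List, Optional

# Recommended sizes from Tip B (nominal GB on the card).
SAMPLING_MIN_GB = 24
TRAINING_MIN_GB = 48


def supported_tasks(total_gib: float, slack_gib: float = 1.0) -> List[str]:
    """Return the Endora tasks a GPU of this size is recommended for.

    slack_gib is a hypothetical tolerance: a nominal 24 GB card typically
    reports a little under 24 GiB of usable memory.
    """
    tasks = []
    if total_gib + slack_gib >= SAMPLING_MIN_GB:
        tasks.append("sampling")
    if total_gib + slack_gib >= TRAINING_MIN_GB:
        tasks.append("training")
    return tasks


def detect_gpu_memory_gib(device_index: int = 0) -> Optional[float]:
    """Query total GPU memory via PyTorch; None if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(device_index)
    return props.total_memory / 1024**3
```

Calling `supported_tasks(detect_gpu_memory_gib())` on a machine with a visible CUDA device gives a quick indication of whether sampling or training is likely to fit.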

📚Data Preparation

Colonoscopic: The dataset provided by the paper can be found here. You can use the video data already processed by Endo-FM directly, with no further processing.

Kvasir-Capsule: The dataset provided by the paper can be found here. You can use the video data already processed by Endo-FM directly, with no further processing.

CholecTriplet: The dataset provided by the paper can be found here. You can use the video data already processed by Endo-FM directly, with no further processing.

First, run process_data.py and process_list.py to split the videos into frames and generate the corresponding frame lists.

CUDA_VISIBLE_DEVICES=gpu_id python process_data.py -s /path/to/datasets -t /path/to/save/video/frames

CUDA_VISIBLE_DEVICES=gpu_id python process_list.py -f /path/to/video/frames -t /path/to/save/text

The resulting file structure is as follows.

├── data
│   ├── CholecT45
│   │   ├── 00001.mp4
│   │   ├── ...
│   ├── Colonoscopic
│   │   ├── 00001.mp4
│   │   ├── ...
│   ├── Kvasir-Capsule
│   │   ├── 00001.mp4
│   │   ├── ...
│   ├── CholecT45_frames
│   │   ├── train_128_list.txt
│   │   ├── 00001
│   │   │   ├── 00000.jpg
│   │   │   ├── ...
│   │   ├── ...
│   ├── Colonoscopic_frames
│   │   ├── train_128_list.txt
│   │   ├── 00001
│   │   │   ├── 00000.jpg
│   │   │   ├── ...
│   │   ├── ...
│   ├── Kvasir-Capsule_frames
│   │   ├── train_128_list.txt
│   │   ├── 00001
│   │   │   ├── 00000.jpg
│   │   │   ├── ...
│   │   ├── ...
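
Before training, it can help to confirm that the processed data matches this layout. The following is a minimal sketch, assuming the directory and file names shown in the tree above; the helper names `check_frames_dir` and `check_data_root` are hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import List, Sequence


def check_frames_dir(frames_dir: Path, list_name: str = "train_128_list.txt") -> List[str]:
    """Return a list of problems found in one `<dataset>_frames` directory."""
    if not frames_dir.is_dir():
        return [f"missing directory: {frames_dir}"]
    problems = []
    # The frame list written by process_list.py is expected next to the folders.
    if not (frames_dir / list_name).is_file():
        problems.append(f"missing list file: {frames_dir / list_name}")
    # Each video should have its own folder of .jpg frames.
    video_dirs = sorted(p for p in frames_dir.iterdir() if p.is_dir())
    if not video_dirs:
        problems.append(f"no per-video frame folders in {frames_dir}")
    for video_dir in video_dirs:
        if not any(video_dir.glob("*.jpg")):
            problems.append(f"no .jpg frames in {video_dir}")
    return problems


def check_data_root(
    data_root: Path,
    datasets: Sequence[str] = ("CholecT45", "Colonoscopic", "Kvasir-Capsule"),
) -> List[str]:
    """Check every `<dataset>_frames` directory under the data root."""
    problems = []
    for name in datasets:
        problems += check_frames_dir(data_root / f"{name}_frames")
    return problems
```

An empty list from `check_data_root(Path("data"))` means every expected frames directory, list file, and per-video folder was found.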

🎇Sampling Endoscopy Videos

You can sample endoscopy videos directly from a model checkpoint. Here is a quick-start example using our pre-trained models:

  1. Download the pre-trained weights from here and place them at the paths defined in the configs.
  2. Run sample.py through the following scripts; edit them to customize arguments such as the number of sampling steps.

Simple sampling to generate a video

bash sample/col.sh
bash sample/kva.sh
bash sample/cho.sh

DDP sampling

bash sample/col_ddp.sh
bash sample/kva_ddp.sh
bash sample/cho_ddp.sh
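
If you drive sampling from Python, a small wrapper over the scripts above might look like the sketch below. The script paths come from the lists above; the mapping from dataset names to the `col`, `kva`, and `cho` tags is an assumption inferred from those names, and `sample_command` and `run_sampling` are hypothetical helpers.

```python
import subprocess
from typing import List

# Assumed mapping from dataset name to the short tag used in script names.
DATASET_TAGS = {
    "Colonoscopic": "col",
    "Kvasir-Capsule": "kva",
    "CholecTriplet": "cho",
}


def sample_command(dataset: str, ddp: bool = False) -> List[str]:
    """Build the shell command for the single-GPU or DDP sampling script."""
    if dataset not in DATASET_TAGS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    suffix = "_ddp" if ddp else ""
    return ["bash", f"sample/{DATASET_TAGS[dataset]}{suffix}.sh"]


def run_sampling(dataset: str, ddp: bool = False, repo_root: str = ".") -> None:
    """Run the sampling script from the repository root; raises on failure."""
    subprocess.run(sample_command(dataset, ddp), cwd=repo_root, check=True)
```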

⏳Training Endora

The pre-trained DINO weights can be found here; our implementation uses ViT-B/8 when training Endora. The path to the saved weights needs to be set in ./configs.

Train Endora at a resolution of 128x128 with N GPUs on the Colonoscopic dataset:

torchrun --nnodes=1 --nproc_per_node=N train.py \
  --config ./configs/col/col_train.yaml \
  --port PORT \
  --mode type_cnn \
  --prr_weight 0.5 \
  --pretrained_weights /path/to/pretrained/DINO
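
The placeholders N and PORT have to be filled in before the command will run. As a sketch of one way to do that programmatically, the helper below assembles the same argument list; the flags and default values are copied from the command above, while `build_train_command` itself and any port value you pass are hypothetical.

```python
import shlex
from typing import List


def build_train_command(
    n_gpus: int,
    port: int,
    dino_weights: str,
    config: str = "./configs/col/col_train.yaml",
    mode: str = "type_cnn",
    prr_weight: float = 0.5,
) -> List[str]:
    """Assemble the single-node torchrun command for training Endora."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return [
        "torchrun", "--nnodes=1", f"--nproc_per_node={n_gpus}", "train.py",
        "--config", config,
        "--port", str(port),
        "--mode", mode,
        "--prr_weight", str(prr_weight),
        "--pretrained_weights", dino_weights,
    ]


def as_shell(command: List[str]) -> str:
    """Render the argument list as a copy-pastable shell line."""
    return shlex.join(command)
```

For example, `as_shell(build_train_command(4, 29500, "/path/to/pretrained/DINO"))` yields a one-line command for four GPUs, where 29500 is only an example port.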

Train Endora with the scripts in ./train_scripts

bash train_scripts/col/train_col.sh
bash train_scripts/kva/train_kva.sh
bash train_scripts/cho/train_cho.sh

📏Metric Evaluation

We first split the generated videos into frames and then use the code from StyleGAN-V to evaluate the model in terms of FVD, FID and IS.

Test with process_data.py and code in stylegan-v

CUDA_VISIBLE_DEVICES=gpu_id python process_data.py -s /path/to/generated/video -t /path/to/video/frames
cd /path/to/stylegan-v
CUDA_VISIBLE_DEVICES=gpu_id python ./src/scripts/calc_metrics_for_dataset.py \
  --fake_data_path /path/to/video/frames \
  --real_data_path /path/to/dataset/frames 
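
For intuition about what FID and FVD measure: both are Fréchet distances between Gaussian fits to features of real and generated data (Inception features for FID, I3D video features for FVD). The sketch below is a simplified illustration, not the StyleGAN-V implementation. It assumes diagonal covariances so that it runs with the standard library only, whereas the actual metrics use full covariance matrices and a matrix square root. The function names are hypothetical.

```python
import math
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple


def feature_stats(features: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and variance of equal-length feature vectors."""
    columns = list(zip(*features))
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means, variances


def frechet_distance_diag(
    real_features: Sequence[Sequence[float]],
    fake_features: Sequence[Sequence[float]],
) -> float:
    """Frechet distance between two Gaussians with diagonal covariance.

    With diagonal covariances the general formula
        |mu_r - mu_f|^2 + Tr(S_r + S_f - 2 (S_r S_f)^(1/2))
    reduces to a sum over dimensions of squared differences of means
    plus squared differences of standard deviations.
    """
    mu_r, var_r = feature_stats(real_features)
    mu_f, var_f = feature_stats(fake_features)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_f))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_r, var_f))
    return mean_term + cov_term
```

Identical feature sets give a distance of zero, and the value grows as the generated features drift from the real ones, which is why lower FVD and FID are better.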

Test with script test.sh

bash test.sh

🧰Running Compared Methods Re-implemented on Endoscopy

We provide training and testing scripts for the compared methods on endoscopy video generation (as shown in Table 1, Quantitative Comparison, in the paper). Please see Other-Methods/ for more details. We will keep cleaning up the code.

The pre-trained weights for all the comparison methods are available here.

Here is an overview of performance and checkpoints on the Colonoscopic dataset.

| Method | FVD↓ | FID↓ | IS↑ | Checkpoints |
|---|---|---|---|---|
| StyleGAN-V | 2110.7 | 226.14 | 2.12 | Link |
| LVDM | 1036.7 | 96.85 | 1.93 | Link |
| MoStGAN-V | 468.5 | 53.17 | 3.37 | Link |
| Endora (Ours) | 460.7 | 13.41 | 3.90 | Link |

✒Ablation on Endora's Variants

We also provide training for other variants of Endora (as shown in Table 3, Ablation Studies, in the paper). Training and sampling scripts are in train_scripts/ablation and sample/ablation, respectively.

bash train_scripts/ablation/train_col_ablation{i}.sh  # e.g., i=1 to run the 1st-row ablation experiment
bash sample/ablation/col_ddp_ablation{i}.sh  # e.g., i=1 to run the 1st-row ablation experiment
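
| Modified Diffusion | Spatiotemporal Encoding | Prior Guidance | FVD↓ | FID↓ | IS↑ | Checkpoints |
|---|---|---|---|---|---|---|
| | | | 611.9 | 22.44 | 3.61 | Link |
| | | | 593.7 | 17.75 | 3.65 | Link |
| | | | 493.5 | 13.88 | 3.89 | Link |
| | | | 460.7 | 13.41 | 3.90 | Link |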

🎪Downstream Application

We provide the steps for reproducing the results of extending Endora to downstream applications (as shown in Section 3.3 of the paper).

Case I. Temporal-consistent Data Augmentation

Please follow the steps:

  1. Enter the directory "Downstream-Semi/".
  2. Download the PolypDiag dataset provided by the paper from here. You can use the video data already processed by Endo-FM directly, with no further processing.
  3. Run bash semi_baseline.sh to obtain the supervised-only lower bound for semi-supervised disease diagnosis.
  4. Sample endoscopy videos on Colonoscopic and CholecTriplet as augmented data. We also provide the sampled videos here for direct use.
  5. Run bash semi_gen.sh for semi-supervised disease diagnosis using the augmented unlabeled data.

| Method | Colonoscopic | CholecTriplet |
|---|---|---|
| Supervised-only | 74.5 | 74.5 |
| LVDM | 76.2 | 78.0 |
| Endora (Ours) | 87.0 | 82.0 |

Case II. View-consistent Scene Simulator

Please follow the steps:

  1. Run COLMAP on the generated videos as the point initialization.
  2. Use EndoGaussian to train 3D representation of Gaussian Splatting.

Videos of Rendered RGB & Rendered Depth

🛒TODO List

  • Release code for Endora
  • Clean up the code for Endora
  • Upload the ckpt for compared methods.
  • Clean up the codes for training compared methods.

🎈Acknowledgements

Greatly appreciate the tremendous effort for the following projects!

📜Citation

If you find this work helpful for your project, please consider citing the following paper:

@article{li2024endora,
  author    = {Chenxin Li and Hengyu Liu and Yifan Liu and Brandon Y. Feng and Wuyang Li and Xinyu Liu and Zhen Chen and Jing Shao and Yixuan Yuan},
  title     = {Endora: Video Generation Models as Endoscopy Simulators},
  journal   = {arXiv preprint arXiv:2403.11050},
  year      = {2024}
}

endora's Issues

About the pretrained models

Hello, when loading the pre-trained weights I found that they do not seem to match the model. The problems are as follows:
(1) There is an extra cov.weight;
(2) RuntimeError: Error(s) in loading state_dict for OptimizedModule: size mismatch for _orig_mod.pos_embed: copying a param with shape torch.Size([1, 64, 1152]) from checkpoint, the shape in current model is torch.Size([1, 16, 1152]).

Do I need to change any parameters in the model code? I would appreciate your help. Thank you!
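
One generic way to narrow down this kind of mismatch is to compare parameter names and shapes between the checkpoint and the freshly built model before calling load_state_dict. The sketch below is not taken from the Endora code: `compare_shapes` and `strip_prefix` are hypothetical helpers that work on plain name-to-shape mappings, which in PyTorch can be built with `{k: tuple(v.shape) for k, v in state_dict.items()}`. The `_orig_mod.` prefix is what torch.compile adds when it wraps a module. A positional-embedding mismatch in the token dimension (64 versus 16 here) often means the model was built with a different input resolution or patch size than the checkpoint, though that is an assumption to verify against the configs.

```python
from typing import Dict, Mapping, Sequence


def strip_prefix(name: str, prefix: str = "_orig_mod.") -> str:
    """Drop the prefix that torch.compile adds to wrapped module parameters."""
    return name[len(prefix):] if name.startswith(prefix) else name


def compare_shapes(
    checkpoint_shapes: Mapping[str, Sequence[int]],
    model_shapes: Mapping[str, Sequence[int]],
) -> Dict[str, object]:
    """Report keys that are unexpected, missing, or differ in shape."""
    ckpt = {strip_prefix(k): tuple(v) for k, v in checkpoint_shapes.items()}
    model = {strip_prefix(k): tuple(v) for k, v in model_shapes.items()}
    return {
        # Present in the checkpoint but not in the model.
        "unexpected": sorted(ckpt.keys() - model.keys()),
        # Present in the model but not in the checkpoint.
        "missing": sorted(model.keys() - ckpt.keys()),
        # Present in both, with different shapes: (checkpoint, model).
        "mismatched": {
            k: (ckpt[k], model[k])
            for k in sorted(ckpt.keys() & model.keys())
            if ckpt[k] != model[k]
        },
    }
```

Applied to the two problems reported above, the extra weight would appear under "unexpected" and `pos_embed` under "mismatched" with both shapes side by side.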
