
vqassessment / fast-vqa-and-fastervqa

245 stars · 6 watchers · 24 forks · 37.92 MB

[ECCV 2022, TPAMI 2023] FAST-VQA and its extended version, FasterVQA.

Home Page: https://www.ecva.net/papers/eccv_2022/papers_ECCV/html/1225_ECCV_2022_paper.php

License: Other

Jupyter Notebook 78.46% Python 21.54%
end-to-end-machine-learning low-level-vision quality-assessment quality-assessment-framework video-quality-assessment blind-video-quality-assessment deep-learning pytorch

fast-vqa-and-fastervqa's People

Contributors

angelahahaa, teowu, vonsago


fast-vqa-and-fastervqa's Issues

Match constraint

Hi @teowu

Thanks for the interesting work. I have some questions regarding the match constraint:

  1. Is the implementation of the match constraint the same as PatchMerging?
  2. How to perform the ablation study on mis-matched mini-cubes?

Thanks for any help you can provide.

Best,
Hanwei

the score

I ran the default configuration and got a score of around 0.133. Is this the predicted score of the video? The ground truth in the txt file is 2.88; is there any connection between the two?

Segmentation fault (core dumped)

Hi, I'm trying to use your "Infer for a single MP4 video" command. However, no matter which model I choose, e.g., FasterVQA or FAST-VQA-M, it raises a "Segmentation fault (core dumped)" error. I tried to debug it; the problem seems to happen when I put the model onto the device. However, if I run it on the CPU, the process stops responding.

My torch version is 2.2.0+cu118. System: Ubuntu. GPU: V100 32GB

decord issues

Hello. Thanks for your awesome work. I am trying to retrain the model on the LSVQ dataset. During training, the decord library reports some errors. The detailed error messages are below.

/home/fujun/datasets/iqa/LSVQ/ia-batch7/NASAPlaneCrashTest.mp4
[h264 @ 0xac7d8c0] mmco: unref short failure
/home/fujun/datasets/iqa/LSVQ/ia-batch4/youtube-uUHP-0UZAJg.mp4
[h264 @ 0x8a7fd80] mmco: unref short failure
[h264 @ 0x8a7fd80] mmco: unref short failure
/home/fujun/datasets/iqa/LSVQ/yfcc-batch3/2416.mp4
[h264 @ 0xac41980] mmco: unref short failure
[h264 @ 0xac41980] mmco: unref short failure
/home/fujun/datasets/iqa/LSVQ/ia-batch1/youtube-SsbAXHjJmTg.mp4
[h264 @ 0x9dcc400] mmco: unref short failure
/home/fujun/datasets/iqa/LSVQ/ia-batch9/Waterloo_City_Council_-_Monday_February_10.mp4
[h264 @ 0xba63980] SEI type 0 size 64 truncated at 56
[h264 @ 0x8a80b00] SEI type 0 size 64 truncated at 56
[h264 @ 0x974ccc0] SEI type 0 size 64 truncated at 56
[h264 @ 0x8a80b00] SEI type 0 size 64 truncated at 56
[h264 @ 0x974ccc0] SEI type 0 size 64 truncated at 56
[h264 @ 0x8a80b00] SEI type 0 size 64 truncated at 56

How should I deal with these errors?
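The mmco and SEI messages come from the underlying FFmpeg h264 decoder and are usually non-fatal warnings about slightly malformed streams, so decoding normally continues. If training actually crashes on particular files, a quick pre-scan with decord can identify them. The sketch below is not part of the repo; it assumes an annotation file with one comma-separated entry per line, filename first, as in the examplar_data_labels lists.

# Hedged pre-scan sketch (not part of the repo): try to decode the first and
# last frame of every listed video so genuinely broken files can be excluded
# before training.
import os
import decord

def scan_videos(anno_file, data_prefix):
    bad = []
    with open(anno_file) as f:
        for line in f:
            name = line.split(",")[0].strip()
            path = os.path.join(data_prefix, name)
            try:
                vr = decord.VideoReader(path)
                _ = vr[0]             # decode the first frame
                _ = vr[len(vr) - 1]   # decode the last frame
            except Exception as err:  # decord raises an error on unreadable files
                bad.append((path, str(err)))
    return bad

# for path, err in scan_videos("examplar_data_labels/train_labels.txt", "/path/to/LSVQ"):
#     print("cannot decode:", path, err)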

Issue in "VQA.py" file

In your repo, you mention that to check the quality of a single video in the range [0,1], we can use the vqa.py file. I am using that file as part of a different study that needs a video quality assessment component. I have written my code and everything runs, but there is one problem:
I have trained FasterVQA on my own dataset, but when I assess the quality of a single video with vqa.py using my trained network, lines 17 and 98 hard-code the mean and std, which are specific to your models. Since my model is trained on my own dataset, it does not perform as well with these mean and std values as the corresponding pretrained model does.
I just want to know how I can find the mean and std specific to my trained model.
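One plausible approach, assuming the hard-coded constants in vqa.py are the mean and standard deviation of the released model's raw (pre-sigmoid) predictions on its training set: run your trained model over your own training videos, collect the raw outputs, and use their statistics in the same kind of sigmoid rescaling. This is only a sketch of that idea, not the repo's own tooling; my_raw_training_predictions is a placeholder.

# Hedged sketch: estimate rescaling statistics from your own model's raw
# predictions, then map a raw score into (0, 1) with a sigmoid over the
# standardized value.
import numpy as np

def estimate_score_stats(raw_scores):
    scores = np.asarray(raw_scores, dtype=np.float64)
    return scores.mean(), scores.std()

def rescale_to_unit_interval(raw_score, mean, std):
    # standardize, then squash with a sigmoid
    return 1.0 / (1.0 + np.exp(-(raw_score - mean) / std))

# mean, std = estimate_score_stats(my_raw_training_predictions)  # placeholder list
# quality = rescale_to_unit_interval(raw_score, mean, std)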

How to visualise val loss graph?

Similar to the training loss, can you please share how to visualize the validation loss vs. epoch curve, to check performance and whether the model is overfitting?
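Since the training configs contain a wandb project_name entry and new_train.py calls wandb.init (see the wandb traceback in another issue below), one option, assuming you adapt the validation loop yourself, is to log the epoch-averaged validation loss; the wandb dashboard then plots it against the epoch automatically.

# Minimal sketch, assuming wandb.init() has already been called by the
# training script: log the epoch-averaged validation loss so the dashboard
# draws a val-loss-vs-epoch curve next to the training loss.
import wandb

def log_validation_loss(epoch, val_loss):
    wandb.log({"val/loss": float(val_loss), "epoch": epoch})

# inside the validation loop, after averaging the per-batch losses:
# log_validation_loss(epoch, running_loss / num_batches)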

train

Hello, I want to ask whether training is only supported on the LSVQ dataset. If so, could you share the dataset with me? It cannot be downloaded from the official link.

config file seems incompatible with vqa.py

Hi, when running vqa.py to run inference on a single mp4 file with the default config file options/fast/f3dvqa-b.yml, it produces an error:

Traceback (most recent call last):
  File "/data4/xxx/VideoQualiry/FAST-VQA-and-FasterVQA/vqa.py", line 62, in <module>
    sampler = SampleFrames(clip_len = t_data_opt["clip_len"], num_clips = t_data_opt["num_clips"])
KeyError: 'clip_len'

Has the data structure in f3dvqa-b.yml been modified? How should I fix it?
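A possible workaround, based only on the traceback above and on the yml snippets posted in other issues here (where clip_len, frame_interval, and num_clips sit under sample_types.fragments rather than at the top level of the data options): look the keys up with a fallback before constructing the sampler. The exact key layout in your local f3dvqa-b.yml may differ.

# Hedged workaround sketch: fall back to sample_types.fragments when the
# sampling keys are not at the top level of the data options.
def get_sampling_opt(t_data_opt, key):
    if key in t_data_opt:
        return t_data_opt[key]
    return t_data_opt["sample_types"]["fragments"][key]

# clip_len  = get_sampling_opt(t_data_opt, "clip_len")
# num_clips = get_sampling_opt(t_data_opt, "num_clips")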

different score for the same video

Hey,
using FasterVQA, I got different scores for the same samples of the same video (in test mode), both for my own video and for the video in the demos directory.
Why is this happening?
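If the variation comes from the random placement of the sampled fragments, repeated runs can be made deterministic by fixing the random seeds before each inference call. This is a generic sketch, not the repo's API; run_inference is a placeholder for whatever call you actually use.

# Hedged sketch: fix the common random seeds so repeated inference on the
# same video samples the same fragments.
import random
import numpy as np
import torch

def fix_seeds(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

# fix_seeds(); score_a = run_inference(video)   # run_inference is a placeholder
# fix_seeds(); score_b = run_inference(video)   # should now match score_a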

_IncompatibleKeys

This is the yml file:

name: FAST-VQA-B-Refactor-1*4
num_epochs: 1
l_num_epochs: 0
warmup_epochs: 2.5
ema: true
save_model: true
batch_size: 4
num_workers: 6

wandb:
    project_name: VQA_Experiments_2022

data:
    train:
        type: FusionDataset
        args:
            phase: train
            anno_file: ./examplar_data_labels/dataset/train_list.txt
            data_prefix: ../output_videos/
            sample_types:
                fragments:
                    fragments_h: 7
                    fragments_w: 7
                    fsize_h: 32
                    fsize_w: 32
                    aligned: 32
                    clip_len: 32
                    frame_interval: 2
                    num_clips: 1
    
    val-ltest:
        type: FusionDataset
        args:
            phase: test
            anno_file: ./examplar_data_labels/dataset/val_list.txt
            data_prefix: ../output_videos/
            sample_types:
                #resize:
                #    size_h: 224
                #    size_w: 224
                fragments:
                    fragments_h: 7
                    fragments_w: 7
                    fsize_h: 32
                    fsize_w: 32
                    aligned: 32
                    clip_len: 32
                    frame_interval: 2
                    num_clips: 4 
    
model:
    type: DiViDeAddEvaluator
    args:
        backbone:
            fragments:
                checkpoint: false
                pretrained: 
        backbone_size: swin_tiny_grpb
        backbone_preserve_keys: fragments
        divide_head: false
        vqa_head:
            in_channels: 768
            hidden_channels: 64
            
optimizer:
    lr: !!float 1e-3
    backbone_lr_mult: !!float 1e-1
    wd: 0.05
        
load_path: ../swin_tiny_patch244_window877_kinetics400_1k.pth

I am unable to load the pre-trained Swin model. I got an error like this:

_IncompatibleKeys(missing_keys=['fragments_backbone.layers.0.blocks.0.attn.fragment_position_bias_table', 'fragments_backbone.layers.0.blocks.1.attn.fragment_position_bias_table
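For diagnosing this, a hedged sketch: load_state_dict(strict=False) returns the _IncompatibleKeys object, so printing its fields shows exactly which parameters failed to match, e.g. whether only the model-specific fragment_position_bias_table entries are missing (usually harmless when loading a plain Kinetics-pretrained Swin checkpoint) or whether every key mismatches because of a prefix such as backbone. vs fragments_backbone.. The helper below is illustrative, not the repo's loading code.

# Hedged diagnostic sketch: load a checkpoint non-strictly and print the
# missing/unexpected keys to see the nature of the mismatch.
import torch

def inspect_checkpoint_load(model, ckpt_path):
    state_dict = torch.load(ckpt_path, map_location="cpu")
    if "state_dict" in state_dict:
        state_dict = state_dict["state_dict"]
    result = model.load_state_dict(state_dict, strict=False)
    print("missing keys:", result.missing_keys[:10])
    print("unexpected keys:", result.unexpected_keys[:10])
    return result

# inspect_checkpoint_load(evaluator, "../swin_tiny_patch244_window877_kinetics400_1k.pth")
# (evaluator: your DiViDeAddEvaluator instance)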

All results are negative

I attempted to run your implementation via the fast_vqa_model.py file in apis. I always get a -0.xx value, which does not match the expected score range (0-100).

OSError: [Errno 22] Invalid argument: './pretrained_weights/FAST_VQA_3D_1*1.pth'

Hi,

I tried to use your tool to evaluate the quality of some of my videos. The error "OSError: [Errno 22] Invalid argument: './pretrained_weights/FAST_VQA_3D_1*1.pth'" occurs.

Step to reproduce the issue:
python vqa.py

I've seen the files "FAST_VQA_3D_1_1_Scr.pth" and "FAST_VQA_B_1_4_Scr.pth" in the GitHub release, but on my OS (Windows 10 or Ubuntu 20.04) I can't rename those files to "FAST_VQA_3D_1*1.pth" because of the '*' character in the filename.

Can you tell me what is wrong with my use of your application?

Regards,
Adrien Schadle
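Windows forbids the '*' character in filenames, which is presumably why the release assets use underscores instead. A hedged workaround sketch (the default path is taken from the error message above; whether vqa.py exposes it as an argument is not assumed): fall back to an underscore-named copy of the weights before loading.

# Hedged workaround sketch: if the asterisk-named path does not exist (it
# cannot on Windows), fall back to an underscore-named file, e.g. one of the
# release files mentioned above.
import os
import torch

def load_weights(path="./pretrained_weights/FAST_VQA_3D_1*1.pth"):
    if not os.path.exists(path):
        path = path.replace("*", "_")  # Windows-safe filename
    return torch.load(path, map_location="cpu")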

LSVQ database

I wanted to download the LSVQ database, but the official Colab page cannot be opened. Do you have any download path? Could you share it with me?

Training Problems

Hello ! Thank you for your wonderful work.
I followed the configuration file you provided for FasterVQA training and found that when running new_train.py, the results are best after the first epoch but deteriorate as training progresses. What could be the reason for this? Additionally, multiple best models for different validation sets are saved; how is Fast_VQA_3D_1*1.pth selected?

gpu error

from fastvqa import deep_end_to_end_vqa

import torch
dum_video = torch.randn((3,240,720,1080))
model_type="fast"
vqa = deep_end_to_end_vqa(True, model_type=model_type)
[True, True, True, False]
/home/saman/miniconda3/envs/fast_vqa/lib/python3.8/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2895.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Successfully loaded pretrained=[True] fast-vqa model from pretrained_path=[pretrained_weights/fast_vqa_v0_3.pth].
Please make sure the input is [torch.tensor] in [(C,T,H,W)] layout and with data range [0,1].
vqa = deep_end_to_end_vqa(True, model_type=model_type, device="cuda:1")
[True, True, True, False]
/home/saman/miniconda3/envs/fast_vqa/lib/python3.8/site-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3060 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Successfully loaded pretrained=[True] fast-vqa model from pretrained_path=[pretrained_weights/fast_vqa_v0_3.pth].
Please make sure the input is [torch.tensor] in [(C,T,H,W)] layout and with data range [0,1].

vqa(dum_video)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/saman/Projects/FAST-VQA/fastvqa/apis/fast_vqa_model.py", line 77, in __call__
    x = ((x.permute(1, 2, 3, 0) - self.mean) / self.std).permute(3, 0, 1, 2)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cpu!
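A hedged fix sketch for the last error: the normalization constants are still on the CPU while the input was moved to cuda:1, so moving them onto the input's device before normalizing resolves the RuntimeError. (The separate CUDA-capability warning above also means this PyTorch build cannot actually use the RTX 3060; a build with sm_86 support is needed for real GPU inference.) The function below only illustrates the device handling; the exact normalization constants are whatever the repo defines.

# Hedged sketch of device-safe normalization for a (C, T, H, W) video tensor.
import torch

def normalize_video(x: torch.Tensor, mean: torch.Tensor, std: torch.Tensor) -> torch.Tensor:
    mean = mean.to(x.device)  # keep the statistics on the same device as the input
    std = std.to(x.device)
    return ((x.permute(1, 2, 3, 0) - mean) / std).permute(3, 0, 1, 2)

# x = normalize_video(dum_video.to("cuda:1"), mean_tensor, std_tensor)
# mean_tensor / std_tensor: the per-channel constants used by fast_vqa_model.py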

No results obtained for testing

I am trying to run python3 vqa.py, but it seems to get stuck after "Setting backbone".
Do I need to download anything else apart from your git repository to test a video?

Args are: Namespace(device='cuda', model='FasterVQA', video_path='./demos/10053703034.mp4')
backbone_size: swin_tiny_grpb
/home/user/.local/lib/python3.8/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]

Setting backbone: fragments_backbone

KeyErrors

File "/mnt/disk10T/wch/Faster_VQA1/fastvqa/datasets/fusion_datasets.py", line 535, in init
self.samplers[stype] = FragmentSampleFrames(sopt["clip_len"], sopt["num_clips"], sopt["frame_interval"])
KeyError: 'clip_len'

Several KeyErrors occur in the code. May I ask how to change the code to solve this problem? Also, is there anything else in the code that needs to be changed or added? Thank you very much.

lru_cache TypeError

I ran cell In[1] of FAST-VQA.ipynb, but got this error:

>>> from fastvqa.apis import deep_end_to_end_vqa
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/app/FAST-VQA-and-FasterVQA/fastvqa/__init__.py", line 1, in <module>
    from .datasets import *
  File "/app/FAST-VQA-and-FasterVQA/fastvqa/datasets/__init__.py", line 15, in <module>
    from .fusion_datasets import get_spatial_fragments, SimpleDataset, FusionDataset,  LSVQPatchDataset, FusionDatasetK400
  File "/app/FAST-VQA-and-FasterVQA/fastvqa/datasets/fusion_datasets.py", line 130, in <module>
    def get_resize_function(size_h, size_w, target_ratio=1, random_crop=False):
  File "/usr/local/lib/python3.7/functools.py", line 490, in lru_cache
    raise TypeError('Expected maxsize to be an integer or None')
TypeError: Expected maxsize to be an integer or None

Also, FAST-VQA.ipynb needs to be updated.
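This looks like a Python-version issue rather than a repo bug as such: on Python 3.7, functools.lru_cache cannot be used as a bare decorator, so the decorated function is passed as maxsize, which raises exactly this TypeError; Python 3.8+ accepts the bare form. A hedged sketch of the two fixes (upgrade Python, or add parentheses at the decorator in fusion_datasets.py):

# Hedged sketch: calling the decorator (with or without an explicit maxsize)
# works on Python 3.7; the bare "@lru_cache" form requires Python >= 3.8.
from functools import lru_cache

@lru_cache()   # was: @lru_cache  (bare form, Python 3.8+ only)
def get_resize_function(size_h, size_w, target_ratio=1, random_crop=False):
    ...        # original body from fusion_datasets.py unchanged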

how to transfer/fine tune on my dataset?

Hi
I want to:

  1. Retrain your model for my video dataset
  2. finetune/transfer learn on my dataset

Can you please provide the minimum list and directory structure of the files/weights/datasets that I need to download for the above-mentioned tasks?

You have provided "python3 split_train.py" for fine-tuning. However, I am facing the following error when I run it:

{'name': 'DiViDe-MRSSSL-DivideHead-NOUP', 'num_epochs': 30, 'l_num_epochs': 0, 'warmup_epochs': 2.5, 'ema': True, 'save_model': True, 'batch_size': 16, 'num_workers': 6, 'need_upsampled': False, 'need_feat': True, 'need_fused': False, 'wandb': {'project_name': 'VQA_Experiments_2022'}, 'data': {'train': {'type': 'FusionDataset', 'args': {'phase': 'train', 'random_crop': False, 'anno_file': './examplar_data_labels/train_labels.txt', 'data_prefix': '../datasets/LSVQ', 'sample_types': {'fragments': {'fragments_h': 8, 'fragments_w': 8, 'fsize_h': 16, 'fsize_w': 16, 'aligned': 32}, 'resize': {'size_h': 128, 'size_w': 128}}, 'clip_len': 32, 'frame_interval': 2, 'num_clips': 1}}, 'val': {'type': 'FusionDataset', 'args': {'phase': 'test', 'anno_file': './examplar_data_labels/LIVE_VQC/labels.txt', 'data_prefix': '../datasets/LIVE_VQC', 'sample_types': {'fragments': {'fragments_h': 8, 'fragments_w': 8, 'fsize_h': 16, 'fsize_w': 16, 'aligned': 32}, 'resize': {'size_h': 128, 'size_w': 128}}, 'clip_len': 32, 'frame_interval': 2, 'num_clips': 4}}}, 'model': {'type': 'DiViDeAddEvaluator', 'args': {'divide_head': True, 'vqa_head': {'in_channels': 768, 'hidden_channels': 64}}}, 'optimizer': {'lr': 0.001, 'backbone_lr_mult': 0.1, 'wd': 0.05}, 'load_path': '../model_baselines/NetArch/swin_tiny_patch244_window877_kinetics400_1k.pth', 'test_load_path': './pretrained_weights/DiViDe-MRSSSL-DivideHead-BiLearn_s_dev_v0.0.pth'}
Traceback (most recent call last):
  File "split_train.py", line 627, in <module>
    main()
  File "split_train.py", line 365, in main
    print(opt["split_seed"])
KeyError: 'split_seed'
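A hedged workaround, based only on the traceback above: split_train.py expects a split_seed entry in the parsed yml options (presumably the seed used to split a single labelled set into train/val folds), so either add a line such as "split_seed: 42" to your option file or guard the lookup as sketched below. The default value here is an arbitrary assumption.

# Hedged guard sketch: read split_seed with a fallback instead of indexing it
# directly (opt is the dict parsed from the yml option file).
def get_split_seed(opt: dict, default: int = 42) -> int:
    return opt.get("split_seed", default)

# print(get_split_seed(opt))   # replaces print(opt["split_seed"])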

problem

swin_tiny_grpb
None False
Setting backbone: fragments_backbone

Segmentation fault

When I run vqa.py, I get something like this.

model-s and model-n

May I know the differences between model-s and model-n in split_train.py?

Correcting reference

Hi @TimothyHTimothy, thanks for the great work on VQA!

I found that you're citing the wrong reference (the conference version) for our RAPIQUE model. Could you correct it to cite the journal version instead in the camera-ready? Thank you very much for your consideration!

@article{tu2021rapique,
  title={RAPIQUE: Rapid and accurate video quality prediction of user generated content},
  author={Tu, Zhengzhong and Yu, Xiangxu and Wang, Yilin and Birkbeck, Neil and Adsumilli, Balu and Bovik, Alan C},
  journal={IEEE Open Journal of Signal Processing},
  volume={2},
  pages={425--440},
  year={2021},
  publisher={IEEE}
}

'Upsample' warnings and video decode errors

Hi, guys!
Thank you for your work. My environment: torch 1.10, torchvision 0.11.
After running 'python new_train.py -o options/fast/fast-b.yml', I get these warnings and errors:

  1. UpSample warnings (see attached screenshot)

  2. Decode errors (see attached screenshots)

Although the training process still seems to run, I want to know whether these are normal cases or not.
Thanks for your help.

Datasets request

Hello. Thanks for your great work. I am trying to retrain the model on the LSVQ dataset. However, the LSVQ request form is disabled and I cannot get the data. Would you share your copy of the LSVQ dataset with me? Thank you very much.

wandb.errors.UsageError: api_key not configured

I am trying to train with the given command: python new_train.py -o options/fast/fast-b.yml
but I am getting this error:
Traceback (most recent call last):
  File "/proj/noref-videomodel/FAST-VQA-and-FasterVQA/new_train.py", line 617, in <module>
    main()
  File "/proj/noref-videomodel/FAST-VQA-and-FasterVQA/new_train.py", line 394, in main
    run = wandb.init(
  File "/opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 1195, in init
    raise e
  File "/opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 1172, in init
    wi.setup(kwargs)
  File "/opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 306, in setup
    wandb_login._login(
  File "/opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_login.py", line 317, in _login
    wlogin.prompt_api_key()
  File "/opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_login.py", line 247, in prompt_api_key
    raise UsageError("api_key not configured (no-tty). call " + directive)
wandb.errors.UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key])

Do we need a wandb API key? If yes, how do we get one for this project?
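wandb is a third-party experiment-tracking service; a free account at wandb.ai provides an API key. If you do not want online tracking, it can also be disabled. A hedged sketch of both options, to be applied before the training script reaches wandb.init:

# Hedged sketch: either disable wandb for local runs or authenticate with
# your own API key before new_train.py calls wandb.init().
import os
import wandb

os.environ["WANDB_MODE"] = "disabled"   # option 1: no logging, no key needed
# wandb.login(key="<your-api-key>")     # option 2: use the key from your wandb.ai settings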

Running FAST-VQA.ipynb raises UnpicklingError: could not find MARK

File /DATA/jupyter/share/haodawei/FAST-VQA/fastvqa/apis/fast_vqa_model.py:55, in VQAModel.load_pretrained(self, pretrained_path, device)
     54 def load_pretrained(self, pretrained_path, device):
---> 55     state_dict = torch.load(pretrained_path, map_location=device)
     57     if "state_dict" in state_dict:
     58         state_dict = state_dict["state_dict"]

File /usr/local/python39/lib/python3.9/site-packages/torch/serialization.py:593, in load(f, map_location, pickle_module, **pickle_load_args)
    591     return torch.jit.load(opened_file)
    592     return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
--> 593 return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)

File /usr/local/python39/lib/python3.9/site-packages/torch/serialization.py:762, in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
    756 if not hasattr(f, 'readinto') and (3, 8, 0) <= sys.version_info < (3, 8, 2):
    757     raise RuntimeError(
    758         "torch.load does not work with file-like objects that do not implement readinto on Python 3.8.0 and 3.8.1. "
    759         f"Received object of type \"{type(f)}\". Please update to Python 3.8.2 or newer to restore this "
    760         "functionality.")
--> 762 magic_number = pickle_module.load(f, **pickle_load_args)
    763 if magic_number != MAGIC_NUMBER:
    764     raise RuntimeError("Invalid magic number; corrupt file?")

UnpicklingError: could not find MARK
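"could not find MARK" from the pickle module usually means the checkpoint file is truncated or not a checkpoint at all (for example, an HTML error page saved as .pth by a failed download). A hedged diagnostic sketch; the path below is an assumption, use whatever load_pretrained actually received:

# Hedged diagnostic sketch: a zip-format torch checkpoint starts with
# b'PK\x03\x04', while an HTML error page starts with b'<'; a tiny or
# wrong-looking file means the download should be repeated.
import os

path = "pretrained_weights/fast_vqa_v0_3.pth"   # assumed path, adjust as needed
print("size (bytes):", os.path.getsize(path))
with open(path, "rb") as f:
    print("first bytes:", f.read(4))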

How to draw Fig.9

Hi,

How can we draw the quality maps shown in Fig. 9? Will the code for the maps be released? Could you please provide some instructions?

finetune

Hello, when I fine-tune on KoNViD, the result is very bad (PLCC is only 0.378). The relevant lines of the .yml file are:

load_path: ./pretrained_weights/FAST_VQA_B_1_4.pth
test_load_path:

I have only modified the data paths.

[Solved] Segmentation fault (core dumped) in vqa.py

This issue is caused by importing decord and PyTorch at the same time. To fix this bug, reinstall decord from source, step by step:

sudo add-apt-repository ppa:savoury1/ffmpeg4
sudo apt-get update
sudo apt-get install -y build-essential python3-dev python3-setuptools make cmake
sudo apt-get install -y ffmpeg libavcodec-dev libavfilter-dev libavformat-dev libavutil-dev

Build and install with CUDA:

git clone --recursive https://github.com/dmlc/decord
cd decord
mkdir build && cd build
cmake .. -DUSE_CUDA=ON -DCMAKE_BUILD_TYPE=Release
make

Then install the Python package:

cd ../python
python3 setup.py install --user

Now decord is installed correctly and you can re-run vqa.py.

Why is FasterVQA 4 times more efficient than FAST-VQA?

It seems that the difference between FasterVQA and FAST-VQA is the application of St-GMS. However, in my understanding, St-GMS samples more regions in the temporal domain, which should not improve efficiency; on the contrary, the implementation may become less efficient because the loop over t becomes larger. Is that right? I don't know what's wrong with my understanding.

Looking forward to your reply.

Why are the output scores weird values like [-0.05591499]?

Hi, I downloaded fast_vqa_v0_3.pth and used A001.mp4 as the input video, then ran inference.py and fast_vqa_model.py. With both of them, the output scores are weird values like [-0.05591499], so I cannot compare them with label scores such as [80.232] to compute SROCC. Did I do something wrong?

Reproducing full resolution Swin-T baseline from FastVQA paper

Hello. Thank you for your great work!
I have a question about the full-resolution Swin-T baseline reported in the FAST-VQA paper. It is mentioned that fixed recognition features were regressed to get the baseline. Does this mean all frames of the video were used (no temporal sampling) and no fragmentation or resizing was done? Or was the temporally sampled video the input to the Swin-T model for generating the fixed features?
