
viper's People

Contributors

controlnet avatar devaansh100 avatar mschoenb97 avatar sachit-menon avatar surisdi avatar


viper's Issues

Parallelize computation between GPUs or utilize TPUs?

Hello,

I am wondering if it's possible to parallelize the GLIP and BLIP-2 computation over multiple GPUs, or to use TPUs for inference. I currently have access to three NVIDIA RTX A4500 GPUs, each with 20 GiB of VRAM. I suspect that should be enough VRAM to run inference with ViperGPT, but when I run the notebook I get GPU out-of-memory errors.

Alternatively, I am considering running inference on Kaggle's TPUs (128 GiB of memory, available 20 hours a week). However, I would need to use PyTorch/XLA to make the torch tensors compatible with TPUs. Where in the code might I make that change? Should I edit the source code to perform inference?

Tony,
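
For reference, the make_fn excerpt quoted further down this page shows that vision_processes.py already assigns each model to a GPU round-robin (gpu_number = counter % num_gpus), so exposing all three A4500s via CUDA_VISIBLE_DEVICES should spread GLIP and BLIP-2 across them. A minimal, self-contained sketch of that assignment logic (placeholder classes, not viper's real models):

    import torch

    class DummyModel:  # stand-in for GLIP / BLIP-2 / etc.; illustrative only
        def __init__(self, gpu_number=0):
            self.device = f"cuda:{gpu_number}" if torch.cuda.is_available() else "cpu"

    list_models = [DummyModel, DummyModel, DummyModel]
    num_gpus = max(torch.cuda.device_count(), 1)

    for counter, model_class in enumerate(list_models):
        model_instance = model_class(gpu_number=counter % num_gpus)  # round-robin over visible GPUs
        print(type(model_instance).__name__, "->", model_instance.device)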

GQA Evaluation in test-dev_balanced dataset

Thanks for the great work! I want to reproduce the GQA evaluation, but I ran into some problems. I checked the existing issue, but it did not resolve them, so I am opening a new one.

From that issue, I understand that the results in the paper were obtained on the test-dev_balanced split of GQA. However, using the code and config.yaml from GitHub, I only get about 0.25 accuracy on test-dev_balanced, while on the first 5000 questions of test-dev_all I get close to 0.5 (similar to the result in the paper). I don't understand this large difference under the same settings; the runs differ only in test-dev_balanced versus test-dev_all.

We also used stratified sampling to validate on the test-dev_balanced split: we randomly selected 200 questions from each of the ranges 0-2000, 2001-4000, 4001-6000, and 6001-8000. The results are below (we report the overall accuracy as well as the accuracy after removing samples whose generated code failed to compile).

Therefore, I would like to ask whether any special config settings are needed when evaluating on GQA, such as the BLIP model variant (blip2-flan-t5-xxl vs. blip2-flan-t5-xl), the load_models settings in base_config.yaml, or anything else.

If possible, could you provide some details on how the GQA dataset was evaluated? We wonder whether we did something wrong somewhere.

Thanks in advance!

split (test-dev_balanced)    acc (all samples)      acc (failed-to-compile removed)
0-2000                       0.24742268041237114    0.3582089552238806
2001-4000                    0.24861878453038674    0.3284671532846715
4001-6000                    0.2346368715083799     0.35
6001-8000                    0.23711340206185566    0.31724137931034485
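
For reference, a minimal sketch of the stratified sampling described above (200 random question indices drawn from each 2000-question range):

    import random

    random.seed(0)
    ranges = [(0, 2000), (2001, 4000), (4001, 6000), (6001, 8000)]
    sampled = []
    for lo, hi in ranges:
        sampled.extend(random.sample(range(lo, hi + 1), k=200))  # 200 questions per range
    print(len(sampled))  # 800 sampled question indices in total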

About the paper

Was this paper submitted to ICCV 2023? What was the outcome? Thanks.

How to debug in viper?

Hi, I'm attempting to replicate the quantitative results of ViperGPT on the RefCOCO dataset, but I've run into an issue. Specifically, the code appears to hang after loading the DepthEstimation model, without issuing any error message. Attempts to debug the issue using print or pdb have also been unsuccessful.

Could you please provide guidance on how to resolve this problem? Any assistance would be greatly appreciated.

CUDA out of memory (XVLM Model)

I am trying to run the "main_simple.ipynb" example script that was given, but I get the following error message when running from main_simple_lib import *:

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.77 GiB total capacity; 10.38 GiB already allocated; 12.31 MiB free; 10.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
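
For what it's worth, a minimal sketch of the max_split_size_mb mitigation the error message itself suggests; it reduces fragmentation but cannot compensate for genuinely insufficient VRAM:

    import os

    # Must be set before CUDA memory is first allocated (ideally before importing torch).
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import torch  # later allocations use the less fragmentation-prone allocator setting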

I reduced the BLIP-2 model size from XXL to XL in the config.

Additionally, I realized that the models are loaded from a list_models variable, so I printed them out as they were loaded consecutively, as seen here:

    counter_ = 0
    for model_class_ in list_models:                       # every model class defined in vision_models.py
        print("MODEL: " + str(model_class_))               # log each model as it is loaded
        for process_name_ in model_class_.list_processes():
            if process_name_ in config.load_models and config.load_models[process_name_]:
                consumers[process_name_] = make_fn(model_class_, process_name_, counter_)  # instantiates the model
                counter_ += 1

I see that the OOM traceback doesn't occur until the XVLM model is loaded.

Is there any way I can fix this issue? I'm currently running an RTX 3080 Ti with 12 GB of memory. I have tried not loading the XVLM model via the load_models configuration,

which resolves my OOM exception, but I then see new errors when I run execute_code(code, im, show_intermediate_steps=True).

I'm not sure if the TypeError: object of type 'NoneType' has no len() is due to the XVLM model not being loaded, however. Any help would be greatly appreciated!

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

I'm getting this error on Colab; is anyone else facing this issue?

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

RuntimeError                              Traceback (most recent call last)
/home/skhanuja/viper/main_simple.ipynb Cell 2 line 1
----> 1 from main_simple_lib import *

File ~/viper/main_simple_lib.py:32
     29 cache = Memory('cache/' if config.use_cache else None, verbose=0)
     31 mp.set_start_method('spawn', force=True)
---> 32 from vision_processes import forward, finish_all_consumers  # This import loads all the models. May take a while
     33 from image_patch import *
     34 from video_segment import *

File ~/viper/vision_processes.py:177
    175     for process_name_ in model_class_.list_processes():
    176         if process_name_ in config.load_models and config.load_models[process_name_]:
--> 177             consumers[process_name_] = make_fn(model_class_, process_name_, counter_)
    178             counter_ += 1
    180 queues_in = None

File ~/viper/vision_processes.py:43, in make_fn(model_class, process_name, counter)
     40 num_gpus = torch.cuda.device_count()
     41 gpu_number = counter % num_gpus
---> 43 model_instance = model_class(gpu_number=gpu_number)
     45 def _function(*args, **kwargs):
     46     if process_name != model_class.name:
...
File ~/miniconda3/envs/vipergpt/lib/python3.10/site-packages/torch/serialization.py:282, in _open_zipfile_reader.__init__(self, name_or_buffer)
    281 def __init__(self, name_or_buffer) -> None:
--> 282     super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

Dataset-specific API examples

Hi,

First of all thanks for your awesome work, really inspiring and also kudos for the clean and elegant code!

I was wondering, would it be possible to release the API examples specific for each dataset, i.e. Listings 2-5?

I know the examples in the wild don't need any API examples per se and work out of the box, but I'd assume that for the benchmarks and the final numbers these in-context examples play a significant role.
I'm mainly curious to understand how much "hand-holding" the model needs, i.e. how detailed/extensive these examples have to be (e.g. in comparison to the ones already given in Listings 2-5)?
Also, how much do these dataset-specific examples need to "cover" the distribution of possible questions for a particular dataset?

Thanks!

error while loading BLIPModel

Hi, I'm attempting to run main_batch.py, but I get the following error while loading the BLIP model:

ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
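
A hedged sketch of the workaround the error message suggests, assuming the installed transformers version accepts these from_pretrained arguments; the simpler fix is usually to switch blip_v2_model_type to the smaller blip2-flan-t5-xl so the whole model fits on the GPU:

    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    model_name = "Salesforce/blip2-flan-t5-xl"  # smaller variant; the xxl model needs far more VRAM

    processor = Blip2Processor.from_pretrained(model_name)
    model = Blip2ForConditionalGeneration.from_pretrained(
        model_name,
        load_in_8bit=True,                     # 8-bit weights, as viper's BLIP loading does
        device_map="auto",                     # let accelerate place layers across GPU/CPU
        load_in_8bit_fp32_cpu_offload=True,    # per the error text: keep CPU-dispatched modules in fp32
    )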

OkVQA Evaluation

Thanks for the great work! I love how interpretable ViperGPT is! I am trying to evaluate the results on the OK-VQA dataset, but I am facing a similar issue as Issue #24, wherein the model generates the full answer instead of the specific (one-word) answer required for exact-match accuracy. I also tried being a bit "lenient" when calculating accuracy, marking a prediction as correct if the answer word appears anywhere in the model's full-sentence prediction, but I still got an accuracy lower than that reported in the paper.

Here are evaluation metrics from my experiments:
Exact-Match Accuracy (Wrong answer if prediction does not exactly match the answer): 9.435%
"Lenient" Accuracy (Correct answer if the answer word exists in the model's full length prediction): 21.62%

I am using GPT-3.5 for code generation and blip2-flan-t5-xl for visual queries. Could using blip2-flan-t5-xl instead of blip2-flan-t5-xxl have resulted in such a large drop in accuracy? I would have expected the "lenient" accuracy to be at least as high as the number reported in the paper, since it may count a few answers as correct even when they aren't.
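
For context, a minimal sketch of the two scoring rules described above (toy data; this is not the OK-VQA evaluation script):

    def exact_match_acc(preds, answers):
        # correct only if the prediction matches the reference answer exactly
        return sum(p.strip().lower() == a.strip().lower() for p, a in zip(preds, answers)) / len(preds)

    def lenient_acc(preds, answers):
        # correct if the reference answer appears anywhere in the full-sentence prediction
        return sum(a.strip().lower() in p.strip().lower() for p, a in zip(preds, answers)) / len(preds)

    preds = ["the dog is playing frisbee", "a red apple"]
    answers = ["frisbee", "banana"]
    print(exact_match_acc(preds, answers), lenient_acc(preds, answers))  # 0.0 0.5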

Problem with maximum context length using text-davinci-003

Hi, since Codex is not available anymore, I've tried to use text-davinci-003, but OpenAI always sends back the following error:

"This model's maximum context length is 4097 tokens, however, you requested 5270 tokens (4758 in your prompt; 512 for the completion). Please reduce your prompt; or completion length."

How do you deal with the max context length problem?
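
As a starting point, a hedged sketch for measuring how many tokens the prompt uses, so the API prompt file can be shortened to fit within the 4097-token limit minus the 512 completion tokens. It assumes the tiktoken package; the prompt path is illustrative, so use whatever file the config points to:

    import tiktoken

    enc = tiktoken.encoding_for_model("text-davinci-003")

    with open("prompts/api.prompt") as f:   # hypothetical path
        prompt = f.read()

    n_prompt_tokens = len(enc.encode(prompt))
    max_completion = 512
    print(n_prompt_tokens, "prompt tokens;",
          "fits" if n_prompt_tokens + max_completion <= 4097 else "too long for text-davinci-003")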

Bug when using BLIP2 models

Hello!
Thank you for sharing the code of ViperGPT. I have noticed that the cropped_image tensor in the ImagePatch class is divided by 255. However, the BLIP-2 model expects PIL images or tensors at the original image scale, so when using the BLIP-2 model it may be necessary to multiply the cropped_image tensor back up.
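
A minimal sketch of the rescaling being described (the helper name is made up for illustration and mirrors the issue, not the repo's exact code):

    import torch
    from torchvision.transforms.functional import to_pil_image

    def to_blip_input(cropped_image: torch.Tensor):
        # ImagePatch keeps the crop in [0, 1]; undo the /255 before handing it to BLIP-2
        rescaled = (cropped_image * 255).clamp(0, 255).to(torch.uint8)
        return to_pil_image(rescaled)  # BLIP-2 preprocessing accepts a PIL image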

Compilation issues after running setup.py inside GLIP directory

Hello,

I followed the instructions in the README to install and run viper. After cd GLIP and running python setup.py clean --all build develop --user, I get plenty of DeprecatedTypeProperties& at::Tensor::type() deprecation warnings, which generally show this message:

 warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). 

It does finish compiling with warnings, and when I try importing main_simple_lib in main_simple.ipynb I get (as expected) an error:

Loading BLIP...
2023-04-24 12:42:26.690282: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
[libprotobuf FATAL google/protobuf/stubs/common.cc:83] This program was compiled against version 3.9.2 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.19.6).  Contact the program author for an update.  If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library.  (Version verification failed in "bazel-out/k8-opt/bin/tensorflow/core/framework/tensor_shape.pb.cc".)
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what():  This program was compiled against version 3.9.2 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.19.6).  Contact the program author for an update.  If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library.  (Version verification failed in "bazel-out/k8-opt/bin/tensorflow/core/framework/tensor_shape.pb.cc".)
Aborted

I believe the problem could be related to an incompatibility between libraries, but I made sure to install everything from requirements.txt and have an active conda environment, so at this point I do not really know what else it could be. Does anyone know why I could be getting this issue? Thanks in advance 🥲

main batch without multiprocessing

Whenever I turn off multiprocessing in main_batch, nothing runs: the models load and then nothing else happens. I've tried debugging it but haven't had any success.

TypeError: 'type' object is not subscriptable on running main_simple.ipynb

Hi,

I followed the setup in the README.md. I am getting the following error when running the first cell of the main_simple.ipynb notebook:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 from main_simple_lib import *

File ~/viper/main_simple_lib.py:32
     29 cache = Memory('cache/' if config.use_cache else None, verbose=0)
     31 mp.set_start_method('spawn', force=True)
---> 32 from vision_processes import forward, finish_all_consumers  # This import loads all the models. May take a while
     33 from image_patch import *
     34 from video_segment import *

File ~/viper/vision_processes.py:21
     17 console = Console(highlight=False)
     19 if mp.current_process().name == 'MainProcess':
     20     # No need to initialize the models inside each process
---> 21     import vision_models
     22     # Create a list of all the defined models
     23     list_models = [m[1] for m in inspect.getmembers(vision_models, inspect.isclass)
     24                    if vision_models.BaseModel in m[1].__bases__]

File ~/viper/vision_models.py:144
    140         to_return = to_return.cpu()
    141         return to_return  # To save: plt.imsave(path_save, prediction.cpu().numpy())
--> 144 class CLIPModel(BaseModel):
    145     name = 'clip'
    147     def __init__(self, gpu_number=0, version="ViT-L/14@336px"):  # @336px

File ~/viper/vision_models.py:232, in CLIPModel()
    227     negative_text_features = F.normalize(negative_text_features, dim=-1)
    229     return negative_text_features
    231 @torch.no_grad()
--> 232 def classify(self, image: Union[torch.Tensor, list], categories: list[str], return_index=True):
    233     is_list = isinstance(image, list)
    234     if is_list:

TypeError: 'type' object is not subscriptable
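
For reference, the list[str] annotation used in vision_models.py requires Python 3.9+; if the notebook kernel resolves to an older interpreter, a typing-based annotation works on all versions. A hedged sketch, not the repo's code:

    from typing import List, Union
    import torch

    # Equivalent signature using typing generics, valid on Python 3.7+:
    def classify(self, image: Union[torch.Tensor, list], categories: List[str], return_index=True):
        ...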

Can you provide prompts for ChatGPT 3.5?

I've encountered significant performance gaps while reproducing your work. I understand that your work is based on code-davinci-002, and ViperGPT's performance with GPT-3.5 seems far from what is reported in your paper. Could you offer prompts designed for ChatGPT (for example, for the RefCOCO dataset)? This is crucial for my work, and I appreciate your assistance!

Clarification over API use examples in prompt

Hello, thank you for the great work! I'm checking your prompts, especially your recently released prompt for GQA (prompts/benchmark/gqa.yaml). I see two types of example programs: some for demonstrating API usage, and some as in-context demonstration. Could you clarify how you curate these examples? Are they automatically generated or manually curated, and do you have specific criteria when selecting these examples?

Also, I see that some of the example questions are from GQA training set, and a few are even from the test set (for example, "Is that blanket to the right of a pillow"). Is this a bug? Does it potentially leak the test set?

Disk quota for running main_batch.py

Hi,

When I run main_batch with CONFIG_NAME=base_config python main_batch.py, it downloads files that saturate my 20 GB disk quota (I am working on a remote cluster), reports a disk-quota-exceeded error, and exits. May I ask what disk quota I should expect when running main_batch with base_config (I only modified the directories for the pretrained models and data)?

Unable to download GLIP large checkpoint

Hi Team,

Thank you for your amazing work!

I get the below error when trying to download GLIP large checkpoint using wget -nc -P $PRETRAINED_MODELS_PATH/GLIP/checkpoints https://penzhanwu2bbs.blob.core.windows.net/data/GLIPv1_Open/models/glip_large_model.pth

Error:

Resolving penzhanwu2bbs.blob.core.windows.net (penzhanwu2bbs.blob.core.windows.net)... 20.60.68.132
Connecting to penzhanwu2bbs.blob.core.windows.net (penzhanwu2bbs.blob.core.windows.net)|20.60.68.132|:443... connected.
HTTP request sent, awaiting response... 409 Public access is not permitted on this storage account.
2023-05-30 16:21:45 ERROR 409: Public access is not permitted on this storage account..

Consider removing HiddenPrints?

I was trying to run your code (e.g. from vision_processes import forward), but the program would crash silently.

I believe the program crashed within a HiddenPrints context manager, in this case at viper/vision_models.py, lines 1223 to 1224 (commit 03ba31e):

    with warnings.catch_warnings(), HiddenPrints("XVLM"):
        model = XVLMBase(config_xvlm, use_contrastive_loss=True, vision_config=vision_config)

But accordingly, this never invoked HiddenPrints.__exit__ (viper/utils.py, line 194 in 03ba31e):

    def __exit__(self, exc_type, exc_val, exc_tb):

So I never discovered the issue and consequently nothing else in my program was able to print anything. I had no idea why and this was very difficult to debug.

Would you please consider removing HiddenPrints from this codebase? I think it is better to show extra warnings than to risk such "unknown crashes". Thanks!
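
As one alternative to removing it outright, a hedged sketch (not the repo's implementation) of a print-suppressing context manager that always restores stdout/stderr and re-raises exceptions, so crashes inside the block stay visible:

    import os
    import sys

    class SafeHiddenPrints:
        def __init__(self, name: str = ""):
            self.name = name

        def __enter__(self):
            self._stdout, self._stderr = sys.stdout, sys.stderr
            self._devnull = open(os.devnull, "w")
            sys.stdout = sys.stderr = self._devnull
            return self

        def __exit__(self, exc_type, exc_val, exc_tb):
            # Always restore the real streams before closing devnull,
            # and return False so any exception propagates instead of being swallowed.
            sys.stdout, sys.stderr = self._stdout, self._stderr
            self._devnull.close()
            return False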

"process_guesses" function in Listing 4. OK-VQA example

Hi, thanks for sharing the code.

When reproducing your results on OK-VQA, I found that the in-context example you used for OK-VQA contains a function named process_guesses, which does not exist in the repo. Could you please provide its implementation?

Thanks a lot!

codex() returns None

Hi, I am really confused about this error.

After I finished all the installation, I started to run main_batch.py on NExTQA dataset, and my config file is:

path_pretrained_models: './pretrained_models'       # Path to the pretrained models
execute_code: False                                 # Execute the code after generating it. Only applies to main_batch

dataset:                                            # Dataset configuration
    dataset_name: 'NExTQA'                       # Dataset name
    data_path: '/content/gdrive/MyDrive'                               # Dataset path
    split: 'test'                                       # Dataset split. If '', it assumes there is only one split
    max_samples:                                    # Maximum number of samples to load
    batch_size: 20                                  # Batch size
    start_sample:                                  # Start sample index. Only used if max_samples is not None

load_models:                                        # Which pretrained models to load
    maskrcnn: False
    clip: False
    glip: True
    owlvit: False
    tcl: False
    gpt3_qa: True
    gpt3_general: True
    depth: True
    blip: True
    saliency: False
    xvlm: True
    codex: True
    codellama: False

detect_thresholds:                                  # Thresholds for the models that perform detection
    glip: 0.5
    maskrcnn: 0.8
    owlvit: 0.1
ratio_box_area_to_image_area: 0.0                   # Any detected patch under this size will not be returned
crop_larger_margin: True                            # Increase size of crop by 10% to include more context

verify_property:                                    # Parameters for verify_property
    model: xvlm                                     # Model to use for verify_property
    thresh_clip: 0.6
    thresh_tcl: 0.25
    thresh_xvlm: 0.6

best_match_model: xvlm                              # Which model to use for best_[image, text]_match

gpt3:                                               # GPT-3 configuration
    n_votes: 1                                      # Number of tries to use for GPT-3. Use with temperature > 0
    qa_prompt: ./prompts/gpt3/gpt3_qa.txt
    guess_prompt: ./prompts/gpt3/gpt3_process_guess.txt
    temperature: 0.                                 # Temperature for GPT-3. Almost deterministic if 0
    model: text-davinci-003                         # See openai.Model.list() for available models

codex:
    temperature: 0.                                 # Temperature for Codex. (Almost) deterministic if 0
    best_of: 1                                      # Number of tries to choose from. Use when temperature > 0
    max_tokens: 512                                 # Maximum number of tokens to generate for Codex
    prompt: ./prompts/chatapi.prompt                # Codex prompt file, which defines the API. (doesn't support video for now due to token limits)
    model: gpt-3.5-turbo                            # Codex model to use. [code-davinci-002, gpt-3.5-turbo, gpt-4]. See openai.Model.list() for available models

# Saving and loading parameters
save: True                                          # Save the results to a file
save_new_results: True                              # If False, overwrite the results file
results_dir: ./results/                             # Directory to save the results
use_cache: True                                     # Use cache for the models that support it (now, GPT-3)
clear_cache: False                                  # Clear stored cache
use_cached_codex: False                             # Use previously-computed Codex results
cached_codex_path: ''                               # Path to the csv results file from which to load Codex results
log_every: 20                                       # Log accuracy every n batches
wandb: False                                        # Use Weights and Biases

blip_half_precision: True                           # Use 8bit (Faster but slightly less accurate) for BLIP if True
blip_v2_model_type: blip2-flan-t5-xxl               # Which model to use for BLIP-2

use_fixed_code: False                               # Use a fixed code for all samples (do not generate with Codex)
fixed_code_file: ./prompts/fixed_code/blip2.prompt  # Path to the fixed code file

but I got None from codex() in each iteration,

    if not config.use_cached_codex:
        codes = codex(prompt=batch['query'], base_prompt=base_prompt, input_type=input_type,
                      extra_context=batch['extra_context'])

(codes here is None)

I have no idea how to fix or debug this error. Are there any suggestions for locating or resolving it? Thanks a lot.
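
One hedged first step (this is not part of the repo) is to confirm, independently of viper, that the API key and the gpt-3.5-turbo model named in the codex section of the config respond at all; a failure here (auth, quota, model access) would also make code generation inside viper fail. The snippet assumes the pre-1.0 openai Python package of that era:

    import openai

    openai.api_key = "sk-..."  # the same key you export for viper

    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Reply with the single word OK."}],
        max_tokens=5,
        temperature=0,
    )
    print(resp.choices[0].message["content"])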

API for VideoQA

Hi, thanks for the great work! I really love this paper and really happy to try some VideoQA examples on this model.

In this process, I faced some questions. It will be really nice of you to share any useful information with me!

  1. The API in the paper and in the api.prompt file are slightly different; may I ask which is the final version?

  2. Also, I noticed the "# Examples of how to use the API" comment in the prompt, and I am wondering whether you also add some in-context examples to the prompt. If so, may I know the exact number and the exact examples? Knowing this prompt would greatly help me reproduce the results on the VideoQA task.

Any help would be highly appreciated. Thanks in advance!

MCQ Evaluation

Hi! I was wondering how to make the MCQ evaluation work - from my understanding the queries.csv file has a "possible_answers" column which takes a list of possible answers, but for some reason my results keep giving me answers outside of the specified list. For example, here is a line of my queries.csv:

index,sample_id,possible_answers,query_type,query,answer,image_name
0,0,"['purple', 'red', 'green', 'yellow']",,What color is the flower?,purple,flower.jpeg

But the returned result is always "blue", which I don't want to be an option. I was wondering if there is a way to fix this behavior.
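
As a stopgap (this is not how the repo implements MCQ, just a hedged post-processing sketch), the free-form prediction can be snapped onto the closest entry in possible_answers:

    from difflib import SequenceMatcher

    def constrain_to_options(prediction: str, possible_answers: list) -> str:
        # pick the listed option that is most similar to the model's free-form answer
        return max(possible_answers,
                   key=lambda opt: SequenceMatcher(None, prediction.lower(), opt.lower()).ratio())

    # e.g. maps an off-list answer onto one of the listed options
    print(constrain_to_options("blue", ["purple", "red", "green", "yellow"]))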

Can't find a version for "decord" when running setup

I'm on an M1 MacBook Pro. After running the setup script with bash, it starts installing dependencies but fails when it gets to decord:

  Using cached accelerate-0.18.0-py3-none-any.whl (215 kB)
Collecting backoff==2.2.1
  Using cached backoff-2.2.1-py3-none-any.whl (15 kB)
Collecting bitsandbytes==0.38.1
  Using cached bitsandbytes-0.38.1-py3-none-any.whl (104.3 MB)
Collecting cityscapesscripts==2.2.1
  Using cached cityscapesScripts-2.2.1-py3-none-any.whl (473 kB)
ERROR: Could not find a version that satisfies the requirement decord==0.6.0 (from versions: none)
ERROR: No matching distribution found for decord==0.6.0
download_models.sh: line 9: wget: command not found

Please advise.

GQA Evaluation

Thanks for the great work! I wanted to reproduce evaluation on GQA, however, I am not sure how I can do that.

GQA treats question answering as a classification problem; however, I am not sure how to do that in this setting. Previous models (specifically LXMERT) train a classifier for this, but how do we replicate that here?

I tried sentence similarity models; however, the answers are only single words, which is why that approach is not working too well.

If possible, could you provide the code, or let me know the implementation used for the same?

Thanks in advance!

"AttributeError: 'VideoSegment' object has no attribute 'shape' when running GPT-3.5-turbo generated code

Hi, I'm currently trying to implement video question-answering using the code generated by GPT-3.5-turbo, as described in the paper. However, when executing the generated code, I'm encountering an AttributeError: 'VideoSegment' object has no attribute 'shape'.
The relevant portion of the code is as follows:

video_segment = VideoSegment(video)
last_frame = ImagePatch(video_segment, -1)

According to the paper, ImagePatch(video_segment, -1) should be a valid operation that retrieves the last frame from the video segment to create an ImagePatch. However, it seems like this operation is not actually implemented in the code.
Could you please provide guidance on how this is supposed to work? Is there a step in the process that is not explicitly stated in the paper, such as extracting the last frame from the video_segment before passing it to ImagePatch? Any clarification would be greatly appreciated.
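
As a hedged workaround sketch (the attribute name below is hypothetical; check video_segment.py and image_patch.py for the real API), one option is to extract the last frame tensor first and build the ImagePatch from that instead of from the VideoSegment itself:

    # Hypothetical sketch, not the repo's API:
    video_segment = VideoSegment(video)
    last_frame_tensor = video_segment.trimmed_video[-1]  # hypothetical attribute holding the frame tensors
    last_frame = ImagePatch(last_frame_tensor)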

Recommended vram/reliable method to run CPU only?

Just wondering how much VRAM you recommend. My GPU has 16 GB, but that doesn't seem to be enough, as I'm running into OOM errors. Alternatively, is there a reliable method to run CPU-only? I've tried forcing the device in vision_models.py, line 36, but it looks like that doesn't cover everything.

BLIP-2 Accelerate ImportError

Thanks for the great work and the latest update on the eval script. When trying to run the model, I hit an accelerate-related ImportError while loading the BLIP-2 model.

I'm wondering how to solve this problem, thanks!

Video example

Hi, is there any example to run for a given video? It is kind of confusing how to do this.

Not working on Apple Silicon

I'm trying to get this working on my Mac M2. However, Decord doesn't seem to have a version for Apple Silicon. Alternatively, I've been trying to get it to work with Google Colab, but Conda + Colab has been difficult. Any suggestions for what to do?

Issues downloading models from Google Drive

Hey, thanks for your great work!

I was just trying to download the pretrained models and encountered the following issues with gdown.

For example, this command:

gdown "https://drive.google.com/uc?id=1Cb1azBdcdbm0pRMFs-tupKxILTCXlB4O" -O $PRETRAINED_MODELS_PATH/TCL/TCL_4M.pth

results in:

Access denied with the following error:

        Cannot retrieve the public link of the file. You may need to change
        the permission to 'Anyone with the link', or have had many accesses. 

You may still be able to access the file from the browser:

         https://drive.google.com/uc?id=1Cb1azBdcdbm0pRMFs-tupKxILTCXlB4O

Of course, I downloaded the files manually instead, but it would be great if you could fix this. I had this issue for the X-VLM, TCL, and InSPyReNet models.

The opensource library

In your paper you mention: "To promote research in this direction, we develop a Python library enabling rapid development for program synthesis for visual tasks, which will be opensourced upon publication." Would you be willing to share the address of this Python library, or share it in another form?

InvalidRequestError: The model: `code-davinci-002` does not exist

Hello,

I managed to confirm that the API key I have from OpenAI can query the OpenAI API via their test script, without a RateLimitError. However, I have not been able to get ViperGPT running in the Jupyter notebook test example yet.

I have compiled maskrcnn_benchmark on | NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.4

It would keep running the get_code function for hours, and when I interrupt the Jupyter notebook I get the error below.

InvalidRequestError: The model: code-davinci-002 does not exist
What might the issue be?

Request for Generated Code from Experiments

Hi there,

I'm really interested in your project and would like to try it out. However, I noticed that your experiments involve generating some code that can be time-consuming and expensive to reproduce. As it would require significant computational resources to generate the code from scratch, I was wondering if you would be willing to provide the generated code so that others can build upon your work without incurring these costs. It would be really helpful to have this code as a starting point for my own experiments.

Thank you for your time and consideration.

Appendix B, listings 2,3,4, and 5?

I am looking for the per-dataset API examples hinted at in Listings 2, 3, 4, and 5 of the appendix.
From a quick search I could not find them in the codebase. Are they available somewhere?

Multiple ModuleNotFoundError

I am running this within the vipergpt conda environment. I've tried conda install dill as well as pip install -r requirements.txt; the dill package seems to be installed properly, and the latter did end up installing a bunch of packages that apparently were not already there (I'm not very familiar with conda, so maybe pip and conda have different installation destinations). However, I still get multiple ModuleNotFoundError exceptions.

Code not moving across codex_helper

Hi,
When I run main_simple.ipynb in debugger mode, I notice that the code never gets past codex_helper here. It repeatedly runs the same function but never moves beyond it. What could be a possible solution to this?

Thanks
