
Scenic


Scenic is a codebase with a focus on research around attention-based models for computer vision. Scenic has been successfully used to develop classification, segmentation, and detection models for multiple modalities including images, video, audio, and multimodal combinations of them.

More precisely, Scenic is (i) a set of shared lightweight libraries solving tasks commonly encountered when training large-scale (i.e. multi-device, multi-host) vision models, and (ii) several projects containing fully fleshed-out, problem-specific training and evaluation loops that use these libraries.

Scenic is developed in JAX and uses Flax.


What we offer

Among other things, Scenic provides:

  • Boilerplate code for launching experiments, summary writing, logging, profiling, etc;
  • Optimized training and evaluation loops, losses, metrics, bi-partite matchers, etc;
  • Input-pipelines for popular vision datasets;
  • Baseline models, including strong non-attentional baselines.

SOTA models and baselines in Scenic

Scenic hosts a number of SOTA models and baselines, which were either developed using Scenic or reimplemented in it:

Projects that were developed in Scenic or used it for their experiments:

More information can be found in projects.

Baselines that were reproduced in Scenic:

More information can be found in baseline models.

Philosophy

Scenic aims to facilitate rapid prototyping of large-scale vision models. To keep the code simple to understand and extend, we prefer forking and copy-pasting over adding complexity or increasing abstraction. Only when functionality proves to be widely useful across many models and tasks may it be upstreamed to Scenic's shared libraries.

Getting started

  • See projects/baselines/README.md for a walk-through of the baseline models and instructions on how to run the code.
  • If you would like to contribute to Scenic, please check out the Philosophy, Code structure and Contributing sections. Should your contribution be a part of the shared libraries, please send us a pull request!

Quickstart

You will need Python 3.9 or later. Download the code from GitHub

$ git clone https://github.com/google-research/scenic.git
$ cd scenic
$ pip install .

and run training for ViT on ImageNet:

$ python scenic/main.py -- \
  --config=scenic/projects/baselines/configs/imagenet/imagenet_vit_config.py \
  --workdir=./

Note that for specific projects and baselines, you might need to install extra packages that are mentioned in their README.md or requirements.txt files.

There is also a minimal colab that trains a simple feed-forward model using Scenic.
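
As a rough, self-contained illustration of what such a minimal setup looks like, here is a small JAX/Flax sketch; it is not the colab's actual contents, and the model, data, and hyper-parameters below are stand-ins:

import jax
import jax.numpy as jnp
import flax.linen as nn
import optax


class MLP(nn.Module):
  """Two-layer feed-forward classifier (stand-in model)."""
  num_classes: int = 10

  @nn.compact
  def __call__(self, x):
    x = nn.Dense(64)(x)
    x = nn.relu(x)
    return nn.Dense(self.num_classes)(x)


model = MLP()
rng = jax.random.PRNGKey(0)
x = jax.random.normal(rng, (32, 784))      # stand-in batch of inputs
y = jax.random.randint(rng, (32,), 0, 10)  # stand-in labels
params = model.init(rng, x)
tx = optax.sgd(learning_rate=0.1)
opt_state = tx.init(params)


@jax.jit
def train_step(params, opt_state, x, y):
  def loss_fn(p):
    logits = model.apply(p, x)
    return optax.softmax_cross_entropy_with_integer_labels(logits, y).mean()
  loss, grads = jax.value_and_grad(loss_fn)(params)
  updates, opt_state = tx.update(grads, opt_state)
  return optax.apply_updates(params, updates), opt_state, loss


params, opt_state, loss = train_step(params, opt_state, x, y)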

Scenic component design

Scenic is designed to offer several levels of abstraction: it supports projects that only require changing hyper-parameters by defining config files, as well as projects that need customization of the input pipeline, model architecture, losses and metrics, or the training loop. To make this possible, the code in Scenic is organized as either project-level code, which is customized code for specific projects or baselines, or library-level code, which is common functionality and general patterns adopted by the majority of projects. The project-level code lives in the projects directory.


Library-level code

The goal is to keep the library-level code minimal and well-tested and to avoid introducing extra abstractions to support minor use-cases. Shared libraries provided by Scenic are split into:

  • dataset_lib: Implements IO pipelines for loading and pre-processing data for common computer vision tasks and benchmarks (see "Tasks and Datasets" section). All pipelines are designed to be scalable and support multi-host and multi-device setups, taking care of dividing data among multiple hosts, handling incomplete batches, caching, pre-fetching, etc.
  • model_lib: Provides
    • several abstract model interfaces (e.g. ClassificationModel or SegmentationModel in model_lib.base_models) with task-specific losses and metrics (see the sketch after this list);
    • neural network layers in model_lib.layers, focusing on efficient implementation of attention and transformer layers;
    • accelerator-friendly implementations of bipartite matching algorithms in model_lib.matchers.
  • train_lib: Provides tools for constructing training loops and implements several optimized trainers (classification trainer and segmentation trainer) that can be forked for customization.
  • common_lib: General utilities, such as logging and debugging modules, functionality for processing raw data, etc.
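
For illustration, here is a minimal sketch of how a project can plug a Flax module into one of these abstract model interfaces. The import path and the build_flax_model method name are assumptions based on the description above; check scenic/model_lib/base_models for the actual interface.

import flax.linen as nn
# Assumed import path; the actual module layout may differ between versions.
from scenic.model_lib.base_models.classification_model import ClassificationModel


class TinyCNN(nn.Module):
  """A small convolutional classifier used as the Flax backbone."""
  num_classes: int

  @nn.compact
  def __call__(self, x, *, train: bool = False):
    x = nn.Conv(features=32, kernel_size=(3, 3))(x)
    x = nn.relu(x)
    x = x.mean(axis=(1, 2))  # global average pooling over spatial dims
    return nn.Dense(self.num_classes)(x)


class TinyCNNClassificationModel(ClassificationModel):
  """Hypothetical wrapper: inherits classification losses and metrics."""

  def build_flax_model(self) -> nn.Module:
    # `dataset_meta_data` is assumed to carry the label count; the actual
    # attribute name may differ between Scenic versions.
    return TinyCNN(num_classes=self.dataset_meta_data['num_classes'])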

Project-level code

Scenic supports the development of customized solutions for custom tasks and data via the concept of a "project". There is no one-size-fits-all recipe for how much code a project should reuse. Projects can consist of only configs, using the common models, trainers, and tasks/data that live in the library-level code, or they can fork any of that functionality and redefine layers, losses, metrics, logging methods, tasks, architectures, as well as training and evaluation loops. The modularity of the library-level code lets projects sit anywhere on the spectrum from "run-as-is" to "fully customized".

Common baselines such as ResNet and the Vision Transformer (ViT) are implemented in the projects/baselines project. Forking models in this directory is a good starting point for new projects.
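
Configs are plain Python files read with ml_collections, so a new project often starts from nothing more than a config. The sketch below shows the general shape of such a file; the field names are illustrative, since each trainer defines which fields it actually reads.

import ml_collections


def get_config() -> ml_collections.ConfigDict:
  """Returns a minimal, illustrative Scenic-style experiment config."""
  config = ml_collections.ConfigDict()
  config.experiment_name = 'my_vit_experiment'   # illustrative field names
  config.dataset_name = 'imagenet'
  config.batch_size = 128
  config.num_training_epochs = 90
  config.lr_configs = ml_collections.ConfigDict()
  config.lr_configs.base_learning_rate = 3e-3
  return config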

Citing Scenic

If you use Scenic, you can cite our white paper. Here is an example BibTeX entry:

@InProceedings{dehghani2021scenic,
    author    = {Dehghani, Mostafa and Gritsenko, Alexey and Arnab, Anurag and Minderer, Matthias and Tay, Yi},
    title     = {Scenic: A JAX Library for Computer Vision Research and Beyond},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2022},
    pages     = {21393-21398}
}

Disclaimer: This is not an official Google product.


scenic's Issues

DETR matcher function

Hi,

I have several issues regarding the matcher function:

[1] In the default config file, the matcher function specified is 'hungarian_cover_tpu'. This function is not supported in the code; the only available Hungarian matching functions are 'hungarian_tpu' and 'hungarian_scan_tpu'. I am trying to run the code on a GCP TPU; which one do you think is more efficient?

config.matcher = 'hungarian_cover_tpu'

[2] I tried to run the sinkhorn matcher, and I get the following exception when invoking 'sample_best_permutation':

  File "scenic/projects/baselines/detr/detr_base_model.py", line 447, in loss_function
    matches = self.matcher(cost, n_cols)
  File "scenic/projects/baselines/detr/detr_base_model.py", line 202, in matcher
    return jax.lax.stop_gradient(matcher_fn(cost))
  File "scenic/model_lib/matchers/sinkhorn.py", line 146, in sinkhorn_matcher
    permutation = sample_best_permutation(rng, coupling, cost, num_permutations)
  File "scenic/model_lib/matchers/sinkhorn.py", line 94, in sample_best_permutation
    perms = vec_sample_permutation(key, coupling)
  File "scenic/model_lib/matchers/sinkhorn.py", line 65, in sample_permutation
    w = jnp.einsum('bnm,bm->bn', coupling, v)
  File ".../.local/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 4333, in einsum
    operands, contractions = einsum_contract_path_fn(
  File "/usr/local/lib/python3.8/dist-packages/opt_einsum/contract.py", line 238, in contract_path
    raise ValueError("Size of label '{}' for operand {} ({}) does not match previous "
ValueError: Size of label 'm' for operand 1 (101) does not match previous terms (100).

The only thing I changed was the backbone, and since the matcher is applied on the decoder output I don't think it should change much.
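
For reference, the einsum in the traceback ('bnm,bm->bn') contracts the last axis of both operands, so their trailing sizes must agree; the 101-vs-100 mismatch can be reproduced in isolation. This snippet only demonstrates the shape check, not the cause; the off-by-one likely comes from the number of predicted/padded target slots changing along with the backbone.

import jax.numpy as jnp

coupling = jnp.zeros((2, 100, 100))
v_ok = jnp.zeros((2, 100))
v_bad = jnp.zeros((2, 101))

jnp.einsum('bnm,bm->bn', coupling, v_ok)     # trailing sizes agree: fine
# jnp.einsum('bnm,bm->bn', coupling, v_bad)  # ValueError: size of label 'm' ...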

vit demo not running past one epoch

When running the base config, e.g.:

time python3 main.py --config=projects/baselines/configs/imagenet/imagenet_vit_config.py --workdir=./fp32-test

with the only modification being a batch size of 48 (I am using 2 NVIDIA GPUs), the code dies at the end of the first epoch:

I0929 15:28:35.031877 139903179183936 local.py:41] Setting work unit notes: 0.4% @9277, 0.3 steps/s, ETA: 153959 min (90 min : 0.0% checkpoint, 23.7% eval)
I0929 15:28:35.042148 139882655643392 logging_writer.py:34] [9277] steps_per_sec=0.259032
I0929 15:29:42.340490 139903179183936 local.py:41] Setting work unit notes: 0.4% @9281, 0.1 steps/s, ETA: 671072 min (91 min : 0.0% checkpoint, 23.4% eval)
I0929 15:29:42.351035 139882655643392 logging_writer.py:34] [9281] steps_per_sec=0.059428
I0929 15:31:58.488358 139903179183936 local.py:41] Setting work unit notes: 0.4% @9283, 0.0 steps/s, ETA: 2713548 min (93 min : 0.0% checkpoint, 22.8% eval)
I0929 15:31:58.627787 139882655643392 logging_writer.py:34] [9283] steps_per_sec=0.014697
Killed

real    93m53.945s
user    222m56.696s
sys     63m40.052s

Do you have any suggestions on things to check, or have you seen this before?

Support alternative data_augmentations in imagenet_dataset

We are interested in using scenic's dataset_lib for a self-supervised learning project. The current imagenet_dataset is limited to either "default" augmentation (random crop, reshape, flip) or "None" (center crop, reshape). Some self-supervised learning models rely on alternative augmentations like random_resized_crop.

I could PR (a) a solution that allows users to pass arbitrary preprocessing functions, or (b) an added branch in the current logic for SSL preprocessing (a sketch of the augmentation in question follows below). Is this something that scenic would support? Is there a preference for one of these solutions?
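
For reference, here is a sketch of the kind of random_resized_crop in question, written with standard TF ops rather than Scenic's pipeline code; the function name and default ranges are illustrative.

import tensorflow as tf


def random_resized_crop(image, size=224,
                        area_range=(0.08, 1.0),
                        aspect_ratio_range=(3 / 4, 4 / 3)):
  """Inception-style crop: sample a random box, crop, then resize."""
  begin, crop_size, _ = tf.image.sample_distorted_bounding_box(
      tf.shape(image),
      bounding_boxes=tf.zeros([0, 0, 4], tf.float32),  # no box constraints
      area_range=area_range,
      aspect_ratio_range=aspect_ratio_range,
      use_image_if_no_bounding_boxes=True)
  crop = tf.slice(image, begin, crop_size)
  return tf.image.resize(crop, [size, size])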

Initialization of model

I noticed that Flax provides multiple initializers in the form of flax.linen.initializers. Looking through the UNet code I couldn't see them being used, so I wanted to know the current initialization strategy. Is it completely random? How does it compare to the way networks are initialized in PyTorch, and will it affect the training dynamics, convergence, etc.?
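
For context (and assuming recent Flax versions): Flax Linen layers are not initialized in an unstructured "completely random" way. Dense and Conv kernels default to LeCun-normal variance scaling and biases to zeros, whereas PyTorch's nn.Linear defaults to a Kaiming-uniform scheme, so training dynamics can differ slightly. Overriding the initializer is a one-liner:

import flax.linen as nn

layer = nn.Dense(
    features=128,
    kernel_init=nn.initializers.he_normal(),  # He/Kaiming instead of the LeCun-normal default
    bias_init=nn.initializers.zeros)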

Useful for Transfer learning?

Is it possible to use this TokenLearner within pretrained ViT models for transfer learning? If so, have you seen similar performance benefits?

Eval/Train step freeze during evaluation/train step with pmap and multi-gpu

Hello,

I am currently working with the ViViT Scenic project (https://github.com/google-research/scenic/tree/main/scenic/projects/vivit), which is based on JAX.

One of the problems I have encountered is that during the evaluation or train step the process just hangs randomly (it can happen after 1 hour of training or after 30 hours). This hang happens somewhere inside the pmapped eval/train function, and it happens only if there is more than 1 GPU. Furthermore, during this hang the GPU utilization (I am currently using A100, V100 or P5000) is zero, but the CPU utilization is almost 100% on all cores. And it hangs without any error or exception.

Basically, if the eval/train step function is vmapped, just jitted, or pmapped with 1 GPU, the steps work perfectly.

I have tried different experiments to find the issue, isolating the code from any dataset or other libraries apart from jax, flax, ml_collections, scipy and tf, but nothing appears to help. There is an obvious connection between the error and the multi-GPU configuration.

My environment is based on the latest versions of jax and jaxlib:

jax 0.3.13
jaxlib 0.3.10+cuda11.cudnn82

Though I have previously tried different versions of jax and jaxlib (both "jax[cuda11_cudnn82]" and "jax[cuda11_cudnn805]").

My CUDA version is 11.4 and cudnn is 8.3.

Currently, the problem is quite critical for our Vision Transformer use case, and I am looking for help or hints to solve it.

I attach to this post a Python file with a running script, which is isolated from data and uses only basic packages such as jax, flax and tf. It is set up for 4 GPUs; feel free to edit it at the end depending on your number of GPUs.
eval_steps_simple.py.txt

Installation Issue

I am trying to install scenic following the instructions (pip install .). I am using conda to create the environment, with Python 3.9.1 and pip 21.2.4. I am getting the following output:
INFO: pip is looking at multiple versions of pandas to determine which version is compatible with other requirements. This could take a while.
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
It has been running for hours and still hasn't finished. I tried different versions of Python and pip, but it still does not work. Could you please let me know how to solve this? Thanks.

I can't install the environment

pip install .
The error is:

ERROR: Command errored out with exit status 1:
     command: /media/yiwei/600G/anaconda3/envs/scenic/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/media/yiwei/yiwei-01/project/scenic/setup.py'"'"'; __file__='"'"'/media/yiwei/yiwei-01/project/scenic/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-djfqqk6i/install-record.txt --single-version-externally-managed --compile --install-headers /media/yiwei/600G/anaconda3/envs/scenic/include/python3.7m/scenic
         cwd: /media/yiwei/yiwei-01/project/scenic/
    Complete output (3 lines):
    running install
    running simclr_download
    error: HTTP Error 301: Moved Permanently
    ----------------------------------------
ERROR: Command errored out with exit status 1: /media/yiwei/600G/anaconda3/envs/scenic/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/media/yiwei/yiwei-01/project/scenic/setup.py'"'"'; __file__='"'"'/media/yiwei/yiwei-01/project/scenic/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-djfqqk6i/install-record.txt --single-version-externally-managed --compile --install-headers /media/yiwei/600G/anaconda3/envs/scenic/include/python3.7m/scenic Check the logs for full command output.

Will CLIP pre-trained owl-vit weights be made available?

Hi, thanks for your excellent work on open-vocab object detection.
It seems that all available OWL-ViT weights have already been fine-tuned on downstream object detection datasets. I wonder whether CLIP pre-trained OWL-ViT weights will be made available?

Error running OWL-ViT Colab

Hi there,

thank you very much for the extremely impressive work on OWL-ViT and for providing code, checkpoints and even a Colab to try it.

When running the Colab "as is", I am getting this error:
ScopeParamShapeError: Inconsistent shapes between value and initializer for parameter "positional_embedding" in "/clip/text": (16, 512), (15, 512)

it would be great if you can let us know how to fix this.
Thank you!
Chris

Unable to install scenic from requirements.txt in docker

Hi, I am trying to build a Docker image using

https://github.com/google-research/google-research/blob/eaa1a3f4c7e223f86c5266605c8aaf5b09df640b/dreamfields/Dockerfile#L1-L14

But the process halts at scenic with the following error.

Relevant content of requirements.txt:

git+git://github.com/google-research/scenic.git

Error log:

Step 5/8 : RUN pip install -r requirements.txt
 ---> Running in 6678fe83aa7b
Looking in links: https://storage.googleapis.com/jax-releases/jax_releases.html, https://download.pytorch.org/whl/torch_stable.html
Collecting git+https://github.com/openai/CLIP.git (from -r requirements.txt (line 5))
  Cloning https://github.com/openai/CLIP.git to /tmp/pip-req-build-ldyy3145
  Running command git clone --filter=blob:none -q https://github.com/openai/CLIP.git /tmp/pip-req-build-ldyy3145
  Resolved https://github.com/openai/CLIP.git to commit e58d49454c92986a1d2a6a48add2333bbfbeaf51
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting git+git://github.com/google-research/scenic.git (from -r requirements.txt (line 15))
  Cloning git://github.com/google-research/scenic.git to /tmp/pip-req-build-50cvcxa5
  Running command git clone --filter=blob:none -q git://github.com/google-research/scenic.git /tmp/pip-req-build-50cvcxa5
  fatal: remote error:
    The unauthenticated git protocol on port 9418 is no longer supported.
  Please see https://github.blog/2021-09-01-improving-git-protocol-security-github/ for more information.
WARNING: Discarding git+git://github.com/google-research/scenic.git. Command errored out with exit status 128: git clone --filter=blob:none -q git://github.com/google-research/scenic.git /tmp/pip-req-build-50cvcxa5 Check the logs for full command output.
ERROR: Command errored out with exit status 128: git clone --filter=blob:none -q git://github.com/google-research/scenic.git /tmp/pip-req-build-50cvcxa5 Check the logs for full command output.
WARNING: You are using pip version 21.3.1; however, version 22.0.4 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
The command '/bin/sh -c pip install -r requirements.txt' returned a non-zero code: 1

Implementation of End-to-End Object Detection with Transformers

Hi,

I am currently implementing a Vision Transformer variant with JAX & Flax, and I want to use my classifier as a backbone for downstream tasks such as object detection.
In the readme file it is mentioned that DETR was reproduced in Scenic; will this implementation be available?

Thanks!

Masked AutoEncoder

Hi,

Thanks for your contribution. May I ask whether you plan to release the MAE code for scenic?

Best,
Lucas

How to run ViViT on the Epic Kitchens dataset?

Hi,

In the vivit project, epic_kitchens/vivit_large_factorised_encoder.py does not seem to match the default dataloader. Could you please update the corresponding dataloader? Thanks.

AssertionError: Duplicate registrations for type 'experimentalOptimizer'

Hello, I am trying to import CLIP through scenic using:
from scenic.projects.baselines.clip import model as clip
However, an error like this occurred:
tensorflow.python.framework.errors_impl.AlreadyExistsError: Another metric with the same name already exists.
I have tried downgrading keras as suggested here, however another error occurs:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rex/anaconda3/envs/jax/lib/python3.9/site-packages/scenic/projects/baselines/clip/model.py", line 12, in <module>
    from scenic.projects.baselines.clip import download
  File "/home/rex/anaconda3/envs/jax/lib/python3.9/site-packages/scenic/projects/baselines/clip/download.py", line 9, in <module>
    from tensorflow.io import gfile
  File "/home/rex/anaconda3/envs/jax/lib/python3.9/site-packages/tensorflow/__init__.py", line 473, in <module>
    keras._load()
  File "/home/rex/anaconda3/envs/jax/lib/python3.9/site-packages/tensorflow/python/util/lazy_loader.py", line 41, in _load
    module = importlib.import_module(self.__name__)
  File "/home/rex/anaconda3/envs/jax/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/rex/anaconda3/envs/jax/lib/python3.9/site-packages/keras/__init__.py", line 25, in <module>
    from keras import models
  File "/home/rex/anaconda3/envs/jax/lib/python3.9/site-packages/keras/models.py", line 20, in <module>
    from keras import metrics as metrics_module
  File "/home/rex/anaconda3/envs/jax/lib/python3.9/site-packages/keras/metrics/__init__.py", line 23, in <module>
    from keras.metrics.base_metric import Metric
  File "/home/rex/anaconda3/envs/jax/lib/python3.9/site-packages/keras/metrics/base_metric.py", line 25, in <module>
    from keras.engine import base_layer
  File "/home/rex/anaconda3/envs/jax/lib/python3.9/site-packages/keras/engine/base_layer.py", line 43, in <module>
    from keras.mixed_precision import loss_scale_optimizer
  File "/home/rex/anaconda3/envs/jax/lib/python3.9/site-packages/keras/mixed_precision/loss_scale_optimizer.py", line 20, in <module>
    from keras.optimizer_experimental import optimizer as optimizer_experimental
  File "/home/rex/anaconda3/envs/jax/lib/python3.9/site-packages/keras/optimizer_experimental/optimizer.py", line 649, in <module>
    tf.__internal__.saved_model.load.register_revived_type(
  File "/home/rex/anaconda3/envs/jax/lib/python3.9/site-packages/tensorflow/python/saved_model/revived_types.py", line 133, in register_revived_type
    raise AssertionError(f"Duplicate registrations for type '{identifier}'")
AssertionError: Duplicate registrations for type 'experimentalOptimizer'

How to download the Kinetics dataset?

Dear all,

We would like to run experiments similar to ViViT and TokenLearner on the Kinetics dataset. How can we download the Kinetics dataset and create the corresponding TFRecords? We could not find any documentation on this.

Adding OWL-ViT to HuggingFace Transformers

Hi,

I've implemented OWL-ViT as a fork of 🤗 HuggingFace Transformers, and we are planning to add it to the library soon (see huggingface/transformers#17938). Here's a notebook that illustrates inference with it: https://colab.research.google.com/drive/1IMPWZcnlMy-tdnTDrUcOZU3oiGg-hTem?usp=sharing

I really like the simplicity of OWL-ViT and there are so many potential use cases for open-vocabulary object detection, especially within the robotics community, so we are all excited to add it to transformers.

As you may or may not know, each model on the HuggingFace hub has its own Github repository. For example, the OWL-ViT-base-patch32 checkpoint can be found here. If you check the "files and versions" tab, you can find the converted weights of the model. The model hub uses git-LFS (large file storage) to use Git with large files such as model weights. This means that any model has its own Git commit history!

A model card can also be added to the repo, which is just a README.

If you haven't done so, would you be interested in joining the Google organisation on the hub, such that we can store all model checkpoints there (rather than under my username)?

Let me know!

Kind regards,

Alara
ML Engineer @ HuggingFace

Fixing poor localization in OWL-ViT

Hi, I seem to be getting poor localization using huggingface OWL-ViT, and am wondering if there's anything I can do to improve it? For example:

import requests
from PIL import Image
import torch
from transformers import OwlViTProcessor, OwlViTForObjectDetection
import matplotlib.pyplot as plt
import matplotlib.patches as patches


processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

url = "https://cf.ltkcdn.net/dogs/images/std-xs/284653-340x227-pembroke-welsh-corgi-exercise.jpg"
image = Image.open(requests.get(url, stream=True).raw)

texts = [["a photo of a corgi", "a photo of a tennis ball"]]
inputs = processor(text=texts, images=image, return_tensors="pt")
outputs = model(**inputs)

# Target image sizes (height, width) to rescale box predictions [batch_size, 2]
target_sizes = torch.Tensor([image.size[::-1]])
# Convert outputs (bounding boxes and class logits) to COCO API
results = processor.post_process(outputs=outputs, target_sizes=target_sizes)

i = 0  # Retrieve predictions for the first image for the corresponding text queries
text = texts[i]
boxes, scores, labels = results[i]["boxes"], results[i]["scores"], results[i]["labels"]


fig, ax = plt.subplots()
ax.imshow(image)
iw, ih = image.size

score_threshold = 0.1
for box, score, label in zip(boxes, scores, labels):
    box = [round(i, 2) for i in box.tolist()]
    if score >= score_threshold:
        x,y,w,h = box[0], box[1], box[2]-box[0], box[3]-box[1]
        print(f"Detected {text[label]} with confidence {round(score.item(), 3)} at location {box}")
        print(x,y,w,h)        
        rect = patches.Rectangle((x,y), w, h, linewidth=1, edgecolor='r', facecolor='none')
        ax.add_patch(rect)

plt.show()


variable "simple_tokenizer" is missing

On line 10 of scenic/projects/baselines/clip/tokenizer.py:

from clip.simple_tokenizer import SimpleTokenizer

But there is no object or file named "simple_tokenizer".
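
(For context: clip.simple_tokenizer is provided by OpenAI's standalone CLIP package rather than by Scenic itself, so the import typically resolves after installing that package:)

$ pip install git+https://github.com/openai/CLIP.git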

Data Loading for ViViT

Hi,

I followed the data processing instructions to prepare the tfrecord files, but I got this error when trying to start training.

2021-11-11 09:47:39.233273: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at example_parsing_ops.cc:480 : Invalid argument: Name: <unknown>, Key: image/encoded, Index: 0.  Number of values != expected.  values size: 100 but output shape: []
2021-11-11 09:47:39.233739: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at example_parsing_ops.cc:480 : Invalid argument: Name: <unknown>, Key: image/encoded, Index: 0.  Number of values != expected.  values size: 100 but output shape: []
2021-11-11 09:47:39.242622: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at example_parsing_ops.cc:480 : Invalid argument: Name: <unknown>, Key: image/encoded, Index: 0.  Number of values != expected.  values size: 100 but output shape: []
2021-11-11 09:47:39.251052: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at example_parsing_ops.cc:480 : Invalid argument: Name: <unknown>, Key: image/encoded, Index: 0.  Number of values != expected.  values size: 100 but output shape: []
    app.run(main=main)
  File "/workspace/scenic/app.py", line 64, in run
    app.run(functools.partial(_run_main, main=main))
  File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/workspace/scenic/app.py", line 96, in _run_main
    main(rng=rng, config=FLAGS.config, workdir=FLAGS.workdir, writer=writer)
  File "scenic/projects/vivit/main.py", line 34, in main
    trainer(
  File "/workspace/scenic/projects/vivit/trainer.py", line 230, in train
    train_batch = next(dataset.train_iter)
  File "/usr/local/lib/python3.8/dist-packages/flax/jax_utils.py", line 147, in prefetch_to_device
    enqueue(size)  # Fill up the buffer.
  File "/usr/local/lib/python3.8/dist-packages/flax/jax_utils.py", line 144, in enqueue
    for data in itertools.islice(iterator, n):
  File "/workspace/scenic/projects/vivit/data/video_tfrecord_dataset.py", line 452, in <genexpr>
    current_ds_iterator = (
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 761, in __next__
    return self._next_internal()
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 744, in _next_internal
    ret = gen_dataset_ops.iterator_get_next(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2728, in iterator_get_next
    _ops.raise_from_not_ok_status(e, name)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 6941, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Name: <unknown>, Key: image/encoded, Index: 0.  Number of values != expected.  values size: 80 but output shape: []
	 [[{{node ParseSingleSequenceExample/ParseSequenceExample/ParseSequenceExampleV2}}]] [Op:IteratorGetNext]
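
For context, this error usually means a per-frame feature that was written as a SequenceExample feature list is being parsed with a mismatched (scalar) spec, or the records were written with a different number of values than the parser expects. Here is a sketch of the context vs. feature-list distinction; the feature names besides image/encoded are illustrative.

import tensorflow as tf

# Context features hold one value per clip; sequence features hold one
# value per frame. 'image/encoded' in the log is per-frame (one JPEG string
# per frame), so it needs a sequence spec, not a scalar context spec.
context_spec = {
    'clip/label/index': tf.io.FixedLenFeature([], tf.int64),  # illustrative name
}
sequence_spec = {
    'image/encoded': tf.io.FixedLenSequenceFeature([], tf.string),
}


def parse(serialized):
  context, sequences = tf.io.parse_single_sequence_example(
      serialized,
      context_features=context_spec,
      sequence_features=sequence_spec)
  return context, sequences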

Segmentation fault when running vivit experiment

Hi,

Thanks for sharing the code. I am trying to reproduce the vivit paper results for kinetics-400.
But I got a segmentation fault when trying to run main.py. I am using 8 x 3090 GPUs with CUDA 11.2.
The jax version is 0.2.24, the jaxlib version is 0.1.69, and the tensorflow version is 2.7.0.
I've never used jax before, so it is not clear to me what causes this issue. Could anyone help here? The following is the exception traceback.

Thanks!

root@01b7ceb4a818:/workspace# python3 scenic/projects/vivit/main.py --config=scenic/projects/vivit/configs/kinetics400/vivit_base_k400.py --workdir=vivit_base_k400/
I1109 01:22:51.434930 139861215621312 xla_bridge.py:232] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker: 
I1109 01:22:52.361684 139861215621312 xla_bridge.py:232] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
I1109 01:22:52.361946 139861215621312 app.py:80] JAX host: 0 / 1
I1109 01:22:52.362037 139861215621312 app.py:81] JAX devices: [GpuDevice(id=0, process_index=0), GpuDevice(id=1, process_index=0), GpuDevice(id=2, process_index=0), GpuDevice(id=3, process_index=0), GpuDevice(id=4, process_index=0), GpuDevice(id=5, process_index=0), GpuDevice(id=6, process_index=0), GpuDevice(id=7, process_index=0)]
I1109 01:22:52.362362 139861215621312 local.py:45] Setting task status: host_id: 0, host_count: 1
I1109 01:22:52.362540 139861215621312 local.py:51] Created artifact Workdir of type ArtifactType.DIRECTORY and value vivit_base_k400/.
I1109 01:22:52.853591 139861215621312 app.py:91] RNG: [0 0]
I1109 01:24:19.953434 139861215621312 train_utils.py:149] device_count: 8
I1109 01:24:19.954089 139861215621312 train_utils.py:150] num_hosts : 1
I1109 01:24:19.954144 139861215621312 train_utils.py:151] host_id : 0
Fatal Python error: Segmentation fault

Current thread 0x00007f33fa1330c0 (most recent call first):
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/load_library.py", line 58 in load_op_library
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_text/python/metrics/text_similarity_metric_ops.py", line 28 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_text/python/metrics/__init__.py", line 20 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1035 in _handle_fromlist
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_text/__init__.py", line 21 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "/opt/conda/lib/python3.7/site-packages/dmvr/tokenizers.py", line 21 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1035 in _handle_fromlist
  File "/opt/conda/lib/python3.7/site-packages/dmvr/processors.py", line 20 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1035 in _handle_fromlist
  File "/opt/conda/lib/python3.7/site-packages/dmvr/modalities.py", line 21 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1035 in _handle_fromlist
  File "/workspace/scenic/projects/vivit/data/video_tfrecord_dataset.py", line 9 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006 in _gcd_import
  File "/opt/conda/lib/python3.7/importlib/__init__.py", line 127 in import_module
  File "/workspace/scenic/dataset_lib/datasets.py", line 91 in get
  File "/workspace/scenic/dataset_lib/datasets.py", line 128 in get_dataset
  File "/workspace/scenic/train_lib/train_utils.py", line 153 in get_dataset
  File "scenic/projects/vivit/main.py", line 31 in main
  File "/workspace/scenic/app.py", line 96 in _run_main
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 251 in _run_main
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 303 in run
  File "/workspace/scenic/app.py", line 64 in run
  File "scenic/projects/vivit/main.py", line 44 in <module>
Segmentation fault (core dumped)

Incomplete Type Hints

Thanks for this great release, I really dig the design split for scenic! I noticed that the submodules are only partially type hinted (e.g. common_lib is but dataset_lib isn't) and was wondering if it would be possible to fix this via a copybara setting. That would make it much easier to reuse some of the submodules in personal projects.

Cheers!

OWL-ViT models pre-trained checkpoints loading problem

I have downloaded the pre-trained checkpoint for OWL-ViT, and when I try to load it on my personal machine as part of the OWL-ViT minimal example.ipynb, I get this error:

ExtraData                                 Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 variables = module.load_variables(config.init_from.checkpoint_path)

File /opt/conda/miniconda3/lib/python3.8/site-packages/scenic/projects/owl_vit/models.py:44, in TextZeroShotDetectionModule.load_variables(self, checkpoint_path)
     42 @nn.nowrap
     43 def load_variables(self, checkpoint_path: str) -> Mapping[str, Any]:
---> 44   restored = checkpoints.restore_checkpoint(checkpoint_path, target=None)
     45   return {'params': restored['optimizer']['target']}

File /opt/conda/miniconda3/lib/python3.8/site-packages/flax/training/checkpoints.py:456, in restore_checkpoint(ckpt_dir, target, step, prefix, parallel, gda_manager)
    453   else:
    454     checkpoint_contents = fp.read()
--> 456 state_dict = serialization.msgpack_restore(checkpoint_contents)
    457 state_dict = _restore_gdas(state_dict, target, ckpt_path, step, gda_manager)
    459 if target is None:

File /opt/conda/miniconda3/lib/python3.8/site-packages/flax/serialization.py:350, in msgpack_restore(encoded_pytree)
    337 def msgpack_restore(encoded_pytree: bytes):
    338   """Restore data structure from bytes in msgpack format.
    339 
    340   Low-level function that only supports python trees with array leaves,
   (...)
    348     and array leaves.
    349   """
--> 350   state_dict = msgpack.unpackb(
    351       encoded_pytree, ext_hook=_msgpack_ext_unpack, raw=False)
    352   return _unchunk_array_leaves_in_place(state_dict)

File msgpack/_unpacker.pyx:201, in msgpack._cmsgpack.unpackb()

ExtraData: unpack(b) received extra data.

Also, here are the relevant library versions:

flax 0.5.3
jax 0.3.16
jaxlib 0.3.15
tensorboard 2.9.1
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.9.1
tensorflow-addons 0.17.1
tensorflow-datasets 4.6.0
tensorflow-estimator 2.9.0
tensorflow-hub 0.12.0
tensorflow-io-gcs-filesystem 0.26.0
tensorflow-metadata 1.9.0
tensorflow-model-optimization 0.7.3
tensorflow-text 2.9.0
tensorstore 0.1.22

Any idea on what causes this problem?

Examples

Please share some example code and tutorials on usage.

Interpolation of positional embedding

Hi!

When you interpolated the positional embedding, did you reshape it into a 2D feature map and apply bilinear interpolation, or just use a 1D linear interpolation directly on the embedding vector?
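
For reference, the common ViT-style approach is the 2D one: split off the class token, reshape the grid tokens to a 2D map, resize bilinearly, and flatten back. A minimal sketch, assuming a square grid and a single class token:

import jax
import jax.numpy as jnp


def resize_posemb(posemb, new_grid):
  """posemb: [1, 1 + g*g, d] with the class token first."""
  cls_tok, grid = posemb[:, :1], posemb[:, 1:]
  g = int(grid.shape[1] ** 0.5)
  d = grid.shape[-1]
  grid = grid.reshape(g, g, d)                 # back to a 2D feature map
  grid = jax.image.resize(grid, (new_grid, new_grid, d), method='bilinear')
  grid = grid.reshape(1, new_grid * new_grid, d)
  return jnp.concatenate([cls_tok, grid], axis=1)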

ViViT strange results using scenic checkpoints.

We are using the ViViT base checkpoint and code provided in scenic, and obtain strange inference results on the K400 test set (no training, just inference from the checkpoint).

  • When we use the full K400 test set as the test set (~33k samples)
    Results: I0222 07:01:57.562714 140194510305024 logging_writer.py:35] [7650] test/accuracy=0.002614,test/accuracy_top_5=0.012980, test/loss=5.991464

  • We obtain similar results on the full test set using each of the following models' respective checkpoints: ViViT-B/16x2, ViViT-B/16x2 FE, ViViT-L/16x2 FE

  • E.g., for ViViT-B/16x2 unfactorized we use the following checkpoint: https://storage.googleapis.com/scenic-bucket/vivit/kinetics_400/vivit_base_16x2_unfactorized/checkpoint

  • We similarly obtain strange test results when using a small amount of the K400 training data as the test set.

  • If we train ViViT after loading a ViT checkpoint, the model does learn (accuracy climbs and loss decreases during training); however, we similarly obtain strange results for inference on validation (2 orders of magnitude lower than train).

Platform: Linux environment with GPUs.
Data was preprocessed using the indicated preprocessing instructions (using DMVR), including generating TFRecords.

Any help with figuring out what we might be missing would be greatly appreciated.

How to prepare the data for projects/mbt?

The dataset processing is unclear. The readme only says "Additionally, pre-process the training dataset in the same way as done by the ViViT project here", and ViViT refers the pre-processing to the HMDB pre-processing in DMVR.

From this information, it is pretty hard for users to run the code. The only data processing provided is for HMDB, but MBT uses RGB + spectrogram. Could you please provide a bit of information on how to process the data so that users can run the code?

Thank you so much!

Question(s) on imagenet_dataset

(thanks for fixing #57 !)

I started playing around with ViT training using imagenet_vit_config.py. However, I'm seeing the following error in dataset_lib, which I'm not sure how to fix:

ValueError: Requested slice [:50000] incompatible with 1000 examples.
The error occurs for the validation set, presumably because the library demands 50000 validation samples. Two questions:

  • Any advice on how I can fix this issue on my end?
  • Why does the module specify a slice size instead of just using the plain validation or train split in dataset_lib.imagenet_dataset? (TFDS split syntax is sketched below.)
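
For reference, the slice in the error message is TFDS subsplit syntax, and the message means the split being sliced contains only 1000 examples. A sketch for checking the underlying split sizes with plain TFDS (assuming the standard imagenet2012 builder that dataset_lib wraps):

import tensorflow_datasets as tfds

builder = tfds.builder('imagenet2012')
print(builder.info.splits['validation'].num_examples)       # expect 50000
ds = tfds.load('imagenet2012', split='validation[:50000]')  # the slice in the error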

ERROR: ResolutionImpossible

I used conda to create a Python 3.9 environment, then entered this directory and ran "pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple" (since I'm in China) to install the environment. But something goes wrong.

Details are as follows:

$ conda create -n vivit python=3.9 -y
$ conda activate vivit
$ python -m pip install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
$ pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple

......
Collecting jax>=0.2.21
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/61/0a/9aef811fa9a5a1f0eb7e3a05a97c6ed89d0956788a83c3195088049cc882/jax-0.2.28.tar.gz (887 kB)
  Preparing metadata (setup.py) ... done
INFO: pip is looking at multiple versions of immutabledict to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of flax to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of clu to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of <Python from Requires-Python> to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of absl-py to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of scenic to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install scenic because these package versions have conflicting dependencies.

The conflict is caused by:
    tf-models-nightly 2.8.0.dev20220217 depends on tf-nightly
    tf-models-nightly 2.8.0.dev20220216 depends on tf-nightly
    tf-models-nightly 2.8.0.dev20220215 depends on tf-nightly
    [... the same "depends on tf-nightly" line repeats for every nightly build down to ...]
    tf-models-nightly 2.7.0.dev20211117 depends on tf-nightly

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

What should I do to fix this? I don't know how to balance the versions of all the dependencies, which I guess could solve it.

ViT ImageNet Pretrained Model Problem with ViViT

Hey,

I downloaded the pretrained ImageNet-21k ViT B/16 model from the URLs mentioned in the configuration files and replaced the path in the config file, but for both the scenic and bigvision versions I face problems.

When I put the path to the scenic pretrained version, it gives the following error:
raise ValueError('No checkpoint for the pretrained model is found in: '
ValueError: No checkpoint for the pretrained model is found in: /content/ViT_B_16_ImageNet21k

When I put the path to bigvision pretrained version, this happens:
File "/usr/local/lib/python3.7/dist-packages/scenic/train_lib/pretrain_utils.py", line 269, in convert_bigvision_to_scenic_checkpoint
restored_params = tree['opt']['target']
KeyError: 'opt'

Can you please tell me what I should do?

Thanks in advance.

Questions about Table 1 in the paper owl-vit

In Table 1, you have "backbone" and "Image-Level" models to encode visual features (ViT-B/32 and CLIP in Row 9, etc.). According to the paper, the backbone of OWL-ViT should be the same as the Image-Level model, which is further finetuned on the detection dataset.
I have two questions here:

  • Are the Image-Level models different if the corresponding backbones are different? E.g., in Row 9 and Row 10, the Image-Level models are both CLIP; however, their backbones are different (ViT-B/32 and ViT-B/16). In my opinion, the visual models of the CLIP used in Row 9 and Row 10 should be different.
  • Do you directly use the pre-trained models (CLIP/LiT) as Image-Level models, or do you re-train the models on your side if you change the structure of the original models?

Input Data for ViViT

Hello,

I followed the data processing instructions to prepare the tfrecord files, but I am getting the following error.

2021-12-18 12:30:38.113382: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at example_parsing_ops.cc:480 : INVALID_ARGUMENT: Name: <unknown>, Key: image/encoded, Index: 0.  Number of values != expected.  values size: 36 but output shape: []
[... the same OP_REQUIRES warning repeats many times with values sizes 36, 95 and 100, originally interleaved with the traceback below ...]
Traceback (most recent call last):
  File "/content/scenic/scenic/projects/vivit/main.py", line 44, in <module>
    app.run(main=main)
  File "/usr/local/lib/python3.7/dist-packages/scenic/app.py", line 64, in run
    app.run(functools.partial(_run_main, main=main))
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.7/dist-packages/scenic/app.py", line 96, in _run_main
    main(rng=rng, config=FLAGS.config, workdir=FLAGS.workdir, writer=writer)
  File "/content/scenic/scenic/projects/vivit/main.py", line 40, in main
    writer=writer)
  File "/usr/local/lib/python3.7/dist-packages/scenic/projects/vivit/trainer.py", line 230, in train
    train_batch = next(dataset.train_iter)
  File "/usr/local/lib/python3.7/dist-packages/flax/jax_utils.py", line 147, in prefetch_to_device
    enqueue(size)  # Fill up the buffer.
  File "/usr/local/lib/python3.7/dist-packages/flax/jax_utils.py", line 144, in enqueue
    for data in itertools.islice(iterator, n):
  File "/usr/local/lib/python3.7/dist-packages/scenic/projects/vivit/data/video_tfrecord_dataset.py", line 453, in <genexpr>
    map_keys(dataset_utils.tf_to_numpy(data)) for data in iter(dataset)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 800, in __next__
    return self._next_internal()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 786, in _next_internal
    output_shapes=self._flat_output_shapes)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2845, in iterator_get_next
    _ops.raise_from_not_ok_status(e, name)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py", line 7107, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Name: <unknown>, Key: image/encoded, Index: 0.  Number of values != expected.  values size: 36 but output shape: []
	 [[{{node ParseSingleSequenceExample/ParseSequenceExample/ParseSequenceExampleV2}}]] [Op:IteratorGetNext]

I ran test_video_tfrecord_dataset.py and it gave the following result:

Ran 4 tests in 0.113s

OK

Working with Imagenet-21K

Hi,

Thanks again for this wonderful library!
This is not really an issue - sort of a feature request.
What would be the best way to work with ImageNet-21K using Scenic?
TFDS supports only the ImageNet-1K dataset.
