
clip-tf2's People


robertbiehl



clip-tf2's Issues

Conversion failed

Hi, thank you for providing this conversion script. I am running into the error below and am not sure what the issue is:

Classify image:
Text options: ['a diagram', 'a dog', 'a cat', 'a neural network']
Pytorch: [[0.2441   0.003271 0.000827 0.752   ]]
Tensorflow: [[0.24351107 0.00320389 0.0008252  0.7524599 ]]
Traceback (most recent call last):
  File "", line 64, in <module>
  File "/opt/conda/lib/python3.7/site-packages/absl/", line 312, in run
    _run_main(main, args)
  File "/opt/conda/lib/python3.7/site-packages/absl/", line 258, in _run_main
  File "", line 40, in main
    converter.verify(FLAGS.model, model, image_url, text_options, verbose=True)
  File "/home/jupyter/CLIP-tf2/converter/", line 233, in verify
    assert np.abs(torch_probs - tf_probs).sum() < 1e-3, f"PyTorch and Tensorflow results should be almost equal: torch_probs={torch_probs}, tf_probs={tf_probs}"
AssertionError: PyTorch and Tensorflow results should be almost equal: torch_probs=[[0.2441   0.003271 0.000827 0.752   ]], tf_probs=[[0.24351107 0.00320389 0.0008252  0.7524599 ]]

My command looks like this:
python --model RN50 --output models/CLIP_{model}

Any help would be appreciated, thanks!
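
A note on the numbers in the assertion message: the check sums the absolute differences over all four classes and compares that sum with 1e-3. The sketch below redoes that arithmetic in plain Python with the values copied from the log. The PyTorch values appear to be rounded to about four significant digits, which would be consistent with half-precision output, but that is an assumption; the variable names are illustrative and not taken from the converter.

```python
# Probabilities copied from the assertion message above.
torch_probs = [0.2441, 0.003271, 0.000827, 0.752]
tf_probs = [0.24351107, 0.00320389, 0.0008252, 0.7524599]

# Per-class absolute differences between the two frameworks.
abs_diffs = [abs(p - t) for p, t in zip(torch_probs, tf_probs)]

total_diff = sum(abs_diffs)  # the quantity the assertion compares with 1e-3
max_diff = max(abs_diffs)    # the largest single-class deviation

print(f"sum of differences: {total_diff:.6f}")
print(f"max difference:     {max_diff:.6f}")
print("sum below 1e-3:", total_diff < 1e-3)
print("max below 1e-3:", max_diff < 1e-3)
```

The summed difference is about 1.1e-3, just over the threshold, while the largest single difference is about 5.9e-4. So the two models agree closely, and the failure may come from the strictness of the summed tolerance rather than from a broken conversion.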

Text encoder input shape

When saving only the text encoder and trying to encode text I get this error:

ValueError: Exception encountered when calling layer 'visual' (type VisualTransformer).

Could not find matching concrete function to call loaded from the SavedModel. Got:
  Positional arguments (1 total):
    * <tf.Tensor 'x:0' shape=(1, 77) dtype=int32>
  Keyword arguments: {'training': False}

 Expected these arguments to match one of the following 4 option(s):

Option 1:
  Positional arguments (1 total):
    * TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='x')
  Keyword arguments: {'training': False}

Option 2:
  Positional arguments (1 total):
    * TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='x')
  Keyword arguments: {'training': True}

Option 3:
  Positional arguments (1 total):
    * TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input_1')
  Keyword arguments: {'training': False}

Option 4:
  Positional arguments (1 total):
    * TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input_1')
  Keyword arguments: {'training': True}

Call arguments received by layer 'visual' (type VisualTransformer):
  • args=('tensor([[49406,  1628, 49407,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0]], dtype=torch.int32)',)
  • kwargs=<class 'inspect._empty'>

The saved model is generated with python3 --model ViT-B/32 --text_output models/CLIP_text_{model} and loaded with model = tf.keras.models.load_model("CLIP_text_ViT-B_32")

This seems to me like the typical image dimensions and not the text encoder input. Furthermore, the model is of the class VisualTransformer, which doesn't seem right. I would appreciate any help :)

Assuming I get the text encoder working, is there any way to include the tokenizer in the Keras saved model as well? I would like to serve it using TensorFlow Serving, so ideally the only input would be a string.
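
To make the mismatch in the error above concrete, here is a minimal sketch in plain Python of how a concrete input is compared against the listed signatures. The helper `matches_spec` is hypothetical and only mimics the idea of a TensorSpec check (None matches any size); it is not TensorFlow's implementation. The image shape (1, 224, 224, 3) is just an illustrative value.

```python
from typing import Optional, Sequence

def matches_spec(shape: Sequence[int], dtype: str,
                 spec_shape: Sequence[Optional[int]], spec_dtype: str) -> bool:
    """Return True if a concrete (shape, dtype) fits a spec; None matches any size."""
    if dtype != spec_dtype or len(shape) != len(spec_shape):
        return False
    return all(s is None or s == d for d, s in zip(shape, spec_shape))

# The spec listed under every option in the error message.
image_spec = ((None, None, None, 3), "float32")

# The tokenized text that was passed in, and an illustrative image batch.
text_input = ((1, 77), "int32")
image_input = ((1, 224, 224, 3), "float32")

text_ok = matches_spec(*text_input, *image_spec)
image_ok = matches_spec(*image_input, *image_spec)
print("text input accepted: ", text_ok)
print("image input accepted:", image_ok)
```

The text input is rejected on both rank (2 versus 4) and dtype (int32 versus float32), while an image batch would be accepted, which supports the suspicion that the saved model is the image encoder.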

CLIP_image_ViT-L_14 image encoder unequal dimensions.

After converting the L_14 model using python3 --model ViT-L/14 --image_output models/CLIP_image_ViT-L_14/ and loading it using image_encoder = tf.keras.models.load_model("CLIP_image_ViT-L_14/", compile=False) it fails to embed an image with this error:

In [5]: image_encoder(np.random.random((1, 480, 480 ,3)))
ValueError                                Traceback (most recent call last)
Cell In[5], line 1
----> 1 image_encoder(np.random.random((1, 480, 480 ,3)))

File ~/.local/lib/python3.10/site-packages/keras/utils/, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File ~/.local/lib/python3.10/site-packages/tensorflow/python/framework/, in _create_c_op(graph, node_def, inputs, control_inputs, op_def, extract_traceback)
   1964   c_op = pywrap_tf_session.TF_FinishOperation(op_desc)
   1965 except errors.InvalidArgumentError as e:
   1966   # Convert to ValueError for backwards compatibility.
-> 1967   raise ValueError(e.message)
   1969 # Record the current Python stack trace as the creating stacktrace of this
   1970 # TF_Operation.
   1971 if extract_traceback:

ValueError: Exception encountered when calling layer 'visual' (type VisualTransformer).

Dimensions must be equal, but are 1157 and 257 for '{{node add}} = AddV2[T=DT_FLOAT](concat, add/ReadVariableOp)' with input shapes: [1,1157,1024], [257,1024].

Call arguments received by layer 'visual' (type VisualTransformer):
  • args=('tf.Tensor(shape=(1, 480, 480, 3), dtype=float32)',)
  • kwargs=<class 'inspect._empty'>

However, the conversion itself throws no error, even though it also performs validation.
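
The two numbers in the error message can be reproduced with a little arithmetic. This is a sketch under the assumption that the encoder splits the image into non-overlapping 14x14 patches (dropping any remainder) and prepends one class token; the function name `num_tokens` is illustrative.

```python
def num_tokens(image_size: int, patch_size: int) -> int:
    """Sequence length of a ViT: patches per side squared, plus one class token."""
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + 1

tokens_480 = num_tokens(480, 14)  # the 480x480 input from the call above
tokens_224 = num_tokens(224, 14)  # a 224x224 input

print("480x480 ->", tokens_480, "tokens")
print("224x224 ->", tokens_224, "tokens")
```

Under that assumption, a 480x480 image yields 1157 tokens, while the stored positional embedding has 257 rows, which corresponds to a 224x224 input. That would also explain why validation during conversion passes if it uses images at the expected resolution.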

Have a problem when I convert the tf2 pb file to tf1.15 pb file, could you help me? Thanks.

I need the TensorFlow 1.x pb file of the CLIP text encoder, so I have to convert the TF 2.x pb file. Here is the code:
---------------------------------- code start -------------------------

import os
import tensorflow as tf
from tensorflow import keras
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2
import pdb

TF2_model_input = 'CLIP_RN50' # the input path
TF115_model_output = 'dir_for_saving_TF1.15_pb'
#os.makedirs(TF115_model_output, exist_ok = True)

model = tf.keras.models.load_model(TF2_model_input) # read the model ckpt in TF2.8

# Wrap the model call in a tf.function so that it can be traced into a concrete function.
full_model = tf.function(lambda serving_default_image, serving_default_text: model(serving_default_image, serving_default_text))
# Note: if the input information is not in the inherited tf.keras.Model, the concrete function needs it defined via TensorSpec.
full_model = full_model.get_concrete_function(tf.TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input/0'), tf.TensorSpec(shape=(None, None, None), dtype=tf.int64, name='input/1'))

frozen_func = convert_variables_to_constants_v2(full_model)  # change the model parameters to constants

frozen_func.graph.as_graph_def()  # change the model graph to graph def

# Debug: check the parameters and parameter names (needed in TF1.15)
layers = [op.name for op in frozen_func.graph.get_operations()]
for layer in layers:
    print(layer)
print("Frozen model inputs: ", frozen_func.inputs)
print("Frozen model outputs: ", frozen_func.outputs)
# Save the frozen graph using TF1.15
---------------------------------- code end -------------------------
But it gives me an error:
=========================== Error ==================================
WARNING:tensorflow:No training configuration found in save file, so the model was not compiled. Compile it manually.
Traceback (most recent call last):
File "", line 19, in
full_model = full_model.get_concrete_function(tf.TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input/0'),tf.TensorSpec(shape=(None, None, None), dtype=tf.int64, name='input/1'))
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/", line 1264, in get_concrete_function
concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/", line 1244, in _get_concrete_function_garbage_collected
self._initialize(args, kwargs, add_initializers_to=initializers)
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/", line 786, in _initialize
*args, **kwds))
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/", line 2983, in _get_concrete_function_internal_garbage_collected
graph_function, _ = self._maybe_define_function(args, kwargs)
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/", line 3292, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/", line 3140, in _create_graph_function
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/framework/", line 1161, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/", line 677, in wrapped_fn
out = weak_wrapped_fn().wrapped(*args, **kwds)
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/framework/", line 1147, in autograph_handler
raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

File "", line 18, in None  *
    lambda serving_default_image,serving_default_text: model(serving_default_image,serving_default_text))
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/keras/utils/", line 67, in error_handler  **
    raise e.with_traceback(filtered_tb) from None
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/keras/saving/saved_model/", line 166, in replace_training_and_call
    return wrapped_call(*args, **kwargs)

ValueError: Exception encountered when calling layer "clip" (type CLIP).

Could not find matching concrete function to call loaded from the SavedModel. Got:
  Positional arguments (2 total):
    * Tensor("input:0", shape=(None, None, None, 3), dtype=float32)
    * True
  Keyword arguments: {}

 Expected these arguments to match one of the following 4 option(s):

Option 1:
  Positional arguments (2 total):
    * (TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input/0'), TensorSpec(shape=(None, None, None), dtype=tf.int64, name='input/1'))
    * False
  Keyword arguments: {}

Option 2:
  Positional arguments (2 total):
    * (TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input/0'), TensorSpec(shape=(None, None, None), dtype=tf.int64, name='input/1'))
    * True
  Keyword arguments: {}

Option 3:
  Positional arguments (2 total):
    * (TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='image'), TensorSpec(shape=(None, None, None), dtype=tf.int64, name='text'))
    * False
  Keyword arguments: {}

Option 4:
  Positional arguments (2 total):
    * (TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='image'), TensorSpec(shape=(None, None, None), dtype=tf.int64, name='text'))
    * True
  Keyword arguments: {}

Call arguments received:
  • args=('tf.Tensor(shape=(None, None, None, 3), dtype=float32)', 'tf.Tensor(shape=(None, None, None), dtype=int64)')
  • kwargs=<class 'inspect._empty'>

===========================Error End==============================

Thanks a lot
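
One reading of the error: every expected option lists a tuple of two TensorSpecs as the first positional argument and a boolean as the second, so the second positional slot looks like the `training` flag. The sketch below illustrates that binding in plain Python. `FakeModel` is a hypothetical stand-in whose signature is an assumption based on the error message, not the actual CLIP class.

```python
class FakeModel:
    """Hypothetical stand-in for the loaded model; only the call signature matters."""

    def __call__(self, inputs, training=False):
        # Return what each parameter received so the binding is visible.
        return {"inputs": inputs, "training": training}

model = FakeModel()
image, text = "image_tensor", "text_tensor"

# Two positional arguments: the text ends up in the `training` slot.
two_args = model(image, text)

# One tuple argument: both tensors travel together as `inputs`.
one_tuple = model((image, text))

print(two_args)
print(one_tuple)
```

If that reading is right, calling the loaded model as model((serving_default_image, serving_default_text)) inside the lambda may match one of the saved signatures. This has not been verified against the converter.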
