
clip-tf2's Issues

CLIP_image_ViT-L_14 image encoder unequal dimensions.

After converting the ViT-L/14 model using python3 convert_clip.py --model ViT-L/14 --image_output models/CLIP_image_ViT-L_14/ and loading it with image_encoder = tf.keras.models.load_model("CLIP_image_ViT-L_14/", compile=False), it fails to embed an image with this error:

In [5]: image_encoder(np.random.random((1, 480, 480 ,3)))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 1
----> 1 image_encoder(np.random.random((1, 480, 480 ,3)))

File ~/.local/lib/python3.10/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File ~/.local/lib/python3.10/site-packages/tensorflow/python/framework/ops.py:1967, in _create_c_op(graph, node_def, inputs, control_inputs, op_def, extract_traceback)
   1964   c_op = pywrap_tf_session.TF_FinishOperation(op_desc)
   1965 except errors.InvalidArgumentError as e:
   1966   # Convert to ValueError for backwards compatibility.
-> 1967   raise ValueError(e.message)
   1969 # Record the current Python stack trace as the creating stacktrace of this
   1970 # TF_Operation.
   1971 if extract_traceback:

ValueError: Exception encountered when calling layer 'visual' (type VisualTransformer).

Dimensions must be equal, but are 1157 and 257 for '{{node add}} = AddV2[T=DT_FLOAT](concat, add/ReadVariableOp)' with input shapes: [1,1157,1024], [257,1024].

Call arguments received by layer 'visual' (type VisualTransformer):
  • args=('tf.Tensor(shape=(1, 480, 480, 3), dtype=float32)',)
  • kwargs=<class 'inspect._empty'>

However, the conversion itself throws no error, even though it also performs validation.
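For what it's worth, the numbers in the error are consistent with the ViT-L/14 positional embeddings: the model expects 224x224 inputs, which give (224/14)^2 = 256 patches plus one class token = 257 positions, while a 480x480 input produces 34^2 + 1 = 1157 tokens. A minimal sketch of a workaround (assuming the converted encoder is otherwise fine) is to resize the image to 224x224 before encoding:

import numpy as np
import tensorflow as tf

image_encoder = tf.keras.models.load_model("CLIP_image_ViT-L_14/", compile=False)

# ViT-L/14 uses a fixed 224x224 input resolution, so resize first
image = np.random.random((1, 480, 480, 3)).astype("float32")
image = tf.image.resize(image, (224, 224))
embedding = image_encoder(image)
print(embedding.shape)  # should now run without the positional-embedding mismatch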

Text encoder input shape

When saving only the text encoder and trying to encode text I get this error:

ValueError: Exception encountered when calling layer 'visual' (type VisualTransformer).

Could not find matching concrete function to call loaded from the SavedModel. Got:
  Positional arguments (1 total):
    * <tf.Tensor 'x:0' shape=(1, 77) dtype=int32>
  Keyword arguments: {'training': False}

 Expected these arguments to match one of the following 4 option(s):

Option 1:
  Positional arguments (1 total):
    * TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='x')
  Keyword arguments: {'training': False}

Option 2:
  Positional arguments (1 total):
    * TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='x')
  Keyword arguments: {'training': True}

Option 3:
  Positional arguments (1 total):
    * TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input_1')
  Keyword arguments: {'training': False}

Option 4:
  Positional arguments (1 total):
    * TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input_1')
  Keyword arguments: {'training': True}

Call arguments received by layer 'visual' (type VisualTransformer):
  • args=('tensor([[49406,  1628, 49407,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0]], dtype=torch.int32)',)
  • kwargs=<class 'inspect._empty'>

The saved model is generated with python3 convert_clip.py --model ViT-B/32 --text_output models/CLIP_text_{model} and loaded with model = tf.keras.models.load_model("CLIP_text_ViT-B_32")

This looks to me like the typical image input dimensions rather than the text encoder input. Furthermore, the layer is of class VisualTransformer, which doesn't seem right for a text encoder. I would appreciate any help :)
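For reference, the serving signature of the export can be inspected to confirm which tower actually got saved; a minimal check:

import tensorflow as tf

loaded = tf.saved_model.load("CLIP_text_ViT-B_32")
sig = loaded.signatures["serving_default"]
print(sig.structured_input_signature)  # TensorSpecs the export accepts
print(sig.structured_outputs)          # output tensors

If this reports a (None, None, None, 3) float input, that would confirm the visual tower was written out instead of the text tower.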

Assuming I get the text encoder working, is there any way to get the tokenizer into the Keras saved model as well? I would like to serve it using tfserve so preferably the only input would be a string in the end.

I have a problem converting the TF2 pb file to a TF1.15 pb file, could you help me? Thanks.

I need a TensorFlow 1.x pb file of the CLIP text encoder, so I have to convert the TF 2.x pb file. Here is the code:
---------------------------------- code start ----------------------------------

import os
import tensorflow as tf
from tensorflow import keras
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

TF2_model_input = 'CLIP_RN50'                    # the input path
TF115_model_output = 'dir_for_saving_TF1.15_pb'  # output directory for the frozen graph
#os.makedirs(TF115_model_output, exist_ok=True)

model = tf.keras.models.load_model(TF2_model_input)  # read the TF 2.8 SavedModel

# Wrap the model in a tf.function so a concrete function can be traced from it.
full_model = tf.function(lambda serving_default_image, serving_default_text:
                         model(serving_default_image, serving_default_text))

# Note: if the input signature is not declared on the inherited tf.keras.Model, the
# concrete function needs the input information defined explicitly via TensorSpec.
full_model = full_model.get_concrete_function(
    tf.TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input/0'),
    tf.TensorSpec(shape=(None, None, None), dtype=tf.int64, name='input/1'))

frozen_func = convert_variables_to_constants_v2(full_model)  # turn the model variables into constants

frozen_func.graph.as_graph_def()  # the frozen graph as a GraphDef

# Debug: list the operation names (needed for feeding/fetching in TF 1.15).
layers = [op.name for op in frozen_func.graph.get_operations()]
for layer in layers:
    print(layer)
print("Frozen model inputs: ", frozen_func.inputs)
print("Frozen model outputs: ", frozen_func.outputs)

# Save the frozen graph so it can be loaded with TF 1.15.
tf.io.write_graph(graph_or_graph_def=frozen_func.graph,
                  logdir=TF115_model_output,
                  name="tf1.15_frozen_graph_model.pb",
                  as_text=False)
----------------------------------- code end -----------------------------------
But it gives me this error:
=========================== Error ==================================
WARNING:tensorflow:No training configuration found in save file, so the model was not compiled. Compile it manually.
Traceback (most recent call last):
File "convert_tf2pb_to_tf115pb.py", line 19, in
full_model = full_model.get_concrete_function(tf.TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input/0'),tf.TensorSpec(shape=(None, None, None), dtype=tf.int64, name='input/1'))
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 1264, in get_concrete_function
concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 1244, in _get_concrete_function_garbage_collected
self._initialize(args, kwargs, add_initializers_to=initializers)
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 786, in _initialize
*args, **kwds))
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2983, in _get_concrete_function_internal_garbage_collected
graph_function, _ = self._maybe_define_function(args, kwargs)
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3292, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3140, in _create_graph_function
capture_by_value=self._capture_by_value),
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 1161, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 677, in wrapped_fn
out = weak_wrapped_fn().wrapped(*args, **kwds)
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 1147, in autograph_handler
raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

File "convert_tf2pb_to_tf115pb.py", line 18, in None  *
    lambda serving_default_image,serving_default_text: model(serving_default_image,serving_default_text))
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler  **
    raise e.with_traceback(filtered_tb) from None
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/keras/saving/saved_model/utils.py", line 166, in replace_training_and_call
    return wrapped_call(*args, **kwargs)

ValueError: Exception encountered when calling layer "clip" (type CLIP).

Could not find matching concrete function to call loaded from the SavedModel. Got:
  Positional arguments (2 total):
    * Tensor("input:0", shape=(None, None, None, 3), dtype=float32)
    * True
  Keyword arguments: {}

 Expected these arguments to match one of the following 4 option(s):

Option 1:
  Positional arguments (2 total):
    * (TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input/0'), TensorSpec(shape=(None, None, None), dtype=tf.int64, name='input/1'))
    * False
  Keyword arguments: {}

Option 2:
  Positional arguments (2 total):
    * (TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input/0'), TensorSpec(shape=(None, None, None), dtype=tf.int64, name='input/1'))
    * True
  Keyword arguments: {}

Option 3:
  Positional arguments (2 total):
    * (TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='image'), TensorSpec(shape=(None, None, None), dtype=tf.int64, name='text'))
    * False
  Keyword arguments: {}

Option 4:
  Positional arguments (2 total):
    * (TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='image'), TensorSpec(shape=(None, None, None), dtype=tf.int64, name='text'))
    * True
  Keyword arguments: {}

Call arguments received:
  • args=('tf.Tensor(shape=(None, None, None, 3), dtype=float32)', 'tf.Tensor(shape=(None, None, None), dtype=int64)')
  • kwargs=<class 'inspect._empty'>

===========================Error End==============================

Thanks a lot
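Judging from the expected options in the error, the loaded CLIP layer takes both tensors packed into a single tuple as its first positional argument, while the lambda above passes image and text as two separate arguments (so the second tensor lands in the training slot). A sketch of the adjustment, assuming the rest of the script stays unchanged:

# pack image and text into one tuple, matching the SavedModel call signature
full_model = tf.function(lambda image, text: model((image, text)))
full_model = full_model.get_concrete_function(
    tf.TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='image'),
    tf.TensorSpec(shape=(None, None, None), dtype=tf.int64, name='text'))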

Conversion failed

Hi, thank you for providing this conversion script. I am running into the error below and am not sure what the issue is:

Classify image: https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true
Text options: ['a diagram', 'a dog', 'a cat', 'a neural network']
Pytorch: [[0.2441   0.003271 0.000827 0.752   ]]
Tensorflow: [[0.24351107 0.00320389 0.0008252  0.7524599 ]]
Traceback (most recent call last):
  File "convert_clip.py", line 64, in <module>
    app.run(main)
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "convert_clip.py", line 40, in main
    converter.verify(FLAGS.model, model, image_url, text_options, verbose=True)
  File "/home/jupyter/CLIP-tf2/converter/convert.py", line 233, in verify
    assert np.abs(torch_probs - tf_probs).sum() < 1e-3, f"PyTorch and Tensorflow results should be almost equal: torch_probs={torch_probs}, tf_probs={tf_probs}"
AssertionError: PyTorch and Tensorflow results should be almost equal: torch_probs=[[0.2441   0.003271 0.000827 0.752   ]], tf_probs=[[0.24351107 0.00320389 0.0008252  0.7524599 ]]

My command looks like this:
python convert_clip.py --model RN50 --output models/CLIP_{model}

Any help would be appreciated, thanks!
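For reference, the verify step in converter/convert.py asserts that the summed absolute difference stays below 1e-3, and the values above only just exceed it (the PyTorch probabilities appear to be printed at fp16 precision). A quick check:

import numpy as np

torch_probs = np.array([[0.2441, 0.003271, 0.000827, 0.752]])
tf_probs = np.array([[0.24351107, 0.00320389, 0.0008252, 0.7524599]])
print(np.abs(torch_probs - tf_probs).sum())  # ~1.1e-3, marginally above the 1e-3 threshold

So the two backends agree closely; relaxing the tolerance slightly in converter/convert.py should let the conversion finish.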
