clip-tf2's Issues

Conversion failed

Hi, thank you for providing this conversion script. I am running into this error and I'm not sure what the issue is here:

Classify image: https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true
Text options: ['a diagram', 'a dog', 'a cat', 'a neural network']
Pytorch: [[0.2441   0.003271 0.000827 0.752   ]]
Tensorflow: [[0.24351107 0.00320389 0.0008252  0.7524599 ]]
Traceback (most recent call last):
  File "convert_clip.py", line 64, in <module>
    app.run(main)
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "convert_clip.py", line 40, in main
    converter.verify(FLAGS.model, model, image_url, text_options, verbose=True)
  File "/home/jupyter/CLIP-tf2/converter/convert.py", line 233, in verify
    assert np.abs(torch_probs - tf_probs).sum() < 1e-3, f"PyTorch and Tensorflow results should be almost equal: torch_probs={torch_probs}, tf_probs={tf_probs}"
AssertionError: PyTorch and Tensorflow results should be almost equal: torch_probs=[[0.2441   0.003271 0.000827 0.752   ]], tf_probs=[[0.24351107 0.00320389 0.0008252  0.7524599 ]]

My command looks like this:
python convert_clip.py --model RN50 --output models/CLIP_{model}

Any help would be appreciated, thanks!
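For what it's worth, the absolute differences in the output above sum to roughly 1.1e-3, only slightly over the 1e-3 threshold asserted in converter/convert.py, and the PyTorch probabilities print at reduced precision (likely fp16 weights), so this looks like accumulated floating-point drift rather than a broken conversion. A minimal sketch reproducing the check with the numbers from the log (the looser atol=5e-3 tolerance is my assumption, not the repo's):

import numpy as np

# probabilities copied from the log above
torch_probs = np.array([[0.2441, 0.003271, 0.000827, 0.752]])
tf_probs = np.array([[0.24351107, 0.00320389, 0.0008252, 0.7524599]])

# the summed difference is ~1.1e-3, just over the asserted 1e-3 threshold
print(np.abs(torch_probs - tf_probs).sum())

# an elementwise comparison with a slightly looser tolerance passes
assert np.allclose(torch_probs, tf_probs, atol=5e-3)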

Text encoder input shape

When saving only the text encoder and then trying to encode text, I get this error:

ValueError: Exception encountered when calling layer 'visual' (type VisualTransformer).

Could not find matching concrete function to call loaded from the SavedModel. Got:
  Positional arguments (1 total):
    * <tf.Tensor 'x:0' shape=(1, 77) dtype=int32>
  Keyword arguments: {'training': False}

 Expected these arguments to match one of the following 4 option(s):

Option 1:
  Positional arguments (1 total):
    * TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='x')
  Keyword arguments: {'training': False}

Option 2:
  Positional arguments (1 total):
    * TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='x')
  Keyword arguments: {'training': True}

Option 3:
  Positional arguments (1 total):
    * TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input_1')
  Keyword arguments: {'training': False}

Option 4:
  Positional arguments (1 total):
    * TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input_1')
  Keyword arguments: {'training': True}

Call arguments received by layer 'visual' (type VisualTransformer):
  • args=('tensor([[49406,  1628, 49407,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0]], dtype=torch.int32)',)
  • kwargs=<class 'inspect._empty'>

The saved model is generated with python3 convert_clip.py --model ViT-B/32 --text_output models/CLIP_text_{model} and loaded with model = tf.keras.models.load_model("CLIP_text_ViT-B_32").

The expected shapes of (None, None, None, 3) look like typical image dimensions, not the text encoder's token input. Furthermore, the loaded layer is of class VisualTransformer, which doesn't seem right for a text encoder. I would appreciate any help :)
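One way to confirm what was actually exported is to inspect the SavedModel's signatures directly. A quick sketch (the 'serving_default' key is the conventional default; I have not verified it for this repo):

import tensorflow as tf

loaded = tf.saved_model.load("CLIP_text_ViT-B_32")
print(list(loaded.signatures))  # typically ['serving_default']

# a float32 spec of shape (None, None, None, 3) here would confirm that
# the visual encoder, not the text encoder, was saved
sig = loaded.signatures["serving_default"]
print(sig.structured_input_signature)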

Assuming I get the text encoder working, is there any way to get the tokenizer into the Keras saved model as well? I would like to serve it with TensorFlow Serving, so ideally the final model would take a raw string as its only input.
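Since CLIP's BPE tokenizer is plain Python, it does not embed easily into a SavedModel graph. A common workaround, sketched here under the assumption that the text encoder exports correctly, is to tokenize client-side with OpenAI's tokenizer and send the integer token ids to the served model:

import clip  # OpenAI's CLIP package, provides the BPE tokenizer
import tensorflow as tf

text_encoder = tf.keras.models.load_model("CLIP_text_ViT-B_32")

# clip.tokenize pads/truncates each string to the 77-token context length
tokens = clip.tokenize(["a diagram", "a dog"]).numpy()
embeddings = text_encoder(tf.constant(tokens, dtype=tf.int32))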

CLIP_image_ViT-L_14 image encoder: unequal dimensions

After converting the L/14 model using python3 convert_clip.py --model ViT-L/14 --image_output models/CLIP_image_ViT-L_14/ and loading it with image_encoder = tf.keras.models.load_model("CLIP_image_ViT-L_14/", compile=False), it fails to embed an image with this error:

In [5]: image_encoder(np.random.random((1, 480, 480 ,3)))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 1
----> 1 image_encoder(np.random.random((1, 480, 480 ,3)))

File ~/.local/lib/python3.10/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File ~/.local/lib/python3.10/site-packages/tensorflow/python/framework/ops.py:1967, in _create_c_op(graph, node_def, inputs, control_inputs, op_def, extract_traceback)
   1964   c_op = pywrap_tf_session.TF_FinishOperation(op_desc)
   1965 except errors.InvalidArgumentError as e:
   1966   # Convert to ValueError for backwards compatibility.
-> 1967   raise ValueError(e.message)
   1969 # Record the current Python stack trace as the creating stacktrace of this
   1970 # TF_Operation.
   1971 if extract_traceback:

ValueError: Exception encountered when calling layer 'visual' (type VisualTransformer).

Dimensions must be equal, but are 1157 and 257 for '{{node add}} = AddV2[T=DT_FLOAT](concat, add/ReadVariableOp)' with input shapes: [1,1157,1024], [257,1024].

Call arguments received by layer 'visual' (type VisualTransformer):
  • args=('tf.Tensor(shape=(1, 480, 480, 3), dtype=float32)',)
  • kwargs=<class 'inspect._empty'>

However, the conversion itself throws no error, even though it also performs validation.
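The numbers in the error suggest a fixed-size position embedding: ViT-L/14 was trained at 224x224, giving (224/14)^2 + 1 = 257 position embeddings, while a 480x480 input produces (480//14)^2 + 1 = 1157 patch tokens, exactly the 1157 vs 257 mismatch reported. A minimal sketch that works around it by resizing to the trained resolution (resizing alone is my simplification; CLIP's own preprocessing also center-crops and normalizes):

import numpy as np
import tensorflow as tf

image_encoder = tf.keras.models.load_model("CLIP_image_ViT-L_14/", compile=False)

# resize to the 224x224 resolution the position embeddings were trained for
image = np.random.random((1, 480, 480, 3)).astype("float32")
embedding = image_encoder(tf.image.resize(image, (224, 224)))
print(embedding.shape)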

I have a problem converting the TF2 pb file to a TF 1.15 pb file, could you help me? Thanks.

I need the TensorFlow 1.x pb file of the CLIP text encoder, so I have to convert the TF 2.x pb file. Here is the code:
---------------------------------- code start ----------------------------------

import os
import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

TF2_model_input = 'CLIP_RN50'  # the input path
TF115_model_output = 'dir_for_saving_TF1.15_pb'
#os.makedirs(TF115_model_output, exist_ok=True)

# read the model checkpoint in TF 2.8
model = tf.keras.models.load_model(TF2_model_input)

# change the model to a concrete function
full_model = tf.function(
    lambda serving_default_image, serving_default_text:
        model(serving_default_image, serving_default_text))

# Note: if the input information is not in the inherited tf.keras.Model, the
# concrete function needs the input information defined via TensorSpec.
full_model = full_model.get_concrete_function(
    tf.TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input/0'),
    tf.TensorSpec(shape=(None, None, None), dtype=tf.int64, name='input/1'))

# change the model variables to constants
frozen_func = convert_variables_to_constants_v2(full_model)

# change the model graph to a graph def
frozen_func.graph.as_graph_def()

# debug: check the operations and operation names (needed in TF 1.15)
layers = [op.name for op in frozen_func.graph.get_operations()]
for layer in layers:
    print(layer)
print("Frozen model inputs: ", frozen_func.inputs)
print("Frozen model outputs: ", frozen_func.outputs)

# save the frozen graph so TF 1.15 can read it
tf.io.write_graph(graph_or_graph_def=frozen_func.graph,
                  logdir=TF115_model_output,
                  name="tf1.15_frozen_graph_model.pb",
                  as_text=False)

---------------------------------- code end ----------------------------------
But it gives me this error:
=========================== Error ==================================
WARNING:tensorflow:No training configuration found in save file, so the model was not compiled. Compile it manually.
Traceback (most recent call last):
  File "convert_tf2pb_to_tf115pb.py", line 19, in <module>
    full_model = full_model.get_concrete_function(tf.TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input/0'),tf.TensorSpec(shape=(None, None, None), dtype=tf.int64, name='input/1'))
  File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 1264, in get_concrete_function
    concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
  File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 1244, in _get_concrete_function_garbage_collected
    self._initialize(args, kwargs, add_initializers_to=initializers)
  File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 786, in _initialize
    *args, **kwds))
  File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2983, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3292, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3140, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 1161, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 677, in wrapped_fn
    out = weak_wrapped_fn().wrapped(*args, **kwds)
  File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 1147, in autograph_handler
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    File "convert_tf2pb_to_tf115pb.py", line 18, in None  *
        lambda serving_default_image,serving_default_text: model(serving_default_image,serving_default_text))
    File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler  **
        raise e.with_traceback(filtered_tb) from None
    File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/keras/saving/saved_model/utils.py", line 166, in replace_training_and_call
        return wrapped_call(*args, **kwargs)

ValueError: Exception encountered when calling layer "clip" (type CLIP).

Could not find matching concrete function to call loaded from the SavedModel. Got:
  Positional arguments (2 total):
    * Tensor("input:0", shape=(None, None, None, 3), dtype=float32)
    * True
  Keyword arguments: {}

 Expected these arguments to match one of the following 4 option(s):

Option 1:
  Positional arguments (2 total):
    * (TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input/0'), TensorSpec(shape=(None, None, None), dtype=tf.int64, name='input/1'))
    * False
  Keyword arguments: {}

Option 2:
  Positional arguments (2 total):
    * (TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input/0'), TensorSpec(shape=(None, None, None), dtype=tf.int64, name='input/1'))
    * True
  Keyword arguments: {}

Option 3:
  Positional arguments (2 total):
    * (TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='image'), TensorSpec(shape=(None, None, None), dtype=tf.int64, name='text'))
    * False
  Keyword arguments: {}

Option 4:
  Positional arguments (2 total):
    * (TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='image'), TensorSpec(shape=(None, None, None), dtype=tf.int64, name='text'))
    * True
  Keyword arguments: {}

Call arguments received:
  • args=('tf.Tensor(shape=(None, None, None, 3), dtype=float32)', 'tf.Tensor(shape=(None, None, None), dtype=int64)')
  • kwargs=<class 'inspect._empty'>

===========================Error End==============================

Thanks a lot
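Judging from the expected options in the error, the restored CLIP layer takes a single (image, text) tuple as its first positional argument rather than two separate tensors. A sketch of the wrapper adjusted accordingly (my reading of the signature above, not verified against the repo); the rest of the freezing code can stay the same:

import tensorflow as tf

model = tf.keras.models.load_model('CLIP_RN50')

# pass image and text as one tuple, matching the SavedModel's expected signature
full_model = tf.function(lambda image, text: model((image, text)))
concrete = full_model.get_concrete_function(
    tf.TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='image'),
    tf.TensorSpec(shape=(None, None, None), dtype=tf.int64, name='text'))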
