robertbiehl / clip-tf2
OpenAI CLIP converted to Tensorflow 2/Keras
License: MIT License
Hi, thank you for providing this conversion script. I am running into this error and am not sure what the issue is:
Classify image: https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true
Text options: ['a diagram', 'a dog', 'a cat', 'a neural network']
Pytorch: [[0.2441 0.003271 0.000827 0.752 ]]
Tensorflow: [[0.24351107 0.00320389 0.0008252 0.7524599 ]]
Traceback (most recent call last):
File "convert_clip.py", line 64, in <module>
app.run(main)
File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "convert_clip.py", line 40, in main
converter.verify(FLAGS.model, model, image_url, text_options, verbose=True)
File "/home/jupyter/CLIP-tf2/converter/convert.py", line 233, in verify
assert np.abs(torch_probs - tf_probs).sum() < 1e-3, f"PyTorch and Tensorflow results should be almost equal: torch_probs={torch_probs}, tf_probs={tf_probs}"
AssertionError: PyTorch and Tensorflow results should be almost equal: torch_probs=[[0.2441 0.003271 0.000827 0.752 ]], tf_probs=[[0.24351107 0.00320389 0.0008252 0.7524599 ]]
My command looks like this:
python convert_clip.py --model RN50 --output models/CLIP_{model}
Any help would be appreciated, thanks!
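For reference, the size of the gap can be checked by hand from the two probability vectors in the log. This is a minimal sketch in plain Python that uses only the rounded values as printed, so the total is approximate:

```python
# Probabilities copied from the log output above (rounded as printed).
torch_probs = [0.2441, 0.003271, 0.000827, 0.752]
tf_probs = [0.24351107, 0.00320389, 0.0008252, 0.7524599]

# Same quantity the failing assertion checks:
# the sum of absolute differences between the two vectors.
total_abs_diff = sum(abs(p - q) for p, q in zip(torch_probs, tf_probs))

tolerance = 1e-3
print(f"sum of absolute differences: {total_abs_diff:.6f}")
print(f"within tolerance: {total_abs_diff < tolerance}")
```

With these printed values the sum comes out at roughly 1.1e-3, just above the 1e-3 threshold, so the two models still agree to about three decimal places. The PyTorch values carry only three to four significant digits, which may point to half-precision inference on the PyTorch side, but that is a guess and not something the log confirms.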
When saving only the text encoder and trying to encode text I get this error:
ValueError: Exception encountered when calling layer 'visual' (type VisualTransformer).
Could not find matching concrete function to call loaded from the SavedModel. Got:
Positional arguments (1 total):
* <tf.Tensor 'x:0' shape=(1, 77) dtype=int32>
Keyword arguments: {'training': False}
Expected these arguments to match one of the following 4 option(s):
Option 1:
Positional arguments (1 total):
* TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='x')
Keyword arguments: {'training': False}
Option 2:
Positional arguments (1 total):
* TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='x')
Keyword arguments: {'training': True}
Option 3:
Positional arguments (1 total):
* TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input_1')
Keyword arguments: {'training': False}
Option 4:
Positional arguments (1 total):
* TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input_1')
Keyword arguments: {'training': True}
Call arguments received by layer 'visual' (type VisualTransformer):
• args=('tensor([[49406, 1628, 49407, 0, 0, 0, 0, 0, 0, 0,\n 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n 0, 0, 0, 0, 0, 0, 0]], dtype=torch.int32)',)
• kwargs=<class 'inspect._empty'>
The saved model is generated with python3 convert_clip.py --model ViT-B/32 --text_output models/CLIP_text_{model}
and loaded with model = tf.keras.models.load_model("CLIP_text_ViT-B_32")
This seems to me like the typical image dimensions and not the text encoder input. Furthermore, the model is of the class VisualTransformer, which doesn't seem right. I would appreciate any help :)
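To make the mismatch concrete, here is a small sketch in plain Python that compares the shape that was passed with the shape listed in the four options. `shape_matches` is a hypothetical helper written for this post, not part of TensorFlow or of this repository:

```python
from typing import Optional, Sequence


def shape_matches(spec: Sequence[Optional[int]], shape: Sequence[int]) -> bool:
    """Return True if a concrete shape fits a spec where None means 'any size'."""
    if len(spec) != len(shape):
        return False  # different rank can never match
    return all(s is None or s == d for s, d in zip(spec, shape))


image_spec = (None, None, None, 3)  # shape listed in all four options of the error
token_shape = (1, 77)               # shape of the tokenized text that was passed

print(shape_matches(image_spec, token_shape))       # rank 2 vs rank 4
print(shape_matches(image_spec, (1, 224, 224, 3)))  # an image-like batch
```

The token batch has rank 2 while every accepted signature has rank 4 with 3 channels, and the dtype differs as well (int32 versus float32), which fits the suspicion that the saved model is the image encoder.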
Assuming I get the text encoder working, is there any way to get the tokenizer into the Keras saved model as well? I would like to serve it using tfserve so preferably the only input would be a string in the end.
After converting the L_14 model using python3 convert_clip.py --model ViT-L/14 --image_output models/CLIP_image_ViT-L_14/
and loading it using image_encoder = tf.keras.models.load_model("CLIP_image_ViT-L_14/", compile=False)
it fails to embed an image with this error:
In [5]: image_encoder(np.random.random((1, 480, 480 ,3)))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[5], line 1
----> 1 image_encoder(np.random.random((1, 480, 480 ,3)))
File ~/.local/lib/python3.10/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
67 filtered_tb = _process_traceback_frames(e.__traceback__)
68 # To get the full stack trace, call:
69 # `tf.debugging.disable_traceback_filtering()`
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb
File ~/.local/lib/python3.10/site-packages/tensorflow/python/framework/ops.py:1967, in _create_c_op(graph, node_def, inputs, control_inputs, op_def, extract_traceback)
1964 c_op = pywrap_tf_session.TF_FinishOperation(op_desc)
1965 except errors.InvalidArgumentError as e:
1966 # Convert to ValueError for backwards compatibility.
-> 1967 raise ValueError(e.message)
1969 # Record the current Python stack trace as the creating stacktrace of this
1970 # TF_Operation.
1971 if extract_traceback:
ValueError: Exception encountered when calling layer 'visual' (type VisualTransformer).
Dimensions must be equal, but are 1157 and 257 for '{{node add}} = AddV2[T=DT_FLOAT](concat, add/ReadVariableOp)' with input shapes: [1,1157,1024], [257,1024].
Call arguments received by layer 'visual' (type VisualTransformer):
• args=('tf.Tensor(shape=(1, 480, 480, 3), dtype=float32)',)
• kwargs=<class 'inspect._empty'>
However, the conversion itself throws no error, even though it also performs validation.
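The two numbers in the error can be reproduced from the patch size. This sketch assumes the patch embedding is a stride-14 convolution without padding and that a single class token is prepended, which is how a ViT with 14x14 patches usually works; `vit_sequence_length` is a hypothetical helper, not a function from this repository:

```python
def vit_sequence_length(image_size: int, patch_size: int) -> int:
    """Token count for a square image: one token per patch plus one class token."""
    # A strided convolution without padding drops any partial patch at the border.
    patches_per_side = (image_size - patch_size) // patch_size + 1
    return patches_per_side * patches_per_side + 1


print(vit_sequence_length(480, 14))  # 34 * 34 + 1 = 1157
print(vit_sequence_length(224, 14))  # 16 * 16 + 1 = 257
```

Under these assumptions a 480x480 input yields 1157 tokens, while the stored positional embedding has 257 rows, which corresponds to a 224x224 input. That would also explain why validation during conversion passes if it runs on images already resized to 224x224.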
Can this line save the correct text encoder model?
I need the TensorFlow 1.x pb file of the CLIP text encoder, so I have to convert the TF 2.x pb file. Here is the code:
---------------------------------- code start ----------------------------------
import os
import tensorflow as tf
from tensorflow import keras
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2
import pdb

TF2_model_input = 'CLIP_RN50'  # the input path
TF115_model_output = 'dir_for_saving_TF1.15_pb'
#os.makedirs(TF115_model_output, exist_ok = True)

model = tf.keras.models.load_model(TF2_model_input)  # read the model ckpt in TF2.8

# change the model to concrete function
full_model = tf.function(lambda serving_default_image,serving_default_text:
    model(serving_default_image,serving_default_text))

# Note: if the input information is not in the inherited tf.keras.Model, the Concrete Function
# is needed to define the input information via TensorSpec.
full_model = full_model.get_concrete_function(
    tf.TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input/0'),
    tf.TensorSpec(shape=(None, None, None), dtype=tf.int64, name='input/1'))

frozen_func = convert_variables_to_constants_v2(full_model)  # change the model parameters to constants
frozen_func.graph.as_graph_def()  # change the model graph to graph def

# debug, check the parameters and parameter names (needed in TF1.15)
layers = [op.name for op in frozen_func.graph.get_operations()]
for layer in layers:
    print(layer)
print("Frozen model inputs: ", frozen_func.inputs)
print("Frozen model outputs: ", frozen_func.outputs)

# save the Frozen graph using TF1.15
tf.io.write_graph(graph_or_graph_def=frozen_func.graph,
                  logdir=TF115_model_output,  # was frozen_out_path, which is never defined
                  name="tf1.15_frozen_graph_model.pb",
                  as_text=False)
---------------------------------- code end ----------------------------------
But it gives me an error:
=========================== Error ==================================
WARNING:tensorflow:No training configuration found in save file, so the model was not compiled. Compile it manually.
Traceback (most recent call last):
File "convert_tf2pb_to_tf115pb.py", line 19, in <module>
full_model = full_model.get_concrete_function(tf.TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input/0'),tf.TensorSpec(shape=(None, None, None), dtype=tf.int64, name='input/1'))
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 1264, in get_concrete_function
concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 1244, in _get_concrete_function_garbage_collected
self._initialize(args, kwargs, add_initializers_to=initializers)
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 786, in _initialize
*args, **kwds))
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2983, in _get_concrete_function_internal_garbage_collected
graph_function, _ = self._maybe_define_function(args, kwargs)
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3292, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3140, in _create_graph_function
capture_by_value=self._capture_by_value),
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 1161, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 677, in wrapped_fn
out = weak_wrapped_fn().wrapped(*args, **kwds)
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 1147, in autograph_handler
raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:
File "convert_tf2pb_to_tf115pb.py", line 18, in None *
lambda serving_default_image,serving_default_text: model(serving_default_image,serving_default_text))
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler **
raise e.with_traceback(filtered_tb) from None
File "/218019043/software/anaconda3/envs/dassl/lib/python3.7/site-packages/keras/saving/saved_model/utils.py", line 166, in replace_training_and_call
return wrapped_call(*args, **kwargs)
ValueError: Exception encountered when calling layer "clip" (type CLIP).
Could not find matching concrete function to call loaded from the SavedModel. Got:
Positional arguments (2 total):
* Tensor("input:0", shape=(None, None, None, 3), dtype=float32)
* True
Keyword arguments: {}
Expected these arguments to match one of the following 4 option(s):
Option 1:
Positional arguments (2 total):
* (TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input/0'), TensorSpec(shape=(None, None, None), dtype=tf.int64, name='input/1'))
* False
Keyword arguments: {}
Option 2:
Positional arguments (2 total):
* (TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input/0'), TensorSpec(shape=(None, None, None), dtype=tf.int64, name='input/1'))
* True
Keyword arguments: {}
Option 3:
Positional arguments (2 total):
* (TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='image'), TensorSpec(shape=(None, None, None), dtype=tf.int64, name='text'))
* False
Keyword arguments: {}
Option 4:
Positional arguments (2 total):
* (TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='image'), TensorSpec(shape=(None, None, None), dtype=tf.int64, name='text'))
* True
Keyword arguments: {}
Call arguments received:
• args=('tf.Tensor(shape=(None, None, None, 3), dtype=float32)', 'tf.Tensor(shape=(None, None, None), dtype=int64)')
• kwargs=<class 'inspect._empty'>
===========================Error End==============================
Thanks a lot
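One possible reading of the trace: all four accepted options take a single (image, text) tuple followed by a training flag, whereas the lambda passes image and text as two separate positional arguments. The sketch below uses a plain Python stand-in (`call` is hypothetical, with the signature the trace suggests, not the real CLIP class) to show where the second argument ends up in that case:

```python
import inspect


def call(inputs, training=False):
    """Stand-in with the call shape the trace suggests: one (image, text) tuple plus a flag."""
    image, text = inputs
    return image, text, training


sig = inspect.signature(call)

# Two separate positional arguments: the second one lands in the `training` slot.
wrong = sig.bind("image_tensor", "text_tensor")
print(dict(wrong.arguments))

# One tuple argument: both tensors stay together as `inputs`.
right = sig.bind(("image_tensor", "text_tensor"))
print(dict(right.arguments))
```

If that reading is right, calling model((serving_default_image, serving_default_text)) inside the lambda would be the thing to try. This is untested against the actual SavedModel.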