Thank you for helping me out. I have obtained numbers equivalent to those reported in the paper.
Could you please also let me know how to reproduce Tables 3, 4, and 5 from the paper?
What's the error message here? (if there is none, what does it say when you exit with ctrl+c?)
There aren't any error messages. If I press Ctrl+C, this is displayed:
1%|▉ | 16/2064 [5:48:32<743:32:32, 1307.01s/it]
Traceback (most recent call last):
File "evals/generate_visdial_images.py", line 70, in <module>
return_outputs = model.generate_for_images_and_texts(
File "/home/mbzuaiser/gill/gill/models.py", line 688, in generate_for_images_and_texts
img = utils.get_image_from_url(self.path_array[img_idx])
File "/home/mbzuaiser/gill/gill/utils.py", line 27, in get_image_from_url
response = requests.get(url)
File "/home/mbzuaiser/gill/venv/lib/python3.8/site-packages/requests/api.py", line 73, in get
return request("get", url, params=params, **kwargs)
File "/home/mbzuaiser/gill/venv/lib/python3.8/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "/home/mbzuaiser/gill/venv/lib/python3.8/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/home/mbzuaiser/gill/venv/lib/python3.8/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/home/mbzuaiser/gill/venv/lib/python3.8/site-packages/requests/adapters.py", line 486, in send
resp = conn.urlopen(
File "/home/mbzuaiser/gill/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 714, in urlopen
httplib_response = self._make_request(
File "/home/mbzuaiser/gill/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 466, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/home/mbzuaiser/gill/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 461, in _make_request
httplib_response = conn.getresponse()
File "/home/mbzuaiser/anaconda3/lib/python3.8/http/client.py", line 1347, in getresponse
response.begin()
File "/home/mbzuaiser/anaconda3/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/home/mbzuaiser/anaconda3/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/home/mbzuaiser/anaconda3/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/home/mbzuaiser/anaconda3/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/home/mbzuaiser/anaconda3/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
KeyboardInterrupt
I see, it looks like it's having trouble reading one of the URLs for image retrieval. Could you try pulling from HEAD (c7de07a) and seeing if it works? I've disabled loading of retrieval embeddings by default since we don't need them for evals.
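For reference, the traceback shows the hang is in a bare `requests.get(url)` call, which has no timeout, so a single dead image URL can block indefinitely. A defensive sketch of the helper (illustrative only: the function name mirrors `gill/utils.py`, but the timeout value and error handling here are assumptions, not the repo's actual code):

```python
import io

import requests
from PIL import Image


def get_image_from_url(url: str, timeout: float = 10.0):
    """Fetch an image, returning None instead of hanging on a dead URL.

    Illustrative variant of the repo's helper: the bare requests.get(url)
    in the traceback has no timeout, so a stalled server blocks forever.
    """
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        return Image.open(io.BytesIO(response.content)).convert("RGB")
    except (requests.RequestException, OSError):
        return None  # caller can skip this retrieval candidate
```

A caller would then skip `None` results rather than crashing or stalling mid-eval.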
For Visdial I am getting the error
Traceback (most recent call last):
File "evals/generate_visdial_images.py", line 27, in <module>
model = models.load_gill('checkpoints/gill_opt/', load_ret_embs=False)
File "/home/mbzuaiser/gill/gill/models.py", line 895, in load_gill
emb_matrix = torch.tensor(emb_matrix, dtype=logit_scale.dtype).to(logit_scale.device)
TypeError: must be real number, not NoneType
Sorry, this should be fixed with d85ad06. Not sure why I didn't catch it when I ran the eval earlier.
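For anyone hitting this before pulling the fix: with `load_ret_embs=False`, `emb_matrix` stays `None`, and passing `None` to `torch.tensor` raises exactly this `TypeError`. A guard along these lines avoids it (an illustrative sketch, not the actual d85ad06 diff):

```python
import torch


def build_ret_embeddings(emb_matrix, logit_scale):
    """Skip retrieval-embedding setup when the embeddings were not loaded.

    Sketch of the kind of guard needed: with load_ret_embs=False,
    emb_matrix is None, and torch.tensor(None, ...) raises
    "TypeError: must be real number, not NoneType".
    """
    if emb_matrix is None:
        return None  # evals that never retrieve images never touch this
    return torch.tensor(emb_matrix, dtype=logit_scale.dtype).to(logit_scale.device)
```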
Thank you for the help, @kohjingyu.
What is the maximum number of epochs you trained for to get the final results reported in the paper? I am not able to reproduce the numbers given in the table. Or is there some other issue that could cause this? Also, what image size did you use for calculating the LPIPS score, and which resize operation did you use (cv2, F.interpolate, or PIL resize)?
LPIPS score (VIST) - reproduced: 0.7314
LPIPS score (VisDial) - reproduced: 0.7811
CLIP score (VIST) - reproduced: 0.64018
CLIP score (VisDial) - reproduced: 0.64401
Was this a model you trained yourself? The models we released were trained as follows:
> what image size did you use for calculating the LPIPS score
Since the CLIP scores you have are similar to those in the paper, the issue is likely the resizing for LPIPS. The images have to be resized to 256x256, since the model being used is AlexNet. We used the torchvision resize for this:
(lines 35 to 36 at commit 232eb02)
Those tables are mostly ablation results and we probably won't be releasing the scripts for those. For the contextual image retrieval eval, you can refer to the FROMAGe repo for instructions.
@kohjingyu How many iterations did you have per epoch? (you highlighted 20k iterations with a batch size of 200) Was it 200 iterations/epoch for a total of 100 epochs?
The epoch count doesn't really matter, since the data is randomly shuffled; I think it only affects how often the evals are run. I believe I used 2000 iterations/epoch for 10 epochs, but in principle iterations × batch size is the only thing that affects the final results (i.e., the model should see ~4M image-text pairs). Hope that makes sense!
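The arithmetic above can be sanity-checked directly:

```python
# Training budget described above: iterations x batch size fixes the
# number of image-text pairs seen; epoch boundaries are bookkeeping.
iters_per_epoch = 2000
epochs = 10
batch_size = 200

pairs_per_epoch = iters_per_epoch * batch_size  # pairs seen per epoch
total_pairs = pairs_per_epoch * epochs          # pairs seen in total

# Same budget as "20k iterations at batch size 200".
assert total_pairs == 20_000 * batch_size

print(pairs_per_epoch, total_pairs)  # 400000 4000000
```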
Thanks for your answer. I was also trying to figure out the number of image-text pairs you used for each epoch of training. In this setup, your model saw 400k randomly selected image-text pairs from the training set in each epoch, right?
That’s correct.