It's hard to give an average time since it depends on your hardware. If you are on CPU and haven't installed FAISS, please consider installing FAISS-CPU as it will speed up the computation of the knn graph. The bottleneck is the computation of the synthetic dictionary via CSLS.
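For context, CSLS (cross-domain similarity local scaling) scores a source/target pair as twice their cosine similarity minus each word's mean similarity to its k nearest neighbours in the other language. That k-NN search over the whole vocabulary is what FAISS accelerates. Below is a minimal NumPy sketch of the formula. The function name `csls_scores` and the toy data are illustrative assumptions, not MUSE's implementation, and the sketch builds the full similarity matrix, which only works for small vocabularies.

```python
import numpy as np


def csls_scores(src, tgt, k=10):
    """Return the CSLS score matrix between source and target embeddings.

    src: (n_src, dim) array, tgt: (n_tgt, dim) array.
    """
    # Normalise rows so that dot products are cosine similarities.
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    cos = src @ tgt.T  # shape (n_src, n_tgt)

    k_src = min(k, tgt.shape[0])
    k_tgt = min(k, src.shape[0])
    # r_src[i]: mean cosine between source word i and its k nearest targets.
    r_src = np.sort(cos, axis=1)[:, -k_src:].mean(axis=1)
    # r_tgt[j]: mean cosine between target word j and its k nearest sources.
    r_tgt = np.sort(cos, axis=0)[-k_tgt:, :].mean(axis=0)

    return 2 * cos - r_src[:, None] - r_tgt[None, :]


# Toy data standing in for real word embeddings.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 8))
tgt = rng.normal(size=(60, 8))
scores = csls_scores(src, tgt, k=10)
best_target = scores.argmax(axis=1)  # best CSLS match for each source word
```

At real vocabulary sizes the dense matrix no longer fits in memory, so the neighbour search has to be batched or delegated to an index library.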
from muse.
4 hours seems abnormally long, there might be something wrong. What command did you use exactly?
python supervised.py --src_lang en --tgt_lang hi --src_emb wiki.en.vec --tgt_emb wiki.hi.vec --n_iter 5 --dico_train default
*Updated top-level description/comment.
Hi @gvishal, this is what I get on my machine using your command:
............
............
INFO - 12/28/17 02:11:23 - 0:00:45 - Monolingual source word similarity score average: 0.65108
INFO - 12/28/17 02:11:23 - 0:00:45 - Found 2032 pairs of words in the dictionary (1500 unique). 0 other pairs contained at least one unknown word (0 in lang1, 0 in lang2)
INFO - 12/28/17 02:11:23 - 0:00:45 - 1500 source words - nn - Precision at k = 1: 23.800000
INFO - 12/28/17 02:11:23 - 0:00:45 - 1500 source words - nn - Precision at k = 5: 41.133333
INFO - 12/28/17 02:11:23 - 0:00:45 - 1500 source words - nn - Precision at k = 10: 48.133333
INFO - 12/28/17 02:11:23 - 0:00:45 - Found 2032 pairs of words in the dictionary (1500 unique). 0 other pairs contained at least one unknown word (0 in lang1, 0 in lang2)
INFO - 12/28/17 02:11:38 - 0:01:00 - 1500 source words - csls_knn_10 - Precision at k = 1: 33.333333
INFO - 12/28/17 02:11:39 - 0:01:00 - 1500 source words - csls_knn_10 - Precision at k = 5: 51.200000
INFO - 12/28/17 02:11:39 - 0:01:00 - 1500 source words - csls_knn_10 - Precision at k = 10: 58.133333
INFO - 12/28/17 02:11:42 - 0:01:04 - Building the train dictionary ...
INFO - 12/28/17 02:11:42 - 0:01:04 - New train dictionary of 4117 pairs.
INFO - 12/28/17 02:11:42 - 0:01:04 - Mean cosine (nn method, S2T build, 10000 max size): 0.62036
INFO - 12/28/17 02:12:19 - 0:01:40 - Building the train dictionary ...
INFO - 12/28/17 02:12:19 - 0:01:40 - New train dictionary of 4424 pairs.
INFO - 12/28/17 02:12:19 - 0:01:40 - Mean cosine (csls_knn_10 method, S2T build, 10000 max size): 0.61113
............
............
INFO - 12/28/17 02:12:19 - 0:01:40 - * Saving the mapping to /private/home/guismay/code/MUSE/dumped/ajsd040n1x/best_mapping.t7 ...
INFO - 12/28/17 02:12:19 - 0:01:40 - End of refinement iteration 0.
INFO - 12/28/17 02:12:19 - 0:01:40 - Starting refinement iteration 1...
This is using a P100 GPU, but without FAISS. Are you sure you are using the GPU?
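One quick way to check whether PyTorch actually sees and uses the GPU is sketched below. It assumes a recent PyTorch release (the `device="cuda"` argument does not exist in very old versions), and the helper name `describe_cuda` is made up for this example.

```python
def describe_cuda():
    """Report whether PyTorch can see a CUDA device and run a tiny op on it."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False}

    info = {"torch_installed": True, "cuda_available": torch.cuda.is_available()}
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
        # A small matrix product forces real work onto the device.
        x = torch.rand(256, 256, device="cuda")
        info["checksum"] = float((x @ x).sum())
    return info


report = describe_cuda()
print(report)
```

If `cuda_available` is false, the script is running on CPU regardless of what the hardware offers.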
The process is consuming GPU memory and constantly hogging a CPU core ~~, but it is consuming 0% GPU cycles~~. I'll debug what's happening. TensorFlow works fine; I have CUDA 8 and Python 3.5.
Update: I ran a sample PyTorch code on GPU and even that works fine.
Additionally, the process does not respond to Ctrl-C, I have to kill it using other means!
The problem should be in get_word_translation_accuracy:
https://github.com/facebookresearch/MUSE/blob/master/src/evaluation/word_translation.py#L115-L129
I guess it comes from the get_nn_avg_dist call. Can you try adding a print before and after line 121, to see whether this is really the slow part?
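A small timing wrapper can stand in for hand-written prints around a suspect call. This is only a sketch: `timed` and `slow_step` are hypothetical names, and `slow_step` is a placeholder for whatever call is under suspicion.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("timing")


@contextmanager
def timed(label):
    """Log when a block starts and how long it took to finish."""
    logger.info("%s started.", label)
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.info("%s finished in %.2fs.", label, time.perf_counter() - start)


def slow_step():
    # Stand-in for the call under suspicion.
    time.sleep(0.1)
    return "done"


logging.basicConfig(level=logging.INFO)
with timed("slow_step"):
    result = slow_step()
```

If the wrapped call hangs, the log shows the "started" line with no matching "finished" line, which pins down the blocking step.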
Here's the log; I added a bunch more debug statements.
Yes, that looks like the issue. There has been no update in the 40 minutes since the last line.
INFO - 12/28/17 17:19:58 - 0:00:11 - Loaded 200000 pre-trained word embeddings
INFO - 12/28/17 17:20:16 - 0:00:28 - Loaded 158016 pre-trained word embeddings
INFO - 12/28/17 17:20:19 - 0:00:32 - Found 8704 pairs of words in the dictionary (4998 unique). 0 other pairs contained at least one unknown word (0 in lang1, 0 in lang2)
INFO - 12/28/17 17:20:19 - 0:00:32 - Starting refinement iteration 0...
INFO - 12/28/17 17:20:20 - 0:00:32 - ====================================================================
INFO - 12/28/17 17:20:20 - 0:00:32 - Dataset           Found     Not found     Rho
INFO - 12/28/17 17:20:20 - 0:00:32 - ====================================================================
INFO - 12/28/17 17:20:20 - 0:00:32 - EN_MTurk-771      771       0             0.6689
INFO - 12/28/17 17:20:20 - 0:00:32 - EN_MTurk-287      286       1             0.6773
INFO - 12/28/17 17:20:20 - 0:00:32 - EN_SIMLEX-999     998       1             0.3823
INFO - 12/28/17 17:20:20 - 0:00:32 - EN_WS-353-REL     252       0             0.6820
INFO - 12/28/17 17:20:20 - 0:00:32 - EN_RW-STANFORD    1323      711           0.5080
INFO - 12/28/17 17:20:20 - 0:00:32 - EN_MC-30          30        0             0.8123
INFO - 12/28/17 17:20:20 - 0:00:32 - EN_WS-353-ALL     353       0             0.7388
INFO - 12/28/17 17:20:20 - 0:00:32 - EN_VERB-143       144       0             0.3973
INFO - 12/28/17 17:20:20 - 0:00:32 - EN_MEN-TR-3k      3000      0             0.7637
INFO - 12/28/17 17:20:20 - 0:00:32 - EN_YP-130         130       0             0.5333
INFO - 12/28/17 17:20:20 - 0:00:32 - EN_RG-65          65        0             0.7974
INFO - 12/28/17 17:20:20 - 0:00:32 - EN_SEMEVAL17      379       9             0.7216
INFO - 12/28/17 17:20:20 - 0:00:32 - EN_WS-353-SIM     203       0             0.7811
INFO - 12/28/17 17:20:20 - 0:00:32 - ====================================================================
INFO - 12/28/17 17:20:20 - 0:00:32 - Monolingual source word similarity score average: 0.65108
INFO - 12/28/17 17:20:20 - 0:00:32 - Found 2032 pairs of words in the dictionary (1500 unique). 0 other pairs contained at least one unknown word (0 in lang1, 0 in lang2)
INFO - 12/28/17 17:20:20 - 0:00:32 - 1500 source words - nn - Precision at k = 1: 23.800000
INFO - 12/28/17 17:20:20 - 0:00:33 - 1500 source words - nn - Precision at k = 5: 41.133333
INFO - 12/28/17 17:20:20 - 0:00:33 - 1500 source words - nn - Precision at k = 10: 48.133333
INFO - 12/28/17 17:20:20 - 0:00:33 - Found 2032 pairs of words in the dictionary (1500 unique). 0 other pairs contained at least one unknown word (0 in lang1, 0 in lang2)
INFO - 12/28/17 17:20:20 - 0:00:33 - get_nn_avg_dist started.
INFO - 12/28/17 17:20:20 - 0:00:33 - Faiss available.
INFO - 12/28/17 17:20:20 - 0:00:33 - GPU available.
INFO - 12/28/17 17:20:20 - 0:00:33 - before faiss.GpuIndexFlatIP(res, emb.shape[1], config)
Interesting. The line faiss.GpuIndexFlatIP(res, emb.shape[1], config) is not supposed to run anything; it is just some initialization. Maybe the issue is coming from FAISS.
Can you try replacing if FAISS_AVAILABLE: with if False: here:
https://github.com/facebookresearch/MUSE/blob/master/src/utils.py#L151
and see if this works?
Disabling FAISS doesn't seem to work, and the process still doesn't respond to Ctrl-C.
I instead ran FAISS in CPU mode and it is working! Much faster than GPU, I'd say :P
Update: It's taking a long time on some other step as well. It's been almost 2 hours.
INFO - 12/28/17 19:10:24 - 0:00:37 - Monolingual source word similarity score average: 0.65108
INFO - 12/28/17 19:10:24 - 0:00:37 - Found 2032 pairs of words in the dictionary (1500 unique). 0 other pairs contained at least one unknown word (0 in lang1, 0 in lang2)
INFO - 12/28/17 19:10:25 - 0:00:38 - 1500 source words - nn - Precision at k = 1: 23.800000
INFO - 12/28/17 19:10:25 - 0:00:38 - 1500 source words - nn - Precision at k = 5: 41.133333
INFO - 12/28/17 19:10:25 - 0:00:38 - 1500 source words - nn - Precision at k = 10: 48.133333
INFO - 12/28/17 19:10:25 - 0:00:38 - Found 2032 pairs of words in the dictionary (1500 unique). 0 other pairs contained at least one unknown word (0 in lang1, 0 in lang2)
INFO - 12/28/17 19:10:25 - 0:00:38 - get_nn_avg_dist started.
INFO - 12/28/17 19:10:25 - 0:00:38 - Faiss available.
INFO - 12/28/17 19:10:25 - 0:00:38 - Searching in Index started.
INFO - 12/28/17 19:11:39 - 0:01:52 - Searching in Index finished.
INFO - 12/28/17 19:11:39 - 0:01:52 - get_nn_avg_dist finished.
INFO - 12/28/17 19:11:39 - 0:01:52 - Faiss available.
INFO - 12/28/17 19:11:39 - 0:01:52 - Searching in Index started.
INFO - 12/28/17 19:12:54 - 0:03:07 - Searching in Index finished.
What happens when you disable FAISS? What line is blocking in that case? This should be easier to debug.
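When a process hangs and ignores Ctrl-C, Python's standard-library faulthandler module can report which line is executing by dumping every thread's traceback after a timeout. The sketch below assumes you can add a couple of lines at the top of the script. The helper names `watch_for_hang` and `stop_watching` are made up for this example, and a short sleep stands in for the blocking call.

```python
import faulthandler
import sys
import tempfile
import time


def watch_for_hang(timeout_s, stream=None):
    """Dump every thread's traceback to `stream` each `timeout_s` seconds.

    faulthandler writes the dump from a watchdog thread, so it should still
    fire while the main thread is stuck inside a long native call.
    """
    faulthandler.dump_traceback_later(
        timeout_s, repeat=True, file=stream if stream is not None else sys.stderr
    )


def stop_watching():
    """Cancel the periodic dump once the suspect code has returned."""
    faulthandler.cancel_dump_traceback_later()


# Demo: a 1.5 s sleep stands in for the blocking call.
with tempfile.TemporaryFile(mode="w+") as log:
    watch_for_hang(1, stream=log)
    time.sleep(1.5)
    stop_watching()
    log.seek(0)
    dump = log.read()

print(dump)
```

In a real run you would call `watch_for_hang` with a longer timeout (say, a few minutes) and leave the default stderr stream, then read the file and line number from the dump.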
Okay, I'll try that. I think the issue is PyTorch getting stuck somewhere.
I'm closing the issue since you got it working with FAISS-CPU. Feel free to re-open if you have more issues.