khui / copacrr
The code for COPACRR Neural IR model.
License: Apache License 2.0
Hi, @andrewyates @khui
I understand from the paper and previous issue comments that the way the similarity matrices are calculated differs depending on the input corpus. Still, will the code for computing the similarity matrices (for PACRR) be made publicly available?
Thanks,
Laksh
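For readers with the same question, here is a minimal sketch of how a query-document similarity matrix could be built. It assumes cosine similarity between pre-trained term embeddings (the cosine/ folder name mentioned in a later issue points to cosine similarity, but the embedding source, the tokenization, and the handling of out-of-vocabulary terms are all assumptions). The names cosine, similarity_matrix, and toy_embeddings are hypothetical and do not come from the repository.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(query_terms, doc_terms, embeddings):
    """Return a matrix with one row per query term and one column per document term.

    Assumption: identical terms score 1.0, and a pair involving a term
    without an embedding scores 0.0.
    """
    matrix = []
    for q in query_terms:
        row = []
        for d in doc_terms:
            if q == d:
                row.append(1.0)
            elif q in embeddings and d in embeddings:
                row.append(cosine(embeddings[q], embeddings[d]))
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix


# Toy two-dimensional embeddings, for illustration only.
toy_embeddings = {
    "neural": [1.0, 0.0],
    "ranking": [0.0, 1.0],
    "retrieval": [0.0, 2.0],
}
sim = similarity_matrix(
    ["neural", "ranking"],
    ["ranking", "retrieval", "unknown"],
    toy_embeddings,
)
```

Since the result depends on the corpus only through the tokenized document terms and the embedding table, the same function could be reused across collections once those two inputs are fixed.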
I believe a file is missing under the utils folder, since I get the following error when running bash bin/evals.sh:
ImportError: No module named 'utils.eval_utils'
Hi all,
Could you please share the context vector data so that the model can be run with the context option?
Additionally, your papers do not mention how you constructed the IDF vectors for each query. I think anyone who wants to build on this model needs to know where they come from or how they were computed.
Thanks,
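Until the authors clarify, one possible way to build a per-query IDF vector is sketched below. It uses a smoothed textbook IDF, log((N + 1) / (df + 1)), which is an assumption and not necessarily the formula or the collection statistics used in the papers. The function names document_frequencies and idf_vector and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so each document counts a term once
    return df


def idf_vector(query_terms, df, num_docs):
    """One smoothed IDF value per query term, in query order.

    Assumption: idf(t) = log((N + 1) / (df(t) + 1)); unseen terms get df = 0.
    """
    return [math.log((num_docs + 1) / (df.get(t, 0) + 1)) for t in query_terms]


# Toy corpus of three tokenized documents, for illustration only.
toy_docs = [["a", "b"], ["a"], ["c"]]
toy_df = document_frequencies(toy_docs)
toy_idf = idf_vector(["a", "z"], toy_df, num_docs=len(toy_docs))
```

Rarer terms receive larger values, so in the toy example the unseen term "z" is weighted above the frequent term "a".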
https://github.com/khui/repacrr/blob/aa5251280455a06314fe6064393649dff60bd496/pred_per_epoch.py#L57
On line 57 you have for qid in qid_cwid_pred:
This loop is redundant, since line 61 runs the same loop again. It is a minor bug, but I wanted to report it anyway.
While trying to evaluate the model using bin/evals.sh, I get the following RuntimeWarning:
Traceback (most recent calls WITHOUT Sacred internals):
File "/home/joaolages/Desktop/repacrr/evals/docpairs.py", line 130, in main
best_pred_dir, argmax_epoch, argmax_run, argmax_ndcg, argmax_err = get_epoch_from_val(pred_dirs, val_dirs)
File "/home/joaolages/Desktop/repacrr/utils/eval_utils.py", line 67, in get_epoch_from_val
argmax_run, argmax_ndcg, argmax_err = test_epoch_ndcg_err[best_epoch]
KeyError: 0
Which leads to the following error:
Traceback (most recent calls WITHOUT Sacred internals):
File "/home/joaolages/Desktop/repacrr/evals/rerank.py", line 180, in main
best_pred_dir, argmax_epoch, argmax_run, argmax_ndcg, argmax_err = get_epoch_from_val(pred_dirs, val_dirs)
File "/home/joaolages/Desktop/repacrr/utils/eval_utils.py", line 67, in get_epoch_from_val
argmax_run, argmax_ndcg, argmax_err = test_epoch_ndcg_err[best_epoch]
KeyError: 0
I believe the problem is caused by the train_test_years variable, which is set in utils/config.py as
train_test_years = {'wt12_13':['wt11', 'wt14']}
I trained the model on wt09_10 and predicted on wt11, and I set both in the two bash scripts. Neither docpairs.py nor rerank.py looks at the train_years and test_year variables passed in the config.
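The KeyError in the tracebacks above means that best_epoch (here 0) is not a key of test_epoch_ndcg_err. A minimal sketch of a more defensive lookup follows. It is not the repository's get_epoch_from_val: it assumes both inputs are dicts keyed by epoch, and the names pick_best_epoch and val_epoch_ndcg are hypothetical.

```python
def pick_best_epoch(val_epoch_ndcg, test_epoch_ndcg_err):
    """Pick the epoch with the best validation nDCG among epochs that also
    have test results, and return it with its test entry.

    Assumptions: val_epoch_ndcg maps epoch -> validation nDCG, and
    test_epoch_ndcg_err maps epoch -> (run, ndcg, err).
    """
    shared = set(val_epoch_ndcg) & set(test_epoch_ndcg_err)
    if not shared:
        # Fail with a readable message instead of a bare KeyError.
        raise ValueError(
            "no epoch has both validation and test predictions; check that "
            "the configured train/test years match the prediction directories"
        )
    best_epoch = max(shared, key=lambda epoch: val_epoch_ndcg[epoch])
    return best_epoch, test_epoch_ndcg_err[best_epoch]


# Toy inputs: epoch 0 exists only on the validation side.
toy_val = {0: 0.2, 3: 0.5, 5: 0.4}
toy_test = {3: ("run3", 0.30, 0.10), 5: ("run5", 0.35, 0.12)}
best_epoch, best_entry = pick_best_epoch(toy_val, toy_test)
```

If the test dictionary is empty because the evaluation scripts read a different year than the one used for prediction, this version reports the mismatch explicitly.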
Hi there!
I have been trying to understand where the files under data/trec_runs/wt**/ come from. I thought they would be the intersection of the QL submissions from TREC and the qrel files from that year, but I have not been able to reproduce what you have.
Maybe I am confused and these files are actually for the RERANK-ALL evaluation rather than the RERANK evaluation.
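For reference, the intersection described above can be sketched as follows. This is only the hypothesis from this issue, not a confirmed description of how the files were produced. It assumes the standard TREC formats: run lines of the form "qid Q0 docid rank score tag" and qrels lines of the form "qid iteration docid relevance". The function names read_qrels and filter_run and the sample lines are hypothetical.

```python
def read_qrels(lines):
    """Map each query id to the set of judged document ids."""
    judged = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _iteration, docid, _relevance = parts
        judged.setdefault(qid, set()).add(docid)
    return judged


def filter_run(run_lines, judged):
    """Keep only run lines whose (qid, docid) pair appears in the qrels."""
    kept = []
    for line in run_lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, docid = parts[0], parts[2]
        if docid in judged.get(qid, set()):
            kept.append(line.rstrip("\n"))
    return kept


# Made-up sample data, for illustration only.
sample_qrels = ["101 0 doc-a 1", "101 0 doc-b 0", "102 0 doc-c 2"]
sample_run = [
    "101 Q0 doc-a 1 12.5 ql",
    "101 Q0 doc-x 2 11.0 ql",
    "102 Q0 doc-c 1 9.7 ql",
    "102 Q0 doc-a 2 9.1 ql",
]
filtered = filter_run(sample_run, read_qrels(sample_qrels))
```

Comparing the per-query line counts of such a filtered run with the files under data/trec_runs/ would show quickly whether the intersection hypothesis holds.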
Hi there!
I put together a list of the similarity matrices that are missing from your download link. Could you provide them, please? About 650 are missing, which is a small percentage of the ~115k total. The 95th query_idf vector is missing as well.
Edit: These matrices use the query description.
Edit2: The folder cosine/desc_doc_mat/269 contains only empty matrices. I think this is because topic 269 has no description.