yuzhimanhua / match

MATCH: Metadata-Aware Text Classification in A Large Hierarchy (WWW'21)

Home Page: https://arxiv.org/abs/2102.07349

License: Apache License 2.0

Languages: Python 66.90%, C++ 29.56%, Shell 2.47%, Makefile 1.07%

Topics: metadata, text-classification, extreme-multi-label-classification, microsoft-academic-graph, scientific-text-mining

match's Issues

CUDA version

I am seeing the following error:

RuntimeError: Expected object of scalar type Long but got scalar type Int for sequence element 1 in sequence argument at position #1 'tensors'

Can you please tell me which versions of cudatoolkit and cudnn you were using?

Thanks,
abhishek

Unable to reproduce paper's results on MeSH dataset

Hello,

I am reaching out to you regarding the results of your paper, where you evaluated your model's performance on both the MAG and MeSH datasets.

I tried to reproduce your results on both datasets using your code and the hyperparameters given in the paper. On the MAG dataset, my results are close to those reported in the paper. On the MeSH dataset, however, the same experiments gave worse results than those reported. My results on MeSH are:

P@1,3,5: 0.9117345916709328, 0.7319429296414183, 0.5970285129209607
NDCG@1,3,5: 0.9117345916709328, 0.791101895548494, 0.7191552905597784

I was wondering if there are any additional parameters or configurations that I need to consider while running the experiments on the MeSH dataset. Is it possible that the configuration for the MeSH dataset is different from the one used for the MAG dataset?

Thank you for your time and I look forward to your response.
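
For readers comparing numbers like the ones above, here is a minimal sketch of P@k and NDCG@k as they are commonly defined for multi-label classification with binary relevance. It is not taken from the repository's evaluation script; the function names `precision_at_k` and `ndcg_at_k` and the toy labels are assumptions made for illustration.

```python
import math


def precision_at_k(ranked_labels, true_labels, k):
    """Fraction of the top-k predicted labels that are in the true label set."""
    hits = sum(1 for label in ranked_labels[:k] if label in true_labels)
    return hits / k


def ndcg_at_k(ranked_labels, true_labels, k):
    """Binary-relevance NDCG@k: DCG of the top-k list divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so the top position gets log2(2) = 1
        for rank, label in enumerate(ranked_labels[:k])
        if label in true_labels
    )
    # The ideal ranking places every true label (up to k of them) at the top.
    ideal_hits = min(k, len(true_labels))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical example: labels sorted by predicted score, highest first.
    ranked = ["A", "B", "C", "D", "E"]
    truth = {"A", "C", "F"}
    for k in (1, 3, 5):
        print(k, precision_at_k(ranked, truth, k), ndcg_at_k(ranked, truth, k))
```

Under these definitions P@1 and NDCG@1 are always equal, which matches the identical first values in the two result lines above. Reported scores are usually averaged over all test documents.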

Is a text transformation step needed?

In the README it states:

NOTE: If you would like to run our code on your own datasets, there is no need to represent each paper/author/word as a number. Just make sure that (1) each paper/venue/author/word name does not have whitespace inside

I noticed that in vocabulary.txt the words are all lowercase, and many "words" are actually multiple words separated by underscores.

If I'm starting with titles and abstracts that include capitalization and punctuation, do I need to transform them in some way before putting them into the "text" field in the .json file?
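
One possible preprocessing step, sketched under assumptions: lowercase the raw title and abstract, replace punctuation with spaces, and join the resulting tokens so that no token contains whitespace. Whether MATCH requires this, whether phrases must be merged with underscores, and whether "text" should hold a string or a token list are not confirmed by the README excerpt above. The helper names `normalize_text` and `to_record` and the "paper" key are hypothetical.

```python
import json
import re


def normalize_text(raw):
    """Lowercase the input and split it into tokens with no whitespace or punctuation."""
    lowered = raw.lower()
    # Keep ASCII letters, digits, and underscores; everything else becomes a space.
    # Note: this also drops non-ASCII letters, which may not suit every corpus.
    cleaned = re.sub(r"[^a-z0-9_]+", " ", lowered)
    return cleaned.split()


def to_record(paper_id, title, abstract):
    """Build one JSON-serializable record; only the "text" key is named in the issue."""
    tokens = normalize_text(title + " " + abstract)
    return {"paper": paper_id, "text": " ".join(tokens)}  # "paper" is a hypothetical key


if __name__ == "__main__":
    record = to_record("p1", "Graph Neural Networks: A Survey!", "We review GNNs.")
    print(json.dumps(record))
```

This sketch does no phrase detection, so multi-word entries such as those joined by underscores in vocabulary.txt would not be produced by it.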

Other datasets?

Thank you very much for your work and for making the codebase public; it is very inspiring :)
I am planning to implement something similar and have the following questions:

Is there a specific reason why MATCH was not tested on other popular hierarchical datasets like Web of Science, NYTimes, or RCV1-V2?
Also, the README says that the experiments were run on an NVIDIA GTX 1080. Can you please share how long training took?

Will MATCH work well on a small label hierarchy (~120 labels)?

Are there any modifications we should make to run MATCH on a small label hierarchy?

Using another label hierarchy as metadata to predict other labels

I want to use MATCH to do multi-label text classification on scientific papers using a hierarchical biomimicry label taxonomy I have. Is there a way to use the MeSH labels and MAG fields of study as metadata to improve predictions?
