siddsax / xml-cnn



PyTorch implementation of the paper "Deep Learning for Extreme Multi-label Text Classification"

Home Page: https://www.getmerlin.in

Python 95.30% MATLAB 4.70%

multi-label-classification deep-learning cnn-text-classification

xml-cnn's Issues

How to handle top-K output when K should differ for every input text?

Currently, the K in the top-K output is fixed.

Do you think training a classifier to predict the value of K for every input is a good solution?

Thank you very much.
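A minimal sketch contrasting the fixed top-K output with one common alternative: a per-label score threshold, under which the number of predicted labels varies per input. The function names, the 0.5 threshold and the `min_labels` fallback are hypothetical illustration choices, not part of xml-cnn.

```python
from typing import List, Sequence


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Fixed-size output: indices of the k highest-scoring labels."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def above_threshold(scores: Sequence[float], threshold: float = 0.5,
                    min_labels: int = 1) -> List[int]:
    """Variable-size output: every label whose score reaches the threshold.

    Hypothetical fallback: if fewer than min_labels pass, return the top
    min_labels labels instead so no input ends up with an empty prediction.
    """
    picked = [i for i, s in enumerate(scores) if s >= threshold]
    if len(picked) < min_labels:
        return top_k(scores, min_labels)
    # Order by score so the output follows the same convention as top_k.
    return sorted(picked, key=lambda i: scores[i], reverse=True)
```

The approach proposed in the issue, a separate model that predicts K per input, would plug into the same code as `top_k(scores, predicted_k)`; thresholding avoids training that extra model but needs a threshold tuned on held-out data.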

Pooling Method

Hi siddsax,

I want to ask about the pooling layer: you're using sliding max pooling, but XML-CNN uses dynamic max pooling, defined as "For a document with m words, we evenly divide its m-dimensional feature map into p chunks, each chunk is pooled to a single feature by taking the largest value within that chunk".
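A minimal sketch of the difference described in this issue, on a single 1-D feature map. Dynamic max pooling always yields p outputs, one maximum per chunk; sliding max pooling uses a fixed window and stride, so its output length grows with m. How chunk boundaries are placed when m is not divisible by p is an assumption here (boundaries at floor(i * m / p)). Both function names are hypothetical.

```python
from typing import List


def dynamic_max_pool(feature_map: List[float], p: int) -> List[float]:
    """Split an m-dimensional feature map into p contiguous chunks, keep each max.

    Assumption: chunk boundaries sit at floor(i * m / p), so chunk sizes
    differ by at most one when m is not a multiple of p.
    """
    m = len(feature_map)
    if not 1 <= p <= m:
        raise ValueError("p must be between 1 and the feature-map length m")
    bounds = [i * m // p for i in range(p + 1)]
    return [max(feature_map[bounds[i]:bounds[i + 1]]) for i in range(p)]


def sliding_max_pool(feature_map: List[float], kernel: int, stride: int) -> List[float]:
    """Fixed-window max pooling: the output length depends on m, not on p."""
    m = len(feature_map)
    return [max(feature_map[s:s + kernel]) for s in range(0, m - kernel + 1, stride)]
```

For example, on `[1, 5, 2, 8, 3, 7, 4]`, `dynamic_max_pool(x, 3)` keeps three values, while `sliding_max_pool(x, 3, 1)` keeps five. PyTorch's `torch.nn.AdaptiveMaxPool1d(p)` behaves similarly, but it computes window edges with floor and ceil, so neighbouring windows can share one element when m is not a multiple of p.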

Regarding file not found "x_train.npz"

Saving Model to: Gen_data_CNN_Z_dim-100_mb_size-20_hidden_dims-512_preproc-0_loss-BCELoss_sequence_length-500_embedding_dim-300_params.vocab_size=30000
Traceback (most recent call last):
  File "main.py", line 57, in <module>
    x_tr, x_te, y_tr, y_te, params.vocabulary, params.vocabulary_inv, params = save_load_data(params, save=params.load_data)
  File "../utils\futils.py", line 153, in save_load_data
    x_tr = sparse.load_npz(params.data_path + '/x_train.npz')
  File "C:\Users\dc\anaconda\envs\riya\lib\site-packages\scipy\sparse\_matrix_io.py", line 131, in load_npz
    with np.load(file, **PICKLE_KWARGS) as loaded:
  File "C:\Users\dc\anaconda\envs\riya\lib\site-packages\numpy\lib\npyio.py", line 415, in load
    fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: '../datasets/rcv/x_train.npz'
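The traceback shows `save_load_data` calling `sparse.load_npz(params.data_path + '/x_train.npz')`, so the preprocessed file must already exist at that path before `main.py` runs. A relative path such as `../datasets/rcv` is resolved against the current working directory, not against the script's location. The hypothetical helper below is a sketch of a pre-flight check that reports the absolute paths it looked for. Only `x_train.npz` is confirmed by the traceback; any other file names you pass in are your own assumption.

```python
import os
from typing import Iterable, List


def missing_data_files(data_path: str,
                       required: Iterable[str] = ("x_train.npz",)) -> List[str]:
    """Return absolute paths of required files that do not exist under data_path.

    os.path.abspath resolves a relative data_path against the current
    working directory, which is also what np.load / open() do.
    """
    base = os.path.abspath(data_path)
    return [os.path.join(base, name) for name in required
            if not os.path.isfile(os.path.join(base, name))]
```

Calling `missing_data_files("../datasets/rcv")` before `save_load_data` and raising `FileNotFoundError` with the returned list makes it obvious whether the data was never generated or `main.py` was started from an unexpected directory.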

The link to the RCV1 dataset is invalid

Hi, when I opened the link to the RCV dataset, I got "404 Not Found". Could you provide another link to the RCV dataset? If possible, could you also provide the other datasets used in your paper? It's a little hard for me to understand the code without the dataset. Thank you very much!

What would be the format of the input dataset?

Hi there,

I am interested in trying XML-CNN on my own dataset. I have a collection of documents and their labels. Could you please help me understand how I can feed them to your tool? Alternatively, sample input files would also be helpful. I tried to go through the RCV file you mentioned in the README, but its format is not clear to me. Thanks.
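The traceback in the "x_train.npz" issue shows the code loading its inputs with `scipy.sparse.load_npz`, so sparse matrices are one likely ingredient of the input format. A minimal sketch, assuming labels arrive as a list of label strings per document, that encodes them as a multi-hot matrix in CSR layout. The function name, and whether xml-cnn expects exactly this layout for its label files, are assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def labels_to_csr(doc_labels: Sequence[Sequence[str]]
                  ) -> Tuple[List[int], List[int], List[int], Dict[str, int]]:
    """Encode each document's label set as one row of a sparse multi-hot matrix.

    Returns (data, indices, indptr, label_to_id) in CSR layout: the label
    ids of row i are indices[indptr[i]:indptr[i + 1]].
    """
    label_to_id: Dict[str, int] = {}
    data: List[int] = []
    indices: List[int] = []
    indptr: List[int] = [0]
    for labels in doc_labels:
        # New labels get the next free id; duplicates within a row collapse.
        cols = sorted({label_to_id.setdefault(lab, len(label_to_id)) for lab in labels})
        indices.extend(cols)
        data.extend([1] * len(cols))
        indptr.append(len(indices))
    return data, indices, indptr, label_to_id
```

The three lists can be turned into a matrix with `scipy.sparse.csr_matrix((data, indices, indptr), shape=(n_docs, len(label_to_id)))` and written with `scipy.sparse.save_npz`; keeping `label_to_id` lets predicted column ids be mapped back to label names.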

About data preprocessing

Hello, I want to run XML-CNN on several benchmarks, but I don't know how to process the data. Could you provide the scripts you used to preprocess WIKI31K, AMAZON and the other datasets, as well as the download sources for the datasets?
Thanks~
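A minimal sketch of a common CNN text-preprocessing step: a frequency-ranked vocabulary, then truncation and padding to a fixed length. It is not the repository's preprocessing script; the reserved ids `PAD_ID = 0` and `UNK_ID = 1` and both function names are hypothetical. The `vocab_size` and `sequence_length` parameters mirror the `params.vocab_size=30000` and `sequence_length-500` values that appear in the model name in the "x_train.npz" log.

```python
from collections import Counter
from typing import Dict, List, Sequence

PAD_ID, UNK_ID = 0, 1  # hypothetical reserved ids for padding and unknown words


def build_vocab(token_docs: Sequence[Sequence[str]], vocab_size: int) -> Dict[str, int]:
    """Keep the vocab_size - 2 most frequent tokens; ids 0 and 1 are reserved."""
    counts = Counter(tok for doc in token_docs for tok in doc)
    most_common = counts.most_common(vocab_size - 2)
    return {tok: i + 2 for i, (tok, _) in enumerate(most_common)}


def encode(tokens: Sequence[str], vocab: Dict[str, int], sequence_length: int) -> List[int]:
    """Map tokens to ids, truncate to sequence_length, then right-pad with PAD_ID."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokens[:sequence_length]]
    return ids + [PAD_ID] * (sequence_length - len(ids))
```

With those values, usage would look like `vocab = build_vocab(train_tokens, 30000)` followed by `[encode(doc, vocab, 500) for doc in train_tokens]`. The vocabulary is built on training documents only, so test documents map rare or unseen words to `UNK_ID`.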
