gabrielspmoreira / chameleon_recsys

Source code of CHAMELEON - A Deep Learning Meta-Architecture for News Recommender Systems

License: MIT License

Python 88.82% Shell 4.53% Jupyter Notebook 6.65%

deep-learning deep-neural-networks lstm lstm-neural-network lstm-neural-networks news-recommendation recommendation-algorithms recommendation-engine recommendation-system recommender-system rnn rnn-tensorflow tensorflow word-embeddings word2vec

chameleon_recsys's Issues

Shape mismatch error while running run_nar_train_gcom_local.sh

 File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 671, in run
    run_metadata=run_metadata)
  File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1156, in run
    run_metadata=run_metadata)
  File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
    raise six.reraise(*original_exc_info)
  File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1240, in run
    return self._sess.run(*args, **kwargs)
  File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1320, in run
    run_metadata=run_metadata))
  File "/mnt/d/news_reco/chameleon_recsys/nar_module/nar/nar_model.py", line 1656, in after_run
    self.clicked_items_state.update_items_coocurrences(batch_clicked_items)
  File "/mnt/d/news_reco/chameleon_recsys/nar_module/nar/nar_model.py", line 1371, in update_items_coocurrences
    self.items_coocurrences[rows, cols] += 1
  File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/scipy/sparse/_index.py", line 124, in __setitem__
    raise ValueError("shape mismatch in assignment")
ValueError: shape mismatch in assignment

I am using the Globo dataset available at https://www.kaggle.com/gspmoreira/news-portal-user-interactions-by-globocom
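
The last frames of the traceback point at `self.items_coocurrences[rows, cols] += 1`, an in-place fancy-index update on a SciPy sparse matrix. One possible workaround, offered as a sketch and not as the repository's own fix, is to build the increment as a COO matrix and add it to the existing matrix. The helper name `increment_cooccurrences` is hypothetical. It assumes `rows` and `cols` are equal-length integer sequences, and it returns a new CSR matrix instead of modifying the input in place.

```python
import numpy as np
from scipy import sparse


def increment_cooccurrences(matrix, rows, cols):
    """Return a CSR matrix equal to `matrix` plus 1 at each distinct (row, col) pair.

    Hypothetical helper: it avoids `matrix[rows, cols] += 1` by adding a
    sparse update matrix instead of assigning through fancy indexing.
    """
    rows = np.asarray(rows, dtype=np.int64).ravel()
    cols = np.asarray(cols, dtype=np.int64).ravel()
    # Deduplicate the pairs so that each cell is incremented once per call,
    # which is how a fancy-index `+= 1` treats repeated indices.
    pairs = np.unique(np.stack([rows, cols], axis=1), axis=0)
    ones = np.ones(len(pairs), dtype=matrix.dtype)
    update = sparse.coo_matrix(
        (ones, (pairs[:, 0], pairs[:, 1])), shape=matrix.shape
    )
    return (matrix + update).tocsr()
```

If every repeated pair in a batch should count separately, the `np.unique` step can be dropped, because duplicate COO entries are summed when the matrix is converted to CSR.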

Flag Mismatch for NAR Trainer

Hi,

In the runfile run_nar_train_gcom_local.sh, there seems to be a flag mismatch.

The code requires only one path, which is used to load the labels, metadata and embeddings for the ACR:

        tf.logging.info('Loading ACR module assets')
        acr_label_encoders, articles_metadata_df, content_article_embeddings_matrix = \
                load_acr_module_resources(FLAGS.acr_module_resources_path)

In the runfile, however, there are two flags, for ACR metadata and embeddings, which are unused in the code.

Perhaps the runfile should be adjusted to execute the code properly (e.g. by setting acr_module_resources_path), or alternatively, the code must be adjusted to handle the two flags separately.

how to predict with the model

The code shows how to train and evaluate the model, but there does not seem to be enough information on how to predict with it.

Content of Adressa News Dataset

Hi All,
The Adressa News Dataset has its news content (main body) in Norwegian and not in English, as mentioned on their website as well. How did the authors handle this problem? Did they translate the content of each article or use it as it is?

Untraceable config number in nar_preprocess_adressa.py

Hi there,

Can anyone help me find out where these numbers come from?
For example, '_elapsed_ms_since_last_click' has untraceable config values. I cannot figure out why we should fill in 1371436 for stddev or 789935.7 for avg.
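
If values like `avg` and `stddev` are dataset statistics used to standardize a numeric feature (an assumption, since the issue does not confirm where they come from), they could be recomputed from the data along these lines. The function names and the sample values are hypothetical and for illustration only.

```python
import numpy as np


def compute_scaling_stats(values):
    """Return the mean and population standard deviation of a feature."""
    values = np.asarray(values, dtype=np.float64)
    return {"avg": float(values.mean()), "stddev": float(values.std())}


def standardize(values, stats):
    """Apply (x - avg) / stddev using previously computed statistics."""
    values = np.asarray(values, dtype=np.float64)
    return (values - stats["avg"]) / stats["stddev"]


# Made-up elapsed times in milliseconds, not taken from the Adressa dataset.
elapsed_ms = [1000.0, 2000.0, 3000.0]
stats = compute_scaling_stats(elapsed_ms)
scaled = standardize(elapsed_ms, stats)
```

Hard-coding such statistics in a config avoids a second pass over the data, but it also means the numbers must be recomputed whenever the dataset or the preprocessing changes.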

Missing some objects from ACR module

Hi. After this commit, some objects were removed from the loading process of the ACR module so that it works with the Kaggle files. However, some references to them remain in the code and throw errors.
Here and here.
I removed those lines and tried to execute, but I'm still getting errors. Is this an error in the code, or am I running it incorrectly?

Missing ‘eval_sessions_negative_samples.json’ in benchmarks modules

Hi,

Can anyone help me find the two files marked by a red circle in "chameleon_recsys/nar_module/scripts/benchmarks/run_sr-gnn_adressa.sh", or explain how to run the sr-gnn module?
Best,

Requirements & Dependencies

Hello,
In order to run the code and reproduce results, one would need the specific package versions you used for the development and experiments.

Could you provide a requirements.txt or some other specification of the packages and versions?

In the README the dependencies appear to be underspecified. Especially with regard to TensorFlow version 1.12.0, the exact versions of the required packages are crucial.

Thank you in advance.

Some links have expired

Some links in the Adressa dataset preprocessing instructions have expired, such as dataproc_preprocessing/create_cluster.sh, dataproc_preprocessing/browse_cluster.sh, and dataproc_preprocessing/nar_preprocessing_addressa_01_dataproc.ipynb.

Some more concise code suggestions during data preprocessing

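import pandas as pd

def flatten_list_series(series_of_lists):
    # Expand each list into columns, stack them into one long Series, keep the values
    return pd.DataFrame(series_of_lists.apply(pd.Series).stack().reset_index(name='item'))['item']

is roughly equivalent to (it returns a list instead of a Series)

def fun(series_of_lists):
    # Concatenate all the lists into one flat list
    L = []
    for i in series_of_lists:
        L.extend(i)
    return L

dict(articles_original_df[['url','id_encoded']].apply(lambda x: (x['url'], x['id_encoded']), axis=1).values)

is equivalent to

articles_original_df.set_index('url')['id_encoded'].to_dict()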
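
A quick way to check the dictionary suggestion is to run both forms on a toy frame. The data below is made up for illustration. The column names follow the snippets above, and `Series.explode` (available in pandas 0.25 and later) is shown as one more option for flattening, not as something the repository uses.

```python
import pandas as pd

# Toy data, made up for illustration only.
articles_original_df = pd.DataFrame({
    "url": ["a.html", "b.html", "c.html"],
    "id_encoded": [1, 2, 3],
})

# Verbose form: build (url, id) tuples row by row, then turn them into a dict.
verbose = dict(
    articles_original_df[["url", "id_encoded"]]
    .apply(lambda x: (x["url"], x["id_encoded"]), axis=1)
    .values
)

# Concise form: index by url and convert the id column to a dict.
concise = articles_original_df.set_index("url")["id_encoded"].to_dict()

# Flattening a Series of lists with a comprehension and with explode().
series_of_lists = pd.Series([[1, 2], [3], [4, 5, 6]])
flat_loop = [item for sub in series_of_lists for item in sub]
flat_explode = series_of_lists.explode().tolist()
```

The two dictionary forms only agree when the `url` values are unique. With duplicates, both keep a single entry per key.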

Globo Dataset missing Assets (Labels)

Hi,

With the provided Globo dataset, we cannot train the ACR because the article contents could not be provided. Progressing to the NAR training, it seems that some assets are missing.

In nar_trainer_gcom.py, the following code tries to deserialise the labels, metadata and article embeddings from a pickle file. However, this pickle file is not provided, or, more precisely, the "acr_label_encoders" are missing.

tf.logging.info('Loading ACR module assets')
acr_label_encoders, articles_metadata_df, content_article_embeddings_matrix = \
        load_acr_module_resources(FLAGS.acr_module_resources_path)

def load_acr_module_resources(acr_module_resources_path):
    (acr_label_encoders, articles_metadata_df, content_article_embeddings) = \
              deserialize(acr_module_resources_path)

    tf.logging.info("Read ACR label encoders for: {}".format(acr_label_encoders.keys()))
    tf.logging.info("Read ACR articles metadata: {}".format(len(articles_metadata_df)))
    tf.logging.info("Read ACR article content embeddings: {}".format(content_article_embeddings.shape))

    return acr_label_encoders, articles_metadata_df, content_article_embeddings

Similarly, in nar_utils.py, this method cannot be executed because the folder "/pickles/" does not contain 'nar_label_encoders':

def load_nar_module_preprocessing_resources(nar_module_preprocessing_resources_path):
    #{'nar_label_encoders', 'nar_standard_scalers'}
    nar_resources = \
              deserialize(nar_module_preprocessing_resources_path)

    nar_label_encoders = nar_resources['nar_label_encoders']
    tf.logging.info("Read NAR label encoders for: {}".format(nar_label_encoders.keys()))

    return nar_label_encoders

How can we get these labels? Or am I overlooking something?

Thanks
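
To see what a provided file actually contains before the trainer unpacks it, a small inspection helper may be useful. This is a sketch that assumes the assets are ordinary Python pickles. The repository's own `deserialize` function may read them differently, and `describe_pickle` is a hypothetical name. Only unpickle files that come from a trusted source.

```python
import pickle


def describe_pickle(path):
    """Load a pickle file and summarize its top-level structure."""
    with open(path, "rb") as f:
        obj = pickle.load(f)
    if isinstance(obj, dict):
        # For a dict, list the keys (e.g. to look for 'nar_label_encoders').
        return {"type": "dict", "keys": [str(k) for k in obj]}
    if isinstance(obj, (tuple, list)):
        # For a tuple or list, report the type name of each element in order.
        return {
            "type": type(obj).__name__,
            "items": [type(x).__name__ for x in obj],
        }
    return {"type": type(obj).__name__}
```

A tuple with fewer than three elements would explain why unpacking into `acr_label_encoders`, `articles_metadata_df` and `content_article_embeddings` fails.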

Get predictions on input data

Hi,
When using the NAR module, I want to see a ranked list of predictions in order to get an intuitive evaluation of the recommended articles after the training process. But I only see the evaluation code that calculates metrics. Can you help me get this ranked list?

Thank you very much.
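
Assuming per-item relevance scores can be obtained from the trained model for each session (the issue does not say how, so this is an assumption), turning them into a ranked list is a matter of sorting. The sketch below uses NumPy only. `top_k_items` is a hypothetical helper, and the scores and item ids are made up.

```python
import numpy as np


def top_k_items(scores, item_ids, k=5):
    """Return the ids of the k highest-scoring items for each row of `scores`.

    scores: array of shape (n_sessions, n_candidates)
    item_ids: array of shape (n_candidates,), shared by all sessions
    """
    scores = np.asarray(scores, dtype=np.float64)
    item_ids = np.asarray(item_ids)
    k = min(k, scores.shape[-1])
    # Sort by descending score and keep the first k positions per row.
    order = np.argsort(-scores, axis=-1)[..., :k]
    return item_ids[order]


# Made-up scores for two sessions over three candidate articles.
ranked = top_k_items(
    scores=[[0.1, 0.7, 0.2], [0.9, 0.03, 0.07]],
    item_ids=[101, 102, 103],
    k=2,
)
```

Each row of `ranked` lists article ids from most to least relevant for one session.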

"Shape mismatch" while running run_nar_train_adressa_local.sh

ERROR:tensorflow:ERROR: shape mismatch in assignment
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/hengshiou/Documents/chameleon_recsys/nar_module/nar/nar_trainer_adressa.py", line 573, in <module>
    tf.app.run()
  File "/usr/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/hengshiou/Documents/chameleon_recsys/nar_module/nar/nar_trainer_adressa.py", line 497, in main
    model.train(input_fn=lambda: prepare_dataset_iterator(training_files_chunk, session_features_config,
  File "/usr/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1192, in _train_model_default
    saving_listeners)
  File "/usr/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1484, in _train_with_estimator_spec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/usr/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "/usr/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1252, in run
    run_metadata=run_metadata)
  File "/usr/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1353, in run
    raise six.reraise(*original_exc_info)
  File "/usr/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
  File "/usr/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1338, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1419, in run
    run_metadata=run_metadata))
  File "/home/hengshiou/Documents/chameleon_recsys/nar_module/nar/nar_model.py", line 1659, in after_run
    self.clicked_items_state.update_items_coocurrences(batch_clicked_items)
  File "/home/hengshiou/Documents/chameleon_recsys/nar_module/nar/nar_model.py", line 1371, in update_items_coocurrences
    self.items_coocurrences[rows, cols] += 1
  File "/usr/lib/python3.7/site-packages/scipy/sparse/_index.py", line 124, in __setitem__
    raise ValueError("shape mismatch in assignment")
ValueError: shape mismatch in assignment

Hi there, would you mind clarifying this "shape mismatch in assignment" error? I followed the instructions given in the README.

Why not use the author info during the ACR?

Hi,
I have read the paper and I am wondering why the author was only considered during the NAR part and not the ACR part. The paper doesn't really explain why this could not be considered as a metadata attribute, or at least I am missing it.
Best,