gabrielspmoreira / chameleon_recsys

Source code of CHAMELEON - A Deep Learning Meta-Architecture for News Recommender Systems

License: MIT License

Python 88.82% Shell 4.53% Jupyter Notebook 6.65%
deep-learning deep-neural-networks lstm lstm-neural-network lstm-neural-networks news-recommendation recommendation-algorithms recommendation-engine recommendation-system recommender-system rnn rnn-tensorflow tensorflow word-embeddings word2vec

chameleon_recsys's People

Contributors

gabrielspmoreira

chameleon_recsys's Issues

Shape mismatch error while running run_nar_train_gcom_local.sh

 File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 671, in run
    run_metadata=run_metadata)
  File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1156, in run
    run_metadata=run_metadata)
  File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
    raise six.reraise(*original_exc_info)
  File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1240, in run
    return self._sess.run(*args, **kwargs)
  File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1320, in run
    run_metadata=run_metadata))
  File "/mnt/d/news_reco/chameleon_recsys/nar_module/nar/nar_model.py", line 1656, in after_run
    self.clicked_items_state.update_items_coocurrences(batch_clicked_items)
  File "/mnt/d/news_reco/chameleon_recsys/nar_module/nar/nar_model.py", line 1371, in update_items_coocurrences
    self.items_coocurrences[rows, cols] += 1
  File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/scipy/sparse/_index.py", line 124, in __setitem__
    raise ValueError("shape mismatch in assignment")
ValueError: shape mismatch in assignment

I am using the Globo.com dataset available at https://www.kaggle.com/gspmoreira/news-portal-user-interactions-by-globocom
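
For readers unfamiliar with the failing statement: per the traceback, it is an in-place increment on a SciPy sparse matrix addressed with index arrays (rows, cols). Assuming that update_items_coocurrences is meant to count how often two items are clicked within the same session (an assumption, since the method body is not shown here), the same bookkeeping can be sketched with plain Python containers. The function name update_cooccurrences and the toy sessions are hypothetical, and this is not the repository's implementation.

```python
from collections import Counter
from itertools import permutations


def update_cooccurrences(counts, batch_sessions):
    """Increment counts[(a, b)] once per session in which items a and b both appear."""
    for session in batch_sessions:
        # Deduplicate so a repeated click does not inflate the pair count.
        unique_items = set(session)
        # Ordered pairs keep the table symmetric, like a square matrix.
        for a, b in permutations(unique_items, 2):
            counts[(a, b)] += 1
    return counts


# Toy batch: two sessions of hypothetical encoded item ids.
counts = Counter()
update_cooccurrences(counts, [[1, 2, 3], [2, 3]])
print(counts[(2, 3)], counts[(1, 2)])
```

A dictionary of pairs sidesteps array-shape rules entirely, at the cost of the vectorized updates a sparse matrix allows.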

Flag Mismatch for NAR Trainer

Hi,

In the runfile run_nar_train_gcom_local.sh, there seems to be a Flag mismatch.

In the code, only one path is required that should enable loading labels, metadata and embeddings for the ACR:

        tf.logging.info('Loading ACR module assets')
        acr_label_encoders, articles_metadata_df, content_article_embeddings_matrix = \
                load_acr_module_resources(FLAGS.acr_module_resources_path)

In the runfile, however, there are two Flags, for ACR metadata and embeddings, which are unused in the code.

Perhaps the runfile should be adjusted to execute the code properly (e.g. by setting acr_module_resources_path), or alternatively, the code must be adjusted to handle the two Flags separately.
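
A mismatch of this kind can be detected mechanically. The following is a minimal sketch using Python's argparse, not the tf.flags machinery the repository uses: it declares the one flag the trainer code reads and reports anything else passed on the command line. The two flag names in argv are hypothetical stand-ins for the unused flags in the runfile.

```python
import argparse


def find_undeclared_flags(argv):
    """Return the names of flags in argv that the parser does not declare."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    # The only ACR-related flag read by the trainer snippet quoted above.
    parser.add_argument("--acr_module_resources_path", default=None)
    _, unknown = parser.parse_known_args(argv)
    # Keep only the flag name, dropping any "=value" suffix.
    return [arg.split("=", 1)[0] for arg in unknown if arg.startswith("--")]


# Hypothetical flag names, for illustration only.
argv = [
    "--acr_metadata_path=/tmp/meta.csv",
    "--acr_embeddings_path=/tmp/emb.pkl",
]
undeclared = find_undeclared_flags(argv)
print(undeclared)
```

Failing loudly on undeclared flags, rather than tolerating them, would surface a stale runfile before training starts.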

Content of Adressa News Dataset

Hi All,
The Adressa News Dataset has its news content (main body) in Norwegian, not in English, as also mentioned on their website. How did the authors handle this problem? Did they translate the content of each article or use it as is?

Untraceable config number in nar_preprocess_adressa.py

[Screenshot from 2020-02-01 16-21-06]

Hi there,

Can anyone help me find out where these numbers come from?
For example, the config for '_elapsed_ms_since_last_click' is untraceable: I cannot figure out why stddev should be set to 1371436 or avg to 789935.7.
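
One plausible reading, which this thread does not confirm, is that such avg and stddev constants are statistics of the feature measured on the dataset and hard-coded so the feature can be standardized. A minimal sketch of that idea follows; the function name standardize and the toy values are hypothetical, and only the two constants in the last line come from the question above.

```python
import statistics


def standardize(value, avg, stddev):
    """Scale a raw value so the feature has zero mean and unit variance."""
    return (value - avg) / stddev


# Toy elapsed times in milliseconds, illustrative only (not dataset values).
elapsed_ms = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
avg = statistics.mean(elapsed_ms)
stddev = statistics.pstdev(elapsed_ms)  # population standard deviation
scaled = [standardize(v, avg, stddev) for v in elapsed_ms]

# With the constants quoted in the question, the average maps to zero.
at_average = standardize(789935.7, 789935.7, 1371436)
```

If this reading is right, the constants would have to be recomputed whenever the preprocessing or the dataset changes.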

Missing some objects from ACR module

Hi. After this commit, some objects were removed from the unloading process of the ACR module so that it works with the Kaggle files. However, some references to them remain in the code and throw errors.
Here and here.
I removed those lines and tried to run it, but I am still getting errors. Is this a bug in the code, or am I running it incorrectly?

Requirements & Dependencies

Hello,
In order to run the code and reproduce results, one would need the specific package versions you used for the development and experiments.

Could you provide requirements.txt or some other specification of the packages and version?

In the README, the dependencies appear to be underspecified. Exact package versions are especially crucial given the reliance on TensorFlow 1.12.0.

Thank you in advance.

Some links have expired

Some links in the Adressa dataset preprocessing instructions have expired, such as dataproc_preprocessing/create_cluster.sh, dataproc_preprocessing/browse_cluster.sh, and dataproc_preprocessing/nar_preprocessing_addressa_01_dataproc.ipynb.

Some more concise code suggestions during data preprocessing

def flatten_list_series(series_of_lists):
    return pd.DataFrame(series_of_lists.apply(pd.Series).stack().reset_index(name='item'))['item']

can be written more simply as (note that this returns a plain list rather than a Series):

def fun(series_of_lists):
    # Concatenate every inner list into one flat list.
    L = []
    for i in series_of_lists:
        L.extend(i)
    return L

Likewise,

dict(articles_original_df[['url','id_encoded']].apply(lambda x: (x['url'], x['id_encoded']), axis=1).values)

is equivalent to

articles_original_df.set_index('url')['id_encoded'].to_dict()

Globo Dataset missing Assets (Labels)

Hi,

With the provided Globo dataset, we cannot train the ACR module because the article contents could not be provided. Moving on to the NAR training, it seems that some assets are missing.

In nar_trainer_gcom.py, the following method tries to deserialise the labels, metadata and article embeddings from a pickle file. However, this pickle file is not provided, or, more precisely, the "acr_label_encoders" are missing.

tf.logging.info('Loading ACR module assets')
acr_label_encoders, articles_metadata_df, content_article_embeddings_matrix = \
        load_acr_module_resources(FLAGS.acr_module_resources_path)

def load_acr_module_resources(acr_module_resources_path):
    (acr_label_encoders, articles_metadata_df, content_article_embeddings) = \
              deserialize(acr_module_resources_path)

    tf.logging.info("Read ACR label encoders for: {}".format(acr_label_encoders.keys()))
    tf.logging.info("Read ACR articles metadata: {}".format(len(articles_metadata_df)))
    tf.logging.info("Read ACR article content embeddings: {}".format(content_article_embeddings.shape))

    return acr_label_encoders, articles_metadata_df, content_article_embeddings

Similarly, in nar_utils.py, this method cannot be executed because the folder "/pickles/" does not contain 'nar_label_encoders'.

def load_nar_module_preprocessing_resources(nar_module_preprocessing_resources_path):
    #{'nar_label_encoders', 'nar_standard_scalers'}
    nar_resources = \
              deserialize(nar_module_preprocessing_resources_path)

    nar_label_encoders = nar_resources['nar_label_encoders']
    tf.logging.info("Read NAR label encoders for: {}".format(nar_label_encoders.keys()))

    return nar_label_encoders    

How can we get these labels? Or am I overlooking something?

Thanks
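
A quick way to confirm which entries a resources file actually contains is to load it and compare its keys with the expected ones. The sketch below assumes the NAR resources file is a standard pickle of a dict, as the comment in load_nar_module_preprocessing_resources suggests. The helper name missing_resource_keys and the demo file are hypothetical, and the repository's own deserialize function may read files differently.

```python
import os
import pickle
import tempfile


def missing_resource_keys(pickle_path, required_keys):
    """Return the required keys absent from a pickled dict, sorted by name."""
    # Only unpickle files from a source you trust: unpickling can run code.
    with open(pickle_path, "rb") as f:
        resources = pickle.load(f)
    if not isinstance(resources, dict):
        raise TypeError("expected a pickled dict, got %s" % type(resources).__name__)
    return sorted(set(required_keys) - set(resources))


# Demo with a throwaway file that lacks 'nar_label_encoders'.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_resources.pickle")
    with open(demo_path, "wb") as f:
        pickle.dump({"nar_standard_scalers": {}}, f)
    missing = missing_resource_keys(
        demo_path, ["nar_label_encoders", "nar_standard_scalers"])
    print(missing)
```

Running such a check on the downloaded files before training separates a missing asset from a wrong path.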

Get predictions on input data

Hi,
When using the NAR module, I want to see a ranked list of predictions in order to evaluate the recommended articles intuitively after training. But I only see the evaluation code that calculates metrics. Can you help me get this ranked list?

Thank you very much.
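
Independently of how the NAR module exposes its scores (which is not shown here), turning per-item scores into a ranked list is a small step. A minimal sketch, assuming the scores for one session are already available as a mapping from item id to relevance score; the function name top_k_items and the toy scores are hypothetical, not part of the repository.

```python
import heapq


def top_k_items(scores, k):
    """Return the k (item_id, score) pairs with the highest scores, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])


# Toy scores for three hypothetical articles.
scores = {"article_a": 0.12, "article_b": 0.57, "article_c": 0.31}
ranked = top_k_items(scores, 2)
print(ranked)
```

heapq.nlargest avoids sorting the full candidate set, which matters when k is small relative to the number of articles.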

"Shape mismatch" while running run_nar_train_adressa_local.sh

ERROR:tensorflow:ERROR: shape mismatch in assignment
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/hengshiou/Documents/chameleon_recsys/nar_module/nar/nar_trainer_adressa.py", line 573, in <module>
    tf.app.run()
  File "/usr/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/hengshiou/Documents/chameleon_recsys/nar_module/nar/nar_trainer_adressa.py", line 497, in main
    model.train(input_fn=lambda: prepare_dataset_iterator(training_files_chunk, session_features_config,
  File "/usr/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1192, in _train_model_default
    saving_listeners)
  File "/usr/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1484, in _train_with_estimator_spec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/usr/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "/usr/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1252, in run
    run_metadata=run_metadata)
  File "/usr/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1353, in run
    raise six.reraise(*original_exc_info)
  File "/usr/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
  File "/usr/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1338, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1419, in run
    run_metadata=run_metadata))
  File "/home/hengshiou/Documents/chameleon_recsys/nar_module/nar/nar_model.py", line 1659, in after_run
    self.clicked_items_state.update_items_coocurrences(batch_clicked_items)
  File "/home/hengshiou/Documents/chameleon_recsys/nar_module/nar/nar_model.py", line 1371, in update_items_coocurrences
    self.items_coocurrences[rows, cols] += 1
  File "/usr/lib/python3.7/site-packages/scipy/sparse/_index.py", line 124, in __setitem__
    raise ValueError("shape mismatch in assignment")
ValueError: shape mismatch in assignment

Hi there, would you mind clarifying this "shape mismatch in assignment" error? I followed the instructions given in the README.

Why not use the author info during the ACR?

Hi,
I have read the paper and I am wondering why the author was only considered in the NAR part and not in the ACR part. The paper doesn't really explain why it could not be used as a metadata attribute, or at least I am missing it.
Best,
