gabrielspmoreira / chameleon_recsys
Source code of CHAMELEON - A Deep Learning Meta-Architecture for News Recommender Systems
License: MIT License
File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 671, in run
run_metadata=run_metadata)
File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1156, in run
run_metadata=run_metadata)
File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
raise six.reraise(*original_exc_info)
File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1240, in run
return self._sess.run(*args, **kwargs)
File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1320, in run
run_metadata=run_metadata))
File "/mnt/d/news_reco/chameleon_recsys/nar_module/nar/nar_model.py", line 1656, in after_run
self.clicked_items_state.update_items_coocurrences(batch_clicked_items)
File "/mnt/d/news_reco/chameleon_recsys/nar_module/nar/nar_model.py", line 1371, in update_items_coocurrences
self.items_coocurrences[rows, cols] += 1
File "/home/mukesh_ubuntu/miniconda3/envs/chameleon/lib/python3.6/site-packages/scipy/sparse/_index.py", line 124, in __setitem__
raise ValueError("shape mismatch in assignment")
ValueError: shape mismatch in assignment
I am using the Globo dataset available at https://www.kaggle.com/gspmoreira/news-portal-user-interactions-by-globocom
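A possible workaround, offered only as a sketch. The traceback fails inside the in-place fancy-index update `self.items_coocurrences[rows, cols] += 1` on a SciPy sparse matrix. One plausible cause (not confirmed here) is that, with some SciPy versions, the right-hand side `items_coocurrences[rows, cols] + 1` comes back as a 2-D `(1, n)` matrix while the index arrays are 1-D, so the installed SciPy version may matter. The helper below avoids in-place fancy assignment entirely by building a sparse matrix of increments and adding it. The name `add_cooccurrences` is hypothetical, and it assumes `rows` and `cols` are equal-length integer arrays of item indices, as in `update_items_coocurrences`.

```python
import numpy as np
import scipy.sparse as sp


def add_cooccurrences(matrix, rows, cols):
    """Return `matrix` plus 1 at each distinct (row, col) pair.

    Hypothetical replacement for `matrix[rows, cols] += 1`.
    """
    rows = np.asarray(rows, dtype=np.int64)
    cols = np.asarray(cols, dtype=np.int64)
    if rows.shape != cols.shape:
        raise ValueError("rows and cols must have the same length")
    if rows.size == 0:
        return matrix
    # Fancy-index `+=` increments a repeated pair only once, so de-duplicate
    # the pairs to keep the same semantics (a COO matrix would sum them).
    pairs = np.unique(np.stack([rows, cols], axis=1), axis=0)
    increments = sp.coo_matrix(
        (np.ones(len(pairs), dtype=matrix.dtype), (pairs[:, 0], pairs[:, 1])),
        shape=matrix.shape,
    )
    # Sparse + sparse sidesteps SciPy's fancy-index assignment path;
    # convert back so callers keep the original sparse format.
    return (matrix + increments).asformat(matrix.format)
```

Because the helper returns a new matrix, the call site would need to reassign it, e.g. `self.items_coocurrences = add_cooccurrences(self.items_coocurrences, rows, cols)`.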
Hi,
In the runfile run_nar_train_gcom_local.sh, there seems to be a flag mismatch.
In the code, only one path is required, and it is used to load the label encoders, metadata, and embeddings for the ACR module:
tf.logging.info('Loading ACR module assets')
acr_label_encoders, articles_metadata_df, content_article_embeddings_matrix = \
load_acr_module_resources(FLAGS.acr_module_resources_path)
The runfile, however, sets two separate flags, one for the ACR metadata and one for the embeddings, and neither is used in the code.
Perhaps the runfile should be adjusted so that the code runs properly (e.g. by setting acr_module_resources_path), or, alternatively, the code should be adjusted to handle the two flags separately.
The code shows how to train and evaluate the model, but there does not seem to be enough information on how to make predictions with it.
Hi All,
The Adressa News Dataset has its news content (main body) in Norwegian, not English, as their website also states. How did the authors handle this? Did they translate the content of each article or use it as is?
Hi. After this commit, some objects were removed from the ACR module's unloading process so that it works with the Kaggle files. However, some references to them are still in the code and throw errors.
Here and here.
I removed those lines and tried to run it, but I still get errors. Is this an error in the code, or am I running it wrong?
Hello,
In order to run the code and reproduce the results, one would need the specific package versions you used for development and experiments.
Could you provide a requirements.txt or some other specification of the packages and their versions?
The dependencies in the README appear to be underspecified. With TensorFlow pinned to version 1.12.0 in particular, the versions of the other required packages are crucial.
Thank you in advance.
Some links for the Adressa dataset preprocessing have expired, such as dataproc_preprocessing/create_cluster.sh, dataproc_preprocessing/browse_cluster.sh, and dataproc_preprocessing/nar_preprocessing_addressa_01_dataproc.ipynb.
def flatten_list_series(series_of_lists):
    return pd.DataFrame(series_of_lists.apply(pd.Series).stack().reset_index(name='item'))['item']
is equivalent to (except that it returns a list rather than a Series)
def fun(series_of_lists):
    L = []
    for i in series_of_lists:
        L.extend(i)
    return L
Similarly,
dict(articles_original_df[['url', 'id_encoded']].apply(lambda x: (x['url'], x['id_encoded']), axis=1).values)
is equivalent to
articles_original_df.set_index('url')['id_encoded'].to_dict()
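Both equivalences can be checked on toy data with a small sketch like the one below. The toy `clicks` Series and `articles_original_df` are made up for illustration, and `itertools.chain.from_iterable` is a standard-library alternative to the loop in `fun`, not something taken from the repository. The sketch deliberately avoids calling `flatten_list_series` itself: that version returns a pandas Series, and `stack()` drops missing values by default, whereas the loop returns a plain list.

```python
import itertools

import pandas as pd


def flatten_loop(series_of_lists):
    # Same logic as `fun`: concatenate every list, preserving order.
    flat = []
    for items in series_of_lists:
        flat.extend(items)
    return flat


def flatten_chain(series_of_lists):
    # Standard-library equivalent of the loop above.
    return list(itertools.chain.from_iterable(series_of_lists))


def url_to_id_apply(articles_df):
    # Original row-by-row construction of the url -> id mapping.
    pairs = articles_df[['url', 'id_encoded']].apply(
        lambda x: (x['url'], x['id_encoded']), axis=1).values
    return dict(pairs)


def url_to_id(articles_df):
    # Vectorised equivalent: index by url, then convert the column to a dict.
    return articles_df.set_index('url')['id_encoded'].to_dict()


clicks = pd.Series([[3, 1], [], [2]])
articles_original_df = pd.DataFrame({
    'url': ['https://example.com/a', 'https://example.com/b'],
    'id_encoded': [10, 11],
})

print(flatten_chain(clicks))  # [3, 1, 2]
```

The `set_index(...).to_dict()` form also avoids a Python-level lambda call per row, which matters on a full articles table.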
Hi,
With the provided Globo dataset, we cannot train the ACR module because the article contents could not be provided. When proceeding to NAR training, it seems that some assets are missing.
In nar_trainer_gcom.py, the following method tries to deserialise the label encoders, metadata, and article embeddings from a pickle file. However, this pickle file is not provided; more precisely, the "acr_label_encoders" are missing.
tf.logging.info('Loading ACR module assets')
acr_label_encoders, articles_metadata_df, content_article_embeddings_matrix = \
load_acr_module_resources(FLAGS.acr_module_resources_path)
def load_acr_module_resources(acr_module_resources_path):
    (acr_label_encoders, articles_metadata_df, content_article_embeddings) = \
        deserialize(acr_module_resources_path)
    tf.logging.info("Read ACR label encoders for: {}".format(acr_label_encoders.keys()))
    tf.logging.info("Read ACR articles metadata: {}".format(len(articles_metadata_df)))
    tf.logging.info("Read ACR article content embeddings: {}".format(content_article_embeddings.shape))
    return acr_label_encoders, articles_metadata_df, content_article_embeddings
Similarly, in nar_utils.py, this method cannot be executed because the folder "/pickles/" does not contain 'nar_label_encoders':
def load_nar_module_preprocessing_resources(nar_module_preprocessing_resources_path):
    #{'nar_label_encoders', 'nar_standard_scalers'}
    nar_resources = \
        deserialize(nar_module_preprocessing_resources_path)
    nar_label_encoders = nar_resources['nar_label_encoders']
    tf.logging.info("Read NAR label encoders for: {}".format(nar_label_encoders.keys()))
    return nar_label_encoders
How can we get these labels? Or am I overlooking something?
Thanks
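A small diagnostic can at least report what a resources file actually contains before training fails deep inside the loader. This sketch assumes the repository's `deserialize` helper is a thin wrapper around `pickle` and that the NAR resources are a dict keyed as in the comment above (`'nar_label_encoders'`, `'nar_standard_scalers'`); both are assumptions, and the function name `load_nar_label_encoders_checked` is hypothetical.

```python
import os
import pickle


def load_nar_label_encoders_checked(resources_path):
    """Load 'nar_label_encoders' from a pickled dict, failing with a clear message.

    Assumes (hypothetically) that the file is a pickle of a dict such as
    {'nar_label_encoders': ..., 'nar_standard_scalers': ...}.
    """
    if not os.path.isfile(resources_path):
        raise FileNotFoundError(f"NAR resources file not found: {resources_path}")
    with open(resources_path, 'rb') as f:
        resources = pickle.load(f)
    if not isinstance(resources, dict) or 'nar_label_encoders' not in resources:
        # Report what was found instead, to show which preprocessing step is missing.
        found = list(resources) if isinstance(resources, dict) else type(resources).__name__
        raise KeyError(f"'nar_label_encoders' missing from {resources_path}; found: {found}")
    return resources['nar_label_encoders']
```

It does not recreate the missing encoders; it only turns a bare `KeyError` into a message listing the keys that are present.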
Hi,
When using the NAR module, I would like to see a ranked list of predictions so that I can intuitively evaluate the recommended articles after training. However, I only see evaluation code that calculates metrics. Can you help me get this ranked list?
Thank you very much.
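Once a score is available for every candidate article (for example, from a `tf.estimator.Estimator.predict` call, although which output keys the NAR model exposes is not shown here), turning it into a ranked list is a sort. This sketch assumes a 1-D array of scores aligned with an array of candidate article IDs; `top_k_articles`, `scores`, and `article_ids` are hypothetical names, not part of the repository's API.

```python
import numpy as np


def top_k_articles(scores, article_ids, k=10):
    """Return (article_id, score) pairs sorted by descending score.

    `scores[i]` is assumed to be the model's score for `article_ids[i]`.
    """
    scores = np.asarray(scores)
    article_ids = np.asarray(article_ids)
    if scores.shape != article_ids.shape:
        raise ValueError("scores and article_ids must have the same shape")
    k = min(k, scores.size)
    # argsort on negated scores gives descending order; kind='stable' keeps
    # the original order among tied scores.
    order = np.argsort(-scores, kind='stable')[:k]
    return [(article_ids[i].item(), scores[i].item()) for i in order]


print(top_k_articles([0.1, 0.7, 0.3, 0.7], [101, 102, 103, 104], k=3))
# [(102, 0.7), (104, 0.7), (103, 0.3)]
```

Mapping the encoded IDs back to article URLs or titles would additionally require the label encoders discussed in the issue above.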
ERROR:tensorflow:ERROR: shape mismatch in assignment
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/hengshiou/Documents/chameleon_recsys/nar_module/nar/nar_trainer_adressa.py", line 573, in <module>
tf.app.run()
File "/usr/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/lib/python3.7/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/home/hengshiou/Documents/chameleon_recsys/nar_module/nar/nar_trainer_adressa.py", line 497, in main
model.train(input_fn=lambda: prepare_dataset_iterator(training_files_chunk, session_features_config,
File "/usr/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1192, in _train_model_default
saving_listeners)
File "/usr/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1484, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/usr/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 754, in run
run_metadata=run_metadata)
File "/usr/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1252, in run
run_metadata=run_metadata)
File "/usr/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1353, in run
raise six.reraise(*original_exc_info)
File "/usr/lib/python3.7/site-packages/six.py", line 693, in reraise
raise value
File "/usr/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1338, in run
return self._sess.run(*args, **kwargs)
File "/usr/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1419, in run
run_metadata=run_metadata))
File "/home/hengshiou/Documents/chameleon_recsys/nar_module/nar/nar_model.py", line 1659, in after_run
self.clicked_items_state.update_items_coocurrences(batch_clicked_items)
File "/home/hengshiou/Documents/chameleon_recsys/nar_module/nar/nar_model.py", line 1371, in update_items_coocurrences
self.items_coocurrences[rows, cols] += 1
File "/usr/lib/python3.7/site-packages/scipy/sparse/_index.py", line 124, in __setitem__
raise ValueError("shape mismatch in assignment")
ValueError: shape mismatch in assignment
Hi there, would you mind clarifying this issue, "shape mismatch in assignment"? I followed the instructions in the README.
Hi,
I have read the paper, and I am wondering why the article author was only considered in the NER part and not in the ACR part. The paper does not really explain why the author could not be considered as a metadata attribute, or at least I am missing it.
Best,