rn5l / session-rec

Python-based framework for building and evaluating session-based and session-aware recommender systems.

Python 93.64% Shell 0.04% Dockerfile 0.03% Batchfile 0.01% CSS 1.17% HTML 5.08% JavaScript 0.02%

session-rec's Issues

self.last_ts

Dear Developers,

This issue is related to vsknn.py.

I noticed that self.last_ts is initialized to -1.
Then there is a code:

if self.dwelling_time:
    if self.last_ts > 0:
        self.dwelling_times.append( timestamp - self.last_ts )
    self.last_ts = timestamp

I think self.last_ts > 0 is never true. So is it OK if I remove these lines of code:

if self.last_ts > 0:
    self.dwelling_times.append( timestamp - self.last_ts )

Will it affect the result?
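
A minimal sketch of the quoted logic, assuming last_ts starts at -1 and timestamps are positive Unix times. The class name DwellTracker and the method observe are hypothetical and only mirror the snippet above; they are not taken from vsknn.py. Under these assumptions the check is false only for the first event, because every call stores the timestamp in last_ts afterwards.

```python
class DwellTracker:
    """Hypothetical stand-in that mirrors only the quoted snippet."""

    def __init__(self, dwelling_time=True):
        self.dwelling_time = dwelling_time
        self.last_ts = -1          # same initial value as in the issue
        self.dwelling_times = []

    def observe(self, timestamp):
        if self.dwelling_time:
            if self.last_ts > 0:   # false only until the first timestamp is stored
                self.dwelling_times.append(timestamp - self.last_ts)
            self.last_ts = timestamp


tracker = DwellTracker()
for ts in (1000, 1030, 1100):
    tracker.observe(ts)

print(tracker.dwelling_times)  # [30, 70]
```

If the real class behaves like this sketch, removing the inner check would change dwelling_times whenever dwelling_time is enabled.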

Which kNN algorithms are user-based, and which are item-based?

The code is very long, so your answer would help me. Thank you.

How to Configure Diginetica dataset to run in the framework

I have processed Diginetica data by running python run_preprocessing.py conf/preprocess/session_based/window/diginetica.yml and obtained several files such as train-item-views_train_full.0. I renamed the folder from data/diginetica/slices to data/diginetica/prepared.
After preprocessing the data, I used conf/example_all_neural.yml and configured it for the Diginetica data by changing the yml file to look as follows:

type: single # single|window|opt
key: baselines_and_models #added to the csv names
evaluation: evaluation_user_based # evaluation_user_based
data:
  name: diginetica #added in the end of the csv names
  folder: data/diginetica/prepared/
  prefix: train-item-views
  type: csv # hdf|csv(default)

results:
  folder: results/session-aware/diginetica/

After that I ran THEANO_FLAGS="device=cuda0,floatX=float32" CUDA_DEVICE_ORDER=PCI_BUS_ID python run_config.py conf/example_all_neural.yml and I am getting an error that says

Using TensorFlow backend.
init
Traceback (most recent call last):
  File "run_config.py", line 62, in main
    run_file(c)
  File "run_config.py", line 164, in run_file
    run_single(conf)
  File "run_config.py", line 213, in run_single
    m.init(train)
UnboundLocalError: local variable 'train' referenced before assignment

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_config.py", line 919, in <module>
    main(sys.argv[1], out=sys.argv[2] if len(sys.argv) > 2 else None)
  File "run_config.py", line 73, in main
    print('error for config ', list[0])
UnboundLocalError: local variable 'list' referenced before assignment

I have tested my srec37 environment on yoochoose data and it trains without a problem. I am wondering what is wrong with the diginetica configuration, or whether I have skipped or overlooked something.
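
Both tracebacks show the same kind of Python error: a local name that is assigned only inside a branch and then used after that branch was skipped. The sketch below reproduces the mechanism in isolation. The function load_split and its mode values are hypothetical and do not reflect the actual branching in run_config.py.

```python
def load_split(mode):
    # 'train' is only bound when one of the branches matches
    if mode == "a":
        train = [1, 2, 3]
    elif mode == "b":
        train = [4, 5, 6]
    return train  # raises UnboundLocalError when no branch ran


def try_load(mode):
    """Call load_split and report the error type instead of crashing."""
    try:
        return load_split(mode)
    except UnboundLocalError as err:
        return "failed: " + type(err).__name__


print(try_load("a"))
print(try_load("unexpected"))
```

If the same pattern applies here, comparing the configuration keys against one that is known to work (for example the yoochoose configuration) may show which branch was skipped.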

Can't run using Docker

Hey,

Thanks for this amazing repository! I pulled your Docker file, but could not run it by following your steps.

When I run ./dpython run_preprocesing.py conf/preprocess/window/rsc15.yml, it is telling me that

python: can't open file 'run_preprocesing.py': [Errno 2] No such file or directory

When I run ./dpython run_config.py conf/in conf/out or ./dpython run_config.py conf/example_next.yml, it is telling me that

Traceback (most recent call last):
  File "run_config.py", line 12, in <module>
    import yaml
ImportError: No module named 'yaml'

I am in the session-rec folder. Do you have any idea what might cause these errors?
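
A small check that may help narrow both errors down, written as a sketch: it tests whether a script exists in the current directory and whether the running interpreter can locate a module. The helper names are hypothetical. Note that the first command spells the script run_preprocesing.py, while other reports on this page refer to run_preprocessing.py.

```python
import importlib.util
from pathlib import Path


def script_exists(name, folder="."):
    """Return True if the script file is present in the given folder."""
    return (Path(folder) / name).is_file()


def module_available(name):
    """Return True if the current interpreter can locate the module."""
    return importlib.util.find_spec(name) is not None


for script in ("run_preprocesing.py", "run_preprocessing.py", "run_config.py"):
    print(script, "found" if script_exists(script) else "missing")

print("yaml importable:", module_available("yaml"))
```

Running it through the same ./dpython wrapper would test the interpreter inside the container rather than the one on the host.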

pickle_model `# not working for tensorflow models` label

Hi, thanks for making this repository and work public.

What is the recommended way of "exporting" a model trained through the experiments? For my research, I was asked to take a trained model, feed it a new input (for example, another session), and see which items it recommends.

I've noticed the pickle_model label in the conf files, but in most of them it is marked as not working for TensorFlow models.

session-rec/conf/save/8tracks/window/window_8tracks_sgnn.yml

Lines 12 to 14 in 3b13883

results:
  folder: results/window/8tracks/
  #pickle_models: results/models/music-window/ # not working for tensorflow models

session-rec/run_config.py

Lines 570 to 588 in 3e6d76e

def save_model(key, algorithm, conf):
    '''
    Save the model object for reuse with FileModel
        --------
        algorithm : object
            Dictionary of all results res[algorithm_key][metric_key]
        conf : object
            Configuration dictionary, has to include results.pickel_models
    '''

    file_name = conf['results']['folder'] + '/' + conf['key'] + '_' + conf['data']['name'] + '_' + key + '.pkl'
    file_name = Path(file_name)
    ensure_dir(file_name)
    file = open(file_name, 'wb')

    # pickle.dump(algorithm, file)
    dill.dump(algorithm, file)

    file.close()

Is there a recommended way of dealing with this? Maybe something simple that would take less effort than fixing the pickle_model problem?
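
For models that are plain Python objects, one low-effort option is to serialize the trained object and call its prediction method on a new session after reloading. The sketch below uses the standard-library pickle module; the repository code quoted above calls dill.dump, which follows the same dump/load pattern. ToyModel and its predict_next signature are hypothetical placeholders, and this approach is not claimed to work for TensorFlow-backed models.

```python
import pickle
import tempfile
from pathlib import Path


class ToyModel:
    """Hypothetical stand-in for a trained, picklable model."""

    def __init__(self):
        self.item_counts = {}

    def fit(self, sessions):
        for session in sessions:
            for item in session:
                self.item_counts[item] = self.item_counts.get(item, 0) + 1

    def predict_next(self, session_id, input_item_id, candidate_item_ids):
        # Score candidates by training frequency, skipping the current item
        return {i: self.item_counts.get(i, 0)
                for i in candidate_item_ids if i != input_item_id}


model = ToyModel()
model.fit([[1, 2, 3], [2, 3], [3, 4]])

# Save the trained object to a temporary file
path = Path(tempfile.mkdtemp()) / "toy_model.pkl"
with open(path, "wb") as fh:
    pickle.dump(model, fh)

# Reload it and score candidates for a new session
with open(path, "rb") as fh:
    restored = pickle.load(fh)

scores = restored.predict_next(session_id=99, input_item_id=2,
                               candidate_item_ids=[1, 2, 3, 4])
print(scores)
```

The class definition has to be importable when the file is loaded, which is why the sketch keeps saving and loading in the same script.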

What does rpop stand for?

What kind of popularity recommendation is it?

Could not find the VSTAN algorithm implementation

Is the VSTAN algorithm implementation available in this repo? I could not find it.

Variable not found in preprocessing

The two variables path and file are missing in run_preprocessing.py.

session-rec/run_preprocessing.py

Line 78 in d5aca66

method_to_call( data, path+file, **conf['params'] )

Index out of Bounds Error for Session-Aware Recommendation (Opt and Single Mode)

I am using a config file based on example_session_aware_opt.yml. The error persists with many different models, but I am currently working with hgru4rec.py.

My data set is labeled with the default headers of [SessionId, ItemId, Time, UserId], and I have ensured there are no new items or users in the valid/test set.

Any help would be appreciated.

- class: hgru4rec.hgru4rec.HGRU4Rec
  params: {
    n_epochs: 1, 
    session_layers: 10,
    user_layers: 10,
    loss: 'top1'
  }

START evaluation of  10000  actions in  5000  sessions
    eval process:  0  of  10000 actions:  0.0  % in 0.08836889266967773 s
Traceback (most recent call last):
  File "run_config.py", line 62, in main
    run_file(c)
  File "run_config.py", line 169, in run_file
    run_opt(conf)
  File "run_config.py", line 417, in run_opt
    run_opt_single(conf, i, globals)
  File "run_config.py", line 277, in run_opt_single
    eval_algorithm(train, test, k, a, evaluation, metrics, results, conf, iteration=iteration, out=False)
  File "run_config.py", line 515, in eval_algorithm
    results[key] = eval.evaluate_sessions(algorithm, metrics, test, train)
  File "<arbitrary_path>/session-rec/evaluation/evaluation_user_based.py", line 170, in evaluate_sessions
    m.add(preds, rest[0], for_item=current_item, session=current_session, position=position)
IndexError: index 0 is out of bounds for axis 0 with size 0
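
The failing call indexes rest[0], and the message says rest is empty at that point. One way to start narrowing this down, offered only as a sketch, is to count events per session in the test split and list sessions too short to have a next item. Whether short sessions are the actual cause here is an assumption. The rows below are made-up data using the column names from the report.

```python
from collections import Counter

# Made-up rows in the reported column order: SessionId, ItemId, Time, UserId
rows = [
    (1, 10, 100, 7),
    (1, 11, 110, 7),
    (2, 12, 200, 8),   # session 2 has a single event
    (3, 10, 300, 7),
    (3, 13, 310, 7),
    (3, 14, 320, 7),
]


def short_sessions(rows, min_events=2):
    """Return ids of sessions with fewer than min_events events, sorted."""
    counts = Counter(session_id for session_id, _, _, _ in rows)
    return sorted(s for s, n in counts.items() if n < min_events)


print(short_sessions(rows))  # [2]
```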

self.min_time

Dear Developers,

This issue is related to vsknn.py.

The vsknn class has an attribute self.min_time.
I saw this attribute used only three times. Is it OK if I remove these lines of code

if time < self.min_time :
    self.min_time = time

?
I couldn't see that self.min_time affects the result. If it does, could you explain how, please?
Thanks in advance!

timestamp parameter of predict_next()

Dear Developers,

This issue is related to vsknn.py.

What is the role of the timestamp parameter in the predict_next() method? Could you give a brief explanation? Does timestamp=0 mean that the timestamp equals the current time?
Thanks in advance!

Better performance after fine-tune parameters

Hi, I am running your code on the Diginetica dataset. Concretely, I use sgnn.py (SR-GNN) as an example and find that I get better results if I change the parameters from lr=0.0001, l2=0.000007, lr_dc=0.63 to lr=0.001, l2=0.00001, lr_dc=0.1, the same values as in the original SR-GNN paper.

Recall@20 and MRR@20 improve from 30.70 and 15.41 to 45.52 and 15.91. If so, I wonder whether the conclusion reported in your paper, that heuristic models get better results than such neural methods, holds.

Thank you and best regards.

The conclusions in your paper raise some concerns.

Hi!

I still have some concerns about the parameter settings you chose. Would you please help me verify my process? First, I downloaded the Diginetica dataset from the official link. Next, I used the config file "conf/preprocess/session-based/diginetica.yml" and preprocess.py to generate the processed datasets. Then, I used the config file "conf/save/diginetica/window/window_digi_sgnn.yml" and run_config.py to train the SR-GNN model. When I change the parameter settings (window_digi_sgnn.yml, line 35) from params: { lr: 0.0001, l2: 0.000007, lr_dc: 0.63, lr_dc_step: 3, nonhybrid: True, epoch_n: 10 } to params: { lr: 0.001, l2: 0.00001, lr_dc: 0.1, lr_dc_step: 3, nonhybrid: True, epoch_n: 10 }, I observe significant improvements in the loss as well as in HR and MRR.

I checked this repository because recent deep-learning literature consistently reports SR-GNN as more robust than previous methods on the Diginetica dataset. However, in your work Empirical analysis of session-based recommendation algorithms (UMUAI-2020), the reported HR@20 of SR-GNN is only 0.3638, much lower than the officially reported 0.5073. I can understand that using a sub-dataset degrades the performance of deep-learning-based methods, but the magnitude of the degradation seems too large.
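
To make the comparison concrete, the sketch below diffs the two parameter sets quoted above and prints the learning rate per epoch under a plain step-decay rule. The rule lr * lr_dc ** (epoch // lr_dc_step) is an assumption for illustration only; the schedule actually used by sgnn.py may differ. The variable and function names are hypothetical.

```python
config_params = {"lr": 0.0001, "l2": 0.000007, "lr_dc": 0.63,
                 "lr_dc_step": 3, "nonhybrid": True, "epoch_n": 10}
changed_params = {"lr": 0.001, "l2": 0.00001, "lr_dc": 0.1,
                  "lr_dc_step": 3, "nonhybrid": True, "epoch_n": 10}


def diff_params(a, b):
    """Return {key: (a_value, b_value)} for keys whose values differ."""
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}


def step_decay(params, epoch):
    """Assumed schedule: multiply lr by lr_dc once every lr_dc_step epochs."""
    return params["lr"] * params["lr_dc"] ** (epoch // params["lr_dc_step"])


print(diff_params(config_params, changed_params))
for epoch in range(config_params["epoch_n"]):
    print(epoch, step_decay(config_params, epoch), step_decay(changed_params, epoch))
```

Under this assumed rule the two settings differ in both the starting rate and how fast it shrinks, so their results are not directly comparable epoch by epoch.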

Besides, I noticed that the best neural method you report is GRU4REC (you use the official code of the improved version GRU4REC-TOPK (2018) instead of the original GRU4REC (2016)). However, according to the results in RepeatNet: A Repeat Aware Neural Recommendation Machine for Session-Based Recommendation (AAAI-2019), its HR@20 and MRR@20 are 0.4523 and 0.1416, whereas in your work Empirical analysis of session-based recommendation algorithms they are 0.4639 and 0.1644, even higher than the results on a larger dataset. That seems odd.

I am not sure whether I have made some mistakes in my experiments, and I would appreciate it if I could get constructive responses from you.

Best regards.

missing examples with MF

Hi, some published papers include matrix factorization models in their benchmarks.

However, these models are missing from the .yml configuration examples.

Why are most probability scores for recommendations so low?

Hi, may I ask a question regarding the recommendation results? I tested multiple algorithms on different datasets, but apart from the first recommendation, most of the scores are very low or even equal to 0. I see the same pattern in the data/rsc15/recommendations files. Is this common for session-based algorithms? Can you help me understand? Thanks.
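
One possible reading, sketched with a toy neighbor-based scorer: if an item's score is the summed similarity of the neighbor sessions that contain it, items found in no neighbor session get exactly 0, and the scores are not probabilities unless they are normalized. The functions and data are hypothetical and do not reproduce any algorithm in this repository.

```python
def knn_scores(neighbors, candidates):
    """Sum neighbor similarities per item; unseen items score 0.0."""
    scores = {item: 0.0 for item in candidates}
    for similarity, items in neighbors:
        for item in items:
            if item in scores:
                scores[item] += similarity
    return scores


def normalize(scores):
    """Scale scores to sum to 1 (left unchanged if all are zero)."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()} if total else dict(scores)


# (similarity to the current session, items in that neighbor session)
neighbors = [(0.5, {1, 2}), (0.25, {1}), (0.25, {3})]
raw = knn_scores(neighbors, candidates=[1, 2, 3, 4, 5])
print(raw)
print(normalize(raw))
```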
