contentwise / contentwise-impressions

This repository contains the code used to generate the data splits, run the hyperparameter tuning, and export the results presented in our article "ContentWise Impressions: An industrial dataset with impressions included".

Home Page: https://forms.office.com/Pages/ResponsePage.aspx?id=K3EXCvNtXUKAjjCd8ope6_zxBj9DRzpKnC4jkclZQupUQ0szOVhTQ1FCT0tZSEw1T1g0RzVBRVhSSC4u

License: GNU Affero General Public License v3.0

Python 81.30% Jupyter Notebook 1.66% Cython 17.04%

dataset collaborative-filtering python3 recommender-systems impressions interactions dask cython

contentwise-impressions's Issues

Inquiry about recommendation id

Hi there,

Thanks for sharing the dataset. I have read through the corresponding paper and have a few questions that I hope you can clarify.

In the impressions-with-direct-link dataset, each row is associated with a recommendation id. Is this id the identifier of the recommendation algorithm that generated the row, or is it purely a key for merging the impressions-with-direct-link dataset with the interactions dataset? If the latter, is there any way to distinguish impressions generated by different recommendation algorithms?

Thanks in advance for your help. :)

Best wishes,
Helen
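
For readers with the same question, here is a minimal sketch of the second reading, where the recommendation id acts purely as a join key. The records and field names (`user_id`, `item_id`, `recommendation_id`, `row_position`, `recommended_series`) are hypothetical toy data, not a confirmed schema, and the sketch says nothing about which algorithm produced each row.

```python
# Toy records; all field names and values are assumptions for illustration.
interactions = [
    {"user_id": 1, "item_id": 10, "recommendation_id": 100},
    {"user_id": 1, "item_id": 11, "recommendation_id": -1},
    {"user_id": 2, "item_id": 12, "recommendation_id": 101},
]
impressions_direct_link = [
    {"recommendation_id": 100, "row_position": 0, "recommended_series": [10, 20, 30]},
    {"recommendation_id": 101, "row_position": 2, "recommended_series": [12, 40]},
]


def join_on_recommendation_id(interactions, impressions):
    """Attach the matching impression (if any) to each interaction."""
    by_id = {imp["recommendation_id"]: imp for imp in impressions}
    joined = []
    for interaction in interactions:
        impression = by_id.get(interaction["recommendation_id"])
        if impression is not None:  # ids with no impression are skipped
            joined.append({**interaction, **impression})
    return joined


joined = join_on_recommendation_id(interactions, impressions_direct_link)
```

In this toy data the interaction with id -1 has no matching impression, so only two joined rows come back.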

Cannot read the dataset

I get an exception at line 48 when running run_generate_splits.py to read the dataset. The exception is raised from the Apache Parquet reader. I have pyarrow 0.17.1 installed. Below is the stack trace:

  File "contentwise-impressions-master/run_generate_splits.py", line 48, in <module>
    dataset.read_dataset()
  File "contentwise-impressions-master\Utils\decorator.py", line 21, in timed
    result = method(*args, **kwargs)
  File "contentwise-impressions-master\Utils\dataset.py", line 229, in read_dataset
    self._local_interactions_file_path)
  File "contentwise-impressions-master\Utils\dataset.py", line 213, in _read_parquet
    return ddf.read_parquet(path=local_file_path, engine="pyarrow")
  File "C:\Program Files\Anaconda3\lib\site-packages\dask\dataframe\io\parquet.py", line 1154, in read_parquet
    categories=categories, index=index, infer_divisions=infer_divisions)
  File "C:\Program Files\Anaconda3\lib\site-packages\dask\dataframe\io\parquet.py", line 706, in _read_pyarrow
    pf = piece.get_metadata(_open)
TypeError: get_metadata() takes 1 positional argument but 2 were given
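
The failing call sits inside dask, which passes an argument that the installed pyarrow method does not accept. One plausible reading, which the trace alone does not confirm, is a version mismatch between the two packages. A minimal sketch for reporting the installed versions so they can be compared against the repository's pinned requirements (the helper name `installed_versions` is hypothetical):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(["dask", "pyarrow"]).items():
        print(f"{name}: {ver or 'not installed'}")
```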

Problem

Hello,
When I used this dataset, I ran into the following question.
What situation does the case "impressions_without_direct_link are presented and the user interacts with an item that does not appear in the impressions" correspond to?
Is it covered by rec_id=-1 in interactions.csv?

I hope to hear from you. Thank you very much!
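
A minimal sketch of how such interactions could be isolated, assuming (as the question suggests, and pending confirmation from the maintainers) that a recommendation id of -1 marks interactions with no linked impression. The field name `recommendation_id` and the toy rows are hypothetical.

```python
NO_RECOMMENDATION = -1  # assumed sentinel value, not confirmed by the source


def split_by_recommendation(interactions):
    """Partition interactions into (linked, unlinked) by recommendation id."""
    linked, unlinked = [], []
    for row in interactions:
        if row["recommendation_id"] == NO_RECOMMENDATION:
            unlinked.append(row)
        else:
            linked.append(row)
    return linked, unlinked


# Toy rows for illustration only.
sample_interactions = [
    {"user_id": 1, "item_id": 10, "recommendation_id": 100},
    {"user_id": 1, "item_id": 11, "recommendation_id": -1},
    {"user_id": 2, "item_id": 12, "recommendation_id": -1},
]
linked, unlinked = split_by_recommendation(sample_interactions)
```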

No timestamps for impressions

Could you add timestamps to the impressions without direct links? Without them, no time-based split is possible.
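
To make the point concrete: a time-based split orders records by timestamp and holds out the most recent ones, so a table with no timestamp field cannot be split this way. A minimal sketch, assuming a hypothetical timestamp field named `ts`:

```python
def time_based_split(records, train_fraction=0.8, ts_key="ts"):
    """Sort records by timestamp and hold out the most recent ones."""
    ordered = sorted(records, key=lambda record: record[ts_key])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Toy records; the `ts` values stand in for real timestamps.
records = [{"ts": t, "item_id": i} for i, t in enumerate([5, 1, 4, 2, 3])]
train, test = time_based_split(records)
```

Every record needs a value under `ts_key`; a record without one raises `KeyError`, which is exactly the gap the issue describes.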

Can I get Item content matrix?

Hello, Thanks for your amazing work!

I'm checking the baselines you provided before starting my research, and I found that 'ItemKNNRecommender.py' uses the item content matrix (ICM) to compute similarity. However, the paper and the published dataset contain no item information apart from the series length and the type of the episodes, and 'dataset.py' has no code that builds the ICM. My main questions are as follows.

Are the Item KNN results reported in the paper based on the user rating matrix (URM)?
If not, how can I get the item content matrix you used to report results?

Once again, thank you for providing such a great dataset!
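
For context, this is what an item similarity computed from the user rating matrix alone (the collaborative variant asked about) can look like. It is a generic cosine-similarity sketch over implicit feedback, not the repository's `ItemKNNRecommender`, and the toy `urm` structure (user mapped to a set of items) is an assumption made for brevity.

```python
from math import sqrt


def item_cosine_similarity(urm):
    """Cosine similarity between items, from implicit user-item feedback.

    urm: dict mapping each user to the set of items they interacted with.
    Returns {(item_i, item_j): similarity} for pairs with i < j that share
    at least one user.
    """
    # Invert the matrix: item -> set of users who interacted with it.
    item_users = {}
    for user, items in urm.items():
        for item in items:
            item_users.setdefault(item, set()).add(user)

    similarities = {}
    for item_i, users_i in item_users.items():
        for item_j, users_j in item_users.items():
            if item_i < item_j:
                common = len(users_i & users_j)
                if common:
                    norm = sqrt(len(users_i) * len(users_j))
                    similarities[(item_i, item_j)] = common / norm
    return similarities


toy_urm = {"u1": {"a", "b"}, "u2": {"a", "b"}, "u3": {"a"}}
sims = item_cosine_similarity(toy_urm)
```

No item metadata is involved: two items are similar only because the same users interacted with both.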

Validation and test splits are mixed up.

It looks like a bug in dataset.py:

        self.URM = {
            "train": train_split,
            "test": validation_split,
            "validation": test_split
        }

validation_split is stored under the key "test" and test_split under the key "validation".
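
The mapping the report implies, written as a small standalone helper. `build_urm_splits` is a hypothetical name used for illustration; it is not a function from dataset.py.

```python
def build_urm_splits(train_split, validation_split, test_split):
    """Return the splits keyed by the name that matches their role."""
    return {
        "train": train_split,
        "validation": validation_split,
        "test": test_split,
    }


# Placeholder strings stand in for the actual split matrices.
splits = build_urm_splits("TRAIN", "VALIDATION", "TEST")
```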

Could We Use `*-direct-link` Data to Reconstruct the Recommendation Grid?

Hi ContentWise team,

Thanks for this great work! I have a question about reconstructing the grid of recommended items.

Since the impressions-*direct-link datasets contain the row positions of the recommended series lists, I wonder whether it is possible to reconstruct the grid of recommended items that a user viewed.

For example,

Find [user1, rec1, row=1] in the impressions-direct-link dataset (the user identifier is retrieved from interactions);
Find [user1, row={2,3,4,5}] in the impressions-non-direct-link dataset;
Combine them into a 5-row grid of recommendations in which user1 interacted with the first row only.

I look forward to your suggestions on this idea. Is it feasible, given that impressions-non-direct-link has no timestamps?
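
The three steps above can be sketched as follows. All field names (`user_id`, `row_position`, `recommended_series`) and the toy rows are hypothetical, and the sketch simply assumes that every row belongs to the same screen. Without timestamps on the second dataset that assumption cannot be checked, which is the open question here.

```python
def reconstruct_grid(user_id, direct_link_rows, non_direct_link_rows):
    """Merge both impression sources into one row-ordered grid for a user."""
    grid = {}
    # Rows assumed to have been shown without a linked interaction.
    for imp in non_direct_link_rows:
        if imp["user_id"] == user_id:
            grid[imp["row_position"]] = {
                "row_position": imp["row_position"],
                "recommended_series": imp["recommended_series"],
                "interacted": False,
            }
    # Rows tied to an interaction overwrite any entry at the same position.
    for imp in direct_link_rows:
        if imp["user_id"] == user_id:
            grid[imp["row_position"]] = {
                "row_position": imp["row_position"],
                "recommended_series": imp["recommended_series"],
                "interacted": True,
            }
    return [grid[position] for position in sorted(grid)]


# Toy data mirroring the example: row 1 is direct-link, rows 2 to 5 are not.
direct = [{"user_id": "user1", "row_position": 1, "recommended_series": [7, 8]}]
non_direct = [
    {"user_id": "user1", "row_position": p, "recommended_series": [p * 10]}
    for p in (2, 3, 4, 5)
] + [{"user_id": "user2", "row_position": 1, "recommended_series": [99]}]

grid = reconstruct_grid("user1", direct, non_direct)
```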
