Bundle Recommendation

This project provides new data sources for product bundling on real e-commerce platforms, covering three domains: Electronic, Clothing and Food. We construct three high-quality bundle datasets with rich meta information, in particular bundle intents, through a carefully designed crowd-sourcing task.

1. Worker Basic Information

Figure 1 shows the distribution of workers' age, education, country, occupation, gender and shopping frequency for the two batches. In the country distribution, 'Others' includes Argentina, Australia, Anguilla, Netherlands, Albania, Georgia, Tunisia, Belgium, Armenia, Guinea, Austria, Switzerland, Iceland, Lithuania, Egypt, Venezuela, Bangladesh, American Samoa, Vanuatu, Colombia, United Arab Emirates, Ashmore and Cartier Island, Estados Unidos, Wales, Turkey, Angola, Scotland, Philippines, Iran and Bahamas.

Figure 1: Worker basic information in the first and second batches.

2. Parameter Tuning and Settings for Bundle Detection

A grid search over {0.0001, 0.001, 0.01} is applied to find the optimal settings for support and confidence; both are set to 0.001 across the three domains.
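To make the two thresholds concrete, the sketch below computes support and confidence for item pairs that co-occur in sessions. It assumes a plain pairwise association-rule formulation; this README does not state which mining procedure is actually used, and the function name `pair_rules` and the toy sessions are purely illustrative.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.001, min_confidence=0.001):
    """Return (a, b, support, confidence) tuples for rules a -> b.

    support(a, b)     = fraction of sessions containing both a and b
    confidence(a -> b) = (#sessions with a and b) / (#sessions with a)
    """
    n = len(sessions)
    item_count = Counter()
    pair_count = Counter()
    for items in sessions:
        unique = sorted(set(items))  # ignore repeated items within a session
        item_count.update(unique)
        pair_count.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Each pair yields two candidate rules, one per direction.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


# Toy sessions with hypothetical item IDs; thresholds raised so the
# filtering is visible on such a small example.
toy_sessions = [["i1", "i2", "i3"], ["i1", "i2"], ["i2", "i3"], ["i1"]]
rules = pair_rules(toy_sessions, min_support=0.5, min_confidence=0.6)
```

With thresholds as low as 0.001 and roughly a thousand sessions per domain, almost any pair that co-occurs at least twice would pass under this formulation, so the thresholds act mainly as a noise filter.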

3. Parameter Tuning and Settings for Bundle Completion

The dimension d of item and bundle representations is 20 for all methods. Grid search is used to find the best settings for the other key parameters. In particular, the learning rate and regularization coefficient are searched in {0.0001, 0.001, 0.01}; the number of neighbors K in ItemKNN is searched in {10, 20, 30, 50}; the weight of KL divergence in VAE is searched in {0.001, 0.01, 0.1}; and the batch size is searched in {64, 128, 256}. The optimal parameter settings are shown in Table 1.
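The search described above can be sketched as an exhaustive loop over the Cartesian product of the candidate values. The grid values are quoted from the text; the function `grid_search`, the parameter names, and the scoring function `toy_score` are hypothetical stand-ins for training a model and measuring validation accuracy.

```python
from itertools import product


def grid_search(grid, evaluate):
    """Try every combination in `grid` and return the best (config, score)."""
    names = list(grid)
    best_config, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Candidate values quoted from the text (3 * 3 * 3 = 27 combinations).
grid = {
    "learning_rate": [0.0001, 0.001, 0.01],
    "regularization": [0.0001, 0.001, 0.01],
    "batch_size": [64, 128, 256],
}


def toy_score(config):
    # Placeholder for "train the model, return a validation metric".
    # It peaks at an arbitrary configuration chosen for illustration.
    return -(
        abs(config["learning_rate"] - 0.001)
        + abs(config["regularization"] - 0.01)
        + abs(config["batch_size"] - 128)
    )


best_config, best_score = grid_search(grid, toy_score)
```

Method-specific parameters such as K for ItemKNN or the KL weight for VAE would be added as extra keys in `grid` for the methods that use them.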

        Table 1: Parameter settings for bundle completion (d=20).

Columns: Electronic, Clothing, Food.
Rows: ItemKNN, BPRMF, mean-VAE, concat-VAE.
[The per-domain parameter values were embedded as equation images and are not available in this text.]

4. Parameter Tuning and Settings for Bundle Ranking

The dimension d of representations is set to 20. We apply the same grid search for the learning rate, the regularization coefficient, and the batch size as in bundle completion. In addition, the predictive layer D for AttList is searched in {20, 50, 100}, and the node and message dropout rates for GCN and BGCN are searched in {0, 0.1, 0.3, 0.5}. As the training complexity of GCN and BGCN is quite high, we set their batch size to 2048, as suggested by the original paper. The optimal parameter settings are presented in Table 2. Note that the parameter settings for BGCN are for the version without pre-training (i.e. ).

       Table 2: Parameter settings for bundle ranking (d=20).

Columns: Electronic, Clothing, Food.
Rows: ItemKNN, BPRMF, DAM, AttList, GCN, BGCN.
[The per-domain parameter values were embedded as equation images and are not available in this text.]

5. Statistics of Datasets

        Table 3: Statistics of datasets.

Statistic                             Electronic  Clothing  Food
#Users                                888         965       879
#Items                                3499        4487      3767
#Sessions                             1145        1181      1161
#Bundles                              1750        1910      1784
#Intents                              1422        1466      1156
Average Bundle Size                   3.52        3.31      3.58
#User-Item Interactions               6165        6326      6395
#User-Bundle Interactions             1753        1912      1785
Density of User-Item Interactions     0.20%       0.15%     0.19%
Density of User-Bundle Interactions   0.11%       0.10%     0.11%
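
The density rows in Table 3 can be reproduced from the counts, assuming the usual definition of density as the number of observed interactions divided by the size of the full interaction matrix. That definition is an assumption, since it is not spelled out here, but it matches the reported percentages. The names `STATS` and `density_percent` are illustrative.

```python
# Counts copied from Table 3.
STATS = {
    "Electronic": {"users": 888, "items": 3499, "bundles": 1750,
                   "user_item": 6165, "user_bundle": 1753},
    "Clothing": {"users": 965, "items": 4487, "bundles": 1910,
                 "user_item": 6326, "user_bundle": 1912},
    "Food": {"users": 879, "items": 3767, "bundles": 1784,
             "user_item": 6395, "user_bundle": 1785},
}


def density_percent(num_interactions, num_rows, num_cols):
    """Share of filled cells in a num_rows x num_cols matrix, in percent."""
    return 100.0 * num_interactions / (num_rows * num_cols)


for domain, s in STATS.items():
    ui = density_percent(s["user_item"], s["users"], s["items"])
    ub = density_percent(s["user_bundle"], s["users"], s["bundles"])
    print(f"{domain}: user-item {ui:.2f}%, user-bundle {ub:.2f}%")
```

For example, the Electronic user-item density is 6165 / (888 × 3499) ≈ 0.198%, which rounds to the 0.20% shown in the table.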

6. Descriptions of Data Files

The 'dataset' folder covers three domains: clothing, electronic and food. Each domain contains the following 9 data files.

Table 4: The descriptions of the data files.

File Name: Descriptions

user_item_pretrain.csv
    User-item interactions used to obtain the pre-trained item representations via BPRMF for model initialization.
    Tab-separated list with 3 columns: user ID | item ID | timestamp

user_item.csv
    User-item interactions.
    Tab-separated list with 3 columns: user ID | item ID | timestamp

session_item.csv
    Sessions and their associated items. Each session has at least 2 items.
    Tab-separated list with 2 columns: session ID | item ID

user_session.csv
    Users and their associated sessions.
    Tab-separated list with 3 columns: user ID | session ID | timestamp

session_bundle.csv
    Sessions and their detected bundles. Each session listed here has at least 1 bundle.
    Tab-separated list with 2 columns: session ID | bundle ID
    A session ID that appears in session_item.csv but not in session_bundle.csv indicates that no bundle was detected in that session.

bundle_intent.csv
    Bundles and their annotated intents.
    Tab-separated list with 2 columns: bundle ID | intent

bundle_item.csv
    Bundles and their associated items. Each bundle has at least 2 items.
    Tab-separated list with 2 columns: bundle ID | item ID

user_bundle.csv
    User-bundle interactions.
    Tab-separated list with 3 columns: user ID | bundle ID | timestamp

item_categories.csv
    Items and their affiliated categories.
    Tab-separated list with 2 columns: item ID | categories
    The categories column holds a list of strings.
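
As a minimal sketch of working with the file layouts in Table 4, the code below reads tab-separated rows, finds sessions in which no bundle was detected, and parses a categories cell. It makes two assumptions that are not confirmed here: the files have no header row, and the categories list is written as a Python-style literal such as `['A', 'B']`. The helper names and the in-memory toy data are illustrative; real use would open the files under the 'dataset' folder instead.

```python
import ast
import csv
import io


def read_tsv(handle):
    """Read a tab-separated file object into a list of rows (lists of str)."""
    return [row for row in csv.reader(handle, delimiter="\t") if row]


def sessions_without_bundles(session_item_rows, session_bundle_rows):
    """Session IDs present in session_item.csv but absent from session_bundle.csv."""
    with_items = {row[0] for row in session_item_rows}
    with_bundles = {row[0] for row in session_bundle_rows}
    return with_items - with_bundles


def parse_categories(cell):
    """Turn the categories cell into a list of strings (assumed literal format)."""
    return ast.literal_eval(cell)


# Toy stand-ins for session_item.csv and session_bundle.csv (hypothetical IDs).
session_item = read_tsv(io.StringIO("s1\ti1\ns1\ti2\ns2\ti3\ns2\ti4\n"))
session_bundle = read_tsv(io.StringIO("s1\tb1\n"))

no_bundle = sessions_without_bundles(session_item, session_bundle)
categories = parse_categories("['Clothing', 'Shoes']")
```

If the files turn out to include a header row, dropping the first element of the list returned by `read_tsv` would be enough to adapt the sketch.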

Acknowledgements

Our datasets are constructed on the basis of Amazon datasets (http://jmcauley.ucsd.edu/data/amazon/links.html).

Contributors

bundlerec, sunzhuntu, kaidf
