This project requires Python 2.7 and the following Python libraries installed:
You will also need software that can run a Jupyter Notebook.
I haven't tested the code with Python 3.x, so I don't know whether it works there :)
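Because only Python 2.7 is supported, a small guard like the one below can warn early when a script is started under another interpreter. This is a hypothetical helper (`is_supported_python` is not part of the project); it only compares the major and minor version numbers.

```python
import sys

# Assumption: the project targets Python 2.7 only; 3.x is untested.
SUPPORTED = (2, 7)


def is_supported_python(version_info=None):
    """Return True if the given (or current) interpreter version is 2.7.x."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == SUPPORTED


if __name__ == "__main__":
    if not is_supported_python():
        print("Warning: this project was written for Python 2.7; "
              "Python %d.%d is untested." % tuple(sys.version_info[:2]))
```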
Download the original data from the official website, unzip it, and copy the .csv files into the original_data
folder. Then run:
python init_data.py
This takes about a minute. It creates a data
folder containing the cleaned .csv files.
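Before running `init_data.py`, it can help to confirm that the downloaded files are actually in place. The sketch below is a hypothetical pre-flight check, not code from `init_data.py`; the function name `find_original_csvs` is an assumption, and it only lists `*.csv` files in `original_data`.

```python
import glob
import os


# Hypothetical helper, not part of init_data.py: lists the .csv files
# that the cleaning step is expected to read from original_data/.
def find_original_csvs(folder="original_data"):
    """Return the sorted .csv paths in `folder`, or [] if the folder is missing."""
    if not os.path.isdir(folder):
        return []
    return sorted(glob.glob(os.path.join(folder, "*.csv")))


if __name__ == "__main__":
    files = find_original_csvs()
    if files:
        print("Found %d .csv file(s); now run: python init_data.py" % len(files))
    else:
        print("No .csv files in original_data/; download and unzip the data first.")
```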
In a terminal or command window, navigate to the top-level project directory JData
(the directory that contains this README) and run one of the following commands:
ipython notebook prepare.ipynb
or
jupyter notebook prepare.ipynb
This will open the Jupyter Notebook software and project file in your browser.
The original dataset has the following attributes:
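1. User data
Field | Description | Notes |
--- | --- | --- |
user_id | User ID | Anonymized |
age | Age group | -1 means unknown |
sex | Gender | 0 = male, 1 = female, 2 = undisclosed |
user_lv_cd | User level | Ordered enumeration; higher levels have larger numbers |
user_reg_tm | User registration date | Day granularity |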
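The codes in the user table can be decoded into readable values as sketched below. This assumes each record is a dict of strings (for example from `csv.DictReader`), that `sex` and `user_lv_cd` are plain integer strings, and that `user_reg_tm` uses the `YYYY-MM-DD` format; none of these details are confirmed above, so check them against the real file. `decode_user` is a hypothetical name.

```python
import datetime

# Codes from the user data table.
SEX_LABELS = {0: "male", 1: "female", 2: "undisclosed"}


def decode_user(row):
    """Turn one user record (a dict of strings) into readable values.

    Assumptions: sex and user_lv_cd are integer strings, and
    user_reg_tm is formatted as YYYY-MM-DD.
    """
    age = row["age"]
    return {
        "user_id": row["user_id"],
        # -1 means the age group is unknown; represent that as None.
        "age": None if age == "-1" else age,
        "sex": SEX_LABELS.get(int(row["sex"]), "unknown"),
        # user_lv_cd is ordered: a larger number means a higher level.
        "user_lv_cd": int(row["user_lv_cd"]),
        # The registration date has day granularity, so keep only a date.
        "user_reg_tm": datetime.datetime.strptime(
            row["user_reg_tm"], "%Y-%m-%d").date(),
    }
```

Keeping `user_lv_cd` as an integer preserves its ordering, so it can be compared or used directly as an ordinal feature.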
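2. Product data
Field | Description | Notes |
--- | --- | --- |
sku_id | Product ID | Anonymized |
a1 | Attribute 1 | Enumeration; -1 means unknown |
a2 | Attribute 2 | Enumeration; -1 means unknown |
a3 | Attribute 3 | Enumeration; -1 means unknown |
cate | Category ID | Anonymized |
brand | Brand ID | Anonymized |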
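Since `-1` is a sentinel for "unknown" in `a1`, `a2` and `a3`, it is safer to turn it into a real missing value before computing statistics on those attributes. A minimal sketch, assuming each product record is a dict of strings; `decode_product` and `count_unknown_attributes` are hypothetical names.

```python
# Product attributes a1, a2 and a3 are enumerations where -1 means unknown.
ATTRIBUTE_COLUMNS = ("a1", "a2", "a3")


def decode_product(row):
    """Return a copy of `row` with a1-a3 as ints and -1 replaced by None."""
    product = dict(row)
    for col in ATTRIBUTE_COLUMNS:
        value = int(row[col])
        product[col] = None if value == -1 else value
    return product


def count_unknown_attributes(rows):
    """Count how many products have an unknown (-1) value per attribute."""
    counts = dict((col, 0) for col in ATTRIBUTE_COLUMNS)
    for row in rows:
        for col in ATTRIBUTE_COLUMNS:
            if int(row[col]) == -1:
                counts[col] += 1
    return counts
```

Other columns (`sku_id`, `cate`, `brand`) are copied unchanged, since they are anonymized IDs rather than measured values.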
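3. Comment data
Field | Description | Notes |
--- | --- | --- |
dt | Cutoff date (statistics up to this date) | Day granularity |
sku_id | Product ID | Anonymized |
comment_num | Cumulative comment count, bucketed | 0 = no comments, 1 = 1 comment, 2 = 2-10 comments, 3 = 11-50 comments, 4 = more than 50 comments |
has_bad_comment | Whether there are negative comments | 0 = no, 1 = yes |
bad_comment_rate | Negative comment rate | Share of negative comments among all comments |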
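`comment_num` only stores a bucket, not the raw count. The table below spells out the bucket boundaries so the buckets can be interpreted (for example, when treating `comment_num` as an ordinal feature). `comment_num_bucket` is a hypothetical helper that maps a raw count onto the same buckets, which makes the boundaries easy to check.

```python
# Inclusive (low, high) ranges for each comment_num bucket.
# high is None for the last bucket, which has no upper bound.
COMMENT_NUM_RANGES = {
    0: (0, 0),      # no comments
    1: (1, 1),      # exactly 1 comment
    2: (2, 10),     # 2-10 comments
    3: (11, 50),    # 11-50 comments
    4: (51, None),  # more than 50 comments
}


def comment_num_bucket(count):
    """Map a raw cumulative comment count to its comment_num bucket (0-4)."""
    if count < 0:
        raise ValueError("comment count cannot be negative")
    for bucket in sorted(COMMENT_NUM_RANGES):
        low, high = COMMENT_NUM_RANGES[bucket]
        if count >= low and (high is None or count <= high):
            return bucket
```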
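4. Behavior data
Field | Description | Notes |
--- | --- | --- |
user_id | User ID | Anonymized |
sku_id | Product ID | Anonymized |
time | Action time | |
model_id | Clicked module ID (only for click actions) | Anonymized |
type | Action type: 1 = browse (view the product detail page); 2 = add to cart; 3 = remove from cart; 4 = place order; 5 = follow; 6 = click | |
cate | Category ID | Anonymized |
brand | Brand ID | Anonymized |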
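The `type` codes are the core of the behavior data, and a common first step is counting how often each user performed each action on each product. A minimal sketch, assuming rows are dicts of strings such as `csv.DictReader` produces; `ACTION_TYPES` names and `count_actions` are hypothetical, chosen only for readability.

```python
from collections import Counter

# Action codes from the behavior table's `type` column.
ACTION_TYPES = {
    1: "browse",            # viewed the product detail page
    2: "add_to_cart",
    3: "remove_from_cart",
    4: "order",
    5: "follow",
    6: "click",             # model_id is only meaningful for clicks
}


def count_actions(rows):
    """Count actions per (user_id, sku_id, action name).

    Assumes each row is a dict of strings, e.g. from csv.DictReader.
    Unrecognized type codes are counted under "unknown".
    """
    counts = Counter()
    for row in rows:
        action = ACTION_TYPES.get(int(row["type"]), "unknown")
        counts[(row["user_id"], row["sku_id"], action)] += 1
    return counts
```

A `Counter` returns 0 for missing keys, so a check such as `counts[(user, sku, "order")] > 0` works even for pairs that never placed an order.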