4paradigm / autox

493.0 493.0 128.0 73.73 MB

AutoX is an efficient AutoML tool aimed mainly at data mining tasks on tabular data.

Home Page: https://autox.readthedocs.io

License: Apache License 2.0

Python 10.31% Jupyter Notebook 89.51% Shell 0.17% Dockerfile 0.01% Makefile 0.01% Batchfile 0.01%

kaggle machine-learning python

autox's People

hyh012356789 zhhwss zhongrenxin komorebiwkx linshiyonghu20 janson91 sasasaer zhongyusunlrhao zhilonglu 591317622a apeopl neozhao98 auto-ml stevenjokess emg110 xiaozhouliu sunying1985 sigma-lm u201815044 hangzhang10 zxhjames qianrenjian iphyer onetoolscollection artificialzeng truelitoufans yonglinz yakaili zhaijunyu emailhy amoyyean jhopepe shenqf1028 flyingwing as85207 anonymouslycn gaozining liutongwei charygao yueyedeai huajinghua zhys513 xrosliang littleboybiggun fanghy06 xx-craziness-xx alaskaw frankbaul taogeanton2 ezineo zmin1217 kelelexu sherpahu alicia-ux laurasanchz2 hjsybyq a86612 xiaodanjiao nullius-2020 jordan2013 173yyh i02-c defanive dandelight pangay arya87 jizhongpeng dynamtics oddecust utopianet yuanjianrui jenniferhe wolfworld6 nifecat alexanderhucheerful xhfei1224 scoutys zhiqiang00 liyaooi deepwindlee livingbody baolanchen mingyang1996 tenya yiming1012 zyijie scchy fxzero cuizhengliang username-yao ioslide weihaoaho yang-charles yqkenanwang huguanglong zhongkailv intjun caixc97 candlumine zoey333

autox's Issues

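AutoX_NLP/nlp_feature.py, fastText processing efficiency optimization

Feature extraction with fastText is currently slow: on the same amount of data it takes about as long as BERT-tiny. This could be optimized specifically.
Code link: https://github.com/4paradigm/AutoX/blob/master/autox/autox_nlp/feature_engineer/nlp_feature.py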

autox_video: add results on some public datasets and present them as a table in the README

AutoX_Recommend, image content recall

For the code structure, refer to:
https://github.com/4paradigm/AutoX/blob/master/autox/autox_recommend/recall_and_rank/recalls/history_recall.py

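Memory optimization issue

Running the Zhidemai (《值得买》) dataset in a Kaggle environment exhausts the 16 GB of memory. A preliminary analysis suggests the cause is the large number of derived features generated by brute-force loops during feature engineering. Consider borrowing the idea of the "memory reduce" code commonly used on Kaggle to optimize memory usage.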
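The "memory reduce" idea mentioned above usually means downcasting each numeric column to the smallest dtype that can hold its values. The following is a minimal sketch of that idea for integer columns only, not AutoX code. The names `smallest_int_dtype` and `reduce_int_memory` are hypothetical, and `df` is assumed to be a pandas DataFrame.

```python
# Signed integer dtypes from narrowest to widest, with their bit widths.
_INT_DTYPES = [("int8", 8), ("int16", 16), ("int32", 32), ("int64", 64)]


def smallest_int_dtype(lo, hi):
    """Return the name of the narrowest signed integer dtype holding [lo, hi]."""
    for name, bits in _INT_DTYPES:
        if -(2 ** (bits - 1)) <= lo and hi <= 2 ** (bits - 1) - 1:
            return name
    raise ValueError("value range does not fit in a 64-bit signed integer")


def reduce_int_memory(df):
    """Downcast plain integer columns of a DataFrame (assumed pandas) in place."""
    for col in df.columns:
        series = df[col]
        # Only touch non-nullable integer columns that have at least one row.
        if str(series.dtype) in ("int16", "int32", "int64") and len(series) > 0:
            target = smallest_int_dtype(int(series.min()), int(series.max()))
            df[col] = series.astype(target)
    return df
```

Float and categorical columns can be handled in the same spirit, but float downcasting loses precision, so whether it is acceptable depends on the features involved.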

AutoX_Recommend, Netflix dataset processing

For the data processing method, refer to:
https://github.com/4paradigm/AutoX/blob/master/autox/autox_recommend/data_process/MovieLens_data_process.ipynb

lightgbm.train bug(lightgbm==3.3.2.99)

On Mac with lightgbm==3.3.2.99, lightgbm.train no longer accepts the verbose_eval and early_stopping_rounds arguments and uses the callbacks interface instead, so calling the lgb model raises an error.

File ~/miniforge3/envs/lx/lib/python3.9/site-packages/autox/autox_competition/models/regressor_ts.py:231, in LgbRegressionTs.fit(self, train, test, used_features, target, time_col, ts_unit, Early_Stopping_Rounds, N_round, Verbose, log1p, custom_metric, weight_for_mae)
    226     model = lgb.train(self.params_, trn_data, num_boost_round=self.N_round, valid_sets=[trn_data, val_data],
    227                       verbose_eval=self.Verbose,
    228                       early_stopping_rounds=self.Early_Stopping_Rounds,
    229                       feval=weighted_mae_lgb(weight=weight_for_mae))
    230 else:
--> 231     model = lgb.train(self.params_, trn_data, num_boost_round=self.N_round, valid_sets=[trn_data, val_data],
...
    233                     early_stopping_rounds=self.Early_Stopping_Rounds)
    234 val = model.predict(train.iloc[valid_idx][used_features])
    235 if log1p:

TypeError: train() got an unexpected keyword argument 'verbose_eval'
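One way to work around the error above is to choose the arguments based on what the installed `lgb.train` accepts. The sketch below is an assumption-laden illustration, not the fix adopted by AutoX: `build_train_kwargs` is a hypothetical helper, `lgb` is assumed to be the imported lightgbm module, and `verbose` is assumed to be an integer logging period. `lgb.early_stopping` and `lgb.log_evaluation` are the callback factories that LightGBM provides for this purpose.

```python
import inspect


def build_train_kwargs(lgb, early_stopping_rounds, verbose):
    """Return keyword arguments for lgb.train that match the installed API."""
    accepted = inspect.signature(lgb.train).parameters
    if "early_stopping_rounds" in accepted and "verbose_eval" in accepted:
        # Older API: both values are passed directly as keyword arguments.
        return {
            "early_stopping_rounds": early_stopping_rounds,
            "verbose_eval": verbose,
        }
    # Newer API: the same behaviour is requested through callbacks.
    return {
        "callbacks": [
            lgb.early_stopping(early_stopping_rounds),
            lgb.log_evaluation(verbose),
        ]
    }
```

The result would be unpacked into the existing call, for example lgb.train(params, trn_data, num_boost_round=N, valid_sets=[trn_data, val_data], **build_train_kwargs(lgb, 100, 50)).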

Sample selection

Does AutoX have any plans to support sample selection?

Many datasets are now so large that individuals and small companies cannot afford the computing power to train on them.

Could a subset of the data be selected for training so that the result approximates training on the full data?
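As an illustration of what selecting part of the data could look like, here is a minimal sketch of stratified random sampling, which keeps the same fraction of rows from every label group. It is not an AutoX feature, and it makes no claim about how closely the result approximates full-data training. The function name `stratified_sample` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_of, fraction, seed=0):
    """Keep roughly `fraction` of the rows from every label group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    sample = []
    for members in groups.values():
        # Keep at least one row per group so rare labels are not lost.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample
```

For example, with rows given as (id, label) tuples, stratified_sample(rows, lambda r: r[1], 0.2) keeps about 20% of each label.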

ModuleNotFoundError: No module named 'autox.autox_server'

git clone https://github.com/4paradigm/AutoX.git
pip install pytorch_tabnet
pip install ./AutoX
python
from autox import AutoX

ModuleNotFoundError: No module named 'autox.autox_server'

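AutoX_NLP/nlp_feature.py, OOV handling optimization

The current Word2Vec and GloVe models cannot handle words in the test data that were not seen before, so the vocabulary has to be rebuilt for the test data, which has a large impact on overall performance.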
代码链接：https://github.com/4paradigm/AutoX/blob/master/autox/autox_nlp/feature_engineer/nlp_feature.py
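One common fallback for out-of-vocabulary words is to average only the vectors of known tokens and return a zero vector when none are known, so that the vocabulary does not need to be rebuilt. The sketch below shows that idea under stated assumptions; it is not how nlp_feature.py works. `sentence_vector` is a hypothetical name, and `vectors` is a plain dict standing in for a trained embedding's lookup table.

```python
def sentence_vector(tokens, vectors, dim):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        # Every token is out of vocabulary: fall back to a zero vector.
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]
```

Subword-based embeddings are another option, since they can compose a vector for an unseen word from its character n-grams.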

Isn't there still no Chinese version of the DeBERTa used in the sohu baseline? The results seem quite poor.

https://github.com/4paradigm/AutoX/blob/master/competition_baseline/biendata_sohu_2022/task1_baseline.ipynb

AutoX_Recommend, dataset processing: Amazon electronic product recommendation

Raw data: https://www.kaggle.com/datasets/prokaggler/amazon-electronic-product-recommendation
For the data processing method, refer to:
https://github.com/4paradigm/AutoX/blob/master/autox/autox_recommend/data_process/MovieLens_data_process.ipynb
and
https://github.com/4paradigm/AutoX/blob/master/autox/autox_recommend/data_process/Netflix-data-process.ipynb

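AutoX_NLP/nlp_feature.py, GloVe environment compatibility

The GloVe model currently relies on the glove-python-binary package, which is difficult to install on Windows and macOS. GloVe could be implemented in another way.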
代码链接：https://github.com/4paradigm/AutoX/blob/master/autox/autox_nlp/feature_engineer/nlp_feature.py

AutoX_Recommend, NLP content recall

For the code structure, refer to:
https://github.com/4paradigm/AutoX/blob/master/autox/autox_recommend/recall_and_rank/recalls/history_recall.py

AutoX_Recommend, dataset processing: Restaurant Recommendations

Raw data: https://www.kaggle.com/datasets/teesoong/ml-challenge?select=checkins.csv
For the data processing method, refer to:
https://github.com/4paradigm/AutoX/blob/master/autox/autox_recommend/data_process/MovieLens_data_process.ipynb
and
https://github.com/4paradigm/AutoX/blob/master/autox/autox_recommend/data_process/Netflix-data-process.ipynb

AutoX_Recommend, ranking model: LightGBM binary

For the code structure, refer to: https://github.com/4paradigm/AutoX/blob/master/autox/autox_recommend/recall_and_rank/ranker/ranker.py

AutoX_Recommend, dataset processing: Amazon product data

Raw data: http://jmcauley.ucsd.edu/data/amazon/
For the data processing method, refer to:
https://github.com/4paradigm/AutoX/blob/master/autox/autox_recommend/data_process/MovieLens_data_process.ipynb
and
https://github.com/4paradigm/AutoX/blob/master/autox/autox_recommend/data_process/Netflix-data-process.ipynb

AutoX_Recommend, Random graph walk

Implement the grouped popular-item recall method.

For the code structure, refer to:
https://github.com/4paradigm/AutoX/blob/master/autox/autox_recommend/recall_and_rank/recalls/history_recall.py
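A grouped popular-item recall can be read as: for each user group, recall the items that users in that group interacted with most often. The sketch below illustrates that reading with the standard library only. It does not follow the structure of history_recall.py, and the name `popular_items_by_group`, the (user, item) tuple format, and the `user_group` mapping are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def popular_items_by_group(interactions, user_group, top_k=3):
    """Return the top_k most frequent items for each user group."""
    counts = defaultdict(Counter)
    for user, item in interactions:
        # Count every interaction under the group the user belongs to.
        counts[user_group[user]][item] += 1
    return {
        group: [item for item, _ in counter.most_common(top_k)]
        for group, counter in counts.items()
    }
```

At recall time, a user would then be served the list stored for their own group.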

AutoX_Recommend, dataset processing: KDD Cup 2020

Raw data: https://tianchi.aliyun.com/competition/entrance/231785/introduction
For the data processing method, refer to:
https://github.com/4paradigm/AutoX/blob/master/autox/autox_recommend/data_process/MovieLens_data_process.ipynb
and
https://github.com/4paradigm/AutoX/blob/master/autox/autox_recommend/data_process/Netflix-data-process.ipynb

Is there a way to convert a pkl model to PMML format?

Models generated by AutoXServer are currently in pkl format. Is there a way to convert them to PMML?

AutoX_Recommend, item features

For the code structure, refer to:
https://github.com/4paradigm/AutoX/blob/master/autox/autox_recommend/recall_and_rank/feature_engineer/user_feature_engineer.py

autox_video: more detailed documentation of the config file parameters

Stop Cheating us with Fake Stars!

Welcome update to OpenMMLab 2.0

I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the branches of the OpenMMLab 2.0 repos:

Repo                 OpenMMLab 1.0 branch    OpenMMLab 2.0 branch
MMEngine             -                       0.x
MMCV                 1.x                     2.x
MMDetection          0.x, 1.x, 2.x           3.x
MMAction2            0.x                     1.x
MMClassification     0.x                     1.x
MMSegmentation       0.x                     1.x
MMDetection3D        0.x                     1.x
MMEditing            0.x                     1.x
MMPose               0.x                     1.x
MMDeploy             0.x                     1.x
MMTracking           0.x                     1.x
MMOCR                0.x                     1.x
MMRazor              0.x                     1.x
MMSelfSup            0.x                     1.x
MMRotate             1.x                     1.x
MMYOLO               -                       0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

autox_video: improve the README to explain specifically what running AutoTrain and AutoTest produces

Installation

Does installing autox require the deep learning framework Keras to be installed beforehand? Are PyTorch or other frameworks supported?
