
metaspore's Introduction

MetaSpore: One-stop machine learning development platform

MetaSpore is a one-stop, end-to-end machine learning development platform. It provides a full-cycle framework and development interface covering data preprocessing, model training, offline experiments, and online prediction, through to online experiment traffic bucketing and A/B testing.

MetaSpore Architecture

MetaSpore is developed and open-sourced by the DMetaSoul team. You can also join our Slack user discussion space.

Core Features

MetaSpore has the following features:

  1. One-stop end-to-end development, from offline model training to online prediction and bucketing experiments, with a unified development experience across the entire process;
  2. A deep learning training framework that is compatible with the PyTorch ecosystem and supports distributed, large-scale sparse feature learning;
  3. The training framework integrates with PySpark to read training data seamlessly from data lakes and data warehouses;
  4. A high-performance online prediction service that supports fast inference for neural network, decision tree, Spark ML, SKLearn, and other models, as well as heterogeneous hardware inference acceleration;
  5. A unified offline feature extraction framework that automatically generates the online feature reading logic, so feature extraction is consistent across offline and online;
  6. An online algorithm application framework that provides model prediction, experiment bucketing and traffic splitting, dynamic hot loading of parameters, and rich debugging functions;
  7. Rich industry algorithm examples and end-to-end solutions.

Documentation and examples

Installation package download

Training package

We provide a precompiled offline training wheel package on PyPI. Install it via pip:

pip install metaspore

The minimum Python version required is 3.8.

After installation, also install PyTorch and PySpark. They are not declared as dependencies of the metaspore wheel, so you can choose the PySpark and PyTorch versions you need:

pip install pyspark
pip install torch==1.11.0+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
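
After running the commands above, a quick sanity check can confirm that the interpreter and the three packages are visible. The following is a minimal sketch, not part of MetaSpore; it assumes the import names match the pip package names (`metaspore`, `pyspark`, `torch`), and the helper name `check_environment` is hypothetical.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum Python version stated in this README
# Assumption: import names are the same as the pip package names.
PACKAGES = ("metaspore", "pyspark", "torch")


def check_environment(packages=PACKAGES, min_python=MIN_PYTHON):
    """Return a dict reporting whether Python and each package look usable."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in packages:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'missing'}")
```

The check only locates the modules without importing them, so it stays fast even when PyTorch is installed.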

Serving package

We provide prebuilt Docker images for the MetaSpore Serving Service:

CPU only image

docker pull dmetasoul/metaspore-serving-release:cpu-v1.0.1

GPU image

docker pull dmetasoul/metaspore-serving-release:gpu-v1.0.1

See Run Serving Service in Docker for details.

Compile the code

Community guidelines

Feedback

For questions about usage, you can post in GitHub Discussions or open a GitHub Issue.

Mail

Email us at [email protected].

Slack

Join our user discussion slack channel: MetaSpore User Discussion

Open source projects

MetaSpore is a completely open source project released under the Apache License 2.0. Participation, feedback, and code contributions are welcome.

metaspore's People

Contributors

andafterall, cheng-su, codingfun2022, dependabot[bot], dmetasoul-opensource, dmetasoul01, gufenqing, hades-888, intelligencegear, is-shidian, javyxu, liusy12138, longborn, qinyy907, raphael-jin, trellixvulnteam, wikty, xuchen-plus

metaspore's Issues

[demo] DCN V2

Add DCN V2 to CTR Demo and give benchmark on MovieLens and Criteo datasets.

DCN V2
criteo_d5:
Train AUC: 0.7487 Test AUC: 0.7290
1m uid mid:
Train AUC: 0.8901 Test AUC: 0.8611
25m uid mid:
Train AUC: 0.8888 Test AUC: 0.8323

[demo] Wide&Deep

Add Wide&Deep to CTR Demo and give benchmark on MovieLens and Criteo datasets.

25m uid mid:
Train AUC: 0.8898 Test AUC: 0.8343
1m uid mid:
Train AUC: 0.8937 Test AUC: 0.8682
criteo_d5:
Train AUC: 0.7394 Test AUC: 0.7294

[demo] DCN

Add DCN to CTR Demo and give benchmark on MovieLens and Criteo datasets.

25m uid mid:
Train AUC: 0.8972 Test AUC: 0.8430
1m uid mid:
Train AUC: 0.9021 Test AUC: 0.8746
criteo_d5:
Train AUC: 0.7413 Test AUC: 0.7304

[serving] IPC framework between cpp and python

Goal

Provide a framework for calling Python methods defined in user custom scripts. Python code is executed in a separate process rather than embedded in the C++ process. One CPython interpreter process is run per compute thread to avoid GIL contention.

Design

  1. Control plane between C++ and Python via gRPC over a Unix domain socket;
  2. Data plane via gRPC for small data and shared memory for large data;
  3. Model packaged with a customized Python venv and user scripts;
  4. For each C++ compute thread, run a CPython interpreter process with the user entry script;
  5. Provide an async-iterator-style interface for Python.

[demo] DeepFM

Add DeepFM to CTR Demo and give benchmark on MovieLens and Criteo datasets.

25m uid mid:
Train AUC: 0.8908 Test AUC: 0.8359
1m uid mid:
Train AUC: 0.8891 Test AUC: 0.8658
criteo_d5:
Train AUC: 0.7531 Test AUC: 0.7271

[demo] PNN

Add PNN to CTR Demo and give benchmark on MovieLens and Criteo datasets.

iPNN
criteo_d5:
Train AUC: 0.7544 Test AUC: 0.7292
1m uid mid:
Train AUC: 0.8914 Test AUC: 0.8649
25m uid mid:
Train AUC: 0.8916 Test AUC: 0.8362

oPNN
criteo_d5:
Train AUC: 0.7533 Test AUC: 0.7287
1m uid mid:
Train AUC: 0.8896 Test AUC: 0.8633
25m uid mid:
Train AUC: 0.8905 Test AUC: 0.8353

[movielens demo] add python.zip when we submit `fg_movielens.py` PySpark job

For the MovieLens demo, we should create python.zip before submitting the fg_movielens.py PySpark job.

import subprocess
from pyspark.sql import SparkSession

def init_spark():
    # Bundle the helper modules into python.zip so Spark can ship them to the executors.
    # check=True raises an error if the zip command fails.
    subprocess.run(['zip', '-r', './python.zip', 'fg_neg_sampler.py', 'fg_sparse_features_extractor.py', 'fg_gbm_features_extractor.py'], cwd='./', check=True)
    spark = (SparkSession.builder
        .appName('MovieLens Demo')
        .config("spark.executor.memory", "10G")
        .config("spark.submit.pyFiles", "python.zip")
        .config("spark.executor.instances", "4")
        .config("spark.network.timeout", "500")
        .getOrCreate())
    ...

Moreover, please rename the function generate_spare_features to generate_sparse_features in fg_sparse_features_extractor.py.
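
If the `zip` command-line tool is not available on the submitting machine, the archive could also be built with Python's standard `zipfile` module. This is a possible alternative sketch, not what the demo does; the helper name `build_pyfiles_archive` is hypothetical, and the module names are the ones listed in the snippet above.

```python
import zipfile
from pathlib import Path

# Helper modules named in the issue; adjust if the demo layout differs.
MODULES = [
    "fg_neg_sampler.py",
    "fg_sparse_features_extractor.py",
    "fg_gbm_features_extractor.py",
]


def build_pyfiles_archive(modules=MODULES, archive="python.zip", base_dir="."):
    """Write the given modules into a zip archive and return its path."""
    base = Path(base_dir)
    missing = [m for m in modules if not (base / m).is_file()]
    if missing:
        # Fail early instead of submitting a job with an incomplete archive.
        raise FileNotFoundError(f"missing modules: {missing}")
    target = base / archive
    # Mode "w" replaces any existing archive rather than updating it.
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        for module in modules:
            zf.write(base / module, arcname=module)
    return target
```

Calling `build_pyfiles_archive()` at the start of `init_spark()` would take the place of the `subprocess.run` call.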

[demo] MaximalMarginalRelevanceDiversifier demo

  1. Add a diversification algorithm. We implemented a diversification model named "Maximize Marginal Relevance Disperser", based on the dispersing method described in the paper "The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries". Compared with SimpleDiversifier, MaximalMarginalRelevanceDiversifier can take information in multiple dimensions into account.
  2. Integrate MaximalMarginalRelevanceDiversifier into the pipeline after completing its unit tests. In addition, we have updated the configuration of the diversify method in the Consul file.
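
For readers unfamiliar with MMR, the greedy reranking idea from the cited paper can be sketched as follows. This is a generic illustration and not the code of MaximalMarginalRelevanceDiversifier; the function name `mmr_rerank`, the trade-off parameter `lam`, and the toy genre-based similarity are assumptions made for the example.

```python
def mmr_rerank(items, relevance, similarity, lam=0.5, k=None):
    """Greedy Maximal Marginal Relevance reranking.

    At each step, pick the item maximizing
        lam * relevance(item) - (1 - lam) * max similarity to already chosen items.
    """
    remaining = list(items)
    selected = []
    limit = len(remaining) if k is None else min(k, len(remaining))
    while len(selected) < limit:
        def score(item):
            # Redundancy is 0 while nothing has been selected yet.
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance(item) - (1.0 - lam) * redundancy
        best = max(remaining, key=score)
        remaining.remove(best)
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Toy candidates: (name, genre, relevance score)
    candidates = [("A", "action", 0.9), ("B", "action", 0.8), ("C", "comedy", 0.7)]
    same_genre = lambda x, y: 1.0 if x[1] == y[1] else 0.0
    ranked = mmr_rerank(candidates, relevance=lambda x: x[2], similarity=same_genre)
    print([name for name, _, _ in ranked])  # ['A', 'C', 'B']
```

Because the similarity function is supplied by the caller, it may combine several item attributes, which is one way to account for multiple dimensions.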

[demo] QA Multimodal Retrieval Demo

QA is a text-to-text semantic retrieval demo based on the 1M Baike-Question-Answer database.

The demo includes the following parts:

  1. Online system: an end-to-end online retrieval service.
  2. Offline system: model training and export, data fetching, and indexing.
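
The split between offline indexing and online retrieval can be illustrated with a tiny brute-force index over embedding vectors. This is a sketch only: the class name `BruteForceIndex` is hypothetical, the vectors are toy values rather than model outputs, and the issue does not say which index the demo actually uses.

```python
import heapq
import math


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


class BruteForceIndex:
    """Offline: add(id, vector). Online: search(query_vector, k)."""

    def __init__(self):
        self._ids = []
        self._vectors = []

    def add(self, item_id, vector):
        self._ids.append(item_id)
        self._vectors.append(normalize(vector))

    def search(self, query, k=3):
        q = normalize(query)
        scored = (
            (sum(a * b for a, b in zip(q, v)), item_id)
            for item_id, v in zip(self._ids, self._vectors)
        )
        # nlargest returns the k highest-scoring (score, id) pairs, best first.
        return [(item_id, score) for score, item_id in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    index = BruteForceIndex()
    index.add("q1", [1.0, 0.0])
    index.add("q2", [0.0, 1.0])
    index.add("q3", [1.0, 1.0])
    print([item_id for item_id, _ in index.search([1.0, 0.1], k=2)])  # ['q1', 'q3']
```

A linear scan like this is only practical for small collections; at the scale of a million entries an approximate nearest neighbour index would normally be used instead.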

[demo] AutoInt

Add AutoInt to CTR Demo and give benchmark on MovieLens and Criteo datasets.

AutoInt
criteo_d5:
Train AUC: 0.7558 Test AUC: 0.7361
1m uid mid:
Train AUC: 0.9028 Test AUC: 0.8741
25m uid mid:
Train AUC: 0.8968 Test AUC: 0.8421

[demo] xDeepFM

Add xDeepFM to CTR Demo and give benchmark on MovieLens and Criteo datasets.

xDeepFM
criteo_d5:
Train AUC: 0.7541 Test AUC: 0.7300
1m uid mid:
Train AUC: 0.8892 Test AUC: 0.8641
25m uid mid:
Train AUC: 0.8911 Test AUC: 0.8367

[training] Support kubeflow pipeline build

  1. Refactor the code organization into separate algo, runner, component, and pipeline definitions.
  2. Automatically export Kubeflow components for the built-in algo runners, and provide a Python decorator for custom components.
  3. Load components by name to construct a Kubeflow pipeline and upload it automatically.
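
The decorator-plus-lookup pattern described in the list above can be sketched without any Kubeflow dependency. Everything here is hypothetical: the names `component` and `build_pipeline`, the step names, and the dict-based context are illustrative and are neither MetaSpore's nor the Kubeflow Pipelines SDK's API.

```python
from typing import Callable, Dict, List

# Registry mapping component names to plain Python callables.
_COMPONENTS: Dict[str, Callable[[dict], dict]] = {}


def component(name: str):
    """Decorator that registers a function as a named pipeline component."""
    def register(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        if name in _COMPONENTS:
            raise ValueError(f"component already registered: {name}")
        _COMPONENTS[name] = func
        return func
    return register


def build_pipeline(step_names: List[str]) -> Callable[[dict], dict]:
    """Look components up by name and chain them into a single callable."""
    steps = [_COMPONENTS[n] for n in step_names]  # KeyError for unknown names

    def run(context: dict) -> dict:
        for step in steps:
            context = step(context)
        return context
    return run


@component("load_data")
def load_data(context: dict) -> dict:
    return {**context, "rows": [1, 2, 3]}


@component("train")
def train(context: dict) -> dict:
    return {**context, "model": sum(context["rows"])}


if __name__ == "__main__":
    pipeline = build_pipeline(["load_data", "train"])
    print(pipeline({})["model"])  # 6
```

In a real setup the registered functions would be exported as Kubeflow components and the chained callable replaced by a pipeline definition that is uploaded to the cluster.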

[demo] Text-to-Image Multimodal Retrieval Demo

A text-to-image semantic retrieval demo based on the Unsplash Lite 2.5K image dataset. It lets users search for images using natural-language text.

The demo includes the following parts:

  • online retrieval pipeline service
  • offline model export, data fetch and index

[demo] Unify data processing for demo projects

Unify the data processing for the MovieLens-1M, MovieLens-25M, Criteo-5d, and other datasets, including feature generation, match dataset generation, ranking dataset generation, negative sampling, etc.
