
e2eaiok's Introduction

INTRODUCTION

Problem Statement

The modern end-to-end AI pipeline life cycle is quite complicated, spanning data processing, feature engineering, model development, and model deployment & maintenance. The iterative nature of feature engineering, model testing, and hyper-parameter optimization makes the process even more time-consuming. This complexity creates an entry barrier for novice and citizen data scientists who may not have the required expertise or skills. Meanwhile, people tend to develop ever-larger models to get better performance, and these models are quite often over-parameterized. Such over-parameterized models not only pose significant challenges for AI hardware infrastructure, as they require expensive compute power for training, but also make deployment difficult in resource-constrained environments, which is a common need.

Solution with Intel® End-to-End AI Optimization Kit

Intel® End-to-End AI Optimization Kit is a composable toolkit for E2E AI optimization that delivers high-performance, lightweight networks/models efficiently on commodity hardware such as CPUs, with the goal of making E2E AI pipelines faster, easier, and more accessible.

Making AI Faster: It reduces E2E time on CPU to an acceptable range through full-pipeline optimization and improved scale-up/scale-out capability on Intel platforms with Intel-optimized frameworks and toolkits, and delivers popular, lighter DL models with close-enough accuracy and significantly higher inference throughput.

Making AI Easier: It provides simplified toolkits for data processing, distributed training, and compact neural network construction, automates the E2E AI pipeline with click-to-run workflows, and can be easily plugged into third-party ML solutions/platforms as an independent, composable component.

Making AI More Accessible: Through built-in optimized, parameterized models generated by the Smart Democratization Advisor and a domain-specific, neural architecture search (NAS) based network constructor, it brings complex DL to commodity hardware, so everyone can easily access AI on existing CPU clusters without needing to be an expert in data engineering and data science.

This solution is intended for

This solution is intended for citizen data scientists, enterprise users, independent software vendors, and some cloud service providers.

ARCHITECTURE

Intel® End-to-End AI Optimization Kit

Intel® End-to-End AI Optimization Kit is a composable toolkit for E2E AI optimization that delivers high-performance, lightweight networks/models efficiently on commodity hardware. It is a pipeline framework that streamlines AI optimization technologies at each stage of the E2E AI pipeline, including data processing, feature engineering, training, hyper-parameter tuning, and inference.

The key components are

  • RecDP: A one-stop toolkit for AI data processing. It provides LLM data-processing and machine learning feature-engineering libraries in a scalable fashion on top of Ray and Spark. It offers simple-to-use APIs for data scientists, delivers optimized performance, and can be easily integrated into third-party solutions.

    • Auto Feature Engineering: Provides an automatic way to generate new features for any tabular dataset containing numerical, categorical, and text features. It takes only three lines of code to automatically enrich features based on data analysis, statistics, clustering, and multi-feature interaction (see the sketch after this list).
    • LLM Data Preparation: Provides a parallelized, easy-to-use data pipeline for LLM data processing. It supports multiple data sources such as jsonlines, PDFs, images, and audio/video. Users can perform data extraction, deduplication (near-dedup, ROUGE, exact), splitting, special-character fixing, various kinds of filtering (length, perplexity, profanity, etc.), and quality analysis (diversity, GPT-3 quality, toxicity, perplexity, etc.). The tool can also save output as jsonlines or Parquet files, or insert it into vector stores (FaissStore, ChromaStore, ElasticSearchStore). A pipeline sketch follows this list.
  • Smart Democratization Advisor (SDA): A user-guided tool that facilitates automation of built-in model democratization via parameterized models. It generates YAML files based on user choices, provides built-in intelligence through parameterized models, and leverages SigOpt for HPO. SDA converts manual model tuning and optimization into assisted AutoML and AutoHPO, and provides a list of built-in optimized models spanning RecSys, CV, NLP, ASR, and RL.

  • Neural Network Constructor: A component based on neural architecture search technology and transfer learning, used to build compact neural network models for specific domains directly. It includes three components:

    • DE-NAS: A multi-model, hardware-aware, train-free neural architecture search approach that builds models for CV, NLP, and ASR directly.
    • Model Adapter: Leverages transfer learning to adapt and deploy models in the user's production environment.
    • Deltatuner: Extends Parameter-Efficient Fine-Tuning (PEFT) by automatically constructing compact delta structures.
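
As a quick illustration of the Auto Feature Engineering flow above, here is a minimal sketch of the three-line usage. This is illustrative only: the FeatureWrangler entry point and its arguments are assumptions modeled on RecDP's documented style, not a verified API; consult the RecDP docs for the exact interface.

# Minimal sketch of RecDP auto feature engineering on a tabular dataset.
# NOTE: class/argument names here are assumptions, not a verified API.
import pandas as pd
from pyrecdp.autofe import FeatureWrangler  # assumed entry point

train_data = pd.read_parquet("train.parquet")                   # numerical/categorical/text columns
pipeline = FeatureWrangler(dataset=train_data, label="target")  # "target" is the label column
enriched_df = pipeline.fit_transform()                          # returns the enriched feature set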
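
Similarly, a sketch of the LLM data-preparation pipeline described above, assuming a TextPipeline-style API; the operator names below are assumptions modeled on RecDP's documented operations and may differ from the actual library.

# Minimal sketch of an LLM data-preparation pipeline with RecDP.
# NOTE: operator names are assumptions; verify them in the RecDP docs.
from pyrecdp.LLM import TextPipeline
from pyrecdp.primitives.operations import (
    JsonlReader,       # read jsonlines input
    LengthFilter,      # drop documents outside a length range
    FuzzyDeduplicate,  # near-duplicate removal
    ParquetWriter,     # save cleaned output as parquet
)

pipeline = TextPipeline()
pipeline.add_operations([
    JsonlReader("raw_data/"),
    LengthFilter(),
    FuzzyDeduplicate(),
    ParquetWriter("cleaned_data/"),
])
pipeline.execute()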

For more information, you may read the docs.

Getting Started

Installing

Install in a Bare-Metal Environment

  • To install all components:

    • To install e2eAIOK in a bare-metal environment, use pip install e2eAIOK
    • To install the latest nightly build, use pip install e2eAIOK --pre
  • To install each individual component:

    • To install SDA, use pip install e2eAIOK-sda
    • To install DE-NAS, use pip install e2eAIOK-denas
    • To install Model Adapter, use pip install e2eAIOK-ModelAdapter
    • To install Deltatuner, use pip install e2eAIOK-deltatuner

Install in a Docker Environment

git clone https://github.com/intel/e2eAIOK.git
cd e2eAIOK
git submodule update --init --recursive
python scripts/start_e2eaiok_docker.py --backend [tensorflow|pytorch|pytorch112] --dataset_path ../ --workers host1 host2 host3 host4 --proxy "http://addr:ip"

Intel® End-to-End AI Optimization Kit provides step-by-step demos. Once installation is complete, please refer to the Demo section to use the click-to-run notebooks on Colab, or to get familiar with the APIs of each individual component for a specific workload.

Demos

  • Built-in Models

    • DLRM - RecSys, PyTorch
    • DIEN - RecSys, TensorFlow
    • WND - RecSys, TensorFlow
    • RNNT - Speech Recognition, PyTorch
    • RESNET - Computer vision, TensorFlow
    • BERT - Natural Language Processing, TensorFlow
    • MiniGo - minimalist engine modeled after AlphaGo Zero, TensorFlow
  • Neural network constructor

Performance

Papers and Blogs

Getting Support

e2eaiok's People

Contributors

adkakne, chaojun-zhang, csdingbin, haojinintel, huaxz1986, icecreamww, jian-zhang, kyotoyx, peach-he, philo-he, samanwaysadhu, tianyil1, xinyaowa, xuechendi, yao531441, zhouyu5, zigzagcai


e2eaiok's Issues

[v1.1] DeNas NLP Workflow TODOs

Background:
The save-logits function is currently triggered every epoch when enabled, which leads to poor training performance.

Current workaround:
Enable it in two steps - step 1: enable saving logits when calling DeNas NLP; the process exits after the first logits save. Step 2: disable saving logits and run the KD (knowledge distillation) process.

Next step:

  • [Urgent] Add a README so customers understand the current issue and know the proper way to run
  • [improvement] Integrate saving logits and transfer learning into a one-stage process

[v1.1] DE-NAS NLP Code Refine

DE-NAS NLP Code Refine:

  • Separate MA module dependency of NLP workflow from DE-NAS NLP training process
  • Align BERT configuration file name to HF model configuration name

[v1.0] Unit Test

  • put non-hardware-related unit tests on Travis
  • add UT for the DeNas module
  • add tests for the installation process
    • test pip
    • test run_e2eaiok_docker.py

@zigzagcai @Peach-He
Due date:
Plan done by WW48.2
Code done by WW49.2

[V1.1] Model Adapter Code Refinement

  • Refine requirements.txt so it installs in a vanilla env
  • remove optuna
  • remove useless code (draft / model_bak)
  • disable distiller save-logits in distributed mode
  • bug fixes
    • Distiller save logits
      • can't close the read file
      • UT can't close
      • can't save logits after 1 epoch
    • Trainer fails to show parameters in TensorBoard if the evaluate function is called directly

[v1.1] update oneapi version

The current oneAPI version is "intel/oneapi-aikit:2022.3.0-devel-ubuntu18.04", which is too old; we need to update it to "intel/oneapi-aikit:2023.0.0-devel-ubuntu20.04".

[v1.0] Performance Testing

  • CNN
    • SOTA NAS baseline (projection) -> done
    • CNN DeNas (random search)
    • CNN DeNas (EA)
  • NLP
    • stock NLP
    • NLP DeNas
  • ASR
    • stock ASR
    • ASR DeNas
  • VIT
    • stock VIT
    • VIT DeNas

@Peach-He

[v1.1] Model weights and gradients not synchronized in Model Adapter Domain Adapter

  • When running DDP training with the domain adapter, we see no gradient sync after each backward pass, which differs from what is said in this tutorial:

    When the backward() returns, param.grad already contains the synchronized gradient tensor.

  • It is likely a bug, which can be validated with the following script:

import sys
import torch
import torch.distributed as dist
# Print per-parameter gradient sums on each rank; with DDP they should match.
for name, params in self.network.named_parameters():
    if params.grad is None: continue
    print(f'rank: {dist.get_rank()}, name: {name}, grads: {torch.sum(params.grad)}, value: {torch.sum(params)}')
sys.exit(0)
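
For comparison, here is a minimal self-contained sketch of the behavior the tutorial describes, where param.grad holds the all-reduced gradient once backward() returns. This is generic PyTorch DDP code, not the Model Adapter trainer:

# Expected DDP behavior: after loss.backward() returns, each parameter's
# .grad holds the gradient averaged across all ranks.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank, world_size):
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    model = DDP(torch.nn.Linear(8, 1))         # DDP hooks all-reduce the grads
    model(torch.randn(4, 8)).sum().backward()  # grads are synchronized here
    for name, p in model.named_parameters():   # sums should match on every rank
        print(rank, name, torch.sum(p.grad))
    dist.destroy_process_group()

if __name__ == "__main__":
    torch.multiprocessing.spawn(run, args=(2,), nprocs=2)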

[v1.1] DE-NAS DE-Score Memory Leakage Issues

Memory leakage triggered by the DE-NAS DE-Score calculation process.

  • In transformer_proxy.py, nas_score was previously returned as a tensor and stored in the search dict, which caused huge memory consumption by keeping such a large tensor alive throughout the search.
  • In detail, the diversity score was stored as a tensor object in nas_score, while the other scores (e.g., the salience score) were calculated as floats.
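
A likely fix, sketched below under the assumption that the score is a scalar tensor: detach it and store a plain Python float so the tensor (and any autograd graph it references) can be freed. Names here are hypothetical, not the actual transformer_proxy.py code.

import torch

# Hypothetical illustration of the fix; names are assumptions.
def compute_nas_score(diversity, salience):
    # Storing `diversity` as a tensor keeps its autograd graph alive for the
    # whole search; converting to a float releases that memory immediately.
    return diversity.detach().item() + salience

searched = {}  # candidate -> score; now holds plain floats, not tensors
searched["cand_0"] = compute_nas_score(torch.tensor(1.5, requires_grad=True), 0.3)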

[v1.1] Cannot pip install ModelAdapter

When we try to install the latest ModelAdapter library with pip install e2eAIOK-ModelAdapter --pre, it continuously reports the following error:

ERROR: Could not find a version that satisfies the requirement tllib==0.4 (from e2eaiok-modeladapter) (from versions: none)

It seems something is wrong with the tllib installation. Please fix it.

[v1.1] Code refine for performance test

  • Model Adapter:
    • Add/update finetuner- and distiller-related configuration files to meet performance-test requirements
    • In task.py, do not create a distiller/adapter/finetuner just to match the strategy, which avoids useless definitions
    • In train.py, disable recording parameters in TensorBoard, which saves a lot of time
    • The TensorBoard dir is created by default even if the user doesn't need it; fix this in main.py
  • Common Lib
    • In the CV dataloader, evaluation data is split in distributed mode, which is not appropriate in some cases. Add an option "ddp_eval_nosplit" to run a complete evaluation.
    • For EarlyStop in utils.py, disable tolerance when a target metric is set

[v1.1] Model Adapter demo

  • Summary
  • Built-in demos
    • Finetuner
    • Distiller
    • Domain Adapter
  • API usage for customized cases
    • Finetuner
    • Distiller
      • basic
      • save logits
      • train with saved logits
    • Domain Adapter

[v0.2] Performance PDT

  • v0.2 performance PDT - due by WW47
    • RNNT small scale
    • ResNet WIP software-improvement ratio
    • MiniGo refresh with Red Hat OS on CLX
    • BERT
  • wait for PDT review

@Peach-He, please update here with the latest status

[v0.3] update pytorch and tensorflow docker to latest

We target supporting only PyTorch 1.12 and TensorFlow 2.10 in v0.3.
TODO:

  • update DockerfilePytorch -> use oneAPI 2022.03 (or the latest version)
  • update DockerfileTensorflow -> use oneAPI 2022.03 + TensorFlow 2.10
  • remove DockerfilePytorch for MLPerf -> need to make sure DLRM is still runnable; won't do, due to conflict
  • update start_e2eaiok_docker.py

assignee: @zigzagcai
task due: WW48

[v0.3] Migrate common libs to e2eAIOK/common

This task is to eliminate redundant effort across the different modules.
We suggest putting all common functions here as base classes; if some functions/args are designed for a specific task, derive them from the base class, so we keep non-common functions to a minimum (see the sketch at the end of this issue).
Due:
Design done by WW47.2
Code contributed to e2eAIOK by WW47.5

  • design doc updated - please attach the arch design in this issue
  • e2eAIOK/common/trainer
  • e2eAIOK/common/dataloader
  • e2eAIOK/common/utils

@csdingbin @huaxz1986
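
A minimal sketch of the suggested layout, with hypothetical class names (not the final e2eAIOK/common design):

# Hypothetical sketch: shared logic lives in a common base trainer under
# e2eAIOK/common; task-specific modules override only what differs.
class BaseTrainer:
    """Common training loop shared by all modules (e2eAIOK/common/trainer)."""
    def __init__(self, model, dataloader):
        self.model = model
        self.dataloader = dataloader

    def fit(self, epochs):
        for _ in range(epochs):
            for batch in self.dataloader:
                self.train_one_batch(batch)

    def train_one_batch(self, batch):
        raise NotImplementedError

class DeNasTrainer(BaseTrainer):
    """Derives from the base; overrides only the task-specific batch step."""
    def train_one_batch(self, batch):
        ...  # DE-NAS-specific training step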

package v0.2

  • weekly build script for minor PYPI release based on master
  • e2eAIOK v0.2 main release (Nov 4th, 2022)
  • github v0.2 release (Nov 4th, 2022)

[v1.2] MA Domain Adapter NLP workflow with HF transformers

  • Design doc
  • Code implementation
    • Domain Pre-training
    • Data selection
    • Tokenizer Augmentation
    • Workflow scripts
    • DE-NAS integration support
  • Benchmark
    • Benchmark on CPU
    • Performance Test on target machine
  • Document
    • Readme
    • click-to-run Demo
  • Code Test
    • CICD
    • UT

[v1.1] Workflow

  • code
    • Hugging Face + Distiller
    • Hugging Face + Distiller + Denas
    • Hugging Face + Distiller + Denas + Adapter
  • demo
    • all in one demo
    • Hugging Face + Distiller
    • Hugging Face + Distiller + Denas
    • Hugging Face + Distiller + Denas + Adapter

[v1.1] Model Adapter code merge

  • merge Model Adapter code to public repo with commit history
  • Unit Test
    • finetuner & distiller refine
    • adapter
  • code scan and fix
    • security scan
    • third party scan
  • CI/CD
  • dockerfile check
