aidev's Introduction

Scenario : Customer Segmentation

The scenario we have chosen for this tutorial exercise is the following:

An online retailer would like to gain insights into its customers' buying behaviours. Given a record of customers' online transactions, we perform Customer Value Analysis, described by Recency, Frequency and Monetary value (RFM). These characteristics are then used to segment the customers into clusters via machine learning techniques, in this case k-means clustering.
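As a rough illustration of the Recency, Frequency and Monetary values described above, here is a minimal sketch in plain Python. The transaction tuples, the `compute_rfm` helper, and the choice to count frequency as the number of distinct invoices are assumptions made for this example; the notebooks in this repo may define and compute these values differently.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, invoice_id, invoice_date, amount).
# These values are illustrative and are not taken from the data set.
transactions = [
    ("C1", "INV001", date(2011, 11, 1), 120.0),
    ("C1", "INV002", date(2011, 12, 5), 80.0),
    ("C2", "INV003", date(2011, 6, 15), 40.0),
    ("C2", "INV003", date(2011, 6, 15), 10.0),  # second line of the same invoice
]


def compute_rfm(rows, snapshot_date):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    Recency is the number of days between the snapshot date and the last purchase,
    frequency is the number of distinct invoices (an assumption),
    and monetary is the total amount spent.
    """
    last_purchase = {}
    invoices = defaultdict(set)
    spend = defaultdict(float)
    for customer, invoice, day, amount in rows:
        last_purchase[customer] = max(day, last_purchase.get(customer, day))
        invoices[customer].add(invoice)
        spend[customer] += amount
    return {
        customer: (
            (snapshot_date - last_purchase[customer]).days,
            len(invoices[customer]),
            spend[customer],
        )
        for customer in last_purchase
    }


rfm = compute_rfm(transactions, snapshot_date=date(2011, 12, 10))
for customer, values in sorted(rfm.items()):
    print(customer, values)
```

Each customer ends up as one row of three numbers, which is the shape of input the clustering step works on.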

This is a common use case: businesses want insight into their clientele and the different groups of customers they deal with, so that they can tailor services or campaigns to each group and serve it more effectively.

Note that this example makes use of Azure Machine Learning with GitHub Actions.

Data

Online Retail Data | This is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers.

Getting started with AI development and MLOps

Create Azure resources

For this learning experience, you will need to create resources in Azure.

Please follow this documentation to set up an Azure Machine Learning (AML) workspace: https://docs.microsoft.com/en-us/azure/machine-learning/quickstart-create-resources

When creating an AML workspace, the wizard will guide you through creating dependent resources like:

  • storage account
  • key vault
  • application insights
  • container registry

Make sure to select "create new" for the container registry when moving through the wizard.

Once your workspace is ready, go to your Azure Machine Learning workspace in the Azure Portal, and launch the studio.

In the left navigation, click "Compute" and click the "+ New" button to create a compute instance for yourself to use.

Once your compute is ready, select the VS Code link.

Create conda environment on your compute instance

VS Code will open. Make sure you see your compute and workspace in the task bar and that you have the proper extensions installed (see prerequisites). Sign in to Azure in VS Code if prompted to do so.

Clone this repository.

Open your terminal window and run this command:

sudo chmod -R 777 /anaconda/pkgs

Change directories to .aml/environments/, where you will find the conda environment file, conda_dependencies.yml.

In this directory, run the following command, which names the new environment py38_cluster_dev:

conda env create --name py38_cluster_dev --file conda_dependencies.yml

This repo is tested with conda==4.13.0. If conda prompts you to update, try updating it. The prompt includes a command such as:

conda update -n base -c defaults conda

Once the installation is complete, run the following command to check that the newly created environment exists.

conda env list

If it exists, you will see the newly created environment, py38_cluster_dev, listed.

To activate the environment, run:

conda activate py38_cluster_dev

Note that this procedure can also be done on the terminal within the AML studio.

Note also that you may have to close and re-open your VS Code session for your new conda environment to appear as a selectable kernel in Jupyter Notebooks.
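To confirm from inside a notebook or script that the interpreter really belongs to the new environment, a small check like the one below may help. This is a sketch, not part of the repo: the `running_in_env` helper is hypothetical, and it assumes that conda sets the `CONDA_DEFAULT_ENV` variable on activation and that a named environment's install prefix ends with the environment name.

```python
import os
import sys

EXPECTED_ENV = "py38_cluster_dev"


def running_in_env(expected, environ=os.environ, prefix=sys.prefix):
    """Return True if the current interpreter appears to belong to `expected`.

    First checks CONDA_DEFAULT_ENV (set by `conda activate`), then falls back to
    the last component of the interpreter prefix, because a Jupyter kernel may be
    started without the activation variables.
    """
    if environ.get("CONDA_DEFAULT_ENV") == expected:
        return True
    return os.path.basename(os.path.normpath(prefix)) == expected


if running_in_env(EXPECTED_ENV):
    print(f"Running inside {EXPECTED_ENV}")
else:
    print(f"Not running inside {EXPECTED_ENV}; interpreter prefix is {sys.prefix}")
```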

Open and follow notebooks

In VS Code, open 00-explore-data-and-prepare-data.ipynb. In the upper right of VS Code, click on "Select Kernel" and choose the environment you just created in the previous step (py38_cluster_dev). If you encounter any issues creating the environment, you can just use the py38_cluster_dev environment.

For more information about the notebooks, see this Readme.md.

From notebooks to operational code

The notebook of most interest in moving from notebooks to operational code is:

01-clustering-by-mini-batch-k-means-mlflow.ipynb

This notebook creates an experiment in our AML workspace, then creates an ML pipeline using two algorithms:

  1. power transformer
  2. k-means

The k-means algorithm works best when each feature is roughly normally distributed and on a comparable scale. The power transformer reshapes the data towards that distribution, and k-means then clusters the transformed data to produce the output.

So the pipeline goes like this:

input raw customer data -> power transformer transforms data -> output transformed data -> k-means predicts based on transformed data -> outputs a profile
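The flow above can be sketched with scikit-learn as a two-step pipeline. This is a minimal illustration, not the notebook's actual code: the number of clusters, `n_init`, and `random_state` are arbitrary choices for the example, and `MiniBatchKMeans` is used only because the notebook's file name mentions mini-batch k-means. The four rows are the ones from the sample input in this README.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer

# Columns: Recency(Days), Frequency, Monetary(£).
X = np.array([
    [12.328482, 109.432531, 1647.358550],
    [85.062131, 33.097033, 553.386070],
    [84.559221, 6.956482, 146.513349],
    [12.817094, 22.335451, 348.376235],
])

pipeline = Pipeline([
    # Step 1: make each feature more Gaussian-like (Yeo-Johnson by default)
    # and standardise it to zero mean and unit variance.
    ("power", PowerTransformer()),
    # Step 2: cluster the transformed rows. n_clusters=2 is an assumption.
    ("kmeans", MiniBatchKMeans(n_clusters=2, n_init=3, random_state=0)),
])

# Fit both steps and return one cluster label per input row.
labels = pipeline.fit_predict(X)

# The intermediate, transformed data can be inspected separately.
transformed = pipeline.named_steps["power"].transform(X)

print(labels)
print(transformed.round(3))
```

Wrapping both steps in one `Pipeline` means new data passed to `pipeline.predict` goes through the same fitted transformation before it is assigned to a cluster.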

Sample input here:

{
  "input_data": {
    "columns": [
      "Recency(Days)",
      "Frequency",
      "Monetary(£)"
    ],
    "index": [0,1,2,3],
    "data": [[12.328482, 109.432531, 1647.358550],
          [85.062131, 33.097033, 553.386070],
          [84.559221, 6.956482, 146.513349], 
          [12.817094, 22.335451, 348.376235]]
  }
}
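The sample payload uses a columns/index/data layout, which matches the "split" orientation used by pandas. As a minimal sketch using only the standard library, the snippet below parses that payload into one dictionary per customer row. The `to_records` helper is hypothetical and is not part of this repo.

```python
import json

SAMPLE = """
{
  "input_data": {
    "columns": ["Recency(Days)", "Frequency", "Monetary(£)"],
    "index": [0, 1, 2, 3],
    "data": [[12.328482, 109.432531, 1647.358550],
             [85.062131, 33.097033, 553.386070],
             [84.559221, 6.956482, 146.513349],
             [12.817094, 22.335451, 348.376235]]
  }
}
"""


def to_records(payload):
    """Turn a columns/index/data payload into a list of per-row dictionaries."""
    body = payload["input_data"]
    columns = body["columns"]
    records = []
    for idx, row in zip(body["index"], body["data"]):
        if len(row) != len(columns):
            raise ValueError(f"row {idx} has {len(row)} values, expected {len(columns)}")
        records.append({"index": idx, **dict(zip(columns, row))})
    return records


records = to_records(json.loads(SAMPLE))
for record in records:
    print(record)
```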

If you compare 01-clustering-by-mini-batch-k-means-mlflow.ipynb with train.py, you will see many similarities and begin to understand how the notebook and our investigations inform our operational code.

Workflows and MLOps

Architecture.md illustrates the components that make up this sample solution, and how they interact with one another.

GitHub Actions workflows explains how to configure the actions needed to enable CI/CD.

aidev's People

Contributors

eelwk, ryubidragonfire, cezapata, vianeyja

Watchers

Alessandro Jannuzzi

aidev's Issues

Need a dev container for local development

Some developers will prefer to work locally. In order to create a consistent environment and to accelerate development, we need a dev container.

The dev container should have:

  • python (version? 3.8.5 maybe?)
  • conda

Open question: can a conda env be created as part of the start-up process? I don't know.

unit test

Add unit tests to the model-ci workflow.
