
cloudpakfordata-telco-churn-workshop's Introduction

WARNING: This repository is no longer maintained ⚠️                             

This repository will not be updated and will be kept available in read-only mode.

Cloud Pak for Data Workshop

The workshop is hosted at: https://ibm.github.io/cloudpakfordata-telco-churn-workshop/

"cp4data"


cloudpakfordata-telco-churn-workshop's Issues

CP4D - menu options changed

Exercise 1 - Step 2
The instructions for Exercise 1, Step 2 read: "From the menu click on Collections -> Virtualized Data, you'll be brought to the My data section. Here you should see the data that the administrator has assigned to you. Choose the three data sets available and click Assign to start importing them into your project."

These menu options appear to have changed in the current CP4D UI.

(screenshot attached: menu options as of 2019-11-20)

WKC: add missing parts for data lineage

This is in the CPD Experience Flow .docx but was removed because it wasn't working. It looks like an IBM cookie issue, since it worked for others and also after switching to Chrome.
Put this back into the watson-knowledge-catalog-users README.md.

Alternate deployment of Flask App

Currently, the instructions have participants consume the machine learning model through a Python Flask application that runs locally on their own machines. We need to explore ways to run the application beyond local machines so that users do not have to fight Python dependencies (i.e. containerize it and run it in Play with Docker, or directly on the cluster). A minimal containerization sketch follows.
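As a starting point, here is a minimal Dockerfile sketch for a Flask app like this one. The file names (app.py, requirements.txt) and the port are assumptions for illustration, not taken from the repo:

```dockerfile
# Hypothetical Dockerfile for the workshop's Flask consumer app.
# Assumes the app entry point is app.py and deps are in requirements.txt.
FROM python:3.6-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 5000
# Assumes the Flask app binds 0.0.0.0:5000 when run directly
CMD ["python", "app.py"]
```

Built and run with something like `docker build -t churn-flask . && docker run -p 5000:5000 churn-flask`, which would also work inside Play with Docker.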

Clean up pre-work section

Proposal: clean up some of the pre-work:

  • Move the data set overview section to the introduction page.
  • Once we use the project import option, we may not need participants to download / clone the repo.
  • Remove the data load section.

Python Flask App as a standalone module

Remove the Python Flask app consumer from the machine-learning-with-Jupyter module and recreate it as a standalone module that can then be added to the workshop as optional content.

Clarify Import Data to Project Module

The current 'Import Data to Project' module is needed in scenarios where participants are not the Data Engineer who creates the virtualized data assets. We need to clarify whether non-Data Engineers need the module and/or document when it is needed.

TotalCharges is auto-converted to Decimal; change docs

Db2 Warehouse auto-configures data types on load; one result is that our TotalCharges column is converted from String to Decimal.
In our workshop, however, we have participants use Data Refinery to convert TotalCharges from String to Decimal.
We need to change the instructions to reflect this.
Also, we discovered this using Db2 Warehouse on Cloud, so it may not happen in every environment.
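If the automatic conversion does not happen, the manual Data Refinery step still applies; in a notebook, the same String-to-Decimal coercion can be sanity-checked with pandas. A sketch, assuming the standard telco churn CSV (file name here is an assumption) where TotalCharges contains blank strings:

```python
import pandas as pd

# Hypothetical check: load the telco churn CSV and coerce TotalCharges.
df = pd.read_csv('Telco-Customer-Churn.csv')
# errors='coerce' turns blank strings into NaN instead of raising,
# mirroring what a String -> Decimal cast does on load.
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
print(df['TotalCharges'].dtype, df['TotalCharges'].isna().sum())
```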

Consider entirely removing OpenScale notebook

We've moved away from using the OpenScale notebook.
It also requires the admin credentials that were used when OpenScale was set up to point to WML.

  1. Investigate whether admin creds are still needed. If so, this entire notebook should be removed.
  2. Perhaps remove the notebook anyway; #125 will change the workshop to configure the monitors from the GUI.

Data Connection and Virtualization Typos

Fixes for typos and UI changes in the 'Data Connection and Virtualization for Admins' module:

  • We should change the module name to remove 'Admins': either leave it without a role or make it 'Data Engineers'.
  • The instructions say "Find the tables you created earlier...". Change this, since participants may not be loading their own data.
  • The instructions say "Let's see the new virtualized data from the Data Virtualization tool by clicking View my data." The UI has changed, and it should now say "Let's see the new virtualized data from the Data Virtualization tool by clicking View my virtualized data." (The screenshot is already updated.)
  • Typo: "ou'll be notified that the join has succeeded!" should read "You'll be notified that the join has succeeded!"
  • Add a note to section 2 that it only needs to be completed if non-Data Engineers will use the data.

ConfigureOpenScale.ipynb fails

The following code fails if no data exists in the payload logging table:

```python
time.sleep(10)
subscription.quality_monitoring.enable(threshold=0.7, min_records=50)
```
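One possible guard, sketched under the assumption that the ibm_ai_openscale client used here exposes a payload record count (verify the method name against your installed client version), is to wait until payload rows actually exist before enabling the monitor:

```python
import time

# Poll until scoring requests have landed in the payload logging table.
# get_records_count() is assumed here; older clients may need a different call.
for _ in range(12):
    if subscription.payload_logging.get_records_count() > 0:
        break
    time.sleep(10)
else:
    raise RuntimeError('No payload records yet; score the deployment first')

subscription.quality_monitoring.enable(threshold=0.7, min_records=50)
```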

In README of "monitoring-models-with-openscale-gui", missing pictures

In README of "monitoring-models-with-openscale-gui", missing pictures,

In the Payload logging section, select the type of input. In our case, choose Categorical:

Choose model input type

In the Model details sections, select Manually configure monitors:

Choose model details
==> missing picture
In the Quality alert section, set the alert threshold to 70% and the sample size to 100:
==> missing picture
Set alert threshold and sample size
==> missing picture

ConfigureOpenScale.ipynb fails

I ran the out-of-the-box auto-configuration of OpenScale as user1. When I run ConfigureOpenScale.ipynb as user2, it fails. Is it OK to run the notebook when the data mart already exists?

When running the following code:

```python
subscription = ai_client.data_mart.subscriptions.add(WatsonMachineLearningAsset(
    model_uid,
    problem_type=ProblemType.BINARY_CLASSIFICATION,
    input_data_type=InputDataType.STRUCTURED,
    label_column='Risk',
    prediction_column='predictedLabel',
    probability_column='probability',
    feature_columns = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"],
    categorical_columns = ["CheckingStatus","CreditHistory","LoanPurpose","ExistingSavings","EmploymentDuration","Sex","OthersOnLoan","OwnsProperty","InstallmentPlans","Housing","Job","Telephone","ForeignWorker"]
))

if subscription is None:
    print('Subscription already exists; get the existing one')
    subscriptions_uids = ai_client.data_mart.subscriptions.get_uids()
    for sub in subscriptions_uids:
        if ai_client.data_mart.subscriptions.get_details(sub)['entity']['asset']['name'] == MODEL_NAME:
            subscription = ai_client.data_mart.subscriptions.get(sub)
```

It fails with the messages below:

```
IncorrectValue                            Traceback (most recent call last)
<ipython-input> in <module>
      7     probability_column='probability',
      8     feature_columns = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"],
----> 9     categorical_columns = ["CheckingStatus","CreditHistory","LoanPurpose","ExistingSavings","EmploymentDuration","Sex","OthersOnLoan","OwnsProperty","InstallmentPlans","Housing","Job","Telephone","ForeignWorker"]
     10 ))
     11

/opt/conda/envs/Python-3.6/lib/python3.6/site-packages/ibm_ai_openscale/subscriptions.py in add(self, asset, deployment_uids, _training_data_schema, _input_data_schema, **kwargs)
    222
    223         if not keys:
--> 224             raise IncorrectValue('source_uid', 'The asset with source_uid ' + asset.source_uid + ' cannot be found.')
    225         else:
    226             client = clients[keys[0]]

IncorrectValue: Incorrect "source_uid" provided.
Reason: The asset with source_uid e416bcd7-22a9-430a-b58a-f4885dc561a1 cannot be found.
```
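The error means OpenScale cannot see model_uid through its service bindings, which is consistent with user2's session pointing at a binding or data mart that user1's auto-configuration already set up. A diagnostic sketch, assuming the ibm_ai_openscale and WML V4 clients from the notebook (method names may vary by release):

```python
# List the assets visible through the OpenScale service bindings and check
# whether model_uid is among them before calling subscriptions.add().
ai_client.data_mart.bindings.list_assets()

# Cross-check against the WML repository this notebook deployed to; the UID
# stored in model_uid must come from this same instance.
for m in wml_client.repository.get_model_details()['resources']:
    print(m['metadata']['guid'], m['entity'].get('name'))
```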

Create single project asset

Add the assets that participants import for the various workshop modules into a single project that is imported as part of the pre-work:

  • Jupyter Notebook (for ML model build and OpenScale)
  • AutoAI CSV file

Investigate pip install errors

When running the pip installs for the OpenScale notebook:

```
!pip install --upgrade watson-machine-learning-client-V4 | tail -n 1
!pip install --upgrade numpy --no-cache | tail -n 1
!pip install --upgrade SciPy --no-cache | tail -n 1
!pip install --upgrade pyspark==2.3 | tail -n 1
!pip install --upgrade scikit-learn==0.20.2 | tail -n 1
```

We get these errors:

```
ERROR: Could not install packages due to an EnvironmentError: [Errno 30] Read-only file system: 'Set.py'
    Uninstalling watson-machine-learning-client-V4-1.0.55:
ERROR: tensorflow 1.13.1 requires tensorboard<1.14.0,>=1.13.0, which is not installed.
ERROR: brunel 2.3 requires JPype1-py3, which is not installed.
Successfully installed numpy-1.18.1
Successfully installed SciPy-1.4.1
Successfully installed py4j-0.10.6 pyspark-2.3.0
Successfully installed scikit-learn-0.20.2
```
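The Errno 30 suggests pip is trying to uninstall a package from the runtime's read-only conda environment. One workaround to try (an assumption about this CP4D runtime, not a confirmed fix) is installing into the user site instead, which shadows the read-only copy:

```
!pip install --user --upgrade watson-machine-learning-client-V4 | tail -n 1
```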

Had to modify ConfigureOpenScale.ipynb to save and deploy a model

For my testing, I had to change the following code:

```python
published_model_details = wml_client.repository.store_model(model, metadata, training_data=df_data, pipeline=pipeline)
```

to

```python
published_model_details = wml_client.repository.store_model(model=model, pipeline=pipeline, meta_props=metadata, training_data=train_data)
```

before I could continue to the next section. Otherwise, the model deployment fails.
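For context, the V4 WML client expects the metadata through the meta_props keyword rather than positionally. A sketch of what the surrounding cell might look like; the TYPE and RUNTIME_UID values are assumptions for a Spark MLlib 2.3 pipeline, not copied from the notebook:

```python
# Hypothetical metadata for watson-machine-learning-client-V4;
# property names and valid TYPE strings vary by client release.
metadata = {
    wml_client.repository.ModelMetaNames.NAME: MODEL_NAME,
    wml_client.repository.ModelMetaNames.TYPE: 'mllib_2.3',
    wml_client.repository.ModelMetaNames.RUNTIME_UID: 'spark-mllib_2.3'
}
published_model_details = wml_client.repository.store_model(
    model=model,
    pipeline=pipeline,
    meta_props=metadata,
    training_data=train_data)
```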
