This repository will not be updated. The repository will be kept available in read-only mode.
The workshop is hosted at: https://ibm.github.io/cloudpakfordata-telco-churn-workshop/
This repo contains a workshop for Cloud Pak for Data.
Home Page: https://ibm.github.io/cloudpakfordata-telco-churn-workshop/
License: Apache License 2.0
Perhaps tag with workshop date, or branch name.
This is the .zip file with files to import for WKC catalog creation
Highlight screenshots better: add arrows and sequential numbers (the "launch" button is hard to find).
https://github.com/IBM/cloudpakfordata-telco-churn-workshop/blob/master/workshop/.gitbook/assets/images/wml/project-11-model-disabled.png
Exercise 1 - Step 2
From the menu, click Collections -> Virtualized Data; you'll be brought to the My data section. Here you should see the data that the administrator has assigned to you. Choose the three available data sets and click Assign to start importing them into your project.
The options seem to have changed.
This is in the CPD Experience Flow .docx, but was removed because it wasn't working. It looks like an IBM cookie issue, because it worked for others and also after a switch to Chrome.
Put this back in the watson-knowledge-catalog-users README.md
The workshop/README.md is out of date.
The Agenda is very incomplete.
Also includes #130
Currently, we have instructions to consume the machine learning model using a Python Flask application that is run locally on participants' machines. We need to explore ways to run the application beyond local machines so that users do not have to deal with Python dependencies (i.e., containerize it and run it in Play with Docker or directly on the cluster).
Add content for Data Steward adding to an existing enterprise catalog
At the end of the Data Refinery docs, we refer to changing the "Global Themes". This no longer exists, but is now "Themes".
Change the docs, and the screenshot.
In Workshop README.md, link of "Exercise 3: Machine Learning with Jupyter" is broken
I propose we clean up some of the pre-work:
Remove the Python Flask app consumer from the machine learning with Jupyter module. Would need to create this as a stand-alone module which can then be added to the workshop as optional.
The current 'Import Data to Project' module is needed in scenarios where participants are not the Data Engineer who creates the virtualized data assets. Need to clarify whether the module is needed by non-Data Engineers and/or document when the module is needed.
DB2 Warehouse does an auto-configure that changes the data types. One of the results is that our TotalCharges column is converted from String to Decimal.
In our workshop, however, we use Data Refinery to convert TotalCharges from String to Decimal.
We need to change the instructions to reflect this.
Also, we discovered this using DB2 Warehouse on Cloud, so it might not happen in every case.
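If the instructions keep a manual conversion step, the String-to-Decimal change can be sketched in pandas (a hypothetical stand-in for the Data Refinery operation, not what the workshop ships; the tiny DataFrame below is made-up sample data, and `errors="coerce"` turns blank strings into NaN instead of raising):

```python
import pandas as pd

# Made-up sample rows standing in for the Telco churn data; TotalCharges
# arrives as strings, sometimes blank.
df = pd.DataFrame({
    "customerID": ["0001", "0002", "0003"],
    "TotalCharges": ["29.85", " ", "108.15"],
})

# Convert TotalCharges from String to Decimal (float64); blank strings
# become NaN rather than causing an error.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")

print(df["TotalCharges"].dtype)  # float64
```

NaN rows can then be dropped or filled, whichever the module's instructions prefer.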
We've moved away from using the OpenScale notebook.
Also, it requires the Admin creds that were used when OpenScale was set up to point to WML.
Module 'Data Connection and Virtualization for Admins': fixes for typos and UI changes.
The following code fails if no data exists in the payload logging table:

```python
time.sleep(10)
subscription.quality_monitoring.enable(threshold=0.7, min_records=50)
```
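Rather than a fixed ten-second sleep, one option is to poll until the payload table actually has rows before enabling the quality monitor. This is only a sketch: `get_count` is a placeholder for whatever record-count call the installed OpenScale client exposes for payload logging, not a specific API.

```python
import time

def wait_for_payload_records(get_count, min_records=1, timeout=120, interval=10):
    """Poll until get_count() reports at least min_records rows, or time out.

    get_count is any zero-argument callable returning the current number of
    rows in the payload logging table.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_count() >= min_records:
            return True
        time.sleep(interval)
    return False
```

Quality monitoring would then only be enabled once this returns True.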
Probably a tile on the customer page in developer.ibm.com
Grab a .csv from the .zip file, or make one up.
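For the make-one-up route, a few lines of Python are enough; the rows and column names below are invented placeholders that only roughly mirror the Telco churn schema:

```python
import csv

# Made-up placeholder rows; column names only roughly mirror the Telco churn data.
rows = [
    {"customerID": "0001-TEST", "gender": "Female", "tenure": 1,
     "MonthlyCharges": 29.85, "TotalCharges": 29.85, "Churn": "No"},
    {"customerID": "0002-TEST", "gender": "Male", "tenure": 34,
     "MonthlyCharges": 56.95, "TotalCharges": 1889.50, "Churn": "Yes"},
]

# Write a small CSV that can be uploaded in place of the real extract.
with open("sample-churn.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
```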
In the README of "monitoring-models-with-openscale-gui", pictures are missing:

- In the Payload logging section, select the type of input; in our case, choose Categorical. ("Choose model input type") ==> missing picture
- In the Model details section, select Manually configure monitors. ("Choose model details") ==> missing picture
- In the Quality alert section, set the alert threshold to 70% and the sample size to 100. ("Set alert threshold and sample size") ==> missing picture
Work for Telco customer churn v2.0
We put a fix in the workshop-ibm-apr-2020 branch (commit 2f7a24571d013a2e). This needs to be ported to master.
Add modeler flow build of ML model
I have run the out-of-box auto configuration of OpenScale as user1. When I run ConfigureOpenScale.ipynb as user2, it fails. Is it ok to run the notebook when the data mart already exists?
When running the following code:
```python
subscription = ai_client.data_mart.subscriptions.add(WatsonMachineLearningAsset(
    model_uid,
    problem_type=ProblemType.BINARY_CLASSIFICATION,
    input_data_type=InputDataType.STRUCTURED,
    label_column='Risk',
    prediction_column='predictedLabel',
    probability_column='probability',
    feature_columns=["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"],
    categorical_columns=["CheckingStatus","CreditHistory","LoanPurpose","ExistingSavings","EmploymentDuration","Sex","OthersOnLoan","OwnsProperty","InstallmentPlans","Housing","Job","Telephone","ForeignWorker"]
))
if subscription is None:
    print('Subscription already exists; get the existing one')
    subscriptions_uids = ai_client.data_mart.subscriptions.get_uids()
    for sub in subscriptions_uids:
        if ai_client.data_mart.subscriptions.get_details(sub)['entity']['asset']['name'] == MODEL_NAME:
            subscription = ai_client.data_mart.subscriptions.get(sub)
```
It fails with the messages below:
```
IncorrectValue                            Traceback (most recent call last)
in
      7     probability_column='probability',
      8     feature_columns = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"],
----> 9     categorical_columns = ["CheckingStatus","CreditHistory","LoanPurpose","ExistingSavings","EmploymentDuration","Sex","OthersOnLoan","OwnsProperty","InstallmentPlans","Housing","Job","Telephone","ForeignWorker"]
     10 ))
     11

/opt/conda/envs/Python-3.6/lib/python3.6/site-packages/ibm_ai_openscale/subscriptions.py in add(self, asset, deployment_uids, _training_data_schema, _input_data_schema, **kwargs)
    222
    223         if not keys:
--> 224             raise IncorrectValue('source_uid', 'The asset with source_uid ' + asset.source_uid + ' cannot be found.')
    225         else:
    226             client = clients[keys[0]]

IncorrectValue: Incorrect "source_uid" provided.
Reason: The asset with source_uid e416bcd7-22a9-430a-b58a-f4885dc561a1 cannot be found.
```
Add the assets being imported by participants for various workshop modules into a single project that would be imported as part of the pre-work.
When running the pip installs for the OpenScale notebook:

```
!pip install --upgrade watson-machine-learning-client-V4 | tail -n 1
!pip install --upgrade numpy --no-cache | tail -n 1
!pip install --upgrade SciPy --no-cache | tail -n 1
!pip install --upgrade pyspark==2.3 | tail -n 1
!pip install --upgrade scikit-learn==0.20.2 | tail -n 1
```
we get errors:

```
ERROR: Could not install packages due to an EnvironmentError: [Errno 30] Read-only file system: 'Set.py'
Uninstalling watson-machine-learning-client-V4-1.0.55:
ERROR: tensorflow 1.13.1 requires tensorboard<1.14.0,>=1.13.0, which is not installed.
ERROR: brunel 2.3 requires JPype1-py3, which is not installed.
Successfully installed numpy-1.18.1
Successfully installed SciPy-1.4.1
Successfully installed py4j-0.10.6 pyspark-2.3.0
Successfully installed scikit-learn-0.20.2
```
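The `[Errno 30] Read-only file system` error suggests pip is trying to write into the shared conda site-packages, which the runtime mounts read-only. One workaround to try (an assumption, not verified on every CPD runtime image) is installing into the user site instead, e.g.:

```
!pip install --user --upgrade watson-machine-learning-client-V4 | tail -n 1
```

and likewise for the other packages; restarting the notebook kernel afterwards picks up the user-site installs.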
For my testing, I had to change the following code:

```python
published_model_details = wml_client.repository.store_model(model, metadata, training_data=df_data, pipeline=pipeline)
```

to:

```python
published_model_details = wml_client.repository.store_model(model=model, pipeline=pipeline, meta_props=metadata, training_data=train_data)
```

before I could continue to the next section. Otherwise, the model deployment fails.
The GUI portion of OpenScale uses the Fast Setup of OpenScale, which does everything automatically.
Change this to have the user(s) manually configure the monitors. This will teach more and make for a better workshop experience.
Investigate moving the WML client SDK to v4.
As this gets bigger, let's break it up into separate modules, which may or may not be used in an individual workshop.