
cloudpakfordata-telco-churn-workshop's Introduction

WARNING: This repository is no longer maintained ⚠️                             

This repository will not be updated and will be kept available in read-only mode.

Cloud Pak for Data Workshop

The workshop is hosted at: https://ibm.github.io/cloudpakfordata-telco-churn-workshop/

"cp4data"


cloudpakfordata-telco-churn-workshop's Issues

CP4D - menu options changed

Exercise 1 - Step 2
The instructions for Exercise 1, Step 2 read: "From the menu click on Collections -> Virtualized Data, you'll be brought to the My data section. Here you should see the data that the administrator has assigned to you. Choose the three data sets available and click Assign to start importing them into your project."

These menu options appear to have changed in the current CP4D UI.

(screenshot attached: menu options as of 2019-11-20)

WKC: add missing parts for data lineage

This is in the CPD Experience Flow .docx but was removed because it wasn't working. It looks like an IBM cookie issue, since it worked for others and also after switching to Chrome.
Put this back into the watson-knowledge-catalog-users README.md.

Alternate deployment of Flask App

Currently, the instructions have participants consume the machine learning model through a Python Flask application that runs locally on their own machines. We need to explore ways to run the application beyond local machines so that users do not have to fight Python dependencies (i.e. containerize it and run it in Play with Docker, or directly on the cluster). A minimal containerization sketch follows.
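As a starting point, here is a minimal Dockerfile sketch for a Flask app like this one. The file names (app.py, requirements.txt) and the port are assumptions for illustration, not taken from the repo:

```dockerfile
# Hypothetical Dockerfile for the workshop's Flask consumer app.
# Assumes the app entry point is app.py and deps are in requirements.txt.
FROM python:3.6-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 5000
# Assumes the Flask app binds 0.0.0.0:5000 when run directly
CMD ["python", "app.py"]
```

Built and run with something like `docker build -t churn-flask . && docker run -p 5000:5000 churn-flask`, which would also work inside Play with Docker.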

Clean up pre-work section

Proposal: clean up some of the pre-work:

  • Move the data set overview section to the introduction page.
  • Once we use the project import option, we may not need participants to download / clone the repo.
  • Remove the data load section.

Python Flask App as a standalone module

Remove the Python Flask app consumer from the machine-learning-with-Jupyter module and recreate it as a standalone module that can then be added to the workshop as optional content.

Clarify Import Data to Project Module

The current 'Import Data to Project' module is needed in scenarios where participants are not the Data Engineer who creates the virtualized data assets. We need to clarify whether non-Data Engineers need the module and/or document when it is needed.

TotalCharges is auto-converted to Decimal; change docs

Db2 Warehouse auto-configures data types on load; one result is that our TotalCharges column is converted from String to Decimal.
In our workshop, however, we have participants use Data Refinery to convert TotalCharges from String to Decimal.
We need to change the instructions to reflect this.
Also, we discovered this using Db2 Warehouse on Cloud, so it may not happen in every environment.
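If the automatic conversion does not happen, the manual Data Refinery step still applies; in a notebook, the same String-to-Decimal coercion can be sanity-checked with pandas. A sketch, assuming the standard telco churn CSV (file name here is an assumption) where TotalCharges contains blank strings:

```python
import pandas as pd

# Hypothetical check: load the telco churn CSV and coerce TotalCharges.
df = pd.read_csv('Telco-Customer-Churn.csv')
# errors='coerce' turns blank strings into NaN instead of raising,
# mirroring what a String -> Decimal cast does on load.
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
print(df['TotalCharges'].dtype, df['TotalCharges'].isna().sum())
```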

Consider entirely removing OpenScale notebook

We've moved away from using the OpenScale notebook.
It also requires the admin credentials that were used when OpenScale was set up to point to WML.

  1. Investigate whether admin creds are still needed. If so, this entire notebook should be removed.
  2. Perhaps remove the notebook anyway; #125 will change the workshop to configure the monitors from the GUI.

Data Connection and Virtualization Typos

Fixes for typos and UI changes in the 'Data Connection and Virtualization for Admins' module:

  • We should change the module name to remove 'Admins': either leave it without a role or make it 'Data Engineers'.
  • The instructions say "Find the tables you created earlier...". Change this, since participants may not be loading their own data.
  • The instructions say "Let's see the new virtualized data from the Data Virtualization tool by clicking View my data." The UI has changed, and it should now say "Let's see the new virtualized data from the Data Virtualization tool by clicking View my virtualized data." (The screenshot is already updated.)
  • Typo: "ou'll be notified that the join has succeeded!" should read "You'll be notified that the join has succeeded!"
  • Add a note to section 2 that it only needs to be completed if non-Data Engineers will use the data.

ConfigureOpenScale.ipynb fails

The following code fails if no data exists in the payload logging table:

```python
time.sleep(10)
subscription.quality_monitoring.enable(threshold=0.7, min_records=50)
```
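One possible guard, sketched under the assumption that the ibm_ai_openscale client used here exposes a payload record count (verify the method name against your installed client version), is to wait until payload rows actually exist before enabling the monitor:

```python
import time

# Poll until scoring requests have landed in the payload logging table.
# get_records_count() is assumed here; older clients may need a different call.
for _ in range(12):
    if subscription.payload_logging.get_records_count() > 0:
        break
    time.sleep(10)
else:
    raise RuntimeError('No payload records yet; score the deployment first')

subscription.quality_monitoring.enable(threshold=0.7, min_records=50)
```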

In README of "monitoring-models-with-openscale-gui", missing pictures

In README of "monitoring-models-with-openscale-gui", missing pictures,

In the Payload logging section, select the type of input. In our case, choose Categorical:

Choose model input type

In the Model details sections, select Manually configure monitors:

Choose model details
==> missing picture
In the Quality alert section, set the alert threshold to 70% and the sample size to 100:
==> missing picture
Set alert threshold and sample size
==> missing picture

ConfigureOpenScale.ipynb fails

I ran the out-of-the-box auto-configuration of OpenScale as user1. When I run ConfigureOpenScale.ipynb as user2, it fails. Is it OK to run the notebook when the data mart already exists?

When running the following code:

```python
subscription = ai_client.data_mart.subscriptions.add(WatsonMachineLearningAsset(
    model_uid,
    problem_type=ProblemType.BINARY_CLASSIFICATION,
    input_data_type=InputDataType.STRUCTURED,
    label_column='Risk',
    prediction_column='predictedLabel',
    probability_column='probability',
    feature_columns = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"],
    categorical_columns = ["CheckingStatus","CreditHistory","LoanPurpose","ExistingSavings","EmploymentDuration","Sex","OthersOnLoan","OwnsProperty","InstallmentPlans","Housing","Job","Telephone","ForeignWorker"]
))

if subscription is None:
    print('Subscription already exists; get the existing one')
    subscriptions_uids = ai_client.data_mart.subscriptions.get_uids()
    for sub in subscriptions_uids:
        if ai_client.data_mart.subscriptions.get_details(sub)['entity']['asset']['name'] == MODEL_NAME:
            subscription = ai_client.data_mart.subscriptions.get(sub)
```

It fails with the messages below:

```
IncorrectValue                            Traceback (most recent call last)
<ipython-input> in <module>
      7     probability_column='probability',
      8     feature_columns = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"],
----> 9     categorical_columns = ["CheckingStatus","CreditHistory","LoanPurpose","ExistingSavings","EmploymentDuration","Sex","OthersOnLoan","OwnsProperty","InstallmentPlans","Housing","Job","Telephone","ForeignWorker"]
     10 ))
     11

/opt/conda/envs/Python-3.6/lib/python3.6/site-packages/ibm_ai_openscale/subscriptions.py in add(self, asset, deployment_uids, _training_data_schema, _input_data_schema, **kwargs)
    222
    223         if not keys:
--> 224             raise IncorrectValue('source_uid', 'The asset with source_uid ' + asset.source_uid + ' cannot be found.')
    225         else:
    226             client = clients[keys[0]]

IncorrectValue: Incorrect "source_uid" provided.
Reason: The asset with source_uid e416bcd7-22a9-430a-b58a-f4885dc561a1 cannot be found.
```
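The error means OpenScale cannot see model_uid through its service bindings, which is consistent with user2's session pointing at a binding or data mart that user1's auto-configuration already set up. A diagnostic sketch, assuming the ibm_ai_openscale and WML V4 clients from the notebook (method names may vary by release):

```python
# List the assets visible through the OpenScale service bindings and check
# whether model_uid is among them before calling subscriptions.add().
ai_client.data_mart.bindings.list_assets()

# Cross-check against the WML repository this notebook deployed to; the UID
# stored in model_uid must come from this same instance.
for m in wml_client.repository.get_model_details()['resources']:
    print(m['metadata']['guid'], m['entity'].get('name'))
```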

Create single project asset

Add the assets that participants import for the various workshop modules into a single project that is imported as part of the pre-work:

  • Jupyter Notebook (for ML model build and OpenScale)
  • AutoAI CSV file

Investigate pip install errors

When running the pip installs for the OpenScale notebook:

```
!pip install --upgrade watson-machine-learning-client-V4 | tail -n 1
!pip install --upgrade numpy --no-cache | tail -n 1
!pip install --upgrade SciPy --no-cache | tail -n 1
!pip install --upgrade pyspark==2.3 | tail -n 1
!pip install --upgrade scikit-learn==0.20.2 | tail -n 1
```

We get these errors:

```
ERROR: Could not install packages due to an EnvironmentError: [Errno 30] Read-only file system: 'Set.py'
    Uninstalling watson-machine-learning-client-V4-1.0.55:
ERROR: tensorflow 1.13.1 requires tensorboard<1.14.0,>=1.13.0, which is not installed.
ERROR: brunel 2.3 requires JPype1-py3, which is not installed.
Successfully installed numpy-1.18.1
Successfully installed SciPy-1.4.1
Successfully installed py4j-0.10.6 pyspark-2.3.0
Successfully installed scikit-learn-0.20.2
```
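The Errno 30 suggests pip is trying to uninstall a package from the runtime's read-only conda environment. One workaround to try (an assumption about this CP4D runtime, not a confirmed fix) is installing into the user site instead, which shadows the read-only copy:

```
!pip install --user --upgrade watson-machine-learning-client-V4 | tail -n 1
```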

Had to modify ConfigureOpenScale.ipynb to save and deploy a model

For my testing, I had to change the following code:

```python
published_model_details = wml_client.repository.store_model(model, metadata, training_data=df_data, pipeline=pipeline)
```

to

```python
published_model_details = wml_client.repository.store_model(model=model, pipeline=pipeline, meta_props=metadata, training_data=train_data)
```

before I could continue to the next section. Otherwise, the model deployment fails.
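For context, the V4 WML client expects the metadata through the meta_props keyword rather than positionally. A sketch of what the surrounding cell might look like; the TYPE and RUNTIME_UID values are assumptions for a Spark MLlib 2.3 pipeline, not copied from the notebook:

```python
# Hypothetical metadata for watson-machine-learning-client-V4;
# property names and valid TYPE strings vary by client release.
metadata = {
    wml_client.repository.ModelMetaNames.NAME: MODEL_NAME,
    wml_client.repository.ModelMetaNames.TYPE: 'mllib_2.3',
    wml_client.repository.ModelMetaNames.RUNTIME_UID: 'spark-mllib_2.3'
}
published_model_details = wml_client.repository.store_model(
    model=model,
    pipeline=pipeline,
    meta_props=metadata,
    training_data=train_data)
```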
