In this Code Pattern, we will build an app that classifies email, labeling each message as "Phishing", "Spam", or, if it does not appear suspicious, "Ham". We'll use IBM Watson Natural Language Classifier (NLC) to train a model on email examples from an EDRM Enron email dataset. Please note that this data is free for non-commercial use; explicit permission must be obtained for any other use. The custom NLC model can be built quickly in the Web UI, deployed into our Node.js app using the Watson Developer Cloud Node.js SDK, and then run from a browser.
When the reader has completed this Code Pattern, they will understand how to:
- Build a Watson Natural Language Classifier model using the Web UI
- Create a Node.js app that uses the NLC model to classify emails as Phishing, Spam, or Ham.
- Use the Watson Developer Cloud SDK for Node.js.
- User interacts with Natural Language Classifier (NLC) GUI to train the model.
- EDRM data is loaded to the NLC service to provide sample emails for training.
- User sends email text to the application to have it classified.
- App uses Watson Natural Language Classifier to determine if text is phishing, spam, or ham.
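The last step of the flow can be sketched as a small helper that reduces a classifier response to a single label. This is a minimal sketch, not the app's actual code: the function name `summarizeClassification` is hypothetical, the response shape (`classes` entries with `class_name` and `confidence`) follows the NLC classify response, and the class names are assumed to match the three labels named in this Code Pattern. In the sample, only the 0.81 Phishing confidence comes from this document; the other two values are illustrative.

```javascript
// Reduce an NLC classify response to the single most likely label.
// Assumption: class names are "Phishing", "Spam", and "Ham" (compared
// case-insensitively); anything other than "Ham" is treated as suspicious.
function summarizeClassification(response) {
  const classes = Array.isArray(response.classes) ? response.classes : [];
  if (classes.length === 0) {
    return { label: null, confidence: 0, suspicious: false };
  }
  // Pick the entry with the highest confidence.
  const best = classes.reduce((top, c) => (c.confidence > top.confidence ? c : top));
  return {
    label: best.class_name,
    confidence: best.confidence,
    suspicious: best.class_name.toLowerCase() !== 'ham',
  };
}

// Illustrative response: only the 0.81 value is taken from this document.
const sample = {
  text: 'Can you please send your password?',
  top_class: 'Phishing',
  classes: [
    { class_name: 'Phishing', confidence: 0.81 },
    { class_name: 'Spam', confidence: 0.12 },
    { class_name: 'Ham', confidence: 0.07 },
  ],
};

console.log(summarizeClassification(sample));
```

Treating both Phishing and Spam as suspicious is a design choice made for this sketch; an app could just as well surface the three labels separately.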
- Watson Studio: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
- Watson Natural Language Classifier: An IBM Cloud service to interpret and classify natural language with confidence.
- Artificial Intelligence: Artificial intelligence can be applied to disparate solution spaces to deliver disruptive technologies.
- Data Science: Systems and scientific methods to analyze structured and unstructured data in order to extract knowledge and insights.
- Node.js: An open-source JavaScript run-time environment for executing server-side JavaScript code.
- Clone the repo
- Create Watson NLC service with IBM Cloud
- Train the NLC model
- Configure credentials
- Run the application
Clone the nlc-email-phishing repo locally. In a terminal, run:
$ git clone https://github.com/IBM/nlc-email-phishing
- In Watson Studio, create a new project by clicking the New Project tile or + New project.
- Under the Settings tab, scroll down to Associated services, click + Add service, and choose Watson.
- Find the Natural Language Classifier tile and click Add.
Note: The Standard plan allows some free usage before billing begins:
- 1 Natural Language Classifier free per month
- 1000 API calls free per month
- 4 Training Events free per month
- Give the NLC service a name. This name will be used later if you Deploy to IBM Cloud, when you add the service under Connections.
- Once the service is created, the Credentials will be on the page. Click Show to make them visible, and copy them for later use when you Configure credentials. You can always get back to the credentials by clicking Service credentials on the left.
- In your project, under the Assets tab, go to Models and click + New Natural Language Classifier model to bring up the New Classifier GUI page.
- Add the data to your project by clicking the Browse button in the right-hand Upload to project section and browsing to this repo. Choose both data/Email-trainingdata-20k.csv and Email-testingdata.json.
- Drag and drop the Email-trainingdata-20k.csv file you uploaded to the Create a Class box.
- Click the Train model button to begin training. The model will take around an hour to train.
- To check the status of the model, and to access it after it trains, go to the Models section of the Assets tab in your project. The model will show up when it is ready. Double-click it to see the Overview tab.
- The top line of the Overview tab has the ModelID. Click the copy icon and save it for the Configure credentials step.
- Click the Test tab and enter a phrase from an email to test the classifier. For example, "Can you please send your password?" is classified with 0.81 confidence as Phishing.
- Click the Implementation tab to see how to use the classifier with Curl, Java, Node, or Python.
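The Implementation tab is the authoritative source for calling your classifier. As a rough illustration of what such a call involves, here is a minimal sketch that uses only Node's built-in `https` module rather than the Watson Developer Cloud SDK. The host and base path are assumptions (use the URL shown with your own service credentials), and the function names `buildClassifyRequest` and `classifyEmail` are hypothetical.

```javascript
const https = require('https');

// Assumption: endpoint for an NLC instance that uses username/password
// credentials. Replace with the URL from your own service credentials.
const NLC_HOST = 'gateway.watsonplatform.net';
const NLC_BASE_PATH = '/natural-language-classifier/api/v1';

// Build the HTTP request options and JSON body for a classify call.
function buildClassifyRequest(username, password, classifierId, text) {
  const body = JSON.stringify({ text });
  const options = {
    method: 'POST',
    hostname: NLC_HOST,
    path: `${NLC_BASE_PATH}/classifiers/${encodeURIComponent(classifierId)}/classify`,
    auth: `${username}:${password}`, // HTTP basic authentication
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body),
    },
  };
  return { options, body };
}

// Send the request and resolve with the parsed JSON response.
function classifyEmail(username, password, classifierId, text) {
  const { options, body } = buildClassifyRequest(username, password, classifierId, text);
  return new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      let data = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { data += chunk; });
      res.on('end', () => {
        if (res.statusCode !== 200) {
          reject(new Error(`NLC returned HTTP ${res.statusCode}: ${data}`));
          return;
        }
        try {
          resolve(JSON.parse(data));
        } catch (err) {
          reject(err);
        }
      });
    });
    req.on('error', reject);
    req.end(body);
  });
}
```

A call would then look like `classifyEmail(username, password, modelId, 'Can you please send your password?')`, with the three values taken from your credentials and the ModelID copied from the Overview tab.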
Note: If you will use Deploy to IBM Cloud when you Run the application, you can skip this step.
The credentials for IBM Cloud services (here, Natural Language Classifier) can be found in the Services menu in IBM Cloud by selecting the Service Credentials option for each service.
The CLASSIFIER_ID is the ModelID from step 3 above.
Copy env.sample to .env:
$ cp env.sample .env
Edit the .env file with the necessary settings.
# Replace the credentials here with your own.
# Rename this file to .env before running 'npm start'.
NATURAL_LANGUAGE_CLASSIFIER_USERNAME=<add_NLC_username>
NATURAL_LANGUAGE_CLASSIFIER_PASSWORD=<add_NLC_password>
CLASSIFIER_ID=<add_ModelID>
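Before starting the app, it can help to confirm that none of the three settings still holds its `<add_...>` placeholder. The following is a minimal sketch of such a check; the helper names `parseEnvText` and `findUnsetSettings` are hypothetical, and this is not how the app itself loads the .env file, which this document does not show.

```javascript
// The three settings listed in env.sample.
const REQUIRED_SETTINGS = [
  'NATURAL_LANGUAGE_CLASSIFIER_USERNAME',
  'NATURAL_LANGUAGE_CLASSIFIER_PASSWORD',
  'CLASSIFIER_ID',
];

// Parse simple KEY=value lines, skipping blank lines and # comments.
function parseEnvText(text) {
  const settings = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq === -1) continue;
    settings[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return settings;
}

// Return the names of settings that are missing, empty, or still a
// <placeholder> value copied from env.sample.
function findUnsetSettings(settings) {
  return REQUIRED_SETTINGS.filter((name) => {
    const value = settings[name];
    return !value || /^<.*>$/.test(value);
  });
}

// Illustrative input: one real-looking value, one placeholder, one missing.
const sampleEnv = [
  '# Replace the credentials here with your own.',
  'NATURAL_LANGUAGE_CLASSIFIER_USERNAME=my-user',
  'NATURAL_LANGUAGE_CLASSIFIER_PASSWORD=<add_NLC_password>',
].join('\n');

console.log(findUnsetSettings(parseEnvText(sampleEnv)));
```

To check a real file, pass the result of `require('fs').readFileSync('.env', 'utf8')` to `parseEnvText` instead of the sample string.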
Use the Deploy to IBM Cloud button OR create the services and run locally.
- Press the Deploy to IBM Cloud button above, and then click Deploy.
- In Toolchains, click Delivery Pipeline to watch while the app is deployed. Once deployed, the app can be viewed by clicking 'View app'.
- To see the app and service created and configured for this Code Pattern, use the IBM Cloud dashboard. The app is named nlc-email-phishing with a unique suffix.
- You will need to add the ModelID from step 3 above and the NLC credentials from Configure credentials to the application. After accessing your app from the dashboard, click Runtime on the menu and navigate to the Environment variables tab.
- Replace the placeholder values of the CLASSIFIER_ID, NATURAL_LANGUAGE_CLASSIFIER_USERNAME, and NATURAL_LANGUAGE_CLASSIFIER_PASSWORD variables with your ModelID and credential values, and click Save.
- After saving the environment variables, the app will restart. Once it restarts, you can access it from the URL at Visit App URL.
- Install the Node.js runtime and NPM.
- Start the app by running npm install, followed by npm start.
- Use the app at localhost:3000.
- Artificial Intelligence Code Patterns: Enjoyed this Code Pattern? Check out our other AI Code Patterns.
- Data Analytics Code Patterns: Enjoyed this Code Pattern? Check out our other Data Analytics Code Patterns
- AI and Data Code Pattern Playlist: Bookmark our playlist with all of our Code Pattern videos
- With Watson: Want to take your Watson app to the next level? Looking to utilize Watson Brand assets? Join the With Watson program to leverage exclusive brand, marketing, and tech resources to amplify and accelerate your Watson embedded commercial solution.
- Data Science Experience: Master the art of data science with IBM's Data Science Experience
This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.