
nlc-email-phishing's Introduction

Natural Language Classifier email spam classifier

In this Code Pattern, we will build an app that classifies email, labeling each message as "Phishing", "Spam", or "Ham" (if it does not appear suspicious). We'll use IBM Watson Natural Language Classifier (NLC) to train a model on email examples from an EDRM Enron email dataset. Please note that this data is free for non-commercial use; explicit permission must be obtained otherwise. The custom NLC model can be built quickly in the Web UI, deployed into our Node.js app using the Watson Developer Cloud Node.js SDK, and then run from a browser.

When the reader has completed this Code Pattern, they will understand how to:

  • Build a Watson Natural Language Classifier model using the Web UI
  • Create a Node.js app that uses the NLC model to classify emails as Phishing or not.
  • Use the Watson Developer Cloud SDK for Node.js.

Flow

  1. User interacts with Natural Language Classifier (NLC) GUI to train the model.
  2. EDRM data is loaded to the NLC service to provide sample emails for training.
  3. User sends email text to the application to have it classified.
  4. App uses Watson Natural Language Classifier to determine if text is phishing, spam, or ham.
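Step 4 of the flow can be sketched in a few lines. The snippet below is a minimal illustration, not the app's actual code: it assumes the classify response carries a `classes` array of `{ class_name, confidence }` entries, and that the class names in the trained model are spelled "Phishing", "Spam", and "Ham". The helper name `summarizeClassification` and the `minConfidence` threshold are hypothetical, and the sample values other than the 0.81 Phishing score are invented for the example.

```javascript
// Sketch: reduce a classify response to a single label.
// `summarizeClassification` and `minConfidence` are hypothetical names.
function summarizeClassification(response, minConfidence = 0.5) {
  const classes = Array.isArray(response.classes) ? response.classes : [];

  // Pick the class with the highest confidence score.
  const best = classes.reduce(
    (top, c) => (top === null || c.confidence > top.confidence ? c : top),
    null
  );

  if (best === null) {
    return { label: 'Unknown', confidence: 0, suspicious: false };
  }

  // Treat anything other than "Ham" as suspicious once it clears the threshold.
  const suspicious =
    best.class_name !== 'Ham' && best.confidence >= minConfidence;

  return { label: best.class_name, confidence: best.confidence, suspicious };
}

// Mock response (assumed shape; only the 0.81 score comes from this README).
const sample = {
  text: 'Can you please send your password?',
  top_class: 'Phishing',
  classes: [
    { class_name: 'Phishing', confidence: 0.81 },
    { class_name: 'Spam', confidence: 0.12 },
    { class_name: 'Ham', confidence: 0.07 },
  ],
};

console.log(summarizeClassification(sample));
```

The threshold is a design choice: a low-confidence "Phishing" result might be better shown as uncertain than flagged outright.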

Included components

  • Watson Studio: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
  • Watson Natural Language Classifier: An IBM Cloud service to interpret and classify natural language with confidence.

Featured technologies

  • Artificial Intelligence: Artificial intelligence can be applied to disparate solution spaces to deliver disruptive technologies.
  • Data Science: Systems and scientific methods to analyze structured and unstructured data in order to extract knowledge and insights.
  • Node.js: An open-source JavaScript run-time environment for executing server-side JavaScript code.

Steps

  1. Clone the repo
  2. Create Watson NLC service with IBM Cloud
  3. Train the NLC model
  4. Configure credentials
  5. Run the application

1. Clone the repo

Clone the nlc-email-phishing repo locally. In a terminal, run:

$ git clone https://github.com/IBM/nlc-email-phishing

2. Create Watson NLC service with IBM Cloud

  • In Watson Studio create a New Project by clicking the New Project tile or use + New project:

  • Under the Settings tab, scroll down to Associated services, click + Add service and choose Watson:

  • Find the Natural Language Classifier tile and click Add.

Note: the Standard plan allows free usage before billing begins:

    1 Natural Language Classifier free per month
    1000 API calls free per month
    4 Training Events free per month

  • Give the NLC service a name. This name will be used later if you Deploy to IBM Cloud when you add the service under Connections.

  • Once the service is created, the Credentials will be shown on the page. Click Show to make them visible and copy them for later use when you Configure credentials. You can always get back to the credentials by clicking Service credentials on the left.

3. Train the NLC model

  • In your project, under the Assets tab, go to the Models section and click + New Natural Language Classifier model to bring up the New Classifier GUI page:

  • Add the data to your project by clicking the Browse button in the right-hand Upload to project section and browsing to this repo. Choose both data/Email-trainingdata-20k.csv and Email-testingdata.json.

  • Drag and drop the Email-trainingdata-20k.csv file you uploaded to the Create a Class box:

  • Click the Train model button to begin training. The model will take around an hour to train.

  • To check the status of the model, and to access it after it trains, go to the Models section of the Assets tab in your project. The model will show up when it is ready. Double click it to see the Overview tab.

  • The Overview tab top line has the ModelID. Click the copy icon and save this for the Configure credentials step.

  • Click the Test tab and enter a phrase from an email to test the classifier. For example, "Can you please send your password?" is classified with 0.81 confidence as Phishing.

  • Click the Implementation tab to see how to use the classifier with Curl, Java, Node, or Python.
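As a rough companion to the Implementation tab, here is a sketch of a classify call made with Node's built-in `https` module rather than the SDK the app uses. It assumes the service accepts a POST to `/v1/classifiers/{classifier_id}/classify` under the `url` listed in your service credentials, with basic authentication and a JSON body of the form `{"text": "..."}`. The function names and the `config` object are hypothetical, and `url` is not one of the settings in env.sample, so treat it as an extra value you supply yourself.

```javascript
const https = require('https');

// Build the request options and body for a classify call. Nothing is sent here.
// `config.url` is assumed to be the service URL from your credentials.
function buildClassifyRequest({ url, username, password, classifierId }, text) {
  const base = url.replace(/\/+$/, ''); // drop any trailing slashes
  const endpoint = new URL(
    `${base}/v1/classifiers/${encodeURIComponent(classifierId)}/classify`
  );
  const body = JSON.stringify({ text });

  return {
    options: {
      method: 'POST',
      hostname: endpoint.hostname,
      path: endpoint.pathname,
      auth: `${username}:${password}`, // HTTP basic authentication
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(body),
      },
    },
    body,
  };
}

// Send the request and resolve with the parsed JSON response.
function classify(config, text) {
  const { options, body } = buildClassifyRequest(config, text);
  return new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      let data = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { data += chunk; });
      res.on('end', () => {
        if (res.statusCode !== 200) {
          reject(new Error(`Classify failed with HTTP ${res.statusCode}: ${data}`));
          return;
        }
        try {
          resolve(JSON.parse(data));
        } catch (err) {
          reject(err);
        }
      });
    });
    req.on('error', reject);
    req.end(body);
  });
}

// Example call (not executed here; fill in your own values):
// classify({ url, username, password, classifierId }, 'Can you please send your password?')
//   .then((result) => console.log(result))
//   .catch((err) => console.error(err.message));
```

Keeping request construction separate from sending makes the first function easy to check without network access or real credentials.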

4. Configure credentials

Note: If you plan to use Deploy to IBM Cloud when you Run the application, you can skip this step.

The credentials for all IBM Cloud services (here, Natural Language Classifier) can be found in the Services menu in IBM Cloud, by selecting the Service Credentials option for each service. The CLASSIFIER_ID is the ModelID from step 3 above.

Copy the env.sample to .env.

$ cp env.sample .env

Edit the .env file with the necessary settings.

env.sample:

# Replace the credentials here with your own.
# Rename this file to .env before running 'npm start'.

NATURAL_LANGUAGE_CLASSIFIER_USERNAME=<add_NLC_username>
NATURAL_LANGUAGE_CLASSIFIER_PASSWORD=<add_NLC_password>
CLASSIFIER_ID=<add_ModelID>
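Before starting the app, it can help to confirm that every placeholder in .env has been replaced. The following is a minimal pre-flight sketch, not part of the repo: `parseEnv` and `findUnsetSettings` are hypothetical helpers, and the parser only understands plain `KEY=VALUE` lines and `#` comments as they appear in env.sample.

```javascript
const fs = require('fs');

// The three settings listed in env.sample.
const REQUIRED = [
  'NATURAL_LANGUAGE_CLASSIFIER_USERNAME',
  'NATURAL_LANGUAGE_CLASSIFIER_PASSWORD',
  'CLASSIFIER_ID',
];

// Parse KEY=VALUE lines, skipping blank lines and '#' comments.
function parseEnv(text) {
  const values = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq === -1) continue; // not a KEY=VALUE line
    values[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return values;
}

// Return the required settings that are missing, empty, or still
// look like an unedited <placeholder>.
function findUnsetSettings(values) {
  return REQUIRED.filter((name) => {
    const value = values[name];
    return !value || /^<.*>$/.test(value);
  });
}

// Check the .env file in the current directory, if there is one.
if (fs.existsSync('.env')) {
  const unset = findUnsetSettings(parseEnv(fs.readFileSync('.env', 'utf8')));
  console.log(
    unset.length ? `Still unset: ${unset.join(', ')}` : 'All settings filled in.'
  );
}
```

Run it from the repo root after copying env.sample to .env. It only reports which names still need values and never prints the credentials themselves.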

5. Run the application

Use the Deploy to IBM Cloud button OR create the services and run locally.

Deploy to IBM Cloud

  1. Press the above Deploy to IBM Cloud button and then click on Deploy.

  2. In Toolchains, click on Delivery Pipeline to watch while the app is deployed. Once deployed, the app can be viewed by clicking 'View app'.

  3. To see the app and service created and configured for this Code Pattern, use the IBM Cloud dashboard. The app is named nlc-email-phishing with a unique suffix.

  4. You will need to add the ModelID from step 3 above and the NLC credentials from Configure credentials to the application. After accessing your app from the dashboard, click on Runtime on the menu and navigate to the Environment variables tab.

  5. Replace the placeholders for the CLASSIFIER_ID, NATURAL_LANGUAGE_CLASSIFIER_USERNAME, and NATURAL_LANGUAGE_CLASSIFIER_PASSWORD variables with your ModelID and credential values, and click Save.

  6. After saving the environment variables, the app will restart. After the app restarts you can access it from the URL at Visit App URL.

Run locally

  1. Install the Node.js runtime, which includes NPM.
  2. Start the app by running npm install, followed by npm start.
  3. Use the app at localhost:3000.

Learn more

  • Artificial Intelligence Code Patterns: Enjoyed this Code Pattern? Check out our other AI Code Patterns.
  • Data Analytics Code Patterns: Enjoyed this Code Pattern? Check out our other Data Analytics Code Patterns.
  • AI and Data Code Pattern Playlist: Bookmark our playlist with all of our Code Pattern videos.
  • With Watson: Want to take your Watson app to the next level? Looking to utilize Watson Brand assets? Join the With Watson program to leverage exclusive brand, marketing, and tech resources to amplify and accelerate your Watson embedded commercial solution.
  • Data Science Experience: Master the art of data science with IBM's Data Science Experience

License

This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.

Apache Software License (ASL) FAQ

nlc-email-phishing's People

Contributors

scottdangelo, sanjeevghimire, rhagarty, stevemart, kant, ljbennett62
