
philomathic-guy / malicious-web-content-detection-using-machine-learning


Chrome extension for detecting phishing web sites

Home Page: https://philomathic-guy.github.io/Malicious-Web-Content-Detection-Using-Machine-Learning/

License: MIT License

Languages: Python 85.47%, PHP 2.68%, HTML 1.61%, JavaScript 3.99%, CSS 6.24%
Topics: machine-learning, chrome-extension, malicious, malicious-url-detection, malicious-redirects, chrome, phishing, phishing-detection, phishing-websites, python

malicious-web-content-detection-using-machine-learning's Introduction

Malicious Web Content Detection using Machine Learning

NOTE -

1. If you face any issue, first refer to Troubleshooting.md. If you are still not able to resolve it, please file an issue with the appropriate template (Bug report, question, custom issue or feature request).

2. Please support the project by starring it :)

Steps for reproducing the project -

  • Install the required packages with pip install -r requirements.txt. Check that pip belongs to the Python version you are using by running pip -V.
  • Move the project folder to your localhost document root, for example /Library/WebServer/Documents on a Mac.
  • (Mac only) Make the markup file writable: sudo chmod 777 markup.txt.
  • Set the path of your Python 2.x installation in clientServer.php.
  • (Any system other than a Mac) Change the localhost path in features_extraction.py to your own localhost path (or host the application on a remote server and make the necessary changes).
  • Go to chrome://extensions, enable developer mode, click 'Load unpacked' and select the 'Extension' folder from the project.
  • Open any web page and click the extension icon in the top-right panel of the Chrome window. Click the 'Safe or not?' button and wait a second for the result.
  • Done!

Abstract -

  • Inexperienced users have no visibility into what a web page does behind the scenes, so they can be tricked into giving away their credentials or downloading malicious data.
  • Our aim is to build a Chrome extension that acts as middleware between users and malicious websites and reduces the risk of users falling for them.
  • Harmful content cannot be collected exhaustively, because it keeps evolving. To counter this, we use machine learning: the tool is trained to assign each new page it sees to a category so that the corresponding action can be taken.

Take a look at the demo

A few snapshots of our system being run on different webpages -

Fig 1. A safe website - www.spit.ac.in (college website)

Fig 2. A phishing website which looks just like Google Drive

Fig 3. A phishing website which looks just like Dropbox

Fig 4. A safe website - www.google.com

malicious-web-content-detection-using-machine-learning's People

Contributors

andy71195, jatakiajanvi12, philomathic-guy


malicious-web-content-detection-using-machine-learning's Issues

XMLHTTPrequest failed to update

Have you read Troubleshooting.md? If No, please do so before filing an issue.
Yes

Have you tried Googling the problem?
Yes

Which python version are you using to run the project? In the terminal, type which <python-path-you-have-in-clientServer.php> and enter the output here
Python version -3.7

Describe the bug
I followed all the steps as described, on a Windows laptop. The extension shows errors such as:
Synchronous XMLHttpRequest on the main thread is deprecated because of its detrimental effects to the end user's experience. For more help, check https://xhr.spec.whatwg.org/.
Error handling response: Error: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'file:///D:/codes/Malicious-Web-Content-Detection-Using-Machine-Learning-master/clientServer.php'.
(this appeared when I tried putting the file location directly), and
Synchronous XMLHttpRequest on the main thread is deprecated because of its detrimental effects to the end user's experience. For more help, check https://xhr.spec.whatwg.org/.
at the line xhr.open("POST","http://localhost/Malicious-Web-Content-Detection-Using-Machine-Learning-master/clientServer.php",false);
I am not sure what I am missing, and I am confused because the extension does not show any output on whether the website is phishing or not.
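
A note on the two messages above: the 'Synchronous XMLHttpRequest' line is a deprecation warning, while the 'Failed to load file:///...' line is the actual failure. PHP only runs when the script is served by a web server, so the request has to go to an http://localhost/... URL and not to a file path. A minimal sketch, using only Python's standard library, of checking that an endpoint string is a served URL; `is_served_endpoint` is a hypothetical helper and not part of the project:

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def is_served_endpoint(url):
    """Return True if the URL uses http(s) and names a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# The localhost URL from the issue passes; the file:// path does not.
print(is_served_endpoint(
    "http://localhost/Malicious-Web-Content-Detection-Using-Machine-Learning-master/clientServer.php"))
print(is_served_endpoint(
    "file:///D:/codes/Malicious-Web-Content-Detection-Using-Machine-Learning-master/clientServer.php"))
```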

To Reproduce
Steps to reproduce the behavior:

  1. Go to the extensions page and load the unpacked extension.
  2. Open a new tab, go to google.com and click the extension. The 'Safe or not?' button pops up but gives no output. Then go to the extensions tab and click 'Errors' on the extension.
  3. Scroll down to see the warnings and errors.
  4. See error

Expected behavior
I expected the extension to run and report whether the URL is phishing or not, from either the local browser or the real web browser.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: Windows 10 OS
  • Browser chrome
  • Version (optional) i dont know

Smartphone (please complete the following information): N/A


Additional context
Please help me solve this issue. I want to learn how phishing detection works and try it on my own computer as a learning experience. I am new to machine learning and cyber security.

Screenshot (213)
Screenshot (214)
Screenshot (215)
Screenshot (216)
Screenshot (217)
Screenshot (218)
Screenshot (219)

Showing every website safe for my own random_forest.pkl

Hi @rohitnaik246, I have created my own random_forest.pkl using train.py without any changes, but every website is still reported as safe. Why? How do I get the output Phishing for a phishing website? What changes must I make in the project? Please suggest ideas.

Getting Error in test.py

Hi @rohitnaik246,
I'm referring to your project. I've created random_forest.pkl using train.py but I am getting an error in test.py:
ValueError: Number of features of the model must match the input. Model n_features is 30 and input n_features is 22
error
I'm using Python 2.7 and Windows 7(32 bit). How to resolve it?
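
The message says the saved model was trained on 30 features per sample, while the vector passed at prediction time has only 22. A minimal sketch of a guard that makes the mismatch explicit before the model is called; `EXPECTED_FEATURES` and `check_feature_count` are hypothetical names, and the value 30 is taken from the error message above:

```python
EXPECTED_FEATURES = 30  # from the error message: "Model n_features is 30"


def check_feature_count(features, expected=EXPECTED_FEATURES):
    """Return the vector unchanged, or raise if its length is wrong."""
    if len(features) != expected:
        raise ValueError(
            "expected %d features, got %d; the feature extractor and the "
            "training data must produce the same columns"
            % (expected, len(features))
        )
    return features


# A 30-value vector passes through; a 22-value vector would raise ValueError.
vector = check_feature_count([0] * 30)
```

Calling such a check right before the prediction step would show whether the extractor or the training data is the side that differs.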

All links are classified as PHISHING

Hello!
I'm not sure what category to put this issue under but I've tried the following:

python test.py https://www.split.ac.in 
python test.py https://google.com
python3 test.py https://facebook.com

And they all returned the following result:

[Parallel(n_jobs=1)]: Done  10 out of  10 | elapsed:    0.0s finished
PHISHING

Is it supposed to work like that?

Hey, I have referred to your project and tested it on a phishing website that I got from PhishTank. It was shown as safe. The website is this: https://richafoodz45.ml/AT&T/


Getting error in train.py

Hi,
I am referring to your project, but in train.py I get the error shown below.
File "E:/Malicious-Web-Content-Detection-Using-Machine-Learning-master/train.py", line 14, in
data1 = data1[0:, : -1]

TypeError: list indices must be integers or slices, not tuple

I also tried to solve it myself, but I don't understand the purpose of this snippet:
data1 = data1[0 : -1]
for i in data1:
    labels.append(i[30])
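
The `TypeError` indicates that `data1` is a plain Python list at that point. A list accepts a single integer or slice, whereas `data1[0:, :-1]` passes a tuple of two slices, which only array types such as NumPy arrays support. A minimal sketch in plain Python of the same split, assuming each row holds 30 feature values followed by the label at index 30 (as `i[30]` in the snippet suggests); the function name and the sample rows are made up for illustration:

```python
def split_features_and_labels(rows, label_index=30):
    """Separate each row into its feature values and its label."""
    features = [row[:label_index] for row in rows]
    labels = [row[label_index] for row in rows]
    return features, labels


# Two made-up rows: 30 feature values plus one label each.
rows = [
    [1] * 30 + [1],
    [-1] * 30 + [-1],
]
features, labels = split_features_and_labels(rows)
print(len(features[0]), labels)
```

If NumPy is installed, converting the list first with `numpy.array(data1)` should make the original two-dimensional slice valid as well.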

RE: Opening project in Anaconda, Spyder, Jupyter Notebook

Hello there,

I am fairly new to Python, and I was wondering how to open this project in Anaconda Spyder, Jupyter Notebook (both are installed locally on my system) or PyCharm.

Also, as this project was written for Python 2.x, I assume it won't work on Python 3.x?

Thanks!

How do you check SSLfinal_State? I can't run this section, and the output is not correct for this feature.


issues in the chrome extension

Have you read Troubleshooting.md? If No, please do so before filing an issue.
Yes

Have you tried Googling about it?
Yes

Which python version are you using to run the project? In the terminal, type which <python-path-you-have-in-clientServer.php> and enter the output here

$decision=exec("C:/Python27 test.py $site 2>&1 ");

Python version - python 2.7.13
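
In the `exec` line above, `C:/Python27` looks like the installation folder and not the interpreter itself. On a default Windows install the executable would typically be `C:/Python27/python.exe`, though that is an assumption about this machine. A minimal sketch of checking that a configured path points at an executable file and not at a folder; `looks_like_interpreter` is a hypothetical helper:

```python
import os
import sys


def looks_like_interpreter(path):
    """Return True if the path is an existing, executable file."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


# The running interpreter is a file; the folder that contains it is not.
print(looks_like_interpreter(sys.executable))
print(looks_like_interpreter(os.path.dirname(sys.executable)))
```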

Describe the question
error 1
"Synchronous XMLHttpRequest on the main thread is deprecated because of its detrimental effects to the end user's experience" is the error shown in the chrome extension settings

er1

er2

The line below is line 14 of extension/popup.js:
xhr.open("POST","http://localhost/Malicious-Web-Content-Detection-Using-Machine-Learning/clientServer.php",false);

error 2
in the extension itself below safe or not it shows error as follows
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <head> <title>Object not found!</title> <link rev="made" href="mailto:postmaster@localhost" /> <style type="text/css"><!--/*--><![CDATA[/*><!--*/ body { color: #000000; background-color: #FFFFFF; } a:link { color: #0000CC; } p, address {margin-left: 3em;} span {font-size: smaller;} /*]]>*/--></style> </head> <body> <h1>Object not found!</h1> <p> The requested URL was not found on this server. If you entered the URL manually please check your spelling and try again. </p> <p> If you think this is a server error, please contact the <a href="mailto:postmaster@localhost">webmaster</a>. </p> <h2>Error 404</h2> <address> <a href="/">localhost</a><br /> <span>Apache/2.4.34 (Win32) OpenSSL/1.0.2o PHP/5.6.38</span> </address> </body> </html>
er3

er4

Additional context
Thank you for the amazing work, sir! Please help me solve this problem.

Errors while running project on windows 10, using python 2.7.16

Have you read Troubleshooting.md? If No, please do so before filing an issue.
Yes

Have you tried Googling about it?
Yes

Which python version are you using to run the project? In the terminal, type which <python-path-you-have-in-clientServer.php> and enter the output here
Python version - 2.7.16

Describe the question
A clear and concise description of your question.

Screenshots
data_validation

This is the error I'm getting when I run data_validation.py in PyCharm. Do I need to change the directory to match where the project is placed in the folder that I pulled from your GitHub?

test

The second error is in test.py: it states that the list index is out of range, and I'm not sure what the problem is since I haven't changed anything in your project.

train1

The third error is in train.py, where it states "too many indices for array". Once again, I have not changed anything in your project; I just tried to run it in PyCharm.

chrome error

Lastly, when I hosted it on XAMPP and tried to run it from the Chrome extension, I kept getting this error (failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden in D:\XAMPP\htdocs\Malicious-Web-Content-Detection-Using-Machine-Learning\clientServer.php on line 4
operable program or batch file)

Please, any advice on this will be greatly appreciated. Thanks!


Where is the source of your data set? Can you give me the source of URL for the dataset?


ValueError: invalid \x escape

I was trying to run your project, but I got an error in features_extraction.py: ValueError: invalid \x escape. I searched a lot but wasn't able to identify the issue.
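
In Python 2, this error is raised when a normal string literal contains \x that is not followed by two hexadecimal digits. That often happens with Windows paths in which a folder name starting with x follows a backslash. Whether that is the cause here is an assumption, since the issue does not show the offending line. A minimal sketch of two ways to write such a path safely; the path itself is hypothetical:

```python
# Raw string: backslashes are kept literally, so "\x" is not an escape.
raw_path = r"C:\xampp\htdocs\project\markup.txt"

# Forward slashes avoid backslash escapes entirely.
slash_path = "C:/xampp/htdocs/project/markup.txt"

# Both spellings describe the same location.
print(raw_path.replace("\\", "/") == slash_path)
```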

Chrome extension problem

I did everything, but when I click 'Safe or not?' it doesn't show whether the page is safe. Could you please give the full steps from start to finish for a normal PC, not a Mac?

from googlesearch import search ImportError: cannot import name 'search'


Python version error

Your project is developed in Python 2.x; switching to Python 3.x generates multiple errors.
I think the README should mention that this project was developed using Python 2.x.
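
Since the setup steps in the README refer to a Python 2.x installation, a script could report the mismatch up front instead of failing later with unrelated errors. A minimal sketch of such a check; `version_message` is a hypothetical helper and not part of the project:

```python
import sys


def version_message(version_info=sys.version_info):
    """Describe whether the interpreter matches the project's Python 2.x target."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return "Python %d.%d: matches the version the project targets" % (major, minor)
    return "Python %d.%d: the project targets Python 2.x, expect errors" % (major, minor)


print(version_message())
```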

Getting error in popup.js

I'm referring to your project. I added your extension to Google Chrome, and when I want to verify whether a link is a safe site or not,
I get this error:
issue
Capture du 2019-04-03 22-27-40
I'm using Python 2.7 and Ubuntu. How to resolve it?
