shd-identification

Identification of Smart Home Devices

Random Forest Classifier: Classification results for three Smart Home Devices

Abstract

Smart Devices have increasingly found their way into private homes and, while they increase convenience, they also introduce many security risks. To mitigate these risks, it is important to identify the devices that communicate with external services outside the home and to monitor their behavior. Since identification is the first step in a successful defense against attacks on the smart home environment, this work builds a base for further research looking to solve security challenges in this domain. This work compares three different Machine Learning techniques, the Random Forest, the k-Nearest-Neighbor and the Support Vector Machine, on their ability to identify Smart Home Devices in a data set of captured network traffic. It provides a recommendation for the most suitable algorithm, the Random Forest, with a robust feature set as well as a software implementation thereof. The Random Forest trained on a small feature set of only four features (packet length, inter-arrival time, average burst size, average burst length) performs well with an F1-score of around 92.8 % and shows that the identification of Smart Home Devices can be accomplished with reasonable confidence in a short inference time span of around 119 ms.
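
To make the four features named above more concrete, the following is a minimal sketch of how they could be derived from a sequence of captured packets. It is not taken from the repository: the function name `extract_features`, the threshold `BURST_GAP_S`, and the burst definition (packets separated by less than a fixed gap belong to one burst, burst size counted in bytes, burst length counted in packets) are all assumptions made for illustration. The thesis may define bursts differently.

```python
from statistics import mean

# Hypothetical gap threshold (seconds) that separates two bursts.
# This is an assumption, not a value taken from the thesis.
BURST_GAP_S = 1.0


def extract_features(packets, burst_gap_s=BURST_GAP_S):
    """Compute four summary features from (timestamp_s, length_bytes) pairs.

    Packets are expected in capture order.
    """
    if not packets:
        raise ValueError("at least one packet is required")

    timestamps = [ts for ts, _ in packets]
    lengths = [length for _, length in packets]

    # Inter-arrival time: gap between each pair of consecutive packets.
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

    # Group packets into bursts: a new burst starts when the gap
    # to the previous packet exceeds the threshold.
    bursts = [[lengths[0]]]
    for gap, length in zip(gaps, lengths[1:]):
        if gap > burst_gap_s:
            bursts.append([length])
        else:
            bursts[-1].append(length)

    return {
        "mean_packet_length": mean(lengths),
        "mean_inter_arrival_time": mean(gaps) if gaps else 0.0,
        "avg_burst_size": mean(sum(b) for b in bursts),    # bytes per burst (assumed)
        "avg_burst_length": mean(len(b) for b in bursts),  # packets per burst (assumed)
    }


# Four packets forming two bursts of two packets each.
packets = [(0.0, 100), (0.5, 200), (3.0, 300), (3.5, 400)]
features = extract_features(packets)
print(features)
```

With a larger gap threshold, neighbouring bursts merge, so the two burst features are sensitive to this parameter and it would need to be fixed identically for training and inference.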

Usage

The following quick-start example shows the usage of the software. These prerequisites have to be met:

  • a Python environment with the packages specified in environment.yml
  • an ML model serialized with joblib (e.g. rf_classifier.joblib) that was trained on the feature set specified in training.ipynb
  • a .pcap file containing the network packets to classify

If the prerequisites are not met, or the user wishes to use the identification tool with more fine-grained control, they must first provide a suitable dataset. The notebook training.ipynb can then be used as a template for preparing the data and for training and serializing the classifier. Within the notebook, the provided packet capture files can be converted to the pandas.DataFrames needed for model training, and these DataFrames can be serialized into Python's .pkl files. This functionality is provided by convert_pcap_to_df.py. With the data available as serialized data frames, model training can be started by executing the classifier portion of the training.ipynb notebook. The resulting RF model is then saved as another serialized Python object.
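
The training and serialization step described above can be sketched roughly as follows. This is an illustration only, not the content of training.ipynb: it uses a synthetic feature table from scikit-learn in place of the converted packet captures, the hyperparameters are arbitrary, and the output path is a temporary directory chosen for the example. Only the file name rf_classifier.joblib is taken from the prerequisites above.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real feature table:
# 4 features and 3 device classes (both numbers are assumptions).
X, y = make_classification(
    n_samples=600,
    n_features=4,
    n_informative=4,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

# Hold out a stratified test split to estimate the F1-score.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Train the Random Forest; hyperparameters here are illustrative.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Macro-averaged F1 weights every device class equally.
macro_f1 = f1_score(y_test, clf.predict(X_test), average="macro")
print(f"macro F1 on synthetic data: {macro_f1:.3f}")

# Serialize the trained model with joblib and load it back.
model_path = os.path.join(tempfile.mkdtemp(), "rf_classifier.joblib")
joblib.dump(clf, model_path)
restored = joblib.load(model_path)
```

The score printed here reflects the synthetic data only and says nothing about the results reported in the abstract.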

After this, identify_device.py can be called to perform the classification and return the identified devices.
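
How identify_device.py turns individual predictions into a list of identified devices is not described here. One plausible approach, shown purely as an assumption, is a majority vote over all predictions that share a source address. The function `identify_devices`, the example addresses, and the device labels below are hypothetical.

```python
from collections import Counter, defaultdict


def identify_devices(addresses, predicted_labels):
    """Return one label per source address by majority vote over its predictions."""
    votes = defaultdict(Counter)
    for address, label in zip(addresses, predicted_labels):
        votes[address][label] += 1
    # Pick the most frequent label for each address.
    return {
        address: counts.most_common(1)[0][0]
        for address, counts in votes.items()
    }


# Hypothetical per-packet predictions for two source addresses.
addresses = ["aa:aa", "aa:aa", "aa:aa", "bb:bb", "bb:bb"]
labels = ["camera", "camera", "plug", "plug", "plug"]
result = identify_devices(addresses, labels)
print(result)
```

Voting over many packets smooths out occasional misclassifications of single packets, at the cost of needing enough traffic per device before a decision is made.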

Author

Tobias Becher

Acknowledgments

This repository accompanies a thesis paper written at the University of Hagen.

License

Copyright © 2021 Tobias Becher.
This project is MIT licensed.
