DataScienceCompetiton(Stemfellowsip)

This project was created for the data science competition held by Stemfellowship in Canada. It uses 699 samples of benign and malignant breast tumors from Kaggle. Each sample has nine features: mitosis, nucleoli, nuclei, cell size, cell shape, clump size, adhesion, single epithelial cell size, and chromatin.

This project is composed of four parts: data collection, data cleaning, visualization of the variables, and creating five machine learning models.

Data Cleaning: After checking for null values, the missing values were replaced with the mean of their column, since the dataset is small and the features are not categorical. The two most important and two least important features (independent variables) were then identified.
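The cleaning step can be sketched as follows. This is a minimal illustration, not the project's actual code: the column names and the tiny inline table are hypothetical, and ranking features by absolute correlation with the label is an assumed importance measure, since the README does not say which method was used.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset: column names and values are illustrative only.
df = pd.DataFrame({
    "clump_size": [5, 3, 8, 1, 10, 2],
    "nuclei":     [1, np.nan, 10, 1, 8, np.nan],
    "mitosis":    [1, 1, 2, 1, 1, 3],
    "adhesion":   [1, 1, 6, 2, 4, 1],
    "malignant":  [0, 0, 1, 0, 1, 0],
})

features = df.columns.drop("malignant")

# Count missing values per feature before imputing.
print(df[features].isna().sum())

# Replace each missing value with the mean of its own column.
df[features] = df[features].fillna(df[features].mean())

# Assumed importance measure: absolute Pearson correlation with the label.
importance = df[features].corrwith(df["malignant"]).abs().sort_values(ascending=False)
most_important = importance.index[:2].tolist()
least_important = importance.index[-2:].tolist()
print("most important:", most_important)
print("least important:", least_important)
```

Mean imputation keeps every row, which matters for a dataset of only 699 samples, at the cost of slightly shrinking the variance of the imputed column.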

Data Visualization: The relationships between the variables were explored by creating violin plots, a heatmap, and a pair plot.

image

This figure is taken from the paper written by my team (BioinfoScience) and me, in which we explored the relationship between the two most important and two least important features in samples with and without breast cancer.

image

The figure above is taken from the research paper that my team and I wrote for this data science competition.

Machine learning model: Five types of supervised machine learning models were constructed: deep learning with five layers, decision tree, support vector machine, naive Bayes, and logistic regression.

Feature Engineering: Feature engineering was conducted only for the deep learning model, where binary one-hot encoding was used to prepare the data fed to the model.
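A rough sketch of the five model families using scikit-learn is given below. Several things here are assumptions rather than the project's actual setup: `MLPClassifier` is swapped in as a stand-in for the deep learning model (the original framework is not named), "five layers" is read as input, three hidden layers, and output, every hyperparameter is a placeholder, and `one_hot` is a hypothetical helper that applies the encoding to class labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-ins for the five model families; all hyperparameters are assumptions.
models = {
    "deep_learning": MLPClassifier(hidden_layer_sizes=(16, 16, 16),
                                   max_iter=1000, random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(probability=True, random_state=0),
    "naive_bayes": GaussianNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

def one_hot(labels, n_classes=2):
    """Encode integer class labels as one-hot rows, e.g. 1 -> [0, 1]."""
    return np.eye(n_classes, dtype=int)[np.asarray(labels)]

# Benign (0) and malignant (1) labels become two-column binary vectors.
print(one_hot([0, 1, 1]))
```

Keeping the models in a dictionary makes it easy to loop over them with the same training and evaluation code.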

Training and Validation: k-fold cross-validation was run with k = 5, 6, 8, and 10.
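The cross-validation loop over the four values of k might look like the sketch below. It assumes scikit-learn, uses synthetic data with the same shape as the dataset (699 samples, 9 features) instead of the real tumor data, and uses stratified folds and logistic regression as illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in with the same shape as the dataset (699 samples, 9 features).
X, y = make_classification(n_samples=699, n_features=9, random_state=0)

model = LogisticRegression(max_iter=1000)

mean_auc = {}
for k in (5, 6, 8, 10):
    # Stratified folds keep the benign/malignant ratio similar in every fold.
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    mean_auc[k] = scores.mean()
    print(f"k={k}: mean AUC = {scores.mean():.3f} over {len(scores)} folds")
```

Larger k trains each fold on more data but gives smaller validation sets, so comparing several values shows how stable the scores are.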

Evaluation: ROC curves were plotted, and the AUC score, sensitivity, specificity, and accuracy were calculated.
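As a small worked example of these metrics, the sketch below computes them with scikit-learn for eight made-up predictions. The labels, scores, and the 0.5 decision threshold are illustrative assumptions, not results from the project.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

# Made-up ground truth (1 = malignant) and predicted probabilities.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.9])
y_pred = (y_score >= 0.5).astype(int)  # assumed decision threshold of 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                # true positive rate
specificity = tn / (tn + fp)                # true negative rate
accuracy = (tp + tn) / (tp + tn + fp + fn)

# AUC is threshold-free: it is computed from the scores, not from y_pred.
auc = roc_auc_score(y_true, y_score)

# Points of the ROC curve, ready to be plotted as tpr against fpr.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} "
      f"accuracy={accuracy:.2f} auc={auc:.4f}")
# prints: sensitivity=0.75 specificity=0.75 accuracy=0.75 auc=0.6875
```

Here 11 of the 16 malignant/benign score pairs are ranked correctly, which gives the AUC of 11/16 = 0.6875.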

In total, ten models were created, trained, validated, and evaluated: five using all features and five using only the two least important features.

image

This figure is taken from the research paper written by the BioinfoScience team for the data science competition held by Stemfellowship.

Deep learning and SVM with all features achieved the highest AUC scores, while logistic regression with the two least important features achieved the lowest.

Contributors

ayeshaskp
