cancer-and-nonbiological-variables's Introduction

Row-2-Group-Project

Data Analytics Group Project

Overview:

This group project used nonbiological data to determine whether cancer incidence rates and cancer mortality rates differ with nonbiological factors.

The data we explored to make the above determination were:

  • Air quality data
  • Employment data by sector
  • Medical insurance rate data
  • Household income data
  • Lifestyle data

How to Run Code:

1). Clone the GitHub repository into a folder on your local machine

2). Open Jupyter Lab (you may need to install Anaconda in order to do this)

3). Navigate to Row-2-Group-Project/Final Result/Analysis_cancer.ipynb

4). Run all cells by clicking on Run > Run All Cells
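
As an alternative to clicking through the menu, the notebook can probably also be executed from a script. The sketch below is not part of the project; it assumes Jupyter and nbconvert are installed and that it is run from the folder that contains the clone. The helper names `build_execute_command` and `run_notebook` are made up for illustration.

```python
import subprocess
from pathlib import Path

# Notebook path taken from the steps above, relative to the folder holding the clone.
NOTEBOOK = Path("Row-2-Group-Project") / "Final Result" / "Analysis_cancer.ipynb"


def build_execute_command(notebook: Path) -> list:
    """Return the nbconvert command that runs every cell and saves the result in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        str(notebook),
    ]


def run_notebook(notebook: Path = NOTEBOOK) -> None:
    """Execute the notebook; raises CalledProcessError if the command fails."""
    subprocess.run(build_execute_command(notebook), check=True)
```

Calling `run_notebook()` would then have roughly the same effect as Run > Run All Cells, with the outputs written back into the notebook file.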

Data Analysis:

  • Cancer mortality and cancer incidence rates were joined with nonbiological data.

  • Pandas and Matplotlib were used to clean, manipulate, and join all datasets in order to create scatter plots and calculate r squared values.
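
The join and r squared steps described above might look roughly like the following sketch. It is not the project's code: the column names (`state`, `incidence_rate`, `pm25`) and all values are placeholders invented for illustration, and joining on a state column is an assumption.

```python
import pandas as pd

# Placeholder values for illustration only; these are not the project's data.
cancer = pd.DataFrame({
    "state": ["A", "B", "C", "D"],
    "incidence_rate": [400.0, 450.0, 500.0, 550.0],
})
air = pd.DataFrame({
    "state": ["A", "B", "C", "D", "E"],
    "pm25": [6.0, 8.0, 7.0, 10.0, 9.0],
})

# An inner join keeps only the states that appear in both tables.
merged = cancer.merge(air, on="state", how="inner")

# Pearson correlation coefficient, squared to give r squared.
r = merged["incidence_rate"].corr(merged["pm25"])
r_squared = r ** 2

print(f"rows joined: {len(merged)}, r squared: {r_squared:.3f}")
```

A scatter plot of the same two columns could then be drawn with Matplotlib, for example with `merged.plot.scatter(x="pm25", y="incidence_rate")`.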

Screenshots:

ScreenShot

This chart demonstrates the weight that various lifestyle factors can have on cancer incidence rate. We used this chart type because we wanted the user to be able to view all lifestyle factors we looked into and their corresponding weight on cancer incidence rate in a clear, quick way.

ScreenShot

This chart demonstrates that there is little correlation between household income and cancer death rate. A scatter plot was selected because it is one of the best chart types for visualizing a correlation.

ScreenShot

These graphs demonstrate the correlations between various air pollutants and cancer incidence and cancer death rates. One set of images shows the relationship with PM2.5 across all states in the United States, and the other looks only at select states and covers SO2, NO2, and PM2.5. A line chart was used to see whether cancer incidence rate increased over time with increased exposure to air pollutants, and scatter plots were used to show the correlation for each individual air pollutant.

Findings:

  • There is a correlation between the percentage of manufacturing jobs and cancer incidence rates; chemical manufacturing shows the highest r squared value.

  • There is a correlation between cancer incidence rate and concentration of PM2.5.

cancer-and-nonbiological-variables's People

Contributors

anjurupesh, settinge, davidtaboh, jmglia, bcliffwm, anishaa95
