The data-sets from mandalravi

Here you can find all data sets that are used in examples at Pythonfordatascience.org.

These data sets are open to the public and can be downloaded and used by anyone. The source of each data set is included in this README file.

To download all files, click the "Clone or download" drop-down arrow and select "Download ZIP"; this downloads every data set used. Alternatively, click the file you are interested in and then click the "Raw" button, which opens the file in the browser. The URL of that raw page can be passed to pandas.read_csv(), which will import the data set directly.
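As a minimal sketch of the "Raw" URL approach, the snippet below builds a raw-file URL and shows where pandas.read_csv() would be called. The helper name raw_csv_url, the branch name "master", and the user/repository values are assumptions for illustration; copy the actual URL from the "Raw" page if it differs.

```python
from urllib.parse import quote

import pandas as pd


def raw_csv_url(user, repo, branch, filename):
    """Build a raw.githubusercontent.com URL (hypothetical helper).

    File names containing spaces, such as "Energy Level.csv", are
    percent-encoded so the URL is valid.
    """
    return f"https://raw.githubusercontent.com/{user}/{repo}/{branch}/{quote(filename)}"


# Assumed values: adjust user, repo, and branch to match the "Raw" page URL.
url = raw_csv_url("mandalravi", "data-sets", "master", "Energy Level.csv")
print(url)

# With network access, the data set can then be loaded directly:
# df = pd.read_csv(url)
```

The read_csv call is left commented out so the snippet runs offline; uncomment it once the URL has been confirmed in a browser.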

Data sets (in no particular order)
The Energy Level.csv data set is a simulated data set created for an independent t-test example. It compares two groups, Group A and Group B, on some outcome measure. The values are integers from 0 to 9 (np.random.randint(10, ...) excludes the upper bound) and can represent anything that fits within that scale. It was created using the following Python code:

import numpy as np
import pandas as pd

np.random.seed(12345678)

df = pd.DataFrame(np.random.randint(10, size=(100, 2)), columns=['Group A', 'Group B'])

df.to_csv("Energy Level.csv", index=False)
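One way the simulated data might be used in an independent t-test is sketched below. It regenerates the same frame in memory rather than reading the CSV, and it uses scipy.stats.ttest_ind, which is an assumption about tooling; the site's own examples may use a different library or options.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Recreate the simulated data in memory with the same seed as the code above.
np.random.seed(12345678)
df = pd.DataFrame(np.random.randint(10, size=(100, 2)), columns=["Group A", "Group B"])

# Sanity-check the value range before testing.
lowest = int(df.min().min())
highest = int(df.max().max())

# Independent two-sample t-test (equal variances assumed, the SciPy default).
result = stats.ttest_ind(df["Group A"], df["Group B"])
print(f"range: {lowest}-{highest}, t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```

Passing equal_var=False to ttest_ind would run Welch's t-test instead, which may be preferable when the group variances differ.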

The automotive_data.csv file was downloaded from Kaggle.com from the user Ramakrishnan Srinivasan; the link to the full page is here: https://www.kaggle.com/toramky/automobile-dataset

The responses.csv file was downloaded from Kaggle.com from the user Miroslav Sabo; the link to the full page is here: https://www.kaggle.com/miroslavsabo/young-people-survey. The "Participant Number" column is not part of the original data set; it was added to support the merging examples.

The responses_state.csv file is a simulated file (not real data) to be paired with the responses.csv data in the merging examples.
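The kind of merge these two files are meant to demonstrate can be sketched with small stand-in frames. Only the "Participant Number" column name comes from this README; the "Score" and "State" columns and every value below are hypothetical, not taken from the actual files.

```python
import pandas as pd

# Tiny stand-ins for responses.csv and responses_state.csv.
# "Score", "State", and all values are hypothetical.
responses = pd.DataFrame({"Participant Number": [1, 2, 3], "Score": [5, 4, 3]})
states = pd.DataFrame({"Participant Number": [1, 2, 4], "State": ["NY", "CA", "TX"]})

# A left merge keeps every row of `responses`; participants with no match
# in `states` get a missing value (NaN) in the "State" column.
merged = responses.merge(states, on="Participant Number", how="left")
print(merged)
```

A left merge is used here so that no survey response is dropped; how="inner" would instead keep only participants present in both frames.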

The admission.csv file is from the logistic regression example created by UCLA for their walkthrough of how to conduct logistic regression using Stata. The original data link is here: https://stats.idre.ucla.edu/stat/stata/dae/binary.dta

blood_pressure.csv is an example data set that is included in Stata. This file was exported from within Stata to be used within Python.

difficile.csv is a made-up data set that was created to be used in an example.

fairpoor.csv is a made-up data set that was created to be used in an example.
