data-sets's Introduction

Here you can find all data sets that are used in examples at Pythonfordatascience.org.

These data sets are open to the public and can be downloaded and used by anyone. The source of each data set is included in this README file.

To download all files, click the "Clone or download" drop-down arrow and select "Download ZIP". This downloads every data set. Alternatively, click the file you are interested in and then click the "Raw" button, which opens the file in the browser. The URL of that page can be passed to pandas.read_csv() to import the data set.
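As a rough sketch of the "Raw" URL approach: the URL below is a placeholder, not a real link from this repository, and load_dataset is a hypothetical helper name. An in-memory CSV stands in for the download so the example runs offline.

```python
import io

import pandas as pd

# Hypothetical placeholder: replace with the address shown in the browser
# after clicking the "Raw" button for the file you want.
RAW_URL = "https://example.com/path/to/raw/file.csv"


def load_dataset(source):
    """Read a CSV into a DataFrame.

    `source` may be a URL, a local path, or a file-like object,
    all of which pandas.read_csv accepts.
    """
    return pd.read_csv(source)


# Offline demonstration: a tiny made-up CSV held in memory.
sample = io.StringIO("Group A,Group B\n3,7\n5,2\n")
df = load_dataset(sample)
print(df.head())

# With a real raw URL the call has the same shape:
# df = load_dataset(RAW_URL)
```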

Data sets (in no particular order)
The Energy Level.csv data set is a simulated data set created for use in an independent t-test example; it compares two groups, Group A and Group B, on some outcome measure. The values are integers from 0 to 9 (the range produced by np.random.randint(10, ...)) and can represent anything that fits within that scale. It was created using the following Python code:

import numpy as np
import pandas as pd

np.random.seed(12345678)

df = pd.DataFrame(np.random.randint(10, size=(100, 2)), columns=['Group A', 'Group B'])

df.to_csv("Energy Level.csv", index=False)
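For illustration, here is one way the two groups might be compared with an independent t-test. This is a sketch only: it assumes SciPy is installed (SciPy is not mentioned in this README) and rebuilds the frame in memory with the same seed instead of reading the CSV from disk.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Rebuild the simulated frame in memory, using the same seed and shape
# as the creation code above.
np.random.seed(12345678)
df = pd.DataFrame(
    np.random.randint(10, size=(100, 2)),
    columns=["Group A", "Group B"],
)

# Student's independent two-sample t-test; equal variances are assumed
# by default (pass equal_var=False for Welch's t-test instead).
result = stats.ttest_ind(df["Group A"], df["Group B"])
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```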

The automotive_data.csv file was downloaded from Kaggle.com from the user Ramakrishnan Srinivasan; the link to the full page is here: https://www.kaggle.com/toramky/automobile-dataset

The responses.csv file was downloaded from Kaggle.com from the user Miroslav Sabo; the link to the full page is here: https://www.kaggle.com/miroslavsabo/young-people-survey. The "Participant Number" column is not part of the original data set; it was added to support examples of how to merge data sets.

The responses_state.csv file is a simulated file (not real data) to be paired with the responses.csv data in the merging examples.
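A minimal sketch of the kind of merge these two files are meant for. Only the "Participant Number" key comes from this README; the other column names ("Music", "State") and all values are made up for illustration and may not match the actual files.

```python
import pandas as pd

# Tiny stand-ins for responses.csv and responses_state.csv.
# Columns other than "Participant Number" are hypothetical.
responses = pd.DataFrame(
    {"Participant Number": [1, 2, 3], "Music": [5, 4, 5]}
)
responses_state = pd.DataFrame(
    {"Participant Number": [1, 2, 4], "State": ["NY", "CA", "TX"]}
)

# An inner merge keeps only participants present in both frames.
merged = pd.merge(
    responses, responses_state, on="Participant Number", how="inner"
)
print(merged)
```

Switching how to "left", "right", or "outer" keeps unmatched rows from one or both frames, with missing values filled in as NaN.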

The admission.csv file is from the logistic regression example created by UCLA for their walkthrough of how to conduct logistic regression using Stata. The original data link is here: https://stats.idre.ucla.edu/stat/stata/dae/binary.dta

The blood_pressure.csv file is an example data set that is included in Stata. It was exported from within Stata to be used within Python.

The difficile.csv file is a made-up data set that was created to be used in an example.

The fairpoor.csv file is a made-up data set that was created to be used in an example.
