
elementsofdatascience's Introduction

Elements of Data Science is an introduction to data science in Python for people with no programming experience. My goal is to present a small, powerful subset of Python that allows you to do real work in data science as quickly as possible.

At the same time, I want to make sure the material is presented clearly. I don't assume that the reader knows anything about programming, statistics, or data science. When I use a term, I try to define it immediately, and when I use a programming feature, I try to explain it.

There are a few places where I use a programming feature before it is fully explained, but I keep them to a minimum, and I'll point out which details you can safely ignore for now.

This "book" is in the form of Jupyter notebooks. Jupyter is a software development tool you can run in a web browser, so you don't have to install any software. A Jupyter notebook is a document that contains text, Python code, and results. So you can read it like a book, but you can also modify the code, run it, develop new programs, and test them.

The notebooks contain exercises where you can practice what you learn. Most of the exercises are meant to be quick, but a few are more substantial.

This material is a work in progress, so suggestions are welcome. The best way to provide feedback is to create an issue in this GitHub repository.

The notebooks

For each of the notebooks below, you have two options. If you view the notebook on NBViewer, you can read it, but you can't run the code. If you run the notebook on Colab, you can run the code, do the exercises, and save your modified version of the notebook to your Google Drive (if you have one).

Notebook 1

Variables and values: The first notebook explains how to use Jupyter and introduces the most basic programming features in Python, variables and values.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer

Notebook 2

Times and places: This notebook shows how to represent times, dates, and locations in Python, and uses the GeoPandas library to plot points on a map.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer

Notebook 3

Lists and Arrays: This notebook presents lists and NumPy arrays. It discusses absolute, relative, and percent errors, and ways to summarize them.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer
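
To give a sense of the error measures Notebook 3 discusses, here is a minimal sketch. It uses plain Python lists rather than NumPy arrays so that it runs without extra packages, and the numbers in `actual` and `predicted` are made up for illustration; they do not come from the notebook.

```python
# Hypothetical data: made-up true values and estimates of them.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

# Absolute error: the size of the difference, ignoring its sign.
abs_errors = [abs(p - a) for a, p in zip(actual, predicted)]

# Relative error: absolute error as a fraction of the actual value.
rel_errors = [e / a for e, a in zip(abs_errors, actual)]

# Percent error: relative error expressed as a percentage.
pct_errors = [100 * r for r in rel_errors]

# Two common summaries: mean absolute error and mean absolute percent error.
mae = sum(abs_errors) / len(abs_errors)
mape = sum(pct_errors) / len(pct_errors)

print("absolute errors:", abs_errors)
print("mean absolute error:", mae)
print("mean absolute percent error:", mape)
```

The percent error is often easier to interpret than the absolute error because it does not depend on the scale of the quantities being compared.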

Notebook 4

Loops and Files: This notebook presents the for loop and the if statement; then it uses them to speed-read War and Peace and count the words.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer
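
The combination of a for loop and an if statement that Notebook 4 describes can be sketched as follows. This is only an illustration: instead of opening a real file, it uses `io.StringIO` as a stand-in for an open text file, and the text itself is made up.

```python
import io

# Stand-in for an open text file (an assumption for this sketch);
# with a real file you would loop over the object returned by open().
fake_file = io.StringIO("the quick brown fox\n\njumps over the lazy dog\n")

line_count = 0
word_count = 0

for line in fake_file:
    words = line.split()       # split the line on whitespace
    if len(words) > 0:         # ignore blank lines
        line_count += 1
        word_count += len(words)

print(line_count, "non-blank lines,", word_count, "words")
```

Reading one line at a time means the whole text never has to fit in memory at once, which matters for a long book.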

Notebook 5

Dictionaries: This notebook presents one of the most powerful features of Python, dictionaries, and uses them to count the unique words in War and Peace.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer
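
As a rough sketch of the idea behind Notebook 5, a dictionary can map each word to the number of times it appears. The sentence below is made up for illustration, and the variable names are hypothetical rather than taken from the notebook.

```python
# Made-up text; the notebook works with a full book instead.
text = "the cat and the dog and the bird"

counts = {}                    # maps each word to how often it appears
for word in text.split():
    if word in counts:
        counts[word] += 1      # seen before: increment its count
    else:
        counts[word] = 1       # first appearance: start at 1

unique_words = len(counts)     # one key per distinct word

print(counts)
print("unique words:", unique_words)
```

Looking up a key in a dictionary takes about the same time no matter how many keys it holds, which is why this approach stays fast even for a long text.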

Notebook 6

Plotting: This notebook introduces Matplotlib, a plotting library for Python, and uses it to generate a few common data visualizations and one less common one, a Zipf plot.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer

Notebook 7

DataFrames: This notebook presents DataFrames, which are used to represent tables of data. It uses data from the National Survey of Family Growth to find the average weight of babies in the U.S.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer

Notebook 8

Distributions: This notebook explains what a distribution is and presents three ways to represent one: as a probability mass function (PMF), a cumulative distribution function (CDF), or a probability density function (PDF). It also shows how to compare a distribution to another distribution or to a mathematical model.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer
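
To make two of the representations from Notebook 8 concrete, here is a minimal sketch that builds a PMF and a CDF for a small sample using only the standard library. The sample is made up, and the notebook may use different tools to do the same job.

```python
from collections import Counter

# Made-up sample of discrete values.
sample = [1, 2, 2, 3, 3, 3]
n = len(sample)

# PMF: maps each value to the fraction of the sample equal to that value.
pmf = {value: count / n for value, count in sorted(Counter(sample).items())}

# CDF: maps each value to the fraction of the sample less than or equal to it,
# computed as a running total of the PMF in increasing order of value.
cdf = {}
total = 0.0
for value, prob in pmf.items():
    total += prob
    cdf[value] = total

print("PMF:", pmf)
print("CDF:", cdf)
```

The PMF probabilities add up to 1, so the CDF ends at 1 (up to floating-point rounding) for the largest value in the sample.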

Notebook 9

Relationships: This notebook explores relationships between variables using scatter plots, violin plots, and box plots. It quantifies the strength of a relationship using the correlation coefficient and uses simple regression to estimate the slope of a line.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer
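
The two quantities Notebook 9 mentions, the correlation coefficient and the slope from simple regression, can be computed by hand for a small dataset. The sketch below uses made-up values and plain Python; in practice you would likely use a library function instead.

```python
from math import sqrt

# Made-up paired measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sums of squared deviations and of cross-products.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Pearson correlation coefficient: strength of the linear relationship.
r = sxy / sqrt(sxx * syy)

# Least-squares line: y is approximately intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print("correlation:", r)
print("slope:", slope, "intercept:", intercept)
```

Correlation and slope answer different questions: correlation measures how closely the points follow a line, while the slope measures how much `y` changes, on average, per unit change in `x`.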

Notebook 10

Regression: This notebook presents multiple regression and uses it to explore the relationship between age, education, and income. It uses visualization to interpret multivariate models. It also presents binary variables and logistic regression.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer

Notebook 11

Inference: This notebook presents computational inference, a process for computing p-values, standard errors, and confidence intervals using randomization methods rather than analysis.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer
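
One example of the randomization methods Notebook 11 refers to is a permutation test, sketched below under stated assumptions. The two groups are made-up data, the seed and the number of iterations are arbitrary choices, and the notebook may set up its own examples differently.

```python
import random

# Made-up measurements from two groups.
group_a = [12.1, 11.8, 12.6, 12.9, 12.4]
group_b = [11.2, 11.5, 11.9, 11.1, 11.7]


def mean(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Test statistic: observed difference in group means.
observed = mean(group_a) - mean(group_b)

# Simulate the null hypothesis (no difference between groups) by
# shuffling the pooled data and splitting it into two new groups.
rng = random.Random(17)        # fixed seed so the run is repeatable
pooled = group_a + group_b     # new list; the originals are not modified
n_a = len(group_a)
iterations = 1000

count = 0
for _ in range(iterations):
    rng.shuffle(pooled)
    diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
    if abs(diff) >= abs(observed):   # two-sided comparison
        count += 1

# Estimated p-value: fraction of shuffles at least as extreme as observed.
p_value = count / iterations

print("observed difference:", observed)
print("estimated p-value:", p_value)
```

Because the p-value comes from simulation, it varies slightly with the seed and the number of iterations; more iterations give a more stable estimate.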

elementsofdatascience's People

Contributors

allendowney
