created & maintained by @clarecorthell, founding partner of Luminant Data Science Consulting

Contents

  • [The Open-Source Data Science Masters](#the-open-source-data-science-masters)
  • [Contents](#contents)
  • [The Internet is Your Oyster](#the-internet-is-your-oyster)
  • [The Motivation](#the-motivation)
  • [An Academic Shortfall](#an-academic-shortfall)
  • [Ready?](#ready)
  • [The Open Source Data Science Curriculum](#the-open-source-data-science-curriculum)
  • [A Note About Direction](#a-note-about-direction)
  • [Math](#math)
  • [Computing](#computing)
  • [Data Analysis](#data-analysis)
  • [Data Communication and Design](#data-communication-and-design)
  • [Python (Learning)](#python-learning)
  • [Python (Libraries)](#python-libraries)
  • [Datasets are now here](#datasets-are-now-here)
  • [R resources are now here](#r-resources-are-now-here)
  • [Data Science as a Profession](#data-science-as-a-profession)
  • [Capstone Project](#capstone-project)
  • [Resources](#resources)
  • [Read](#read)
  • [Watch & Listen](#watch--listen)
  • [Learn](#learn)
  • [Notation](#notation)
  • [Contribute](#contribute)

The Open Source Data Science Curriculum

1) Start Here

Intro to Data Science / UW Videos

  • Topics: Python NLP on Twitter API, Distributed Computing Paradigm, MapReduce/Hadoop & Pig Script, SQL/NoSQL, Relational Algebra, Experiment design, Statistics, Graphs, Amazon EC2, Visualization.

Data Science / Harvard Videos & Course

  • Topics: Data wrangling, data management, exploratory data analysis to generate hypotheses and intuition, prediction based on statistical methods such as regression and classification, communication of results through visualization, stories, and summaries.

Data Science with Open Source Tools Book $27

  • Topics: Visualizing Data, Estimation, Models from Scaling Arguments, Arguments from Probability Models, What you Really Need to Know about Classical Statistics, Data Mining, Clustering, PCA, Map/Reduce, Predictive Analytics
  • Example Code in: R, Python, Sage, C, Gnu Scientific Library

Ethics in Machine Intelligence

Human impact is a first-class concern when building machine intelligence technology. When we build products, we deduce patterns and then reinforce them in the world. Ethics in any engineering discipline means understanding the sociotechnological impact of the products and services we bring into the human world, and whether they reinforce a future we all want to live in.

2) Math and Problem-Solving

Linear Algebra & Programming

Convex Optimization

Statistics

Differential Equations & Calculus

Problem Solving

3) Databases, Distributed Computing, and Data Design

Get your environment up and running with the [Data Science Toolbox](http://bit.ly/datascitoolbox)

Distributed Computing Paradigms

Databases

Data Mining

Data Design

How does the real world get translated into data? How should one structure that data to make it understandable and usable? This topic extends beyond database design to the usability of schemas and models.
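
As a minimal sketch of the question above, the snippet below takes one hypothetical real-world event (a customer buying items) and stores it in two related tables using Python's built-in `sqlite3` module. The table and column names (`customers`, `purchases`, `price_cents`) and the sample rows are assumptions chosen for illustration; they are not part of the curriculum.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE purchases (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    item        TEXT NOT NULL,
    price_cents INTEGER NOT NULL  -- integer cents avoid float rounding
);
"""


def build_db():
    """Create an in-memory database and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
    conn.executemany(
        "INSERT INTO purchases (customer_id, item, price_cents) VALUES (?, ?, ?)",
        [(1, "statistics textbook", 2700), (1, "notebook", 450)],
    )
    return conn


def total_spent_cents(conn, customer_name):
    """Sum purchase prices for one customer by joining the two tables."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(p.price_cents), 0)
        FROM customers AS c
        LEFT JOIN purchases AS p ON p.customer_id = c.id
        WHERE c.name = ?
        """,
        (customer_name,),
    ).fetchone()
    return row[0]


conn = build_db()
print(total_spent_cents(conn, "Ada"))
```

Splitting people and purchases into separate tables is one possible design choice: it keeps each fact in one place, at the cost of needing a join to answer questions that span both.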

OSDSM Specialization: Web Scraping & Crawling

4) Applied Data Science: Beginner

Machine Learning

Foundational & Theoretical

Practical

Probabilistic Modeling

Deep Learning (Neural Networks)

5) Applied Data Science: Intermediate

Social Network & Graph Analysis

Natural Language Processing

Data Analysis

One of the "unteachable" skills of data science is an intuition for analysis. What counts as valuable, achievable, and well-designed analysis depends heavily on the context and the goals at hand.

in Python

  • Data Analysis in Python Tutorial
  • Python for Data Analysis Book $24
  • An Example Data Science Process ipynb

7 and 8: Data Communication and Design

7) Data Visualization

Data Visualization and Communication

  • The Truthful Art: Data, Charts, and Maps for Communication [Cairo / Book $21](http://amzn.to/1UydGAc)

Theoretical Design of Information

Applied Design of Information

Theoretical Courses / Design & Visualization

Practical Visualization Resources

8) Python Libraries, Packages, and APIs

Installing Basic Packages: [Python, virtualenv, NumPy, SciPy, matplotlib and IPython](http://bit.ly/scientific-py-install) & [Using Python Scientifically](http://bit.ly/lecture-scipy)

Command Line Install Script for Scientific Python Packages
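
After installing, it can help to confirm which packages Python can actually find. The check below is a small sketch using only the standard library; the list of import names is an assumption based on the packages mentioned above, so adjust it to your own setup.

```python
import importlib.util

# Import names for the packages mentioned above (an assumed list; edit as needed).
PACKAGES = ["numpy", "scipy", "matplotlib", "IPython"]


def check_packages(names):
    """Return a dict mapping each top-level import name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in check_packages(PACKAGES).items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

`find_spec` only locates a package without importing it, so the check stays fast and does not fail when a package is absent.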

More Libraries can be found in the "awesome machine learning" repo & in related specializations

Data Structures & Analysis Packages

Machine Learning Packages

Networks Packages

Statistical Packages

  • PyMC - Bayesian Inference & Markov Chain Monte Carlo sampling toolkit
  • Statsmodels - Python module that allows users to explore data, estimate statistical models, and perform statistical tests
  • PyMVPA - Multivariate Pattern Analysis in Python

Natural Language Processing & Understanding

  • NLTK - Natural Language Toolkit
  • Gensim - Python library for topic modeling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.
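
To give a feel for what "similarity retrieval" means, here is a minimal sketch in plain Python. It does not use the NLTK or Gensim APIs; it is a simplified stand-in that compares documents by word counts and cosine similarity. The function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, documents):
    """Return the document whose bag of words is closest to the query."""
    q = bag_of_words(query)
    return max(documents, key=lambda d: cosine_similarity(q, bag_of_words(d)))


# Made-up documents for illustration.
docs = [
    "regression and classification with statistical models",
    "crawling and scraping web pages",
    "visualizing data with charts and maps",
]
print(most_similar("statistical regression models", docs))
```

Dedicated libraries add tokenization, weighting schemes, and indexing for large corpora, which this sketch leaves out.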

Data APIs

  • twython - Python wrapper for the Twitter API

Visualization Packages

  • matplotlib - well-integrated with analysis and data manipulation packages like numpy and pandas
  • Seaborn - a high-level statistical visualization package built on top of matplotlib

9) Capstone Project

  • Capstone Analysis of Your Own Design; [Quora](http://bit.ly/quora-toyproblems)'s Idea Compendium
  • Healthcare Twitter Analysis [Coursolve & UW Data Science](http://bit.ly/project-healthcare-twitter-analysis)
  • Analyze your LinkedIn Network [Generate & Download Adjacency Matrix](http://socilab.com/)
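
For the network project, a first pass over an adjacency matrix might look like the sketch below. The names and matrix are made up, and the format that the linked tool exports is not specified here, so reading a real file into this structure is left as an assumption.

```python
# Made-up example network: names and connections are illustrative only.
NAMES = ["ana", "ben", "cho", "dev"]
ADJACENCY = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]


def degrees(names, matrix):
    """Map each person to their number of direct connections (row sum)."""
    return {name: sum(row) for name, row in zip(names, matrix)}


def density(matrix):
    """Fraction of possible undirected edges that are present."""
    n = len(matrix)
    if n < 2:
        return 0.0
    edges = sum(sum(row) for row in matrix) / 2  # each edge is counted twice
    return edges / (n * (n - 1) / 2)


deg = degrees(NAMES, ADJACENCY)
print(max(deg, key=deg.get))  # best-connected person
print(density(ADJACENCY))
```

Both functions assume a symmetric 0/1 matrix with zeros on the diagonal, that is, an undirected network without self-links.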

Resources

Read

  • [DataTau](http://bit.ly/datatau) - The "Hacker News" of Data Science
  • [The Signal and The Noise - Nate Silver $15](http://amzn.to/1hoxQoG) - Bestseller Pop Sci
  • [Zipfian Academy's List of Resources](http://bit.ly/1qoF1We)
  • [A Software Engineer's Guide to Getting Started with Data Science](http://bit.ly/1jwgV4p)
  • [Data Scientist Interviews / Metamarkets](http://bit.ly/1r1tJot)
  • [/r/MachineLearning](http://bit.ly/1uANaEM)

Watch & Listen

iPython Data Science Notebooks

Datasets are now here

R resources are now here

Data Science as a Profession

  • Doing Data Science: Straight Talk from the Frontline O'Reilly / Book $25
  • The Data Science Handbook: Advice and Insights from 25 Amazing Data Scientists Book $22
