DAT5 Course Repository

Course materials for General Assembly's Data Science course in Boston, MA (20 January 2015 - 07 April 2015). View student work in the student repository.

Instructor: Bryan Balin. Teaching Assistant: Harish Krishnamurthy.

Office hours: Wednesday 5-6pm; Friday 5-6:30pm, @Boston Public Library; as needed Tuesdays and Thursdays @GA.

Course Project information

| Tuesday | Thursday |
| --- | --- |
| 1/20: Introduction | 1/22: Python & Pandas |
| 1/27: Git and GitHub | 1/29: SQL |
| 2/3: Advanced Pandas; Milestone: Question and Data Set | 2/5: Numpy, Machine Learning, KNN |
| 2/10: scikit-learn, Model Evaluation Procedures | 2/12: Linear Regression |
| 2/17: Logistic Regression, Preview of Other Models | 2/19: Model Evaluation Metrics; Milestone: Data Exploration and Analysis Plan |
| 2/24: Working a Data Problem | 2/26: Clustering and Visualization; Milestone: Deadline for Topic Changes |
| 3/3: Naive Bayes | 3/5: Natural Language Processing |
| 3/10: Decision Trees and Ensembles; Milestone: First Draft | 3/12: Advanced scikit-learn |
| 3/17: No Class | 3/19: Databases and MapReduce |
| 3/24: Recommenders | 3/26: Course Review, Companion Tools; Milestone: Second Draft (Optional) |
| 3/31: TBD | 4/2: Project Presentations |
| 4/7: Project Presentations | |

Installation and Setup

  • Install the Anaconda distribution of Python 2.7.x.
  • Install Git and create a GitHub account.
  • Once you receive an email invitation from Slack, join our "datbos05 team" and add your photo!

Class 1: Introduction

  • Introduction to General Assembly
  • Course overview: our philosophy and expectations (slides)
  • Data science overview (slides)
  • Tools: check for proper setup of Anaconda, overview of Slack

Homework:

  • Resolve any installation issues before next class.

Optional:

Class 2: Python & Pandas

Materials: slides, Python refresher code, Python code, Pandas code.

  • Brief overview of Python
  • Brief overview of Python environments: Python scripting, IPython interpreter, Spyder
  • Working with data in Pandas
    • Loading and viewing data
    • Indexing and selecting data
    • Assigning, reassigning, and splitting data
    • Describing and summarizing data
    • Plotting data
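
The Pandas topics listed above can be sketched in a few lines. This is a minimal illustration, not the course's own code: the DataFrame and its column names (`player`, `team`, `at_bats`) are made-up stand-ins for a real dataset, and the snippet is written for a current pandas release, so small adjustments may be needed under Python 2.7.

```python
import pandas as pd

# Hypothetical toy data standing in for a real course dataset.
df = pd.DataFrame({
    "player": ["Ann", "Bob", "Cal", "Dee"],
    "team": ["BOS", "NYY", "BOS", "NYY"],
    "at_bats": [510, 430, 120, 600],
})

# Loading and viewing: head() returns the first rows, shape is (rows, columns).
first_rows = df.head(2)
n_rows, n_cols = df.shape

# Indexing and selecting: a boolean mask filters rows; loc also picks columns.
boston = df[df["team"] == "BOS"]
heavy_hitters = df.loc[df["at_bats"] > 500, ["player", "at_bats"]]

# Assigning: add a derived boolean column.
df["regular"] = df["at_bats"] >= 400

# Describing and summarizing: overall statistics and a per-team mean.
summary = df["at_bats"].describe()
mean_by_team = df.groupby("team")["at_bats"].mean()

# Plotting needs matplotlib installed, for example:
# mean_by_team.plot(kind="bar")
```

The plotting call is left commented out so the sketch runs without a display or matplotlib.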

Homework:

Optional:

Resources:

Class 3: Git and GitHub

Homework:

Class 4: SQL

slides

Overview of the baseball archive

  • Installation of SQLite, Sublime, DB Visualizer, and our dataset
  • The SELECT statement
  • The WHERE clause
  • ORDER BY
  • LEFT JOIN and INNER JOIN
  • GROUP BY
  • DISTINCT
  • CASE statements
  • Subqueries and IS NOT NULL
  • CREATE TABLE
  • Using Pandas and SQL Seamlessly
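
As a rough sketch of the SQL concepts in this list, the snippet below runs a handful of queries against an in-memory SQLite database through Python's standard `sqlite3` module. The `players` and `batting` tables and their columns are hypothetical and much simpler than the real baseball archive, whose schema differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE TABLE: a hypothetical toy schema, not the real baseball archive.
cur.execute("CREATE TABLE players (player_id TEXT PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE batting (player_id TEXT, year INTEGER, team TEXT, at_bats INTEGER)"
)
cur.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("p1", "Ann"), ("p2", "Bob"), ("p3", "Cal")],
)
cur.executemany(
    "INSERT INTO batting VALUES (?, ?, ?, ?)",
    [("p1", 2001, "BOS", 500), ("p1", 2002, "BOS", 550),
     ("p2", 2001, "NYY", 300), ("p2", 2002, "NYY", None)],
)

# SELECT with WHERE and ORDER BY.
rows_2001 = cur.execute(
    "SELECT player_id, at_bats FROM batting WHERE year = 2001 ORDER BY at_bats DESC"
).fetchall()

# LEFT JOIN with GROUP BY: every player appears, even one with no batting rows.
totals = cur.execute("""
    SELECT p.name, SUM(b.at_bats) AS total_ab
    FROM players p
    LEFT JOIN batting b ON b.player_id = p.player_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

# INNER JOIN keeps only players that have at least one batting row.
with_rows = cur.execute("""
    SELECT DISTINCT p.name
    FROM players p
    INNER JOIN batting b ON b.player_id = p.player_id
    ORDER BY p.name
""").fetchall()

# DISTINCT removes duplicate team values.
teams = cur.execute("SELECT DISTINCT team FROM batting ORDER BY team").fetchall()

# CASE labels each row; IS NOT NULL drops rows with a missing at_bats value.
labels = cur.execute("""
    SELECT player_id, year,
           CASE WHEN at_bats >= 500 THEN 'regular' ELSE 'part-time' END AS role
    FROM batting
    WHERE at_bats IS NOT NULL
    ORDER BY player_id, year
""").fetchall()

# Subquery: find the row holding the single-season maximum.
top = cur.execute("""
    SELECT player_id, year
    FROM batting
    WHERE at_bats = (SELECT MAX(at_bats) FROM batting)
""").fetchall()

conn.close()
```

Note that `SUM` skips NULL values, so a player with only NULL or no batting rows gets a NULL total under the LEFT JOIN, which `sqlite3` returns as `None`.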

Homework:

  • Complete the in-class exercises, if you haven't already:

    • Find the player with the most at-bats in a single season.
    • Find the name of the player with the most at-bats in baseball history.
    • Find the average number of at-bats of players in their rookie season.
    • Find the average number of at-bats of players in their final season for all players born after 1980.
    • Find the average number of at-bats of Yankees players who began their second season in or after 1980.
    • Pass the SQL in the previous bullet into a pandas DataFrame and write it back to SQLite.
  • Create full, working queries to answer at least four novel questions you have about the dataset using the following concepts:

    • The WHERE clause
    • ORDER BY
    • LEFT JOIN and INNER JOIN
    • GROUP BY
    • SELECT DISTINCT
    • CASE statements
    • Subqueries and IS NOT NULL
  • Using Pandas, (1) query the Baseball dataset, (2) transform the data in some way, and (3) write a new table back to the database.

  • Commit and Sync your SQL and Pandas files to your GitHub fork and issue a pull request.
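
The query, transform, and write-back round trip between Pandas and SQLite might look roughly like the following. It is a sketch under stated assumptions: it uses an in-memory database rather than the course's database file, and the `batting` and `career_totals` table names and their columns are invented for illustration.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the real dataset, loaded so the sketch is self-contained.
pd.DataFrame({
    "player_id": ["p1", "p1", "p2"],
    "year": [2001, 2002, 2001],
    "at_bats": [500, 550, 300],
}).to_sql("batting", conn, index=False)

# (1) Query: pull the result of a SQL statement into a DataFrame.
df = pd.read_sql_query("SELECT player_id, year, at_bats FROM batting", conn)

# (2) Transform: total at-bats per player.
career = df.groupby("player_id", as_index=False)["at_bats"].sum()
career = career.rename(columns={"at_bats": "career_at_bats"})

# (3) Write back: store the result as a new table, replacing it if it exists.
career.to_sql("career_totals", conn, index=False, if_exists="replace")

# Read the new table with plain SQL to confirm it was written.
check = conn.execute(
    "SELECT player_id, career_at_bats FROM career_totals ORDER BY player_id"
).fetchall()
conn.close()
```

Passing `index=False` keeps the DataFrame's row index from being written as an extra column; against a real database file, `sqlite3.connect` would take the file path instead of `":memory:"`.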

Resources:

  • SQLite homepage
  • SQLite Syntax

SQL Tutorials:

Note: These tutorials cover all flavors of SQL, not just SQLite, so some of the functions may behave differently in SQLite.

  • SQL tutorial
  • SQLZoo

Class 5: Advanced Pandas

Class 6: Numpy, Machine Learning, KNN

Class 7: scikit-learn, Model Evaluation Procedures

Class 8: Linear Regression

Class 9: Logistic Regression, Preview of Other Models

Class 10: Model Evaluation Metrics

Class 11: Working a Data Problem

Class 12: Clustering and Visualization

Class 13: Naive Bayes

Class 14: Natural Language Processing

Class 15: Decision Trees and Ensembles

Class 16: Advanced scikit-learn

Class 17: Databases and MapReduce

Class 18: Recommenders

Class 19: Course Review, Companion Tools

Class 20: TBD

Class 21: Project Presentations

Class 22: Project Presentations
