
dat_sf_12's Introduction

DAT SF 12

dat_sf_12's People

Contributors

maddatascience, sampathweb, alexchaomander, bingbingboo, vanessaohta

Watchers

James Cloos

dat_sf_12's Issues

HW 2 Feedback

  • Nice use of plots to help you determine the number of neighbors for KNN
  • Good use of scale. Adding the additional scaling jumps the predictive accuracy from 70% to 90%. Exploration of your data through plots and other means would have revealed this.
  • There are other ways to calculate the accuracy of your model. For instance, most models have a score function, e.g. "clf.score(X_test, y_test)" http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
  • And while the problem didn't specify it, I think a few basic histograms of the data's distribution can go a long way toward catching things like differing scales and other peculiarities that should inform how you adjust your model.
  • Also add comments or a short writeup comparing the models you've done. This will not only help the reader but also yourself when you go back to look at your code.
  • All in all, good work!
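
The scaling and score-function points above can be sketched roughly as follows. This is only an illustration: the homework dataset is not shown here, so scikit-learn's built-in wine dataset stands in for it, and the choice of 5 neighbors and the split settings are assumptions rather than the values used in the homework.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): the wine features sit on very different scales.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# KNN on the raw, unscaled features.
raw_clf = KNeighborsClassifier(n_neighbors=5)
raw_clf.fit(X_train, y_train)
raw_accuracy = raw_clf.score(X_test, y_test)  # mean accuracy on the test split

# Same model with standardized features. Putting the scaler in a pipeline
# means it is fit on the training split only.
scaled_clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scaled_clf.fit(X_train, y_train)
scaled_accuracy = scaled_clf.score(X_test, y_test)

print(f"raw: {raw_accuracy:.2f}  scaled: {scaled_accuracy:.2f}")
```

Comparing the two printed numbers is one quick way to see whether differing feature scales are hurting a distance-based model.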

HW 1 review by OttoS

@bingbingboo

Comments refer to the question number:
#3 You did this correctly, but to see the full functionality of pandas, don't use the print command; just type df.head(20). This keeps the output formatted as a table, which is easier to read.
#4 Good job!
#6 Your SQL commands are correct! You just didn't use the pd.read_sql() command. For example, if you modify your code to read pd.read_sql("SELECT * FROM sf_df limit 5", con), it runs just as you want it to!
#7 Same as #6: the SQL query is correct, so just modify it for pandas: pd.read_sql("SELECT * FROM sf_df where neighborhood='Haight Ashbury' and category LIKE '%Sidewalk Cleaning%' LIMIT 10", con).
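
As a rough, self-contained sketch of the pd.read_sql() suggestion: the homework's actual database and connection are not shown, so this assumes an in-memory SQLite connection named con and a tiny sf_df table with made-up rows. Only the two queries come from the review itself.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory SQLite database stands in for the homework data.
con = sqlite3.connect(":memory:")

# Made-up sample rows, for illustration only.
sample = pd.DataFrame(
    {
        "neighborhood": ["Haight Ashbury", "Mission", "Haight Ashbury"],
        "category": [
            "Sidewalk Cleaning",
            "Graffiti",
            "Street and Sidewalk Cleaning",
        ],
    }
)
sample.to_sql("sf_df", con, index=False)

# Question 6: the SQL is unchanged, just passed through pandas.
first_rows = pd.read_sql("SELECT * FROM sf_df LIMIT 5", con)

# Question 7: same idea with the filter from the review.
haight_cleaning = pd.read_sql(
    "SELECT * FROM sf_df "
    "WHERE neighborhood = 'Haight Ashbury' "
    "AND category LIKE '%Sidewalk Cleaning%' "
    "LIMIT 10",
    con,
)

print(haight_cleaning)
```

Because pd.read_sql() returns a DataFrame, ending a notebook cell with first_rows.head(20) gives the table-formatted output mentioned in comment #3.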

Project Feedback #2

A few things ~ don't forget to think about % of total flights late. If you just do raw counts, you'll miss some of the smaller airports!

Really like the weather idea!!
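
The point about percentage of flights late versus raw counts could look something like this in pandas. The airport labels, the column names airport and late, and all of the rows are made up for illustration and are not taken from the project data.

```python
import pandas as pd

# Made-up flights: one busy airport and one small one (hypothetical labels).
flights = pd.DataFrame(
    {
        "airport": ["BIG"] * 8 + ["SMALL"] * 2,
        "late": [True, True] + [False] * 6 + [True, False],
    }
)

# Count late flights and total flights per airport, then take the ratio.
summary = flights.groupby("airport")["late"].agg(late_count="sum", total="count")
summary["pct_late"] = 100 * summary["late_count"] / summary["total"]

print(summary.sort_values("pct_late", ascending=False))
```

In this toy data the busy airport has more late flights in raw count (2 versus 1), but the small airport has the higher share of late flights (50% versus 25%), which is the pattern raw counts would hide.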

Project Proposal Feedback

@bingbingboo

You wrote:
I want to collect 5 or 10 years of flights arrival data into a big data store probably Hadoop like AWS EMR.

Comment:
How will you collect this data?
http://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp
http://openflights.org/data.html

You wrote:
Based on the dataset, I can render some historical analysis for flight departure or arrival delay based on airline, aircraft type, date/time and airport name.

Comment:
Great! I like this idea. Can you think of some useful questions to answer that you, as a flight customer might be curious about?

You wrote:
I hope I will be able to build a predictive model based on the past data to predict what is the chance a particular flight will be delayed.

Comment:
Great!!!! What sort of features do you think could be predictive in this format? What data is available? A regression model could be a good place to start, but also pay attention during our Naive Bayes lectures!

Let's keep this conversation going here! Tag me in your reply (@ostegm)
