Comments (6)

mine-cetinkaya-rundel commented on August 16, 2024

Agreed. Should we use the nycflights13 data? It's a good one for a lab that doesn't involve inference.

from oilabs-base-r.

norcalbiostat commented on August 16, 2024

You'd have to change the Normal distribution lab as well. I feel the data set itself is fine; perhaps just choose a different outcome variable.

mine-cetinkaya-rundel commented on August 16, 2024

@norcalbiostat we could use different datasets for the two labs though, so I don't think we need to feel limited to variables that are normally distributed for the intro to data lab.

beanumber commented on August 16, 2024

But I think @norcalbiostat's point is that the body dimensions data in the Normal Distribution lab has the same problem.

mine-cetinkaya-rundel commented on August 16, 2024

I should admit I haven't used the normal distribution lab in a while, so first I should correct myself: the two labs don't use the same dataset anyway.

I feel like the issue with the intro to data lab is the wdiff variable, which we then compare between men and women. The normal distribution lab compares heights, briefly, but beyond that doesn't go into comparing people's desired weights, so perhaps it's a bit more factual and a bit less about body image?

I'm completely on board with changing the dataset for the intro to data lab, as I think that lab can be enhanced to be more about data wrangling skills (in addition to resolving the issue @beanumber raised). I'm also on board with changing the data in the normal distribution lab because it's not that exciting (likely why I haven't been doing that lab lately...). But if we're prioritizing, it seems like the intro to data lab has the more urgent issue to address.

andrewpbray commented on August 16, 2024

I'm all for refreshing data sets, but the challenge is always finding a replacement that is better. And there's often that unfortunate trade-off between data that clearly illustrate a statistical principle and data that are most interesting (please oh please, let us find a population-level data set so we can replace the ames data).

I think a data wrangling lab based on the nycflights13 data would be terrific. It has heterogeneous data types and is interesting enough to naturally motivate several different questions and analyses. It also offers a nice opportunity to define on-time performance in multiple ways, so it's an improvement on wdiff in that respect. If this lab were to replace lab 1, it's important that it cover some of the key points of chapter 1. It could also be cool to have it go off in its own data-sciency direction, but then it would probably work best as an extra lab.
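The point about defining on-time performance in more than one way can be sketched as follows. This is a minimal Python sketch, not code from the labs (which are written in R). It assumes arrival delays are given in minutes with negative values meaning early, as in the `arr_delay` column of the nycflights13 `flights` table, and that missing delays (for example, cancelled flights) appear as `None`. The names `on_time_rate`, `compare_definitions`, and `ON_TIME_DEFINITIONS`, and the 15-minute grace period, are hypothetical choices for illustration.

```python
from typing import Iterable, Optional


def on_time_rate(arr_delays: Iterable[Optional[float]], grace: float = 0.0) -> float:
    """Return the share of flights that arrived at most `grace` minutes late.

    `arr_delays` holds arrival delays in minutes (negative = early).
    Missing values (None), e.g. for cancelled flights, are dropped
    before computing the share.
    """
    observed = [d for d in arr_delays if d is not None]
    if not observed:
        raise ValueError("no observed arrival delays")
    on_time = sum(1 for d in observed if d <= grace)
    return on_time / len(observed)


# Two hypothetical definitions of "on time" that differ only in the grace period.
ON_TIME_DEFINITIONS = {
    "early or exactly on time": 0.0,
    "no more than 15 minutes late": 15.0,
}


def compare_definitions(arr_delays: Iterable[Optional[float]]) -> dict:
    """Apply every definition in ON_TIME_DEFINITIONS to the same delays."""
    delays = list(arr_delays)  # materialize so the values can be reused per definition
    return {
        name: on_time_rate(delays, grace)
        for name, grace in ON_TIME_DEFINITIONS.items()
    }
```

In a lab this would presumably be done in R on the full `flights` table; the sketch only shows that the same column can yield different on-time rates depending on the cutoff students choose, which gives them a definitional decision to justify.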

If I remember correctly, the main thing in favor of the bdims data set is that it's a collection of continuous variables that exhibit a mix of symmetric and skewed distributions. I think we should keep our eyes out for a more interesting replacement, but I have nothing on hand right now.
