
Comments (10)

beanumber avatar beanumber commented on August 26, 2024

Oh... never mind.
It seems that this is the same as mosaicData::Gestation.
But then shouldn't this package import mosaicData?

from openintro.

beanumber avatar beanumber commented on August 26, 2024

Upon further review, these data sets do not appear to be the same. In particular, the parity and smoke variables are handled differently. In babies they are binary, but in mosaicData::Gestation they are not.

Can anyone illuminate?


mine-cetinkaya-rundel avatar mine-cetinkaya-rundel commented on August 26, 2024

I'm afraid there is a data provenance issue here and I have not been able to track down the origin any further than what is stated in the help file of the package.


mine-cetinkaya-rundel avatar mine-cetinkaya-rundel commented on August 26, 2024

More on this at https://twitter.com/AmeliaMN/status/1331037890382467076?s=20

@AmeliaMN @hardin47


AmeliaMN avatar AmeliaMN commented on August 26, 2024

I feel like this could be a good opportunity to update the ncbirths dataset, using data from 2019. I'm pretty sure all my wrangling code would work on the new data. The only tricky piece is that real ages are redacted in the public-facing data, so only ranges are available. Maybe someone has a smart idea for how to impute some ages, or we could just randomly assign an age in the range.


mine-cetinkaya-rundel avatar mine-cetinkaya-rundel commented on August 26, 2024

I think it's a great idea to have the updated datasets here @AmeliaMN! We use the existing dataset in current books so I'd hesitate to replace it -- we can put a note in the docs clarifying the provenance issue as well as suggesting using the newer version. Once it's out of all the most recent editions of the books we could consider deprecating it.

I wonder about naming: how about ncbirths19?

Also, ages, hm... First question that comes to mind is, do we have to have ages? I don't have a great suggestion for imputing but could look up an appropriate method. Selecting from a random distribution in the range should be straightforward.


AmeliaMN avatar AmeliaMN commented on August 26, 2024

That's fair, and I have seen other textbooks do similar things. For example, Stat2Data::BaseballTimes vs Stat2Data::BaseballTimes2017 or Lock5Data::HollywoodMovies vs Lock5Data::HollywoodMovies2011. I think age was a nice variable because it is numeric, and this dataset gets used in places like the inference for numeric data lab (that's probably an old link, just easy to put my hands on) and the exploring NC births lab.


mine-cetinkaya-rundel avatar mine-cetinkaya-rundel commented on August 26, 2024

Given that the age bands are not very wide, I wouldn't be opposed to a random draw from a uniform in that range. We can place the data prep code in the data-raw folder. I'd be happy to do this based on your work or a PR is good too, whichever you prefer!
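For illustration only, here is a minimal sketch of the "random draw from a uniform in that range" idea. It is written in Python rather than as the package's actual data-raw script, and the age bands, the `impute_age` helper, and the seed are all hypothetical; the bands in the public-use natality file may be defined differently.

```python
import random


def impute_age(low, high, rng):
    """Draw a whole-number age uniformly from the inclusive band [low, high]."""
    return rng.randint(low, high)


# Hypothetical age bands (inclusive); not taken from the real data file.
bands = [(15, 19), (20, 24), (25, 29), (30, 34)]

# A fixed seed keeps the data prep reproducible from run to run.
rng = random.Random(2014)
ages = [impute_age(low, high, rng) for low, high in bands]

for (low, high), age in zip(bands, ages):
    print(f"band {low}-{high}: imputed age {age}")
```

Seeding the generator matters here: without it, rebuilding the dataset would silently change every imputed age.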


AmeliaMN avatar AmeliaMN commented on August 26, 2024

I started working on a PR and realized a couple of things: 1. it seems like the most recent natality data is from 2014, and 2. the reason the ncbirths data was from 2004 is probably that it is the last year the data included state information! So, I could make a births14 dataset that would have random babies born in 2014, but they wouldn't necessarily be from North Carolina.


mine-cetinkaya-rundel avatar mine-cetinkaya-rundel commented on August 26, 2024

I think that's perfectly fine!

