AgebyName

This R package is inspired by Nate Silver's recent article, "How to Tell Someone's Age When All You Know Is Her Name". It lets you (almost) replicate the analysis done in the article, and provides some additional features.

To get started, install the package from GitHub using devtools. You will also need the latest version of dplyr, which is installed from GitHub as well.

library(devtools)
install_github("hadley/dplyr")
install_github("ramnathv/agebyname")

Usage

There are two main functions in this package. The plot_name function plots the distribution of births by name, sex and state, while the estimate_age function returns an age estimate for the same set of input arguments.

Let us start by plotting the distribution for the name Joseph.

library(agebyname)
plot_name('Joseph')

Since the sex argument is not specified, plot_name guesses it based on the modal sex for the given name.
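To make the idea of a modal sex concrete, here is a minimal sketch in base R. It is not the package's actual implementation: guess_sex and toy_names are hypothetical names, and the counts are made up. The only thing taken from this README is the column layout (name, sex, n).

```r
# Toy data with the columns used elsewhere in this README (name, sex, n).
# The counts are invented for illustration only.
toy_names <- data.frame(
  name = c("Joseph", "Joseph", "Violet", "Violet"),
  sex  = c("M", "F", "M", "F"),
  n    = c(1000, 10, 2, 500),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the sex with the largest total count for a name.
guess_sex <- function(name_, data = toy_names) {
  rows <- data[data$name == name_, ]
  totals <- tapply(rows$n, rows$sex, sum)  # total births per sex
  names(totals)[which.max(totals)]         # label of the largest total
}

guess_sex("Joseph")
guess_sex("Violet")
```

With the toy counts above, Joseph resolves to "M" and Violet to "F". Passing the sex explicitly avoids the guess for names that are common for both sexes.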

One can also plot the distribution for names from a given state. For example, we can plot the distribution for the name Violet in Massachusetts.

library(agebyname)
plot_name('Violet', state_ = "MA")

Let us now use the estimate_age function to plot the age distribution for the 25 most common female names.

library(dplyr, warn.conflicts = FALSE)

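# Total births for each name/sex pair, keeping the 100 most common
top_100_names = bnames %>%
  group_by(name, sex) %>%
  summarize(n = sum(n)) %>%
  ungroup() %>%  # drop grouping so arrange() sorts across all names
  arrange(desc(n)) %>%
  head(100)

# Estimate ages for each name; sex is not passed, so it is guessed
estimates = plyr::ldply(top_100_names$name, estimate_age)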

library(ggplot2)
estimates %>%
  left_join(top_100_names) %>%
  filter(sex == "F") %>%
  arrange(desc(n)) %>%
  head(25) %>%
  ggplot(aes(x = reorder(name, -q50), y = q50)) +
    geom_point() +
    geom_linerange(aes(ymin = q25, ymax = q75)) + 
    coord_flip()
## Joining by: "name"
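The q25, q50 and q75 columns used above are quartiles of an age distribution. As a rough sketch of how such quartiles could be derived, assume that the number of living people per birth year is the number of births multiplied by the share of that cohort still alive. Everything below is an assumption for illustration: weighted_quantile is a hypothetical helper, the numbers are made up, and the reference year 2014 is arbitrary. It is not the code behind estimate_age.

```r
# Hypothetical helper: quantiles of a discrete distribution given by weights.
weighted_quantile <- function(x, w, probs) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum <- cumsum(w) / sum(w)  # cumulative share of the total weight
  # smallest x whose cumulative share reaches each probability
  sapply(probs, function(p) x[which(cum >= p)[1]])
}

# Made-up inputs: births per year for one name, and the share of each
# birth cohort assumed to be alive in the reference year.
toy <- data.frame(
  year    = c(1950, 1970, 1990, 2010),
  births  = c(1000, 2000, 1000, 500),
  p_alive = c(0.5, 0.8, 0.95, 1.0)
)
toy$alive <- toy$births * toy$p_alive  # estimated living people per birth year
toy$age   <- 2014 - toy$year           # reference year is an assumption

est <- weighted_quantile(toy$age, toy$alive, c(0.25, 0.5, 0.75))
names(est) <- c("q25", "q50", "q75")
est
```

Weighting by living people rather than by raw births is what shifts the estimate toward younger ages for names that were popular long ago.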


You can also use this package to generate an interactive Shiny application that allows people to explore the age distribution of names. An example app is available in the package.

app = system.file('example1', package = 'agebyname')
shiny::runApp(app)

One can also use the rMaps package to generate an interactive choropleth map of the relative popularity of a name.

library(rMaps)
bnames_by_state %>%
  group_by(state, year, sex) %>%
  mutate(prop = n/sum(n)) %>%
  filter(year %in% 2000:2005, name == 'Anna', sex == 'F') %>%
  ichoropleth(prop ~ state, data = ., animate = 'year')

Data

This package uses four primary datasets.

  1. Babynames
  2. Babynames by State
  3. Cohort Life Tables
  4. Census Live Births Data

All data were downloaded from the above sources, processed using the R scripts in the rawdata folder, and saved as .rdata files in the data folder. Not all births were recorded by SSA until around 1930, since it was not mandatory, so some extrapolation was done on the raw data to correct for this. Note that my extrapolation may differ from Nate Silver's. If you find any discrepancies in my data cleaning process, please feel free to file an issue or a pull request.
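One simple way to correct for under-recorded births is to scale the name counts for each year by the ratio of total live births to births present in the names data. The sketch below illustrates that idea only. It is an assumption, not a description of what the scripts in the rawdata folder do, and all column names and numbers are made up.

```r
# Made-up yearly totals; the real package data may be organized differently.
totals <- data.frame(
  year          = c(1910, 1930, 1950),
  ssa_births    = c(500000, 2000000, 3500000),  # births present in the names data
  census_births = c(2500000, 2500000, 3500000)  # total live births
)

# Correction factor: how many actual births each recorded birth stands for.
totals$factor <- totals$census_births / totals$ssa_births

# Made-up counts for one name, scaled by the factor for each birth year.
name_counts <- data.frame(
  name = "Violet",
  year = c(1910, 1930, 1950),
  n    = c(100, 400, 350)
)
adjusted <- merge(name_counts, totals[, c("year", "factor")], by = "year")
adjusted$n_adjusted <- adjusted$n * adjusted$factor
adjusted
```

A uniform factor per year assumes that under-recording affected all names equally, which is a simplification.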
