r-outbreak-info's Issues

Mods to characteristic mutations heatmap

Suggestions for "Characteristic S-gene mutations in common lineages" in https://outbreak-info.github.io/R-outbreak-info/locationvignette.html:

  • The heatmap is wrong; you need to grab all the mutations which have at least 75% prevalence in any of the lineages. I do this by getting all mutations (getMutationsByLineage(.x, frequency=0)) within the lineage_list, then finding all unique mutations where they're > 0.75 in any lineage, and then filtering the mutations in df_ll to only be those mutations.
  • If you want a dplyr version (I'm a big advocate because I find the code easier to read): lineage_list <- lineage_df %>% filter(lineage != "other") %>% pull(lineage) %>% unique()
  • df_ll <- subset(df_ll, gene == "S") can also be dplyr-ized with filter
  • add coord_equal to the ggplot so the chiclets are square rather than rectangular.
  • add a border around the tiles
  • add a ggtitle
  • To mimic the outbreak palette, I'm using RdPu from ColorBrewer
  • I'd probably change the theme to remove the distracting gridlines, etc.
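A minimal sketch of the suggestions above, using made-up toy data rather than a live API call. The column names (lineage, mutation, gene, prevalence) and every prevalence value are assumptions for illustration; in practice df_ll would come from mapping getMutationsByLineage(.x, frequency=0) over lineage_list.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the combined per-lineage mutation table (values are invented).
df_ll <- data.frame(
  lineage    = rep(c("B.1.1.7", "P.1"), each = 4),
  mutation   = rep(c("S:N501Y", "S:D614G", "S:K417T", "S:A222V"), times = 2),
  gene       = "S",
  prevalence = c(0.98, 0.99, 0.00, 0.10,
                 0.97, 0.99, 0.95, 0.02)
)

# Mutations above 75% prevalence in at least one lineage.
characteristic <- df_ll %>%
  filter(gene == "S", prevalence > 0.75) %>%
  pull(mutation) %>%
  unique()

# Keep every lineage's value for those mutations, including the low ones.
heatmap_df <- df_ll %>%
  filter(gene == "S", mutation %in% characteristic)

p <- ggplot(heatmap_df, aes(x = mutation, y = lineage, fill = prevalence)) +
  geom_tile(color = "white") +                       # border around the tiles
  scale_fill_distiller(palette = "RdPu", direction = 1,
                       limits = c(0, 1), labels = scales::percent) +
  coord_equal() +                                    # square tiles
  ggtitle("Characteristic S-gene mutations in common lineages") +
  theme_minimal() +
  theme(panel.grid = element_blank())                # drop the gridlines
```

Filtering on the set of characteristic mutations, rather than on prevalence row by row, is what keeps the low-prevalence cells in the heatmap for lineages that lack a mutation.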

getEpiData function is broken

The getEpiData(name="United States of America", date=2020-07-01, mostRecent=T) example from the help documentation is broken.

  1. Need to translate the mostRecent argument (TRUE / FALSE) into the ES-compatible representations of booleans (true, false)
  2. Fixing that string leads to an rbind error:
getEpiData(name="United States of America", date=2020-07-01, mostRecent='true')

[1] "Retrieving data..."
 1 records retrieved

Error in rbind_pages(results) : 
  all(vapply(pages, is.data.frame, logical(1))) is not TRUE
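For the first point, one possible conversion is sketched below. The helper name to_es_bool is hypothetical and not part of the package; it only covers the boolean translation, not the rbind_pages error.

```r
# Hypothetical helper: convert a single R logical into the lowercase string
# form ("true" / "false") used in the query.
to_es_bool <- function(x) {
  stopifnot(is.logical(x), length(x) == 1, !is.na(x))
  tolower(as.character(x))
}

to_es_bool(TRUE)
to_es_bool(FALSE)
```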

Add check if date is a string in getEpiData()

getEpiData(name="United States of America", date=2020-07-01) will not work: the unquoted date is evaluated as arithmetic (2020 - 7 - 1), so the value passed is 2012. Add a check that date is a string, and show a relevant error message otherwise.
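A minimal sketch of such a check, assuming dates are expected in YYYY-MM-DD form. The function name check_date_arg and the error wording are hypothetical.

```r
# Hypothetical validator, not part of the package.
check_date_arg <- function(date) {
  if (!is.character(date) || length(date) != 1) {
    stop("`date` must be a quoted string such as \"2020-07-01\"; ",
         "an unquoted 2020-07-01 is evaluated as 2020 - 7 - 1.",
         call. = FALSE)
  }
  # as.Date() returns NA when the string does not match the format.
  parsed <- as.Date(date, format = "%Y-%m-%d")
  if (is.na(parsed)) {
    stop("`date` must be in YYYY-MM-DD format.", call. = FALSE)
  }
  date
}

2020-07-01                    # unquoted: plain subtraction, evaluates to 2012
check_date_arg("2020-07-01")  # quoted: passes validation
```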

Change the color scheme on the lineage prevalence plots

p <- plotAllLineagesByLocation(location = "United States", other_threshold = 0.03, nday_threshold = 5, ndays = 60)
p <- p + scale_fill_brewer(palette="Paired")

The ColorBrewer "Paired" scheme wouldn't be my first choice for this plot. The paired-ness of the colors means that users subconsciously associate the two similar colors with each other, when B.1.1.519 and B.1.1.7 are completely unrelated.

Add introduction to README and vignette.

From @flaneuse: Add an introduction to the GitHub README and the vignette to explain what the data is, where it comes from, etc. Give more context about what the package can do. It should also link to the data sources (outbreak.info/sources) and the API documentation (api.outbreak.info).

Improve helper text displayed to user

Call: county_df=getAdmn2ByCountry()

  • The counter says things like 144 records retrieved; do you actually mean 14,400 records? Would suggest either changing the language to something like "query" or multiplying by the number of records returned per query.
  • It'd be helpful to grab the number of results so you have an idea of how far you are into the function call (e.g. 14,400 out of 68,000)
  • Would be nice to add a newline after "X records are retrieved". When there's an error, they get pasted together.
  • Might also be useful to display the estimated time remaining in the function call. See dplyr's progress_estimated(): https://dplyr.tidyverse.org/reference/progress_estimated.html (since deprecated in dplyr in favor of the progress package).
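The counter and newline suggestions could look something like the sketch below. The helper name format_progress is hypothetical, and how the total record count is obtained from the API is not shown.

```r
# Hypothetical helper for the status line; `retrieved` and `total` would come
# from the running query.
format_progress <- function(retrieved, total) {
  fmt <- function(n) format(n, big.mark = ",", scientific = FALSE, trim = TRUE)
  # The trailing newline keeps a later error message on its own line.
  sprintf("%s of %s records retrieved\n", fmt(retrieved), fmt(total))
}

cat(format_progress(14400, 68000))
```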

Improve function call perf and/or offer users more options to limit their results

When you call functions like getAdmn0() or getAdmn2ByCountry() the function is slooooooooow since it has to make a bunch of calls and returns a lot of data.

Would be nice to offer the users the ability to limit the query to a smaller subset of the data to help make it easier to return data faster. Potential options could include narrowing by date range, mostRecent, returning only a subset of fields...

Might also be worth double-checking that the function is as fast as possible. You're probably API-limited, but if it could be faster, that'd be nice. (low priority)

README

  • Be explicit that installation occurs in R script and not command line (this seems intuitive, but doesn't hurt to state)
  • Be more descriptive with purpose and function of R package directly, rather than relying on the user to click on the vignette (which does a good job of explaining functions themselves, but isn't entirely descriptive itself). For example, provide a few of the package uses directly.
  • License? Credits? I know we did an overview on how to approach a README (Mar 3, 2020) that Ginger facilitated, and there was a Google Doc breakdown.

Geographic synonym lookup in searchLocations()

From @flaneuse: searchLocations would be MUCH more useful if there was a geographic synonym lookup function. If I try to look up “USA” right now, I get no results. We could do this either with a gazetteer approach (store a list of synonyms for each location), or with a package that calls Google’s Geocoding API to find the lat/lon of the location and lets the user select what they mean.

We would need an API key for the Google Geocoding API.
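The gazetteer option needs no external service. A minimal sketch follows, in which the synonym table and the function name resolve_location are hypothetical and a real table would be far larger.

```r
# Hypothetical synonym table: names are lowercase synonyms, values are the
# canonical location names to search for.
location_synonyms <- c(
  "usa"           = "United States of America",
  "us"            = "United States of America",
  "united states" = "United States of America"
)

# Return the canonical name for a known synonym, else the query unchanged.
resolve_location <- function(query) {
  key <- tolower(trimws(query))
  if (key %in% names(location_synonyms)) location_synonyms[[key]] else query
}

resolve_location("USA")
resolve_location("Brazil")
```

Normalizing case and whitespace before the lookup keeps the table small, since "USA", "usa", and " Usa " all resolve through a single entry.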

Show progress bar for queries

Functions like getAdmn2ByCountry take a while to load the data. Showing a progress bar with the number of items retrieved would improve the user experience.
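Base R's utils::txtProgressBar is one dependency-free way to do this. The sketch below simulates a paged retrieval; fetch_page() is a placeholder for a real API call, and it assumes the number of pages is known before the loop starts.

```r
# Placeholder for one paged API request (hypothetical).
fetch_page <- function(i) {
  Sys.sleep(0.01)
  data.frame(page = i, value = i * 10)
}

n_pages <- 20
pb <- utils::txtProgressBar(min = 0, max = n_pages, style = 3)
pages <- vector("list", n_pages)
for (i in seq_len(n_pages)) {
  pages[[i]] <- fetch_page(i)
  utils::setTxtProgressBar(pb, i)  # advance the bar after each page
}
close(pb)

results <- do.call(rbind, pages)
```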

Add ggtitle to plotPrevalenceByLocation

Add a title to the plot returned by plotPrevalenceByLocation(pangolin_lineage = "P.1", location = "Brazil").

Personally, I also prefer percentage over proportion, but it's not a big deal either way.

plotPrevalenceByLocation(pangolin_lineage = "P.1", location = "Brazil") +
  ggtitle("Prevalence of P.1 in Brazil") +
  scale_y_continuous(labels = scales::percent, name="percentage")

Encode & as unicode for getCountryByRegion()

Currently the following error is thrown since & is not encoded as \u0026:

eu = getCountryByRegion("Europe & Central Asia")
Error in open.connection(con, "rb") : HTTP error 400.
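If the region name ends up in the URL query string, percent-encoding is one way to make the & safe. The sketch below uses base R's utils::URLencode with reserved = TRUE, which yields %26 rather than a \u0026 escape. Whether getCountryByRegion() should encode the value this way depends on how the request is built, which is an assumption here.

```r
region <- "Europe & Central Asia"

# reserved = TRUE also encodes reserved characters such as "&".
encoded <- utils::URLencode(region, reserved = TRUE)
encoded
```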

Simplify ggplot code in Lineage Report Rmd

ditch_the_axes <- theme(
  axis.text = element_blank(),
  axis.line = element_blank(),
  axis.ticks = element_blank(),
  panel.border = element_blank(),
  panel.grid = element_blank(),
  axis.title = element_blank()
)

p <- ggplot() +
  geom_polygon(data = us_df, aes(x, y, group = group, fill = proportion), color = "gray25", size = 0.3) +
  scale_fill_gradient(low = "lemonchiffon", high = "lightseagreen", name = "Cumulative prevalence \nof B.1.1.7", labels = scales::percent) +
  ditch_the_axes +
  coord_fixed(1.05)
show(p)

Suggest replacing ditch_the_axes with the built-in ggplot2::theme_void() function.
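A sketch of the simplified call, with a toy square standing in for us_df since the real map data is not reproduced here. The size argument is left at its default in this version.

```r
library(ggplot2)

# Toy polygon standing in for the real us_df (coordinates and value are invented).
us_df <- data.frame(
  x = c(0, 1, 1, 0),
  y = c(0, 0, 1, 1),
  group = 1,
  proportion = 0.25
)

p <- ggplot() +
  geom_polygon(data = us_df, aes(x, y, group = group, fill = proportion), color = "gray25") +
  scale_fill_gradient(low = "lemonchiffon", high = "lightseagreen",
                      name = "Cumulative prevalence \nof B.1.1.7",
                      labels = scales::percent) +
  theme_void() +   # replaces the hand-written ditch_the_axes theme
  coord_fixed(1.05)
```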

Lineage Report vignette mapping

I'm not in love with the call to coord_fixed(1.05) in the Lineage Report vignette, since that's a hardcoded value of the aspect ratio of the lat/lon in a proper projection of the U.S. map.

I would think about converting the mapping data into an sf object, where you will need to:

  1. convert the lat/lon of the polygons into a geometry object
  2. specify the projection using st_transform and a proj4 string, which you can grab from Projection Wizard
  3. change the ggplot geom_polygon call into geom_sf

There are a lot more cool things the sf package can do for manipulating geographic data...
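The three steps might look roughly like this. It is a sketch under stated assumptions: the toy box stands in for the real polygon data, the input is assumed to be WGS84 lon/lat (EPSG:4326), and EPSG:5070 (NAD83 / Conus Albers) is swapped in as the target projection in place of a proj4 string from Projection Wizard.

```r
library(sf)
library(ggplot2)

# Toy polygon in lon/lat (a rough box, not real boundary data).
us_df <- data.frame(
  x = c(-100, -90, -90, -100),
  y = c(35, 35, 40, 40),
  group = "box",
  proportion = 0.25
)

# 1. Convert each group's points into a closed polygon geometry.
groups <- split(us_df, us_df$group)
to_polygon <- function(g) {
  coords <- unname(as.matrix(g[, c("x", "y")]))
  st_polygon(list(rbind(coords, coords[1, ])))  # repeat first point to close the ring
}
us_sf <- st_sf(
  group      = names(groups),
  proportion = unname(vapply(groups, function(g) g$proportion[1], numeric(1))),
  geometry   = st_sfc(unname(lapply(groups, to_polygon)), crs = 4326)
)

# 2. Reproject (EPSG:5070 is an assumption, see above).
us_proj <- st_transform(us_sf, crs = 5070)

# 3. Draw with geom_sf; no hardcoded coord_fixed() ratio is needed.
p <- ggplot(us_proj) +
  geom_sf(aes(fill = proportion), color = "gray25") +
  scale_fill_gradient(low = "lemonchiffon", high = "lightseagreen", labels = scales::percent) +
  theme_void()
```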

Allow prevalence of a mutation

getPrevalenceByLocation(mutations = c("S:P681R"), location = "Brazil")
should work, but it's requiring a pangolin_lineage.

getCountryByRegion is broken

getCountryByRegion("North America")

Error in getEpiData(wb_region = location, admin_level = 0) : 
  object 'location' not found

Write and expose a generic getEpiData function and structure other functions as wrappers

Feedback from @flaneuse: It would be something like getEpiData(location_id, location_name, date, most_recent, admin_level, fields, sort, size), where all those fields are optional. getData() would return ALL the data. The advantage of this approach is that you now have one function which all the others can rely on, so if anything ever changes, like the API URL, you only have to change it in one place. You can also then have one place to do all the transformations you need. This function would also expose a lot more functionality to the user, like sorting, specifying size, fields to return, etc. It’d be really cool too if you could specify date ranges (let us know if you need help constructing that ES query).
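The wrapper structure can be sketched as below. Both function names are hypothetical, and the core function only collects the supplied arguments instead of calling the API, so the example shows the shape of the design rather than an implementation.

```r
# Hypothetical core function: gathers whichever optional arguments were supplied.
# A real version would send them to the API and tidy the response in one place.
get_epi_data_sketch <- function(location_id = NULL, location_name = NULL,
                                date = NULL, most_recent = NULL,
                                admin_level = NULL, fields = NULL,
                                sort = NULL, size = NULL) {
  args <- list(
    location_id = location_id, location_name = location_name,
    date = date, most_recent = most_recent,
    admin_level = admin_level, fields = fields,
    sort = sort, size = size
  )
  Filter(Negate(is.null), args)  # drop everything left at NULL
}

# Hypothetical wrapper: fixes one argument and forwards the rest.
get_admn0_sketch <- function(...) get_epi_data_sketch(admin_level = 0, ...)

get_admn0_sketch(size = 10)
```

With this shape, a change such as a new API URL or a new response transformation is made once in the core function and every wrapper picks it up.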

`plotCovid` is broken

plotCovid("Florida", "confirmed_per_100k")
[1] "Retrieving data..."
 1 records retrieved
Error in key %in% colnames(df) : object 'key' not found
