
what-they-forgot's Introduction

What They Forgot to Teach You About R

Rendered site: rstats.wtf

Netlify preview URL: https://rstats-wtf.netlify.com

License: Creative Commons. Project status: Active (the project has reached a stable, usable state and is being actively developed).

The initial impetus for creating these materials is a two-day hands-on workshop. The target learner:

  • Has a moderate amount of R and RStudio experience.
  • Is largely self-taught.
  • Suspects they have drifted into some idiosyncratic habits that may slow them down or make their work products more brittle.
  • Is interested in (re)designing their R lifestyle, to be more effective and more self-sufficient.

The in-person workshops are still the primary delivery method for this content, but we've begun recording prose versions of it, to make it more widely available and to give participants something to refer back to. Warning: these materials absolutely do not constitute a self-contained "book", nor do they capture all workshop content.

We focus on building holistic and project-oriented workflows that address the most common sources of friction in data analysis, outside of doing the statistical analysis itself.

Workshops

Upcoming and past offerings:

  • rstudio::conf 2020, San Francisco, CA, January 27 & 28 Training Days
  • rstudio::conf January 2019, Austin, TX
  • 2018 October 4 & 5, Seattle, WA
  • rstudio::conf January 2018, San Diego, CA

The workshops typically include substantial components that draw on other materials, such as:

what-they-forgot's People

Contributors

aedobbyn, apreshill, batpigandme, cderv, chsafouane, cwickham, edavidaja, ewenme, hadley, jennybc, jimhester, kaiaragaki, karawoo, lionel-, lucymcgowan, matanhakim, maurolepore, mtkerber, rdinter, shannonpileggi, wlandau-lilly


what-they-forgot's Issues

Should system prep verification use `pkgbuild::check_build_tools()`?

Currently the verify-system-prep section suggests using devtools::has_devel() to verify system readiness for development:
https://github.com/jennybc/what-they-forgot/blob/ec7696b21af2110e7f81262fd3cda8fd24c62e32/14_system-prep-for-build.Rmd#L63

The usethis setup vignette says:

  • pkgbuild::check_build_tools() is the most current function. It can be installed via install.packages("pkgbuild").
  • devtools::has_devel() was a function in devtools, at least up to v1.13.5, so you might still see references to that. All functionality related to package building now lives in pkgbuild. But once you have installed pkgbuild, you should just use the function pkgbuild::check_build_tools() instead.

Is it too soon to move to pkgbuild::check_build_tools() for the What They Forgot text? (If not, I'm happy to submit a PR).

I filed an issue for pkgbuild about the possibility of returning something to signal success, but I'm thinking it might be better suited as a usethis feature request…
r-lib/pkgbuild#73
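A minimal sketch of what the verification step might look like with pkgbuild, assuming pkgbuild is installed. `pkgbuild::has_build_tools()` and `pkgbuild::check_build_tools()` are pkgbuild functions; the wrapper name `build_tools_ready()` is hypothetical and only illustrates getting a TRUE/FALSE answer back.

```r
# Hypothetical helper for the "verify system prep" step, using pkgbuild
# (install.packages("pkgbuild")) instead of devtools::has_devel().
build_tools_ready <- function() {
  if (!requireNamespace("pkgbuild", quietly = TRUE)) {
    stop("pkgbuild is not installed; run install.packages(\"pkgbuild\") first.")
  }
  # Returns TRUE/FALSE; check_build_tools() instead errors when tools are missing
  isTRUE(pkgbuild::has_build_tools(debug = FALSE))
}
```

In a workshop setting, calling `pkgbuild::check_build_tools()` directly is probably still the simplest instruction; a wrapper like this only matters if a script needs to branch on the result.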

exercise callout

Rather than use .callout-note, we should create an exercise/activity-specific callout.

Ideas for Jim for October webinar

We will have to discuss whether we think the audience will be up for this, but what do you think of me teaching a class on the following (likely on day 2?). It would mostly be about getting used to Git in the terminal and how to get yourself out of messes. Git Bash on Windows plus the RStudio terminal seem to have everything needed for setup, which it looks like you already cover in the earlier material.

  • (Don't) Burn it all Down - The git reflog / git reset and how to recover when your repo is on fire

  • git gymnastics - tracking upstream changes, manipulating / renaming remotes, interactive rebase, git add patch, cherry picking

Mention `base` package Rprofile as a point of interest?

The base package has an Rprofile in its R directory that may be of interest in the "R Startup" section.

I can't imagine it's recommended, but I accidentally used this base package Rprofile to set my library path, using .libPaths(), during setup on a new computer. Later, while editing my Renviron and the correct Rprofile, I noticed that the code to set my library path was missing. I was confused as to how R was still using my desired path even though it was no longer declared anywhere. After scouring as much Start-Up documentation as I could find, I stumbled upon this guide, which recommends modifying the base package Rprofile. (This recommendation surprises me, but I am glad it helped me to identify my mistake.)

After fixing my mistake, I explored the base package Rprofile and found some neat things, like the fact that this is where T <- TRUE and F <- FALSE are declared.
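For anyone who wants to look without editing anything, this locates the base package's Rprofile and prints the lines that define `T` and `F`. A read-only sketch; it assumes the file lives at `R_HOME/library/base/R/Rprofile`, as in standard installs.

```r
# Locate the system Rprofile that ships with the base package (read-only look)
base_rprofile <- file.path(R.home("library"), "base", "R", "Rprofile")
file.exists(base_rprofile)

# Show the lines where T and F are assigned
grep("^[TF] <- ", readLines(base_rprofile), value = TRUE)

# Personal settings such as the library path belong in ~/.Renviron or
# ~/.Rprofile instead; this shows the library paths R actually ended up with
.libPaths()
```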

Compare installed R to the current R

Capturing ideas from a Slack conversation.

Sometimes we encounter people living in an R time capsule, either intentionally, out of fear of upgrading, or they somehow just drifted into it.

Write up some ways to programmatically compare your installed R version to the current version.

rversions::r_release()
getRversion()
rversions::r_versions()

A milestone of "out-of-date-ness":

issue a warning when you are > 2 minor releases old, since CRAN stops building binaries for them

Long-term: could be an update.packages()-type function in pkgman or part of a "health check" function in usethis.
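A sketch of the comparison, assuming the "> 2 minor releases" rule above; `r_is_outdated()` is a hypothetical name, and the version strings would come from `getRversion()` and the rversions package.

```r
# Hypothetical helper: is the installed R more than `max_minor_lag` minor
# releases behind `current`? An older major version always counts as outdated.
r_is_outdated <- function(current, installed = getRversion(), max_minor_lag = 2) {
  inst <- unclass(as.package_version(installed))[[1]]  # c(major, minor, patch)
  curr <- unclass(as.package_version(current))[[1]]
  if (inst[1] < curr[1]) return(TRUE)   # e.g. 3.6.x vs 4.x
  if (inst[1] > curr[1]) return(FALSE)  # e.g. running a newer, devel major
  (curr[2] - inst[2]) > max_minor_lag
}
```

With rversions installed and an internet connection, `r_is_outdated(rversions::r_release()$version)` would check the running R against the latest release.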

Simplify netlify stuff

The cli we were using as beta is now production and various things can be simplified / tidied up.

Two issues in 1! .Rprofiles when you have multiple users, and a GitHub thought question

I have a team and organization where all data, R projects, and scripts live on one server that we all have access to and work in. We cannot bring our work to our local machines for security reasons. I'm looking for a good way to set up a .Rprofile for multiple users all working on one project in one location.

As a side note, this setup also creates some GitHub issues for us, and I would love to brainstorm workflows for when everyone is working in the same repository from the same system.
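One possible starting point for the shared-server case, sketched under assumptions: R reads a `.Rprofile` in the working directory instead of `~/.Rprofile`, so a shared project profile can explicitly re-source each person's own profile, plus an optional per-user file kept in the project. `load_shared_profile()` and the `profiles/<username>.R` layout are hypothetical.

```r
# Hypothetical helper for a shared project .Rprofile on a multi-user server.
load_shared_profile <- function(project_dir = ".",
                                user = Sys.info()[["user"]],
                                home_profile = path.expand("~/.Rprofile")) {
  sourced <- character()
  project_profile <- normalizePath(file.path(project_dir, ".Rprofile"), mustWork = FALSE)
  # 1. R skips ~/.Rprofile when a project .Rprofile exists, so source it
  #    explicitly (unless the project directory *is* the home directory).
  if (file.exists(home_profile) && normalizePath(home_profile) != project_profile) {
    source(home_profile)
    sourced <- c(sourced, home_profile)
  }
  # 2. Optional per-user overrides stored inside the project.
  user_file <- file.path(project_dir, "profiles", paste0(user, ".R"))
  if (file.exists(user_file)) {
    source(user_file)
    sourced <- c(sourced, user_file)
  }
  invisible(sourced)
}
```

Both the function definition and a call to `load_shared_profile()` would go in the project's `.Rprofile`.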

here() to access parent directories?

Say I have my data stored in ~/Data, but my R project and code are stored in ~/dev/sample_model. Is it good practice to use here() to access this data, or should I hard-code absolute paths to it?
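`here()` builds paths relative to the project root, so reaching ~/Data means climbing out of the project (`here::here("..", "..", "Data")`), which bakes one person's directory layout into the code. One alternative, sketched under the assumption that each user defines a hypothetical `SAMPLE_MODEL_DATA` variable in their own ~/.Renviron; `data_path()` is also hypothetical.

```r
# Hypothetical helper: resolve the data directory from an environment variable,
# e.g. a line like SAMPLE_MODEL_DATA=~/Data in each user's ~/.Renviron.
data_path <- function(..., var = "SAMPLE_MODEL_DATA") {
  root <- Sys.getenv(var, unset = "")
  if (!nzchar(root)) {
    stop("Set ", var, " in your ~/.Renviron (then restart R) to point at the data directory.")
  }
  root <- normalizePath(path.expand(root), mustWork = TRUE)
  file.path(root, ...)
}
```

Code inside the project would then use `data_path("raw.csv")` and `here::here(...)` for everything that does live in the project.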

For “Maintaining R”, would it be worth adding architecture-specific `R_LIBS_USER` paths?

I'm not sure whether this is obscure, or what many people are moving to (or have set up with tools like RSwitch, rcli, and r-lib/rim), but these days my M1 Mac user library paths look different from what they were in the last pass on the Maintaining R chapter (excerpt below):

Once this is set up, the process for transferring your package library becomes
(assuming `R_LIBS_USER` is set to `~/Library/R/3.5/library`):
```r
# Install the new version of R (let's say 3.5.0 in this example)
# Create a new directory for this version of R
fs::dir_create("~/Library/R/3.5/library")
# Restart R so that .libPaths() is updated
# Look up which packages were in your old package library
pkgs <- fs::path_file(fs::dir_ls("~/Library/R/3.4/library"))
# Reinstall those packages into the new library
install.packages(pkgs)
```

For example, my default R_LIBS_USER is now ~/Library/R/arm64/4.2/library, and the previous version for this architecture lives at ~/Library/R/arm64/4.1/library.

However, I also have ~/Library/R/x86_64/4.2/library and ~/Library/R/x86_64/4.1/library.

Again, not sure how common this is/will become, but, let me know if you think it'd be worth a PR.
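If a PR goes ahead, the chapter could compute the architecture-specific path instead of hard-coding it. A sketch under two assumptions: the `~/Library/R/<arch>/<major.minor>/library` layout described above, and that R reports Apple silicon as `aarch64` in `R.version$arch` while the directory name is `arm64`. `mac_user_lib()` is hypothetical.

```r
# Hypothetical helper: architecture-specific user library path on macOS,
# e.g. ~/Library/R/arm64/4.2/library or ~/Library/R/x86_64/4.1/library.
mac_user_lib <- function(version = getRversion(), arch = R.version$arch) {
  v <- unclass(as.package_version(version))[[1]]
  # Assumption: R says "aarch64" where the directory says "arm64"
  arch_dir <- if (arch == "aarch64") "arm64" else arch
  file.path("~/Library/R", arch_dir, paste(v[1], v[2], sep = "."), "library")
}
```

For example, the packages from the previous arm64 library could be listed with `basename(list.dirs(path.expand(mac_user_lib("4.1", "aarch64")), recursive = FALSE))` and passed to `install.packages()`.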

Package-assisted path handling, via @EdwinTh

Thanks @EdwinTh for blogging your approach!

Use to enrich discussion about ways to make paths more robust across users and computers.

https://edwinth.github.io/multiperson-project/

@hadley also got @EdwinTh's blessing to do a mini code review & makeover, which is here:

tidyverse/design#56

Repeating a comment I made there, in case I can tempt @EdwinTh into a discussion of it here:

if it's better to swap configuration for convention

Yeah, why not force users to follow the same convention within their home dir? Or have a helper function in the package that interactively creates a symlink from each user's idiosyncratic data storage choice to ~/project_name/data. A user would do this once, when they start working on the project.

Basically get the varying code / logic out of the package itself and just make sure each user's local situation can "explain itself" to the package.
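The symlink idea might look something like this. A sketch only: `link_project_data()` and the `~/project_name/data` convention are placeholders taken from the comment above, not an existing package API. `file.symlink()` is base R; on Windows it typically needs extra privileges, and `Sys.junction()` may be the more practical choice there.

```r
# Hypothetical helper, run once per user: link wherever *this* user keeps the
# data to the conventional location the project code expects (<project_dir>/data).
link_project_data <- function(data_dir, project_dir = "~/project_name") {
  if (missing(data_dir)) {
    if (!interactive()) stop("Supply `data_dir`.")
    data_dir <- readline("Where is your copy of the project data? ")
  }
  data_dir <- normalizePath(path.expand(data_dir), mustWork = TRUE)
  link <- file.path(path.expand(project_dir), "data")
  if (file.exists(link)) stop("Something already exists at ", link)
  dir.create(dirname(link), recursive = TRUE, showWarnings = FALSE)
  if (!file.symlink(data_dir, link)) stop("Could not create a symlink at ", link)
  invisible(link)
}
```

After that, the package code only ever refers to `~/project_name/data`, and each user's local situation "explains itself" through the link.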

Data and version control

A question we discussed that might be nice to include in the future: can/should we version control our data in git/Github? If not, what are the other options?

Coding mindset and standards

Different for data analysts/scientists vs. software engineers (SWE).

This influences many things: some ideas translate, some do not, and some need adaptation.

Great tweet:

(One of the things research programmers struggle with is the transition from exploration to infrastructure, i.e., from "coding to figure out what the problem is" to "I'm building a reusable tool". Habits from the first are often carried over to the second.) 2/N

https://twitter.com/gvwilson/status/1028964474135429122

warn against `save`

From a piece of code like this:

save(foo, file = "data.Rdata")

....

load("data.Rdata")

It is not at all clear which variables are being loaded into the workspace.

Something like:

saveRDS(...)
foo <- readRDS(...)

is much cleaner. Maybe also mention that the serialized representation of R objects may change between versions of R, so it should not be relied on for long-term data storage or for sending data to other people.
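A small runnable contrast between the two approaches, using temporary files; it also shows a safer way to open an inherited `.Rdata` file.

```r
foo <- data.frame(x = 1:3)

# saveRDS()/readRDS(): one object per file, and the reader chooses its name
rds <- tempfile(fileext = ".rds")
saveRDS(foo, rds)
bar <- readRDS(rds)
identical(foo, bar)  # TRUE

# If you inherit an .Rdata file, load it into a fresh environment and
# inspect what it contains instead of loading straight into the workspace
rdata <- tempfile(fileext = ".Rdata")
save(foo, file = rdata)
e <- new.env()
loaded <- load(rdata, envir = e)  # load() invisibly returns the object names
loaded  # "foo"
```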

Why not conda? What about Homebrew?

@jimhester What do you think of adding something about why one should use CRAN's binaries unless you have a really good reason to do otherwise? conda, in particular, comes up a lot and causes pain and it would be nice to write down why it can cause more trouble than people anticipate.

Update front matter

https://whattheyforgot.org

needs a light refresh. Basically reset / update it to better support rstudio::conf and beyond.

Analogous content for Happy Git, in case it is helpful:

It's fine if this still lives on the same page in WTF. I just mean Happy Git might provide a template for the refresh. I've been confronting the same problems for longer there re: persistent vs perishable stuff.

Discuss cases in which workflow might break down

A few folks mentioned that their companies have proprietary databases that they use to pull data, so they can't separate their data pull & analysis sections into multiple scripts. For example, there are instances where a company wouldn't suggest creating intermediate data, and certainly not to save it anywhere.

Maybe including a slide or two on when it would be a bad idea to write many small units for tidy analysis (or a disclaimer on one slide) would be helpful?

Definitive guide to reinstalling a package on Windows

We have all explained this so many times: the repetitive dance necessary to avoid borked Windows package installs, where the R code gets upgraded but the compiled code does not.

Let's add that content here, with rich section links, so we have a useful place to point to from elsewhere.
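A common part of that dance is making sure the package is not loaded when you reinstall it, since Windows will not let an installer replace a DLL that a running R session holds open. A minimal sketch of a guard for that one step; `safe_reinstall()` is hypothetical and does not cover the rest of the advice (for example, closing other R sessions that may have the package loaded).

```r
# Hypothetical guard: refuse to reinstall a package whose namespace (and so,
# on Windows, its compiled DLL) is already loaded in this session.
safe_reinstall <- function(pkg, ...) {
  if (pkg %in% loadedNamespaces()) {
    stop(pkg, " is already loaded. Restart R (Session > Restart R in RStudio), ",
         "make sure nothing loads it at startup, and reinstall from a fresh session.")
  }
  install.packages(pkg, ...)
}
```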

Chapter 8.4: Example for path expansion

Thanks for the great chapter 8.4 How to transfer your library when updating R, it helped me a lot to do a clean update! I stumbled upon one line in it, maybe you could consider clarifying it:

"You can also alternatively set R_LIBS_USER to a different path; but make sure to include the %v wildcard. e.g. ~/R/library/%v. The %v is automatically expanded to the major and minor version of R, so with R 3.5.1 this path becomes ~/Library/R/3.5/library."

While ~/Library/R/3.5/library is, of course, the default for R_LIBS_USER with R 3.5.1 (as explained in the chapter before), it sounds to me like "this path" refers to the given example, ~/R/library/%v, which would expand to ~/R/library/3.5.
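To make the expansion concrete, here is a hypothetical helper that mimics the %v substitution. R itself does this when it reads R_LIBS_USER at startup; `expand_v()` is only an illustration, not an R function.

```r
# Hypothetical helper mimicking R's %v expansion in R_LIBS_USER:
# %v becomes "<major>.<minor>" of the given (or running) R version.
expand_v <- function(path, version = getRversion()) {
  v <- unclass(as.package_version(version))[[1]]
  gsub("%v", paste(v[1], v[2], sep = "."), path, fixed = TRUE)
}

expand_v("~/R/library/%v", "3.5.1")  # "~/R/library/3.5"
```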

[WIP] `xcode-select --install` seems to not work anymore

Just going through a new R installation on a Mac, and `xcode-select --install` no longer seems to work as expected.
Just putting this here for other people, and I'll report back.

output:

danielchen@Daniels-MBP ~ % xcode-select --install
xcode-select: note: install requested for command line developer tools

danielchen@Daniels-MBP ~ % sudo !!
sudo xcode-select --install
Password:
xcode-select: note: install requested for command line developer tools

Running macOS 11.5 (Big Sur)
MacBook Pro (Retina, 13-inch, Late 2013)
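Since `--install` only requests an installation, a separate check of whether the command line tools are actually present may help when reporting back. A minimal sketch, assuming `xcode-select -p` (which prints the active developer directory and exits with a non-zero status when none is set) is a good enough signal; `clt_path()` is hypothetical.

```r
# Hypothetical check: returns the active developer directory on macOS,
# or NA if xcode-select reports none (or if this isn't macOS at all).
clt_path <- function() {
  if (Sys.info()[["sysname"]] != "Darwin") return(NA_character_)
  out <- suppressWarnings(system2("xcode-select", "-p", stdout = TRUE, stderr = TRUE))
  status <- attr(out, "status")
  if (!is.null(status) && status != 0) NA_character_ else out[1]
}
```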

Move `master` branch to `main`

Cc @jennybc

The master branch of this repository will soon be renamed to main, as part of a coordinated change across several GitHub organizations (including, but not limited to: tidyverse, r-lib, tidymodels, and sol-eng). We anticipate this will happen by the end of September 2021.

That will be preceded by a release of the usethis package, which will gain some functionality around detecting and adapting to a renamed default branch. There will also be a blog post at the time of this master --> main change.

The purpose of this issue is to:

  • Help us firm up the list of targeted repositories
  • Make sure all maintainers are aware of what's coming
  • Give us an issue to close when the job is done
  • Give us a place to put advice for collaborators re: how to adapt

message id: entire_lizard

Productive mental models

Re: the intersection of habits/psychology and technical progress.

Integrate and beef up these ideas from current slides into bookdown:

  • if it hurts, do it more often
  • let go of the idea that your setup will ever be "done"; maintenance and staying current is an ongoing activity, where the timescale and acceptable risk varies by person/project
  • find ways to make and lock in incremental progress, move from one successful position to the next; reduce size & direction of incremental change until this is possible

macOS header problem

On Mojave, one often needs to make some missing headers available:

https://silvae86.github.io/sysadmin/mac/osx/mojave/beta/libxml2/2018/07/05/fixing-missing-headers-for-homebrew-in-mac-osx-mojave/


Things are different on Catalina, for which the Mojave "fix" (the header-containing package) is not available. Summary courtesy of @kevinushey:

R 3.6.1-patched handles the issue by updating etc/Makeconf with:

CPPFLAGS = -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include

I think one could also fix the issue by putting that in their own ~/.R/Makevars file.
So I think the "do I have command line tools ready and installed?" checklist is now:

  1. xcode-select --install (make sure command line tools are installed)
  2. /usr/bin/git --version (may trigger the Xcode license agreement)
  3. Update ~/.R/Makevars, or (on Mojave) install the header-providing package.

Also worth saying: if someone has Xcode installed, but not command line tools, then regular command line tools will still work, even though compilation may fail (since R expects to find and use command line tools explicitly, as opposed to the version that might be bundled with Xcode).
And Xcode bundles its own version of command line tools, and the system binaries are smart enough to find and use whichever one happens to be installed.
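Step 3 of the checklist could be scripted. A sketch, assuming the CPPFLAGS line quoted above from R 3.6.1-patched is what you want in ~/.R/Makevars; `add_sdk_cppflags()` is a hypothetical name. It simply appends the line, so it would override any earlier CPPFLAGS setting in that file.

```r
# Hypothetical helper: add the Catalina-era CPPFLAGS line to a personal
# Makevars file, unless that exact line is already present.
add_sdk_cppflags <- function(makevars = "~/.R/Makevars") {
  line <- paste("CPPFLAGS = -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk",
                "-I/usr/local/include")
  makevars <- path.expand(makevars)
  dir.create(dirname(makevars), recursive = TRUE, showWarnings = FALSE)
  existing <- if (file.exists(makevars)) readLines(makevars, warn = FALSE) else character()
  if (!line %in% existing) writeLines(c(existing, line), makevars)
  invisible(makevars)
}
```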
