rmweather

Introduction

rmweather is an R package to conduct meteorological/weather normalisation on air quality data so trends and interventions can be investigated in a robust way. For those who are aware of my previous research, rmweather is the "Mk.II" package of normalweatherr. rmweather does less than normalweatherr, but it is faster and easier to use.

Installation

rmweather is available from CRAN and can be installed in the normal way:

# Install rmweather from CRAN
install.packages("rmweather")

Development version

To install the development version of rmweather, the remotes package will need to be installed first. Then:

# Load helper package
library(remotes)

# Install rmweather
install_github("skgrange/rmweather")

Example usage

rmweather contains example data from London which can be used to show the meteorological normalisation procedure. The example data are daily means of NO2 and NOx observations at London Marylebone Road. The accompanying surface meteorological data are from London Heathrow, a major airport located 23 km west of Central London.

Most of rmweather's functions begin with rmw_, so they are easy to track and find help for. In this example, we have used dplyr and the pipe (%>%, pronounced "then") for clarity. The example takes about 30 seconds on my (laptop) system and the model has an R² value of 76 %.

# Load packages
library(dplyr)
library(rmweather)
library(ranger)

# Have a look at rmweather's example data, from London
head(data_london)

# Prepare data for modelling
# Only use data with valid wind speeds, no2 will become the dependent variable
data_london_prepared <- data_london %>% 
  filter(variable == "no2",
         !is.na(ws)) %>% 
  rmw_prepare_data(na.rm = TRUE)

# Grow/train a random forest model and then create a meteorological normalised trend 
list_normalised <- rmw_do_all(
  data_london_prepared,
  variables = c(
    "date_unix", "day_julian", "weekday", "air_temp", "rh", "wd", "ws",
    "atmospheric_pressure"
  ),
  n_trees = 300,
  n_samples = 300,
  verbose = TRUE
)

# What elements are in the list? 
names(list_normalised)

# Check model object's performance
rmw_model_statistics(list_normalised$model)

# Plot variable importances
list_normalised$model %>% 
  rmw_model_importance() %>% 
  rmw_plot_importance()

# Check if model has suffered from overfitting
rmw_predict_the_test_set(
  model = list_normalised$model,
  df = list_normalised$observations
) %>% 
  rmw_plot_test_prediction()

# How long did the process take? 
list_normalised$elapsed_times

# Plot normalised trend
rmw_plot_normalised(list_normalised$normalised)

# Investigate partial dependencies, if variable is NA, predict all
data_pd <- rmw_partial_dependencies(
  model = list_normalised$model, 
  df = list_normalised$observations,
  variable = NA
)

# Plot partial dependencies
data_pd %>% 
  filter(variable != "date_unix") %>% 
  rmw_plot_partial_dependencies()

The meteorologically normalised trend produced is below.

Examples and citations

For usage examples see:

Grange, S. K., Carslaw, D. C., Lewis, A. C., Boleti, E., and Hueglin, C. (2018). Random forest meteorological normalisation models for Swiss PM10 trend analysis. Atmospheric Chemistry and Physics 18.9, pp. 6223–6239.

Grange, S. K. and Carslaw, D. C. (2019). Using meteorological normalisation to detect interventions in air quality time series. Science of The Total Environment 653, pp. 578–588.

The use of rmweather for prediction or counterfactual/business as usual scenarios

A second usage of rmweather became established in 2020 to help researchers quantify the effects of the COVID-19 related restrictions on air quality. Briefly, the approach involves training random forest models to explain pollutant concentrations based on meteorological and time variables for a training period, say, between 2018 and 2019. After the training period, the model is used in predictive mode with the meteorological conditions that were actually experienced. The predicted time series can be thought of as a counterfactual or business-as-usual (BAU) scenario against which the observed time series can be compared. Critically, an approach like this accounts for the meteorological conditions observed in 2020, which in many locations were unusual and complicate simple analyses. The meteorological sampling and normalisation step is not required for this analysis, but this has been confused in the literature.
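A minimal sketch of the counterfactual workflow with the package's example data is given here. It assumes that rmw_train_model() and rmw_predict() are used for the training and prediction steps. The split date (the final 365 days of valid observations) and the choice of variables are illustrative assumptions, not a recommendation.

```r
# Load packages
library(dplyr)
library(rmweather)

# Explanatory variables, copied from the example usage; the choice is an assumption
variables <- c(
  "date_unix", "day_julian", "weekday", "air_temp", "rh", "wd", "ws",
  "atmospheric_pressure"
)

# NO2 observations with valid values and wind speeds
data_no2 <- data_london %>% 
  filter(variable == "no2",
         !is.na(ws),
         !is.na(value))

# Hypothetical split: hold back the final 365 days as the period to be predicted
date_split <- max(data_no2$date) - as.difftime(365, units = "days")

# Train a model on the earlier period only
data_training <- data_no2 %>% 
  filter(date < date_split) %>% 
  rmw_prepare_data(na.rm = TRUE)

model_bau <- rmw_train_model(
  data_training,
  variables = variables,
  n_trees = 300,
  verbose = TRUE
)

# Prepare the held-back period, keeping the meteorology that was observed
data_prediction <- data_no2 %>% 
  filter(date >= date_split) %>% 
  rmw_prepare_data(na.rm = TRUE)

# Predict the business-as-usual series and compare it with the observations
data_prediction <- data_prediction %>% 
  mutate(
    value_predict = rmw_predict(model_bau, df = data_prediction),
    value_delta = value - value_predict
  )
```

Here value_predict is the business-as-usual estimate and value_delta is the difference between what was observed and what the model expected. No sampling or normalisation call is involved.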

Examples of counterfactual modelling

Grange, S. K., Lee, J. D., Drysdale, W. S., Lewis, A. C., Hueglin, C., Emmenegger, L., and Carslaw, D. C. (2021). COVID-19 lockdowns highlight a risk of increasing ozone pollution in European urban areas. Atmospheric Chemistry and Physics 21.5, pp. 4169–4185.

Wang, Y., Wen, Y., Wang, Y., Zhang, S., Zhang, K. M., Zheng, H., Xing, J., Wu, Y., and Hao, J. (2020). Four-Month Changes in Air Quality during and after the COVID-19 Lockdown in Six Megacities in China. Environmental Science and Technology Letters 7.11, pp. 802–808.

Fenech, S., Aquilina, N. J., and Ryan, V. (2021). COVID-19-Related Changes in NO2 and O3 Concentrations and Associated Health Effects in Malta. Frontiers in Sustainable Cities 3.631280, pp. 1–12.

Shi, Z., Song, C., Liu, B., Lu, G., Xu, J., Van Vu, T., Elliott, R. J. R., Li, W., Bloss, W. J., and Harrison, R. M. (2021). Abrupt but smaller than expected changes in surface air quality attributable to COVID-19 lockdowns. Science Advances 7.3, eabd6696.

rmweather's Issues

Error message regarding ranger when using the normalise code

Hi there,

Apologies if this is a stupid question.

I am attempting to normalise a data set including hourly o3 values and relevant meteorological variables in 2016.
When using the following code:

normalised_o3_bl_2016 <- rmw_do_all(
  df = o3_bl_2016_prepared,
  variables = c(
    "air_temp", "dew", "atm_pressure", "wd", "ws", "date_unix", "day_julian",
    "weekday", "hour"
  ),
  n_trees = 300,
  n_samples = 300
)

I get this error message:
Error in ranger::ranger(value ~ ., data = df, write.forest = TRUE, importance = "permutation", :
Error: Unsupported type of dependent variable.

All my variables are numerical and when rerunning the code with some variables removed, in an attempt to locate the variable causing the issue, it results in the same error message.
My csv file is of the same layout as the example data file.

Thanks for any help you can provide.
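
One possible first check, sketched in base R. ranger appears to raise this message when the dependent variable (the value column here) is not numeric, a factor, or a survival object, which can happen when a CSV import reads value as text. The helper describe_value_column() and the example data frame are hypothetical and only illustrate the check; they are not part of rmweather.

```r
# Hypothetical helper: report the type of the dependent variable and count
# the entries that cannot be converted to numbers
describe_value_column <- function(df, value = "value") {
  x <- df[[value]]
  x_numeric <- suppressWarnings(as.numeric(as.character(x)))
  list(
    class = class(x),
    is_numeric = is.numeric(x),
    n_unparsable = sum(!is.na(x) & is.na(x_numeric))
  )
}

# Made-up data frame where a text entry has turned value into a character column
df_example <- data.frame(
  value = c("41.2", "38", "no data", NA),
  ws = c(2.1, 3.4, 1.8, 2.9),
  stringsAsFactors = FALSE
)

result <- describe_value_column(df_example)
print(result)
```

If is_numeric is FALSE, converting value with as.numeric() before calling rmw_prepare_data() would be the next thing to try.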

Minimum number of samples required?

Hi,

My data has 110 samples and 9 variables. I used this package to build a prediction model. It is clear and helpful for beginners like me. However, when I used the function "rmw_calculate_model_errors" to get prediction performance metrics, most of the values looked a little odd (listed below).
mean_bias 33.6, mean_gross_error 559., normalized_mean_bias 0.0162, normalized_mean_gross_error 0.270, root_mean_squared_error 685., normalized_root_mean_squared_error 0.330, r 0.248, r_squared 0.0613, p 0.00909, coefficient_of_efficiency 0.0115. I was wondering whether this is because the dataset is not large enough to get a good model. Or do you think the values are relatively okay for a model?

Besides the above, I would also like to ask for your help with the metric FAC2 (fraction of predictions within a factor of two). This may be a bit stupid, but I could not find this metric in the model output myself.

Thanks in advance.

Tianci
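
For reference, FAC2 is conventionally the fraction of predictions that fall within a factor of two of the observations, that is, where 0.5 <= predicted / observed <= 2. A minimal base R sketch follows. The function name fac2() and the example numbers are hypothetical, and this is not presented as an rmweather function.

```r
# Hypothetical helper: fraction of predictions within a factor of two of
# the observations; pairs with missing values or zero observations are dropped
fac2 <- function(observed, predicted) {
  ratio <- predicted / observed
  ratio <- ratio[is.finite(ratio)]
  if (length(ratio) == 0) {
    return(NA_real_)
  }
  mean(ratio >= 0.5 & ratio <= 2)
}

# Made-up observations and predictions
observed <- c(10, 20, 30, 40)
predicted <- c(12, 45, 14, 40)

fac2(observed, predicted)
```

In this made-up case the ratios are 1.2, 2.25, about 0.47, and 1, so two of the four predictions are within a factor of two.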

Add NA catch for detect cores

An example of the error:

Error in if (x < 16) { : missing value where TRUE/FALSE needed
> parallel::detectCores()
[1] NA

I have not seen this, but it has been reported to me and can easily be handled.
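
One way such a catch could look, as a base R sketch. The helper resolve_n_cores() and its fallback of one core are assumptions for illustration, not the package's actual fix.

```r
# Hypothetical helper: fall back to a single core when the number of cores
# cannot be determined and parallel::detectCores() returns NA
resolve_n_cores <- function(detected = parallel::detectCores(), fallback = 1L) {
  if (length(detected) != 1 || is.na(detected)) {
    return(as.integer(fallback))
  }
  as.integer(detected)
}

# The reported case: detection returns NA
resolve_n_cores(detected = NA)

# Usual case: use whatever the system reports
resolve_n_cores()
```

Any later comparison such as x < 16 then always receives a non-missing integer.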

Error when executing the example script

Taken from the example usage of rmweather:

data_london %>%
  filter(!is.na(ws)) %>%
  rename(value = no2) %>%
  rmw_prepare_data(na.rm = TRUE)
Error: Can't rename columns that don't exist.
x Column `no2` doesn't exist.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
Error: Internal error: Trace data is not square.

What is the purpose of the "rename" call, which is apparently giving rise to the error?
The original dataset does not seem to contain a column "no2" that could be renamed.

Thanks.

Tibble incompatibility

I tried this amazing-looking package, but unfortunately I get the following error:

data_frame() is deprecated as of tibble 1.1.0.
Please use tibble() instead.

Using R 4.0.
