skgrange / saqgetr
Import Air Quality Monitoring Data in a Fast and Easy Way
License: GNU General Public License v3.0
Hi, your package is great for retrieving EEA data, but I have encountered a mysterious bug: there seems to be some sort of Java conflict when using the package. It took me a while to track down, but I have now identified the place in the code that triggers the error. With a simple call like this:
allsites <- get_saq_sites()
and then, later:
lons = seq(-39.5, 39.5, 1)
lats = seq(45.5, 64.5, 1)
bg = read_osm(bb(c(min(lons), min(lats), max(lons), max(lats))), type = 'osm')
I get a long error message:
Error in .jcall("java/lang/Class", "Ljava/lang/Class;", "forName", cl, :
Unable to start conversion to UTF-16
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'Class' in selecting a method for function 'new': class not found
Error in .tryJava() :
Java classes could not be loaded. Most likely because Java is not set up with your R installation.
Here are some trouble shooting tips:
If I leave out the call to get_saq_sites(), I get no error message.
Earlier, I got a similar error message in other programs and managed to solve it by following the recipe on this web page: https://www.geeksforgeeks.org/how-to-set-java-path-in-windows-and-linux/
I have done the same now, but it seems that even a basic call to the saqgetr package leads to the Java error.
So there seems to be something in your package that conflicts with Java.
I really hope you find a fix for this, as I'm a bit stuck.
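As far as I can tell, saqgetr itself has no Java dependency, so the conflict more likely comes from how the JVM is initialised once read_osm's Java-backed dependencies load. A base-R diagnostic sketch to run in a fresh session, before loading any packages:

```r
# Check which Java installation R can see, before any packages are loaded.
java_home <- Sys.getenv("JAVA_HOME")   # should point at a valid JDK/JRE, or be ""
java_ver <- tryCatch(
  system2("java", "-version", stdout = TRUE, stderr = TRUE),
  error = function(e) conditionMessage(e)   # "java" is not on the PATH at all
)
print(java_home)
print(java_ver)
```

If JAVA_HOME points at a different Java than the one on the PATH, aligning the two (as in the linked recipe) is worth trying before suspecting the package code.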
Hi Stuart,
first of all, thanks for an excellent package for extracting EEA data! Very efficient and useful. According to the manual there is a lag of about one month for the newest data, and indeed data_processes() and filtered_processes() show this. However, when extracting data for the sites with get_saq_observations(), I actually get almost up-to-date data (a lag of three days from now). Is there a mismatch between these routines, or are the data being updated just now, so that I happened to hit the period of monthly updating? According to data_processes(), all data end at 2021-01-15 or 2021-01-16, but when extracted with get_saq_observations() they end at 2021-02-15. (Today is 2021-02-18.)
Thanks.
To the users of saqgetr: this month the observations since 2019 have been updated with validated data (the E1a data flow in the AQER nomenclature). However, there are some issues with missing validated data for some countries, notably the UK. I have reinserted the near-real-time observations (from the E2a data flow) for the UK, which I think has resolved the missing-data issue, but in-depth testing of other countries has not been done. If users encounter systematically missing data for a year across a number of monitoring sites in a country, please let me know and I will see what I can do. Many thanks!
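For anyone wanting to run the check this note asks for, a minimal base-R sketch; the data frame below is a toy stand-in for a get_saq_observations() result, with column names following saqgetr's observation tibble:

```r
# Toy stand-in for a get_saq_observations() result: one site has only NA
# values in 2019, mimicking a systematically missing site-year.
obs <- data.frame(
  site  = c("gb0682a", "gb0682a", "gb0036r"),
  date  = as.POSIXct(c("2019-06-01", "2020-06-01", "2019-06-01"), tz = "UTC"),
  value = c(10, 12, NA)
)

# Count valid (non-NA) observations per site and year; a count of zero marks
# a whole site-year of missing data worth reporting.
obs$year <- as.integer(format(obs$date, "%Y"))
n_valid <- aggregate(value ~ site + year, data = obs,
                     FUN = function(x) sum(!is.na(x)), na.action = na.pass)
missing_site_years <- n_valid[n_valid$value == 0, ]
```

Running the same aggregation over a real download for all sites in one country would surface exactly the systematic gaps described above.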
Hi Stuart,
I am doing air pollution research and want to collect UK hourly air pollutant data that is as recent as possible. However, when using get_saq_observations() as the example shows, the data only extend to 12 June 2020.
Could the data be updated to the end of June or early July?
Yours Sincerely,
Ada
Hello,
It seems that there is no data available after 2023-08-12.
require(saqgetr)
data_sites <- get_saq_sites()
data_sites$date_end %>% max(na.rm = TRUE)
outputs:
[1] "2023-08-12 23:00:00 UTC"
Thanks,
Best wishes
Hi Stuart,
I'm trying to put together an RMarkdown document using this package. It works fine when I run the code interactively in the global environment, but when I try to knit it to HTML, get_saq_observations() throws an error relating to connections.
I think this is due to the use of closeAllConnections() in read_saq_observations(). See this related SO question: https://stackoverflow.com/a/11165899/4227151
Can we safely delete closeAllConnections()? Otherwise, I'm not sure how hard it is to close only the specific connection.
Cheers,
Hao
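One way to avoid the blanket call, sketched below: open a single connection and register close() for it with on.exit(), so only that connection is ever closed. read_one_gz() is an illustrative helper, not saqgetr's actual internal function:

```r
# Illustrative helper (not saqgetr's real code): read a gzipped file through
# one explicit connection and close exactly that connection, even on error.
read_one_gz <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con), add = TRUE)   # closes only this connection
  readLines(con)
}
```

With this pattern, the connections knitr keeps open while rendering are left alone, which is presumably what closeAllConnections() was tearing down.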
Calling get_saq_sites() on a Linux machine returns the following error:
Error in curl::new_handle() : An unknown option was passed in to libcurl
Replacing it with a direct read of the table using the data.table function fread() gets around it:
fread('http://aq-data.ricardo-aea.com/R_data/saqgetr/helper_tables/sites_table.csv.gz')
But any ideas what is causing the initial error?
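One thing worth checking (an assumption on my part: this class of "unknown option" error often means the system libcurl is older than the one the curl R package was built against) is which libcurl R is actually linked to. libcurlVersion() ships with base R's utils package, so nothing needs installing:

```r
# Report the libcurl version R itself is linked against
v <- libcurlVersion()
print(v)                        # version string of the system libcurl
print(attr(v, "ssl_version"))   # TLS backend libcurl was built with
```

Comparing this against packageVersion("curl") can reveal a mismatch; upgrading the system libcurl, or reinstalling the curl package against it, is the usual remedy when the two disagree.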
I was downloading some data for 2012 and noticed there were four values for every hour, each of them different.
dat <- get_saq_observations(site = 'gr0027a', start = '2012-07-01', end = '2012-07-15', variable = 'o3')
Similarly, for the site below, two values are returned for no2. I would expect only one value per hour for both species.
dat_2 <- get_saq_observations(site = 'gb0002r', start = '2012-07-01', end = '2012-07-15', variable = 'no2')
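A base-R sketch to quantify the duplicates; the data frame below is a toy stand-in for the tibble get_saq_observations() returns (columns date, site, variable, value), with one hour repeated four times as in the gr0027a example:

```r
# Toy stand-in for the returned observations: one hour appears four times.
dat <- data.frame(
  date     = rep(as.POSIXct("2012-07-01 00:00:00", tz = "UTC"), 4),
  site     = "gr0027a",
  variable = "o3",
  value    = c(60, 61, 62, 63)
)

# Count observations per timestamp; anything above 1 is a duplicated hour
counts <- table(format(dat$date, "%Y-%m-%d %H:%M"))
dup_hours <- names(counts[counts > 1])
```

One possible cause worth checking (I have not confirmed it) is multiple reporting processes for the same site and pollutant in the underlying EEA data; get_saq_processes() lists the processes per site-variable pair.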
Hi Dr. Grange,
By using the get_saq_observations() function, I managed to download BC data from 31 BC monitoring sites (including closed ones) between 2013 and 2017. However, when I tried to download BC data for 2008 to 2012 and 2018 to 2019, only part of 2018 could be downloaded. Please see the following code:
BC_Data <- get_saq_observations(
  site = c("gb1044a", "gb1028a", "gb0048r", "gb1067a", "gb1097a", "gb1055r", "gb0620a",
           "gb0682a", "gb0886a", "gb0567a", "gb1023a", "gb0723a", "gb0934a", "gb0580a",
           "gb0960a", "gb0851a", "gb0105a", "gb0146a", "gb0991a", "gb0839a", "gb0931a",
           "gb0641a", "gb0036r", "gb0706a", "gb0613a", "gb0995a", "gb1059a", "gb0182a",
           "gb0658a", "gb0234a", "gb0135a"),
  start = 2011,
  end = 2012,
  variable = "bc",
  verbose = TRUE
) %>%
  saq_clean_observations(summary = "hour", valid_only = TRUE, spread = TRUE) %>%
  arrange(site)
Is there anything I have done wrong?
Thank you so much in advance.
Sam
Hi Stuart,
I have two questions about get_saq_observations():
I noticed some discrepancy between the data from saqgetr and the Dutch observation network (LML, operated by RIVM). The LML data is dated 07/2023, so potentially this is simply a case of Airbase not having been updated with the adjusted data from RIVM, but the data from Airbase say they have been validated.
How often is Airbase updated?
Are there multiple validation steps?
RIVM info on validation (not much detail) https://www.luchtmeetnet.nl/informatie/overige/validatie-data
Reprex below:
library(saqgetr)
library(dplyr)
library(lubridate)
library(threadr)
library(openair)
library(reshape2)
## import all netherlands sites
saq_sites_nl <- get_saq_sites() %>%
filter(grepl("nl0", site))
## get valid observations for 2022
saq_nl <- get_saq_observations(site = saq_sites_nl$site, variable = "pm2.5", valid_only = TRUE, tz = "UTC", start = "2022", end = "2022") %>%
select(date, site, saqgetr = value)
## import csv, doesn't like header so go for row above
lml_dat <- read.table("https://data.rivm.nl/data/luchtmeetnet/Vastgesteld-jaar/2022/2022_PM25.csv", skip = 9, sep = ';')
## use first row
names(lml_dat) <- lml_dat[1,]
## LML is in CET winter time, convert to UTC
lml_nl <- lml_dat[-1,] %>%
mutate(date = ymd_hm(` Begindatumtijd`, tz = "UTC")-3600)
lml_nl_down <- lml_nl[,-c(1,2,3,4,5)] %>%
melt('date') %>%
mutate(variable = gsub("NL01", "nl00", variable),
variable = gsub("NL10", "nl00", variable),
variable = gsub("NL49", "nl00", variable)) %>%
transmute(date, site = variable, lml = as.numeric(value))
## left join with saq first as it has fewer dates with data
saq_lml_nl <- left_join(saq_nl, lml_nl_down, by = c('date', 'site'))
Summarising the two datasets for each site
## summary stats
statz <- aqStats(saq_lml_nl, c('saqgetr', 'lml'), type = "site")
## calculate daily means and number of days above 15
saq_lml_24h_exceed <- saq_lml_nl %>%
timeAverage("day", type = "site") %>%
group_by(site) %>%
summarise(saq_gt_15 = sum(saqgetr >= 15, na.rm = TRUE),
lml_gt_15 = sum(lml >= 15, na.rm = TRUE)) %>%
left_join(saq_sites_nl, by = "site") %>% ## get site info
select(site, site_type, site_area, saq_gt_15, lml_gt_15) %>%
arrange(site_type, site_area) ## arrange by site type then site area
An example below for the site Vredepeel-Vredeweg (NL00131), where saqgetr is 1 ug/m3 higher than LML from 01/01/2022 to 24/11/2022 16:00, after which the two are identical.
## Example of one site
## import background site Vredepeel-Vredeweg
saq_nl00131 <- get_saq_observations(site = "nl00131", variable = "pm2.5", valid_only = TRUE, tz = "UTC", start = "2022", end = "2022") %>%
select(date, saqgetr = value)
## import csv, doesn't like header so go for row above
lml_dat <- read.table("https://data.rivm.nl/data/luchtmeetnet/Vastgesteld-jaar/2022/2022_PM25.csv", skip = 9, sep = ';')
## use first row
names(lml_dat) <- lml_dat[1,]
## convert to UTC
lml_nl00131 <- lml_dat[-1,] %>%
transmute(date = ymd_hm(` Begindatumtijd`, tz = "UTC")-3600,
lml = as.numeric(NL10131))
## join them together
nl00131 <- left_join(saq_nl00131, lml_nl00131, by = 'date')
## plot full time series
threadr::time_dygraph(nl00131, c('saqgetr', 'lml'))
## plot summary
openair::timeVariation(nl00131, c('saqgetr', 'lml'))
Hello,
I'm running into the error:
Error in open.connection(con, "rb") : HTTP error 403.
when trying to import the sites information with get_saq_sites().
This error has been occurring for two weeks now.
Maybe you can help me with this issue?
Thank you!
Greetings,
Lea
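A quick base-R check to confirm the 403 is coming from the server rather than from saqgetr; check_url() is just an illustrative helper, and the URL is the helper-table location mentioned in an earlier report:

```r
# Try to open the helper-table URL directly and report what happens; a 403
# here confirms the problem is server-side, not in the package.
check_url <- function(u) {
  tryCatch({
    con <- url(u, open = "rb")
    close(con)
    "reachable"
  },
  error   = function(e) conditionMessage(e),
  warning = function(w) conditionMessage(w))
}
check_url("http://aq-data.ricardo-aea.com/R_data/saqgetr/helper_tables/sites_table.csv.gz")
```

If this also reports a 403, only the data host can resolve it; nothing on the client side will help.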