sfdep's People

Contributors

jlacko, josiahparry, opelolo, rsbivand

sfdep's Issues

significance on local moran

Hi Josiah,

Great work on this package! Very keen to use and collaborate here!

Just a quick suggestion on the local_moran example: I would suggest not plotting non-significant results, as in the mutate() call below (highlighted in bold in the original post), although it is not very elegant!

lisa %>%
  tidyr::unnest(moran) %>%
  mutate(mean = case_when(p_ii > .05 | is.nan(p_ii) ~ NA_character_,
                          TRUE ~ as.character(mean))) %>%
  ggplot(aes(fill = mean)) +
  geom_sf(lwd = 0.2, color = "black") +
  theme_void() +
  scale_fill_manual(values = c("#1C4769", "#24975E", "#EACA97", "#B20016"))
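
A slightly tidier variant of the same masking idea (a sketch, not part of the original suggestion) keeps the mutate() but lets scale_fill_manual() colour the masked polygons through na.value:

# sketch: assumes the same `lisa` object as above
lisa %>%
  tidyr::unnest(moran) %>%
  mutate(mean = if_else(p_ii > .05 | is.nan(p_ii), NA_character_, as.character(mean))) %>%
  ggplot(aes(fill = mean)) +
  geom_sf(lwd = 0.2, color = "black") +
  theme_void() +
  scale_fill_manual(values = c("#1C4769", "#24975E", "#EACA97", "#B20016"),
                    na.value = "grey90")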

complete_spacetime_cube breaks grouped data

Hi,

Right now, after I run complete_spacetime_cube(), I get an error whenever I try to call the completed spacetime cube, saying:

.data must be a valid <grouped_df> object.
Caused by error in validate_grouped_df():
! The groups attribute must be a data frame.

Thanks,
Rob
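
A possible workaround (sketch only, not a confirmed fix): drop the grouping before completing the cube, since the error suggests the groups attribute does not survive the completion step.

# `spt` stands in for the grouped spacetime object from the report
spt %>%
  dplyr::ungroup() %>%
  complete_spacetime_cube()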

Estimation of local spatial autocorrelation statistics (local Moran's Ii and Getis-Ord's Gi/Gi*) using sfdep and tidyverse

Hi everyone,

I am currently trying to estimate the local Moran's Ii and the Getis-Ord Gi/Gi* statistic to visualize clusters of voters during the 2019 Swiss national election and compare two political parties at the municipality level.

Unfortunately, those two parties did not participate in all 2212 Swiss municipalities during the 2019 election, meaning there are around 120 NAs (a different number for each party), which cause problems when estimating the aforementioned statistics.

Using sfdep and tidyverse, my code, without addressing the missing data problem, looks like this:

df_municip %>%
  mutate(nb = st_contiguity(geometry),
         wt = st_weights(nb),
         local_moran = sfdep::local_moran(party1, nb, wt, nsim = 499),
         local_Gi = sfdep::local_g_perm(party1, nb, wt, nsim = 499)) %>%
  unnest(c(local_moran, local_Gi))

To circumvent the missing data problem, I tried setting na.action = na.exclude or na.action = na.pass. The former did not work, as dplyr's mutate() cannot add columns with fewer rows than the original dataframe (in my case, df_municip), while the latter did not work optimally, as it only estimated two of the eight values (in my case, ii and p_ii_sim).

Importantly, both of these na.action options only work for the local Moran statistic, as sfdep::local_g_perm(), which relies on spdep's localG_perm(), does not accept an na.action argument.

Does anyone have any suggestions?

P.S. I am sorry if this post is not formatted optimally; it is my first post on GitHub.
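
One workaround (a sketch, not an official sfdep recommendation) is to compute the statistics on complete cases only, accepting that dropping municipalities also changes the contiguity structure:

# keep only municipalities where party1 was on the ballot
df_complete <- df_municip %>%
  filter(!is.na(party1)) %>%
  mutate(nb = st_contiguity(geometry),   # note: neighbours now exclude the dropped municipalities
         wt = st_weights(nb),
         local_moran = sfdep::local_moran(party1, nb, wt, nsim = 499),
         local_Gi = sfdep::local_g_perm(party1, nb, wt, nsim = 499)) %>%
  tidyr::unnest(c(local_moran, local_Gi))

The results can then be joined back to the full df_municip by the municipality identifier if the NA rows are needed for mapping.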

Add a categorize LISA function for labeling LISA clusters

This was previously implemented in sfweight and would be a good utility to make available to people.

function(x, x_lag, scale = TRUE) {

  cats <- character(length(x))

  if (scale) {
    # compare scaled values against 0
    x <- scale(x)
    x_lag <- scale(x_lag)

    cats[x > 0 & x_lag > 0] <- "HH"
    cats[x > 0 & x_lag < 0] <- "HL"
    cats[x < 0 & x_lag < 0] <- "LL"
    cats[x < 0 & x_lag > 0] <- "LH"
  } else {
    # compare raw values against their means
    cats[x > mean(x) & x_lag > mean(x_lag)] <- "HH"
    cats[x > mean(x) & x_lag < mean(x_lag)] <- "HL"
    cats[x < mean(x) & x_lag < mean(x_lag)] <- "LL"
    cats[x < mean(x) & x_lag > mean(x_lag)] <- "LH"
  }

  cats[cats == ""] <- NA
  cats
}
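
A possible usage sketch, assuming the function above is assigned to a name such as categorize_lisa() (the name is hypothetical):

# label each observation by its LISA quadrant
guerry_nb %>%
  mutate(crime_lag = st_lag(crime_pers, nb, wt),
         lisa_cat  = categorize_lisa(crime_pers, crime_lag))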

Space-time representation S3 class

There is an older implementation of this from Edzer Pebesma:

https://cran.r-project.org/web/packages/spacetime/vignettes/jss816.pdf

I think a long representation of time and space is best, with an ancillary geometry table.

Implementation consideration:

An attribute holding all used geometries as an sf object, with a column for the geometry identifier and the respective geometry. The geometry attribute should also contain neighbors and weights (potentially stored as list objects for spdep compatibility).

The main table contains a column for the geometry identifier, the time period, and any other values that would be desired.

All time periods must have all regions.

If there are 2 time periods with 10 regions, we would have 2 * 10 = 20 observations in the main table (see the sketch below).

Users can then compute any LISAs they'd like and then summarize with the Mann-Kendall test to identify changing temporal patterns.
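
A minimal sketch of the proposed layout (column names are illustrative only):

library(sf)

# ancillary geometry table: one row per region
geo <- st_as_sf(
  data.frame(region_id = 1:10, x = runif(10), y = runif(10)),
  coords = c("x", "y")
)

# main long table: every time period repeats every region (2 * 10 = 20 rows)
main <- expand.grid(region_id = 1:10, time_period = 1:2)
main$value <- rnorm(nrow(main))

# the geometry table rides along as an attribute of the long data
attr(main, "geometry") <- geo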

Error in stopifnot() with local moran

Hi Josiah,

I'm an spdep user who just discovered this useful-looking package, so I thought I'd give it a quick go. While looking to apply a simple LISA assessment to some of my data, I encountered an error with local_moran() that is reproducible with the Guerry data.

guerry <- sfdep::guerry %>% 
  st_set_crs(27572) %>% 
  select(code_dept, crime_pers, crime_prop)

guerry_nb <- guerry %>% 
  mutate(nb = st_contiguity(geometry),
         wt = st_weights(nb))

guerry_lisa <- guerry_nb %>% 
  mutate(local_moran_crime_pers = local_moran(crime_pers, nb, wt, nsim = 199), # error occurs here
         local_moran_crime_prop = local_moran(crime_prop, nb, wt, nsim = 199)) # tried with another column as well, just in case

It returns this error for both Guerry and my own data:

Error in `stopifnot()`:
! Problem while computing `local_moran_crime_pers = local_moran(crime_pers, nb, wt, nsim = 199)`.
Caused by error in `names(object) <- nm`:
! 'names' attribute [9] must be the same length as the vector [8]

In case there was any mix-up with the rgeoda package, I also tried calling sfdep::local_moran() explicitly, to no avail.

Create error for local G* if self included is missing

When the self isn't included, it is added and the weights are recalculated under the assumption of queen weights. This isn't fair.
Instead, the function should error out and inform the user that self.included is missing, that it should be added with include_self(nb), and that the weights should then be recalculated.

if (is.null(attr(nb, "self.included"))) {
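
A minimal sketch of the suggested guard (the helper name and message wording are illustrative, not the package's actual implementation):

check_self_included <- function(nb) {
  if (is.null(attr(nb, "self.included"))) {
    cli::cli_abort(c(
      "`nb` must include the self for local G*.",
      "i" = "Add it with `include_self(nb)` and recalculate the weights with `st_weights(nb)`."
    ))
  }
  invisible(nb)
}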

inconsistent results between emerging_hotspot_analysis() and manually calculating hotspots step-by-step

Kia ora Josiah

Thanks for your work on sfdep; I was happy to see an sf-friendly package.

However, I've discovered some inconsistencies that I can't explain between what I get from emerging_hotspot_analysis() and what I get when I do the same thing step by step.

I note that the include_gi = TRUE argument doesn't seem to work, so I wasn't able to really dig into where/how the inconsistencies come in.

The following example uses the guerry dataset and shows that I get a different picture depending on whether I use emerging_hotspot_analysis() (top map) or do it step by step (lower map).

[Screenshots: emerging hotspot classification maps, emerging_hotspot_analysis() (top) versus the step-by-step approach (bottom)]

Can you please resolve the issue with the argument include_gi?

Can you also please clarify why my results are so different? I'm looking to use the function in some published indicators I'm working on, but I need to have confidence that they match what I get through the step-by-step approach.

Thanks heaps,

John

library(sf)
library(sfdep)
library(simplevis)
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)      # rownames_to_column()
library(furrr)
library(modifiedmk)  # mmkh()

# replicate the guerry dataset 10 times
x <- purrr::map_dfr(1:10, ~guerry)  %>%  
  dplyr::select(code_dept, crime_pers) %>% 
  # create an indicator for time period
  mutate(time_period = sort(rep(1:10, 85)), 
         # add some noise 
         crime_pers = crime_pers * runif(850, max = 2))

x

# check what this looks like for one time period
x %>% 
  filter(time_period == 1) %>% 
  st_set_crs(27572) %>% 
  simplevis::gg_sf_col(col_var = crime_pers)

# test using emerging_hotspot_analysis() which first requires creating a spacetime object...
spt <- as_spacetime(x, "code_dept", "time_period")

ehsa_guerry <-
  emerging_hotspot_analysis(spt,
                            "crime_pers",
                            include_gi = TRUE, # note the include_gi argument doesn't seem to work.
                            threshold = 0.05)  %>% 
  mutate(classification = factor(
    classification,
    levels = c(
      "consecutive hotspot",
      "sporadic hotspot",
      "new hotspot",
      "new coldspot",
      "sporadic coldspot",
      "consecutive coldspot",
      "persistent coldspot",
      "no pattern detected"
      ))) 

rcartocolor::carto_pal(n = 9, name = "Prism") %>% 
  scales::show_col()

p <- c(rev(c("#CC503E", "#E17C05", "#EDAD08", "#73AF48", "#0F8554", "#38A6A5", "#1D6996")),"grey")

guerry %>% 
  left_join(ehsa_guerry, by = c("code_dept" = "location")) %>% 
  st_set_crs(27572) %>% 
  simplevis::gg_sf_col(col_var = classification,
                       pal = p,
                       title = "Emerging hotspots - using sfdep::emerging_hotspot_analysis()")

# compare what the output from emerging_hotspot_analysis() looks like relative to doing it step-by-step----
# identify neighbours and assign weights----
x_ <- x %>%
  mutate(nb = include_self(st_contiguity(geometry)),
         wt = st_weights(nb)) %>% 
  split(list(.$time_period))

# calculate local_gstar----
guerry_gi_star <- x_ %>% 
  map(. %>% mutate(nb = include_self(st_contiguity(geometry)),
                   wt = st_weights(nb)) %>% transmute(gi_star = local_gstar(crime_pers, nb, wt))) %>% 
  map(. %>% tidyr::unnest(gi_star)) %>% 
  map2(names(.), ~ mutate(.x, time_period = .y)) %>% 
  map(. %>% mutate(code_dept = x_[[1]]$code_dept)) %>% 
  map(. %>% relocate(code_dept, everything())) %>% 
  bind_rows()

# map it for one time period----
guerry_gi_star %>% 
  st_set_crs(27572) %>% 
  filter(time_period == "1") %>% 
  simplevis::gg_sf_col(col_var = gi_star)
  
# and now calculate the trends----
guerry_gi_star_l <- guerry_gi_star %>% 
  st_drop_geometry() %>% 
  split(list(.$code_dept))

mk <- guerry_gi_star_l %>%
  furrr::future_map(~ .x %>%
                      pull(gi_star) %>%
                      mmkh() %>%
                      as.data.frame() %>%
                      rownames_to_column(var = "output") %>%
                      pivot_wider(names_from = "output", values_from = "."))

mk_ <- mk %>%
  bind_rows(.id = "code_dept") %>%
  dplyr::select(-c(`Original Z`, `old P.value`, old.variance)) %>%
  janitor::clean_names() %>% 
  mutate(code_dept = as.factor(code_dept))

mk_

guerry_gi_star_mk <- guerry_gi_star %>%
  left_join(mk_, by = c("code_dept"))

# categories adapted from https://pro.arcgis.com/en/pro-app/latest/tool-reference/space-time-pattern-mining/learnmoreemerging.htm----
guerry_gi_star_mk_cat <- guerry_gi_star_mk %>%
  mutate(
    hotspot = case_when(
      gi_star <= -2.58 ~ "Cold spot 99%",
      gi_star > -2.58 & gi_star <= -1.645 ~ "Cold spot 90%",
      gi_star > -1.645 & gi_star <= -0.9542 ~ "Cold spot 66%",
      gi_star >= 0.9542 & gi_star < 1.645 ~ "Hot spot 66%",
      gi_star >= 1.645 & gi_star < 2.58 ~ "Hot spot 90%",
      gi_star >= 2.58 ~ "Hot spot 99%",
      TRUE ~ "Not Significant"
    )
  ) %>% # adapted from https://crd230.github.io/lab7.html#Local_spatial_autocorrelation, with thresholds aligned to the IPCC likelihood levels
  mutate(hotspot = factor(
    hotspot,
    levels = c(
      "Cold spot 99%",
      "Cold spot 90%",
      "Cold spot 66%",
      "Not Significant",
      "Hot spot 66%",
      "Hot spot 90%",
      "Hot spot 99%"
    )
  )) %>%
  group_by(code_dept) %>%
  arrange(code_dept, time_period) %>%
  mutate(
    hot = case_when(hotspot %in% c(# "Hot spot 66%", # removed to align with sfdep::emerging_hotspot_analysis(), which seems to use a default threshold of 0.01
      "Hot spot 90%",
      "Hot spot 99%") ~ 1,
      TRUE ~ 0),
    cold = case_when(hotspot %in% c(# "Cold spot 66%", # removed to align with sfdep::emerging_hotspot_analysis(), which seems to use a default threshold of 0.01
      "Cold spot 90%",
      "Cold spot 99%") ~ 1,
      TRUE ~ 0),
    prev_hot = lag(hot),
    prev_cold = lag(cold),
    n_hot = sum(hot),
    p_hot = n_hot / length(hot),
    n_cold = sum(cold),
    p_cold = n_cold / length(cold),
    next_h = case_when(hot != lag(hot) ~ 1),
    n_changes_hot = sum(next_h, na.rm = TRUE),
    next_c = case_when(cold != lag(cold) ~ 1),
    n_changes_cold = sum(next_c, na.rm = TRUE),
    cat = case_when(
      last(hot) == 1 &
        n_hot == 1 ~ "new hotspot",
      # A location that is a statistically significant hot spot for the final time step and has never been a statistically significant hot spot before.
      last(hot) == 1 &
        n_hot == 2 &
        p_hot < 0.9 ~ "consecutive hotspot",
      # A location with a single uninterrupted run of at least two statistically significant hot spot bins in the final time-step intervals. The location has never been a statistically significant hot spot prior to the final hot spot run and less than 90 percent of all bins are statistically significant hot spots.
      last(hot) == 1 &
        n_changes_hot >= 3 &
        p_hot < 0.9 &
        n_cold == 0 ~ "sporadic hotspot",
      # A statistically significant hot spot for the final time-step interval with a history of also being an on-again and off-again hot spot (i.e. there are not more than two consecutive hotspots). Less than 90 percent of the time-step intervals have been statistically significant hot spots and none of the time-step intervals have been statistically significant cold spots.
      p_hot >= 0.9 &
        last(hot) == 1 &
        tau > 0 &
        new_p_value <= 0.333 ~ "intensifying hotspot",
      # A location that has been a statistically significant hot spot for 90 percent of the time-step intervals, including the final time step. In addition, the intensity of clustering of high counts in each time step is increasing overall and that increase is statistically significant.
      p_hot >= 0.9 &
        new_p_value > 0.333 ~ "persistent hotspot",
      # A location that has been a statistically significant hot spot for 90 percent of the time-step intervals with no discernible trend in the intensity of clustering over time.
      p_hot >= 0.9 &
        new_p_value <= 0.333 &
        tau < 0 &
        last(hot == 1) ~ "diminishing hotspot",
      # A location that has been a statistically significant hot spot for 90 percent of the time-step intervals, including the final time step. In addition, the intensity of clustering in each time step is decreasing overall and that decrease is statistically significant.
      last(hot == 0) & p_hot >= 0.9 ~ "historic hotspot",
      # The most recent time period is not hot, but at least 90 percent of the time-step intervals have been statistically significant hot spots.
      last(cold) == 1 &
        n_cold == 1 ~ "new coldspot",
      # A location that is a statistically significant cold spot for the final time step and has never been a statistically significant cold spot before.
      last(cold) == 1 &
        n_cold == 2 &
        p_cold < 0.9 ~ "consecutive coldspot",
      # A location with a single uninterrupted run of at least two statistically significant cold spot bins in the final time-step intervals. The location has never been a statistically significant cold spot prior to the final cold spot run and less than 90 percent of all bins are statistically significant cold spots.
      last(cold) == 1 &
        n_changes_cold >= 3 &
        p_cold < 0.9 &
        n_hot == 0 ~ "sporadic coldspot",
      # A statistically significant cold spot for the final time-step interval with a history of also being an on-again and off-again cold spot (i.e. there are not more than two consecutive coldspots). Less than 90 percent of the time-step intervals have been statistically significant cold spots and none of the time-step intervals have been statistically significant hot spots.
      p_cold >= 0.9 &
        last(cold) == 1 &
        tau < 0 &
        new_p_value <= 0.333 ~ "intensifying coldspot",
      # A location that has been a statistically significant cold spot for 90 percent of the time-step intervals, including the final time step. In addition, the intensity of clustering of low counts in each time step is increasing overall and that increase is statistically significant.
      p_cold >= 0.9 &
        new_p_value > 0.333 ~ "persistent coldspot",
      # A location that has been a statistically significant cold spot for 90 percent of the time-step intervals with no discernible trend in the intensity of clustering over time.
      p_cold >= 0.9 &
        new_p_value <= 0.333 &
        tau < 0 &
        last(cold == 1) ~ "diminishing coldspot",
      # A location that has been a statistically significant cold spot for 90 percent of the time-step intervals, including the final time step. In addition, the intensity of clustering in each time step is decreasing overall and that decrease is statistically significant.
      last(cold == 0) & p_cold >= 0.9 ~ "historic coldspot"
    ),
    # The most recent time period is not cold, but at least 90 percent of the time-step intervals have been statistically significant cold spots.
  ) %>%
  mutate(cat = replace_na(cat, "no pattern detected")) %>%
  ungroup() %>%
  mutate(cat = factor(
    cat,
    levels = c(
      "consecutive hotspot",
      "sporadic hotspot",
      "new hotspot",
      "new coldspot",
      "sporadic coldspot",
      "consecutive coldspot",
      "persistent coldspot",
      "no pattern detected"
    )
  )) 

# map it----
guerry_gi_star_mk_cat %>% 
  group_by(code_dept) %>% 
  slice(1) %>% 
  st_set_crs(27572) %>% 
  simplevis::gg_sf_col(col_var = cat,
                             pal = p,
                             title = "Emerging hotspots - manually using sfdep::local_gstar, mann_kendall, etc.")

# mmm... so there appear to be lots of differences between the two methods... though it is unclear how emerging_hotspot_analysis() categorises the data, so a more robust comparison is to look at the tau and p-values (noting that the include_gi = TRUE argument doesn't seem to be working, so we can't compare gi_star values)

mk <- guerry_gi_star_l %>%
  furrr::future_map(~ .x %>%
                      mutate(mk = mann_kendall(gi_star, alternative = "greater")))

mk_ <- mk %>%
  bind_rows(.id = "code_dept") %>%
  janitor::clean_names() %>% 
  mutate(code_dept = as.factor(code_dept))

mk_ %>% 
  group_by(code_dept) %>% 
  slice(1) %>% 
  unnest() %>% 
  dplyr::select(code_dept, gi_star, p_value, tau) %>% 
  left_join(ehsa_guerry, by = c("code_dept" = "location")) %>% # note I'm pretty sure that emerging_hotspot_analysis() calls MannKendall(), which returns a two-sided p-value
  ungroup() %>% 
  summarise(sbs_n_p_less_05 = sum(p_value.x < 0.05),
            ehsa_n_p_less_05 = sum((p_value.y/2) < 0.05), # I'm dividing by 2 to turn a two-sided p-value into a one-sided p-value
            dif_tau = sum(tau.x != tau.y),
            n_sbs_positive_tau = sum(tau.x > 0),
            n_sbs_negative_tau = sum(tau.x < 0),
            n_ehsa_positive_tau = sum(tau.y > 0),
            n_ehsa_negative_tau = sum(tau.y < 0),
            n_diff_pos_neg = sum(tau.x > 0 & tau.y < 0 | tau.x < 0 & tau.y > 0 ),
            n_dif_ehsa_sig = sum((p_value.y/2) < 0.05 & (tau.x > 0 & tau.y < 0 | tau.x < 0 & tau.y > 0 )))

# note it is unclear whether the ehsa_guerry p_value relates to the gi_star or the Mann-Kendall result... I suspect it relates to the gi_star...

# So there are 16 code_dept(s) that ehsa finds have a significant p-value (< 0.05) but that have different tau values. This seems to suggest that over half the estimated slopes might be wrong? That said, it is apparent from below that there are only five polygons/code_dept with a statistically significant (p-value <= 0.1) trend, and of these five none had hotspots in more than 0.5 of the time series (they needed at least 0.9 of the time steps as hotspots). Hence none of them were intensifying or diminishing.

guerry_gi_star_mk_cat %>% 
  filter(new_p_value <= 0.05) %>% 
  # distinct(code_dept)
  View()

# OK, so there are only five polygons/code_dept with a statistically significant (p-value <= 0.1) trend, and of these five none had hotspots in more than 0.5 of the time series. Hence none of them were intensifying or diminishing.

Error with spacetime due to "." at start of data labels

I am using package version 0.2.4.

I am trying to create a simple spacetime object, but I am getting an error from cli that says:

Error in `fun(..., .envir = .envir)`:
! Invalid cli literal: `{.data_l...}` starts with a dot.
i Interpreted literals must not start with a dot in cli >= 3.4.0.
i `{}` expressions starting with a dot are now only used for cli styles.
i To avoid this error, put a space character after the starting `{` or use parentheses: `{(.data_l...)}`.
---
Backtrace:
 1. sfdep::spacetime(D4_dat_hspt, geo_f, "TARGET_F_1", "Year")
 2. sfdep::new_spacetime(.data, .geometry, .loc_col, .time_col, active = active)
 3. sfdep::validate_spacetime(.data, .geometry, .loc_col, .time_col)
 4. cli::cli_abort(c("Differing class types for {.var .loc_col}.", ...
 5. cli:::vcapply(message, format_inline, .envir = .envir)
 6. base::vapply(X, FUN, FUN.VALUE = character(1), ..., USE.NAMES = USE.NAMES)
 7. local FUN(X[[i]], ...)
 8. cli::cli_fmt(fun(..., .envir = .envir), collapse = collapse, strip_newline = TRUE)
 9. cli:::cli__rec(expr)
10. local fun(..., .envir = .envir)
11. cli:::cli__message("inline_text", list(text = glue_cmd(..., .envir = .envir, ...
12. "id" %in% names(args)
13. cli:::glue_cmd(..., .envir = .envir, .call = sys.call(), .trim = FALSE)
14. cli:::glue(str, .envir = .envir, .transformer = transformer, .cli = TRUE, ...
15. (function (expr) ...
16. .transformer(expr, .envir) %||% character()
17. local .transformer(expr, .envir)
18. cli:::glue(text, .envir = envir, .transformer = sys.function(), .cli = TRUE)
19. (function (expr) ...
20. .transformer(expr, .envir) %||% character()
21. local .transformer(expr, .envir)
22. cli:::throw(cli_error(call. = caller, "Invalid cli literal: {.code {{{abbrev(code, 10)}}}} starts with a dot.", ...

Here is the code I ran:

library(sf)
library(sfdep)

# get the data for NDVI
D4_Dat_nz <- read.csv("../01Data/AvgNDVI_gridDat_wZones.csv")
D4_dat_hspt <- D4_Dat_nz[,c("TARGET_F_1", "Year", "AvgNDVI", "Date")]
names(D4_dat_hspt)[names(D4_dat_hspt) == "AvgNDVI"] <- "value" 

#get the locations
GridLocs <- st_read("../01Data/Lvl1_grids/SegmentCanopies_FullWest_wZone_wDispl.shp")
geo <- GridLocs[,-(1:3)]
geo <- geo[,-(2:10)]

#make sure the two line up
pxls <- unique(D4_dat_hspt$TARGET_F_1)
pxls2 <- unique(geo$TARGET_F_1)

geo_f <- geo[geo$TARGET_F_1 %in% pxls,]
geo_f <- st_zm(geo_f)
D4_dat_hspt <- D4_dat_hspt[D4_dat_hspt$TARGET_F_1 %in% pxls2,]

#create your spacetime object
stObj <- spacetime(D4_dat_hspt, geo_f, "TARGET_F_1", "Year")
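
For reference, the cli behaviour the backtrace points at can be reproduced outside sfdep (a sketch; the variable name is illustrative, not taken from sfdep's source):

# cli >= 3.4.0 rejects interpolating objects whose names start with a dot,
# producing the "Invalid cli literal ... starts with a dot" error seen above
.data_label <- "Year"
cli::cli_abort("Differing class types for {.data_label}.")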

Consider better fx prefixing

This package has used the st_ prefix for most functions (spatial type) per SO.

However, I think that better prefixing can be used.

  • All functions that create or modify neighbor lists should be prefixed with nb_, which has started to become a pattern.
  • All functions that create or modify a weights list should be prefixed with wt_.
  • sfnetwork-related functions should be prefixed with sfn_, as opposed to st_as_graph(), st_as_node(), st_as_edge().
    • Though the latter might actually be informative. Or perhaps sf_as_graph(), sf_as_node(), and sf_as_edge() to indicate the type.
    • Additionally, new functions such as node_get_nbs(), node_get_edge_list(), and node_get_edge_col() do not have the prefix, but I think the naming is satisfactory, as "node" pertains to a network.
  • Global measures are prefixed with global_.
  • Local measures (except LOSH and the neighbor match test) are prefixed with local_.

attribute distance band

Create distance-band neighbors based on non-geographic (attribute) distances. Create the distances with a distance matrix; use dbscan() and dist(). This is implemented internally in the neighbor match test. A rough sketch follows.
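
A minimal sketch of attribute-space distance-band neighbours, assuming dbscan::frNN() for the fixed-radius search (the issue only names dbscan() and dist(), so the exact helpers here are an assumption):

library(dbscan)
library(sfdep)

# attribute values, not coordinates, define the distance band
vals <- scale(guerry$crime_pers)

# all observations within an attribute distance of eps = 0.5
fr <- dbscan::frNN(vals, eps = 0.5)

# list of neighbour indices, one element per observation
nb <- fr$id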

Negative and larger than 1 p-values with local_gstar_perm

I computed the local Gi* for an sf multipolygon object with 1 feature. To my surprise, I was seeing negative and very large p_value entries, while p_sim and p_folded_sim were between 0 and 1 as expected. I thought there was something wrong with my code, so I ran the exact same code as in the Getis-Ord Gi* section of the Emerging Hot Spot Analysis vignette and also found negative p-values. The res values are different from the ones in the vignette. I tried going back to older versions of sfdep and spdep and still hit the same issue; I couldn't replicate the vignette results. Is anybody else having the same problem? I am using R 4.3.2. Thank you so much.

[Screenshot: local_gstar_perm output with out-of-range values in the p_value column]

Inefficient calculation of the bivariate local Moran

local_moran_bv_calc <- function(x, yj, wt) {

This calculation of the bivariate local Moran is slow. It would be much more efficient to calculate it as x * st_lag(y, nb, wt). See the benchmark below.

This can be replaced fairly simply. After changing local_moran_bv_calc() to take an nb argument, local_moran_bv_impl() and local_moran_bv_perm_impl() will need to be modified to use the new calculation.

library(sfdep)

x <- guerry_nb$crime_pers
y <- guerry_nb$wealth
nb <- guerry_nb$nb
wt <- guerry_nb$wt

bench::mark(
  current = sfdep:::local_moran_bv_calc(x, find_xj(y, nb), wt),
  simpler = x * st_lag(y, nb, wt)
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 current     107.5µs  115.6µs     8327.      24KB     16.7
#> 2 simpler      17.1µs   18.1µs    53472.     107KB     10.7

Created on 2022-08-05 by the reprex package (v2.0.1)
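
A sketch of the proposed replacement, assuming the internal helper keeps its name and simply gains an nb argument (the exact signature is an assumption):

local_moran_bv_calc <- function(x, y, nb, wt) {
  # spatially lagged y, weighted by wt, replaces the previous element-wise calculation
  x * st_lag(y, nb, wt)
}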

nb to lines interface

Create a function to cast neighbor (nb) lists to lines using nb2lines(); a rough wrapper sketch follows the list below.

Possible functionalities to implement

  • cast to lines from nbs
  • cast to tidygraph
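
A rough wrapper sketch, deferring to spdep::nb2lines() and converting the result back to sf (the name st_nb_to_lines() is hypothetical):

st_nb_to_lines <- function(nb, wt, geometry) {
  # nb2lines() wants point coordinates, so use centroids for polygon geometries
  coords <- sf::st_coordinates(sf::st_centroid(geometry))
  sf::st_as_sf(spdep::nb2lines(nb, wts = wt, coords = coords))
}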

Mapping/equivalences cross language

Hi! Co-maintainer of esda here. I'm not as well versed as @rsbivand here, but I think that the mappings between packages might need some duplication?

This arises because esda is "object-oriented" in its design... so its estimators compute & store a variety of statistics in a single pass. For example, in the G_Local class, we compute Z-scores and their corresponding p-values (like those returned by spdep::localG) and store them in the .Zs and .p_norm attributes, respectively. We also, by default, compute the conditional permutation p-values p_sim using 999 replications, akin to spdep::localG_perm. So the "cognate" in esda is G_Local for both spdep::localG and spdep::localG_perm. When we expose new inference versions, we'll probably just add them to the objects too, if their computational cost is low. That is, at least until we decide upon the next-generation interface, which is a GSoC project this year.

I could be misunderstanding how the cognate table is intended (or, indeed, what the equivalences are!) but it'd be useful to clarify.
