Comments (8)

JosiahParry commented on June 20, 2024

Thanks @johnForne! Unfortunately I cannot replicate your code here.
A few things. Firstly, your code does not replicate emerging hot spot analysis as it should be done. Emerging hot spot analysis includes neighbors from different time lags. That means that for code_dept 1, its neighbors at the present time and its neighbors at a previous time are both included in the calculation of the Gi*. By default k = 1, not 0.
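
To make the k = 1 idea concrete, here is an illustrative sketch (not sfdep's internals) of how space-time neighbor indices work when locations are stacked by time slice; n_locs, nb_i, and t are made-up values:

n_locs <- 85                       # locations per time slice
nb_i <- c(2, 5, 7)                 # hypothetical spatial neighbors of a site
t <- 3                             # the third time slice
spt_nb_i <- c(
  nb_i + (t - 1) * n_locs,         # those neighbors at time t
  nb_i + (t - 2) * n_locs          # the same neighbors at time t - 1 (k = 1)
)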

I'm also not so confident that your classification scheme is entirely accurate. The classifications are fairly complex; you can find them at https://github.com/JosiahParry/sfdep/blob/main/R/emerging-hostpot-classifications.R

I also had to write a specific version of the Gi* to be able to include time neighbors as well, so that's definitely a place where things can become different.
https://github.com/JosiahParry/sfdep/blob/main/R/spacetime-local-gistar-impl.R

I also think you're using a different version of the MK statistic than I am (I believe you're using https://rdrr.io/cran/modifiedmk/).

To use the output of include_gi you need to get it from an attribute. The attribute can be retrieved from the result like so:

library(sfdep)

df_fp <- system.file("extdata", "bos-ecometric.csv", package = "sfdep")
geo_fp <- system.file("extdata", "bos-ecometric.geojson", package = "sfdep")

# read in data
df <- read.csv(df_fp, colClasses = c("character", "character", "integer", "double", "Date"))
geo <- sf::st_read(geo_fp)
#> Reading layer `bos-ecometric' from data source 
#>   `/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/sfdep/extdata/bos-ecometric.geojson' 
#>   using driver `GeoJSON'
#> Simple feature collection with 168 features and 1 field
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -71.19115 ymin: 42.22788 xmax: -70.99445 ymax: 42.3974
#> Geodetic CRS:  NAD83

# Create spacetime object called `bos`
bos <- spacetime(df, geo,
                 .loc_col = ".region_id",
                 .time_col = "time_period")


# conduct EHSA
ehsa <- emerging_hotspot_analysis(
  x = bos,
  .var = "value",
  k = 1,
  nsim = 9,
  include_gi = TRUE
)

attr(ehsa, "gi_star")
#>            gi_star p_sim
#> 1    -0.9851338521   0.1
#> 2    -1.1651720095   0.1
#> 3    -1.2901364459   0.1
#> 4    -0.9406102825   0.1
#> 5    -1.1077628305   0.1

Created on 2023-11-28 with reprex v2.0.2

johnForne commented on June 20, 2024

JosiahParry commented on June 20, 2024

When creating a reproducible example, I'd recommend using reprex so that you can ensure your code is reproducible. Your code uses a version of simplevis that has something called gg_sf_col(), which is not available in the current release. It also does not load the library that mmkh() comes from (or call it by namespace), among other things. Again, reprex is super handy here!
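
As a minimal sketch of that workflow (the body here is just a placeholder):

reprex::reprex({
  # everything inside this block runs in a fresh session, so missing
  # library() calls and undefined functions surface immediately
  library(sfdep)
  head(guerry)
})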

I took no creative direction with this implementation of Emerging Hot Spot Analysis, and having since joined the team, I have confirmed that the way neighbors and weights are used is correct. Neighbors are identified and weighted based on space. Then neighbors from k time lags are included in the calculation of the statistic.

A couple of reasons why your code is likely not reproducing the results:

  • You're finding neighbors based on the entire dataset instead of just the spatial context. This means that the nb object is going to have indices greater than the number of rows per group. Instead you should use group_by(time_period) |> mutate(nb = include_self(st_contiguity(geometry))) so that you find neighbors per time period.
  • It also appears that your calculation of the MK stat is done for each and every location's value, which is why you're getting null values. Here's a reprex that you can use to explore it further. I've spent over an hour on this this morning and that's all I've got in me ;)

You can fork the repository and explore local_g_spt_calc() and functions like it. The package puts a lot of work into making sure that neighbors from the right time windows are fetched and used. The differences you are seeing come from a very slight difference in the calculation of the local G: the values are quite similar but slightly different. The p-values will also differ between runs because of conditional permutation.

You can read my blog post on complete spatial randomness if you'd like (ugh, looks like quarto broke my images!!!).

library(sfdep)
library(dplyr)

x <- purrr::map_dfr(1:10, ~guerry)  %>%  
  dplyr::select(code_dept, crime_pers) %>% 
  # create an indicator for time period
  mutate(
    time_period = sort(rep(1:10, 85)),
    crime_pers = crime_pers * runif(850, max = 2)
  )

xnb <- x |> 
  group_by(time_period) |> 
  mutate(
    nb = include_self(st_contiguity(geometry)),
    wt = st_weights(nb)
  ) |> 
  ungroup()

all_gis <- xnb |> 
  sf::st_drop_geometry() |> 
  group_by(time_period) |> 
  mutate(
    gi_star = local_gstar_perm(crime_pers, nb, wt)
  ) |> 
  tidyr::unnest_wider(gi_star) 

ehsa <- all_gis |> 
  group_by(code_dept) |> 
  summarise(mk = as.data.frame(unclass(Kendall::MannKendall(gi_star))))

ehsa
#> # A tibble: 85 × 2
#>    code_dept  mk$tau    $sl    $S    $D $varS
#>    <fct>       <dbl>  <dbl> <dbl> <dbl> <dbl>
#>  1 01         0.511  0.0491    23    45   125
#>  2 02         0.244  0.371     11    45   125
#>  3 03         0.0222 1          1    45   125
#>  4 04        -0.111  0.721     -5    45   125
#>  5 05        -0.200  0.474     -9    45   125
#>  6 07         0.289  0.283     13    45   125
#>  7 08         0.111  0.721      5    45   125
#>  8 09        -0.244  0.371    -11    45   125
#>  9 10        -0.111  0.721     -5    45   125
#> 10 11        -0.200  0.474     -9    45   125
#> # ℹ 75 more rows


# EHSA --------------------------------------------------------------------

spt <- as_spacetime(x, "code_dept", "time_period")

# use sfdep
ehsa_guerry <-
  emerging_hotspot_analysis(
    spt,
    "crime_pers",
    include_gi = TRUE,
    threshold = 0.05,
    nsim = 99
  ) 


ehsa_guerry
#> # A tibble: 85 × 4
#>    location     tau p_value classification     
#>    <fct>      <dbl>   <dbl> <chr>              
#>  1 01        0.467   0.0736 sporadic hotspot   
#>  2 02        0.467   0.0736 sporadic hotspot   
#>  3 03        0.289   0.283  sporadic hotspot   
#>  4 04       -0.244   0.371  sporadic coldspot  
#>  5 05       -0.244   0.371  no pattern detected
#>  6 07        0.333   0.210  no pattern detected
#>  7 08        0.378   0.152  sporadic hotspot   
#>  8 09       -0.422   0.107  sporadic coldspot  
#>  9 10       -0.0222  1      sporadic hotspot   
#> 10 11       -0.378   0.152  sporadic coldspot  
#> # ℹ 75 more rows

Created on 2023-11-29 with reprex v2.0.2

johnForne commented on June 20, 2024

JosiahParry commented on June 20, 2024

The tool uses the individual time step windowing method and does not calculate the Gi* in the context of the whole cube.

local_gstar_perm() also handles the lack of an assumption of normality. This is done through the conditional permutation approach to calculating pseudo p-values. local_g_spt is not exported (I don't think) and is intended strictly for the use case of spacetime objects.
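
Roughly, the conditional permutation idea looks like this (a minimal sketch, not local_gstar_perm()'s actual internals; the "local statistic" here is just a neighbor mean and the indices are made up):

set.seed(1)
x <- rnorm(100)           # observed values
i <- 1                    # the focal site
nbs <- c(2, 5, 9)         # hypothetical neighbor indices for site i
obs <- mean(x[nbs])       # observed local statistic at site i
nsim <- 99
sims <- replicate(nsim, {
  # condition on x[i]: permute values among all sites except i
  mean(sample(x[-i], length(nbs)))
})
# one-sided pseudo p-value; a folded (two-sided) version takes the
# smaller of the two tails instead
p_sim <- (sum(sims >= obs) + 1) / (nsim + 1)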

The pseudo p-values (simulated) are used to classify the hotspots (e.g. was it significant 5 times or fewer, that type of thing). The sign of the Gi* value determines the type of hotspot (negative is cold, positive is hot). You calculate the trend on the Gi* values themselves. So you chunk your data into k time steps, calculate your statistic, and then do a trend test on the statistic itself. :)
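
For instance, a minimal sketch of that trend step, assuming the all_gis data frame from the reprex above, would run Mann-Kendall on a single location's Gi* series:

library(dplyr)

all_gis |>
  filter(code_dept == "01") |>   # one location's Gi* time series
  arrange(time_period) |>
  pull(gi_star) |>
  Kendall::MannKendall()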

I hope that helps :)

johnForne commented on June 20, 2024

johnForne commented on June 20, 2024

JosiahParry commented on June 20, 2024

I can only recommend that you use emerging_hotspot_analysis() and the approach taken there. Alternatively, if you have ArcGIS Pro available to you, you can use the original implementation there.

One last alternative approach is to use the internal function spt_nb() to calculate space-time neighbors and pass the result to local_gstar_perm(). Below is an example of doing this.

To interpret the scores, look at the sign of gi_star: positive is "hot" and negative is "cold". I recommend you use p_folded_sim since it does not assume normality. You should not divide the p-value by 2; what you have described is a two-tailed test: that the statistic is more extreme than expected under complete spatial randomness in either direction (hot or cold). You should also use a more conservative cutoff than 0.05, such as 0.001, as recommended by Luc Anselin for the conditional permutation approach to p-value calculation.

You should not be using the Gi* statistic itself to determine whether or not a trend is present. Lastly, it's worth noting that the Mann-Kendall stat is non-parametric and has no assumption of normality.

library(sfdep)
library(dplyr)

x <- purrr::map_dfr(1:10, ~guerry)  %>%  
  dplyr::select(code_dept, crime_pers) %>% 
  # create an indicator for time period
  mutate(
    time_period = sort(rep(1:10, 85)),
    crime_pers = crime_pers * runif(850, max = 2)
  )

# calculate neighbors in space
nb <- include_self(st_contiguity(guerry$geometry))

spt_nbs <- sfdep:::spt_nb(
  nb,
  n_times = 10,
  n_locs = length(nb),
  k = 1
) |> 
  # handles bug in the internal function that makes it numeric 
  # not integer
  lapply(as.integer)

# make it into a nb class object 
attributes(spt_nbs) <- attributes(nb)

# calculate row-standardized weights
wts <- st_weights(spt_nbs)

# use local_gstar_perm()
gis <- local_gstar_perm(x$crime_pers, spt_nbs, wts)

head(gis)
#>       gi_star        e_gi       var_gi    p_value        p_sim p_folded_sim
#> 1  2.81081999 0.001484892 1.041060e-07  2.1986013 0.0279062810        0.048
#> 2  0.07275412 0.001086474 8.546990e-08  0.3839037 0.7010498597        0.684
#> 3  3.07011902 0.001030353 8.024913e-08  3.8285633 0.0001288935        0.008
#> 4 -0.47830940 0.001237108 1.067520e-07 -0.7156852 0.4741857566        0.468
#> 5  0.07303781 0.001317220 1.300446e-07 -0.3082567 0.7578870420        0.808
#> 6 -1.86127253 0.001134215 6.955435e-08 -1.8565154 0.0633801274        0.044
#>   skewness  kurtosis
#> 1    0.024 0.4356863
#> 2    0.342 0.4500153
#> 3    0.004 0.3997048
#> 4    0.234 0.2333309
#> 5    0.404 0.3420084
#> 6    0.022 0.3065077
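
As a follow-up sketch (assuming the gis result above), you could label each observation by the sign of gi_star at a conservative cutoff. Note that the smallest attainable pseudo p-value is 1/(nsim + 1), so a 0.001 cutoff requires nsim >= 999:

library(dplyr)

cutoff <- 0.01
gis |>
  mutate(
    label = case_when(
      p_folded_sim <= cutoff & gi_star > 0 ~ "hot",
      p_folded_sim <= cutoff & gi_star < 0 ~ "cold",
      TRUE ~ "not significant"
    )
  ) |>
  head()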
