Comments (8)

JosiahParry commented on June 20, 2024

Thanks @johnForne! Unfortunately I cannot replicate your code here.
A few things. Firstly, your code does not replicate emerging hot spot analysis as it should be done. Emerging hot spot analysis includes neighbors from different time lags. That means that for code_dept 1, its neighbors at the present time and its neighbors at a previous time are both included in the calculation of the Gi*. By default k = 1, not 0.
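
To make the k = 1 idea concrete, here is an illustrative sketch (not sfdep's internals) of how space-time neighbor indices work when locations are stacked by time slice; n_locs, nb_i, and t are made-up values:

n_locs <- 85                       # locations per time slice
nb_i <- c(2, 5, 7)                 # hypothetical spatial neighbors of a site
t <- 3                             # the third time slice
spt_nb_i <- c(
  nb_i + (t - 1) * n_locs,         # those neighbors at time t
  nb_i + (t - 2) * n_locs          # the same neighbors at time t - 1 (k = 1)
)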

I'm also not so confident that your classification scheme is entirely accurate. The classifications are fairly complex; you can find them at https://github.com/JosiahParry/sfdep/blob/main/R/emerging-hostpot-classifications.R

I also had to write a specific version of the Gi* to be able to include time neighbors as well, so that's definitely a place where things can become different.
https://github.com/JosiahParry/sfdep/blob/main/R/spacetime-local-gistar-impl.R

I also think you're using a different version of the MK statistic than I am (I believe you're using https://rdrr.io/cran/modifiedmk/).

To use the output of include_gi you need to get it from an attribute. The attribute can be retrieved from the result like so:

library(sfdep)

df_fp <- system.file("extdata", "bos-ecometric.csv", package = "sfdep")
geo_fp <- system.file("extdata", "bos-ecometric.geojson", package = "sfdep")

# read in data
df <- read.csv(df_fp, colClasses = c("character", "character", "integer", "double", "Date"))
geo <- sf::st_read(geo_fp)
#> Reading layer `bos-ecometric' from data source 
#>   `/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/sfdep/extdata/bos-ecometric.geojson' 
#>   using driver `GeoJSON'
#> Simple feature collection with 168 features and 1 field
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -71.19115 ymin: 42.22788 xmax: -70.99445 ymax: 42.3974
#> Geodetic CRS:  NAD83

# Create spacetime object called `bos`
bos <- spacetime(df, geo,
                 .loc_col = ".region_id",
                 .time_col = "time_period")


# conduct EHSA
ehsa <- emerging_hotspot_analysis(
  x = bos,
  .var = "value",
  k = 1,
  nsim = 9,
  include_gi = TRUE
)

attr(ehsa, "gi_star")
#>            gi_star p_sim
#> 1    -0.9851338521   0.1
#> 2    -1.1651720095   0.1
#> 3    -1.2901364459   0.1
#> 4    -0.9406102825   0.1
#> 5    -1.1077628305   0.1

Created on 2023-11-28 with reprex v2.0.2

johnForne commented on June 20, 2024

JosiahParry commented on June 20, 2024

When creating a reproducible example, I'd recommend using reprex so that you can ensure your code is reproducible. Your code uses a version of simplevis that has something called gg_sf_col(), which is not available in the current release. It also does not load the library that mmkh() comes from (or call it by namespace), among other things. Again, reprex is super handy here!
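
As a minimal sketch of that workflow (the body here is just a placeholder):

reprex::reprex({
  # everything inside this block runs in a fresh session, so missing
  # library() calls and undefined functions surface immediately
  library(sfdep)
  head(guerry)
})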

I took no creative direction with this implementation of Emerging Hot Spot Analysis, and having since joined the team, I have confirmed that the way neighbors and weights are used is correct. Neighbors are identified and weighted based on space. Then neighbors from k time lags are included in the calculation of the statistic.

A couple of reasons why your code is likely not reproducing the results:

  • You're finding neighbors based on the entire dataset instead of just the spatial context. This means that the nb object is going to have indices greater than the number of rows per group. Instead you should use group_by(time_period) |> mutate(nb = include_self(st_contiguity(geometry))) so that you find neighbors per time period.
  • It also appears that your calculation of the MK stat is done for each and every location's value, which is why you're getting null values. Here's a reprex that you can use to explore it further. I've spent over an hour on this this morning and that's all I've got in me ;)

You can fork the repository and explore local_g_spt_calc() and functions like it. The package puts a lot of work into making sure that neighbors from the right time windows are fetched and used. The differences you are seeing come from a very slight difference in the calculation of the local G: the values are quite similar but slightly different. The p-values will also differ between runs because of conditional permutation.

You can read my blog post on complete spatial randomness if you'd like (ugh, looks like quarto broke my images!!!).

library(sfdep)
library(dplyr)

x <- purrr::map_dfr(1:10, ~guerry)  %>%  
  dplyr::select(code_dept, crime_pers) %>% 
  # create an indicator for time period
  mutate(
    time_period = sort(rep(1:10, 85)),
    crime_pers = crime_pers * runif(850, max = 2)
  )

xnb <- x |> 
  group_by(time_period) |> 
  mutate(
    nb = include_self(st_contiguity(geometry)),
    wt = st_weights(nb)
  ) |> 
  ungroup()

all_gis <- xnb |> 
  sf::st_drop_geometry() |> 
  group_by(time_period) |> 
  mutate(
    gi_star = local_gstar_perm(crime_pers, nb, wt)
  ) |> 
  tidyr::unnest_wider(gi_star) 

ehsa <- all_gis |> 
  group_by(code_dept) |> 
  summarise(mk = as.data.frame(unclass(Kendall::MannKendall(gi_star))))

ehsa
#> # A tibble: 85 × 2
#>    code_dept  mk$tau    $sl    $S    $D $varS
#>    <fct>       <dbl>  <dbl> <dbl> <dbl> <dbl>
#>  1 01         0.511  0.0491    23    45   125
#>  2 02         0.244  0.371     11    45   125
#>  3 03         0.0222 1          1    45   125
#>  4 04        -0.111  0.721     -5    45   125
#>  5 05        -0.200  0.474     -9    45   125
#>  6 07         0.289  0.283     13    45   125
#>  7 08         0.111  0.721      5    45   125
#>  8 09        -0.244  0.371    -11    45   125
#>  9 10        -0.111  0.721     -5    45   125
#> 10 11        -0.200  0.474     -9    45   125
#> # ℹ 75 more rows


# EHSA --------------------------------------------------------------------

spt <- as_spacetime(x, "code_dept", "time_period")

# use sfdep
ehsa_guerry <-
  emerging_hotspot_analysis(
    spt,
    "crime_pers",
    include_gi = TRUE,
    threshold = 0.05,
    nsim = 99
  ) 


ehsa_guerry
#> # A tibble: 85 × 4
#>    location     tau p_value classification     
#>    <fct>      <dbl>   <dbl> <chr>              
#>  1 01        0.467   0.0736 sporadic hotspot   
#>  2 02        0.467   0.0736 sporadic hotspot   
#>  3 03        0.289   0.283  sporadic hotspot   
#>  4 04       -0.244   0.371  sporadic coldspot  
#>  5 05       -0.244   0.371  no pattern detected
#>  6 07        0.333   0.210  no pattern detected
#>  7 08        0.378   0.152  sporadic hotspot   
#>  8 09       -0.422   0.107  sporadic coldspot  
#>  9 10       -0.0222  1      sporadic hotspot   
#> 10 11       -0.378   0.152  sporadic coldspot  
#> # ℹ 75 more rows

Created on 2023-11-29 with reprex v2.0.2

johnForne commented on June 20, 2024

JosiahParry commented on June 20, 2024

The tool uses the individual time step windowing method and does not calculate the Gi* in the context of the whole cube.

local_gstar_perm() also handles the lack of an assumption of normality. This is done through the conditional permutation approach to calculating pseudo p-values. local_g_spt is not exported (I don't think) and is intended strictly for the use case of spacetime objects.
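
Roughly, the conditional permutation idea looks like this (a minimal sketch, not local_gstar_perm()'s actual internals; the "local statistic" here is just a neighbor mean and the indices are made up):

set.seed(1)
x <- rnorm(100)           # observed values
i <- 1                    # the focal site
nbs <- c(2, 5, 9)         # hypothetical neighbor indices for site i
obs <- mean(x[nbs])       # observed local statistic at site i
nsim <- 99
sims <- replicate(nsim, {
  # condition on x[i]: permute values among all sites except i
  mean(sample(x[-i], length(nbs)))
})
# one-sided pseudo p-value; a folded (two-sided) version takes the
# smaller of the two tails instead
p_sim <- (sum(sims >= obs) + 1) / (nsim + 1)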

The pseudo p-values (simulated) are used to classify the hotspots (e.g. was it significant 5 times or fewer, that type of thing). The sign of the Gi* value determines the type of hotspot (negative is cold, positive is hot). You calculate the trend on the Gi* values themselves. So you chunk your data into k time steps, calculate your statistic, and then do a trend test on the statistic itself. :)
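
For instance, a minimal sketch of that trend step, assuming the all_gis data frame from the reprex above, would run Mann-Kendall on a single location's Gi* series:

library(dplyr)

all_gis |>
  filter(code_dept == "01") |>   # one location's Gi* time series
  arrange(time_period) |>
  pull(gi_star) |>
  Kendall::MannKendall()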

I hope that helps :)

johnForne commented on June 20, 2024

johnForne commented on June 20, 2024

JosiahParry commented on June 20, 2024

I can only recommend that you use emerging_hotspot_analysis() and the approach taken there. Alternatively, if you have ArcGIS Pro available to you, you can use the original implementation there.

One last alternative approach is to use the internal function spt_nb() to calculate space-time neighbors and pass the result to local_gstar_perm(). Below is an example of doing this.

To interpret the scores, look at the sign of gi_star: positive is "hot" and negative is "cold". I recommend you use p_folded_sim since it does not assume normality. You should not divide the p-value by 2; what you have described is a two-tailed test: that the statistic is more extreme than expected under complete spatial randomness in either direction (hot or cold). You should also use a more conservative cutoff than 0.05, such as 0.001, as recommended by Luc Anselin for the conditional permutation approach to p-value calculation.

You should not be using the Gi* statistic itself to determine whether or not a trend is present. Lastly, it's worth noting that the Mann-Kendall stat is non-parametric and has no assumption of normality.

library(sfdep)
library(dplyr)

x <- purrr::map_dfr(1:10, ~guerry)  %>%  
  dplyr::select(code_dept, crime_pers) %>% 
  # create an indicator for time period
  mutate(
    time_period = sort(rep(1:10, 85)),
    crime_pers = crime_pers * runif(850, max = 2)
  )

# calculate neighbors in space
nb <- include_self(st_contiguity(guerry$geometry))

spt_nbs <- sfdep:::spt_nb(
  nb,
  n_times = 10,
  n_locs = length(nb),
  k = 1
) |> 
  # handles bug in the internal function that makes it numeric 
  # not integer
  lapply(as.integer)

# make it into a nb class object 
attributes(spt_nbs) <- attributes(nb)

# calculate row-standardized weights
wts <- st_weights(spt_nbs)

# use local_gstar_perm()
gis <- local_gstar_perm(x$crime_pers, spt_nbs, wts)

head(gis)
#>       gi_star        e_gi       var_gi    p_value        p_sim p_folded_sim
#> 1  2.81081999 0.001484892 1.041060e-07  2.1986013 0.0279062810        0.048
#> 2  0.07275412 0.001086474 8.546990e-08  0.3839037 0.7010498597        0.684
#> 3  3.07011902 0.001030353 8.024913e-08  3.8285633 0.0001288935        0.008
#> 4 -0.47830940 0.001237108 1.067520e-07 -0.7156852 0.4741857566        0.468
#> 5  0.07303781 0.001317220 1.300446e-07 -0.3082567 0.7578870420        0.808
#> 6 -1.86127253 0.001134215 6.955435e-08 -1.8565154 0.0633801274        0.044
#>   skewness  kurtosis
#> 1    0.024 0.4356863
#> 2    0.342 0.4500153
#> 3    0.004 0.3997048
#> 4    0.234 0.2333309
#> 5    0.404 0.3420084
#> 6    0.022 0.3065077
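
As a follow-up sketch (assuming the gis result above), you could label each observation by the sign of gi_star at a conservative cutoff. Note that the smallest attainable pseudo p-value is 1/(nsim + 1), so a 0.001 cutoff requires nsim >= 999:

library(dplyr)

cutoff <- 0.01
gis |>
  mutate(
    label = case_when(
      p_folded_sim <= cutoff & gi_star > 0 ~ "hot",
      p_folded_sim <= cutoff & gi_star < 0 ~ "cold",
      TRUE ~ "not significant"
    )
  ) |>
  head()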
