
ungroup's People

Contributors

andreaskiermeier, mpascariu, timriffe


ungroup's Issues

MortalitySmooth dependency now archived

Great package, I use the heck out of it, but it's tricky to install at the moment.

The following observations are from a user perspective:

MortalitySmooth is now in the CRAN archive, perhaps in part because its own dependency svcm was archived. As a result, ungroup has been archived as well, and its dependency chain causes installation problems.

If one runs

install.packages("https://cran.r-project.org/src/contrib/Archive/svcm/svcm_0.1.2.tar.gz", repos = NULL, type = "source")
install.packages("https://cran.r-project.org/src/contrib/Archive/MortalitySmooth/MortalitySmooth_2.3.4.tar.gz", repos = NULL, type = "source")

followed by

remotes::install_github("mpascariu/ungroup")

then ungroup can be installed (provided Rcpp, RcppEigen, and rgl are available).

From the package maintainer's perspective, I do not know what the proper course of action is. I've emailed the maintainers of svcm and MortalitySmooth, and you too, proposing a temporary solution.

0s produce error

The following worked for an earlier version of pclm:

x <- c(14:19, seq(20, 50, by = 5))
y <- c(0, 5.89614302375851, 27.5281691833457, 154.360824153406, 404.073482638157, 
826.841462608498, 15596.6097885998, 31266.5457249206, 32973.6915087617, 
28942.3542158762, 14290.9937703288, 1988.94222234551, 25.5661614888236
)
pclm(x = x, y = y, nlast = 5)
# Error in if ((d < tol || dd < 0.1) && it >= 4) break : 
#  missing value where TRUE/FALSE needed

Trial and error leads me to conclude that this is due to the 0 in the first age bin. In this case I'd like to just get that 0 back: perhaps fitting could be shielded from its nefarious effects while still preserving the 0 in situ. Likewise for NAs (previous issue)? Or, if you consider the error to be justified, then it merits documentation and an informative warning. Another workaround would be a weight vector, so that I could assign a weight of 0 myself to bins whose values are 0. Or is that possible already?
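
For what it's worth, an impute-then-restore treatment seems feasible from the outside. This is only a sketch of the idea, not a pclm() feature; the 1e-6 stand-in and the assumption that $fitted holds single-year values starting at min(x) are mine:

# hypothetical impute-then-restore workaround, not a pclm() feature:
# nudge the zero count so fitting succeeds, then put the 0 back in situ
y2 <- y
y2[y == 0] <- 1e-6                    # tiny positive stand-in for fitting
fit <- pclm(x = x, y = y2, nlast = 5)
# assuming single-year output starting at min(x): the offending bin is
# [14, 15), i.e. the first fitted value, so reset it to 0 afterwards
fit$fitted[1] <- 0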

The use of offset produces negative output rates when input rates are low

Thank you for all your hard work in this very useful package.

I have noticed that when I try to ungroup mortality rates of uncommon diseases I get negative rates in the outputs. For example

library(ungroup) # CRAN version 1.1.0
m1 <- pclm(seq(0, 80, 5), c(0, 0, 0, 0, 0, 0, 0, 0, 5, 15, 10, 17, 20, 30, 40, 45, 49), 21, rep(1e5, 17))
summary(m1$fitted)
# Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
# -1.812e-01 -5.685e-04 -1.947e-04 -4.592e-03 -8.000e-08  0.000e+00 

I was wondering if there is already a way to constrain rates to non-negative values. Otherwise it might be a good idea to add this constraint to the fitting algorithm.
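
In the meantime, clamping the output is a crude post-hoc workaround; a band-aid sketch only, not the constraint inside the fitting algorithm that this issue asks for:

# post-hoc band-aid, not a corrected fit: clamp fitted rates at zero
rates <- pmax(m1$fitted, 0)
summary(rates)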

law = "makeham" producing values > 1000

Hi Marius!

Is there something wrong with this set of steps? These example data come from DDSQLtools, where you once left a note to dig deeper if ever a value > 1000 were produced. The data don't seem pathological, and the extrapolation isn't that far. HT @cimentadaj

mx <- c(0.0296771105565131, 0.00665264110919088, 0.00342875960445963, 
        0.00377901902282611, 0.00707116273138672, 0.0134041945263743, 
        0.0184073459431529, 0.0222593253608793, 0.0257132898736745, 0.0360888832546771, 
        0.0481218078806996, 0.0734298380538821, 0.107445906251669, 0.153227233797312, 
        0.192783421516418, 0.254067819476128, 0.349650365024805, 0.604515542507172, 
        0.705407912909985, 0.828063756883144)
x <- c(0, 1, seq(5, 90, by = 5))

library(MortalityLaws)

M <- MortalityLaw(
  x          = x,
  mx         = mx,
  fit.this.x = c(75, 80, 85, 90),
  law        = "makeham",
  opt.method = "poissonL"
)
pv <- predict(object = M,
              x = c(90, 95, 100, 105, 110))
range(pv)
summary(M)
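
One way to dig deeper is to reconstruct the hazard by hand from the fitted coefficients. A sketch under stated assumptions, namely that MortalityLaws parameterises Makeham as h(x) = A * exp(B * x) + C with coefficients named A, B, C (verify with availableLaws() and coef(M)):

# hand-check of the extrapolated hazard; the exp(B * x) term is what can
# blow past 1000 at high ages if B is fitted too large
p  <- coef(M)
xe <- c(90, 95, 100, 105, 110)
hx <- p["A"] * exp(p["B"] * xe) + p["C"]
cbind(x = xe, by_hand = hx, predicted = pv)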

NA’s not allowed

Testing the ungroup package on the Bangladesh data.
Issue raised by Jonas Schöley on January 22, 2018.

ug <- pclm2D(x = bang_Dx_2d$age, y = bang_Dx_2d[, -1],
             offset = bang_Nx_2d[, -1], nlast = 20)
##  Error: 'y' contains NA values

NAs are not allowed. The problem with the Bangladesh data is that the last age group is not the same across the years: from 1975 to 1981 the last observed age group is 65+, in 1982 and 1983 it is 80+, and after that it is 85+. The varying open age group is the source of the NAs at higher ages.

Right now my only option would be to truncate all life tables to a last age group of 65+ and then run the 2D PCLM on these truncated life tables. Not good, as I throw away data that way.

I'm not sure about the proper solution. I know that PCLM needs a rectangular surface, so part of the solution would be a fixed omega (the closing age of the life table) instead of a fixed nlast parameter. This would ensure that even if the starting age of the last age group varies, its upper limit does not. I therefore propose an omega (closing age of life table) parameter in place of nlast.

So one solution would be for PCLM to recode any series of NAs over the last ages to 0s and redistribute the last observed Dx and Nx over [last observed x, omega]. You'd still have to think about how to deal with a cohort that dies out before reaching the last age group...
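
A rough, purely hypothetical sketch of that recoding step (pad_trailing_na is my name; nothing like this exists in ungroup):

# pad every year's column of the death-count matrix to a common closing
# age by recoding trailing NAs to 0, so the surface handed to pclm2D()
# is rectangular; the open-ended count stays in its last observed bin,
# leaving the smoother to spread it up to omega
pad_trailing_na <- function(col) {
  last_obs <- max(which(!is.na(col)))   # last observed age bin this year
  col[seq_along(col) > last_obs] <- 0   # trailing NAs become structural 0s
  col
}
y_rect <- apply(bang_Dx_2d[, -1], 2, pad_trailing_na)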

More generally, I strongly prefer an na.rm option, as present in many base R functions. Since PCLM is a smoother, it may very well be used to smooth over missing values. Missings due to varying last age groups are tricky, but missings in between are not: you can simply smooth over them.

OK. But for now, let's stick to truncating my data so that the last age group is 65 across all periods. Accordingly, I choose a higher nlast.

age_trunc <- bang_Dx_2d$age[1:18]
anyNA(age_trunc)
## [1] FALSE

bang_Dx_2d_trunc <- bang_Dx_2d[1:18, -1]
anyNA(bang_Dx_2d_trunc)
## [1] FALSE

bang_Nx_2d_trunc <- bang_Nx_2d[1:18, -1]
anyNA(bang_Nx_2d_trunc)
## [1] FALSE

ug <- pclm2D(x = age_trunc, y = bang_Dx_2d_trunc,
             offset = bang_Nx_2d_trunc, nlast = 40)
## Ungrouping offset Ungrouping data
## Error in if ((d < tol || dd < 0.1) && it >= 4) break: missing value where TRUE/FALSE needed

Don’t know what went wrong here.
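
One thing worth checking, given the other issues in this tracker: the same error message shows up when a 0 count sits in the data, so zeros in the truncated matrices may be the culprit rather than NAs. A quick, speculative diagnostic:

# hypothesis: the zero-count breakage reported elsewhere, not NAs
any(bang_Dx_2d_trunc == 0)
any(bang_Nx_2d_trunc == 0)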

zero tolerance

It seems one could allow for 0s in the count being split with some simple filters. Here's an example where the open age group (100+) has a 0 count, and it breaks. If we only supply ages up to 95, then it works. I can imagine some easy filters handling final age groups, or temporarily imputing a positive number for fitting and then putting the 0 back in afterwards (see the sketch after the code below). I do not know what the best way to handle 0s in intermediate ages would be, however. Any thoughts?

Age <- c(0, 1, seq(5, 100, by = 5))
Value <- c(10000, 44170, 44775, 42142, 38464, 34406, 30386, 26933, 23481,
           20602, 16489, 14248, 9928, 8490, 4801, 3599, 2048, 941, 326,
           80, 17, 0)
n <- length(Age)

# error
pclm(x = Age, y = Value, nlast = 1)

# works
pclm(x = Age[-n], y = Value[-n], nlast = 5)

# putting a positive value in place of the 0 makes it not break,
# so maybe a before-after treatment would suffice?
Value2 <- c(10000, 44170, 44775, 42142, 38464, 34406, 30386, 26933, 23481,
            20602, 16489, 14248, 9928, 8490, 4801, 3599, 2048, 941, 326,
            80, 17, 5)
pclm(x = Age, y = Value2, nlast = 1)
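
To complete the before-after idea, the stand-in could be zeroed out again once fitting is done. A hypothetical sketch, assuming the last element of $fitted corresponds to the imputed 100+ bin:

# put the 0 back in afterwards (assumes the last fitted value is the
# single-year value of the imputed open bin, here [100, 101))
fit2 <- pclm(x = Age, y = Value2, nlast = 1)
fit2$fitted[length(fit2$fitted)] <- 0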

persp example to help purge rgl dependency

Hi MP! I think you could save the current rgl magic under \dontrun{}, and make plot.pclm2D() use something inspired by this under the hood:

library(ungroup)

Dx <- ungroup.data$Dx
Ex <- ungroup.data$Ex

# Aggregate the data in order to ungroup it in the example below
x      <- c(0, 1, seq(5, 85, by = 5))
nlast  <- 26
n      <- c(diff(x), nlast)
group  <- rep(x, n)
y      <- aggregate(Dx, by = list(group), FUN = "sum")[, -1]
offset <- aggregate(Ex, by = list(group), FUN = "sum")[, -1]

P1 <- pclm2D(x, y, nlast)

x <- P1$bin.definition$output$location["left", ]
y <- as.integer(colnames(P1$fitted))
z <- P1$fitted

# colour each facet by the average height of its four corners
z.facet.center <- (z[-1, -1] + z[-1, -ncol(z)] + z[-nrow(z), -1] + z[-nrow(z), -ncol(z)]) / 4
# map facet centers onto a 100-colour scale
z.facet.range <- cut(z.facet.center, 100)
colpal <- colorspace::sequential_hcl(100, palette = "ag_GrnYl")

persp(x = x,
      y = y,
      z = z,
      theta = 210,
      col = colpal[z.facet.range],
      border = "#AAAAAA50",
      shade = NA,
      zlab = "fitted")

(screenshot: the resulting persp surface of fitted values)

pclm failing

Loving this package, but I'm presently stumped by a simple case. Here are some single-age example data from the DemoTools package (pop1m_ind). I group them to almost-abridged ages (except that 1-4 is split here into 1-2 and 3-4). pclm() warns heavily for this case but still returns fitted values. However, the sum of those values isn't particularly close to the sum of the original vector, and a look at the single-age output shows unexpected patterns. Any guidance?

NB1: I get the same behavior if all age groups are of uniform width 5.
NB2: I tried giving control = list(lambda = 1e-5) and similar, but same behavior.

library(ungroup)
# start with some single age data
y_orig <- c(9544406, 7471790, 11590109, 11881844, 11872503, 12968350, 11993151, 
        10033918, 14312222, 8111523, 15311047, 6861510, 13305117, 7454575, 
        9015381, 10325432, 9055588, 5519173, 12546779, 4784102, 13365429, 
        4630254, 9595545, 4727963, 5195032, 15061479, 5467392, 4011539, 
        8033850, 1972327, 17396266, 1647397, 6539557, 2233521, 2101024, 
        16768198, 3211834, 1923169, 4472854, 1182245, 15874081, 1017752, 
        3673865, 1247304, 1029243, 12619050, 1499847, 1250321, 2862148, 
        723195, 12396632, 733501, 2186678, 777379, 810700, 7298270, 1116032, 
        650402, 1465209, 411834, 9478824, 429296, 1190060, 446290, 362767, 
        4998209, 388753, 334629, 593906, 178133, 4560342, 179460, 481230, 
        159087, 155831, 1606147, 166763, 93569, 182238, 53567, 1715697, 
        127486, 150782, 52332, 48664, 456387, 46978, 34448, 44015, 19172, 
        329149, 48004, 28574, 9200, 7003, 75195, 13140, 5889, 18915, 
        21221, 72373)

x     <- c(0, 1, 3, seq(5, 100, 5))
agi   <- c(1, 2, 2, rep(5, 19), 1)
y     <- tapply(y_orig, rep(x, times = agi), sum)
nlast <- 1

# equal length
# length(x); length(y)

# lots of warnings:
# In y_/K$muA : longer object length is not a multiple of shorter object length
y1mod <- pclm(x = x, y = y, nlast = nlast)

# sums far off
sum(y1mod$fitted)
# 393272639 for me
sum(y)
# 432502563
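
One way to see where the mass goes is to re-aggregate the fitted single ages back into the input bins and compare bin by bin; just a diagnostic, using the objects defined above:

# re-aggregate fitted single-year values to the input bins and compare
# with the observed counts (uses x, agi, y, y1mod from above)
y_hat <- tapply(y1mod$fitted, rep(x, times = agi), sum)
round(cbind(observed = y, refit = y_hat))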

Allow for missing years (and interpolate) in two-dimensional smooth

My case: I have population data only every third year for most of the period, but mortality data for each year (though those may be missing as well for other regions). It should in theory be possible to interpolate over the years with missing data. The only thing I have come up with is this ad hoc solution:

Say I have population data at time points 1 and 3, and mortality data for 1, 2, and 3. Then the estimated population at time point 2 would be some kind of mean of pop_hat_t1 - mortality_hat_t1 (with the x axis then shifted) and pop_hat_t3 + mortality_hat_t3 (hat denotes an estimate), possibly iterating back and forward to minimize the difference. But that would be cumbersome when it should be possible to just smooth instead (albeit with some borrowing of information from the future in the case of newborns and other edge cases). A toy sketch of the balancing step is below.
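
All objects in this sketch are self-invented toy numbers; the *_hat vectors stand for smoothed estimates on a common single-age grid:

# toy illustration of the ad hoc balancing idea above (made-up numbers)
pop_hat_t1  <- c(100.0, 98.0, 96.0, 94.0, 92.0)
pop_hat_t3  <- c(101.0, 97.0, 95.0, 93.0, 90.0)
mort_hat_t1 <- c(1.0, 0.8, 0.9, 1.1, 1.3)
mort_hat_t3 <- c(0.9, 0.9, 1.0, 1.2, 1.4)

fwd <- c(NA, head(pop_hat_t1 - mort_hat_t1, -1))  # survive t1 forward, shift x axis
bwd <- pop_hat_t3 + mort_hat_t3                   # add t3 deaths back
pop_hat_t2 <- rowMeans(cbind(fwd, bwd), na.rm = TRUE)  # "some kind of mean"
pop_hat_t2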
