andykrause / hpir Goto Github PK

View Code? Open in Web Editor NEW

15.0 15.0 8.0 66.68 MB

House Price Indexes in R

R 63.09% HTML 36.91%

hpir's People

Contributors

Stargazers

Watchers

Forkers

zhaoxiaohe astraetech asobacloud gcasamat juisen ahlusar1989 hamin123

hpir's Issues

Impute missing periods

Use imputeTS package to impute missing periods.
-- May require conversion to a ts class object

Add comparers

Create some simple functionality to compare multiple indexes. Plot, analysis, etc.

Auto-generate accuracy data

Should wrapper functions automatically generate a set of accuracy data to do future accuracy tests?

Might work when only have the 'rt' approach, but will become considerably more difficult when adding additional methods.

hpiModel objects are capturing the full output of the lm, rlm, etc models. As these hold much of the estimation object, they are creating massive objects. Need to only capture the necessary information.

Sales versus Transaction terminology

Use of sales leaves out rental models. Consider changing to 'transactions'?

Create blending functionality

Blend 2 or more indexes together.
Weights of each based on parameter

1/n
By some vector

Error
Volatility
Combination
???
Have a parameter that tells if the vector is positive or negative (high is bad)

Give ancestry of the blend, including weights

Which means that the indexes need names...

zpiBlendIndex <- function(index.list,
                          data.obj,
                          blend.type,
                          error.wgt=zpi.options$calc$error.wgt){
  
 ## Set Error Weights
 
  # Set up capture list
  wgt.list <- list()
    
    # For each error obj, create a small df with necessary information (raw or smooth)
  for(i in 1:length(index.list)){
      
    error.temp <- index.list[[i]]@error
    if(length(error.temp) == 0) error.temp <- NA
      
    temp <- data.frame(nbr=names(index.list)[i],
                       error=error.temp,
                       vol=index.list[[i]]@vol)
    if(is.na(temp$error)) temp$error <- temp$vol
    
    wgt.list[[i]] <- temp  
  }
    
  # Convert to df
  wgts.df <- rbind.fill(wgt.list)
 
  # Recipricolate error and volatily measures
  wgts.df$e <- 1/wgts.df$error
  wgts.df$v <- 1/wgts.df$vol
  
  # Fix situations with INF or -INF values
  wgts.df$e <- ifelse(!is.finite(wgts.df$e), 100, wgts.df$e)
  wgts.df$v <- ifelse(!is.finite(wgts.df$v), 200, wgts.df$v)
  
  # If error and volatility are to be blended
  if(error.wgt > 0 & error.wgt < 1) {
    wgts.df$e.value <- (wgts.df$e / sum(wgts.df$e))*error.wgt
    wgts.df$v.value <- (wgts.df$v / sum(wgts.df$v))*(1-error.wgt)
    wgts.df$sum <- wgts.df$e.value + wgts.df$v.value
  }
  
  # If only volatility
  if(error.wgt == 0) {
    wgts.df$sum <- (wgts.df$v / sum(wgts.df$v))
  }
  
  # If only errors
  if(error.wgt == 1) {
    wgts.df$sum <- (wgts.df$e / sum(wgts.df$e))
  }
  
  # Compute sum of Weights
  wgts <- wgts.df$sum
 
 ## Combine Indexes   
  
  # Create capture list
  index.comb <- list()
  
  # Loop through and weight various indices
  for(j in 1:length(wgts)){
    index.comb[[j]] <- index.list[[j]]@value * wgts[j]
  }
  
  # Combine into a single index
  blend.index <- Reduce('+', index.comb)
  
 ## Create blended index object
  
  # Index object
  index.obj <- new('zpiindex',
                   market=zpi.options$current$market,
                   name=zpi.options$current$name,
                   level=zpi.options$current$level,
                   data='bld',
                   model=blend.type,
                   estimator='bld',
                   dme=blend.type,
                   time=1:length(blend.index),
                   value=blend.index,
                   lo=0,
                   hi=0,
                   impute=0,
                   vol=zpiIndexVol(blend.index),
                   ancestry=data.frame(type = names(index.list),
                                       wgts = wgts),
                   pred.method=zpi.options$current$pred.method)
  
 ## Return Values
  
  return(index.obj)
  
}

Create Error Calculation Functionality

Create ability to judge 'accuracy' of an index.

Two Keys ways:

Predict the value of a re-sale (absolute)
Improvement of a predictive model from NULL (no time adjustment) (relative)

Two approached to judge the re-sale approach

Holdout
Forecast Prediction

*Forecast prediction method does require choice of a forecaster (probably use forecast package).

See: U:\Andy\hpi_files_functions\hpiErrorAnalysis.r for more details.

Add random forest option

Create Blending Option

This includes:

Methods for accuracy, ancestry, comparison and series formation.

Clean up Help Headers

Many of the function help headers are out of date or just wrong (from copy and pasting).

Also, determine a system for using examples in function help documentation.

Add smoothers

Create a set of smoothing functions that can smooth out the index. Retain ability to keep raw numbers as well.

Make this a class that inherits the basic index class.

Provide warning if missing periods

Provide a warning if the periodicity is so fine that missing periods occur. Suggest next coarser periodicity.

Convert error responses from NULL to Error

In many of the data and model functions, errors are returned as NULL. Change this to ERROR.

Create minimum limit on repeat transaction distances

Essentially allow a parameter to limit home flips from the data..

Allow custom weights in Repeat-Trans model

Allow the Repeat Transaction model to take a custom weight set (not be limited to just Case-Shiller weighting)

I think this fix will work (but need to look into it a bit more):

rtModel.weighted <- function(rt_df,
                             time_matrix,
                             price_diff,
                             estimator,
                             ...){

  if (is.null(list(...)$weights)){
    # Run base model
    lm_model <- stats::lm(price_diff ~ time_matrix + 0)

    # Estimate impact of time dif on errors
    rt_df$time_diff <- rt_df$period_2 - rt_df$period_1
    err_fit <- stats::lm((stats::residuals(lm_model) ^ 2) ~ rt_df$time_diff)

    # Implement weights
    wgts <- stats::fitted(err_fit)
    wgts <- ifelse(wgts > 0, 1 / wgts, 0)
  } else {

    wgts <- list(...)$weights
  }

  # Re-run model
  rt_model <- stats::lm(price_diff ~ time_matrix + 0, weights=wgts)
  # Add Class
  class(rt_model) <- 'rtmodel'

  # Return
  rt_model

}

Ensure `prop_id` is not a factor

Can create very large sized objects if not corrected.

Series Volatility should have naming option and summary

Should be able to give the series volatility calculations a name (to allow distinct analyses of smooth versus not)

Should have a summary function to allow for global comparisons between indexes/series.

Add relative periods

Right now, periods start and end on calendar static days (all annual time frames start on Jan 1, end on Dec 31), etc.

Provide an option to make these relative to the start OR the end date.

Add a wrapper to smooth all indexes in a series.

Add functionality that allows for all indexes to be easily smoothed using the same parameters.

Returning GridExtra plots

How to return GridExtra plots as object (with a class)?

Create an `hpits` class

Create an object class that inherits from ts, but includes names, values and index numerics.

Create methods for this class.

Differentiate Series and Index Accuracy

Both in objects and in functionality.

Estimates of uncertainty

Develop a metric of uncertainty (standard errors, etc.)

What does literature say on this??

Add Median Price method

Mostly for comparison's sake

Add performance metric functionality

Error rates
Volatility
Revision
(Relative Performance as adjustment in basic OLS model)

Remove Blending

Remove blending functions. Will add back in later.

#' @title blendIndexes
#' @description Blend together two or more indexes
#' @usage blendIndexes(index_list, weights)
#' @param index_list A list of identical length indexes (objects of class `hpiindex`)
#' @param weights default=NULL; A vector of weights the same length as the `index_list`
#' @param ... Additional Arguments
#' @return an `hpiblend` (S3) object, inheriting from `hpiindex` (S3) object containing:
#' \item{name: vector of period names}
#' \item{numeric: vector of period absolute values}
#' \item{period: vector of period relative values}
#' \item{index: `ts` object of the blended index values}
#' \item{imputed: list with a vector of imputation for each of parent indexes}
#' \item{blended: logical indicating that this is a blended index}
#' \item{weights: vector of the weights used to blend}
#' \item{parents: list of `ts` objects of the parent indexes}
#' @section Further Details:
#' Leaving "weights" to be NULL results in a 1/n weighting.
#'@examples
#'\dontrun{
#'blend_index <- blendIndexes(index_list = list(rt_index, hed_index),
#'                            weights = c(.6, .4))
#'}
#' @export

blendIndexes <- function(index_list,
                         weights=NULL,
                         ...){

  # Check classes
  cs <- unlist(lapply(index_list, function(x) 'hpiindex' %in% class(x)))
  if (any(!cs)){
    message('All objects in "index_list" must be objects of the class "hpiindex')
    stop()
  }


  # Check if all same length,
  lens <- lapply(index_list, function(x) length(x$index))
  if (length(unique(lens)) > 1){
    message('All indexes must be the same length')
    stop()
  }

  # Compute weights
  if (is.null(weights)){
    weights <- rep(1 / length(index_list), length(index_list))
  } else {
    if (length(weights) != length(index_list)){
      message('Weights must be the same length as the index_list')
      stop()
    }
    if (round(sum(weights), 4) != 1){
      message('Weights must sum to 1')
      stop()
    }
  }

  # Compute contribution
  index_w <- purrr::map2(.x=index_list,
                         .y=weights,
                         .f=function(x, y) x$index * y)

  # Sum and convert to TS
  index_blend <- Reduce('+', index_w)

  # Return Values
  structure(list(name = index_list[[1]]$name,
                 numeric = index_list[[1]]$numeric,
                 period = index_list[[1]]$period,
                 index = index_blend,
                 imputed = lapply(index_list, function(x) x$imputed),
                 blended = TRUE,
                 weights=weights,
                 parents=lapply(index_list, function(x) x$index)),
            class = c('hpiblend', 'hpiindex'))

}


plot.hpiblend <- function(b_index){

  # Create index data
  index_data <- data.frame(period=b_index$period,
                           index=as.numeric(b_index$index),
                           name='Blended',
                           type='b',
                           stringsAsFactors=FALSE)

  anc_data <- data.frame(period=rep(b_index$period, length(b_index$parents)),
                         index=unlist(b_index$parents),
                         name=as.character(paste0('Ancestor  :',
                                                  sort(rep(1:length(b_index$parents),
                                                           length(b_index$index))))),
                         type='a',
                         stringsAsFactors = FALSE)

  plot_data <- rbind(index_data, anc_data)

  # Set colors and sizes
  col_vals <- c('gray50', 'blue')
  size_vals <- c(.5, 1.5)


  # Create plot
  blend_plot <- ggplot(plot_data,
                       aes(x=period, y=index,
                           group = as.factor(name),
                           color=as.factor(type),
                           size=as.factor(type))) +
    geom_line() +
    scale_color_manual(values=col_vals) +
    scale_size_manual(values=size_vals) +
    ylab('Index Value\n') +
    xlab('\nTime Period') +
    theme(legend.position='none',
          legend.title = element_blank())

  structure(blend_plot, class = c('blendplot', class(blend_plot)))

}

Complete Unit Testing

Finish units tests for:

Smoothing
~~Volatility Measures~~
Revision
* Series of Indexes
Accuracy Calculations (Forecast and KFold)

Add hedonic method

Add volatility calculator

Add a calculation of the volatility for each index.

Offer ability to change window (and maybe method)

'name' field in period_table is a factor

Fix this so it is character (or maybe a date of a certain format??)

Add ancestry indexes to blend plots

For blended index plotting method, add dashed plots of the ancestry indexes.

Enhanced Plotting

Plotting functions for

Error (accuracy)
Volatility
Revision
Some combinations of the above
Maps

Vignette Improvements

Discuss:

~~Smoothing~~
~~Volatility Measures~~
~~Revision~~
* ~~Series of Indexes~~
~~Accuracy Calculations (Forecast and KFold)~~
~~Blending~~

Determine:
Which message outputs to keep
~~Structure for consistently formatting functions, arguments, packages, etc.~~

Fix:
~~Plot Size Images~~
Give full arguments in all function calls

Split the unit tests

Should split them by functionality:

Prep Tools (dateToPeriod)
RS
Hed
Analyis Tools (vol, rev, acc, smooth, blend, etc.)

Plot.hpi fails with sf data

Use of sf related data causes failure in the plot.hpi function.

Methods for error estimation of blended and smoothed indexes

How to seemlessly calculate errors from blended or smoothed indexes?

Add Frequency Modulation Estimator

Add parcel points to the data

Add parcel points (in sf standard) to the data that comes with the package

Add explanation of imputation to vignettes

The vignette needs an explanation of imputation for missing periods in the model results.

This occurs in `modelToIndex()~ function.

Revision Functionality

Ability to calculate the revision-ness.

Need to happen in steadily growing, forward prediction type environment. There is no 'agreed' on method here. Check the literature.

Previous code used:


revisionWrap <- function(indexid.data,
                         index.data){
  
  ind.list <- names(table(indexid.data$usid))
  ind.cap <- list()
  
  for(qq in 1:length(ind.list)){
    
    ind.i <- index.data[grep(ind.list[[qq]], index.data$usid), ]
    
    rev.list <- list()
    for(i in 2:120){
      rev.list[[i-1]] <- sd(ind.i[ind.i$time==i,]$value)
    }
    ind.cap[[qq]] <- mean(unlist(rev.list), na.rm=TRUE)
  }
  names(ind.cap) <- ind.list
  
  ind.df <- data.frame(usid=names(ind.cap),
                       rev=unlist(ind.cap))
  
  return(ind.df)
  
}

Make classes add (inherit) early classes if not a list
Clearly define in the manuals

Move 'estimator' class object into generics

Don't have users input object of certain class, rather have the generic create it prior to dispatch.