andykrause / hpir Goto Github PK
View Code? Open in Web Editor NEWHouse Price Indexes in R
House Price Indexes in R
Use imputeTS
package to impute missing periods.
-- May require conversion to a ts
class object
Create some simple functionality to compare multiple indexes. Plot, analysis, etc.
Should wrapper functions automatically generate a set of accuracy data to do future accuracy tests?
Might work when only have the 'rt' approach, but will become considerably more difficult when adding additional methods.
hpiModel objects are capturing the full output of the lm
, rlm
, etc models. As these hold much of the estimation object, they are creating massive objects. Need to only capture the necessary information.
Use of sales leaves out rental models. Consider changing to 'transactions'?
Blend 2 or more indexes together.
Weights of each based on parameter
zpiBlendIndex <- function(index.list,
data.obj,
blend.type,
error.wgt=zpi.options$calc$error.wgt){
## Set Error Weights
# Set up capture list
wgt.list <- list()
# For each error obj, create a small df with necessary information (raw or smooth)
for(i in 1:length(index.list)){
error.temp <- index.list[[i]]@error
if(length(error.temp) == 0) error.temp <- NA
temp <- data.frame(nbr=names(index.list)[i],
error=error.temp,
vol=index.list[[i]]@vol)
if(is.na(temp$error)) temp$error <- temp$vol
wgt.list[[i]] <- temp
}
# Convert to df
wgts.df <- rbind.fill(wgt.list)
# Recipricolate error and volatily measures
wgts.df$e <- 1/wgts.df$error
wgts.df$v <- 1/wgts.df$vol
# Fix situations with INF or -INF values
wgts.df$e <- ifelse(!is.finite(wgts.df$e), 100, wgts.df$e)
wgts.df$v <- ifelse(!is.finite(wgts.df$v), 200, wgts.df$v)
# If error and volatility are to be blended
if(error.wgt > 0 & error.wgt < 1) {
wgts.df$e.value <- (wgts.df$e / sum(wgts.df$e))*error.wgt
wgts.df$v.value <- (wgts.df$v / sum(wgts.df$v))*(1-error.wgt)
wgts.df$sum <- wgts.df$e.value + wgts.df$v.value
}
# If only volatility
if(error.wgt == 0) {
wgts.df$sum <- (wgts.df$v / sum(wgts.df$v))
}
# If only errors
if(error.wgt == 1) {
wgts.df$sum <- (wgts.df$e / sum(wgts.df$e))
}
# Compute sum of Weights
wgts <- wgts.df$sum
## Combine Indexes
# Create capture list
index.comb <- list()
# Loop through and weight various indices
for(j in 1:length(wgts)){
index.comb[[j]] <- index.list[[j]]@value * wgts[j]
}
# Combine into a single index
blend.index <- Reduce('+', index.comb)
## Create blended index object
# Index object
index.obj <- new('zpiindex',
market=zpi.options$current$market,
name=zpi.options$current$name,
level=zpi.options$current$level,
data='bld',
model=blend.type,
estimator='bld',
dme=blend.type,
time=1:length(blend.index),
value=blend.index,
lo=0,
hi=0,
impute=0,
vol=zpiIndexVol(blend.index),
ancestry=data.frame(type = names(index.list),
wgts = wgts),
pred.method=zpi.options$current$pred.method)
## Return Values
return(index.obj)
}
Create ability to judge 'accuracy' of an index.
Two Keys ways:
Two approached to judge the re-sale approach
*Forecast prediction method does require choice of a forecaster (probably use forecast
package).
See: U:\Andy\hpi_files_functions\hpiErrorAnalysis.r for more details.
This includes:
Methods for accuracy, ancestry, comparison and series formation.
Many of the function help headers are out of date or just wrong (from copy and pasting).
Also, determine a system for using examples in function help documentation.
Create a set of smoothing functions that can smooth out the index. Retain ability to keep raw
numbers as well.
Make this a class that inherits the basic index class.
Provide a warning if the periodicity is so fine that missing periods occur. Suggest next coarser periodicity.
In many of the data and model functions, errors are returned as NULL. Change this to ERROR.
Essentially allow a parameter to limit home flips from the data..
Allow the Repeat Transaction model to take a custom weight set (not be limited to just Case-Shiller weighting)
I think this fix will work (but need to look into it a bit more):
rtModel.weighted <- function(rt_df,
time_matrix,
price_diff,
estimator,
...){
if (is.null(list(...)$weights)){
# Run base model
lm_model <- stats::lm(price_diff ~ time_matrix + 0)
# Estimate impact of time dif on errors
rt_df$time_diff <- rt_df$period_2 - rt_df$period_1
err_fit <- stats::lm((stats::residuals(lm_model) ^ 2) ~ rt_df$time_diff)
# Implement weights
wgts <- stats::fitted(err_fit)
wgts <- ifelse(wgts > 0, 1 / wgts, 0)
} else {
wgts <- list(...)$weights
}
# Re-run model
rt_model <- stats::lm(price_diff ~ time_matrix + 0, weights=wgts)
# Add Class
class(rt_model) <- 'rtmodel'
# Return
rt_model
}
Can create very large sized objects if not corrected.
Should be able to give the series volatility calculations a name (to allow distinct analyses of smooth versus not)
Should have a summary function to allow for global comparisons between indexes/series.
Right now, periods start and end on calendar static days (all annual time frames start on Jan 1, end on Dec 31), etc.
Provide an option to make these relative to the start OR the end date.
Add functionality that allows for all indexes to be easily smoothed using the same parameters.
How to return GridExtra plots as object (with a class)?
Create an object class that inherits from ts
, but includes names, values and index numerics.
Create methods for this class.
Both in objects and in functionality.
Develop a metric of uncertainty (standard errors, etc.)
What does literature say on this??
Mostly for comparison's sake
Remove blending functions. Will add back in later.
#' @title blendIndexes
#' @description Blend together two or more indexes
#' @usage blendIndexes(index_list, weights)
#' @param index_list A list of identical length indexes (objects of class `hpiindex`)
#' @param weights default=NULL; A vector of weights the same length as the `index_list`
#' @param ... Additional Arguments
#' @return an `hpiblend` (S3) object, inheriting from `hpiindex` (S3) object containing:
#' \item{name: vector of period names}
#' \item{numeric: vector of period absolute values}
#' \item{period: vector of period relative values}
#' \item{index: `ts` object of the blended index values}
#' \item{imputed: list with a vector of imputation for each of parent indexes}
#' \item{blended: logical indicating that this is a blended index}
#' \item{weights: vector of the weights used to blend}
#' \item{parents: list of `ts` objects of the parent indexes}
#' @section Further Details:
#' Leaving "weights" to be NULL results in a 1/n weighting.
#'@examples
#'\dontrun{
#'blend_index <- blendIndexes(index_list = list(rt_index, hed_index),
#' weights = c(.6, .4))
#'}
#' @export
blendIndexes <- function(index_list,
weights=NULL,
...){
# Check classes
cs <- unlist(lapply(index_list, function(x) 'hpiindex' %in% class(x)))
if (any(!cs)){
message('All objects in "index_list" must be objects of the class "hpiindex')
stop()
}
# Check if all same length,
lens <- lapply(index_list, function(x) length(x$index))
if (length(unique(lens)) > 1){
message('All indexes must be the same length')
stop()
}
# Compute weights
if (is.null(weights)){
weights <- rep(1 / length(index_list), length(index_list))
} else {
if (length(weights) != length(index_list)){
message('Weights must be the same length as the index_list')
stop()
}
if (round(sum(weights), 4) != 1){
message('Weights must sum to 1')
stop()
}
}
# Compute contribution
index_w <- purrr::map2(.x=index_list,
.y=weights,
.f=function(x, y) x$index * y)
# Sum and convert to TS
index_blend <- Reduce('+', index_w)
# Return Values
structure(list(name = index_list[[1]]$name,
numeric = index_list[[1]]$numeric,
period = index_list[[1]]$period,
index = index_blend,
imputed = lapply(index_list, function(x) x$imputed),
blended = TRUE,
weights=weights,
parents=lapply(index_list, function(x) x$index)),
class = c('hpiblend', 'hpiindex'))
}
plot.hpiblend <- function(b_index){
# Create index data
index_data <- data.frame(period=b_index$period,
index=as.numeric(b_index$index),
name='Blended',
type='b',
stringsAsFactors=FALSE)
anc_data <- data.frame(period=rep(b_index$period, length(b_index$parents)),
index=unlist(b_index$parents),
name=as.character(paste0('Ancestor :',
sort(rep(1:length(b_index$parents),
length(b_index$index))))),
type='a',
stringsAsFactors = FALSE)
plot_data <- rbind(index_data, anc_data)
# Set colors and sizes
col_vals <- c('gray50', 'blue')
size_vals <- c(.5, 1.5)
# Create plot
blend_plot <- ggplot(plot_data,
aes(x=period, y=index,
group = as.factor(name),
color=as.factor(type),
size=as.factor(type))) +
geom_line() +
scale_color_manual(values=col_vals) +
scale_size_manual(values=size_vals) +
ylab('Index Value\n') +
xlab('\nTime Period') +
theme(legend.position='none',
legend.title = element_blank())
structure(blend_plot, class = c('blendplot', class(blend_plot)))
}
Finish units tests for:
Add a calculation of the volatility for each index.
Offer ability to change window (and maybe method)
Fix this so it is character (or maybe a date of a certain format??)
For blended index plotting method, add dashed plots of the ancestry indexes.
Plotting functions for
Discuss:
Determine:
Which message outputs to keep
Structure for consistently formatting functions, arguments, packages, etc.
Fix:
Plot Size Images
Give full arguments in all function calls
Should split them by functionality:
Use of sf related data causes failure in the plot.hpi
function.
How to seemlessly calculate errors from blended or smoothed indexes?
Add parcel points (in sf standard) to the data that comes with the package
The vignette needs an explanation of imputation for missing periods in the model results.
This occurs in `modelToIndex()~ function.
Ability to calculate the revision-ness.
Need to happen in steadily growing, forward prediction type environment. There is no 'agreed' on method here. Check the literature.
Previous code used:
revisionWrap <- function(indexid.data,
index.data){
ind.list <- names(table(indexid.data$usid))
ind.cap <- list()
for(qq in 1:length(ind.list)){
ind.i <- index.data[grep(ind.list[[qq]], index.data$usid), ]
rev.list <- list()
for(i in 2:120){
rev.list[[i-1]] <- sd(ind.i[ind.i$time==i,]$value)
}
ind.cap[[qq]] <- mean(unlist(rev.list), na.rm=TRUE)
}
names(ind.cap) <- ind.list
ind.df <- data.frame(usid=names(ind.cap),
rev=unlist(ind.cap))
return(ind.df)
}
First two periods showing up always as 100.
Add a clear explanation of the class structure and hierarchy to the vignette and to the Readme
Add simple objects of each class to the package.
Reference these in the examples so that they actually run.
If the missing time periods are at the tail end of the time periods, give a warning that extrapolation is occuring.
Add option to smooth a smooth.
Make order = c(3,3) for example. Will smooth at three, then 3 again.
Will allow for more clear plotting methods
Don't have users input object of certain class, rather have the generic create it prior to dispatch.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.