
paper-hdrda's People

Contributors

philyoung007, ramhiser


paper-hdrda's Issues

Revamp discussion

Currently, the discussion is weak. Rewrite it. Items that should be discussed:

  • With appropriate choices of alpha and gamma, our shrinkage framework includes several existing methods as special cases
  • Allowing the covariance matrices to differ relaxes the linearity assumption commonly imposed on high-dimensional data (i.e., LDA)
  • We do not require the restrictive assumption that features are uncorrelated/independent

Fine-tune intro

The CSDA reviewers made a few comments that led me to believe the intro should be clearer. With that in mind, we need to fine-tune the intro and ensure we communicate the following:

  • We modify RDA to gain interpretability
  • HDRDA inherently yields dimension reduction
  • HDRDA is much faster than RDA
    • Mention savings in terms of Big Oh
    • Timing comparison (from #26)
  • HDRDA should be used instead of RDA (for p > N?)
    • Add RDA to classification study?
  • HDRDA is competitive in terms of classification performance

Add timing comparison

Previously, I coded up a timing comparison of simdiag::hdrda vs. klaR::rda. We'll bring that back with a more exhaustive comparison and also include PenalizedLDA::PenalizedLDA from Witten and Tibshirani (2011).

I'll also add the diagonal classifiers but will likely forgo including them in the paper. We will need a remark that loosely justifies this omission. Frankly, the diagonal classifiers will be much faster because they are simpler, but at the cost of classification accuracy.

In the Witten and Tibshirani (2011) paper, they also perform a timing comparison with 4 populations with varying feature dimensions:

  • p=20
  • p=200
  • p=2000
  • p=20000

They perform the timing comparison over 25 repetitions and report the mean and standard deviation of the runtimes. We'll do something similar but with a lot more repetitions.
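The measurement protocol above (repeated fits, then mean and standard deviation of runtimes) can be sketched as follows. This is a Python sketch of the harness only: `dummy_fit` is a stand-in for the actual R routines (`simdiag::hdrda`, `klaR::rda`, `PenalizedLDA::PenalizedLDA`), and the sample sizes are placeholders.

```python
import random
import statistics
import time

def time_classifier(fit, X, y, reps=25):
    """Time fit(X, y) over reps repetitions; return (mean, sd) of runtimes in seconds."""
    runtimes = []
    for _ in range(reps):
        start = time.perf_counter()
        fit(X, y)
        runtimes.append(time.perf_counter() - start)
    return statistics.mean(runtimes), statistics.stdev(runtimes)

def dummy_fit(X, y):
    # Stand-in for an actual training routine (e.g., HDRDA or RDA).
    return [sum(row) for row in X]

# Vary the feature dimension as in Witten and Tibshirani (2011).
for p in (20, 200, 2000):
    X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(40)]
    y = [i % 4 for i in range(40)]  # 4 populations
    mean_t, sd_t = time_classifier(dummy_fit, X, y)
    print(f"p={p}: mean {mean_t:.2e}s, sd {sd_t:.2e}s")
```

Swapping the dummy for each real classifier gives one (mean, sd) row per (classifier, p) combination, which is the table we want in the paper.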

Revamp two computational complexity paragraphs

These two paragraphs immediately follow Proposition 1 and are found on pages 9-10 of the AoAS submission.

  • Make more concise
  • Incorporate computational complexity
  • (Optional) Write algorithm for training/model selection (JCGS papers do this)
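If we do write out the training/model-selection algorithm, its core is a cross-validated grid search over the tuning parameters. A minimal sketch, assuming a hypothetical `train_fn(X, y, lam, gam)` that returns a predictor (the name and signature are placeholders, not the paper's notation):

```python
import itertools
import numpy as np

def cv_error(train_fn, X, y, lam, gam, n_folds=5, seed=0):
    """Cross-validation error of one (lambda, gamma) pair."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for fold in folds:
        train = np.setdiff1d(np.arange(len(y)), fold)
        predict = train_fn(X[train], y[train], lam, gam)
        errors.append(np.mean(predict(X[fold]) != y[fold]))
    return float(np.mean(errors))

def select_model(train_fn, X, y, lambdas, gammas):
    """Pick the (lambda, gamma) pair with the smallest CV error."""
    return min(itertools.product(lambdas, gammas),
               key=lambda pair: cv_error(train_fn, X, y, *pair))
```

A pseudocode version of exactly this loop would satisfy the JCGS-style algorithm box.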

Consider classification study with simulated data

Although I find classification studies with simulated data largely pointless, I'd rather add roughly two simulation configurations to ensure the paper is published. If a referee desires more than two, so be it. But two should be good enough.
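For reference, each configuration amounts to drawing labeled samples from a handful of Gaussian populations. A minimal sketch of two such configurations (the dimensions, mean shifts, and covariance scalings here are placeholders, not the settings we would publish):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate(n_per_class, p, k=4, unequal_cov=False):
    """Draw k Gaussian populations; mean shifts separate the classes, and
    unequal_cov rescales each class covariance (all settings are placeholders)."""
    X_parts, y_parts = [], []
    for j in range(k):
        mean = np.zeros(p)
        mean[: min(10, p)] = 0.5 * j          # shift the first few coordinates
        sd = np.sqrt(1.0 + j) if unequal_cov else 1.0
        X_parts.append(rng.normal(mean, sd, size=(n_per_class, p)))
        y_parts.append(np.full(n_per_class, j))
    return np.vstack(X_parts), np.concatenate(y_parts)

# Configuration 1: common covariance. Configuration 2: unequal covariances.
X1, y1 = simulate(n_per_class=25, p=500)
X2, y2 = simulate(n_per_class=25, p=500, unequal_cov=True)
```

The common-covariance case favors LDA-type classifiers; the unequal-covariance case is where relaxing the linearity assumption should pay off.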

Change title and name of classifier

Our proposed classifier clearly does not generalize the RDA classifier but instead improves and modernizes it for high-dimensional data. With this in mind, we need a slick name. The title of the paper should reflect the name somehow.

Add coauthor organizations to paper

  • JAR

    uStudio, Inc.
    1806 Rio Grande St
    Austin, TX 78701

  • CKS

    Myeloma Institute
    University of Arkansas for Medical Sciences
    4301 West Markham # 816
    Little Rock, Arkansas 72205

  • PDY

    Department of Management and Information Systems
    Baylor University
    One Bear Place #98005
    Waco, Texas 76798-7140

  • DMY

    Department of Statistical Science
    Baylor University
    One Bear Place #97140
    Waco, Texas 76798-7140

Update \oplus notation to 2x2 block-diagonal

The \oplus notation used in the paper is less conventional and may be a bit confusing. Instead, we'll switch to 2x2 block-diagonal matrices.

For example, in equation 10, we use the notation W_k \oplus \gamma I_{p-q}. Instead, we should replace this with:

\begin{bmatrix}
W_k & 0 \\
0 & \gamma I_{p-q}
\end{bmatrix}

The results will be more intelligible. It will take a bit of effort though to ensure that no orphan notation is introduced.
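As a sanity check on the replacement, W_k \oplus \gamma I_{p-q} and the 2x2 block-diagonal matrix are the same object. A quick numerical check (the sizes, gamma, and the entries of W_k below are illustrative, not values from the paper):

```python
import numpy as np

# Illustrative sizes only: q = 3, p = 5, gamma = 0.5.
q, p, gamma = 3, 5, 0.5
W_k = np.array([[2.0, 0.3, 0.1],
                [0.3, 1.5, 0.2],
                [0.1, 0.2, 1.0]])   # stand-in for the q x q block W_k

# W_k \oplus gamma * I_{p-q} written as a 2x2 block-diagonal matrix.
Sigma = np.block([
    [W_k,                  np.zeros((q, p - q))],
    [np.zeros((p - q, q)), gamma * np.eye(p - q)],
])
```

The off-diagonal blocks are exactly zero, which is the point the bmatrix display makes visually.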

Rewrite salespitch of HDRDA classifier in introduction

After the regularization literature review in the introduction, we begin with "Here, we propose the high-dimensional RDA classifier..." I am now of the opinion that this is not the route we want to go. It does not reference Friedman's (1989) RDA classifier, and at least one reviewer was critical of this.

The wording then needs to change but should still maintain a strong presence. Here are working blurbs to add to the paragraph or to replace the original sentences.

(After brief discussion of Friedman's classifier) We reparameterize Friedman's (1989) RDA classifier so that the resulting covariance-matrix estimator is a convex combination of ... Our parameterization improves the interpretation of the contribution of each observation weighted by the pooling parameter. We show that our parameterization yields an equivalent, dual decision function that can be computed efficiently for p >> N.

Create figure showing contours as a function of lambda

The idea here is to demonstrate the effect of the tuning parameter lambda. One emphasis in the paper that we are leaning towards is stressing the benefits of relaxing the linearity assumptions of the LDA classifier.

The figure should display the following:

  • Contours for approximately 4-5 populations
  • The covariance matrices should be obviously different when lambda = 0
  • The contours should be identical for lambda = 1
  • Display 5 subfigures, one each for lambda = 0, 0.25, 0.5, 0.75, 1

When lambda is introduced in the paper, add one sentence that says we demonstrate the effect of lambda in the figure.
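Assuming lambda enters as the usual convex combination of each class covariance with the pooled covariance (the exact form in the paper may differ), the collapse of the contours can be checked numerically before plotting:

```python
import numpy as np

# Class covariances that clearly differ (stand-ins for the 4-5 populations).
Sigmas = [np.diag([1.0, 4.0]),
          np.diag([4.0, 1.0]),
          np.array([[2.0, 1.2],
                    [1.2, 2.0]])]
Sigma_pool = sum(Sigmas) / len(Sigmas)

def shrunk(Sigma_k, lam):
    # Assumed convex-combination form of the lambda shrinkage.
    return (1 - lam) * Sigma_k + lam * Sigma_pool

for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    mats = [shrunk(S, lam) for S in Sigmas]
    spread = max(np.abs(m - mats[0]).max() for m in mats)
    print(f"lambda={lam}: max pairwise difference = {spread:.3f}")
# The spread reaches 0 at lambda = 1: every class shares Sigma_pool,
# so the contours in the final subfigure are identical.
```

Drawing the ellipse contours of `shrunk(S, lam)` at the five lambda values gives exactly the subfigure grid described above.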

Rewrite abstract

I am not happy with the current abstract. It does not sell the paper well enough.

Upload final draft to arXiv

After Mrs. Young proofreads the paper and I've applied the edits, upload the paper to arXiv before submitting it to CSDA.

Discuss computational complexity

We have stressed that our proposed classifier is much faster. We need to add more details to back up our claim.

  • Provide computational complexity for proposed classifier
  • Contrast with computational complexity of the original RDA classifier
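One concrete way to make the contrast: for p >> N, the nonzero eigenvalues of the p x p scatter matrix can be recovered from the N x N Gram matrix, replacing an O(p^3) decomposition with an O(N^3) one. This is a standard identity, not the paper's derivation; a quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 20, 1000                       # the p >> N regime
X = rng.standard_normal((N, p))

# Decomposing the N x N Gram matrix costs O(N^3) instead of the O(p^3)
# needed for the p x p matrix, yet yields the same nonzero eigenvalues.
gram_eigs = np.linalg.eigvalsh(X @ X.T)   # N x N
cov_eigs = np.linalg.eigvalsh(X.T @ X)    # p x p, computed only to verify

assert np.allclose(np.sort(gram_eigs), np.sort(cov_eigs)[-N:])
```

A sentence with this flavor, plus the actual per-step costs of the proposed classifier vs. Friedman's RDA, should be enough to back the speed claim.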
