Comments (3)
Hi Michael,
Of course I am always open to contributions.
First a question, how does this compare to the DeLong code that I have in src/delong.cpp? This code also calculates the AUC (it is returned in the theta item, see line 105), and they are supposed to be equivalent.
You may also want to check the RcppBootstrap branch: xrobin/pROC/RcppBootstrap. I have a C++ implementation for ROC curve and AUC that is much faster than the current one (in src and inst/include specifically). I never finalized it and it is still failing some tests when direction is changed but otherwise should be working.
I guess the real question here is how to get rid of the overhead due to the construction of the ROC curve. I see the following things to think about:
- What interface do you want to provide? Currently you first need to create a roc() object and then call auc(). At the moment you can call auc directly with the data. The AUC object contains a reference to the ROC curve (actually a copy) but I think it would be possible to either change that, or introduce a roc.auc constructor (that might have an overhead too). Do you have an example code in mind that the user would call?
- This method would probably be restricted to the full AUC?
from proc.
Hi
Thanks for responding quickly. Yes, the code in src/delong.cpp is the fast algorithm that I am referring to. I looked at the source code for pROC::auc when a vector of responses and predictors was passed and saw that it called pROC::roc.default. The interface that I imagine would be most useful is a user-facing function that calculates AUC without creating the ROC curve as an intermediate step. For an example of the code that the user would call, look at Metrics::auc. Unfortunately, the implementation of that function struggles with integer overflow if the size of the data is large. Also, Metrics::auc uses base::rank, which could be improved with a C++ version like the one provided by data.table::frank. However, this performance speed-up is small. On my computer, with 1 million observations, base::rank runs in 0.6 seconds and data.table::frank runs in 0.1 seconds. With 10 million observations, it is 11 seconds and 1 second.
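To make the rank-based approach concrete, here is a minimal C++ sketch of the rank-sum (Mann-Whitney) identity for the full AUC. It is not the code from Metrics::auc or pROC. The function name rank_auc and the 0/1 label encoding are hypothetical, and it assumes that the overflow comes from multiplying class counts in 32-bit integers, so all arithmetic is done in double. Ties receive midranks, as base::rank does by default.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of pROC or Metrics.
// labels[i] is 1 for a positive (case) and 0 for a negative (control).
// Returns the AUC via the rank-sum identity, or -1.0 if a class is empty.
double rank_auc(const std::vector<double>& scores,
                const std::vector<int>& labels) {
    const std::size_t n = scores.size();

    // Sort indices by score so the scores themselves stay untouched.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return scores[a] < scores[b]; });

    double pos_rank_sum = 0.0;  // sum of midranks of the positives
    double n_pos = 0.0;         // counts kept as double to avoid integer overflow
    std::size_t i = 0;
    while (i < n) {
        // Find the run [i, j) of tied scores.
        std::size_t j = i;
        while (j < n && scores[order[j]] == scores[order[i]]) ++j;
        // The run occupies 1-based ranks i+1 .. j; every member gets the mean.
        const double midrank = 0.5 * static_cast<double>(i + 1 + j);
        for (std::size_t k = i; k < j; ++k) {
            if (labels[order[k]] == 1) {
                pos_rank_sum += midrank;
                n_pos += 1.0;
            }
        }
        i = j;
    }

    const double n_neg = static_cast<double>(n) - n_pos;
    if (n_pos == 0.0 || n_neg == 0.0) return -1.0;
    return (pos_rank_sum - n_pos * (n_pos + 1.0) / 2.0) / (n_pos * n_neg);
}
```

For example, scores {0.1, 0.4, 0.35, 0.8} with labels {0, 0, 1, 1} give 0.75, since three of the four positive/negative pairs are ordered correctly. The sketch assumes higher scores indicate positives and does not handle NaN values.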
Since this method is restricted to the full AUC, it might make sense for this functionality to be provided in a separate function that exists outside of the primary pROC::roc pipeline. What do you think? A primary use for this function would be to evaluate the performance of a model quickly when performing a search over the feature or hyper-parameter space.
I agree it should be separate, and actually it sounds like it should be a separate package.
pROC does a lot of checks on the inputs, and accepts a pretty large range of formats (numeric, ordered, dealing with NAs etc.), using arbitrary levels and direction for the comparison. This has of course a significant impact on the runtime that you'll probably want to avoid if you're interested in pure speed. It has little impact when dealing with large data sets, but I can see a usefulness for your code also when dealing with a large number of curves. I think it would be confusing to have some functions not check their inputs as thoroughly in pROC and I'd rather avoid that.
A separate package such as fastAUC would give you the freedom to skip those checks altogether. Starting from the delong.cpp code it should be pretty straightforward. The function really only takes the case and control vectors as input and calculates the AUC. Or did you have a different implementation in mind?
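As a rough illustration of a function that takes only the case and control vectors, the following C++ sketch counts, for each case, how many controls score lower (ties count one half) using a sorted copy of the controls and binary search. It does not reproduce the contents of delong.cpp. The name auc_from_groups is hypothetical, and it assumes cases are expected to score higher than controls, with no input checks beyond empty vectors.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the pROC implementation.
// Assumes higher values indicate cases; NaN values are not handled.
// Returns -1.0 if either group is empty.
double auc_from_groups(const std::vector<double>& cases,
                       std::vector<double> controls) {  // taken by value: sorted locally
    if (cases.empty() || controls.empty()) return -1.0;
    std::sort(controls.begin(), controls.end());

    double wins = 0.0;  // accumulated in double to avoid integer overflow
    for (double x : cases) {
        // Controls strictly below x lie in [begin, lo); ties lie in [lo, hi).
        auto lo = std::lower_bound(controls.begin(), controls.end(), x);
        auto hi = std::upper_bound(lo, controls.end(), x);
        wins += static_cast<double>(lo - controls.begin())
              + 0.5 * static_cast<double>(hi - lo);
    }
    return wins / (static_cast<double>(cases.size()) *
                   static_cast<double>(controls.size()));
}
```

With cases {0.35, 0.8} and controls {0.1, 0.4} this returns 0.75. The cost is one sort of the controls plus one binary search per case, so no ROC curve is built as an intermediate step.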