Comments (3)

xrobin commented on July 18, 2024

Hi Michael,

Of course I am always open to contributions.

First, a question: how does this compare to the DeLong code that I have in src/delong.cpp? That code also calculates the AUC (it is returned in the theta item, see line 105), and the two are supposed to be equivalent.
You may also want to check the RcppBootstrap branch: xrobin/pROC/RcppBootstrap. I have a C++ implementation for ROC curve and AUC that is much faster than the current one (in src and inst/include specifically). I never finalized it and it is still failing some tests when direction is changed but otherwise should be working.

I guess the real question here is how to get rid of the overhead due to the construction of the ROC curve. I see the following things to think about:

  • What interface do you want to provide? Currently you first need to create a roc() object and then call auc(). At the moment you can call auc directly with the data. The AUC object contains a reference to the ROC curve (actually a copy), but I think it would be possible to either change that or introduce a roc.auc constructor (which might have an overhead too). Do you have example code in mind that the user would call?
  • This method would probably be restricted to the full AUC?

mfrasco commented on July 18, 2024

Hi

Thanks for responding quickly. Yes, the code in src/delong.cpp is the fast algorithm that I am referring to. I looked at the source code for pROC::auc when a vector of responses and predictors is passed and saw that it calls pROC::roc.default. The interface that I imagine would be most useful is a user-facing function that calculates the AUC without creating the ROC curve as an intermediate step. For an example of the code that the user would call, look at Metrics::auc. Unfortunately, the implementation of that function struggles with integer overflow if the data is large. Also, Metrics::auc uses base::rank, which could be improved with a C++ version like the one provided by data.table::frank. However, this speed-up is small. On my computer, with 1 million observations, base::rank runs in 0.6 seconds and data.table::frank runs in 0.1 seconds. With 10 million observations, it is 11 seconds and 1 second.

Since this method is restricted to the full AUC, it might make sense for this functionality to be provided in a separate function that exists outside of the primary pROC::roc pipeline. What do you think? A primary use for this function would be to evaluate the performance of a model quickly when performing a search over the feature or hyper-parameter space.

xrobin commented on July 18, 2024

I agree it should be separate, and actually it sounds like it should be a separate package.

pROC does a lot of checks on the inputs and accepts a pretty large range of formats (numeric, ordered, dealing with NAs etc.), using arbitrary levels and direction for the comparison. This of course has a significant impact on the runtime, which you'll probably want to avoid if you're interested in pure speed. It has little impact when dealing with large data sets, but I can see your code also being useful when dealing with a large number of curves. I think it would be confusing to have some functions in pROC not check their inputs as thoroughly, and I'd rather avoid that.

A separate package such as fastAUC would give you the freedom to skip those checks altogether. Starting from the delong.cpp code it should be pretty straightforward. The function really only takes the case and control vectors as input and calculates the AUC. Or did you have a different implementation in mind?
