fbroc 0.1.0 release

May 12, 2015

Last week I released my first R package, fbroc. With fbroc you can use bootstrap analysis to quickly calculate confidence regions for a ROC curve, as well as confidence intervals for derived performance metrics such as the AUC (area under the curve).

Getting the package published was less work than I expected. Hadley Wickham's new book, R Packages, was a big help in getting my very first submission accepted. I strongly recommend it to everyone who is just getting started with writing R packages.

Figure: fbroc visualization of the ROC curve and the AUC, including confidence intervals. In this example the confidence interval for the AUC is [0.75, 0.8].

When you should use fbroc

Considering the number of R packages on CRAN, it is no surprise that there are several other packages for ROC curve analysis. Options include using boot (with ROCR) or the excellent pROC package.

My main priority in writing fbroc was to outperform the alternatives in terms of speed. If you just want to bootstrap a single ROC curve, then pROC is probably the better choice at this point in time, mainly due to its rich feature set.

However, when conducting simulation studies with a large number of bootstrap replicates, or when you need a fast response time, I hope that you will be happier with fbroc. In fact, one main motivation was to embed fbroc in a Shiny app that doesn't force the user to wait a few minutes after uploading data. My own implementation is already hosted here on my Shiny server.

For information on how to start using fbroc in your R scripts, please go here.

Benchmark

A short R script is enough to demonstrate the performance of fbroc relative to pROC and to boot (with ROCR).

require(fbroc)
require(pROC)
require(boot)
require(ROCR)
require(ggplot2)

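# Statistic for boot(): AUC of the resampled predictions, computed with ROCR.
roc.stat <- function(x, index, true.class) {
  performance(prediction(x[index], true.class[index]), "auc")@y.values[[1]]
}

# Percentile bootstrap confidence interval for the AUC, stratified by class.
boot.with.ROCR <- function(x, y) {
  boot.ci(boot(data = x, statistic = roc.stat, R = 1000,
               strata = y, true.class = y), type = "perc")
}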

n.seq <- c(5:25, seq(30, 50, 5), 65, 75, 100, 130, 150, 175, 250, 375, 500, 750, 
          1000, 1250, 1500, 2000, 2500, 3000, 4000, 5000)
length.seq <- length(n.seq)

time.fbroc <- time.pROC1 <- time.pROC2 <- time.pROC3 <- 
              time.boot.ROCR <- rep(0, length.seq)

# For each sample size, record the elapsed time of 1000 bootstrap replicates per method.
for (i in seq_along(n.seq)) {
  n <- n.seq[i]  # samples per group
  y <- rep(c(TRUE, FALSE), each = n)
  x <- rnorm(2*n) + 1.5 * y
  time.fbroc[i] <- system.time(perf.roc(boot.roc(x, y, n.boot = 1000), "auc"))[3]
  time.pROC1[i] <- system.time(ci.auc(roc(y, x, algorithm = 1), method = "bootstrap", 
                                      boot.n = 1000, progress = "none"))[3]
  time.pROC2[i] <- system.time(ci.auc(roc(y, x, algorithm = 2), method = "bootstrap", 
                                      boot.n = 1000, progress = "none"))[3]
  time.pROC3[i] <- system.time(ci.auc(roc(y, x, algorithm = 3), method = "bootstrap", 
                                      boot.n = 1000, progress = "none"))[3]
  time.boot.ROCR[i] <- system.time(boot.with.ROCR(x, y))[3]
}

Figure: Log-log plot showing a benchmark of the calculation time for pROC, ROCR and fbroc given different sample sizes. fbroc outperforms all other options by at least one order of magnitude.

Note that for small n, overhead is the dominant cost for fbroc, ROCR and pROC algorithm 3. As n grows, fbroc, ROCR and pROC algorithm 2 show good asymptotic behavior.

One nice feature of fbroc is that it is consistently faster by at least one order of magnitude, while the relative performance of the other methods depends on sample size.

I will write more about the algorithm used by fbroc in the future, but it basically comes down to using an algorithm scaling linearly with the number of observations and implementing it efficiently in C++ via the package Rcpp.
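Until then, here is a minimal base-R sketch of the kind of computation being benchmarked: a stratified percentile bootstrap of the AUC. This is not the algorithm used by fbroc (that one lives in C++ and is not shown in this post), and it does not scale linearly, since it relies on rank(). The function names auc.rank and boot.auc.ci are made up for this illustration.

```r
# AUC via the Mann-Whitney rank-sum formula.
# x: numeric predictions, y: logical vector with TRUE = positive class.
auc.rank <- function(x, y) {
  n.pos <- sum(y)
  n.neg <- sum(!y)
  r <- rank(x)  # average ranks, so ties count as half a concordant pair
  (sum(r[y]) - n.pos * (n.pos + 1) / 2) / (n.pos * n.neg)
}

# Stratified bootstrap: resample positives and negatives separately,
# then take percentile limits of the resampled AUC values.
boot.auc.ci <- function(x, y, n.boot = 1000, conf.level = 0.95) {
  pos <- x[y]
  neg <- x[!y]
  labels <- rep(c(TRUE, FALSE), c(length(pos), length(neg)))
  aucs <- replicate(n.boot, {
    xb <- c(pos[sample.int(length(pos), replace = TRUE)],
            neg[sample.int(length(neg), replace = TRUE)])
    auc.rank(xb, labels)
  })
  alpha <- 1 - conf.level
  quantile(aucs, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Same simulated data layout as in the benchmark above.
set.seed(1)
y <- rep(c(TRUE, FALSE), each = 100)
x <- rnorm(200) + 1.5 * y
auc.rank(x, y)      # point estimate
boot.auc.ci(x, y)   # lower and upper percentile limits
```

Every bootstrap replicate here re-ranks the whole sample in interpreted R code, which is exactly the sort of per-replicate cost a compiled implementation can avoid.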

Current features and future development

At the moment fbroc offers only a rather limited number of features:

  • Very fast bootstrapping of ROC curves.
  • Visualization of confidence regions for the ROC curve.
  • Analysis of the AUC including confidence intervals.

I will expand the scope of fbroc in the future and am already working on the next version, which I hope to release early next month. However, I wanted to get the first version, including the fast C++ algorithm, onto CRAN as quickly as possible, since I consider that algorithm the core feature of the package.

My long-term vision for fbroc includes:

  • Paired ROC curve analysis
  • Power calculations
  • Cutoff optimization
  • Additional performance metrics (e.g. partial AUC)
  • Low-memory mode

Bold bullet points are slated to be included in the next release.

The Shiny application built on top of fbroc will also be updated to support the features added in future releases.

In the coming weeks, I will also write a series of posts that go into more detail about ROC curves and the algorithm implemented in fbroc.

Closing note

Since this is also my first post on this page, I will be especially grateful for any comments and suggestions on how to improve my writing style.
