Author Archives: Erik

fbroc 0.3.0 released!

Yesterday, I submitted fbroc 0.3.0 to CRAN and it was accepted shortly afterwards. By now it should have been built for most platforms. As usual, I will spent the next time to upgrade the Shiny application as well. If you haven’t read it before, the main new feature in 0.3.0 is the analysis of paired ROC curves.

Afterwards I will have to decide whether I want to polish what is there, add some documentation and a vignette to release a v1.0. The alternative would be add one more major feature. Currently what I want to add most are some functionalities related to cutoff optimization.

Short update on fbroc development

After a long break due to a vacation and other taking care of some other more urgent issues, I have started resuming work on fbroc. I hope to have version 0.3.0 out in a few weeks. The main new feature (handling paired ROC curves) is already completed, but I need to polish the code and add documentation and more examples.

I also finally fixed a sort-of-bug when trying to plot or calculate confidence intervals for a ROC curve with a large (> number of negative samples. By default, the number of grid points into which the interval was subdivided to calculate confidence intervals for the TPR was set to the maximum of 100 and the number of negative samples.

This in turn led to an integer overflow when trying to allocate enough memory to store the TPR at each of these points for every bootstrap iteration. Changing the default to a more reasonable 250 also nicely reduces the time needed to plot the curve.

Another change will be that I will now use the S3 generic function perf to get performance estimates, instead of perf.roc. Paired ROC curves will be a different object and this will help make the interface intuitive. Some example code:

# Generate some example data
y <- rep(c(TRUE, FALSE), each = 500)
x1 <- rnorm(1000) + y
x2 <- 0.5*x1 + 0.5*rnorm(1000) + y

# Single ROC curve

result.boot <- boot.roc(x1, y)
perf(result.boot, "tpr", fpr = 0.2)

                Bootstrapped ROC performance metric

Metric: TPR at a fixed FPR of 0.2
Bootstrap replicates: 1000
Observed: 0.586
Std. Error: 0.037
95% confidence interval:
0.52 0.66

# Paired ROC curve

                Bootstrapped ROC performance metric

Metric: TPR at a fixed FPR of 0.2
Bootstrap replicates: 1000

Classifier 1: 
Std. Error: 0.037
95% confidence interval:
0.52 0.666

Classifier 2: 
Std. Error: 0.019
95% confidence interval:
0.844 0.918

Std. Error: 0.033
95% confidence interval:
-0.356 -0.226

Correlation: 0.47

Thoughts about Harrell’s Regression Modeling Strategies

This month I picked up the new second edition of the classic Regression Modeling Strategies by Frank E. Harrell, Jr. I had a copy of the first edition at my workplace for the last few years, but at work I didn’t have the time to really study it from cover to cover. The release of the second updated edition was a great opportunity to pick up an improved version of the book for myself.

Picture of the book cover of the Second Edition of Regression Modeling Strategies by Frank E. Harrell, Jr.

Book Cover of the Second Edition of Regression Modeling Strategies by Frank E. Harrell, Jr.

Since then I had time for a first cursory read, but at this moment I am just writing down some of my first impressions. So don’t consider this a full and informed book review as this point of time.

The book covers these three main areas:

  • General concerns when modelling like validation, feature selection and how to handle missing data
  • An Introduction to maximum likelihood estimation and the various types of regression models (e.g. logistic and survival models)
  • Case studies, in which the regression modeling strategies are applied to real problems

For me the case studies are the most valuable part of the book. They offer great insight into how an experienced analyst will be able to avoid many not-so-obvious pitfalls and get the most out of the collected data. The further reading sections at the end of each chapter are perfectly written to offer relevant excerpts and whet your appetite for more.

If I have the time for a full review, I will go into more detail later. For now, it suffices to say that there is a good reason Regression Modeling Strategies is considered a classical treatise on the subject of regression modelling strategies.

Another thing worth mentioning is the tight integration with R. While the presented strategies can be implemented in other languages, all the example code is in R. There is also a R-package rms accompanying the book. This new package replaces the old Design package, which has by now been removed from CRAN. There are many improvement, which are worth a post dedicated to rms alone.

Since R is my favorite statistics language this is a good for me, but unfortunate for people using other options like SAS, which is still the language of choice for most statisticians in the pharma business.

I do not completely share the enthusiasm of the author regarding the usage of splines. There are many situations where they are an excellent choice, but their interpretation is not straightforward. This especially holds when interactions are concerned.

This might be a matter of taste, but I do disagree with the recommendation of using significance tests for non-linearity to justify more complexity (said splines) to customers. I see the possible advantage, but this could well lead to the customer to later complain about using a parametric regression model whenever the Shapiro-Wilk test shows a slight deviation from normality.

While I already mentioned how valuable the further readings section are, there are some comments on the AUC and ROC curves in the second chapter by Briggs and Zaretzki which read like a bit of a strawman attack to me. To be precise, I have not yet fully read the cited papers themselves, so it is not unlikely that I will change my opinion later.

Even in that case, this comment is still relevant for the way that they are cited. In addition, as the author of the fbroc package I cannot deny a potential bias from my side.

They write:

The receiver of a diagnostic measurement… wants to make a decision based on some and is not especially interested in how well he would have done had he some different cutoff.

This is certainly true, but is also not the purpose for which a ROC curve should be used. For example, the ROC curve helps to decide whether a diagnostic test is better suited for confirming a suspicion or to screen a large population. In both cases, we have different prevalences of the disease. This in turn requires us to consider different trade-offs between sensitivity and specificity.

The ROC curve offers a good starting points for these considerations. It is not intended for the end-user, especially the patient receiving the result of a test. The same argument applies to the comment of David Hand:

The relative importance of misclassifying a case as a noncase, compared to the reverse, cannot come from the data itself. It must come externally, from considerations of the severity one attaches to the different kinds of misclassification.

Again, this is true. But this severity is often patient-specific and depends on the application and target population you are considering for the diagnostic test. The ROC curve should obviously not be the primary endpoint of the development of a routine diagnostic test.

Back from the Harz

Due to a thankfully mild eye infection I caught at the end of my vacation and the kids having their birthdays it has taken quite a bit longer than I announced before, but I will now able to continue developing fbroc and writing. Fortunately, our vacation in the Harz was great and I being well-rested always help my motivation.

The Harz is the highest mountain region (still not that high) in the northern part of Germany and in case you are going (or are considering to) there, here are some suggestions about what to do.

  1. Visit some mines. The Harz has a history of iron ore mining and every little town seems to have its own mine. It is very interesting to see how they worked there and there are often beautiful mineral formations as well. When you consider working conditions and the resulting life expectancy it makes one feel grateful to live here and now. Since you often have to stoop, I wouldn’t recommend if you have trouble with your back and are of above average height. Most of the mines also have museums to give the historical background.

  2. Not obviously for a mountain regions, there are lots of opportunities to go swimming. There are lots of lakes and small rivers, many built to supply all the mines with water to drive the wheels. Some of them are now open for bathing (often without payment). This was very nice on some of the hotter (>30 degrees Celsius) days.

  3. Above all, do some hiking. As an additional motivation, there is a stamp system, the Harzer Wandelnadel, where you get regarded with various badges, if you collect enough by hiking to various locations. As usual, this gamification also helps to motivate children. The stamps are usually well-placed, so that you are rewarded with a great view or a visit to a historical significant site.

There are also many other options, like visiting Goslar or the highest mountain in the region, the Brocken (1141 m). In addition, there are also many gondola lifts, summer toboggan runs and some nice castles. All in all, it’s well worth one or even several visits.

Castle in Wernigerode on a hill

Castle in Wernigerode

No updates for a while

For the next four weeks I will either be travelling or otherwise occupied. Updates to both this site and fbroc should resume in the second half of August.

Paired roc curves in fbroc 0.3.0

Currently I am working on fbroc 0.3.0. The main feature will be the possibility for paired ROC curves. Paired ROC curves are needed when you are comparing two different classifiers on the same dataset. From my experience this use case is even more common than looking at a classifier in isolation, since there is usually at least an imperfect standard method already. If this is not the case, then what you are trying to predict might not be considered interesting by other people.

As an example, look at the different tests available for HIV. They have different sensitivities, specificities and costs, so that it is important to know the limitations and advantages of each test.

Bootstrapping paired ROC curves

When bootstrapping paired data, it is important to not just bootstrap both classifier separately but do so jointly. This means that you have to use the same bootstrap shuffles for both classifiers. Remember that

$latex mathrm{Var}(X – Y) = mathrm{Var}(X) + mathrm{Var}(Y) – 2 mathrm{Cov}(X, Y)$.

So when X and Y are highly correlated, the variance of the difference is smaller than the individual variances of the two classifiers. Often a sample that is different to classify in one prediction model is also different for the alternative model. This leads to high correlation between two different classifiers. Therefore, if you want to compare the AUC of two classifiers it is important to use the correct bootstrap procedure. Bootstrapping both classifier separately ignores this correlation, leading to misleading results.

Since fbroc now has classes for ROC curves and they have been written in mind of me wanting to implement this feature, the amount of required work in the C++ section of the fbroc code was negligible. The only difficulty was that much of the code requires the observations to be sorted in order of ascending prediction values. Because this ordering is not be the same for both classifiers, I had to shuffle some of the outputs back in the original order.

Estimating performance for paired ROC curves with fbroc

The interface for working with paired ROC curves is similar to what is already implemented in fbroc 0.2.1. Here is a code example for you.

result.boot <- boot.paired.roc(roc.examples$Cont.Pred, 
result.perf <- perf.paired.roc(result.boot)
List of 13
 $ Observed.Performance.Predictor1: num 0.929
 $ CI.Performance.Predictor1      : num [1:2] 0.891 0.962
 $ Observed.Performance.Predictor2: num 0.895
 $ CI.Performance.Predictor2      : num [1:2] 0.842 0.944
 $ Observed.Difference            : num 0.0341
 $ CI.Performance.Difference      : num [1:2] 0 0.0761
 $ conf.level                     : num 0.95
 $ Cor                            : num 0.654
 $ metric                         : chr "AUC"
 $ params                         : num 0
 $ n.boot                         : int 1000
 $ boot.results.pred1             : num [1:1000] 0.92 0.963 0.932 0.928 0.891 ...
 $ boot.results.pred2             : num [1:1000] 0.899 0.951 0.882 0.907 0.866 ...
 - attr(*, "class")= chr [1:2] "list" "fbroc.perf.paired"

Unfortunately, I have not yet implemented a nice printing function yet, but even so the output should be understandable. Note that this is an example where there is a high correlation between both classifiers, so that the confidence interval for the difference is actually smaller than that for the performance of the second classifier.

Visualizing the relative performance of paired ROC curves

Of course, fbroc 0.3.0 will also allow you to plot paired ROC curves. The straightforward way is to just show two ROC curves in the same plot, similar to the ROC curves for each classifier.

Two paired ROC curves with their overlapping confidence regions.

Two paired ROC curves with their overlapping confidence regions.

It is also illuminating for which False Positive Rates our favored classifier is actually better than the alternative. If a FPR smaller than 10% is required, than a higher TPR for a FPR between 30 and 40% is worthless.

Difference in True Positive Rate (TPR) for two paired classifiers

Difference in True Positive Rate (TPR) for two paired ROC curves

Note that these graphs are both work in progress. For example, I am not completely happy with the default colors yet. I will also implement a similar plot for the difference in FPR over the TPR, but this required some small work on the C++ side of the code.


Please remember that the released versions of fbroc do not yet contain these features. If you really need them now drop me a note or try to get the latest version from github to work for you.

fbroc 0.2.1 release

The fbroc 0.2.1 release was added to CRAN yesterday. Actually, I released fbroc 0.2.0 the day before and then had to update again due to a pointer bug found by valgrind as kindly pointed out by Brian Ripley, which did causes crashes or erros in my testing. Further versions will always be tested thoroughly with valgrind before I release them to avoid this happening again.

The complete change log is below, but the main new features are

  • Analysis of TPR at a fixed FPR and vice versa
  • Included example data to test the package
  • Minor performance increase due to reduced memory overhead

If you want to get started quickly, the shiny app was also updated and can be found here.


In the example data now included with fbroc, one of the numerical predictors suffers from extreme outliers in the negative class. These outliers have higher values than all other samples, including the positives. As a result the number of outliers included in the bootstrap sample determines the shape of the ROC curve near a false positive rate of 0. Let us look at the data in the shiny app. You can also generate the same plot in the R console.

roc.obj <- boot.roc(roc.examples$Cont.Pred.Outlier, roc.examples$True.Class,
                    n.boot = 2000)
plot(roc.obj, show.metric = "tpr", fpr = 0.03)
The TPR at a FPR of 0.03 depends upon how many of the outliers are included, making the confidence intervall very wide.

Using fbroc 0.2.1 and the updated shiny app to analyse example data included in the package. There are some extreme outliers in the negative samples, so that there is no cutoff with a TPR > 1 at a FPR of zero. The TPR at a FPR of 0.03 depends upon how many of the outliers are included, making the confidence intervall very wide.

Note the wide confidence intervals. In this case, the performance histogram is very useful since it shows that the distribution has two discrete peaks. Again, you can also use the R console.

perf.obj <- perf.roc(roc.obj, "tpr", fpr = 0.03)
The histogram is bimodal with a large peak at zero and a smaller peak at around 0.45.

Histogram of the bootstrapped TPR at a fixed FPR of 0.03. There is a high chance of a TPR of zero and a small chance of a larger TPR if the outliers are not included in the bootstrap sample.

Please feel free to experiment further. The results for the discrete predictor can be very interesting as well. Note that I documented how fbroc handles discrete predictors here.

Comple changelog

fbroc 0.2.1


  • fixed a off-by-one pointer error

fbroc 0.2.0

New features

  • Allow uncached bootstrap of the ROC curve to avoid memory issues, this now the new default
  • New performance metrices: TPR at fixed FPR and FPR at fixed TPR

Other changes

  • Stand-alone function to find thresholds calculate.thresholds was removed. To calculate thresholds
    please call boot.roc and look at list item roc of the outpot
  • Smarter default for the number of steps in conf.roc
  • Smarter default for the number of bins in plot.fbroc.perf

Internal Changes

  • Completely refactored C++ code for improved maintability


  • Function now works properly
  • For duplicated predictions not all relevant thresholds were found reliably, this was fixed

ROC curves and ties

While working on the next version of fbroc I found a bug that caused fbroc to produce incorrect results when the numerical predictor variable includes ties, that is, if not every value is unique. I have already fixed this bug in the development version, so if you have ties in your predictor I recommend trying the development version instead of fbroc 0.1.0. Instructions for the installation can be found here.

More interestingly, the bug led me to consider some conceptual difficulties with the construction of ROC curves when ties are present.

Constructing the ROC curve

1. Calculating TPR and FPR for all cutoffs present in the data

Let us first focus on the combinations of TPR and FPR that are actually achieved for the given data. Remember, that if we have n positive samples, for any cutoff the TPR must be a multiple of 1/n. Note that the following code snippet does not use the same logic or code as fbroc and that the R code is much slower.

# Function to calculate TPR/FPR pairs
get.roc.curve <- function(pred, true.class) {
  #every cutoff gives the same result as one of these
  thresholds <- c(min(pred) - 1, sort(unique(pred))) 
  tpr <- rep(1, length(thresholds)) # allocate space for true 
  fpr <- rep(1, length(thresholds)) # and false positive rate
  n.pos = sum(true.class)
  n.neg = sum(!true.class)
  for (i in 1:length(thresholds)) {
    tpr[i] = sum(true.class & (pred > thresholds[i])) / n.pos
    fpr[i] = sum(!true.class & (pred > thresholds[i])) / n.neg
  return(data.frame(TPR = tpr, FPR = fpr))
# Example
true.class <- rep(c(TRUE, FALSE), each = 10) 
pred <- 1 * true.class + rnorm(20) #follows binormal model
result <- get.roc.curve(pred, true.class)
graph <- ggplot(data = result, aes(x = FPR, y = TPR)) + 
         geom_point(size = 1.5) + xlim(0, 1) +
Shows combinations of TPR/FPR that are achieved for different cutoffs given the example data.

Shows combinations of TPR/FPR that are achieved for different cutoffs given the example data.

2. Connecting the dots

To get from this set of points to the actual ROC curve, we need to connect the dots. This raises a question: what should our TPR be at an FPR that is never actually assumed in the data? For example, if we have 80 negative samples there never is a threshold at which we have a FPR of exactly 1%. There are two strategies:

  1. Connect the dots next to each other by straight lines.
  2. Since a high TPR and a low FPR is always preferable, we define the TPR at a fixed FPR of 1% to be the highest TPR we observe at an FPR lower than or equal to 1%.

Without ties both strategies turn out to be equivalent.

ROC curve consisting of connected points

The points have been connected to form the full ROC curve. Both strategies yield the same curve.

Given the existence of ties, this is no longer the case. The following example will demonstrate the differences.

An example where it matters

Let us generate more example data first:

true.class <- rep(c(TRUE, FALSE), each = 1000)
pred <- ifelse(true.class, sample(7:14, 1000, replace = TRUE),
                           sample(1:8, 1000, replace = TRUE))

There are many ties in that data, but the ties that matter are 7 and 8, because they include both positive and negative samples. If we look at the TPR/FPR combinations achieved, we obtain the following picture. Note the diagonal displacements in the upper left corner.

TPR/FPR points are not next to each other on a vertical or horizontal line, due to the presence of ties.

TPR/FPR combinations that are achieved for a predictor with ties. This causes the diagonal displacement between neighboring points.

Depending on how we construct the ROC curve with the new development version of fbroc, we get two different shapes:

pred <- as.numeric(pred)
graph1 <- plot(boot.roc(pred, true.class, n.boot = 1e4, 
                        tie.strategy = 1), show.conf = FALSE)
graph2 <- plot(boot.roc(pred, true.class, n.boot = 1e4, 
                        tie.strategy = 2), show.conf = FALSE)
grid.arrange(graph1, graph2, ncol = 2)
Two roc curves, one with diagonal connections, one with discontinuities

ROC curves with ties drawn with fbroc. The left-hand graph is drawn using strategy 1, the right-hand graph is drawn using strategy 2 (fbroc default).

From the point of view of the end user (e.g. a doctor making a diagnosis), I strongly prefer the second strategy. If the user asks for the TPR at a fixed FPR of 0.1 and gets a result of 0.8, he assumes that this is a realistic combination of TPR and FPR.

With the second strategy the user is assured that a cutoff exists, for which we obtain a TPR of 0.8 and a FPR smaller than or equal to 0.1. That is, there is a cutoff at which the performance is as least as good as stated. On the other hand, with the first strategy it is likely that the user will have to settle either for a smaller TPR or a higher FPR than reported.

Most other packages (e.g. pROC and ROCR) seem to use the first strategy. One possible reason is, that the Area Under the Curve (AUC) is equivalent to the Mann-Whitney U statistic only when we follow the first approach.

How does fbroc handle this situation?

As written above, I personally favor the second way of connecting the dots, since I consider it to more faithful to what we want ROC curve to represent. After quite a bit of thinking this is what I decided:

  1. The ROC curve and its confidence region will be visualized using the second strategy when there are ties by default. If you want, you can force fbroc to use the first strategy instead.
  2. When estimating the TPR at a fixed FPR or vice versa, the second strategy will always be used.
  3. When calculating the AUC, fbroc will use the first option instead. This way the AUC remains equivalent to the Mann-Whitney U statistic and to that given by other R packages (ROCR, pROC).

The drawback is, that in the presence of ties, the Area under the Curve will no longer be equal to the area under the ROC curve as plotted. Also if you force fbroc to use strategy 1 to draw the ROC curve, the confidence region is currently still based on strategy 2. This can lead to the ROC curve being on the outside of the confidence region.

Two roc curves, one with diagonal connections, one with discontinuities

ROC curves with ties drawn with fbroc. The left-hand graph is drawn using strategy 1, the right-hand graph is drawn using strategy 2 (default). Confidence region shown are always based on the second strategy.

I will consider allowing the user to configure this behavior more to his or her liking, but for now this seems like a reasonable compromise. This choice seems to be unique to fbroc, since both pROC and ROCR follow the first strategy instead. In the meantime, remember that this makes absolutely no difference if you do not have any ties in your predictor.

Getting highlighting with SyntaxHighlighter Evolved to work

I had some problem getting syntax highlighting to work correctly on this site. First, I tried using a combination of Jetpack Markdown and SyntaxHighlighter Evolved to write my posts. But when I previewed or published the post, several special character were turned into html code. For example, the less symbol < was rendered as &lt; instead. At one point it even corrupted my post, so that the html code was permanently saved into my Markdown text. Fortunately, WordPress allows you to go back to earlier revisions of your posts.

Google showed that the problem is not unknown, but I did not find a solution for my problem quickly. So I tried out WP Code Prettify instead. This worked but I didn’t like the look of it too much, so I experimented some more and finally got SyntaxHighlighter Evolved to work correctly.

Instead of using the triple backticks ``` I had to start the code block with the language specific command, e.g. cpp in square brackets for C++. Interestingly enough, just one code block in this form also caused all other original code blocks to render correctly, even though I had not changed them yet. I suppose it triggered some sort of different preprocessing of my post. Hopefully, this will be fixed in the future so that you can use more standard Markdown instead.

Maybe this will help someone with the same problem in the future? Anyway, it also showed me that I have to edit my code blocks so that you don’t need to use the scroll bar. Personally, I like to work with up to 100 characters per line instead of the usual 80, but this seems to be too many for the line to display in WordPress without scrolling.

An example

A C++ code block rendered with SyntaxHighlighter Evolved

A C++ code block rendered with SyntaxHighlighter Evolved

Using classes to write fbroc

Currently I am working on the next version of fbroc. One important new feature will be the analysis of paired ROC curves, which is important when you compare two classifiers. One example would be comparing some new diagnostic method to the state-of-the-art.

Before doing this I wanted to improve my C++ code. In the first version I didn’t use any C++ feature besides making use of the Vector classes in Rcpp. Without classes you usually need many more arguments per function and it is harder to reuse code efficiently. Implementing paired ROC curves is much more natural, if you look at paired ROC curves as a class containing two single ROC curve objects. This will make implementing them much more straightforward and the code more maintainable.

After realizing this I refactored the fbroc C++ code, so that everything related to the ROC curves was encapsulated in a class named ROC.


When I tested my first implementation of the ROC class on example data consisting of 1000 observations, I was a bit disappointed with the performance. The new code turned out to be about 30-40% slower than the old.

However, when I took a careful look at what went on, I found the reason: unnecessary memory allocations. In many cases I allocated a new NumericVector or IntegerVector for each bootstrap replicate, even though the size of the vector remains constant while bootstrapping.

As an example, compare the following code snippets. In both cases ‘index_pos’ and ‘index_neg’ are member of ‘ROC’.

With memory allocation:

void ROC::strat_shuffle(IntegerVector &shuffle_pos, IntegerVector &shuffle_neg) {
index_pos = NumericVector (n_pos);
index_neg = NumericVector(n_neg);
for (int i = 0; i < n_pos; i++) {
index_pos[i] = original_index_pos[shuffle_pos[i]];
for (int i = 0; i < n_neg; i++) {
index_neg[i] = original_index_neg[shuffle_neg[i]];
// recalculate ROC after bootstrap

Without memory allocation:

void ROC::strat_shuffle(IntegerVector &shuffle_pos, IntegerVector &shuffle_neg) {
for (int i = 0; i < n_pos; i++) {
index_pos[i] = original_index_pos[shuffle_pos[i]];
for (int i = 0; i < n_neg; i++) {
index_neg[i] = original_index_neg[shuffle_neg[i]];
// recalculate ROC after bootstrap


The graph below shows the performance of the new code vs the old used in fbroc 0.1.0.

The new version takes less time until we have more than 10000 samples per group. Afterwards both version perform the same.

Time used by fbroc 0.1.0 vs time used by the unreleased class-based version of fbroc

To generate it, I used a slightly modified version of the script used here.

Since the time for memory allocation is usually not dependent upon the size of the memory being allocated, the overhead stops to matter when the number of observations gets very large. But in the case that you have more than 10000 observations per group, you probably don’t need to bootstrap anyway.