The post Partial AUC support added to fbroc 0.4.0 appeared first on Civilized Statistics.

- Partial AUCs over both TPR and FPR ranges can be calculated
- You can now adjust text size for plots
- In the ROC plot, the (partial) AUC can now optionally be shown instead of confidence regions
- The location of the text showing the performance in the ROC plot has been shifted downwards and to the left

There are only two changes in fbroc 0.4.0 worth discussing in detail. The first is an option to adjust the text size when printing the performance details on the ROC plot. This change was motivated by the text sometimes being too wide for the graph; I observed this effect on my mobile phone.

A more important addition is support for the partial AUC, which integrates the ROC curve over a specific FPR or TPR interval. The usual McClish correction for the partial AUC is applied by default. I will cover it and other details of the partial AUC in a later post.
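For reference, the McClish correction is commonly defined as follows, shown here for an FPR interval (f1, f2); this is the textbook form, not necessarily fbroc's exact parametrization:

```latex
\mathrm{pAUC}_{\mathrm{corrected}}
  = \frac{1}{2}\left(1 + \frac{\mathrm{pAUC} - \mathrm{min}}{\mathrm{max} - \mathrm{min}}\right),
\qquad
\mathrm{max} = f_2 - f_1,
\qquad
\mathrm{min} = \frac{f_2^2 - f_1^2}{2}
```

Here max is the partial area of a perfect classifier over the interval and min that of a chance classifier, so the corrected value lies between 0.5 and 1, just like a full AUC.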

Plotting the partial ROC area was also a bit of a challenge, since its overlap with the confidence region around the ROC curve makes the plot hard to read. Therefore, when fbroc shows the confidence region, the partial AUC region is only marked by a pair of dotted lines. By setting

show.conf = FALSE

in the plotting call when the metric being shown is the partial AUC, the relevant area is shown instead.

As a minor bonus this now also works for the normal AUC.

The C++ code for calculating the partial AUC was somewhat tricky, as it needed to work when integrating over both FPR and TPR. As an example, take a look at the function used to integrate over a TPR interval by calculating the area contributed by the part of the ROC between the (i-1)-th and the i-th cutoff.

double pauc_tpr_area(NumericVector &tpr, NumericVector &fpr,
                     NumericVector &param, int index) {
  // necessary check to avoid division by zero later
  if (tpr[index - 1] == tpr[index]) return 0;
  // cases where relevant TPR interval is not included
  if (tpr[index - 1] < param[0]) return 0;
  if (tpr[index] > param[1]) return 0;
  double left = std::max(tpr[index], param[0]);
  double right = std::min(tpr[index - 1], param[1]);
  double base_val = 1 - fpr[index];
  double slope = (fpr[index] - fpr[index - 1]) / (tpr[index - 1] - tpr[index]);
  double value_left = base_val + (left - tpr[index]) * slope;
  double value_right = base_val + (right - tpr[index]) * slope;
  return (right - left) * (value_left + value_right);
}

The first check excludes a case whose contribution to the partial AUC is zero anyway, because we are looking at a line segment of zero width instead of an area. After using this code segment

double left = std::max(tpr[index], param[0]);
double right = std::min(tpr[index - 1], param[1]);

to account for the case where the TPR interval used for the partial AUC does not fully cover the trapezoid between the TPRs of the (i-1)-th and i-th cutoff, so that only a slice of it contributes, the zero-width difference would still cancel out in the product used for the trapezoid rule in this line

return (right - left) * (value_left + value_right);

but it would produce NaN values when fbroc calculates the slope for the trapezoid rule as follows.

double slope = (fpr[index] - fpr[index - 1]) / (tpr[index - 1] - tpr[index]);

After releasing fbroc 0.4.0, I will first update the shiny interface before working more on the package itself. In particular, I will create a shiny interface for the analysis of paired ROC curves, something I originally planned to do before releasing fbroc 0.4.0. As it turned out, I decided to update the package first, since I found that more enjoyable.


The post Technical troubles resolved appeared first on Civilized Statistics.


The post Technical troubles ongoing appeared first on Civilized Statistics.


The post Technical troubles appeared first on Civilized Statistics.

I am trying to get this page back online but have not tracked down the issue yet. So far I have tried disabling some old plugins. I hope this fixes the issue, but we will have to see.


The post On hiatus appeared first on Civilized Statistics.

Things are looking better now, but I want to work on the fbroc shiny interface before taking time to post things here.


The post Shiny interface for fbroc updated appeared first on Civilized Statistics.

The most difficult part was getting the graphs and boxes to scale correctly. First I had trouble keeping the correct aspect ratio, and then the boxes surrounding the graphs did not scale properly with graph size: for some window sizes, part of the graph ended up outside its box. Since this was hard to fix, I will describe the problem and the solution in a separate blog post later. Maybe it will help someone else.

If you are interested in the code, you can find it on the GitHub page, as always. To test the interface, go here instead.

As mentioned, next up is another update of the interface to support the main new feature of fbroc: comparing two classifiers that solve the same prediction task on the same data. Even with very different classification algorithms, there is usually still a significant correlation between the predictions of the two models. Often the two subsets of samples misclassified by the classifiers overlap heavily. To compare the models correctly with bootstrap methods in this case, it is critical to keep this correlation intact. After the first release of fbroc, this feature was my highest priority.


The post Dangers of implicit type conversion in R appeared first on Civilized Statistics.

paste("example", 1)

works by implicit type conversion of numeric to character, and you do not need to use

paste("example", as.character(1))

instead. Usually, this is very convenient. But there are at least two ways I have observed in which this implicit type conversion can cause major bugs.

The first way has to do with factors and is pretty well known. If you use a factor as an index, the factor is converted to integer. This is an example of implicit type conversion, as you do not have to tell R to do it and you are not even warned that R converted your type. In some cases your factor levels correspond to the names of what you are indexing, and you would expect R to index by matching factor level to name.

factor.var <- as.factor(c("A", "B", "C"))  # define factor
num.var <- 1:3                             # numeric variable
names(num.var) <- c("C", "B", "A")         # names match levels of factor

as.integer(factor.var)  # explicit conversion to integer
[1] 1 2 3

num.var[factor.var]  # implicit type conversion of factor.var to integer
C B A
1 2 3

num.var[as.character(factor.var)]  # factor.var is explicitly converted to character
A B C
3 2 1

I think most people working with R have stumbled over this at least once. I know I did. Sometimes the factor levels happen to be in just the right order for the code to work, so you might get away with it at first.

Somewhat less well known is what happens if you compare a number with a string. Look at

0.01 < "0.05"
[1] TRUE

Looks fine, right? But now consider

0.0000001 < "0.05"
[1] FALSE

What went wrong? R cannot always convert a character to a numeric, so in this case it performs the "safe" operation of converting the number to a character instead.

as.character(0.01)
[1] "0.01"
as.character(0.0000001)
[1] "1e-07"

The second number is small enough to be converted to scientific notation. And based on the documented rules of comparing strings

"1e-07" < "0.05"
[1] FALSE

is the correct and expected result. This one is especially nasty because it depends on a global R option (scipen), which controls how reluctant R is to switch to scientific notation.

options(scipen = 10)
0.0000001 < "0.05"
[1] TRUE
as.character(0.0000001)
[1] "0.0000001"

This means that if you write code like this and put it in a package, the result will depend upon the settings of the user. Bugs like these tend to be very hard to track down.

What do we learn from this? Always be careful when mixing types in R. It is very convenient, but it can also be dangerous. Use explicit casting whenever you do non-standard things with your variables, to avoid nasty surprises. Also try to keep in mind what class your variables actually have! For example, people new to R often do not expect that text columns from data tables (e.g. csv or tsv) are converted to factors by default when reading them into R.

The R Inferno has more on this and other common pitfalls when working with R. Don't miss reading at least the free .pdf version. If I run into other examples, I will write about them here in the future as well.


The post Belated happy Halloween! appeared first on Civilized Statistics.

The post Back from autumn vacation appeared first on Civilized Statistics.

This includes the Weltvogelpark Walsrode, which was a great success with the children, so we went there twice.

On the day we drove back, we first spent some time in the Heidepark, which everyone also enjoyed. Only my poor son was a little unhappy at times that many of the rides were closed to kids younger than six. I originally planned to do some work on this site and on the shiny interface for fbroc when I came back, but I decided to wait a week, since the rest of the family had a bit more spare time due to the school holidays. Now that they are over, I should have some time next week.


The post One week break appeared first on Civilized Statistics.

