Monthly Archives: November 2015

Shiny interface for fbroc updated

I am happy to announce an updated shiny interface to my R package fbroc. Before extending the interface to cover the new features of fbroc, I wanted to overhaul its design first. Fortunately, there is an excellent package for creating dashboard interfaces for shiny: shinydashboard. I had to rewrite parts of the shiny interface, but it was very much worth it.

The most difficult part was getting the graphs and boxes to scale correctly. First, I had trouble keeping the correct aspect ratio, and then the boxes surrounding the graphs did not scale properly with graph size. For some window sizes, part of the graph was outside the box. Since I had trouble fixing this, I will describe the problem and the solution later in a separate blog post. Maybe it will help someone else.

Comparison of old and new shiny interface

The TPR at an FPR of 0.03 depends upon how many of the outliers are included, making the confidence interval very wide.

Old interface with fbroc 0.2.1

Updated shiny interface for fbroc, using package shinydashboard

If you are interested in the code, you can find it on the GitHub page, as always. To test the interface, go here instead.

Outlook

As mentioned, next up is another update of the interface to support the main new feature of fbroc: comparison of two classifiers trying to solve the same prediction task on the same data. Even with very different classification algorithms, there is usually still a significant correlation between the predictions of the two models. Often the two subsets of samples misclassified by the classifiers have a large overlap. To correctly compare the models with bootstrap methods in this case, it is critical to keep the correlation intact. After the first release of fbroc, this feature was my highest priority.
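To illustrate what keeping the correlation intact means, here is a minimal base R sketch of a paired bootstrap for the difference in AUC. It is not the fbroc implementation; the function names `auc` and `paired.boot.auc.diff`, the stratified resampling and the simulated data are all assumptions made for this example. The key point is that each bootstrap replicate draws one set of indices and applies it to both classifiers.

```r
# Rank-based AUC (Mann-Whitney statistic); ties get average ranks.
auc <- function(score, is.pos) {
  r <- rank(score)
  n.pos <- sum(is.pos)
  n.neg <- sum(!is.pos)
  (sum(r[is.pos]) - n.pos * (n.pos + 1) / 2) / (n.pos * n.neg)
}

# Paired bootstrap of the AUC difference between two classifiers.
# The same resampled indices are used for both score vectors, so the
# correlation between the two sets of predictions is preserved.
paired.boot.auc.diff <- function(score1, score2, is.pos, n.boot = 1000) {
  pos <- which(is.pos)
  neg <- which(!is.pos)
  replicate(n.boot, {
    # stratified resampling (an assumption of this sketch):
    # positives and negatives are drawn separately
    idx <- c(pos[sample.int(length(pos), replace = TRUE)],
             neg[sample.int(length(neg), replace = TRUE)])
    auc(score1[idx], is.pos[idx]) - auc(score2[idx], is.pos[idx])
  })
}

# Simulated data, purely illustrative: both scores share a noise component
set.seed(1)
is.pos <- rep(c(TRUE, FALSE), each = 50)
shared <- rnorm(100)
score1 <- is.pos + shared + rnorm(100, sd = 0.5)
score2 <- is.pos + shared + rnorm(100, sd = 0.5)

diffs <- paired.boot.auc.diff(score1, score2, is.pos, n.boot = 200)
quantile(diffs, c(0.025, 0.975))  # percentile interval for the AUC difference
```

Resampling the two classifiers independently would treat their errors as unrelated and would typically give a wider interval for the difference than the paired version.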

Dangers of implicit type conversion in R

As you might be aware, R usually performs implicit type conversion of your input into the expected type whenever necessary. For example, paste expects character input and therefore

paste("example", 1)

works by implicit type conversion of numeric to character, and you do not need to use

paste("example", as.character(1))

instead. Usually, this is very convenient. But I have observed at least two ways in which this implicit type conversion can cause major bugs.

Implicit conversion of factors to integers

The first way has to do with factors and is pretty well known. If you use a factor as an index, the factor is converted to its integer codes. This is an example of implicit type conversion, as you do not have to tell R to do it and you are not even warned that R converted your type. In some cases, your factor levels correspond to the names of what you are indexing, and you would expect R to index by matching factor level to name.

factor.var <- as.factor(c("A", "B", "C")) # define factor
num.var <- 1:3 # numeric variable
names(num.var) <- c("C", "B", "A") # names match levels of factor
as.integer(factor.var) # explicit conversion to integer
[1] 1 2 3
# implicit type conversion of factor.var to integer
num.var[factor.var] 
C B A
1 2 3
# factor.var is explicitly converted to character
num.var[as.character(factor.var)]
A B C
3 2 1

I think most people working with R have stumbled over this at least once. I know I did. There is also a chance that the factor levels happen to be in just the right order for the code to work, so you might get away with it at first.
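One way to guard against this is to wrap the lookup in a small function that always converts the factor to character and fails loudly when a level has no matching name. The helper `index.by.level` below is hypothetical and only a sketch of the idea, not part of any package.

```r
factor.var <- factor(c("A", "B", "C"))
num.var <- c(C = 1, B = 2, A = 3)

# Hypothetical helper: index a named vector by factor level, never by code
index.by.level <- function(x, f) {
  stopifnot(!is.null(names(x)))
  key <- as.character(f)  # explicit conversion to the level labels
  unknown <- setdiff(key[!is.na(key)], names(x))
  if (length(unknown) > 0) {
    stop("no element named: ", paste(unknown, collapse = ", "))
  }
  x[key]
}

index.by.level(num.var, factor.var)  # elements A, B, C with values 3, 2, 1
```

The explicit check costs a few lines, but it turns a silently wrong result into an error at the point where the mistake is made.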

Number character comparisons

Somewhat less well known is what happens if you compare a number with a string. Look at

0.01 < "0.05"
[1] TRUE

Looks fine, right? But now consider

0.0000001 < "0.05"
[1] FALSE

What went wrong? R cannot always convert a character string to a number, so in this case it does the “safe” operation of converting the number to character instead.

as.character(0.01)
[1] "0.01"
as.character(0.0000001)
[1] "1e-07"

The second number is small enough to be converted to scientific notation. And based on the documented rules of comparing strings

"1e-07" < "0.05"
[1] FALSE

is the correct and expected result. This one is especially nasty as it depends on a global R option, scipen, which controls when R switches from fixed to scientific notation.

options(scipen=10)
0.0000001 < "0.05"
[1] TRUE
as.character(0.0000001)
[1] "0.0000001"

This means that if you write code like this and put it in a package, the result will depend upon the settings of the user. Bugs like these tend to be very hard to track down.
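A defensive pattern is to convert the character value to numeric yourself before comparing, so that the result no longer depends on how R formats numbers. The helper `is.below` below is a hypothetical name used only for this sketch.

```r
# Hypothetical helper: compare a number with a threshold that may arrive as text
is.below <- function(x, threshold) {
  # as.character() first, so that a factor is not turned into its integer codes
  threshold.num <- suppressWarnings(as.numeric(as.character(threshold)))
  if (anyNA(threshold.num)) {
    stop("threshold cannot be converted to a number")
  }
  x < threshold.num
}

is.below(0.01, "0.05")       # TRUE
is.below(0.0000001, "0.05")  # TRUE, a numeric comparison this time
```

Because both sides are numeric when `<` is evaluated, no number-to-string conversion takes place and the scipen setting has no influence on the result.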

What do we learn from this? Always be careful when mixing types in R. It is very convenient, but can also be dangerous. Use explicit casting whenever you do non-standard things with your variables to avoid nasty surprises. Also try to keep in mind what class your variables actually have! For example, people new to R often do not expect that text columns from data tables (e.g. csv or tsv) are converted to factors by default when reading them into R (this was the default until R 4.0.0).
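Since the default for text columns has differed between R versions, one option is to state the choice explicitly when reading data and to check the resulting classes. The small inline table below is made up for illustration.

```r
csv.text <- "id,group\n1,A\n2,B\n3,A"  # made-up example data

# Text columns stay character
as.char <- read.csv(text = csv.text, stringsAsFactors = FALSE)
class(as.char$group)  # "character"

# Text columns become factors
as.fact <- read.csv(text = csv.text, stringsAsFactors = TRUE)
class(as.fact$group)  # "factor"

# Quick overview of all column classes
sapply(as.char, class)
```

Setting `stringsAsFactors` explicitly makes the code behave the same way regardless of the default in the installed R version.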

Further reading

The R-Inferno has more on this and other common pitfalls when working with R. Don’t miss reading at least the free .pdf version. If I run into other examples, I will write about them in the future as well.

Back from autumn vacation

We are back from our autumn vacation in the Lüneburger Heide. Fortunately, we had autumn weather, so the kids could have fun. Aside from a rather short hiking trip to the Grundloses Moor, we spent some time in the nice garden and visited several other interesting locations in the area.

The kids & me at the Grundloses Moor in the Lüneburger Heide

These included the Weltvogelpark Walsrode, which was such a success with the children that we went there twice.

Kids feeding lories in the Weltvogelpark Walsrode

On the day we drove back, we first spent some time in the Heidepark, which was also appreciated by all. Only my poor son was a little unhappy at times that many of the rides were not open to kids younger than six. I originally planned to do some work on this site and on the shiny interface for fbroc when I came back, but I decided to wait a week since the rest of the family had a bit more spare time due to the school holidays. Now that they are over, I should have some time next week.