...visualizing classifier performance
Tobias Sing
Dept. of Modeling & Simulation, Novartis Pharma AG
Joint work with Oliver Sander (MPI for Informatics, Saarbrücken)
ROCR | Tobias Sing | July 2, 2007
Classification
Binary classification (instances, class labels): (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), where each y_i is {1, -1}-valued
Classifier: provides a class prediction Ŷ for an instance
Outcomes for a prediction:
                      True class 1           True class -1
Predicted class 1     True positive (TP)     False positive (FP)
Predicted class -1    False negative (FN)    True negative (TN)
Some basic performance measures
P(Ŷ = Y): accuracy
P(Ŷ = 1 | Y = 1): true positive rate
P(Ŷ = 1 | Y = -1): false positive rate
P(Y = 1 | Ŷ = 1): precision
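As a minimal sketch, these measures can be estimated from a vector of true classes and a vector of predicted classes by counting the four outcomes. The vectors `y` and `y_hat` below are made-up toy data for illustration only; they do not come from the slides.

```r
# Toy data (hypothetical): true classes and predicted classes in {1, -1}
y     <- c( 1,  1,  1,  1, -1, -1, -1, -1, -1, -1)
y_hat <- c( 1,  1,  1, -1,  1, -1, -1, -1, -1, -1)

# Counts of the four possible outcomes
tp <- sum(y_hat ==  1 & y ==  1)   # true positives
fp <- sum(y_hat ==  1 & y == -1)   # false positives
fn <- sum(y_hat == -1 & y ==  1)   # false negatives
tn <- sum(y_hat == -1 & y == -1)   # true negatives

# Empirical estimates of the probabilities listed above
accuracy  <- (tp + tn) / length(y)  # P(Yhat = Y)
tpr       <- tp / (tp + fn)         # P(Yhat = 1 | Y = 1)
fpr       <- fp / (fp + tn)         # P(Yhat = 1 | Y = -1)
precision <- tp / (tp + fp)         # P(Y = 1 | Yhat = 1)

print(c(accuracy = accuracy, tpr = tpr, fpr = fpr, precision = precision))
```

Note that precision conditions on the predicted class, while the true and false positive rates condition on the true class, so the two groups use different denominators.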
Performance trade-offs
Often: an improvement in measure X makes measure Y worse
Idea: visualize the trade-off in a two-dimensional plot
Examples:
  True positive rate vs. false positive rate
  Precision vs. recall
  Lift charts
  …
Scoring classifiers
Output: a continuous score f(x) (instead of an actual class prediction)
Discretized by choosing a cut-off c:
  f(x) ≥ c → class "1"
  f(x) < c → class "-1"
Trade-off visualizations: cutoff-parameterized curves
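To illustrate what a cutoff-parameterized curve is, the base R sketch below sweeps the cut-off over a set of scores and records one (false positive rate, true positive rate) point per cut-off. The `scores` and `labels` vectors and the helper `rates_at` are hypothetical names and toy values chosen for this example; ROCR performs this kind of computation internally.

```r
# Toy data (hypothetical): classifier scores f(x) and true classes in {1, -1}
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
labels <- c(  1,   1,  -1,   1,   -1,   1,  -1,  -1)

# Rates obtained when predicting class 1 whenever f(x) >= cutoff
rates_at <- function(cutoff) {
  y_hat <- ifelse(scores >= cutoff, 1, -1)
  c(cutoff = cutoff,
    tpr = sum(y_hat == 1 & labels ==  1) / sum(labels ==  1),
    fpr = sum(y_hat == 1 & labels == -1) / sum(labels == -1))
}

# Inf predicts nothing as positive; then every observed score serves as a cut-off
cutoffs <- c(Inf, sort(unique(scores), decreasing = TRUE))

# One row per cut-off, with columns cutoff, tpr, fpr
roc_points <- t(sapply(cutoffs, rates_at))
print(roc_points)
```

Lowering the cut-off can only turn negative predictions into positive ones, so both rates move from 0 toward 1 as the cut-off decreases.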
ROCR: Only three commands
pred
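For orientation, a typical ROCR session uses `prediction()`, `performance()`, and `plot()`. The sketch below assumes the ROCR package is installed and uses the `ROCR.simple` example data shipped with it; it is an illustration of the usual workflow and not necessarily the exact code shown on the slide.

```r
library(ROCR)      # assumes the ROCR package is installed

data(ROCR.simple)  # example scores and true labels shipped with ROCR

# 1. Combine scores and true labels into a prediction object
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)

# 2. Choose the measures for the y and x axes (here: an ROC curve)
perf <- performance(pred, measure = "tpr", x.measure = "fpr")

# 3. Plot the cutoff-parameterized curve
plot(perf)
```

Swapping the measure names, for example `measure = "prec"` with `x.measure = "rec"`, should give a precision/recall curve from the same `pred` object.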
Examples (1/8): ROC curves
pred
Examples (2/8): Precision/recall curves
pred
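Examples (6/8): Cutoff labeling – multiple runs
# perf: a ROCR performance object holding several runs (its definition is not shown here)
plot(perf,
     print.cutoffs.at = seq(0, 1, by = 0.2),   # label the curves at these cutoff values
     text.cex = 0.8,                           # shrink the cutoff labels
     # one vertical label offset per run, repeated for every point of the run
     text.y = lapply(as.list(seq(0, 0.5, by = 0.05)),
                     function(x) rep(x, length(perf@x.values[[1]]))),
     col = as.list(terrain.colors(10)),        # one color per run
     text.col = as.list(terrain.colors(10)),
     points.col = as.list(terrain.colors(10)))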
Examples (8/8): Some other examples
perf
Extending ROCR: An example
Extend environments:
assign("auc", "Area under the ROC curve", envir = long.unit.names)
assign("auc", ".performance.auc", envir = function.names)
assign("auc", "fpr.stop", envir = optional.arguments)
assign("auc:fpr.stop", 1, envir = default.values)
Implement performance measure (predefined signature):
.performance.auc
Thank you!
http://rocr.bioinf.mpi-sb.mpg.de
Sing et al. (2005) Bioinformatics