Wizard Help: About receiver operating characteristic (ROC) curves

Receiver operating characteristic (ROC) curves summarize the predictive power of binary-outcome models graphically. The area under the curve (sometimes called the AUC) is a useful criterion for model evaluation; in general, larger areas correspond to better model fits. Note that ROC curves are only available for models with two-category outcomes.

To view a receiver operating characteristic (ROC) curve:

In the Model view, ensure that the outcome is an indicator variable, or is treated as a categorical variable with exactly two categories.
Click ROC curve in the residual scatterplot view.

While the area under the curve can be used as a quick criterion for model fit, reading the curve in depth requires some knowledge of binary classifiers.

A binary classifier makes a concrete prediction for each record (for example, A or B), rather than just assigning probabilities to each outcome (for example, “20% chance of A and 80% chance of B”). A binary-outcome model in Wizard can be used as a binary classifier by choosing a cutoff probability. Above the cutoff, one category is chosen, and below it, the other category is chosen. The most obvious cutoff is 50%, but others may be chosen if different kinds of classification errors have different costs.
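As a rough illustration of the cutoff idea, the following Python sketch turns predicted probabilities into concrete predictions. The function name classify and the sample probabilities are hypothetical, and the handling of a probability exactly equal to the cutoff (treated here as a positive) is an assumption, not a description of Wizard's behavior.

```python
def classify(probabilities, cutoff=0.5):
    """Return 1 for each probability at or above the cutoff, otherwise 0.

    Treating a probability equal to the cutoff as a positive is an
    assumption made for this sketch.
    """
    return [1 if p >= cutoff else 0 for p in probabilities]


# Made-up predicted probabilities of the non-base outcome for four records.
predicted = [0.10, 0.45, 0.55, 0.80]

labels_default = classify(predicted)             # cutoff of 50%
labels_strict = classify(predicted, cutoff=0.7)  # stricter cutoff

print(labels_default)
print(labels_strict)
```

Raising the cutoff moves borderline records into the base category, which trades false positives for false negatives.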

The receiver operating characteristic curve is constructed by evaluating the performance of binary classifiers for all possible cutoffs. For each cutoff probability, the classifier’s true positive rate (fraction of positives correctly classified) is plotted against its false positive rate (fraction of negatives incorrectly classified). In this context, a “negative” is a record whose observed outcome is the base outcome, or a value of 0, and a “positive” is a record whose observed outcome is 1, or the category that is not the base outcome.
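The construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than Wizard's own implementation: the function name roc_points and the sample data are hypothetical, and it assumes that a record is classified as positive when its probability is at or above the cutoff.

```python
def roc_points(outcomes, probabilities):
    """Return (cutoff, false positive rate, true positive rate) tuples.

    outcomes holds observed values (1 = positive, 0 = negative) and
    probabilities holds the predicted probability of a positive.
    """
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative record")

    # An infinite cutoff classifies nothing as positive, which gives the
    # (0, 0) corner; each distinct probability then serves as a cutoff.
    cutoffs = [float("inf")] + sorted(set(probabilities), reverse=True)

    points = []
    for cutoff in cutoffs:
        true_pos = sum(1 for y, p in zip(outcomes, probabilities)
                       if y == 1 and p >= cutoff)
        false_pos = sum(1 for y, p in zip(outcomes, probabilities)
                        if y == 0 and p >= cutoff)
        points.append((cutoff, false_pos / negatives, true_pos / positives))
    return points


# Made-up data: two negatives followed by two positives.
observed = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]

points = roc_points(observed, predicted)
for cutoff, fpr, tpr in points:
    print(cutoff, fpr, tpr)
```

Only the distinct predicted probabilities need to be tried as cutoffs, because the classifications do not change between them.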

In general, classifiers with a high true positive rate and low false positive rate are more desirable than those with a low true positive rate and high false positive rate. In terms of the ROC curve, points closer to the top-left corner are better classifiers than those farther away from it. In this way, ROC curves with a larger area are associated with better classifiers than ROC curves with a smaller area. The curve of a perfect classifier would pass through the top-left corner, so that the area beneath it covers the entire ROC graph and equals 1.0; a set of random classifiers would hug the diagonal dashed line of the ROC graph and have an area of 0.5.
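To make the two reference areas concrete, the sketch below computes the area under a curve with the trapezoidal rule. The trapezoidal rule is a common way to compute this area and is used here as an assumption; this page does not say how Wizard calculates it. The function name area_under_curve and the sample points are hypothetical.

```python
def area_under_curve(points):
    """Trapezoidal area under (false positive rate, true positive rate)
    points, which must be ordered by increasing false positive rate."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# A perfect classifier: straight up to the top-left corner, then across.
perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Random guessing: the diagonal line.
chance = [(0.0, 0.0), (1.0, 1.0)]

# A made-up curve that falls between the two.
in_between = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]

print(area_under_curve(perfect))     # 1.0
print(area_under_curve(chance))      # 0.5
print(area_under_curve(in_between))  # 0.75
```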

Note that the cutoff probabilities themselves are not visible on an ROC curve. To view the cutoff probability for each point along the curve, it is necessary to export the ROC curve’s data as a CSV, Excel, or JSON document. You can do so by choosing Model > Export Residual Graph > Export Data as CSV (or Excel or JSON). This will export each cutoff along with its associated sensitivity (true positive rate) and specificity (one minus the false positive rate).
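Once exported, the data can be examined outside of Wizard. The Python sketch below reads CSV text and reports the cutoff whose point lies closest to the top-left corner of the ROC graph. The column names (Cutoff, Sensitivity, Specificity) and the sample rows are assumptions made for illustration; check the header row of the actual export and adjust the names accordingly.

```python
import csv
import io
import math

# Made-up sample standing in for an exported file. The column names are
# assumptions; the real export may label its columns differently.
sample_csv = "\n".join([
    "Cutoff,Sensitivity,Specificity",
    "0.80,0.40,1.00",
    "0.50,0.80,0.90",
    "0.20,1.00,0.30",
])


def closest_to_top_left(csv_text):
    """Return the row whose ROC point is nearest the top-left corner."""
    best = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        sensitivity = float(row["Sensitivity"])
        specificity = float(row["Specificity"])
        false_positive_rate = 1.0 - specificity
        # Distance from (false positive rate, true positive rate) to (0, 1).
        distance = math.hypot(false_positive_rate, 1.0 - sensitivity)
        if best is None or distance < best["distance"]:
            best = {
                "cutoff": float(row["Cutoff"]),
                "sensitivity": sensitivity,
                "specificity": specificity,
                "distance": distance,
            }
    return best


best = closest_to_top_left(sample_csv)
print(best["cutoff"], best["sensitivity"], best["specificity"])
```

Distance to the top-left corner is only one possible rule for picking a cutoff; when one kind of classification error costs more than the other, a different cutoff may be preferable. To read a real export, replace the sample string with the contents of the exported file.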