Data Formats for ROC Curve Fitting

John Eng, M.D.
The Russell H. Morgan Department of Radiology and Radiological Science
Johns Hopkins University, Baltimore, Maryland, USA

Please send any bugs, questions, comments, or suggestions to the author by electronic mail; messages will be answered.



General Comments

For all three formats below, the activity being evaluated involves an individual classifying cases as either "positive" or "negative". In addition, the individual specifies his/her level of confidence in the classification of each case on an ordinal rating scale (1, 2, 3, ...). The meaning of the rating categories depends on the particular data format, as described below. For each format, the number of categories in the rating scale must be entered after the data format on the main calculation Web page.

The data for each format are organized as multiple lines of values (numbers, plus text location strings in Format 2). The values on each line may be separated by any number of spaces or tab characters. This plain-text format is often called "ASCII" format and can be exported and imported by many spreadsheet and database programs.
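As a minimal sketch of reading such whitespace-separated lines, the Python helper below splits each line on any run of spaces or tabs. The name `read_rows` is hypothetical and is not part of JROCFIT; skipping blank lines is also an assumption, since this page does not say how they are treated.

```python
def read_rows(text):
    """Split plain-text (ASCII) data into rows of string tokens.

    str.split() with no argument splits on any run of whitespace, so
    spaces, tabs, or a mix of both all work as separators.
    Assumption: blank lines carry no data and are skipped.
    """
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens:
            rows.append(tokens)
    return rows


example = "0  1\n1\t6\n\n0 \t 3\n"
print(read_rows(example))  # [['0', '1'], ['1', '6'], ['0', '3']]
```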


Format 1

In this data format, each line represents one case and contains two numbers. The first number is "1" if the case is truly positive or "0" if it is truly negative. The second number is an integer (1, 2, 3, ...) giving the confidence rating for that case. For example, on a 6-point rating scale, the categories would have the following meanings:

     1 - Definitely negative
     2 - Probably negative
     3 - Possibly negative
     4 - Possibly positive
     5 - Probably positive
     6 - Definitely positive
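A minimal Python sketch of reading Format 1 data is shown below. The function name `parse_format1` and its error handling are assumptions for illustration, not JROCFIT's own code. It checks that each line holds two numbers, that the truth value is 0 or 1, and that the rating lies between 1 and the declared number of categories.

```python
def parse_format1(text, num_categories):
    """Parse Format 1 data: one case per line, '<truth> <rating>'.

    truth:  1 = truly positive, 0 = truly negative.
    rating: integer from 1 to num_categories.
    Returns a list of (truth, rating) tuples; raises ValueError on bad input.
    Assumption: blank lines are ignored.
    """
    cases = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        tokens = line.split()  # any mix of spaces and tabs
        if not tokens:
            continue
        if len(tokens) != 2:
            raise ValueError(f"line {line_no}: expected 2 numbers, got {len(tokens)}")
        truth, rating = int(tokens[0]), int(tokens[1])
        if truth not in (0, 1):
            raise ValueError(f"line {line_no}: truth must be 0 or 1")
        if not 1 <= rating <= num_categories:
            raise ValueError(f"line {line_no}: rating must be 1..{num_categories}")
        cases.append((truth, rating))
    return cases


data = "1 6\n1 4\n0 2\n0\t1\n"
print(parse_format1(data, 6))  # [(1, 6), (1, 4), (0, 2), (0, 1)]
```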


Format 2

This data format allows the calculation of sensitivity, specificity, and overall accuracy in addition to the ROC curve. As in Format 1, each line represents one case. Each line has five fields:

     Field 1 - "1" if the case is truly positive, "0" if it is truly negative.
     Field 2 - Text string giving the true location of the abnormality, or "none" if the case is truly negative.
     Field 3 - "1" if the individual thought the case was positive, "0" if he/she thought it was negative.
     Field 4 - Text string giving where the individual thought the abnormality was, or "none" if he/she thought the case was negative.
     Field 5 - Confidence rating (1, 2, 3, ...) that the individual associated with his/her response.

Because positivity and negativity are specified separately (field 3), the rating scale indicates only the degree of confidence in the response, whether that response is positive or negative. For example, on a 3-point rating scale, the categories would have the following meanings:

     1 - Low confidence (that the case is positive or negative)
     2 - Moderate confidence (that the case is positive or negative)
     3 - High confidence (that the case is positive or negative)
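The Python sketch below parses Format 2 lines and computes sensitivity, specificity, and overall accuracy from the truth and response fields. Several details are assumptions, not documented JROCFIT behavior: the function names are hypothetical, location strings are assumed to contain no spaces or tabs, and the location fields are ignored when scoring (this page does not say whether a mislocalized positive response counts as correct). The helper `to_single_scale` shows one possible way to fold a (response, confidence) pair onto a single 2K-point scale; with K = 3 it reproduces the 6-point scale from Format 1, but it is only an illustration, not necessarily how JROCFIT combines them.

```python
def parse_format2(text):
    """Parse '<truth> <true_loc> <response> <response_loc> <confidence>' lines.

    Assumption: location strings (e.g. 'none') contain no whitespace.
    """
    cases = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
        truth, true_loc, response, resp_loc, confidence = fields
        cases.append((int(truth), true_loc, int(response), resp_loc, int(confidence)))
    return cases


def binary_accuracy(cases):
    """Return (sensitivity, specificity, accuracy) from truth vs. response.

    Assumption: location fields are ignored, and the data contain at least
    one truly positive and one truly negative case.
    """
    tp = sum(1 for t, _, r, _, _ in cases if t == 1 and r == 1)
    fn = sum(1 for t, _, r, _, _ in cases if t == 1 and r == 0)
    tn = sum(1 for t, _, r, _, _ in cases if t == 0 and r == 0)
    fp = sum(1 for t, _, r, _, _ in cases if t == 0 and r == 1)
    sensitivity = tp / (tp + fn)   # fraction of truly positive cases called positive
    specificity = tn / (tn + fp)   # fraction of truly negative cases called negative
    accuracy = (tp + tn) / len(cases)
    return sensitivity, specificity, accuracy


def to_single_scale(response, confidence, num_categories):
    """Hypothetical mapping onto one 2K-point scale (K = num_categories).

    Negative responses: high confidence -> 1 ... low confidence -> K.
    Positive responses: low confidence -> K+1 ... high confidence -> 2K.
    """
    if response == 0:
        return num_categories + 1 - confidence
    return num_categories + confidence


# Made-up example data for illustration only.
example = """1 left_upper 1 left_upper 3
1 right_lower 0 none 1
0 none 0 none 2
0 none 1 left_lower 1
0 none 0 none 3
"""
cases = parse_format2(example)
print(binary_accuracy(cases))  # sensitivity 0.5, specificity 2/3, accuracy 0.6
```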


Format 3

This data format is very different from Formats 1 and 2. In this format, there are always exactly two lines of numbers, regardless of how many cases there are. Each line has one number for each category in the rating scale, so the data form 2 rows and N columns, where N is the number of categories in the rating scale. The first row represents truly negative cases, and the second row represents truly positive cases. The first column represents the first rating category, the second column the second rating category, and so on. Each number (or "cell") is how many cases in that row's group (negative or positive) received that column's rating. For example, if the individual gave a confidence rating of "4" to 10 of the positive cases, then the 4th number in the 2nd row would be "10". The meaning of the rating categories is the same as in Format 1 above. Format 3 is the same format used by the original ROCFIT program, but it is less convenient than the others because data are not usually collected in this form.
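Because Format 3 is just a tally, it can be built from Format 1 style (truth, rating) pairs. The Python sketch below does that and also lists the empirical (false-positive fraction, true-positive fraction) operating points implied by the table. The function names are hypothetical, and the empirical points are only the raw threshold points, not the fitted ROC curve that JROCFIT computes; treating "rating at or above a threshold" as a positive call assumes that higher ratings mean greater confidence that the case is positive, as in the Format 1 example scale.

```python
def to_format3(cases, num_categories):
    """Tally (truth, rating) pairs into the 2 x N Format 3 table.

    Row 0 counts truly negative cases, row 1 truly positive cases;
    column k-1 holds how often rating k was used.
    """
    table = [[0] * num_categories for _ in range(2)]
    for truth, rating in cases:
        table[truth][rating - 1] += 1
    return table


def format3_text(table):
    """Render the table as two whitespace-separated lines, negatives first."""
    return "\n".join(" ".join(str(n) for n in row) for row in table)


def empirical_points(table):
    """Empirical (FPF, TPF) points for thresholds 'rating >= N' down to 'rating >= 2'.

    Assumption: each row contains at least one case (no division by zero).
    """
    neg, pos = table
    n_neg, n_pos = sum(neg), sum(pos)
    points = []
    for k in range(len(neg) - 1, 0, -1):
        # neg[k:] and pos[k:] cover ratings k+1 through N
        points.append((sum(neg[k:]) / n_neg, sum(pos[k:]) / n_pos))
    return points


cases = [(1, 6), (1, 4), (0, 2), (0, 1), (1, 5), (0, 4)]
table = to_format3(cases, 6)
print(format3_text(table))
# 1 1 0 1 0 0
# 0 0 0 1 1 1
```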

(Page last updated: 2/11/2001)