roc(3)
NAME
Statistics::ROC - receiver operating characteristic (ROC)
curves with nonparametric confidence bounds
SYNOPSIS
    use Statistics::ROC;

    my ($y)   = loggamma($x);
    my ($y)   = betain($x, $p, $q, $beta);
    my ($y)   = Betain($x, $p, $q);
    my ($y)   = xinbta($p, $q, $beta, $alpha);
    my ($y)   = Xinbta($p, $q, $alpha);
    my (@rk)  = rank($type, @r);
    my (@ROC) = roc($model_type, $conf, @val_grp);
DESCRIPTION
This module determines the ROC curve and its nonparametric
confidence bounds for data categorized into two groups. A
ROC curve shows the relationship of the probability of
false alarm (x-axis) to the probability of detection
(y-axis) for a given test. Expressed in medical terms, it
relates the probability of a positive test given no
disease to the probability of a positive test given
disease. The ROC curve may be used to determine an
optimal cutoff point for the test.
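To make the definition concrete, the following minimal sketch computes the empirical ROC points of a toy data set (the six-point example from the roc() description) and picks the cutoff that maximizes detection minus false alarm (Youden's index). It is not how roc() works internally: it computes no confidence bounds, it assumes the 'increase' model (a value at or above the cutoff counts as a positive test), and the helper names empirical_roc and youden_cutoff are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Statistics::ROC: empirical ROC points
# under the 'increase' model (a value >= cutoff counts as a positive test).
# Each datum is a [value, group] pair; group 1 = condition present.
sub empirical_roc {
    my @data  = @_;
    my $n_pos = grep { $_->[1] == 1 } @data;
    my $n_neg = @data - $n_pos;

    # Candidate cutoffs: every distinct observed value, plus one value above
    # the maximum so that the (0, 0) corner of the curve is included.
    my %seen;
    my @cuts = sort { $a <=> $b } grep { !$seen{$_}++ } map { $_->[0] } @data;
    push @cuts, $cuts[-1] + 1;

    my @points;
    for my $cut (@cuts) {
        my $tp = grep { $_->[1] == 1 && $_->[0] >= $cut } @data;
        my $fp = grep { $_->[1] == 0 && $_->[0] >= $cut } @data;
        # [probability of false alarm, probability of detection, cutoff]
        push @points, [ $fp / $n_neg, $tp / $n_pos, $cut ];
    }
    return @points;
}

# Hypothetical helper: the point maximising detection minus false alarm
# (Youden's index); on exact ties the first (lowest) cutoff is kept.
sub youden_cutoff {
    my @points = @_;
    my $best   = $points[0];
    for my $pt (@points) {
        $best = $pt if $pt->[1] - $pt->[0] > $best->[1] - $best->[0];
    }
    return $best;
}

my @s   = ([2, 0], [12.5, 1], [3, 0], [10, 1], [9.5, 0], [9, 1]);
my @pts = empirical_roc(@s);
printf "cutoff %-4s  false alarm %.3f  detection %.3f\n", $_->[2], $_->[0], $_->[1]
    for @pts;
my $best = youden_cutoff(@pts);
printf "best cutoff (Youden): values >= %s are positive\n", $best->[2];
```

For this data the sketch selects the cutoff 9 (detection 1, false alarm 1/3); the cutoff 10 (detection 2/3, no false alarms) reaches the same index value of 2/3.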
The main function is roc(). The other exported functions
are used by roc(), but might be useful for other
nonparametric statistical procedures.
loggamma
    This procedure evaluates the natural logarithm of
    gamma(x) for all x > 0, accurate to 10 decimal places.
    Stirling's formula is used for the central polynomial
    part of the procedure. For x = 0 the value
    743.746924740801 is returned, which is
    loggamma(9.9999999999E-324).
betain
    Computes the incomplete beta function ratio I_x(p,q).
    Remarks: the complete beta function and its logarithm are

        B(p,q) = gamma(p)*gamma(q)/gamma(p+q)
        log(B(p,q)) = log(gamma(p)) + log(gamma(q)) - log(gamma(p+q))

    and the incomplete beta function ratio is

        I_x(p,q) = 1/B(p,q) * int_0^x t^{p-1}*(1-t)^{q-1} dt

    where log denotes the natural logarithm. log(B(p,q)) has
    to be supplied to calculate I_x(p,q). The arguments are:

        $x    = x
        $p    = p
        $q    = q
        $beta = log(B(p,q))

    The subroutine returns I_x(p,q). If an error occurs, a
    negative value (-1 or -2) is returned.
Betain
    Computes the incomplete beta function ratio I_x(p,q) by
    calling loggamma() and betain(), so log(B(p,q)) does not
    have to be supplied.
xinbta
    Computes the inverse of the incomplete beta function
    ratio, i.e. the x that satisfies

        alpha = I_x(p,q) = 1/B(p,q) * int_0^x t^{p-1}*(1-t)^{q-1} dt

    with the complete beta function

        B(p,q) = gamma(p)*gamma(q)/gamma(p+q)
        log(B(p,q)) = log(gamma(p)) + log(gamma(q)) - log(gamma(p+q))

    where log denotes the natural logarithm. As for betain,
    log(B(p,q)) has to be supplied. The arguments are:

        $p     = p
        $q     = q
        $beta  = log(B(p,q))
        $alpha = I_x(p,q)

    The subroutine returns x. If an error occurs, a negative
    value (-1, -2 or -3) is returned.
Xinbta
    Computes the inverse of the incomplete beta function
    ratio by calling loggamma() and xinbta(), so log(B(p,q))
    does not have to be supplied.
rank
    Computes the ranks of the values passed as the second
    argument (an array) and returns a list of ranks in the
    same order as the input values. The first argument
    selects the type of ranking ('high', 'low' or 'mean').
    The types differ only in how ties, i.e. identical
    values, are treated:

        · high: identical values all receive the highest of
          their ranks
        · low:  identical values all receive the lowest of
          their ranks
        · mean: identical values all receive the mean of
          their ranks
roc
    Determines the ROC curve and its nonparametric
    confidence bounds. The ROC curve shows the relationship
    of "probability of false alarm" (x-axis) to "probability
    of detection" (y-axis) for a given test or, in medical
    terms, of the "probability of a positive test, given no
    disease" to the "probability of a positive test, given
    disease". The ROC curve may be used to determine an
    "optimal" cutoff point for the test.

    The routine takes three arguments:

    (1) The type of model, 'decrease' or 'increase'. This
        states the assumption that a higher value of the data
        ('increase') or a lower value ('decrease') tends to
        indicate a positive test result.

    (2) The confidence level of the two-sided confidence
        bounds (usually 0.95).

    (3) The data, stored as a list of lists: each entry is a
        "value / true group" pair, i.e. value / disease
        present. Group values are 0 or 1: 0 means the disease
        (or signal) is known to be absent and 1 means it is
        known to be present (prior knowledge). Example:

            @s = ([2, 0], [12.5, 1], [3, 0], [10, 1], [9.5, 0], [9, 1]);

        Note the small overlap of the groups: the smallest
        value of group 1 (9) lies below the largest value of
        group 0 (9.5), so no cutoff separates them perfectly.
        Under the 'increase' model, a cutoff between 3 and 9
        detects every positive case at a probability of false
        alarm of 1/3, while a cutoff between 9.5 and 10
        produces no false alarms at a probability of
        detection of 2/3. If optimality means maximizing the
        probability of detection minus the probability of
        false alarm, both cutoffs are equally good.

    Returns a list of lists with the three curves:

        @ROC = ([@lower_b], [@roc], [@upper_b])

    Each curve is again a list of lists whose entries are
    single (x, y) pairs.
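The three tie rules of rank() can be pinned down with a short sketch. rank_sketch is a hypothetical name and this is not the module's implementation; it assumes 1-based ranks and treats values as tied only when they are numerically equal (==). It uses the same data as the rank examples.

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the tie rules described for rank();
# not the module's code. Ranks are 1-based; ties are exact (==) matches.
sub rank_sketch {
    my ($type, @r) = @_;
    # Indices of @r in ascending order of value.
    my @order = sort { $r[$a] <=> $r[$b] } 0 .. $#r;
    my @rank;
    my $i = 0;
    while ($i < @order) {
        # Advance $j to the last position of the run of values tied with $i.
        my $j = $i;
        $j++ while $j + 1 < @order && $r[ $order[$j + 1] ] == $r[ $order[$i] ];
        # The tied values occupy ranks $i+1 .. $j+1.
        my $value = $type eq 'low'  ? $i + 1
                  : $type eq 'high' ? $j + 1
                  : $type eq 'mean' ? ($i + $j + 2) / 2
                  : die "rank_sketch: unknown type '$type'\n";
        $rank[ $order[$_] ] = $value for $i .. $j;
        $i = $j + 1;
    }
    return @rank;
}

my @e = (0.7, 0.7, 0.9, 0.6, 1.0, 1.1, 1, .7, .6);
for my $type (qw(low high mean)) {
    print "$type: ", join(" ", rank_sketch($type, @e)), "\n";
}
```

For this data the two values 0.6 share ranks 1 and 2, so 'low', 'high' and 'mean' assign them 1, 2 and 1.5 respectively.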
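Examples
    use strict; use warnings;
    use Statistics::ROC;

    $, = " ";    # separate printed list elements with spaces
    print loggamma(10), "\n";                      # log(gamma(10)) = log(9!)
    print Xinbta(3, 4, Betain(.6, 3, 4)), "\n";    # inverse of Betain, so about 0.6

    my @e = (0.7, 0.7, 0.9, 0.6, 1.0, 1.1, 1, .7, .6);
    print rank('low',  @e), "\n";
    print rank('high', @e), "\n";
    print rank('mean', @e), "\n";

    my @var_grp = ([1.5,0], [1.4,0], [1.4,0], [1.3,0], [1.2,0], [1,0], [0.8,0],
                   [1.1,1], [1,1], [1,1], [0.9,1], [0.7,1], [0.7,1], [0.6,1]);
    my @curves = roc('decrease', 0.95, @var_grp);
    print "$curves[0][2][0] $curves[0][2][1]\n";   # third (x, y) point of the lower bound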
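The second example relies on xinbta() inverting betain(). To illustrate that round trip without the module, here is a self-contained sketch. It is not the module's code and swaps in different techniques: a shifted Stirling series for log gamma, the continued fraction for I_x(p,q) in the form given in Numerical Recipes (instead of AS 63), and plain bisection instead of AS 109 for the inverse. The names log_gamma, beta_cf, inc_beta_ratio and inv_inc_beta_ratio are hypothetical, and p, q > 0 is assumed.

```perl
use strict;
use warnings;

# log(gamma(x)) for x > 0: shift x upward with gamma(x+1) = x*gamma(x), then
# apply the asymptotic Stirling series (very accurate once x >= 10).
sub log_gamma {
    my ($x) = @_;
    die "log_gamma: x must be > 0\n" unless $x > 0;
    my $shift = 0;
    while ($x < 10) {
        $shift += log($x);
        $x += 1;
    }
    my $z      = 1 / ($x * $x);
    my $series = (1/12 - $z * (1/360 - $z * (1/1260 - $z / 1680))) / $x;
    my $pi     = 4 * atan2(1, 1);
    return ($x - 0.5) * log($x) - $x + 0.5 * log(2 * $pi) + $series - $shift;
}

# Continued fraction for I_x(p,q), evaluated with the modified Lentz method.
sub beta_cf {
    my ($x, $p, $q) = @_;
    my ($eps, $tiny) = (1e-15, 1e-300);
    my $c = 1;
    my $d = 1 - ($p + $q) * $x / ($p + 1);
    $d = $tiny if abs($d) < $tiny;
    $d = 1 / $d;
    my $h = $d;
    for my $m (1 .. 300) {
        my $m2 = 2 * $m;
        # Even and odd coefficients of the continued fraction for this $m.
        my @coef = (
            $m * ($q - $m) * $x / (($p - 1 + $m2) * ($p + $m2)),
            -($p + $m) * ($p + $q + $m) * $x / (($p + $m2) * ($p + 1 + $m2)),
        );
        my $del;
        for my $aa (@coef) {
            $d = 1 + $aa * $d;
            $d = $tiny if abs($d) < $tiny;
            $c = 1 + $aa / $c;
            $c = $tiny if abs($c) < $tiny;
            $d   = 1 / $d;
            $del = $d * $c;
            $h  *= $del;
        }
        return $h if abs($del - 1) < $eps;
    }
    die "beta_cf: no convergence\n";
}

# Incomplete beta function ratio I_x(p,q).
sub inc_beta_ratio {
    my ($x, $p, $q) = @_;
    return 0 if $x <= 0;
    return 1 if $x >= 1;
    my $log_b = log_gamma($p) + log_gamma($q) - log_gamma($p + $q);   # log(B(p,q))
    my $front = exp($p * log($x) + $q * log(1 - $x) - $log_b);
    # The fraction converges fast for x < (p+1)/(p+q+2); otherwise use the
    # symmetry I_x(p,q) = 1 - I_{1-x}(q,p).
    return $x < ($p + 1) / ($p + $q + 2)
        ? $front * beta_cf($x, $p, $q) / $p
        : 1 - $front * beta_cf(1 - $x, $q, $p) / $q;
}

# Inverse in x by bisection; relies on I_x(p,q) increasing in x.
sub inv_inc_beta_ratio {
    my ($alpha, $p, $q) = @_;
    my ($lo, $hi) = (0, 1);
    while ($hi - $lo > 1e-14) {
        my $mid = ($lo + $hi) / 2;
        if (inc_beta_ratio($mid, $p, $q) < $alpha) { $lo = $mid } else { $hi = $mid }
    }
    return ($lo + $hi) / 2;
}

printf "log_gamma(10) = %.10f  (log(9!) = %.10f)\n", log_gamma(10), log(362880);
my $alpha = inc_beta_ratio(0.6, 3, 4);
printf "I_0.6(3,4)    = %.10f\n", $alpha;
printf "inverse       = %.10f\n", inv_inc_beta_ratio($alpha, 3, 4);
```

For integer p and q, I_x(p,q) equals a binomial tail probability, which gives an independent check: I_0.6(3,4) = P(Bin(6, 0.6) >= 3) = 0.8208.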
AUTHOR
Hans A. Kestler, hans.kestler@medizin.uni-ulm.de or
h.kestler@ieee.org
SEE ALSO
Perl/Tk user interface for drawing ROC curves (to be
uploaded shortly).
R.A. Hilgers (1991). Distribution-Free Confidence Bounds
for ROC Curves. Meth Inform Med, 30:96-101.

Algorithm 291: Logarithm of the Gamma Function. Collected
Algorithms of the ACM, Vol. II, 1980.

Press, Teukolsky, Vetterling and Flannery (1992).
Numerical Recipes in C, second edition. Cambridge
University Press.

G.W. Cran, K.J. Martin and G.E. Thomas (1977). Remark AS
R19 and Algorithm AS 109: A Remark on Algorithms AS 63:
The Incomplete Beta Integral and AS 64: Inverse of the
Incomplete Beta Function Ratio. Appl Statist, 26:111-114.

K.J. Berry, P.W. Mielke, Jr. and G.W. Cran (1990).
Algorithm AS R83: A Remark on Algorithm AS 109: Inverse of
the Incomplete Beta Function Ratio. Appl Statist,
39:309-310.