Introduction
This document deals with FRR (False Rejection Rate) measurements for
fingerprint recognition. Each finger is assigned a unique ID. The personal
FRR is defined as the FRR for just one ID. Fingerprint samples are defined
as different measurements of the same finger, i.e., of the same ID. Score
values are the result of comparisons between two fingerprints and describe
their similarity. In this investigation, the scores assume values between
0 (least similarity) and 100 (most similarity).
For the determination of the personal FRR, it is obvious that the larger
the number of prints (samples) of the same finger, the higher the accuracy
of the FRR result. However, if a global FRR, defined, e.g., as the mean
value of the personal FRRs, is to be calculated, there are two parameters
to be chosen: the number of prints per finger (ID) and the number of
different fingers (IDs). The question arises whether it is better to have
a database with a larger number of IDs or with a larger number of samples
per ID in order to get the best accuracy with the least effort.
Theory
Assume a fingerprint database with N different fingers (IDs) and M prints
(samples) per ID. The request (query) part of this database may be
represented by a matrix of N columns and M rows, each cell containing a
request fingerprint. The reference part of the database contains N sets
of reference fingerprints, one for each of the N different IDs. After
matching each request fingerprint with the corresponding references, a new
matrix of M by N score values is generated. The score values show a random
behavior which can be represented by a probability (distribution)
function. From this probability function the FRR curve is usually
calculated in a straightforward way; it shows the probability that a
genuine score value falls below a certain threshold. To simplify the
consideration, only the mean value and the standard deviation are
regarded, which can be calculated directly from the score values.
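As a minimal sketch of this bookkeeping, the following Python fragment
stores the scores as M rows (samples) by N columns (IDs), computes the mean
and variance per row and per column, and evaluates an empirical FRR as the
fraction of genuine scores below a threshold. The 3-by-4 matrix is just the
upper-left corner of Table 1 in the Appendix, used as toy input; the
function names, the threshold of 40, and the use of the sample variance
(n - 1 in the denominator) are assumptions made for illustration.

```python
from statistics import mean, variance

# Toy input: first 3 request samples (rows) of the first 4 IDs (columns).
scores = [
    [100, 22, 34, 93],
    [20, 73, 45, 100],
    [100, 28, 40, 73],
]

def row_stats(matrix):
    """Mean and sample variance of each row (one sample over all IDs)."""
    return [(mean(row), variance(row)) for row in matrix]

def column_stats(matrix):
    """Mean and sample variance of each column (one ID over all samples)."""
    columns = list(zip(*matrix))
    return [(mean(col), variance(col)) for col in columns]

def empirical_frr(matrix, threshold):
    """Fraction of genuine scores that fall below the threshold."""
    flat = [score for row in matrix for score in row]
    return sum(score < threshold for score in flat) / len(flat)

print(row_stats(scores))
print(column_stats(scores))
print(empirical_frr(scores, threshold=40))  # threshold is hypothetical
```

With only a few values per row and column the statistics are of course
meaningless; the point is only the orientation of the matrix and which
direction each mean and variance is taken in.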
There are two main reasons for the random behavior of the score values:
- Different finger positioning and pressure
- Different personal characteristics (number of minutiae, image quality,
  behavior, etc.)
The first type of error is nearly independent of the finger (ID) and
always occurs. The second type of randomness only occurs between
different IDs. As a result, the degree of randomness should differ between
the scores from one ID, which only show type 1 errors, and the scores
from different IDs, which show both type 1 and type 2 errors. To see the
difference, the matrix representation of the score values has been used
to calculate the mean value and standard deviation row by row and column
by column. What is to be expected is that the scores within a row show a
higher variance, since they are affected by both error types, than the
scores within a column, which are affected by error type 1 only. We regard
two extreme cases:
- Only one ID is available: If we increase the number of scores for this
  finger and take the mean value, only error type 1 is reduced. If we let
  M approach infinity, this error approaches 0. However, the personal
  effects (error type 2) are not affected at all.
- Only one request print per ID is available: If we increase the number
  of scores by increasing the number of different IDs, both error types
  are reduced simultaneously.
From this we may conclude that, for the determination of the global FRR,
it is preferable to have more different fingers rather than more prints
(samples) per finger.
However, it should be noted that it is more expensive to obtain more
different IDs than to obtain more prints per ID.
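The two extreme cases can be made quantitative under a simple additive
model, which is an assumption and not stated in the text: score = mu + a_i
+ e_ij, with an independent personal offset a_i (type 2, standard deviation
sigma2) and an independent positioning error e_ij (type 1, standard
deviation sigma1). The variance of the overall mean is then sigma2²/N +
sigma1²/(N·M). The sketch below evaluates this expression for both cases;
the function name and the numerical values are hypothetical.

```python
def variance_of_global_mean(sigma1, sigma2, n_ids, m_samples):
    """Variance of the grand mean under the assumed additive model.

    sigma1: type 1 spread (positioning, pressure) within one ID
    sigma2: type 2 spread (personal characteristics) between IDs
    """
    return sigma2**2 / n_ids + sigma1**2 / (n_ids * m_samples)

# Illustrative spreads only (hypothetical values).
SIGMA1, SIGMA2 = 16.0, 16.0

# Case 1: one ID, very many samples. Only the type 1 term shrinks.
one_id_many_samples = variance_of_global_mean(SIGMA1, SIGMA2, 1, 10_000)

# Case 2: one sample per ID, very many IDs. Both terms shrink.
many_ids_one_sample = variance_of_global_mean(SIGMA1, SIGMA2, 10_000, 1)

print(one_id_many_samples ** 0.5)
print(many_ids_one_sample ** 0.5)
```

In the first case the error of the mean cannot fall below sigma2 however
large M becomes, whereas in the second case it tends to zero as N grows.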
Experimental results
Using the database FGA1010x, a score value matrix has been calculated with
68 columns for the 68 different IDs that were enrolled successfully and
100 rows for the 100 samples per ID. From this matrix the column and row
mean values and standard deviations have been calculated using the
corresponding MS-Excel functions "Mean" and "Variance". (The standard
deviation "sigma" is defined as the square root of the variance.) Table 1
shows the upper left part of the Excel table with results. Further results
are given in Table 2, see Appendix.
What we are looking for are the mean values and standard deviations of
the genuine score distribution. These must not be confused with the
measurement errors (belonging to a measurement error distribution), which
have also been calculated as standard deviations and which should tend to
zero as the number of score values in the test approaches infinity.
The following observations can be made, see Tables 1 and 2:
- The row mean values show a small variation, in contrast to the column
  mean values (both marked yellow). This is expressed by a standard
  deviation (sigma) of 2.86 (sigma of Req sample mean) versus 15.73 (sigma
  of Ref ID mean).
- The mean of the column mean values is equal to the mean of the row mean
  values (= 60.62). This is trivial.
- The mean of the row standard deviations (22.66, estimated as the square
  root of the mean of Req sample variance) is significantly higher than
  the mean of the column standard deviations (16.52, estimated as the
  square root of the mean of Ref ID variance).
- The column mean values should depend only slightly on error type 1
  because this error has been largely eliminated by averaging. As a
  result, the standard deviation of the column mean values (15.73) should
  mainly represent error type 2.
- The column standard deviations should solely represent error type 1 by
  definition. As a result, the square root of the mean of the column
  variances (16.52) should be a good measure for error type 1.
- The square root of the mean of the row variances (22.66) should
  represent the combined error types 1 and 2.
- The sum of the squared type 1 and type 2 errors (= variances)
  approximates the squared combined error:
  16.52² + 15.73² ≈ 22.81² ≈ 22.66²
- The square root of the mean of the row (= Req sample) variances (22.66)
  deviates only slightly from the global standard deviation, which is
  22.68.
Suppose a test has been made with one score
sample per ID (the first one in the tables) and 68 IDs. Then the actual
mean value would have been 65.25 (compared to 60.62 +- 2.86) and the actual
standard deviation 21.44 (compared to 22.66).
-
Suppose a test has been made with only one
ID (the first one in the tables) but 100 samples. Then the actual mean
value would have been 89.45 (compared to 60.62 +- 15.73) and the actual
standard deviation 12.75 (compared to 16.52).
-
The last two points reveal that both scenarios
deliver different distributions with personal distributions being "more
narrow" than global distributions. (This does not automatically imply that
a personal distribution is better with respect to FRR since the personal
mean value should be as high as possible, but often is not.) The square
root of the mean of the variances of the personal score distributions (16.52)
is NOT equal to the standard deviation of the global score distribution
(22.68)!
-
The second observation is that the measurement
error (standard deviation) for the first test scenario is significantly
smaller than for the second one:
Table 0: Results for different test scenarios
| Calculation method | Mean of genuine score | Sigma of measurement error (of the mean) | Variance of genuine score | Sigma of measurement error (of the variance) |
|---|---|---|---|---|
| Global average: 6800 samples | 60.62 | | 514.17 | |
| ID average: 68 IDs, 100 samples | 60.62 | 2.86 | 513.56 | 77.72 |
| Test scenario 1: 68 IDs, 1 sample | 65.25 | (2.86) | 459.56 | (77.72) |
| Sample average: 68 IDs, 100 samples | 60.62 | 15.73 | 272.95 | 127.42 |
| Test scenario 2: 1 ID, 100 samples | 89.45 | (15.73) | 162.51 | (127.42) |
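The arithmetic behind the observations above can be re-checked with a few
lines of Python. The input numbers are copied from Table 2; the variable
names are chosen here for readability. The last step, which estimates how
much type 1 error is left in a column mean after averaging 100 samples,
additionally assumes that the samples of one ID are independent.

```python
from math import sqrt

# Values copied from Table 2.
MEAN_OF_REF_ID_VARIANCE = 272.95      # squared type 1 error
VARIANCE_OF_REF_ID_MEAN = 247.52      # (approximately) squared type 2 error
MEAN_OF_REQ_SAMPLE_VARIANCE = 513.56  # squared combined error
GLOBAL_SCORE_VARIANCE = 514.17
SAMPLES_PER_ID = 100

sigma1 = sqrt(MEAN_OF_REF_ID_VARIANCE)
sigma2 = sqrt(VARIANCE_OF_REF_ID_MEAN)
combined = sqrt(sigma1**2 + sigma2**2)

# Type 1 variance remaining in a column mean (assumes independent samples).
residual_type1 = MEAN_OF_REF_ID_VARIANCE / SAMPLES_PER_ID

print(f"sigma1 = {sigma1:.2f}, sigma2 = {sigma2:.2f}")
print(f"combined = {combined:.2f}")
print(f"row-based sigma = {sqrt(MEAN_OF_REQ_SAMPLE_VARIANCE):.2f}")
print(f"global sigma = {sqrt(GLOBAL_SCORE_VARIANCE):.2f}")
print(f"residual type 1 variance in a column mean = {residual_type1:.3f}")
```

The residual type 1 variance is small compared with the variance of the
column means (247.52), which is consistent with reading 15.73 as mainly a
type 2 error.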
Conclusions
- Although the number of IDs is smaller than the number of prints per ID,
  a test based on 68 IDs and 1 sample delivers more accurate results than
  a test based on 1 ID and 100 samples. For the planning of tests this
  means that it is more advisable to have a large number of participants
  than a large number of samples per finger (although the latter is
  easier to achieve).
- Due to the strong personal influences, each ID must be represented by
  the same number of samples when calculating global characteristics.
  Alternatively, the mean value of the personal characteristics may be
  taken, provided it delivers the desired result. (Example: The average
  over all Ref ID means delivers an unbiased global mean value. This is
  not true for the average over all Ref ID variances, which does not
  estimate the correct global variance. However, there should be no
  problem when calculating the genuine distribution or the FRR from the
  personal genuine distributions or the personal FRRs by averaging.)
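The second conclusion can be illustrated with a deliberately extreme toy
data set. The scores and ID names below are invented and not taken from
the database; the population variance (`pvariance`) is used for simplicity.
With unequal sample counts the pooled mean is pulled towards the ID with
more samples, while the mean of the personal means weights every ID
equally. The mean of the personal variances ignores the spread between IDs
and is therefore far below the variance of all scores.

```python
from statistics import mean, pvariance

# Hypothetical genuine scores for two IDs with unequal sample counts.
scores_by_id = {
    "id_a": [90, 88, 92, 90, 90, 90, 90, 90],  # 8 samples, personal mean 90
    "id_b": [30, 30],                            # 2 samples, personal mean 30
}

all_scores = [s for scores in scores_by_id.values() for s in scores]

# Pooled statistics: every score counts once, so id_a dominates.
pooled_mean = mean(all_scores)
pooled_variance = pvariance(all_scores)

# Per-ID statistics averaged afterwards: every ID counts once.
mean_of_personal_means = mean([mean(s) for s in scores_by_id.values()])
mean_of_personal_variances = mean(
    [pvariance(s) for s in scores_by_id.values()]
)

print(pooled_mean, mean_of_personal_means)
print(pooled_variance, mean_of_personal_variances)
```

If every ID contributes the same number of samples, the two mean values
coincide, but the gap between the two variances remains.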
Comments
All results are based on mean values and standard deviations, which
represent the position and the width of the score distribution,
respectively. (If the distribution type were known a priori, the
distribution could possibly be determined completely by its mean and
standard deviation.) To obtain a low FRR, the mean of the genuine score
distribution should be as high as possible and, simultaneously, its
standard deviation should be as small as possible. (This is a necessary
condition, but it is not sufficient unless the distribution function has
a known simple form. Especially the tails of the distribution, which are
most important for small FAR and FRR values, may show considerable
deviations.)
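To illustrate how mean and standard deviation would fix the FRR if the
distribution type were known, the sketch below assumes a normal
distribution. This is purely an assumption for illustration: the scores
are bounded between 0 and 100 and frequently reach 100, so the real
distribution is certainly not normal, and the caveat about the tails
applies in full. The threshold of 30 and the function name are
hypothetical; the means and sigmas are the global values and those of the
first ID quoted in the text.

```python
from statistics import NormalDist

def frr_normal(mean, sigma, threshold):
    """P(score < threshold) if genuine scores were normal (assumption)."""
    return NormalDist(mu=mean, sigma=sigma).cdf(threshold)

THRESHOLD = 30  # hypothetical decision threshold

# Global distribution: mean 60.62, sigma 22.68.
frr_global = frr_normal(60.62, 22.68, THRESHOLD)

# First ID only (test scenario 2): mean 89.45, sigma 12.75.
frr_first_id = frr_normal(89.45, 12.75, THRESHOLD)

print(frr_global)
print(frr_first_id)
```

Under this model a higher mean or a smaller sigma lowers the FRR at a
given threshold, which is the necessary condition named above.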
Appendix
The following tables show a part of the score value matrix together with
the means and variances of the columns (Ref IDs) and rows (Req samples).
Table 1
| Req sample variance | Req sample mean | Ref ID: 10010 | 10013 | 10014 | 10016 | 10022 | 10048 | 10054 | 10059 | 10061 | ... |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Ref ID variance | | 162.51 | 296.57 | 128.20 | 446.77 | 244.62 | 309.44 | 218.79 | 183.66 | 116.67 | ... |
| | Ref ID mean | 89.45 | 57.57 | 29.76 | 76.09 | 64.19 | 51.79 | 68.43 | 83.57 | 56.98 | ... |
| 459.56 | 65.25 | 100 | 22 | 34 | 93 | 74 | 31 | 59 | 100 | 65 | ... |
| 563.63 | 61.34 | 20 | 73 | 45 | 100 | 98 | 33 | 95 | 83 | 70 | ... |
| 457.79 | 65.87 | 100 | 28 | 40 | 73 | 74 | 31 | 73 | 61 | 70 | ... |
| 466.60 | 62.03 | 74 | 31 | 41 | 71 | 63 | 76 | 75 | 40 | 53 | ... |
| 372.03 | 65.76 | 100 | 41 | 44 | 73 | 100 | 36 | 55 | 93 | 59 | ... |
| 542.87 | 65.38 | 98 | 74 | 28 | 77 | 67 | 60 | 78 | 98 | 78 | ... |
| 537.31 | 65.97 | 100 | 66 | 19 | 86 | 82 | 71 | 59 | 71 | 83 | ... |
| 434.19 | 64.85 | 88 | 66 | 40 | 78 | 84 | 54 | 53 | 90 | 69 | ... |
| 495.86 | 64.59 | 91 | 84 | 13 | 75 | 62 | 30 | 89 | 78 | 62 | ... |
| 574.25 | 61.56 | 93 | 28 | 35 | 0 | 68 | 78 | 58 | 89 | 67 | ... |
| 477.86 | 61.81 | 76 | 80 | 27 | 52 | 49 | 60 | 50 | 80 | 51 | ... |
| 542.73 | 62.66 | 75 | 85 | 35 | 56 | 76 | 79 | 56 | 86 | 49 | ... |
| 611.21 | 59.25 | 100 | 42 | 13 | 85 | 55 | 56 | 100 | 74 | 40 | ... |
| 619.94 | 61.71 | 83 | 60 | 28 | 65 | 79 | 61 | 79 | 79 | 39 | ... |
| 513.31 | 63.97 | 100 | 58 | 56 | 90 | 74 | 75 | 46 | 83 | 74 | ... |
| 476.46 | 63.15 | 100 | 28 | 33 | 65 | 66 | 71 | 29 | 82 | 61 | ... |
| 453.72 | 60.84 | 75 | 59 | 52 | 68 | 58 | 85 | 40 | 87 | 46 | ... |
| 483.74 | 58.81 | 82 | 30 | 34 | 61 | 62 | 16 | 53 | 77 | 44 | ... |
| 423.55 | 61.03 | 97 | 65 | 37 | 75 | 49 | 60 | 97 | 91 | 53 | ... |
| 483.77 | 62.41 | 71 | 95 | 30 | 87 | 57 | 20 | 61 | 84 | 49 | ... |
| 447.42 | 63.96 | 86 | 66 | 31 | 100 | 75 | 78 | 77 | 83 | 62 | ... |
| 340.50 | 60.87 | 83 | 61 | 40 | 54 | 82 | 65 | 86 | 90 | 69 | ... |
| 466.72 | 64.76 | 90 | 92 | 29 | 100 | 86 | 72 | 49 | 88 | 1 | ... |
| 440.97 | 58.96 | 100 | 63 | 34 | 41 | 68 | 46 | 58 | 83 | 55 | ... |
| 504.97 | 62.32 | 100 | 49 | 47 | 72 | 38 | 50 | 100 | 84 | 58 | ... |
| 483.97 | 61.28 | 81 | 59 | 11 | 83 | 69 | 28 | 61 | 85 | 31 | ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
Table 2
Ref ID mean: Average of one ID over all samples
Req sample mean: Average of one sample over all IDs
Mean of Ref ID mean: Average of all Ref ID means, etc.
| Global score mean | 60.62 |
| Global score variance | 514.17 |
| Global score sigma | 22.68 |
| | |
| Mean of Ref ID mean | 60.62 |
| Variance of Ref ID mean | 247.52 |
| Sigma of Ref ID mean | 15.73 |
| Mean of Ref ID variance | 272.95 |
| SQRT of mean of Ref ID variance | 16.52 |
| Variance of Ref ID variance | 16236.13 |
| Sigma of Ref ID variance | 127.42 |
| | |
| Mean of Req sample mean | 60.62 |
| Variance of Req sample mean | 8.17 |
| Sigma of Req sample mean | 2.86 |
| Mean of Req sample variance | 513.56 |
| SQRT of mean of Req sample variance | 22.66 |
| Variance of Req sample variance | 6040.20 |
| Sigma of Req sample variance | 77.72 |
| | |
| Number of Ref IDs | 68 |
| Number of Req samples | 100 |