Observations on Impostor Scores

Dr. Manfred Bromba
http://www.bromba.com/contacte.htm
2001-12-02; revised: 2007-07-29
Permanent address for citation: urn:nbn:de:0125-2008041419

Introduction

This document deals with FAR (False Acceptance Rate) measurements for fingerprint recognition. Each finger is assigned a unique ID. Fingerprint samples are different measurements of the same finger, i.e. of the same ID. Score values result from comparisons ("matches") between two fingerprints and describe their similarity. In this investigation, the scores range from 0 (least similarity) to 100 (most similarity).

Based on a matrix representation of a fingerprint FAR examination, certain conclusions on the nature of FAR determination are derived. Two experiments were performed. In the first experiment, every fingerprint reference was matched against one sample of every request (query) fingerprint ID, excluding identical IDs. In the second experiment, every request sample of one impostor fingerprint ID was matched against all reference IDs. These experiments are intended to answer questions such as:

  • Is there an asymmetry between references and requests?
  • What is the effect of several impostor trials with the same finger (ID)?
With regard to the effort of a FAR examination, it should be noted that obtaining more (different) IDs is more expensive than obtaining more prints (samples) per ID.

Experimental

Using the database FGA1010x, which encompasses 81 different IDs (= different fingers), two score matrices were calculated:
  • The first matrix (Table 1) comprises 68 columns for the 68 successfully enrolled IDs used as reference (Ref) and 81 rows for all 81 IDs used as request (Req). Each cell contains the match score for the corresponding pair of IDs; cells for identical IDs are left empty.
  • The second matrix (Table 2) comprises 68 columns for the 68 successfully enrolled IDs used as reference (Ref) and 100 rows for all samples available for ID 10010 used as request (Req). Each cell contains the score for the corresponding match. The match with the identical ID (the Ref ID 10010 column, marked red in the original table) is omitted in the calculations.
Both matrices contain the mean values and variances of the scores per column and per row using the corresponding MS-Excel functions "Mean" and "Variance".
 
Table 1: "One sample" case (excerpt). Columns are Ref IDs, rows are Req IDs. The "variance" and "mean" header rows refer to the columns; the "var." and "mean" values at the start of each data row refer to that row. "-" marks the omitted comparison of identical IDs.

Ref ID:                 10010  10013  10014  10016  10022  10048  10054  10059  10061  ...
variance                 2.33   1.45   0.96   0.99   1.22   0.76   1.21   1.52   0.96  ...
mean                     1.15   1.20   0.68   0.66   0.94   0.49   1.03   1.05   0.73  ...
Req ID   var.  mean
10010    1.30  1.22         -      3      2      2      2      1      1      2      3  ...
10011    1.02  0.62         0      1      0      0      0      0      1      0      0  ...
10013    0.65  0.51         2      -      0      0      0      0      2      1      0  ...
10014    1.08  0.66         1      2      -      0      2      0      0      1      1  ...
10016    1.39  1.00         0      1      0      -      2      0      3      0      2  ...
10022    1.05  0.99         2      0      0      2      -      0      0      2      2  ...
10048    1.60  1.06         0      2      0      0      0      -      0      2      1  ...
10054    1.34  0.76         0      2      0      3      0      0      -      0      0  ...
10059    2.78  1.28         4      0      2      0      0      0      0      -      4  ...
10061    1.53  0.84         0      1      1      0      0      0      1      5      -  ...
10062    1.49  1.15         2      2      2      0      0      0      0      2      1  ...
10065    1.17  0.62         1      0      0      0      0      0      2      0      0  ...
10066    0.53  0.52         1      0      1      1      0      0      0      1      0  ...
10068    1.77  1.25         0      1      0      3      2      0      3      0      0  ...
10069    2.09  1.13         2      0      0      0      1      2      0      2      4  ...
10074    1.29  1.10         3      2      2      0      3      2      1      2      0  ...
10075    0.94  0.60         2      0      0      1      0      0      1      3      2  ...
10076    1.70  1.40         2      2      4      0      0      1      2      3      1  ...
10077    1.06  0.61         0      0      0      1      1      0      1      0      0  ...
10099    1.28  0.81         0      0      1      0      0      4      0      2      2  ...
...

 
Table 2: "One ID" case (excerpt). Columns are Ref IDs, rows are the request samples of ID 10010. The "variance" and "mean" header rows refer to the columns; the "var." and "mean" values at the start of each data row refer to that row. The Ref ID 10010 column holds the matches with the identical ID, which are omitted in the calculations.

Ref ID:          10010  10013  10014  10016  10022  10048  10054  10059  10061  ...
variance        162.51   2.83   1.10   0.85   1.17   0.47   0.57   2.40   0.83  ...
mean             89.45   1.77   1.10   1.24   1.92   0.35   0.98   1.94   0.77  ...
  var.  mean
  1.30  1.22       100      3      2      2      2      1      1      2      3  ...
  0.99  0.66        20      0      0      0      0      0      1      0      0  ...
  1.71  1.45       100      2      2      1      2      0      1      0      0  ...
  1.33  1.22        74      2      2      1      5      0      1      2      0  ...
  1.45  0.91       100      2      1      3      1      0      0      2      0  ...
  1.42  1.09        98      3      2      0      2      2      1      4      1  ...
  1.87  1.09       100      0      0      2      2      0      0      0      0  ...
  1.74  1.12        88      2      0      0      3      0      1      3      0  ...
  1.66  1.64        91      5      3      1      2      1      1      4      1  ...
  1.67  1.19        93      1      0      2      1      0      1      2      0  ...
  0.76  0.70        76      3      1      1      2      0      0      3      1  ...
  1.57  0.91        75      2      0      2      1      0      0      3      0  ...
  1.44  1.12       100      2      2      0      1      0      0      2      3  ...
  2.16  1.55        83      4      2      0      4      0      1      5      1  ...
  2.34  1.43       100      2      3      2      3      0      2      3      0  ...
  1.67  1.15       100      2      2      3      1      0      1      2      0  ...
  1.49  1.24        75      3      2      0      2      1      0      2      1  ...
  1.33  0.87        82      2      0      2      1      0      2      0      0  ...
  2.23  1.36        97      2      0      1      3      0      1      2      0  ...
  1.48  1.00        71      0      0      2      0      0      2      0      0  ...
...
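
The per-row and per-column statistics in the two matrices can be sketched as follows. This is only an illustration, not the original Excel workbook: the data are the upper-left 4 x 3 corner of Table 1, the function names `row_stats` and `column_stats` are made up for this example, and it is assumed that the Excel "Variance" function corresponds to the sample variance (division by n - 1). Because only a corner of the matrix is used, the results do not reproduce the values printed in the tables.

```python
from statistics import mean, variance

# Upper-left corner of Table 1: rows = Req IDs 10010, 10011, 10013, 10014,
# columns = Ref IDs 10010, 10013, 10014.
# None marks the omitted comparison of identical IDs.
scores = [
    [None, 3, 2],
    [0, 1, 0],
    [2, None, 0],
    [1, 2, None],
]

def row_stats(matrix):
    """Return a (mean, variance) pair per row, skipping omitted cells."""
    stats = []
    for row in matrix:
        values = [s for s in row if s is not None]
        # Assumption: sample variance (n - 1), as an Excel-style estimate.
        stats.append((mean(values), variance(values)))
    return stats

def column_stats(matrix):
    """Return a (mean, variance) pair per column, skipping omitted cells."""
    columns = [list(col) for col in zip(*matrix)]
    return row_stats(columns)

for (m, v) in row_stats(scores):
    print(f"row:    mean={m:.2f} variance={v:.2f}")
for (m, v) in column_stats(scores):
    print(f"column: mean={m:.2f} variance={v:.2f}")
```

Skipping the omitted cells rather than treating them as zeros keeps the identical-ID comparisons out of the impostor statistics, as described for both matrices.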

Results

From the score matrices the main statistical properties have been calculated using the corresponding Excel functions. The results are given in Table 3:
 
Table 3: Statistical properties of the score matrices

"One sample" case (Table 1)
Ref ID mean: average of one Ref ID over all Req IDs
Req ID mean: average of one Req ID over all Ref IDs
Mean of Ref ID means: average of all Ref ID means, etc.

Global score mean                     0.92
Global score variance                 1.43
Global score sigma                    1.19
Mean of Ref ID means                  0.92
Variance of Ref ID means              0.13
Sigma of Ref ID means                 0.36
Mean of Ref ID variances              1.32
SQRT of mean of Ref ID variances      1.15
Variance of Ref ID variances          0.21
Sigma of Ref ID variances             0.46
Mean of Req ID means                  0.92
Variance of Req ID means              0.12
Sigma of Req ID means                 0.35
Mean of Req ID variances              1.33
SQRT of mean of Req ID variances      1.15
Variance of Req ID variances          0.26
Sigma of Req ID variances             0.51
Number of Ref IDs                     68
Number of Req IDs                     81

"One ID" case (Table 2)
Ref ID mean: average of one ID over all samples
Req sample mean: average of one sample over all IDs
Mean of Ref ID means: average of all Ref ID means, etc.

Global score mean                     1.18
Global score variance                 1.67
Global score sigma                    1.29
Mean of Ref ID means                  1.18
Variance of Ref ID means              0.63
Sigma of Ref ID means                 0.80
Mean of Ref ID variances              1.05
SQRT of mean of Ref ID variances      1.03
Variance of Ref ID variances          0.40
Sigma of Ref ID variances             0.63
Mean of Req sample means              1.18
Variance of Req sample means          0.04
Sigma of Req sample means             0.20
Mean of Req sample variances          1.65
SQRT of mean of Req sample variances  1.29
Variance of Req sample variances      0.11
Sigma of Req sample variances         0.33
Number of Ref IDs                     68
Number of Req samples                 100

The following observations can be derived from Tables 1, 2, and 3:

  1. The global mean and variance estimates of the "one sample" case (0.92 and 1.43) and the "one ID" case (1.18 and 1.67) differ "significantly" and seem to favor the "one sample" case (Table 3). However, this may be an artifact of the "one ID" case being specific to a single ID, whereas the "one sample" case is not.
  2. In the "one sample" case, the only "significant" difference between the column and row distributions of the score values seems to lie in the measurement errors, which are represented by "Sigma of Ref ID variances" (0.46) and "Sigma of Req ID variances" (0.51) and can be explained by the different numbers of IDs (Table 3).
  3. The global score means within one case (Table 3) are equal to the means of the row and column means. (This is trivial.)
  4. The sum of the variance of the column / row means and the mean of the column / row variances approaches the global score variance (i.e. the squared global score sigma):

  5.  
    0.13 + 1.32 = 1.45 ≈ 1.43 (one sample, Ref IDs)
    0.12 + 1.33 = 1.45 ≈ 1.43 (one sample, Req IDs)
    0.63 + 1.05 = 1.68 ≈ 1.67 (one ID, Ref IDs)
    0.04 + 1.65 = 1.69 ≈ 1.67 (one ID, Req samples)
         
  6. The variance of the means is small, while the mean of the variances is close to the global variance in both cases. (In theory, if all rows or columns followed the same score distribution, the variance of the means should approach zero and the mean of the variances should approach the global variance for a sufficiently large number of scores.)
  7. In the "one ID" case, there is an extreme difference between the Ref ID results and the Req sample results. In particular, the variance of the Ref ID means is unusually high (0.63, compared to 0.04 for the Req sample means).
  8. In the "one ID" case, the mean of the row standard deviations (1.29, estimated as the square root of the mean of the Req sample variances) is significantly higher than the mean of the column standard deviations (1.03, estimated as the square root of the mean of the Ref ID variances).
  9. In the "one ID" case, the variance of the Req sample variances (0.11) is considerably smaller than the variance of the Ref ID variances (0.40).
  10. Suppose a test had been made with one request sample per ID (the first row in Table 2) and 67 (= 68 - 1) reference IDs. Then the measured mean value would have been 1.22 (compared to 1.18 ± 0.20) and the measured standard deviation 1.14 (compared to 1.29).
  11. Suppose a test had been made with only one request ID (ID 10010 in Table 2) and one reference ID (the second column in Table 2, Ref ID 10013) but 100 request samples. Then the measured mean value would have been 1.77 (compared to 1.18 ± 0.80) and the measured standard deviation 1.68 (compared to 1.03).
  12. A further observation is that the measurement error (standard deviation) of the first test scenario is significantly smaller than that of the second one (0.20 versus 0.80 for the mean, 0.33 versus 0.63 for the variance):

  13.  
    Table 4: Results for different test scenarios

    Calculation method                    Mean of    Sigma of      Variance of  Sigma of
                                          impostor   measurement   impostor     measurement
                                          score      error         score        error
    Global average: 6700 samples          1.18       -             1.67         -
    ID average: 67 IDs, 100 samples       1.18       0.20          1.65         0.33
    Test scenario 1: 67 IDs, 1 sample     1.22       (0.20)        1.30         (0.33)
    Sample average: 67 IDs, 100 samples   1.18       0.80          1.05         0.63
    Test scenario 2: 1 ID, 100 samples    1.77       (0.80)        2.83         (0.63)
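
The variance sums listed in the observations follow the general decomposition "total variance = variance of the group means + mean of the group variances". The sketch below checks this on hypothetical toy data (three groups of four scores, not taken from the tables). It deliberately uses population variances (division by n) and equally sized groups, because only then does the identity hold exactly. With sample variances (division by n - 1, assumed here for the Excel function) or rows of unequal length, the sum is only approximate, which may account for differences such as 1.45 versus 1.43.

```python
from statistics import mean, pvariance

# Hypothetical toy data: 3 groups (e.g. reference IDs) with 4 scores each.
groups = [
    [0, 1, 0, 3],
    [2, 2, 4, 0],
    [1, 0, 0, 1],
]

all_scores = [s for g in groups for s in g]
group_means = [mean(g) for g in groups]

# Population variances (divide by n): the decomposition is exact
# as long as every group holds the same number of scores.
between = pvariance(group_means)            # variance of the group means
within = mean(pvariance(g) for g in groups)  # mean of the group variances
total = pvariance(all_scores)               # global variance

print(f"between={between:.4f} within={within:.4f} "
      f"sum={between + within:.4f} total={total:.4f}")
```
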
From these observations we may conclude:
  • There are three kinds of impostor distributions which are not identical: one ID to one ID, one ID to many IDs, and many IDs to many IDs (1, 6, 7, 8, 11).
  • Request and reference prints seem to behave identically in this trial (2).
  • The one-to-many impostor distribution (mean variance = 1.32) seems to be narrower than the many-to-many distribution (global variance = 1.43).
  • The one-to-one impostor distribution (mean variance = 1.05) seems to be narrower than the one-to-many distribution (variance = 1.67).

Conclusions

  • Although the number of IDs is smaller than the number of prints per ID, a test based on 67 IDs and 1 sample delivers more accurate results than a test based on 1 ID and 100 samples. For the planning of tests, this means that it is more advisable to have a large number of participants than a large number of samples per finger (although the latter is easier to achieve).
  • Due to the strong personal influences, each ID must be represented by the same number of samples when calculating global characteristics. Alternatively, the mean value of personal characteristics may be taken, provided it delivers the desired result. (Example: The average over all Ref ID means delivers an unbiased global mean value. This is not true for the average over all Ref ID variances which does not estimate the correct global variance. There should be no problem when calculating the impostor distribution or the FAR from the personal impostor distributions or the personal FARs by averaging.)
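
The effect of unequal sample counts per ID can be illustrated with a small sketch. The scores, the ID labels "A" and "B", the function names and the decision rule "accept if score >= threshold" are all assumptions made for this example and are not taken from the measurements. With equal counts, the average of the personal FARs equals the FAR of the pooled scores; with unequal counts, the pooled FAR is pulled towards the ID with more samples.

```python
from statistics import mean

def far(scores, threshold):
    """Fraction of impostor scores at or above the acceptance threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

def mean_personal_far(per_id, threshold):
    """Average of the per-ID FAR values (every ID weighted equally)."""
    return mean(far(s, threshold) for s in per_id.values())

def pooled_far(per_id, threshold):
    """FAR over all scores pooled together (every score weighted equally)."""
    pooled = [s for scores in per_id.values() for s in scores]
    return far(pooled, threshold)

# Hypothetical impostor scores per ID.
balanced = {"A": [0, 1, 0, 5], "B": [2, 4, 0, 5]}
unbalanced = {"A": [0, 1, 0, 5], "B": [2, 4, 0, 5, 4, 3, 4, 6]}

THRESHOLD = 4  # hypothetical acceptance threshold
for name, data in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(name, mean_personal_far(data, THRESHOLD), pooled_far(data, THRESHOLD))
```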

Comments

All results are based on mean values and standard deviations, which represent the position and the width of the score distribution, respectively. (If the distribution type were known a priori, the distribution could possibly be determined completely by its mean and standard deviation.) To obtain a low FAR, the mean of the impostor score distribution should be as low as possible and, simultaneously, its standard deviation should be as small as possible. (This is a necessary condition, but it is not sufficient unless the distribution function has a known simple form. In particular, the tails of the distribution, which are most important for small FAR and FRR values, may show remarkable deviations.)
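
How the FAR would follow from mean and standard deviation alone can be sketched under an explicitly assumed distribution type. The example below assumes a normal distribution purely for illustration; the measured scores are non-negative integers, so this model is certainly not exact, and as noted above the tails may deviate strongly. The function name `modelled_far` and the thresholds are hypothetical; the mean and sigma are the global "one sample" values from Table 3.

```python
from statistics import NormalDist

# Global "one sample" impostor statistics from Table 3.
MEAN, SIGMA = 0.92, 1.19

def modelled_far(threshold, mu=MEAN, sigma=SIGMA):
    """FAR predicted by an assumed normal model: P(score >= threshold)."""
    return 1.0 - NormalDist(mu, sigma).cdf(threshold)

# For a threshold above the mean, both a lower mean and a smaller sigma
# reduce the modelled FAR.
print(modelled_far(5))
print(modelled_far(5, mu=0.5))
print(modelled_far(5, sigma=1.0))
```

Such a model is only useful for seeing the direction of the effect; an actual FAR estimate should be taken from the measured score distribution.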