Fast Biometric Identification Considering User Statistics

Manfred Bromba · Bromba GmbH
First release: 2008-11-17 • Actual status: 2011-02-03

Introduction

Unlike comparing the identity of simple numbers, comparing biometric data is a computationally intensive task. Furthermore, a comparison does not simply deliver the answer "yes" for equality and "no" for inequality. Instead, the result of a biometric comparison will initially be a similarity score since biometric data generally do not show perfect identity (when captured from the same biometric characteristic) or perfect distinctness (when captured from a different biometric characteristic). Especially with large biometric data bases where a great number of biometric references are stored, the former effect leads to large computation times in the case of identification while the latter effect is responsible for misidentifications. In contrast to verification where the ID of a reference is given, identification requires the comparison of all data base reference data sets with the incoming biometric sample data to deliver the ID of a matching reference.
Regarding misidentifications, usually three types of failure are considered [BioFAQ]:
I.
  An "identity" which is known to be included in the reference data base is not found, e.g.,  since the actual similarity score is below a given threshold. This error corresponds to the FRR (False Reject Rate).
II.
An "identity" is mistaken for another identity in the reference data base, e.g., since the similarity score for the latter is erroneously higher than for the true identity. This error corresponds to the FIR (False Identification Rate).
III.
An "identity" which is not included in the reference data base is erroneously assigned to a foreign identity included in the data base, e.g., since the similarity score exceeds a given threshold. This error corresponds to the FAR (False Accept Rate).
Usually, there is a tradeoff between FRR and FAR. If the decision threshold is increased, FRR increases and FAR decreases, and vice versa. This holds for simple systems. Because identification systems have to cope with the fact that more than one comparison score may exceed the threshold, the situation becomes more sophisticated and more than one realization becomes possible. This fact constitutes another tradeoff: failure rates versus identification time. This is because real-life identification systems seem to increase error rates if identification times decrease, and vice versa. For that reason, this paper focuses on methods to decrease identification time while preserving error rates at the lowest possible status. Binning (a kind of preclassification to speed up identification) is not discussed here. Finally, we assume a unique ID as result without manual interaction as this is common in AFIS data bases used by police.

Full Search

In full search operation mode of an identification system, all references in a data base are compared against the incoming sample data. If N is the number of references, N comparisons have to be performed, delivering N similarity scores. Without loss of generality, we assume the score throughout this paper to lie between 0 and 100. Usually, the score value is quantized, e.g., in unit steps between 0 and 100. If Tc is the comparison time, the identification time in full search mode Tfs will be N times Tc plus an initialization time Ti (e.g., caused by loading the references into the main memory):
Tfs = Ti + N Tc
(1)
The failure rates depend on the decision strategy. Since more than one score value may exceed a decision threshold, further processing is required to achieve a unique result. For example, if two score values exceed the threshold, one may consider this case as rejection with result "unknown". This would increase the FRR and reduce the FAR. Another decision policy is to decide for the highest-score ID exceeding the threshold and rejecting only when the highest ID is not unique. These examples do hardly influence identification time. As a result, Tfs is the worst case with respect to identification time (and probably the best case with respect to failure rates).

Fast Identification

In fast identification mode, that ID, say n, is chosen whose score in a process of subsequent comparisons first exceeds the threshold. After that the comparisons stop. Obviously, if identification starts with ID 1, proceeds with ID 2, and so on, the identification time in fast identification Tfi is given by
Tfi = Ti + n Tc
(2)
where n is an integer number between 1 and N. Since it is not known in advance how large n is (otherwise we would only need a verification), Tfi is a random variable, too. If we know nothing about the a priori probability of the occurrence of ID n, it is useful to assume a uniform distribution with all n having the same probability to match the incoming sample data. Under this assumption and the additional assumption that the effects of FRR and FAR are negligible, the expected value of Tfi, written E{Tfi}, is given by
E{Tfi} = Ti + ½ (N+1) Tc  ~ Ti + ½N Tc
(3)
For large N this results in a reduction of the identification time by a factor of 2 with respect to the full search, see eq. (1).
Now we have to discuss the effect of fast identification on the error rates. If more than one ID score exceeds the threshold, the first one may not be the best one! On the other hand, if the probability that more than one score exceeds the threshold is small, the error rates will hardly differ. This probability is approximately given by N times the verification FAR [BioFAQ]. Since it makes sense that this product is very small, we can expect a negligible effect of the fast identification mode on FAR.
A more thorough consideration has to include the effect of failure cases on identification time. For example, in an false reject case where no false accepts occur, a full search is done. That is, the higher the FRR, the longer the mean identification time in fast identification mode! Thus, a better approximation for large N than eq. (3) would be
E{Tfi} ~ (1 - FRR) (Ti + ½N Tc) + FRR Tfs = Ti + ½N (1 + FRR) Tc
(4)
For FRR = 0, the mean time approaches the old approximation eq. (3). For FRR = 1, there is no benefit with the fast identification mode. For a typical FRR of 0.1, the factor ½ becomes 0.55 which is still a good value!
Until now we have discussed the most unfavorable case, the uniform distribution of the probabilities pn(t) (n= 1...N) that a given ID n matches the incoming biometric sample. If this distribution was known a priori to be nonuniform, it would be possible to reorder the sequence of comparisons such that higher-probability IDs are compared first. This should further accelerate fast identification in the mean. The only thing we have to do is to reorder the IDs such that their probabilities decrease monotonously. Then the mean value is given by
E{Tfi} ~ (1 - FRR) (Ti + s(t)Tc}) + FRR Tfs
 (5)
where s(t) is the mean value of the (ordered) distribution with respect to all references (which may depend on time).
Example: The identification system is used for time & attendance control. Not all employees are coming at the same time, everyone has his own preferences which expresses in an individual periodical distribution of arrival times. This periodical distribution can even be statistically measured by the application itself (remark: this is a privacy issue!). To find the optimum order at a time, the measured distributions of all employees have to be compared. The lowest probability estimation at a certain time (of day, week, or month) is compared last, the highest one first. This principle can simply be extended to shift-workers. Besides time depending statistics, there are also event-based statistics. For example, if an employee has already checked in, it is unlikely that he will check-in within the next several hours. Thus, after check-in, the corresponding reference only requires a very low priority and is compared, if at all, only after all employees who still have to come.
If we want to calculate the benefit of ordered statistics, we have to calculate s(t). It is simply the sum over the product nqn(t) with index n running from 1 to N. qn(t) is the ordered distribution such that q1(t) is the largest value and qN(t) the smallest one. Each qn(t) uniquely represents one pm(t), where m is the ID and n the index for the comparison sequence.
Example: For simplicity, let N = 1000,
p500(t) = 0.4
p900(t) = 0.6
and all other pm(t) = 0
Before reordering, the mean value s(t) is given by 500 x 0.5 + 900 x 0.5 = 700. That is, in the mean, 700 instead of 1000 comparisons have to be performed for this situation. After reordering, we have 
q1(t) = p900(t) = 0.6
q2(t) = p500(t) = 0.4
and all other qn(t) = 0
Then the new mean value is given by 1 x 0.6 + 2 x 0.4 = 1.4 comparisons instead of 1000, a really formidable improvement by a factor of 700/1.4 = 500 in this case!
The last example shows that a considerable improvement can be expected if suitable statistical data are available. Such data can be collected and updated automatically by the application, giving a self-optimizing system with respect to recognition speed.

Two-stage identification

Most biometric characteristics allow the creation of scalable references. For high speed comparison, a subset of biometric features which may lead to higher failure rates can be used. This suggests a two-stage identification scheme:
C1:
Full search with a fast comparison algorithm C1 adjusted for high FAR and low FRR
C2:
Full search over all references which exceeded the score threshold in step C1 with a high-precision but slow comparison algorithm C2, adjusted for very low FAR at moderate FRR
An example for such a procedure is fingerprint identification with fast minutiae comparison and high-precision image correlation comparison. Minutiae as features allow for fast comparison algorithms, the image itself enables extremely low error rates at low speed.
Example: Let N = 10000 be the size of the reference data base. In the first step of the 2-stage procedure we expect an identification with a full search time of 1s. The threshold is adjusted for a verification FAR = 0.1 at FRR = 0.005. This yields approximately 1000 references which exceed the threshold, inlucing the correct one except for false reject cases which occur with probability 0.005. These remaining references which contain the searched ID except for false reject cases are now identified in the second step where we assume a full search time of 1s, too, but this time only over 1000 references. In the second step we expect a verification FAR = 0.00001 at FRR = 0.015. This may result in the following performance, compared with a full search of all N references with comparison algorithm C2 which needs 10s for 10k references (for the determination of the failure rates the ROC of the system is required [BioFAQ]):
Procedure total FAR total FRR Identification time
1-stage C1
~0.01
~0.20
1 s
1-stage C2
~0.01
~0.02
10 s
2-stage
~0.01
~0.02
~2 s
The total FAR or identification FAR is nearly given by N times the verification FAR [BioFAQ]. Although the table is fictive (but represents the actual state of the art) and does not take into account statistical dependencies, it indicates the trade-off between speed and failure rates. Statistical dependencies should slightly increase the total FAR in the 2-stage procedure.
For a better understanding we analyze what is happening in the two-stage case. Only the FAR of Comparator C1 has a considerable effect on identification times in this case.
Stage 1
Stage 2
Total
Comparator
C1
C2
C1 & C2
Comparison speed per reference
10000/s
1000/s
~5000/s
Verification FAR
0.1
0.00001
~0.000001
Identification FAR
~1
~0.01
~0.01
Number of input references
10000
~1000
10000
Number of output references
~1000
~1
~1
Identification time
1s
~1s
~2s
To determine the parameters used in the last example, the Receiver Operating Characteristic (ROC) [BioFAQ] is very helpful. The following example ROC stems from a fingerprint recognition system with the two comparators C1 and C2. It can be used to determine the error rates at different operating points for a 2-stage fingerprint identification mode.

Fast Two-Stage Identification

For the 2-stage procedure in the last paragraph it makes little sense to activate the fast identification mode in the first step since there are a large number of references which exceed the threshold. Here, modified procedures are required. One way is to apply fast identification only to the second step. This would decrease only a part of the total identification time. 
Another solution is partitioning. In the first step, only a first part of the full data base is compared with the incoming sample using comparator C1. The ID candidates found in this step are then processed by the second step C2. If an ID is found, the procedure stops. Otherwise, the second and further parts of the reference data base are explored using the combination C1 / C2 until the search was successfull or the whole reference data base has been explored.
The question to be answered is "What is the optimum size of the partitions". To find a solution, we restrict the following calculations to the parameters of the last example and neglect the effect of FRR (=0). M shall be the number of equal sized partitions and must be larger than or equal to 10 because this is the reduction rate from stage C1 to stage C2 in our example. For simplicity, we assume that N is chosen such that K = N/M, the size of a partition, is an integer. Then the full search time for one partition is given by
TP = TC1 + TC2 = Ti1 + K Tc1 + Ti2+ (K/10) Tc2
(6)
Next we neglect initialization time, i.e., Ti1 = Ti2 = 0. Then with Tc2 = 10 Tc1, we obtain
TP = 2K Tc1
(7)
Tfs = M TP = 2MK Tc1 = 2N Tc1
(8)
The full search time for a partitioned system is the same as for the unpartitioned 2-stage system in the example and does not depend on the number or size of partitions! If the fast identification mode is based on partitions, we obtain as identification time Tfi:
Tfi =  m TP
(9)
E{Tfi} = ½ (M+1) TP  = K(M+1) Tc1 = N(1+1/M) Tc1 < 1.11 N Tc1
(10)
The last inequality comes from the requirement M > 9, see above. Together with the natural upper limit M = N, we get
(N+1) Tc1 < E{Tfi} < 1.11 N Tc1
(11)
For large N this gives only a small dependence on M, provided the reference distributions are uniform. However, to take the best benefit from nonuniform distributions, a large number of partitions seems appropriate. Since in our example the upper limit for M is on tenth of N, it seems generally to be advisable to choose M as N divided by the reduction rate of the number of references from step C1 to step C2. In our example, this delivers:
E{Tfi} = 1.1 N Tc1
(12)
and shows only a small 10% loss to be added to the 2-fold loss in the double-step procedure, see eq. (3).

Document history

2009-04-07: In the second table of the third example, under "Comparison speed per reference", "~500/s" had to be replaced by "5000/s"
2011-02-03: invalid URN links replaced by permalinks