Biometric Failure Rates

Intra-Characteristic Consideration

Manfred Bromba - Bromba GmbH

First issue: 2009-04-20 - Status: 2011-06-11
This document summarizes the elementary derivation of biometric failure rates with different degrees of generalization. It aims to provide the basics for biometric testing procedures, with a view to the comparability that standardization requires. This is done under the assumption that the basic relationship between two different characteristics is deterministic and may change from characteristic to characteristic.

Contents: Introduction; The Simple Case; Introducing Failure to Acquire; Generalizing for Failure to Enrol; Estimating Biometric Failure Rates

Introduction

This paper summarizes the derivation of the basic biometric failure rates using the framework of elementary probability theory. It treats the most general cases in order to cover the reality in which biometric performance presents itself as time-dependent data that differ not only between biometric systems under test but also between biometric individuals.
The basic biometric terms are found in the [BioFAQ]. As shown in Fig. 1, the main components of a biometric recognition system necessary to process biometric characteristics are the capture device, the feature extraction, the comparison and decision block and the enrolment database.
[Fig. 1: Typical biometric recognition system. Block diagram with the elements: biometric characteristic, biometric capture device, biometric sample, biometric feature extraction, biometric features, comparison & decision, biometric enrolment database, biometric templates.]
For example, in the case of face recognition, the capture device is a camera. It records the biometric characteristic and delivers a biometric sample. Often, the biometric sample is generated outside a PC while all other blocks may be implemented as software on the PC.
The feature extraction mainly provides the two functions "quality control" to support the success of the comparison & decision unit and the separation of the sample data from all information which is not suitable for recognition.
If the features could be successfully extracted from the sample data, they can be compared with previously stored reference features data resp. template data. During the comparison, the similarity between the features from a biometric sample and the features stored in the database is determined. If the similarity "score" exceeds a pre-adjusted threshold, the corresponding feature data are said to "Match". Otherwise, we have a "Non-Match". This is a typical two-valued decision. (According to ISO/IEC we use "match" not as a process but as a result.)
The process with the goal of comparing data to get a decision is called "recognition". The process with the goal to store the reference data ("enrolment") is similar to the recognition process except that the feature data are not compared but stored in the biometric enrolment database. Without enrolment no recognition is possible.
Since most biometric systems are dedicated to distinguishing between authorized and non-authorized people, the concept of Genuines and Impostors has been introduced. A Genuine is a user who is enrolled in the biometric system with the intent to be recognized on demand. An Impostor is a user who is not enrolled in the system and is thus intended to be refused by the system. Attempts by Impostors to get recognized nonetheless are usually called attacks. Good biometric recognition systems are designed to clearly separate Genuines from Impostors. However, due to the stochastical nature of biometric characteristics and their measurement, the limited separability of biometric features, and the imperfect realization, no biometric recognition system will be perfect. Several types of failures can be defined for a typical "1:1" comparison process:
Failure to Acquire: This failure occurs if the feature extraction (including all preceding operations) was not successful during a recognition attempt. Reasons may be inability to capture, insufficient sample quality (e.g., too noisy sample data), or insufficient number of features (e.g., too few minutiae). The probability of a Failure to Acquire event is called Failure to Acquire Rate (FTA). FTA can be adjusted by increasing or decreasing quality thresholds. Generally, a high quality threshold need not correspond to a better over-all recognition performance!
Failure to Enrol: This failure is very similar to Failure to Acquire and is defined as the inability to store a new reference template. The main reason is a failing feature extraction. Often this is the only reason for Failure to Enrol. The probability of a Failure to Enrol event is called Failure to Enrol Rate (FTE or FER). If enrolment and recognition use the same building blocks, one may use different quality thresholds for enrolment and recognition. Usually, a higher threshold is chosen for enrolment since this increases performance during all subsequent recognition attempts. As a consequence, FTE is often larger than FTA.
False Non-Match: This failure happens in the comparison & decision stage, if an enrolled subject ("Genuine") is falsely not recognized because similarity does not exceed the given decision threshold. The corresponding False Non-Match Rate (FNMR) strongly depends on the decision threshold (we always assume two-valued decisions): The higher the similarity threshold, the higher the FNMR.
False Match: This failure happens in the comparison & decision stage, if a user who is not enrolled ("Impostor") is falsely recognized. The corresponding False Match Rate (FMR) strongly depends on the decision threshold (we always assume two-valued decisions): The lower the similarity threshold, the higher the FMR. As a result, when the decision threshold is changed, FMR and FNMR change in opposite directions: lowering one raises the other.
False Rejection: This failure corresponds to the rejection of a Genuine user without reference to its rejection reason. Often, the biometric recognition system does not reveal the kind of failure which leads to a rejection. However, besides False Acceptance, False Rejection is the only failure type which is always observable. If the internal failure rates such as FNMR and FTA are observable, the probability for a False Rejection, called False Rejection Rate FRR throughout this document, can be calculated from the internal failure rates. The formula will, however, depend on the implementation of the system.
False Acceptance: This failure type describes the event that an Impostor will be falsely recognized as Genuine. As for the False Rejection, this failure is always observable and may be calculated from the internal failure rates if these were known. The probability for a False Acceptance is called False Acceptance Rate FAR throughout this document.
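The threshold dependence of FNMR and FMR described above can be illustrated with a minimal Python sketch. The similarity scores below are invented for illustration only, and the function name error_rates is a hypothetical helper, not part of any biometric library. A comparison is counted as a Match only if its score exceeds the threshold, as defined in the text.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (FNMR, FMR) as relative frequencies at a given decision threshold.

    A comparison counts as a Match only if its score exceeds the threshold.
    """
    fnmr = sum(s <= threshold for s in genuine_scores) / len(genuine_scores)
    fmr = sum(s > threshold for s in impostor_scores) / len(impostor_scores)
    return fnmr, fmr


# Hypothetical similarity scores, invented for illustration only.
genuine = [0.91, 0.84, 0.77, 0.65, 0.58, 0.95, 0.72, 0.88]
impostor = [0.12, 0.35, 0.41, 0.22, 0.60, 0.18, 0.30, 0.27]

# Raising the threshold can only raise FNMR and lower FMR (or leave them equal).
for t in (0.3, 0.5, 0.7):
    fnmr, fmr = error_rates(genuine, impostor, t)
    print(f"threshold={t:.1f}  FNMR={fnmr:.3f}  FMR={fmr:.3f}")
```

With these invented scores, a higher threshold trades false matches for false non-matches, which is the opposite movement of the two rates noted above.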
As already announced, the following treatment of the failure rates is based on probabilities, not on measurements or repeated statistical trials. The problem of measuring unknown probabilities is a matter of statistics and will be treated in an extra chapter.
First, we define a sample space Ω which is the set of all possible outcomes of a (random) experiment. In our case the experiment is one pass of the enrolment or recognition process. It makes sense in the general case to consider enrolment and recognition as one process. Events are defined as subsets of Ω. The set of all events or subsets of Ω is a σ-algebra [Wikipedia] and is written Σ here.
In our case, typical outcomes are Failure to Acquire, Failure to Enrol, Match, and Non-Match, each for Impostors and for Genuines. We denote these outcomes as
Outcome of a recognition process      Symbol   Corresponding event
Failure to Acquire for a Genuine      qg       {qg}
Failure to Enrol for a Genuine        eg       {eg}
Non-Match for a Genuine               ng       {ng}
Match for a Genuine                   mg       {mg}
Failure to Acquire for an Impostor    qi       {qi}
Failure to Enrol for an Impostor      ei       {ei}
Non-Match for an Impostor             ni       {ni}
Match for an Impostor                 mi       {mi}
The simplest way to obtain Ω is to combine all those outcomes of the system realization under consideration which completely describe a recognition process while being mutually exclusive. We always regard outcomes as measurable. Internal results which are not measurable by the tester are excluded. This may lead to different definitions, such that the measured failure rates become incomparable! For example, if we only consider the comparison & decision stage, the set of outcomes will be Ω = {ng, mg, ni, mi}. If we add quality rejections as measurable output, we get Ω = {qg, ng, mg, qi, ni, mi}. In that situation, the FMR resp. the FNMR cannot be used to compare the quality of the comparison stage, since in the second case the stage is fed with different ("filtered") data.
While outcomes are elements of Ω, events are defined as sets, respectively, as subsets of Ω. Σ, the set of all subsets of Ω, which is a power set [Wikipedia], is then given, e.g., by
Σ = {Ø, {qg}, {ng}, {mg}, {qi}, {ni}, {mi}, ..., Ω}
(1)
where Ø denotes the empty set which refers to the impossible event. If an event only contains one outcome as element, it is called elementary event.  Important combined events are typically defined by
E ≡ {Failure to Enrol} = {eg, ei}
Q ≡ {Failure to Acquire} = {qg, qi}
N ≡ {Non-Match} = {ng, ni}
M ≡ {Match} = {mg, mi}
G ≡ {Genuine} = {qg, ng, mg, eg}
I ≡ {Impostor} = {qi, ni, mi, ei}
(2)
Further events are Rejection and Acceptance, e.g., in systems with quality rejection:
R ≡ {Rejection} ≡ Ω\M := QUN = {qg, ng, qi, ni}
A ≡ {Acceptance} ≡ M = {mg, mi}
(3)
or more generally, if we include Enrolment,
R ≡ {Generalized Rejection} ≡ Ω\M := EUQUN = {eg, qg, ng, ei, qi, ni}
A ≡ {Acceptance} ≡ M = {mg, mi}
(4)
Rejections and Acceptances are those events which are visible to the users, while most other failure events are more or less internal in nature and usually only visible to system experts.
To each set, a probability measure P in the probability space (Ω, Σ, P) [Wikipedia] can be assigned, with P(Ω) = 1 and P(Ø) = 0.
Here we list a few important formulae from probability theory [Wikipedia] that we will use in the following discussion. Let X and Y be arbitrary events in Σ. Then
P(XUY) = P(X) + P(Y) - P(XY)
(5)
P(X|Y) P(Y) = P(XY)
(6)
P(X|Y) P(Y) = P(Y|X) P(X)
(7)
X and Y are stochastically independent <=> P(XY) = P(X) P(Y)
(8)
X and Y are disjoint <=> P(XY) = 0
(9)
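As a rough sketch, formulae (5) to (9) can be checked on a small finite probability space in Python. The elementary probabilities below are invented for illustration (they are not measurements), and the helper names P and P_cond are hypothetical. Events are modelled as Python sets, so `|` and `&` in the code denote set union and intersection, not conditioning.

```python
from fractions import Fraction as F

# Hypothetical elementary probabilities for the sample space with quality
# rejection, Ω = {qg, ng, mg, qi, ni, mi}; the values are invented and sum to 1.
p = {"qg": F(4, 100), "ng": F(6, 100), "mg": F(70, 100),
     "qi": F(1, 100), "ni": F(18, 100), "mi": F(1, 100)}
OMEGA = frozenset(p)


def P(event):
    """Probability of an event, i.e. of a subset of Ω."""
    return sum((p[o] for o in event), F(0))


def P_cond(x, y):
    """Conditional probability P(X|Y) according to formula (6)."""
    return P(x & y) / P(y)


Q = frozenset({"qg", "qi"})
N = frozenset({"ng", "ni"})
M = frozenset({"mg", "mi"})
G = frozenset({"qg", "ng", "mg"})
I = frozenset({"qi", "ni", "mi"})
R = OMEGA - M  # Rejection in a system with quality rejection, as in (3)

assert P(OMEGA) == 1 and P(frozenset()) == 0
assert P(Q | G) == P(Q) + P(G) - P(Q & G)           # (5)
assert P_cond(M, G) * P(G) == P_cond(G, M) * P(M)   # (7)
assert P(Q & G) == P(Q) * P(G)                      # (8): holds for these chosen values
assert P(N & M) == 0                                # (9): N and M are disjoint
print("P(R|G) =", P_cond(R, G))
```

Exact fractions are used so that the identities can be tested with `==` rather than with a floating-point tolerance.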

The Simple Case

In the presence of noisy data, decision results may differ from trial to trial. For two-valued error-prone decisions we generally get four different possibilities as outcomes of a recognition trial, defined in the following decision matrix:
ng            ni             N (Non-Match)
mg            mi             M (Match)
G (Genuine)   I (Impostor)   Ω
The simple model is characterized by
E ≡ Ø
Q ≡ Ø
N ≡ {ng, ni}
M ≡ {mg, mi}
G ≡ {ng, mg}
I ≡ {ni, mi}
Ω = GUI = NUM
R ≡ Ω\M = N
A ≡ M
P(Ω) = 1 = P(GUI) = P(G) + P(I) = P(NUM) = P(N) + P(M)
(10)
because the events G and I as well as N and M are disjoint (GI=Ø, NM=Ø). Obviously the decision outcomes mg and ni are correct while mi and ng represent failures. In biometrics the following assignments are common:
Combination   Denotation          Also known as    Error type      Rate   Probability
ng            False Non-Match     False Negative   Type II error   FNMR   P(N|G)
mg            Correct Match       True Positive    -               CMR    P(M|G)
ni            Correct Non-Match   True Negative    -               CNMR   P(N|I)
mi            False Match         False Positive   Type I error    FMR    P(M|I)
The failure rates are defined as conditional probabilities, with the a priori knowledge that the event comes, e.g., from a Genuine:
P(N|G) = P(NG)/P(G)
P(N|G) = P({ng}) / P({ng, mg})
P(N|G) = P({ng}) / (P({ng}) + P({mg}))
(11)
For Impostors, simply replace G / g by I / i. The last equation uses the fact that {ng} and {mg} are disjoint sets in Σ. The statistical interpretation is quite simple: FNMR can be viewed here as the ratio between failing Genuine recognition attempts and all Genuine recognition attempts when performing sufficiently many attempts.
Rate Conditional Probability Result
FNMR P(N|G) = P(Score does not exceed threshold (Non-Match) | Genuine) = P({ng}) / (P({ng}) + P({mg}))
CMR P(M|G) = P(Score exceeds threshold (Match) | Genuine) = P({mg}) / (P({ng}) + P({mg}))
CNMR P(N|I) = P(Score does not exceed threshold (Non-Match) | Impostor) = P({ni}) / (P({ni}) + P({mi}))
FMR P(M|I) = P(Score exceeds threshold (Match) | Impostor) = P({mi}) / (P({ni}) + P({mi}))
The rates represent the fraction of corresponding decisions such that
CMR + FNMR = 1
CNMR + FMR = 1
(12)
The False Reject Rate FRR and the False Accept Rate FAR can then generally be defined as
FRR ≡ P(R|G)
FAR ≡ P(A|I)
(13)
For the simple case this means:
FRR = P(Ω\M|G) = P(N|G) = FNMR
FAR = P(M|I) = FMR
(14)
Another general formula from probability theory is the Bayes formula with arbitrary sets X, Y being sets in Σ:
P(X|Y)P(Y) = P(Y|X)P(X)
(15)
If X=G and Y=I or X=M and Y=N, then Ω is completely covered by the disjoint events X and Y and we get the decompositions
P(G) = P(G|M)P(M) + P(G|N)P(N)
P(M) = P(M|G)P(G) + P(M|I)P(I)
(16)
Example: Suppose that 20% of the users of a biometric system are Impostors. Then 80% are Genuines and we get

P(G) = 0.8
P(I) = 0.2

If the False Match Rate is 0.001 and the False Non-Match Rate 0.1, i.e.

FMR ≡ P(M|I) = 0.001
FNMR ≡ P(N|G) = 0.1

then
CMR ≡ P(M|G) = 1 − P(N|G) = 0.9
CNMR ≡ P(N|I) = 1 − P(M|I) = 0.999
P(MG) = P(G)P(M|G) = 0.72
P(NG) = P(G)P(N|G) = 0.08
P(MI) = P(I)P(M|I) = 0.0002
P(NI) = P(I)P(N|I) = 0.1998
P(M) = P(M|G)P(G) + P(M|I)P(I) = 0.7202
P(N) = P(N|G)P(G) + P(N|I)P(I) = 0.2798
P(G|M) = P(M|G)P(G)/P(M) = 0.999722...
P(I|M) = P(M|I)P(I)/P(M) = 0.000277700...
P(G|N) = P(N|G)P(G)/P(N) = 0.2859185...
P(I|N) = P(N|I)P(I)/P(N) = 0.7140814...
The meaning of these values is quite simple. Of all users trying the biometric system, P(I) = 20% are known to be Impostors and P(G) = 80% to be Genuines. P(M) ~ 72% of the users are matched and P(N) ~ 28% are rejected.
Of the users known to be Genuines, P(N|G) = FNMR = 10% are falsely rejected and P(M|G) = CMR = 90% are correctly matched. Of the users known to be Impostors, P(M|I) = FMR = 0.1% are falsely matched and P(N|I) = CNMR = 99.9% are correctly rejected.
Of the users who are not matched, P(G|N) ~ 28.6% are Genuines and P(I|N) ~ 71.4% are Impostors. Of the users who are matched, P(G|M) ~ 99.97% are Genuines and P(I|M) ~ 0.028% are Impostors.
Note that the significant difference between, e.g., P(G|M) ~ 99.97% and P(M|G) = 90% is not a fundamental one but a normalization effect: in the first case P(MG) = 72% is normalized by P(M) ~ 72%, in the second case by P(G) = 80%!
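The numbers of this example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in (12), (15) and (16); the variable names are chosen here for readability and do not come from the original text.

```python
# Prior probabilities and failure rates taken from the example above.
p_g, p_i = 0.8, 0.2        # P(G), P(I)
fmr, fnmr = 0.001, 0.1     # P(M|I), P(N|G)

cmr = 1 - fnmr             # P(M|G), from (12)
cnmr = 1 - fmr             # P(N|I), from (12)

# Total probabilities of Match and Non-Match, decomposition (16).
p_m = cmr * p_g + fmr * p_i
p_n = fnmr * p_g + cnmr * p_i

# Posterior probabilities via the Bayes formula (15).
posteriors = {
    "P(G|M)": cmr * p_g / p_m,
    "P(I|M)": fmr * p_i / p_m,
    "P(G|N)": fnmr * p_g / p_n,
    "P(I|N)": cnmr * p_i / p_n,
}

print(f"P(M) = {p_m:.4f}, P(N) = {p_n:.4f}")
for name, value in posteriors.items():
    print(f"{name} = {value:.7f}")
```

Changing p_g and p_i while keeping fmr and fnmr fixed shows how strongly the posteriors depend on the assumed share of Impostors, whereas the failure rates themselves do not.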

Introducing Failure to Acquire

The situation becomes somewhat more complicated when we introduce additional outcomes related to quality rejections, which affect both Genuines and Impostors although the quality control normally does not distinguish between them. In this case the capture device or the feature extraction refuses to deliver biometric samples resp. features when a biometric characteristic is presented. That is, of all users who try to get compared, only a certain percentage is actually compared. The remaining users produce the event "Failure to Acquire". For simplicity we call the new event "Quality Rejection" or Q. Two new outcomes have to be included, qg (Failure to Acquire for a Genuine) and qi (Failure to Acquire for an Impostor):
qg            qi             Q (Failure to Acquire)
ng            ni             N (Non-Match)
mg            mi             M (Match)
G (Genuine)   I (Impostor)   Ω
This model is characterized by
E ≡ Ø
Q ≡ {qg, qi}
N ≡ {ng, ni}
M ≡ {mg, mi}
G ≡ {qg, ng, mg}
I ≡ {qi, ni, mi}
Ω = GUI = QUNUM
R ≡ Ω\M = QUN
A ≡ M
P(Ω) = 1 = P(GUI ) = P(G) + P(I) = P(QUNUM) = P(Q) + P(N) + P(M)
(17)
since the events G and I as well as Q, N, and M are mutually disjoint.
To keep the common definition of the Match or Non-Match rates, we have to adjust our probabilities. We now consider Matches or Non-Matches for users who are known to be Genuines or Impostors which have not been rejected by Quality control. Fig. 1 helps us to identify input and output at each stage by using the knowledge of the result of the previous stage as condition. Then we get
Input Output Remainder Denotation Rate Probability Result
G Q G\Q Failure to Acquire FTA P(Q|G) = P({qg}) / (P({qg}) + P({ng}) + P({mg}))
G\Q N G\(QUN) False Non-Match FNMR P(N|G\Q) = P({ng}) / (P({ng}) + P({mg}))
G\Q M GM Correct Match CMR P(M|G\Q) = P({mg}) / (P({ng}) + P({mg}))
           
I Q I\Q Failure to Acquire FTA P(Q|I) = P({qi}) / (P({qi}) + P({ni}) + P({mi}))
I\Q N I\(QUN) Correct Non-Match CNMR P(N|I\Q) = P({ni}) / (P({ni}) + P({mi}))
I\Q M IM False Match FMR P(M|I\Q) = P({mi}) / (P({ni}) + P({mi}))
If Q = Ø, the old relations remain valid. The formulae for the failure rates after feature extraction remain the same in any case, i.e., the quality control does not directly influence the failure rate definitions of the subsequent comparison & decision stage.
With the assumption of stochastical independence between Q and G as well as between Q and I we obtain
P(Q|G) = P(QG)/P(G) = P(Q) = P(Q|I)
(18)
using elementary probability theory. This justifies setting both terms P(Q|G) and P(Q|I) equal to FTA, as already done in the table above. Using the decomposition formula, eq. (17) can be extended to
P(Q|G) + P(N|G) + P(M|G) = 1
P(Q|I) + P(N|I) + P(M|I) = 1
(19)
After some calculation efforts, we get for the False Reject Rate FRR in this special case
FRR ≡ P(R|G) = P(Ω\M|G) = P(QUN|G)
FRR = FTA + P(N|G)
(20)
When trying to find a better expression for FNMR = P(N|G\Q) we get:
P(N|G\Q) = P(N|G(NUM)) = P(N|G) / (1 − FTA)
P(N|G) = (1 − FTA) FNMR
(21)
Inserting this in expression (20), the final result is
FRR = FTA + (1 − FTA) FNMR
(22)
The False Accept Rate FAR is given by
FAR ≡ P(A|I) = P(M|I)
(23)
For FMR = P(M|I\Q) we find
FMR = P(M|I\Q) = P(M|I(NUM)) = P(M|I)/(1 − P(Q|I)) = FAR / (1 − FTA)
(24)
and finally
FAR = (1 − FTA) FMR
(25)
If FTA = 0, the False Reject Rate FRR equals FNMR and the False Accept Rate FAR equals FMR. The only assumption we needed was independence between Q and G or I.
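The results (22) and (25) can be cross-checked against the definitions FRR = P(R|G) and FAR = P(M|I) with a small Python sketch. All numerical values are hypothetical, and the elementary probabilities are constructed so that Q is independent of G and I, which is the assumption used in the derivation. The function names are illustrative only.

```python
from fractions import Fraction as F


def frr_with_fta(fta, fnmr):
    """False Reject Rate including quality rejections, eq. (22)."""
    return fta + (1 - fta) * fnmr


def far_with_fta(fta, fmr):
    """False Accept Rate including quality rejections, eq. (25)."""
    return (1 - fta) * fmr


# Hypothetical rates and prior, as exact fractions.
fta, fnmr, fmr, p_g = F(5, 100), F(1, 10), F(1, 1000), F(8, 10)
p_i = 1 - p_g

# Elementary probabilities built so that Q is independent of G and I.
p = {
    "qg": p_g * fta,
    "ng": p_g * (1 - fta) * fnmr,
    "mg": p_g * (1 - fta) * (1 - fnmr),
    "qi": p_i * fta,
    "ni": p_i * (1 - fta) * (1 - fmr),
    "mi": p_i * (1 - fta) * fmr,
}

# Direct evaluation of P(QUN|G) and P(M|I) from the elementary outcomes.
frr_direct = (p["qg"] + p["ng"]) / (p["qg"] + p["ng"] + p["mg"])
far_direct = p["mi"] / (p["qi"] + p["ni"] + p["mi"])

assert frr_direct == frr_with_fta(fta, fnmr)
assert far_direct == far_with_fta(fta, fmr)
print("FRR =", frr_direct, " FAR =", far_direct)
```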

Generalizing for Failure to Enrol

We now generalize the failure rate definitions by including the enrolment as a cause for failures and assume the following procedures with fixed order for Genuines and Impostors:
Step Genuine Action
G1 An enrolment trial for the Genuine is performed. If enrolment fails, this refers to the elementary outcome eg.
G2 If enrolment was successful, a recognition trial starts, beginning with a quality check. If the quality check fails, the outcome qg is caused.
G3 If the quality check was successful for the Genuine, a comparison with the previously enrolled biometric template is done. The result of the comparison with subsequent decision will either be mg (matched) or ng (nonmatched).
Step Impostor Action
I1 An enrolment trial for a Genuine is performed (without enrolled Genuine no Impostor result!). If enrolment fails, this refers to the outcome ei for the Impostor.
I2 If enrolment of the Genuine was successful, a recognition trial for the Impostor starts, beginning with a quality check of the biometric sample. If the quality check fails, the outcome qi for the Impostor is caused.
I3 If the quality check was successful for the Impostor, a comparison with the previously enrolled biometric template of the Genuine is done. The result of the comparison with subsequent decision will either be mi (matched) or ni (nonmatched).
The set Ω can be visualized by the following table
eg            ei             E (Failure to Enrol)
qg            qi             Q (Failure to Acquire)
ng            ni             N (Non-Match)
mg            mi             M (Match)
G (Genuine)   I (Impostor)   Ω
The main properties of this model are
E ≡ {eg, ei}
Q ≡ {qg, qi}
N ≡ {ng, ni}
M ≡ {mg, mi}
G ≡ {eg, qg, ng, mg}
I ≡ {ei, qi, ni, mi}
Ω = GUI = EUQUNUM
R ≡ Ω\M = EUQUN
A ≡ M
P(Ω) = 1 = P(GUI ) = P(G) + P(I) = P(EUQUNUM) = P(E) + P(Q) + P(N) + P(M)
(26)
since the events G and I as well as E, Q, N, and M are mutually disjoint.
First we observe that all events E, Q, N, and M are represented as disjoint subsets of Ω which completely fill Ω. As a result, these events cannot be independent unless their probability is zero. Generally, the larger E and Q, the smaller N and M, etc., when measured by P. The recognition procedure described above will mainly determine the formulae for the failure rates. Further instructions have to be fixed to get comparable results. For example, if exactly the same biometric sample has been used for enrolment and comparison, the match rate will differ significantly. (It can be assumed that for the same sample and the same feature extraction procedure for enrolment and recognition, the event M will arise with much higher probability than for different samples of the same biometric characteristic!)
Following the procedure described above, we define the (internal) failure rates for both the Genuines and the Impostors by assuming knowledge of the input and of the result of the previous process step to get the conditional probabilities:
Input Output Remainder Denotation Rate Probability Result
G E G\E Failure to Enrol FTE P(E|G) =P({eg}) / (P({eg})+P({qg})+P({ng})+P({mg}))
G\E Q G\(EUQ) Failure to Acquire FTA P(Q|G\E) =P({qg}) / (P({qg})+P({ng})+P({mg}))
G\(EUQ) N G\(EUQUN) False Non-Match FNMR P(N|G\(EUQ)) =P({ng}) / (P({ng})+P({mg}))
G\(EUQ) M GM Correct Match CMR P(M|G\(EUQ)) =P({mg}) / (P({ng})+P({mg}))
           
I E I\E Failure to Enrol FTE P(E|I) =P({ei}) / (P({ei})+P({qi})+P({ni})+P({mi}))
I\E Q I\(EUQ) Failure to Acquire FTA P(Q|I\E) =P({qi}) / (P({qi})+P({ni})+P({mi}))
I\(EUQ) N I\(EUQUN) Correct Non-Match CNMR P(N|I\(EUQ)) =P({ni}) / (P({ni})+P({mi}))
I\(EUQ) M IM False Match FMR P(M|I\(EUQ)) =P({mi}) / (P({ni})+P({mi}))
Similar to Q we assume independence of E from G or I:
P(EG) = P(E)P(G)
P(EI) = P(E)P(I)
(27)
As a result
P(E|G) = P(E|I)
(28)
This time, the Rejection event R additionally includes E. The False Reject Rate FRR and the False Accept Rate FAR are defined as in (13), with R and A taken from (26). After some calculation we obtain for the case of included Failure to Enrol
FRR = P(Ω\M|G) = (P({eg}) + P({qg}) + P({ng})) / (P({eg}) + P({qg}) + P({ng}) + P({mg}))
FRR = FTE + (1 - FTE) FTA + (1 - FTE)(1 - FTA) FNMR
(29)
where
FTE = P(E|G) = P(E|I)
FTA = P(Q|G\E) = P(Q|I\E)
FNMR = P(N|G\(EUQ))
(30)
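The calculation behind (29) can be sketched as follows. It only uses the definition of conditional probability (6), the fact that E, Q and N are mutually disjoint, and the rate definitions in (30). The set union written as U in the text appears as \cup here.

```latex
\begin{align*}
\mathrm{FRR} &= P(E \cup Q \cup N \mid G)
              = P(E \mid G) + P(Q \mid G) + P(N \mid G) \\
P(Q \mid G) &= P(Q \mid G \setminus E)\, P(G \setminus E \mid G)
             = \mathrm{FTA}\,(1 - \mathrm{FTE}) \\
P(G \setminus (E \cup Q) \mid G) &= 1 - \mathrm{FTE} - (1 - \mathrm{FTE})\,\mathrm{FTA}
             = (1 - \mathrm{FTE})(1 - \mathrm{FTA}) \\
P(N \mid G) &= P(N \mid G \setminus (E \cup Q))\, P(G \setminus (E \cup Q) \mid G)
             = (1 - \mathrm{FTE})(1 - \mathrm{FTA})\,\mathrm{FNMR} \\
\mathrm{FRR} &= \mathrm{FTE} + (1 - \mathrm{FTE})\,\mathrm{FTA}
              + (1 - \mathrm{FTE})(1 - \mathrm{FTA})\,\mathrm{FNMR}
\end{align*}
```

The same steps with I in place of G and M in place of N give the expression for FAR.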
FAR is given by
FAR ≡ P(M|I) = P({mi}) / (P({ei}) + P({qi}) + P({ni}) + P({mi}))
FAR = (1 − FTE) (1 − FTA) FMR
(31)
with
FTE = P(E|G) = P(E|I)
FTA = P(Q|G\E) = P(Q|I\E)
FMR = P(M|I\(EUQ))
(32)
False Accept Rates and False Reject Rates defined this way are also called Generalized FAR and Generalized FRR [BioFAQ, ISO/IEC 19795-1]. The advantage of this definition is that it incorporates FTE and FTA in an attempt to make measurements more comparable. This is also important because FTE and FTA may have an indirect influence on FMR and FNMR. For example, keeping away Impostor attacks with low-quality characteristics may decrease FMR. On the other hand, a good quality control will usually also decrease FNMR. Nevertheless, too much quality control may increase FAR and FRR unnecessarily.
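As a minimal sketch, the Generalized FRR (29) and Generalized FAR (31) can be written as two small Python functions. The function names and the numerical rates are hypothetical and serve only to illustrate the formulae.

```python
def generalized_frr(fte, fta, fnmr):
    """Generalized False Reject Rate according to eq. (29)."""
    return fte + (1 - fte) * fta + (1 - fte) * (1 - fta) * fnmr


def generalized_far(fte, fta, fmr):
    """Generalized False Accept Rate according to eq. (31)."""
    return (1 - fte) * (1 - fta) * fmr


# Hypothetical internal failure rates, invented for illustration.
fte, fta, fnmr, fmr = 0.02, 0.05, 0.10, 0.001

frr = generalized_frr(fte, fta, fnmr)
far = generalized_far(fte, fta, fmr)
print(f"Generalized FRR = {frr:.5f}")
print(f"Generalized FAR = {far:.7f}")

# With FTE = FTA = 0 the generalized rates reduce to FNMR and FMR, as in (14).
assert generalized_frr(0.0, 0.0, fnmr) == fnmr
assert generalized_far(0.0, 0.0, fmr) == fmr
```

Setting only fte to zero reproduces (22) and (25), so the three models of this paper are special cases of the same two functions.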
Due to elementary constraints on the subsets of Ω the following compilation shows a few elementary relationships (CAR ≡ 1 − FRR):
P(E|G) + P(Q|G) + P(N|G) + P(M|G) = 1 = FTE + (1 − FTE) FTA + (1 − FTE) (1 − FTA) FNMR + CAR
P(E|I) + P(Q|I) + P(N|I) + P(M|I) = 1 = FTE + (1 − FTE) FTA + (1 − FTE) (1 − FTA) CNMR + FAR
P(Q|G\E) + P(N|G\E) + P(M|G\E) = 1 = FTA + (1 − FTA) FNMR + (1 − FTA) CMR
P(Q|I\E) + P(N|I\E) + P(M|I\E) = 1 = FTA + (1 − FTA) CNMR + (1 − FTA) FMR
P(N|G\(EUQ)) + P(M|G\(EUQ)) = 1 = FNMR + CMR
P(N|I\(EUQ)) + P(M|I\(EUQ)) = 1 = CNMR + FMR
(33)
In each equation, the terms in the same position on the left-hand and right-hand side are mathematically equivalent. The third and fourth equations result from the first and second ones by removing the FTE term and dividing by (1 − FTE). In the same way, the last two equations follow from the third and fourth ones by removing the FTA term and dividing by (1 − FTA).
Modification for Scenario Testing: Sometimes, a further definition is used [ISO/IEC 19795-1]. In scenario tests, false matches are tried with templates which are enrolled for distinct users. That is, Genuine users also act as Impostors. However, users with a Failure to Enrol will not further participate in the test, neither as Genuine (trivial) nor as Impostor. This reduces the number of potential Impostors by a factor 1 − FTE without affecting FMR. That is, the FAR is too high and must be corrected. To see how to implement this, we consider this effect by adding an additional step I1a to the procedure for Impostors.
Step Genuine Action
G1 An enrolment trial is performed. If enrolment fails, this refers to the elementary outcome eg.
G2 If enrolment was successful, a recognition trial starts, beginning with a quality check. If the quality check fails, the elementary outcome qg is caused.
G3 If the quality check was successful, a comparison with the previously enrolled biometric template is done. The result of the comparison with subsequent decision will either be mg (matched) or ng (nonmatched).
Step Impostor Action
I1 An enrolment trial for the genuine is performed. If enrolment fails, this refers to the elementary outcome ei for the Impostor.
I1a New: An enrolment trial for the Impostor is performed. A failing enrolment defines an elementary outcome fi for the Impostor.
I2 If enrolment of the genuine was successful, a recognition trial for the Impostor starts, beginning with a quality check of the biometric sample. If the quality check fails, the elementary outcome qi for the Impostor is caused.
I3 If the quality check was successful for the Impostor, a comparison with the previously enrolled biometric template of the genuine is done. The result of the comparison with subsequent decision will either be mi (matched) or ni (nonmatched).
The set of outcomes, Ω, is now extended such that
eg            ei             E (Failure to Enrol)
              fi             F (Failure to Enrol)
qg            qi             Q (Failure to Acquire)
ng            ni             N (Non-Match)
mg            mi             M (Match)
G (Genuine)   I (Impostor)   Ω
It reflects that for Genuines one enrolment is sufficient, while for Impostor trials one enrolment for the Genuine is required and one enrolment for the Impostor is anticipated (although not really required).
The main properties of this model are
E ≡ {eg, ei}
F ≡ {fi}
Q ≡ {qg, qi}
N ≡ {ng, ni}
M ≡ {mg, mi}
G ≡ {eg, qg, ng, mg}
I ≡ {ei, fi, qi, ni, mi}
Ω = GUI = EUFUQUNUM
R ≡ Ω\M = EUFUQUN
A ≡ M
P(Ω) = 1 = P(GUI ) = P(G) + P(I) = P(EUFUQUNUM) = P(E) + P(F) + P(Q) + P(N) + P(M)
(34)
since the events G and I as well as E, F, Q, N, and M are mutually disjoint.
Step Input Output Remainder Denotation Rate Probability
G1 G E G\E Failure to Enrol FTE P(E|G)
G2 G\E Q G\(EUQ) Failure to Acquire FTA P(Q|G\E)
G3 G\(EUQ) N G\(EUQUN) False Non-Match FNMR P(N|G\(EUQ))
G3 G\(EUQ) M GM Correct Match CMR P(M|G\(EUQ))
             
I1 I E I\E Failure to Enrol 1 FTE1 P(E|I)
I1a I\E F I\(EUF) Failure to Enrol 2 FTE2 P(F|I\E)
I2 I\(EUF) Q I\(EUFUQ) Failure to Acquire FTA P(Q|I\(EUF))
I3 I\(EUFUQ) N I\(EUFUQUN) Correct Non-Match CNMR P(N|I\(EUFUQ))
I3 I\(EUFUQ) M IM False Match FMR P(M|I\(EUFUQ))

We assume that E, F, and Q are independent of G and I. Then
FAR = (1 − FTE1) (1 − FTE2) (1 − FTA) FMR
(35)
If the enrolment system for Genuines and Impostors is the same, we have FTE1 = FTE2 = FTE:
FAR = (1 − FTE)² (1 − FTA) FMR
(36)
FRR is not affected:
FRR = FTE + (1 − FTE) FTA + (1 − FTE)(1 − FTA) FNMR
(37)
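The effect of the additional enrolment step on FAR can be illustrated with a short Python sketch of (35) and (36). The function name scenario_far and the numerical rates are hypothetical.

```python
def scenario_far(fte1, fte2, fta, fmr):
    """FAR for scenario testing according to eq. (35).

    fte1: Failure to Enrol rate of the Genuine whose template is attacked.
    fte2: Failure to Enrol rate of the user acting as Impostor.
    """
    return (1 - fte1) * (1 - fte2) * (1 - fta) * fmr


# Hypothetical internal failure rates, invented for illustration.
fte, fta, fmr = 0.02, 0.05, 0.001

far_general = (1 - fte) * (1 - fta) * fmr          # Generalized FAR, eq. (31)
far_scenario = scenario_far(fte, fte, fta, fmr)    # same enrolment system, eq. (36)

# The two values differ by the factor (1 - FTE), up to floating-point rounding.
print(f"FAR (31) = {far_general:.7f}")
print(f"FAR (36) = {far_scenario:.7f}")
print(f"ratio    = {far_scenario / far_general:.4f}")
```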

Estimating Biometric Failure Rates from Experiments

When trying to measure the probabilities defined so far, a series of, say K, experiments has to be accomplished and the outcomes have to be observed. In our case an experiment is equal to a recognition attempt or trial, including enrolment. Usually, the more experiments are performed, the better the probabilities can be estimated. However, to get the best possible approximations, this requires that all experiments are executed under exactly the same conditions and requires that the stochastical behavior of the biometric characteristics and of the recognition system do not change during all experiments. Furthermore, all experiments should be stochastically independent. In the following we discuss what has to be observed to achieve this goal or what the consequences are, if the assumptions do not hold.
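Under the assumptions just stated (identical conditions, no time variation, stochastically independent experiments), the probabilities can be estimated by relative frequencies of the observed outcomes. The following Python sketch does this for a series of Genuine trials. The outcome counts are invented, the function name is hypothetical, and the sketch does not handle the degenerate case in which a denominator is zero.

```python
from collections import Counter


def estimate_genuine_rates(outcomes):
    """Estimate FTE, FTA, FNMR and FRR from observed Genuine outcomes.

    outcomes: iterable of the symbols "eg", "qg", "ng", "mg", one per experiment.
    Each rate is the relative frequency of a failure among the trials that
    reached the corresponding process step.
    """
    c = Counter(outcomes)
    total = c["eg"] + c["qg"] + c["ng"] + c["mg"]
    enrolled = total - c["eg"]       # trials that passed enrolment
    compared = enrolled - c["qg"]    # trials that passed the quality check
    return {
        "FTE": c["eg"] / total,
        "FTA": c["qg"] / enrolled,
        "FNMR": c["ng"] / compared,
        "FRR": (total - c["mg"]) / total,
    }


# Hypothetical outcome log of K = 1000 Genuine experiments (invented counts).
log = ["eg"] * 20 + ["qg"] * 49 + ["ng"] * 93 + ["mg"] * 838
est = estimate_genuine_rates(log)
for name, value in est.items():
    print(f"{name} = {value:.4f}")

# The estimates satisfy eq. (29) up to floating-point rounding.
frr_from_parts = (est["FTE"] + (1 - est["FTE"]) * est["FTA"]
                  + (1 - est["FTE"]) * (1 - est["FTA"]) * est["FNMR"])
assert abs(frr_from_parts - est["FRR"]) < 1e-12
```

The same counting scheme applies to Impostor trials with the symbols ei, qi, ni and mi, where the relative frequency of mi among the compared trials estimates FMR.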

Time varying probabilities

Biometric characteristics are time dependent. Even in the absence of random effects, the similarity of two different biometric characteristics will change over time. There may be short-term as well as long-term effects such as personal constitution / temperature or growth / aging, respectively. This makes our probabilities time dependent. Measuring such probabilities requires executing the experiments within a time interval that is small enough that these changes do not introduce additional errors.

Double time dependence

If we are dealing with time-dependent biometric characteristics, we have to note that enrolment and recognition would have to be performed at the same time. This is not very realistic, as enrolment is in practice done only once and is intended to last for years, if possible. As a result, our probabilities get a second time variable: one for the time instant of enrolment and one for the time instant of recognition.

Stochastical dependence introduced by user behavior

User behavior is a significant source of failures, especially with regard to FRR. If user behavior changes, this adds to time dependence. Furthermore, if the user is able to react to the outcome of a previous experiment (learning effect), this will introduce stochastical dependencies if the same user is involved in a series of experiments.

Stochastical dependence introduced by system variations

Also the technical equipment may add to dependencies between experiments. For example, sensor contamination during preceding experiments may deteriorate the system performance for subsequent users. This may require cleaning after each experiment to keep stochastical independence.

Individual similarities

Generally, in a 1:1 comparison, the result of an experiment has to do with the similarity between two biometric characteristics, i.e., the reference stored during enrolment and the actual sample. If the similarity is defined as a continuous measure, its amount will obviously be different for different pairs of biometric characteristics even in the absence of any random effects. That is, failures are not exclusively introduced by random effects. Also deterministic properties (deterministic means to deliver always the same result (outcome) in a series of experiments) such as low decision thresholds, bad algorithms for feature extraction or comparison, and pairs of too similar characteristics may cause (deterministic) errors. As a result, the failure measures defined so far only have a meaning for the same pair of biometric characteristics unless we extend the definition, e.g., by considering the biometric characteristics as "pseudo-random".
This effect is known from statistical differences between inter-characteristic and intra-characteristic measurements. In [Observations on Genuine Scores, Observations on Impostor Scores] it has been shown that the statistical variance of similarity scores is significantly smaller for comparison series with different samples of the same Genuine finger pair than with different Genuine finger pairs. The reason is quite simple. While for intra-characteristic measurements mostly random effects determine the variance, in the inter-characteristic case the "pseudo-random" differences in similarity score have to be added. To understand these differences we assume random effects to be zero. Then only the similarity score of a pair of characteristics is essential. Two cases are to be considered:
  • Depending on the comparison algorithm, two sample pairs of different Genuine characteristics may each consistently deliver its own similarity score, different from that of the other pair, e.g., when sample pairs from different fingers have a different number of features. (Note: In purely metric systems and in the absence of stochastical effects and other errors, the distance between samples of the same biometric characteristic should always be zero. As a result, there is no difference between different sample pairs in this case.)
  • Due to a different degree of dissimilarity, two different pairs of Impostor characteristics may consistently deliver different similarity scores. This is due to the fact that any two distinct biometric characteristics have a non-zero similarity.
  • Statistical deficiencies introduced by one-time enrolment

    In a series of experiments, it cannot be expected that performing only one enrolment during the first experiment and then re-using the enrolment data for all further experiments will deliver the same statistical results as enrolling each time. It is easily seen that the statistical enrolment deviations are frozen by one-time enrolment and that the estimation will not improve as the number of experiments increases. That is, a non-optimal but accepted enrolment will keep the recognition failures higher than necessary, independent of the number of experiments. (One way to escape this situation is to perform only one experiment per specific pair of characteristics and to consider instead statistics over many different pairs of characteristics.)
    For a more detailed discussion let us assume that the experiment is performed K times using the generalized model (Generalizing for Failure to Enrol) which includes enrolment failures. We distinguish two cases:
    Case 1 Each experiment is a stochastically independent repetition of the first experiment.
    Case 2 The first experiment includes enrolment, all subsequent experiments take over the enrolment result of the first step.
    The stochastic model of a K-fold experiment can be generally described by the sample space
    Ω := Ω1 x Ω2 x Ω3 x ... x ΩK,
    (38)
    where the sample spaces Ωn with n = 1, 2, ..., K contain all outcomes of an individual experiment. Ω contains all outcomes of the series of K experiments. Finally, let P be the probability measure for subsets of Ω and Pn the corresponding measures for the Ωn. We take over the definitions of the general model (26), but without distinguishing between Genuines and Impostors (we assume stochastic independence and time invariance; to get the corresponding failure rates, only two series of experiments then have to be performed, one for Genuines and one for Impostors),
    Outcome   Event   Meaning
    en        En      Failure to Enrol
    qn        Qn      Failure to Acquire
    nn        Nn      Non-Match
    mn        Mn      Match
    Together, the four outcomes form the sample space Ωn
    such that
    Ωn := {en, qn, nn, mn}
    En := {en}
    Qn := {qn}
    Nn := {nn}
    Mn := {mn}
    Ω := {all combinations ω = (a1, a2, a3, ..., aK) with Elements an of Ωn, n = 1,2, ..., K }
    (39)
    Ωn contains 4 outcomes as elements, thus Ω will contain 4^K outcomes. After K trials exactly one of the 4^K possibilities will be realized, e.g., ω = (e1, m2, e3, n4, n5, n6, e7, ..., mK). The outcomes of a multiple experiment can be visualized as a decision tree [Wikipedia]. The following formulae are valid in the general case with A = {(a1, a2, a3, ..., aK)} being an elementary event as subset of Ω and An = {an} being any elementary event as subset of Ωn
    P(A) = P1(A1) P2(A2|A1) P3(A3|A1∩A2) ... PK(AK|A1∩A2∩...∩AK-1)
    (40)
    If B is an arbitrary event as subset of Ω , we get
    $$P(B) = \sum_{A \subseteq B} P(A)$$
    (41)
    where the sum runs over all elementary events A that are subsets of B.
    When trying to estimate failure rates, we are interested in counting the individual outcomes en, qn, nn, mn during K trials. As already noted, there are 4^K different composite outcomes ω after K experiments. To count the number of similar outcomes in such a vector, we define a set of four random variables
    C ≡ (Ce, Cq, Cn, Cm)
    (42)
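    $$C_e: \Omega \to \mathbb{R}, \quad \omega \mapsto C_e(\omega)$$
    $$C_q: \Omega \to \mathbb{R}, \quad \omega \mapsto C_q(\omega)$$
    $$C_n: \Omega \to \mathbb{R}, \quad \omega \mapsto C_n(\omega)$$
    $$C_m: \Omega \to \mathbb{R}, \quad \omega \mapsto C_m(\omega)$$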
    (43)
    The components of C count the number of occurrences of the single outcomes e, q, n, and m, respectively, in a specific composite outcome ω. If, for example, the result of K = 10 experiments is ω = (m1, m2, e3, n4, m5, e6, q7, m8, n9, m10), then Ce(ω) = 2, Cq(ω) = 1, Cn(ω) = 2, and Cm(ω) = 5, or C(ω) = (2,1,2,5).
    Generally, we have for all ω in Ω
    Ce(ω) + Cq(ω) + Cn(ω) + Cm(ω) = K
    (44)
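As a minimal sketch of the counting variables in (42) to (44), the following Python snippet computes C(ω) for the K = 10 example above. The function name count_outcomes and the single-letter outcome labels (with the trial index dropped) are illustrative assumptions, not part of the original notation.

```python
from collections import Counter

def count_outcomes(omega):
    """Return C(omega) = (C_e, C_q, C_n, C_m) for a sequence of outcome labels.

    Each element of omega is assumed to be one of the labels "e", "q", "n", "m".
    """
    counts = Counter(omega)
    # Counter returns 0 for labels that never occur in omega.
    return tuple(counts[label] for label in "eqnm")

# The K = 10 example from the text, with the trial index dropped from each label.
omega = ["m", "m", "e", "n", "m", "e", "q", "m", "n", "m"]
c = count_outcomes(omega)
print(c)        # (2, 1, 2, 5)
print(sum(c))   # 10, as required by (44)
```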
    The probability distribution of C is given by
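    $$P(C=(k,l,m,n)) \equiv P(\{\omega \mid C_e(\omega)=k,\ C_q(\omega)=l,\ C_n(\omega)=m,\ C_m(\omega)=n\}) = \begin{cases} 0 & \text{if } k+l+m+n \neq K \\ p(k,l,m,n) & \text{if } k+l+m+n = K \end{cases}$$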
    (45)
    P(C=(k, l, m, n)) is quite useful when determining the statistical behavior of estimations to the biometric failure rates. In many practical cases, it can be calculated from the probabilities in Ωk , 1 ≤ k ≤ K, if the properties of the single experiments are known.
    Case 1: If all K experiments are independent, if additionally Ω1 = Ω2 = Ω3 = ... = ΩK, such that
    $$\Omega := \Omega_1^K := \{e, q, n, m\}^K = (E \cup Q \cup N \cup M)^K = \{\omega = (a_1, a_2, \dots, a_K) \mid a_j \in \{e, q, n, m\} \text{ for } j = 1, 2, \dots, K\}$$
    (46)
    and if P1 = P2 = ... = PK, the probability distribution P(C=(k,l,m,n)) can be shown to be a multinomial distribution [Wikipedia]:
    P(C=(k,l,m,n)) = Mul(K; k,l,m,n; pe,pq,pn,pm)
    (47)
    where pe, pq, pn, and pm are abbreviations for P1(E), P1(Q), P1(N), and P1(M), respectively.
    If we are only interested in one component of Ω1, e.g., e, then Ce can be shown to follow a binomial distribution [Wikipedia].
    $$P(C_e = k) \equiv P(\{\omega \mid C_e(\omega) = k\}) = \binom{K}{k}\, p_e^{k}\, (1 - p_e)^{K-k}$$
    (48)
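The step from the multinomial distribution (47) to the binomial distribution (48) can be checked numerically. The sketch below sums (47) over all (l, m, n) for a fixed k and compares the result with (48). The function names and the example probabilities are assumptions chosen only for illustration.

```python
from math import comb, factorial, isclose

def mul_pmf(K, counts, probs):
    """Multinomial probability Mul(K; counts; probs) as used in (47)."""
    if sum(counts) != K:
        return 0.0
    coeff = factorial(K)
    for c in counts:
        coeff //= factorial(c)      # the quotient stays an integer at every step
    prob = float(coeff)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob

def marginal_ce(K, k, probs):
    """P(C_e = k), obtained by summing (47) over all (l, m, n) with k + l + m + n = K."""
    total = 0.0
    for l in range(K - k + 1):
        for m in range(K - k - l + 1):
            n = K - k - l - m
            total += mul_pmf(K, (k, l, m, n), probs)
    return total

# Assumed example values for (pe, pq, pn, pm); they must sum to 1.
probs = (0.05, 0.10, 0.15, 0.70)
K = 6
for k in range(K + 1):
    binomial = comb(K, k) * probs[0] ** k * (1 - probs[0]) ** (K - k)
    assert isclose(marginal_ce(K, k, probs), binomial)
print("the marginal distribution of C_e matches the binomial distribution (48)")
```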
    The mean value and variance of Ce are given by
    E(Ce) = K pe
    Var(Ce) ≡ E(Ce²) − E(Ce)² = K pe(1 − pe)
    (49)
    Ce is important when trying to estimate FTE := pe by a series of K experiments. FTE can be approximated by $\widetilde{\mathrm{FTE}}_K$ such that
    $$\widetilde{\mathrm{FTE}}_K := \frac{C_e}{K}$$
    $$E(\widetilde{\mathrm{FTE}}_K) = p_e = P_n(E) = \mathrm{FTE}$$
    $$\mathrm{Var}(\widetilde{\mathrm{FTE}}_K) = \frac{p_e (1 - p_e)}{K} = \frac{\mathrm{FTE}\,(1 - \mathrm{FTE})}{K}$$
    (50)
    Obviously, $\widetilde{\mathrm{FTE}}_K$ is an unbiased estimation of pe = FTE. As K increases, the "estimation failure" measure $\mathrm{Var}(\widetilde{\mathrm{FTE}}_K)$ becomes smaller and smaller, as we would expect for a proper trial design.
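To give a feeling for the size of the variance in (50), the following sketch evaluates the standard deviation of the FTE estimate for several K and the number of trials needed to reach a given standard deviation. The value FTE = 0.02, the target of 0.003 and both function names are assumptions made for this illustration; the result only holds under the case 1 assumptions (independent, identically distributed trials).

```python
from math import ceil, sqrt

def fte_std_error(fte, K):
    """Standard deviation of the FTE estimate C_e / K according to (50)."""
    return sqrt(fte * (1.0 - fte) / K)

def trials_for_std_error(fte, target):
    """Smallest K for which the standard deviation from (50) is at most `target`."""
    return ceil(fte * (1.0 - fte) / target ** 2)

fte = 0.02  # assumed example value
for K in (100, 1000, 10000):
    print(K, fte_std_error(fte, K))
print(trials_for_std_error(fte, 0.003))
```

Because the standard deviation falls only with the square root of K, ten times the precision needs a hundred times as many trials.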
    Now we will investigate how enrolment failures influence FAR and FRR. To simplify notation, we assume separate experiments for Genuines and Impostors, so that it is sufficient to consider the Accept Rate AR and the Reject Rate RR as well as their estimations instead. RR is defined by
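    $$RR \equiv P_1(\Omega_1 \setminus M) = P_1(\bar{M}) = P_1(E) + P_1(Q) + P_1(N) = 1 - p_m = p_e + p_q + p_n$$
    (51)
    where $\bar{M}$ denotes the complement of M. We first estimate the probabilities by
    $$\tilde{p}_e := \frac{C_e}{K}, \qquad \tilde{p}_q := \frac{C_q}{K}, \qquad \tilde{p}_n := \frac{C_n}{K}, \qquad \tilde{p}_m := \frac{C_m}{K}$$
    (52)
    and then RR by
    $$\widetilde{RR}_K := 1 - \tilde{p}_m = 1 - \frac{C_m}{K}$$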
    (53)
    Since Cm is also distributed binomially, we get
    $$E(\tilde{p}_q) = p_q$$
    $$E(\tilde{p}_n) = p_n$$
    $$E(\tilde{p}_m) = p_m$$
    $$E(\widetilde{RR}_K) = P_1(\bar{M}) = 1 - P_1(M) = 1 - p_m = RR$$
    (54)
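    and with P1(E) + P1(Q) + P1(N) + P1(M) = 1
    $$\mathrm{Var}(\tilde{p}_q) = \frac{p_q (1 - p_q)}{K}$$
    $$\mathrm{Var}(\tilde{p}_n) = \frac{p_n (1 - p_n)}{K}$$
    $$\mathrm{Var}(\tilde{p}_m) = \frac{p_m (1 - p_m)}{K}$$
    $$\mathrm{Var}(\widetilde{RR}_K) = \frac{\mathrm{Var}(C_m)}{K^2} = \frac{p_m (1 - p_m)}{K} = \frac{(1 - RR)\, RR}{K} = \frac{(1 - p_e - p_q - p_n)(p_e + p_q + p_n)}{K}$$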
    (55)
    Similarly, the Accept Rate AR is defined by
    AR ≡ P1(M) = pm
    (56)
    and will be estimated by
    $$\widetilde{AR}_K := \frac{C_m}{K}$$
    (57)
    Then
    $$E(\widetilde{AR}_K) = P_1(M) = p_m = AR$$
    $$\mathrm{Var}(\widetilde{AR}_K) = \frac{P_1(M)\,(1 - P_1(M))}{K} = \frac{AR\,(1 - AR)}{K}$$
    (58)
    Obviously, the variances of $\widetilde{RR}_K$ and $\widetilde{AR}_K$ are equal and approach zero as K approaches infinity.
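The estimators (52), (53) and (57) can be sketched in a few lines of Python. The function name estimate_rates and the dictionary keys are hypothetical; the counts are those of the K = 10 example used for C(ω) above. Note that every outcome other than a match, including enrolment and acquisition failures, counts as a reject.

```python
def estimate_rates(counts):
    """Estimate pe, pq, pn, pm, RR and AR from C = (C_e, C_q, C_n, C_m)."""
    K = sum(counts)
    if K == 0:
        raise ValueError("at least one trial is required")
    pe, pq, pn, pm = (c / K for c in counts)
    return {"pe": pe, "pq": pq, "pn": pn, "pm": pm,
            "RR": 1.0 - pm,   # reject rate estimate, see (53)
            "AR": pm}         # accept rate estimate, see (57)

est = estimate_rates((2, 1, 2, 5))
print(est["RR"], est["AR"])   # 0.5 0.5
```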
    Case 2: We assume that only the first experiment performs an enrolment. All other trials take over the enrolment result of the first step such that En = E1 for 2 ≤ n ≤ K. This leads to
    P(E2|E1) = P(E3|E1∩E2) = P(E4|E1∩E2∩E3) = ... = 1
    (59)
    and
    P(E) = P1(E1) =: pe
    (60)
    where E = {(e1, e2, e3, ..., eK)}. Unlike case 1, in case 2 the first experiment is different from the following ones. So we cannot take the multinomial distribution to model the whole series of experiments. We start with the calculation of the enrolment failure probability pe = P1({e1}) = FTE, see (50). First we assume Ω1 = Ω2 = Ω3 = ... = ΩK, so that
    $$\Omega := \Omega_1^K := \{e, q, n, m\}^K = (E \cup Q \cup N \cup M)^K = \{\omega = (a_1, a_2, \dots, a_K) \mid a_j \in \{e, q, n, m\} \text{ for } j = 1, 2, \dots, K\}$$
    (61)
    That is, the outcome "enrolment failure" is possible in every experiment, not only in the first one; the enrolment result of the first experiment is simply carried over to all other experiments. The number k of occurrences of outcome e in ω is given by Ce and follows the simple distribution
    $$P(C_e = k) \equiv P(\{\omega \mid C_e(\omega) = k\}) = \begin{cases} 1 - p_e & \text{if } k = 0 \\ p_e & \text{if } k = K \\ 0 & \text{else} \end{cases}$$
    (62)
    The meaning is quite simple: If no enrolment failure occurs in the first experiment, it will occur in none of the K experiments. The probability for this case is 1 - pe. If an enrolment failure occurs in experiment 1, it will occur in all K experiments with total probability pe. Other enrolment failure counts than 0 and K do not exist and thus have probability 0.
    The mean value and variance of Ce are then given by
    $$E(C_e) = K p_e$$
    $$\mathrm{Var}(C_e) \equiv E(C_e^2) - E(C_e)^2 = K^2 p_e - K^2 p_e^2 = K^2 p_e (1 - p_e)$$
    (63)
    In a series of K experiments, FTE can be approximated by $\widetilde{\mathrm{FTE}}$ such that
    $$\widetilde{\mathrm{FTE}} := \frac{C_e}{K}$$
    $$E(\widetilde{\mathrm{FTE}}) = p_e = \mathrm{FTE}$$
    $$\mathrm{Var}(\widetilde{\mathrm{FTE}}) = p_e (1 - p_e) = \mathrm{FTE}\,(1 - \mathrm{FTE})$$
    (64)
    That is, $\widetilde{\mathrm{FTE}}$ is an unbiased estimation of FTE. However, as K increases, the "estimation failure" $\mathrm{Var}(\widetilde{\mathrm{FTE}})$ remains constant! Comparing this with (50), the FTE estimation in case 2 always has a larger variance after K experiments than in case 1 (by a factor of K), except for K = 1 where both scenarios coincide.
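The difference between (50) and (64) can be made visible with a small Monte Carlo simulation. This is only a sketch under the stated model assumptions: the values pe = 0.2, K = 50, the number of runs, the seed and all function names are chosen freely for illustration.

```python
import random
from statistics import pvariance

def fte_estimate_case1(K, pe, rng):
    """Case 1: enrol anew in every trial, so each trial fails independently."""
    failures = sum(1 for _ in range(K) if rng.random() < pe)
    return failures / K

def fte_estimate_case2(K, pe, rng):
    """Case 2: enrol once and reuse the result, so C_e is either 0 or K."""
    return 1.0 if rng.random() < pe else 0.0   # C_e / K, independent of K

def estimator_variance(estimator, K, pe, runs, seed):
    """Empirical variance of the FTE estimate over `runs` repeated series."""
    rng = random.Random(seed)
    return pvariance([estimator(K, pe, rng) for _ in range(runs)])

pe, K, runs = 0.2, 50, 4000   # assumed example values
v1 = estimator_variance(fte_estimate_case1, K, pe, runs, seed=1)
v2 = estimator_variance(fte_estimate_case2, K, pe, runs, seed=1)
print("case 1:", v1, "theory (50):", pe * (1 - pe) / K)
print("case 2:", v2, "theory (64):", pe * (1 - pe))
```

With these values the empirical variance in case 2 should come out roughly K = 50 times larger than in case 1.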
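    To investigate the behavior of the estimations of AR and RR, we have to calculate the distribution of all components of C, i.e., Ce, Cq, Cn, and Cm. It is easily shown that the composite distribution of C is given by
    $$P(C := (C_e, C_q, C_n, C_m) = (k, l, m, n)) = p_e\, \delta_{kK}\, \delta_{0lmn} + (1 - p_e)\, \delta_{k0}\, \mathrm{Mul}(K; l, m, n; p_{q|\bar{e}}, p_{n|\bar{e}}, p_{m|\bar{e}})$$
    (65)
    where $\delta$ is the Kronecker delta ($\delta_{0lmn}$ equals 1 if l = m = n = 0 and 0 otherwise), $\bar{e}$ denotes successful enrolment, i.e., the complement of e, and $\mathrm{Mul}(K; l, m, n; p_{q|\bar{e}}, p_{n|\bar{e}}, p_{m|\bar{e}})$ is the multinomial distribution [Wikipedia] for the conditional probabilities $P((C_q, C_n, C_m) = (l, m, n) \mid \bar{E})$:
    $$\mathrm{Mul}(K; l, m, n; p_{q|\bar{e}}, p_{n|\bar{e}}, p_{m|\bar{e}}) \equiv \frac{K!}{l!\, m!\, n!}\; p_{q|\bar{e}}^{\,l}\; p_{n|\bar{e}}^{\,m}\; p_{m|\bar{e}}^{\,n}$$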
    (66)
    If enrolment was successful, all experiments are independent with the same probabilities. In this case we have l+m+n=K because k=0. If k≠0, l+m+n=0 and thus the multinomial term in (65) vanishes. Using the probability distribution (65), all other probabilities can be calculated. As in case 1 we define:
    $$\tilde{p}_e := \frac{C_e}{K}, \qquad \tilde{p}_q := \frac{C_q}{K}, \qquad \tilde{p}_n := \frac{C_n}{K}, \qquad \tilde{p}_m := \frac{C_m}{K}$$
    $$\widetilde{RR}_K := 1 - \tilde{p}_m$$
    (67)
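    Then the expected values are (here $\bar{e}$ denotes successful enrolment, the complement of e)
    $$E(\tilde{p}_e) = p_e$$
    $$E(\tilde{p}_q) = (1 - p_e)\, p_{q|\bar{e}} = p_q$$
    $$E(\tilde{p}_n) = (1 - p_e)\, p_{n|\bar{e}} = p_n$$
    $$E(\tilde{p}_m) = (1 - p_e)\, p_{m|\bar{e}} = p_m$$
    $$E(\widetilde{RR}_K) = 1 - p_m$$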
    (68)
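    yielding unbiased estimations. More interesting are the variances:
    $$\mathrm{Var}(\tilde{p}_e) = p_e (1 - p_e)$$
    $$\mathrm{Var}(\tilde{p}_q) = \frac{p_q}{K} \left(1 - p_q\, \frac{1 - K p_e}{1 - p_e}\right)$$
    $$\mathrm{Var}(\tilde{p}_n) = \frac{p_n}{K} \left(1 - p_n\, \frac{1 - K p_e}{1 - p_e}\right)$$
    $$\mathrm{Var}(\tilde{p}_m) = \frac{p_m}{K} \left(1 - p_m\, \frac{1 - K p_e}{1 - p_e}\right)$$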
    (69)
    Especially, since $\widetilde{RR}_K = 1 - \tilde{p}_m$,
    $$\mathrm{Var}(\widetilde{RR}_K) = \frac{p_m}{K} \left(1 - p_m\, \frac{1 - K p_e}{1 - p_e}\right)$$
    (70)
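The expected values (68) and the variances (69) and (70) can be verified for small K by enumerating the distribution (65) exactly. The following sketch does this for the match count; the function names and the example probabilities are assumptions for illustration only. As in the text, the tuple (k, l, m, n) holds the counts of e, q, n and m, so the last entry is the number of matches.

```python
from math import factorial, isclose

def case2_distribution(K, pe, pq, pn, pm):
    """Return a list of ((k, l, m, n), probability) pairs following (65)."""
    if not 0.0 <= pe < 1.0:
        raise ValueError("pe must lie in [0, 1)")
    # Enrolment failed in the first trial, hence in all K trials.
    dist = [((K, 0, 0, 0), pe)]
    # Conditional probabilities given successful enrolment.
    cq, cn, cm = (p / (1.0 - pe) for p in (pq, pn, pm))
    for l in range(K + 1):
        for m in range(K - l + 1):
            n = K - l - m
            coeff = factorial(K) // (factorial(l) * factorial(m) * factorial(n))
            dist.append(((0, l, m, n), (1.0 - pe) * coeff * cq**l * cn**m * cm**n))
    return dist

def var_formula(p, pe, K):
    """Closed-form variance from (69) and (70)."""
    return p / K * (1.0 - p * (1.0 - K * pe) / (1.0 - pe))

# Assumed example values; pe + pq + pn + pm must equal 1.
K, pe, pq, pn, pm = 8, 0.10, 0.05, 0.15, 0.70
dist = case2_distribution(K, pe, pq, pn, pm)
total = sum(prob for _, prob in dist)
mean_pm = sum(c[3] / K * prob for c, prob in dist)
var_pm = sum((c[3] / K) ** 2 * prob for c, prob in dist) - mean_pm ** 2

assert isclose(total, 1.0)                        # (65) is a probability distribution
assert isclose(mean_pm, pm)                       # unbiased, see (68)
assert isclose(var_pm, var_formula(pm, pe, K))    # matches (69) and (70)
print(var_pm)
```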
    Note that pm can be replaced by 1 − pe − pq − pn. If FTE = pe = 0, the variances in case 2 reduce to those of case 1. The same holds if K = 1. However, if K → ∞, the variances do not approach zero as we would expect from a well-designed measurement:
    $$\lim_{K \to \infty} \mathrm{Var}(\widetilde{RR}_K) = \frac{p_m^2\, p_e}{1 - p_e}$$
    (71)
    As a consequence of (71), the estimation error of $\widetilde{RR}_K$ cannot be reduced below a certain limit, whatever the number K of trials is! Especially in systems with large enrolment failure rates, and this is a common case, the value of increasing K is quite limited.
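The following sketch tabulates the variance of the reject rate estimation for both cases and growing K, using (55) for case 1 and (70) for case 2, together with the limit (71). The values pe = 0.1 and pm = 0.7 and the function names are assumptions for illustration.

```python
def var_rr_case1(K, pm):
    """Variance of the RR estimate with enrolment in every trial, see (55)."""
    return pm * (1.0 - pm) / K

def var_rr_case2(K, pe, pm):
    """Variance of the RR estimate with one-time enrolment, see (70)."""
    return pm / K * (1.0 - pm * (1.0 - K * pe) / (1.0 - pe))

def var_rr_limit(pe, pm):
    """Limit of the case 2 variance for K -> infinity, see (71)."""
    return pm ** 2 * pe / (1.0 - pe)

pe, pm = 0.10, 0.70   # assumed example values
for K in (1, 10, 100, 1000, 10000):
    print(K, var_rr_case1(K, pm), var_rr_case2(K, pe, pm))
print("limit (71):", var_rr_limit(pe, pm))
```

For these values the case 2 variance settles near 0.054, which corresponds to a standard deviation of about 0.23, while the case 1 variance keeps shrinking in proportion to 1/K.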

    Revision history
    2009-09-08: typos corrected
    2009-10-06: typo corrected
    2009-11-13: typos corrected
    2010-01-12: typo corrected
    2010-02-12: typos corrected, eq. 24: right parenthesis added
    2010-12-23: index N in eq. (40) and the text before replaced by K
    2011-06-11: Simple case: "error-proned" replaced by "error-prone"
    2011-06-11: Individual similarities: "Depending on comparison algorithms...": "sample" introduced for clarification
    2011-06-11: Same place: note added
