For the proper display of all symbols in this document the font "Arial Unicode MS" must be installed

Performance Measures for Biometric Recognition

Manfred U. A. Bromba

First release: 2010-08-31  Status: 2012-01-15
This paper describes the performance of biometric recognition of individuals with the aid of distance measures in metric spaces. The difference between intra- and inter-characteristic definitions is discussed as well as its influence on the definition of biometric failure rates. On the basis of these definitions it is shown with the help of numerous examples how recognition performance can be influenced by proper enrolment, by adding individuality information (fusion), and by multiple sampling during recognition.

1. Introduction
2. Biometric Features and Metric Spaces
2.1 Feature spaces
2.2 Distance measurements
3. Biometric Recognition in Metric Spaces
3.1 Recognition
3.2 Enrolment - Verification - Identification
4. Biometric Performance Measures
4.1 Definition of Match and Non-Match
4.2 Separability
4.3 Intra- Versus Inter-Characteristic Failure Rates
5. Performance Parameters
5.1 Enrolment strategy
5.2 Adding feature components
5.3 Multiple sampling during recognition
6. References
7. Symbols

1. Introduction

One of the main tasks of biometric recognition systems is the comparison of biometric features. If the distance, measured between two features, is small enough, the features are attributed to the same individual. This suggests to represent biometric features as (multidimensional) points in metric spaces and to use the well-established term metric as measure for the distance (or inversely: similarity) between two features.

This paper extends the elementary intra-characteristic derivation of biometric failure rates to the case where the pseudo-statistical properties of an ensemble of biometric features from different biometric subjects are considered. While the intra-characteristic consideration only treats the stochastical properties of single comparisons between sample pairs of the same or two different characteristics, the inter-characteristic analysis extends the definitions to all pairs of different characteristics within the ensemble. This greatly helps to understand the fact that different pairs of features show quite different properties in real life. The main targets are to prepare the basics for optimum biometric recognition procedures and the cost-efficient evaluation of their performance.

2. Biometric Features and Metric Spaces

Automated recognition using biometric characteristics is based on the availability of biometric features which fulfill the following requirements as far as possible [BioFAQ]:
 
BCR1. Uniqueness (distinctness between individuals)
BCR2. Measurability
BCR3. Universality
BCR4. Permanence

A typical biometric recognition system can be separated into four function blocks:

Biometric sample
Biometric feature
Biometric Characteristic
Capture Device
Modality Detection
Feature Extraction
Comparison & Decision
 
Biometric  enrolment database
Biometric templates
Fig. 2.1: Typical biometric recognition system

During capturing a sample of the biometric characteristic is recorded which must "contain" the biometric feature. The Modality Detection is responsible for the availability of the desired modality (finger, face, iris, etc.) and may refuse images which do not meet predefined quality and other requirements, such as liveness. During feature extraction those information is omitted as far as possible which does not contribute to the requirements of uniqueness, universality, and permanence. The extracted feature is the basis for comparison. It can be expressed mathematically as a multi-dimensional vector or at least as a list of property parameters.

The main task of feature extraction can alternatively be described as a method to reduce dimensionality. For example, many biometric systems start with capturing a 2D image of the biometric characteristic using a digital black & white camera. If the image sensor has N x M pixels, ideally 1 dimension per pixel, i.e., NM dimensions, can be considered, where each dimension typically may represent 256 gray values. If the image represents a fingerprint, the dimension will reduce, e.g., to the number of extracted minutia multiplied by the number of minutia properties. The number of minutia properties is typically 4: horizontal position, vertical position, direction, and type (ending or bifurcation).

An important measure to reduce dimensionality is alignment. Alignment can be expressed as a projection operator with the property that already aligned data will not change if alignment is repeated. For example, the most common 1-dimensional feature of human is height. In a 3-dimensional Euclidean space, an advantageous alignment is the projection of the feet to the coordinate (x1,y1,z1)=(0,0,0) and the top of the head to (x2,y2,z2)=(0,0,h) where h is the height of the person. In this case, the coordinate z bears all information required. This alignment can be attained in at least two ways: Either the person is asked to stand upright at a certain point or the coordinates of feet and head are measured and then projected to (0,0,h) by using two translations and one rotation. This way the number of relevant dimensions reduces from 6 to 1. This example also demonstrates that alignment is not only a matter of feature extraction but also of the capturing process. In cases where alignment is not completely possible, even the comparison function is affected. For example, while iris images can simply be aligned in horizontal and vertical directions, rotation cannot be compensated for, especially if only one eye is captured. In such cases, the comparison stage has to try all rotations until the "distance" between the features approaches a minimum.

Sometimes, the camera itself delivers unwanted projections. For example, faces are embedded in the 3 dimensional Euclidean space. If the image is taken by a 2D camera, certain rotations are difficult to compensate, if the face is not aligned mechanically in front of the camera. Using 3D cameras, such unintentional rotations can be simply compensated in the digital domain. Additionally, the features to be extracted are hardly affected by illumination in the 3D case [3DFaceProject].

For fingerprints, alignment in the third direction is done mechanically by pressing the finger on the 2D sensor surface. During feature extraction, alignment is more difficult. Horizontal and vertical alignment is possible, if the fingerprint pattern would show a common property for orientation. Such a property could be the core of the ridge pattern flow. Unfortunately, small sensors may not show the center at all and many fingerprints show more than 1 center. But even if a core is found, this only allows translational alignment. Rotations remain a problem which can be solved by finding the best orientation during comparison, e.g., by trial and error. This will, however, increase computation time significantly.

Beside their distinctive properties, most biometric characteristics also show common properties which can be used to detect their presence after capturing. Such properties are only distinctive between different biometric modalities, not between different characteristics of the same modality. Although these properties cannot be used for recognition of individuals they are advantageous especially for alignment. Examples are the ridge structure of fingerprint, the presence of eyes, nose and mouth in faces, the general outline of a hand with five fingers and the special shape of an eye with pupil, iris and surrounding. Ultimately, detection and recognition use the same mathematical procedures. In this context, we will understand Modality Detection as recognition based on known common properties with respect to different individuals of a biometric characteristic (which define the modality), whereas Feature Detection is based on the known distinct properties with respect to different individuals of the same biometric characteristic. If said common properties are not present, no feature processing need follow, and the recognition fails anyway.

Since property detection and property estimation are mathematically similar tasks, detection may be viewed as an add-on of estimation. (Usually, detection requires that some measure of the parameters exceeds a certain threshold.) Generally, it must be regarded that the functional blocks capturing, detection, extraction, and comparison cannot always be separated exactly.

2.1 Feature Spaces

We start with the set of subjects which shall be distinguished only by their biometric (or other) features. This set shall contain K ∈ ℕ subjects sn with 1 ≤ n ≤ K and is written
S ≔  {s1, s2, s3, ..., sK}
(2.1)
such that |S| = K. From this set S which defines the set of all possible subjects we can build the subset S of subjects which shall be known to the biometric system ("subjects under consideration"). For simplicity we assume that just the first N members of S are also the members of S such that |S| = N ≤ K:
S ≔  {s1, s2, s3, ..., sN}
(2.2)
Usually, the subjects sn are indistinguishable (except for the virtual indices n), but not the same. To make them distinguishable, we next consider the space A ("attributes") of measurable properties which can be attributed to a subject. More concretely, A shall not only contain all possible elementary properties but also all combinations of properties as ordered n-tuples [Wikipedia], usually created as Cartesian products [Wikipedia] of sets. A contains all possible properties, i.e., properties which are required to distinguish individuals, properties which define a biometric or non-biometric modality, and so on.

To be able to distinguish individuals, uniqueness is required. For this purpose we attribute certain properties anA to a subject snS such that an individual ⓘn is created:
n ≔ (sn, an) ∈ ("I Fraktur")
(2.3)
where
S x A
(2.4)
is a set of individuals. The trick is now that we claim all elements of A to be measurable. As a result, we are able to determine the "difference" between two individuals ⓘn and ⓘm by measuring the distance d between their attributes as d(an, am). Only if their attributes are different, subjects can be distinguished and become individuals. In this sense, uniqueness is only defined for individuals, not for attributes and means that there is no second individual with the same attributes. This requires, of course, that there are at least as many different attributes as subjects.

Here we understand an individual to be separable, not necessarily to be unique as this is defined, e.g., in [Wikipedia]. Separability is guaranteed by the subject's virtual index only. This index is not measurable. For that reason we need measurable attributes to distinguish individuals practically.

The pairing (sn, an) may be of different strength and of different origin. Generally, individuality may be awarded artificially (password, RFID, clothes) or may be a naturally fixed part of the subject's body (face, iris, fingerprints).

As already mentioned, the space A itself may be a Cartesian product of elementary attribute spaces. This allows us to describe an attribute an as vector with arbitrary many components, e.g.,
an = (color, height, name, history, ...)
(2.5)
helping us to create more combinations so that we have enough different attributes an to distinguish all individuals in

On the other hand, attributes which are common for a non-empty set of subjects, may be of interest. If is the set of all individuals, e.g., all living humans, then a population is a subset (P script) of which is united by certain common properties of its members. Such properties could be profession, gender, race, hobby, name, etc. Of special interest are populations which have a measurable influence on biometric recognition. Regarding non-idealities, special populations may show better or worse failure rates, such as the population of bricklayers which may have problems with fingerprint recognition. When determining failure rates for biometric recognition systems, the properties of a population are essential when comparing the rates between different applications.

Common properties can also be used to define biometric modalities like behavior type, face geometrics, fingerprint pattern, iris pattern, or any other property like knowledge (passwords), possession (smartcard, key, wear), or marks like scars and tattoos. Here the focus is on common properties: What is common to all faces or to all fingerprints or to all passwords? The question, "What distinguishes all password from all faces or from noise?"  can be used for modality detection to decide whether the desired biometric (or other) characteristic has been measured or not. For modality detection, the distinctive properties of a characteristic are irrelevant.

The distinctive properties required for recognition are collected as feature vector fn. As feature space F we define a set of those attributes fn which can be used for comparison with other members of F. What can be compared is only depending on the capabilities of the comparison stage in Figure 2.1. Consequently, the feature space F is a subspace of all attributes and their combinations A:
FA
(2.6)
To prevent loss of generality, we also allow raw signals, coming directly from the sensor, to be compared by an appropriate comparison stage and to define a generalized feature space. The raw or sample signals create the sample space Σ. However, using Σ as generalized feature space may not necessarily help vividness. The intention is to allow the comparison function to provide optimum processing in a mathematical sense, at least theoretically.

In practice, not enough a priori knowledge about the biometric characteristic and the inherent character of the noise which deteriorates the ideal biometric feature may be available to define the optimum algorithm or the problem is too complicated to result in a simple solution. For that reason, most practical systems follow the block diagram of Figure 2.1. If so, the feature space contains all biometric features which come from a biometric characteristic or are stored as biometric references, resp. templates. Each feature fn as an element of F may comprise subfeatures (more elementary properties) or feature components. Nevertheless, if we talk about features as plural, we don't mean the subfeatures of one feature vector but different elements of F. We note that F need not be a vector space. Generally,
|S| ≠ |F|
(2.7)
More precisely, we regard F as the space of all possible features within a specific comparison-oriented representation. That is, if the comparison block of Figure 2.1 cannot process a feature, the feature cannot be a member of F.

From F the subset F of all used  features may be separated. "Used" means, each feature is used by at least one subject from S.  In the ideal case, for each known subject snS there is exactly one feature fn and a bijective map ⓘn : snfn called individual. In this case
|S| = |F|
(2.8)
If features are noisy, F becomes a random variable with usually different realizations F(ω) for different ω. In this case we get
|S| ≥ |F(ω)|
(2.9)
F(ω) ⊂ F for every ω ∈ Ω
(2.10)
since different features may accidentally fall together. If individuals are not unique even in the absence of noise, we may even get
|S| > |F|
(2.11)
For example, purely genotypic features may be equal for monozygotic twins, making them indistinguishable.

The case |S| < |F| would mean that there are more used features than subjects. This will be excluded by the requirement that at most one feature from F is assigned to one subject and is equivalent to the definition of functions [Wikipedia].
Example 2.1: Human height embedded in three dimensions
Each individual is characterized by two points: the top of the head and the bottom of the feet. In the 3D Euclidean space this requires two points with three coordinates each, resulting in 6 numbers per individual. If we assume no quantization in the lengths, F comprises infinite many 6-tuples, which can be compared using a suitable comparison algorithm. Alternatively, an alignment can be performed by preprocessing as indicated in Figure 2.1. Then F will be the space of all human heights, which can be compared with each other by very simple methods (metrics).
Example 2.2: Human height, aligned
If the individuals are already aligned in 3D space, one individual can be characterized by a single number, the distance between the two 3D points given in Example 2.1. This can be done by direct measurement or by calculating the Euclidean distance between the two points given in Example 2.1. In this example, alignment is performed as part of the capturing process, while in the previous one, alignment is done after capturing, either by the comparison stage or by the preprocessing steps.
Example 2.3: Fingerprint images
If fingerprints are compared without preprocessing, the raw images form the feature space F. Its dimension is given by the number of pixels. The range of each pixel is typically given by the size of one Byte, e.g., 0...255. The number of elements is easily computed, e.g., as |F| = L x M x 255 where L and M are the number of pixels in horizontal and vertical direction. In comparison to other representations, this one is computationally more intensive. One reason for this fact is that the reference must also be stored as image. This may require numerous computation steps to be repeated for each new comparison.
Example 2.4: Fingerprint minutiae
Since it is well known that the minutiae patterns are the carrier of uniqueness, most systems only store the minutiae properties as reference. This requires preprocessing to extract the minutiae from the fingerprint image. Remaining is a list of a variable number of minutiae which are each described by their horizontal and vertical position, the minutia type (ridge bifurcation or ending), and the angle of the ridge structure at the location of a minutia. The dimension of F is 4K if K is the maximum number of minutiae of one feature. While position and angle are expressed as a quantized real number, the minutia type only requires 1 bit. In this case, the feature components need not have the same structure.
Example 2.5: Hair color
Though hair color of an individual may not be a very distinctive feature, it may nonetheless serve as a good example. This feature is by far not perfect. It may change over time, can simply be altered artificially, is not available for all individuals, may change locally etc. The feature space may be a set of hair colors such as F ≔ {brown, black, blond, auburn, red, white}. More technically, F may be defined as the space of all optical hair spectra or more practically as a 3-component color vector.
Example 2.6: Keystroke
For keystroke recognition, the raw information are the timing components (and sometimes also the pressure) of certain key successions. The resulting features, together with their stochastical properties, are very distinctive between different persons.

2.1.1 Combination of Features

It has already been mentioned that features can be decomposed into subfeatures until elementary features remain. Conversely, there may be different feature spaces, e.g., F1 and F2 with different biometric modalities which may be combined to a new feature space such that
FF1 x F2 ≡ { f1F1, f2F2 | f = (f1, f2)}
(2.12)
Alternatively, F1 and F2 may come from different statistical measurements
FaF1) x F2) = { f1F1), f2F2) | f = (f1, f2)}
(2.13)
or from different time instants
FtF(t1) x F(t2) = { f1F(t1), f2F(t2) | f = (f1, f2)}
(2.14)
In certain cases it may be very useful to combine two (or more) features in such a way that the resulting feature remains a member of the original feature space F. Especially, if features of the same individual are varying due to noise, noise reducing processing may help to improve recognition performance significantly. Such processing methods are building the mean or median value. To keep the result of such processing in the original feature space, the feature space must be equipped with an appropriate structure. For example, to build mean values requires that the sum and the multiplication by a scalar is allowed in the feature space. In this case, a vector space [Wikipedia] would meet the requirements. In other cases where the components of a feature only indicate the presence (=1) or absence (=0) of a binary property of an individual, a mean value may not be appropriate. In such cases the median value may deliver improved features since only the most frequent value is to be taken and this value is in the feature space by definition.

Unfortunately, real-life situations are more complicated. For example, a fingerprint feature may have continuous and binary components. The location of a minutia is such a continuous component which may easily be treated as vector space component. On the other hand, minutia type is a binary feature component which requires different treatment. Often, certain minutiae are not available. In this case it makes no sense to replace them by a null vector since then averaging would deliver useless results.

Furthermore, features should be aligned before averaging. Practically, this can be handled by aligning two features by calculating the minimum distance for all possible alignments. If this minimum is zero or below a threshold, it is indicated that both features come from the same biometric characteristic. If so, a mean value may be taken.

Example 2.7: Binary features and median
Let F be a feature space such that F = {0,1}. Let fnFn) be a random feature with realizations
f1 = 0, f2 = 0, f3 = 1, f4 = 0, f5 = 1
(2.15)
The median [Wikipedia] value is seen to be
median{f1, f2 , f3, f4, f5} = median{0, 0, 1, 0, 1} = median{0, 0, 0, 1, 1} = 0
(2.16)
That is, the value with the higher frequency is taken. Note: If we have an even number of elements to be combined, the median is not unique as the number of 0s and 1s may be equal. In such a case, any value (0 or 1) may be taken as result. Statistically this should have little impact. (Sometimes, for the non-unique case, ½ is defined as median. We avoid this since ½ is not an element of F and thus may confuse the biometric comparison operation).
Example 2.8: Adding a unique ID as "feature component"
If biometric features are not sufficiently unique (this may also hold for passwords), some unification may be advised. This can most favorably be attained by adding a unique feature component to the other components of a biometric feature. The trick is that the unique component may be public while the non-unique one may represent a secret or something that can only be delivered by the dedicated individual. (Otherwise the unique component would be sufficient!) For passwords this method is very common since the very beginning of its use by combining the password with a user ID which can be measured easily with little errors. A user ID may also be a solution for biometric features since user IDs can be more rapidly be identified than biometric features.

It is important to use an appropriate distance measure for the new combination to get all the advantages one expects. Otherwise, the unification may fizzle out as we shall see in Example 5.5.

2.2 Distance measurements

To check whether a biometric sample belongs to the desired modality or whether two features are coming from the same biometric characteristic, distances have to be determined. In the case of comparing features, a general distance measure d over the space F of features which is a functional d: F x F → D ⊂ ℝ should be defined for every f, g, hF by the following properties:
 
DIS1. (i) d(f,g) ≥ 0 non-negativity
DIS2. (ii) f = g ⇒ d(f,g) = 0
DIS3. (iii) d(f,g) = 0 ⇒ f = g
DIS4. (iv) d(f,g) = d(g,f) symmetry
DIS5. (v) d(f,g) ≤ d(f,h) + d(h,g) triangle inequality
DIS6. (vi) d(f,g) ≤ max{d(f,h),d(h,g)}

Note that DIS1 is not independent of the other conditions. D is a set of non-negative numbers which may be countable or uncountable, finite or infinite but always being a subset of ℝ.

Depending on which conditions hold, different definitions can be given [Wikipedia]:
 

DIS1
DIS2
DIS3
DIS4
DIS5
DIS6
premetric
       
pseudoquasimetric
   
 
semimetric
 
pseudometric
 
quasimetric
 
metric
 
ultrametric

Which distance measure is taken depends on the concrete realization. For modality detection, the feature space F should be replaced by a suitable subspace of A. The distance measure over this subspace may be completely different from that over F.
Example 2.9: Human height: Euclidean metric
After having reduced the dimensions to 1, a suitable distance measure for measuring the difference of two heights f and g is the Euclidean metric d(f,g) ≔ |f − g| where f, g ∈ ℝ and D = ℝ.
Example 2.10: Discrete metric
Let F be the space of all PINs including zero: F = ℕ0 ≡ ℕ⋃{0}. Then the distance measure
δ(fi,fj) ≔ 
{
0 if fi = fj
1
if fifj
(2.17)
is a metric called discrete metric δ: F x F → D = {0,1}
Example 2.11: Correlation coefficient
Let F be a Hilbert space [Wikipedia] with scalar product <•,•> and norm ‖•‖= √<•,•>. Then the correlation coefficient χ(f,g) can be defined as 
χ(f,g) ≔ 
|<f,g>|
f‖ ‖g
(2.18)
for all nonzero f, gF. χ is a measure for the similarity of two vectors f and g which is independent of the amplitude since χ(αfg) = χ(f,g). χ can be written as
χ(f,g) = 1 - ½
f
f
g
g
²
(2.19)
If f = αg for any α≠0, then χ(fg) = 1 which represents the maximum due to Schwartz's inequality. In the case of minimum similarity we get χ(f,g) = 0 which is reached if f is a symmetric vector and g an antisymmetric one. Since similarity intuitively behaves inversely to distance we define a distance measure as
d(f,g) ≔
f
f
g
g
(2.20)
This measure is only a pseudometric since DIS3 is not fulfilled, e.g., in the case g ≔ αff where 0≠α∈ℝ, we get d(f,g) = 0.

2.2.1 Composite distance measures

Let F1 and F2 be two feature spaces with distance measures d1 and d2 such that d1: F1 x F1 → D1 and d2: F2 x F2 → D2. Furthermore , let α and β be real numbers with α > 0 and β > 0. Finally let F be a composite feature space such that FF1 x F2. Then d: F x F → D = αD1 βD2 with
d(z1,z2) ≔ αd1(x1,x1) + βd2(y2,y2)
(2.21)
is a distance measure:
 
COMPDIS1. d(z1,z2) ≥ 0 if d1 and d2 fulfill DIS1
COMPDIS2. z1 = z2 ⇒ d(z1,z2) = 0 if d1 and d2 fulfill DIS2
COMPDIS3. d(z1,z2) = 0 ⇒ z1 = z2 if d1 and d2 fulfill DIS3
COMPDIS4. d(z1,z2) = d(z2,z1) if d1 and d2 fulfill DIS4
COMPDIS5. d(z1,z2) ≤ d(z1,z3) + d(z3,z2) if d1 and d2 fulfill DIS5

The weighting factors α and β may have influence on D. For α > β, the ordering of the corresponding distance values in a discrete D is different from α < β. If α = β, this may reduce |D|. Both has a strong influence on the failure rates to be defined as we shall see!

Composite distance measures may be required when a feature comprises feature components of different type, for example real components such as length and binary components such as "available" and "missing".
Example 2.12: Hamming metric
The Hamming metric is well known as Hamming distance [Wikipedia] and describes the number of different symbols when comparing two strings. Since the equality of two symbols can be measured by the discrete metric δ (Example 2.10), we can describe the Hamming metric as composite metric using the discrete metric. Let F be a feature space with N different feature components. Each feature component is characterized by one symbol (which may be a number or a character etc.) and is represented by a feature subspace Fn such that
F = F1 x F2 x F3 x ... x FN
(2.22)
Then the Hamming metric d can be written as
d(f,g) = δ(f1,g1) + δ(f2,g2) + δ(f3,g3) + ... + δ(fN,gN)
(2.23)
where f = (f1, f2, f3, ... , fN) ∈ F, gF, and fnFn. δ is the discrete metric (Example 2.10).

3. Biometric Recognition in Metric Spaces

3.1 Recognition

Assuming perfect uniqueness and permanence of biometric features and the absence of measurement errors we can expect that each sample of the same biometric characteristic delivers exactly the same feature vector, while the samples of different characteristics result in different feature vectors. That is, each characteristic is represented by a distinct point in the feature space. When comparing two features to examine their origin, only two results are possible:
1. the features are equal and belong to the same characteristic or
2. the features are unequal and thus belong to different characteristics.
Although no failures can occur in the ideal case, it would be of interest to learn more about the distribution of, say N, features within the feature space F. This can best be done by comparing each feature with each other feature, denoting the results as dij ≔ d(fi,fj) which can be shown in matrix form:
d11
d12
d13
d14
...
d1N
d21
d22
d23
d24
...
d2N
d31
d32
d33
d34
...
d3N
d41
d42
d43
d44
...
d4N
.
.
.
.
...
.
dN1
dN2
dN3
dN4
...
dNN
(3.1)
If the distance measure d fulfills requirement DIS4, we get dij = dji. As a result, the gray values in the matrix do not carry additional information. The blue values in the main diagonal represent comparisons of two equal samples from the same biometric characteristic. If DIS2 holds, these values must all be zero. All values above the main diagonal will be > 0 if conditions DIS1 and DIS3 hold and fifj. If the triangle inequality DIS5 holds, the values of the first upper secondary diagonal (d12, d23, d34,..., dN-1,N) represent the only independent values. All other values are bound by the triangle inequality like d13 ≤ d12 + d23, d14 ≤ d13 + d34, and so on.
Example 3.1: Human Height: Dependence of distance values using the Euclidean metric
The following example of a feature space F = {h1, h2, h3, h4, h5} shows the effect of different height arrangements combined with the Euclidean metric
dij ≔ d(hi,hj) = |hi − hj|
(3.2)
Of special interest are the restrictions of the triangle inequality on the distance distributions. As we can see, the dependencies are only tight since the triangle inequality only puts an upper bound on the growths of values in direction of the upper right (resp. low left) corner.
h1
h2
h3
h4
h5
=
190
175
183
183
155
d11
d12
d13
d14
d15
d21
d22
d23
d24
d25
d31
d32
d33
d34
d35
d41
d42
d43
d44
d45
d51
d52
d53
d54
d55
=
0
15
17
17
35
15
0
8
8
20
17
8
0
0
28
17
8
0
0
28
35
20
28
28
0
h1
h2
h3
h4
h5
=
190
175
155
183
190
d11
d12
d13
d14
d15
d21
d22
d23
d24
d25
d31
d32
d33
d34
d35
d41
d42
d43
d44
d45
d51
d52
d53
d54
d55
=
0
15
35
7
0
15
0
20
8
15
35
20
0
28
35
7
8
28
0
7
0
15
35
7
0
The following example is a worst case, where all inequalities become equalities. As a result the maximum growth is achieved:
h1
h2
h3
h4
h5
=
140
150
160
170
180
d11
d12
d13
d14
d15
d21
d22
d23
d24
d25
d31
d32
d33
d34
d35
d41
d42
d43
d44
d45
d51
d52
d53
d54
d55
=
0
10
20
30
40
10
0
10
20
30
20
10
0
10
20
30
20
10
0
10
40
30
20
10
0
or using the reverse order
h1
h2
h3
h4
h5
=
180
170
160
150
140
d11
d12
d13
d14
d15
d21
d22
d23
d24
d25
d31
d32
d33
d34
d35
d41
d42
d43
d44
d45
d51
d52
d53
d54
d55
=
0
10
20
30
40
10
0
10
20
30
20
10
0
10
20
30
20
10
0
10
40
30
20
10
0
In higher dimensional feature spaces equality is rather an exceptional case. Nevertheless, a type of dependence is introduced which may lead to the unpleasant effect that statistical significance does not increase as expected if the results of all cross-compared distances with i ≠ j are averaged. That is, statistical significance will not increase 1:1 with the total number of comparisons (generally N(N−1) for N features). In practice, it will rather be ruled by a quantity between N−1 and N(N−1). That is, it would be better to increase N to N² than to count on depending N(N−1) cross-comparisons (which is much cheaper, of course).
Example 3.2: Human height: Absolute Values
Fig. 3.1: Biometric height features as points in a 1D feature space (F = )

One of the most interesting properties of the space of features are the distributions of distances between the features. To build a statistics one may start with a certain feature and determine the distances to its neighbor features. In an N dimensional feature space, all feature points within an N-dimensional ball around the actual feature are to be counted. With increasing radius of the ball, the number of features included may increase faster or slower, depending on their local distribution. This is similar to stars in universe. Stars may be crowded in galaxies or the space may be nearly empty over large distances.
Example 3.3: Human height: Distances from a gregarious feature
When the one-dimensional (1D) feature space F contains N features, 2N distances (with N−1 non-trivial distances) from one feature to all other features can be calculated. (Non-trivial means: excluding the zero distance of a feature to itself and assuming the two back and forth distance values between two different features being equal according to DIS4.) Depending on which feature is selected as point of origin, different distance distributions may result. In this example, a feature amid an aggregation of features is chosen, thus delivering a lot of small distance values. Figure 3.2 shows the distribution which represents points in the distance space D ⊂ ℝ.

Fig. 3.2: Distance distribution of the height features Fig. 3.1 from a gregarious feature
Example 3.4: Human height: Distances from a lonely feature
If we start with the same feature space F as in Example 3.3, but choose a lonely feature as origin, a completely different distribution of distance values may result which are far from zero. This is an expression of the fact that lonely features are easier to separate than features within a crowd and confirms the practical observation that different features show different recognition failure behavior.
Fig. 3.3: Distance distribution of the height features Fig. 3.1 from a lonely feature

Intuitively, the closer two features are located, the easier it may be to "confuse" the two in the presence of errors, e.g., errors produced during the determination of the feature's location or if features do slightly change. There may be singular features with large distance to their neighbors and other features which are densely crowded. In the first case, these may be candidates for 'sheep', i.e., features which are easily recognizable  [Doddington, Sutton]. The distances alone and the number after tightly neighboring features do not allow a classification due to Doddington's zoo. This situation changes, if one adds individual error balls around each feature point which characterize all errors due to non-ideal behavior of the features and their measurement. This will be considered later.

Fig. 3.4: Biometric features as points in a 2D feature space

If the number of dimensions of the feature space is larger than the number of feature components, it is even possible that each feature has the same distance to all of its neighbors. In this case the geometrics can be described by an equilateral K-simplex. This is one case of perfect equality between all features. Using the discrete metric is another method to create equality between all features, irrespective of dimensionality, but possibly accompanied with other drawbacks.

If the dimension is small compared to the number of features and the Euclidean metric is chosen, at least an approximate constant distance to the nearest neighbors should be possible. In this case the number of features in a K-dimensional ball increases with the power of K of the ball radius.

3.2 Enrolment - Verification - Identification

In the ideal case, without any error, enrolment simply means to store the desired biometric feature, extracted from an individual's biometric characteristic, as a reference in a database. During a recognition transaction the reference is compared with the actually measured feature. If the distance is zero, we assume identity between the subject enrolled and the subject to be recognized. If more than one reference is available in the feature space, theoretically two ways for recognition may be performed. During verification, the index (= address) of the reference used to distinguish all members of a set is known a priori so that only this reference is to be compared with the feature of the associated subject. If this index is not known, an identification is to be performed, i.e., the feature is to be compared with all references until a match (i.e., distance = 0) is found. The result of this first method is either no identification or the index of the identified subject.

Throughout this paper, we approach the reality by a second way which may meet practice more consequently. Instead of using the virtual index, a user ID is assigned and stored during enrolment. This user ID is nothing but an additional feature component belonging to a subject, however with different usage. For example, user IDs may be public and usually are unique. This way every verification may be performed at least partly as an identification: The subject has to present the user ID as first and the biometric feature as second component. For entering the user ID, a keyboard or a smartcard may be used among other ways. Then the user ID is compared with all stored reference user IDs to see if one matches. If so, in the next step the accompanying biometric part of the reference is compared with the presented biometric feature component to verify that the biometric feature components do match, too. In practice, most verifications are performed this way. The main reason for this fact is that text strings for user IDs are found and checked much faster than the biometric part of a feature. For that reason, a verification is usually much faster than a pure biometric identification where the biometric part of all reference features is to be compared with a candidate until a match is found. [Bromba2007]

In the presence of errors, enrolment may be more sophisticated than simply storing the first available feature sample. Especially stochastical errors may lead to the circumstance that all captured features from the same biometric characteristic are more and less different. As a consequence, the question arises which of the incoming features of subsequent capturings will deliver the best recognition rates. This question is treated later.

The most important property of a (digitally stored) reference feature is its stability over time (unless update strategies are used to improve the reference with each recognition). While stochastical errors in new samples of a characteristic will change with each capturing, the stochastical errors in the reference are stored and thus fixed. For that reason, optimal enrolment strategies try to find the stochastically best reference, typically by processing a series of capturings. As a consequence, enrolment transactions will nearly always take more time than a recognition transaction. For the feature space F of all possible features the consequence is the introduction of the subset F' of reference features. Each reference feature should be as near as possible to the true ("error-free") feature of a certain characteristic. To indicate that a feature is a reference, we sign it or its index by an apostrophe, e.g., fi' or di'j where feature no. i' is the reference belonging to characteristic no. i. Generally, the following relations hold:
fiFF
(3.3)
fi' ∈ F' ⊂ F
(3.4)
where F is the set of all used features. Only in the ideal case we have F = F'.

4. Biometric Performance Measures

4.1 Definition of Match and Non-Match

For a biometric recognition system we define a match as successful recognition if the distance between two  separately measured features fi, fjF falls below a certain threshold th ∈ ℝ such that dij ≔ d(fi,fj) ≤ th. Otherwise we have a rejection, i.e., if dij > th. In the ideal case, the threshold may be between 0 and the minimum distance between two features, provided the minimum is greater than 0.

F is the space of all features being comparable by a specific recognition system as defined above. d is any distance measure over F with values in D. Subjects si S shall have the same index i as the corresponding feature fiF when defining an individual ⓘi, see (2.3). Finally, let F' be the space of all enrolled features and F the space of all features, enrolled subjects are able to assume.

As bilateral match rate MRij we define the probability of a successful recognition when comparing features fi1) and fj2) where ω1 and ω2 are independent outcomes of the sample space Ω [Wikipedia].
MRij ≡  MR(fi,fj) ≡ P{dij ≔ d(fi1), fj2)) ≤ th | fi1), fj2) ∈ F; ω1, ω2Ω
(4.1)
In the absence of any errors ("noise"), all distances are deterministic, i.e., the probabilities are either 1 or 0:
MRij = 1 ⇔ dij ≤ th
(4.2)
MRij = 0 ⇔ dij > th
(4.3)
Generally, we have symmetry:
MRij = MRji
(4.4)
MR11
MR12
MR13
MR14
...
MR1N
MR21
MR22
MR23
MR24
...
MR2N
MR31
MR32
MR33
MR34
...
MR3N
MR41
MR42
MR43
MR44
...
MR4N
.
.
.
.
...
.
MRN1
MRN2
MRN3
MRN4
...
MRNN
=
1
0
0
0
...
0
0
1
0
0
...
0
0
0
1
0
...
0
0
0
0
1
...
0
.
.
.
.
...
.
0
0
0
0
...
1
(4.5)
Until now we have considered the term match rate without interpretation. There are four possibilities to define:

- If two samples of the feature of the same subject match, this is a correct match
- If two features of two different subjects match, this is a false match
- If two samples of the feature of the same subject do not match, this is a false non-match
- If two features of two different subjects do not match, this is a correct non-match

The terms false and correct obviously do not mainly depend on the features but on the subjects, i.e., the subjects belonging to the features compared. Let us assume that for individual ⓘm ≔ (sm,f'm) the reference feature f'm is stored during enrolment. We have 4 possibilities for comparisons (⇆ means that the corresponding subject on the left tries a recognition with the stored subject's feature on the right):

1. (sm,fm) ⇆ (sm,f'm) - fmF, f'mF', subject sm tries to get recognized
2. (sm,fn) ⇆ (sm,f'm) - fnF, f'mF', subject sm tries to get not recognized (hiding)
3. (sn,fm) ⇆ (sm,f'm) - fmF, f'mF', subject sn tries to get recognized (counterfeiting, impersonation)
4. (sn,fn) ⇆ (sm,f'm) - fnF, f'mF', subject sn tries to get recognized (using his genuine feature for the "wrong account")

The following definitions only refer to the cases 1 and 4!

Using the previous definitions, if i=j, the bilateral match rate is equivalent to the individual correct match rate CMRii of characteristic no. i. If i≠j, an acceptance would be a failure and thus the probability will be called bilateral false match rate FMRij.
CMRii ≡ MRii
(4.6)
FMRij ≡ MRij if i ≠ j
(4.7)
As bilateral non-match rate NMRij we define the probability of a failing recognition when comparing features i and j. If i=j, we have a failure and the bilateral non-match rate is equivalent to the individual false non-match rate FNMRii of characteristic i. If i≠j, a non-match is expected and thus will be called bilateral correct non-match rate CNMRij.

If we take the mean value over the individual rates of all N comparisons, the global correct match rate CMR and the global non-match rate FNMR results.
CMR ≡
1
N
N
∑
i=1
CMRii
(4.8)
FNMR ≡
1
N
N
∑
i=1
FNMRii
(4.9)
It makes no sense to choose N (in this case the number of features compared) larger than the number of possible features (|F|). On the other hand, N should be much larger than the number of subjects enrolled to the recognition system (|S|) since we are also interested in false matches caused by the features of "external" subjects:
|S| N ≤ |F|
(4.10)
Typically, the size N will be chosen differently for the determination of CMR and FNMR. Since we have defined the match rates as probabilities, noisy features require a practical estimation. This is done by independently repeating a comparison as often as reasonable, say K times, and build the ratio between the number of matches, say M, and the number of trials (K) such that
MRij ~ M/K
(4.11)
If we take the mean value with respect to j over the bilateral rates of all N-1 independent comparisons with different index i≠j, the individual false match rate FMRi of characteristic i and the individual correct non-match rate CNMRi of characteristic i results.
FMRi
1
N1
N
∑
j=1,ij
FMRij
(4.12)
CNMRi
1
N1
N
∑
j=1,ij
CNMRij
(4.13)
The sums can be simplified, if we set CNMRii = FMRii = 0.

Finally, a global false match rate FMR and a global correct non-match rate CNMR can be defined by taking the mean value over all N individual false match and correct non-match rates.
FMR ≡
1
N
N
∑
i=1
FMRi
(4.14)
CNMR ≡
1
N
N
∑
i=1
CNMRi
(4.15)
Generally, we have the relationships
FNMRii = 1 CMRii
FNMR = 1 CMR
(4.16)
FMRij = 1 − CNMRij
FMRi = 1 − CNMRi
FMR = 1 − CNMR
(4.17)
In the case of stochastical behavior of a feature, enrolment may have the effect to create a new element in the feature space. The main property of this element is that it usually is purely deterministic and thus may have completely different stochastical properties. As a result, the match rate definition (4.1) must be modified such that
MRi'j ≡  MR(fi',fj) ≔ P{di'j ≔ d(fi', fj(ω)) ≤ th | ω ∈ Ω }
(4.18)
The trick is to pretend that fi' is deterministic since it does not change during normal operation of the biometric recognition system. However, for comparing different systems this may lead to misinterpretations since every new enrolment may deliver new probabilities MRi'j. Thus, when estimating the failure rates, this effect must not be forgotten. A generalization can be given by the following definition:
MRi'j ≡  MR(fi',fj) ≡ P{di'j ≔ d(fi'(ω1), fj2)) ≤ th | ω1Ω1Ω2, ω2Ω2
MRij' ≡  MR(fi,fj') ≡ P{dij' ≔ d(fi1), fj'(ω2)) ≤ th | ω1Ω1, ω2Ω2Ω1}
(4.19)
For MRi'j, (4.1) is covered by the situation that Ω1 = Ω2, while (4.18) matches the case whereΩ1 only contains one element of Ω2. Note that the probability MRi'j itself is stochastic if Ω1 is a random subset of Ω2!

To determine the failure rates in ordinary biometric recognition systems, the distance between enrolled reference (i.e., fixed) feature and request feature is measured as in Example 3.1. This yields a different distance matrix with di'j ≔ d(fi',fj), where we mark the reference feature by '. fi' and fj are the reference and actually measured features, respectively.
d1'1
d1'2
d1'3
d1'4
...
d1'N
d2'1
d2'2
d2'3
d2'4
...
d2'N
d3'1
d3'2
d3'3
d3'4
...
d3'N
d4'1
d4'2
d4'3
d4'4
...
d4'N
.
.
.
.
...
.
dN'1
dN'2
dN'3
dN'4
...
dN'N
(4.20)
In contrast to matrix (3.1), di'j will generally only be equal to dj'i (i.e., symmetric) if fi' = fi and fj' = fj. Hence, in the stochastical case, the distance matrix will not be symmetric. As a result, the match rate matrix will generally not be symmetric, too:
MR1'1
MR1'2
MR1'3
MR1'4
...
MR1'N
MR2'1
MR2'2
MR2'3
MR2'4
...
MR2'N
MR3'1
MR3'2
MR3'3
MR3'4
...
MR3'N
MR4'1
MR4'2
MR4'3
MR4'4
...
MR4'N
.
.
.
.
...
.
MRN'1
MRN'2
MRN'3
MRN'4
...
MRN'N
(4.21)
This is a consequence of the general case which need not be symmetric in this sense:
MRji' = MRi'j ≠ MRj'i = MRij'
(4.22)
To keep our individual and global failure rate definitions unique, we always assume the first index to be enrolled (if at all).

In the following examples we show the impact of non-ideal biometric features regarding properties BCR1 - BCR4.

Example 4.1: Weak uniquenessTwins
If the biometric characteristic is completely genotypic [BioFAQ], monozygotic (identical) twins theoretically may show exactly the same features. This could especially be the case in DNA investigations. For the failure rates this may have the following impact. Let SN be the set of N subjects SN ≔ {s1, s2, s3, ..., sN} and si and sj, i ≠ j, being monozygotic twins with the same feature ffi = fj and without random errors such that ⓘi = (si, f) and ⓘj = (sj, f). Then the distance dij d(fi,fj) = d(f,f) is equal to zero although a non-zero result would be required for authentication systems. That is, even with a threshold of th=0, ⓘi and ⓘj are indistinguishable and we get a false match case with a bilateral FMRij = FMRji = 1! All other bilateral false match rates are zero. If N subjects are expected, the individual false match rate of twin ⓘi would obviously be FMRi = 1/(N-1). The same holds true for twin ⓘj. If N approaches large values, FMRi will approach zero.

To get the global FMR, the mean value over all N individual false match rates has to be taken. The result is FMR = 2/(N(N-1)) if there are only two twins (i.e., one twin pair) in the ensemble. If there are more twins, the result will differ. If we expect 1 average monozygotic twin pair among 243 people (this seems to be a good estimation to real life [Daugman]), there are 2N/243 twins in the ensemble which all have an individual FMRi of 1/(N-1). This results in a global FMR of (2N/243)/(N(N-1)) = 2/(243(N-1)). That is, for large N, FMR again approaches 0 although not with order N² as in the single-twin pair case. This is in contrast to the minimum FMR of 0.82% stated in [Daugman].

Example 4.2: Weak permanence
If the condition of ideal permanence is replaced by time-dependence this means that when capturing a biometric characteristic at different time instants even without measurement failures, more and less slightly different features may result. If so, the biometric recognition system may fail to recognize the characteristic if the threshold is selected too low. On the other hand, other characteristics may also vary. This will have two effects: 1. The minimum  threshold should be larger than the greatest variation of the feature of a certain biometric characteristic and 2. the minimum distance between features of different characteristics will increase. If both begin to overlap, no threshold can be chosen anymore without producing recognition errors. Permanence properties may differ for different characteristics.
Fig. 4.1: Stochastical fluctuations of the features in a 2D feature space
Example 4.3: Weak measurability
The most apparent impact of non-perfect measurability is the introduction of errors, usually considered as stochastical errors. Such errors introduce stochastical position errors of a feature point in a K-dimensional feature space F, just in the same way as time-depending features will do. Such errors can be described by a K-dimensional cloud which may follow, e.g., a K-dimensional Gaussian distribution. Rather than a point, a feature now should be visualized as a cloud with declining intensity. The intensity then will be a measure for the probability of presence of the feature point at a certain location in F. Such a cloud may overlap with the clouds of other features. This way, no threshold can be found without producing non-zero recognition errors.
Fig. 4.2: When averaged over measurements, feature locations may be represented as spatial distributions

Furthermore, the distribution will depend on the individual characteristic. This will lead to clouds of different size and shape, as Figure 4.3 indicates. This is the worst case with respect to examination of failure rates since most failure rate definitions or at the least the approximation of their statistical significance make idealizing assumptions which do not hold in reality.

Fig. 4.3: Different features may show different distributions

Compared to weak permanence, weak measurability does not depend deterministically on time. That is, for weak permanence, two measurements at the same time instant will exactly show the same result while for weak measurability this need not be the case. However, a real biometric characteristic is affected by both shortcomings. Additionally, measurability may change over time. All combined effects can be described by a time-varying multi-dimensional joint probability distribution.

Example 4.4: Human height: Distance errors between two individuals
Let us assume two individuals with different heights. The feature space F can be defined as F = . The distance space D, assuming the Euclidean metric, also equals or is a subset of . For determining the FNMR, the first characteristic is captured twice. For determining the FMR, the second one is captured once. The measurements of the height features h1 and h2 are error-prone and show a continuous joint probability density fh1,h2 and the continuous singular densities fh1 and fh2, respectively. We assume stochastical independence, such that fh1,h2(x,y)= fh1(x)fh2(y).
fh1 fh2
0
h1
h2
Fig. 4.4: Heights of two individuals with feature densities and mean values h1, h2

We consider two cases: 

1. One sample each of two different characteristics h1 and h2 are compared to determine the bilateral FMR12
2. Two samples each of the same characteristic, respectively their features h1 and h2, are compared to yield the FNMR. 
Both cases assume that enrolment is repeated with each measurement and does not differ from sampling for recognition. In the case of different characteristics, the distance between the two samples is given by the random variable 
d12≔ d(h1,h2) ≔ |h11)−h22)| = |h22)−h11)| = d(h2,h1)
(4.23)
The bilateral FMR12 is then given by
FMR12(th)  ≔ P{d12 ≔ |h11)−h22)| ≤ th | ω1, ω2Ω }
≔ P{d12 = h11)−h22) ≤ th | ω1, ω2Ω} − P{d12 = h11)−h22) ≤ −th | ω1, ω2Ω }
= ]-∞,th]fh1-h2]-∞,-th]fh1-h2
= (]-∞,th]]-∞,-th])(fh1 ∗ f-h2)
= ]-th,th](fh1 ∗ f-h2)
= ]-th,th](fh1fh2)
(4.24)
where ∗ denotes the convolution of the two distance density functions fh1 and f-h2 and is the mirror operator such that f(x) ≔ f(-x) for every f and x. As a consequence, FMR12(0) = 0 and FMR12(∞) = 1. Case 2 is treated in Example 4.5.

Example 4.4 shows us a possibility how to reduce failure rates. Using the same threshold th, FMR12 can be minimized, if the densities fh1 and fh2 of the features h1 and h2 are as narrow as possible and the threshold is smaller than die distance of the true values of the features.

Example 4.5: Human height: Distance errors between two measurements of the same individual
In the same way as shown in Example 4.4, the individual Correct Accept Rate CMRi may be determined under the assumption that two subsequently taken samples h11) and h12) of the same characteristic are stochastically independent with probability density fh1,h1' (x,y) = fh1(x)fh1'(y) and fh1 = fh1'. This corresponds to the case where the first sample is taken during enrolment and the second one for authentication. However, measurements and processing for enrolment and recognition are exactly equal and are repeated both for each comparison. Then
1 − FNMR11(th) ≕ CMR11(th)  = P{d11 ≔ |h11) − h12)| ≤ th | ω1, ω2Ω}
= ]-th,th](fh1 ∗ f-h1)
= ]-th,th](fh1fh1)
(4.25)
As a result, FNMR11(0) = 1 and FNMR11(∞) = 1. A similar result may be derived for h2, which need not have the same density as h1.

Since usually one of the features stems from enrolment, an optimized enrolment would be able to reduce the FNMR significantly, if the enrolled feature is nearby the "true" feature.

Example 4.6: Human height: Enrolment
In Example 4.4 and 4.5, no explicit enrolment was assumed, i.e., two arbitrary stochastically independent samples were directly compared and no special enrolment processing was performed. For each measurement experiment completely new samples were taken. This is not common practice in biometric systems. Most biometric systems use one specific initial enrolment which is retained for every new comparison. To model this procedure we assume that a single sample of h1, say h'1, is taken as reference and use this specific sample for each new comparison. As a consequence, the reference h'1 must be regarded as deterministic. Nevertheless, its stable value is arbitrary and cannot be predetermined before enrolment. As a consequence, its distribution can be approximated by the density distribution
fh'1 = δh'1
(4.26)
where δh'1 is a delta distribution with peak at location h'1. Recycling the results from Example 4.4, we get
FMR1'2(th) = ]-th,th](δh'1 ∗ f-h2)
(4.27)
where h'1 and h2 stem from different individuals. For the special case that the mean value E{-h2} expresses as translation of the density f (example: Gaussian density), we get
FMR1'2(th) = ]-th,th](δh'1 ∗ f-h2) =]-th,th]fh'1-h2
(4.28)
If we furthermore assume that h1 and h2 come from the same individual, i.e., h2 = h1, the result would be
1 − FNMR1'1(th) = ]-th,th](δh'1 ∗ f-h1) =]-th,th]fh'1-h1
(4.29)
From (4.29) we can conclude at least for Gaussian densities that FNMR1'1 can be minimized if h'1 equals the expectation of h1 (= E{h1}).
Example 4.7: Human height: Enrolment: Monozygotic twins
For monozygotic twins we expect exactly the same height features and feature densities. From (4.24) and (4.25) we then can conclude that the bilateral FMR12(th) is equal to CMR11(th) ≔ 1−FNMR11(th) if all measurements are stochastically independent. In the case of a continuous error density, FMR12(th) will not only assume the discrete values 0 and 1 as for the error-free situation but any value between 0 and 1 depending on threshold.

Now let us assume that individual ⓘ1 has been enrolled using a fixed sample h'1. If we take one sample each from individual ⓘ1 and ⓘ2 to determine FNMR1'1(th) and FMR1'2(th), we see from Example 4.6 that the relationship FMR1'2(th) = 1−FNMR1'1(th) remains valid. Although the FMR will change with varying threshold, FNMR will change in the opposite direction. At least for this case, enrolment will have no general influence in the case of monozygotic twins!

Generally, if we choose a threshold unequal 0 for the monozygotic twin case, realistic densities yield a FMR smaller than 1. This may lead to the wrong interpretation that under certain circumstances, individuals can even be separated if their true features are equal.

From the definition of equal error rate (EER12 = FMR12(the) = FNMR1(the), where the is the unique threshold where equality appears, if possible) we conclude that EER always is ½. This is indeed the worst case EER for recognition, see [BioFAQ].

Example 4.8: Feature space with discrete metrics: Passwords as limit case of a behavioral biometric characteristic
Let S be the set of all possible and S the subset of enrolled subjects. F shall be the feature space of all possible passwords of a desired class and F' ⊂ F the subset of all enrolled passwords.

The feature space F of passwords is a discrete space which can be equipped here with the discrete metrics
δ(hi,hj) ≔
{
0 if hi = hj
1
if hihj
(4.30)
Passwords may be unique or not, depending on enrolment. There are different enrolment methods:

1. Passwords may be selected randomly by the user and then be passed to the service provider
2. Passwords may be assigned to the user by the service provider
3. Passwords may be selected randomly by the user and are then checked for uniqueness by the service provider

Obviously, method 1 may create multiple occurrences of the same password such that hi = hj for subjects si ≠ sj, while methods 2 and 3 can simply avoid this situation, provided the number of possible features |F| is equal to or larger than the number of enrolled subjects |S|.

In the case of unique passwords, no failures are introduced by enrolment. However, failures may occur during identification. There are two types of failures:

  • The authorized subject (genuine) sn mistypes or confuses the password hn
  • A non-authorized subject (impostor) tries a random or "stolen" password
If the genuine mistypes a password, the desired access may fail, producing a false non-match. In a scenario where the password is the only identifier without user ID, the false password at the same time may match a foreign one, leading to a false match with another genuine. That is, a genuine may be an "impostor", too, even during the same transaction. Obviously, a system without user IDs which is only based on unique passwords will be more reliable, if the number of possible subjects is much smaller than the number of possible passwords: |S| ≕ K ≪ |F| (where F is the feature space of all possible passwords with certain properties such as length or character set).

In the hypothetic error-free case, the failure rates for enrolment situation 2 (passwords are uniquely assigned to the user by the service provider) would be given by
FMRij(th) = 0 if th < 1, FMRij(th) = 1 if th ≥ 1
(4.31)
FMRi(th) = 0 if th < 1, FMRi(th) = 1 if th ≥ 1
(4.32)
FMR(th) = 0 if th < 1, FMR(th) = 1 if th ≥ 1
(4.33)
Since D = {0,1}, the best (and only useful) threshold th here is 0 which is selected from now on. In practice, several FMR failure types are to be considered, e.g.:

2A. One of |S| unique passwords becomes known to one impostor
2B. All impostors select a random password fnF to match a certain user ⓘm = (sm, fm) where fmF'

In case 2A, we denote the impostor as Ⓘnm ≔  (sn, fm) with n ≠ m. I.e., subject snS is "stealing" feature fm which actually belongs to subject smS, resp., genuine Ⓖm ≔ ⓘm = (sm, fm).  As a result, the bilateral false match rate FMRij(0) is easily seen to be 1 if (i = m, j = n, i ≠ j) and 0 else. For the individual and global false match rates this means
FMRi(0) = 1/(N-1) if i = m and 0 else
(4.34)
FMR(0) = 1/(N(N-1))
(4.35)
where we choose N ≔ |S| = |F'| ≤ |F|. The resulting failure rates only meet the special case 2A and cannot be generalized. This is easily seen when trying to understand the FMR definition in this case. For example, FMRi expects that password fi is tested against all other genuine passwords fj with i ≠ j. Only if i coincides with the index of the stolen password (m) a match happens. In this scenario we obviously expect one impostor with stolen password and all other impostors with their genuine password. This situation can be compared with the example of non-distinguishable twins where only one twin pair is present in the whole ensemble S (Example 4.1). The difference is a factor 2 which comes from the assumption that the impostor has stolen one password while the victim has not done the same with the password of the identity thief.

In situation 2B, randomness is introduced by guessing passwords. The match rates are then given by the probability that two passwords coincide. Especially,
FMRij(0) = p(fi = fj)
(4.36)
This probability is equal to zero if either fi or fjF'. If both passwords belong to F', it depends on the way how the service provider has assigned the passwords and what the impostor knows about it. If we expect a uniform distribution, the probability is simply given by
FMRij(0) = p(fi = fj) = |F'|-1 = 1/K if fi and fjF'
FMRij(0) = 0 else
(4.37)
To determine the individual and global failure rates, we have first to determine the number N of comparisons used for taking the average. If the number of possibilities is finite, N should be set equal to the number of possibilities which is defined by |F|. For example, if F is the space of 4-digit numbers, then N ≔ |F| = 10000. Generally, we get for enrolled individuals
FMRi(0) = (1-1/K)/(N-1) if fiF' and
FMRi(0) = 0 else
(4.38)
If K = N, the well-known result FMRi(0) = 1/N is given.
FMR(0) = (K-1)/(N(N-1))
(4.39)
If K = N, again FMR(0) = 1/N results.

Refinements of this consideration encompass cases where attacking passwords are not selected purely random but with prior knowledge about the probability of being used or not. (We had assumed a uniform  distribution.)
 

Example 4.9: Simple feature space with stochastical features: Hat
Let us assume two individuals 1 and 2 with the following properties: 1 carries a hat with 40% probability and 2 carries a hat with 30% probability. This may be interpreted as a kind of behavioral biometric characteristic. We try to distinguish these two individuals solely by the feature components "carrying a hat" and "carrying no hat". Furthermore we use the discrete metric and consider all four enrolment states for the reference features f1' and f2'
 
  f1'
f2'
Reference Distance d(f1',f2')
E1
NH
NH
0
E2
H
NH
1
E3
NH
H
1
E4
H
H
0

The two individuals share both features with different probabilities which are represented by two points in a one dimensional feature space F which contains two values: "hat" and "no hat".
 

  f1'
f2'
p1'1(0) p1'2(0) p2'1(0) p2'2(0) p1'1(1) p1'2(1) p2'1(1) p2'2(1)
E1
NH
NH
0.6
0.7
0.6
0.7
0.4
0.3
0.4
0.3
E2
H
NH
0.4
0.3
0.6
0.7
0.6
0.7
0.4
0.3
E3
NH
H
0.6
0.7
0.4
0.3
0.4
0.3
0.6
0.7
E4
H
H
0.4
0.3
0.4
0.3
0.6
0.7
0.6
0.7
  f1'
f2'
pG(0) pG(1) pI(0) pI(1) FNMR(0) FNMR(1) FMR(0) FMR(1)
E1
NH
NH
0.65
0.35
0.65
0.35
0.35
0
0.65
1
E2
H
NH
0.55
0.45
0.45
0.55
0.45
0
0.45
1
E3
NH
H
0.45 0.55 0.55 0.45
0.55
0
0.55
1
E4
H
H
0.35 0.65 0.35 0.65
0.65
0
0.35
1

This example also shows the large influence of enrolment. Intuitively, E2 delivers the best enrolment when regarding the failure rates. From the reference distances in cases E1 and E4, no selectivity is to be expected because both subjects are enrolled with the same feature vector. E2 tries to consider the fact that 1 is more often met with hat than 2. E3 is something like the converse of E2.

Example 4.10: Simple feature space with stochastical features: Jacket
Let us assume two individuals 1 and 2 with the following properties: 1 carries a jacket with 10% probability and 2 carries a jacket with 80% probability. This may be interpreted as a kind of behavioral biometric characteristic. We try to distinguish these two individuals solely by the feature components "carrying a jacket" and "carrying no jacket". Furthermore we use the discrete metric and consider all four enrolment states for the reference features f1' and f2'
 
  f1'
f2'
Reference Distance d(f1',f2')
E1
NJ
NJ
0
E2
J
NJ
1
E3
NJ
J
1
E4
J
J
0

The two individuals share both features with different probabilities which are represented by two points in a one dimensional feature space F which contains two values: "jacket" and "no jacket".
 

  f1'
f2'
p1'1(0) p1'2(0) p2'1(0) p2'2(0) p1'1(1) p1'2(1) p2'1(1) p2'2(1)
E1
NJ
NJ
0.9
0.2
0.9
0.2
0.1
0.8
0.1
0.8
E2
J
NJ
0.1
0.8
0.9
0.2
0.9
0.2
0.1
0.8
E3
NJ
J
0.9
0.2
0.1
0.8
0.1
0.8
0.9
0.2
E4
J
J
0.1
0.8
0.1
0.8
0.9
0.2
0.9
0.2
  f1'
f2'
pG(0) pG(1) pI(0) pI(1) FNMR(0) FNMR(1) FMR(0) FMR(1)
E1
NJ
NJ
0.55
0.45
0.55
0.45
0.45
0
0.55
1
E2
J
NJ
0.15
0.85
0.85
0.15
0.85
0
0.85
1
E3
NJ
J
0.85 0.15 0.15 0.85
0.15
0
0.15
1
E4
J
J
0.45 0.55 0.45 0.55
0.55
0
0.45
1

This example also shows the large influence of enrolment. Intuitively, E3 delivers the best enrolment when regarding the failure rates. From the reference distances in cases E1 and E4, no selectivity is to be expected because both subjects are enrolled with the same feature vector. E3 tries to consider the fact that 1 is less often met with a jacket than 2. E3 is something like the converse of E2.

4.2 Separability

In the ideal case without errors, separability is perfect, i.e., each characteristic from different subjects maps to a unique point in the feature space F. In the presence of noise, the features of all subject may vary such that their possible location can only stochastically be described by something like a multidimensional cloud which is represented by the multidimensional probability density. If these densities overlap, failures may be introduced and separability may suffer.

Since multidimensional probability density functions are more difficult to handle, we focus on the one-dimensional densities of the distance between two features. The probability density function of the distances depend on the character of the noise, the distance measure, the number of features, and the feature dimension.

We start with the case of finite many different distances in F. For example, when using the discrete metric, only the values 0 and 1 can appear. For each of these values their frequency can be determined either by knowledge of the stochastical properties or by measurements.

Let d: F x F ↦ DN ⊂ ℝ be a distance measure and DN be a discrete space with N different positive real numbers. Then the bilateral distance density functions are defined by
pi'j(x) ≡  P{ω1, ω2Ω | di'j ≔ d(fi'(ω1), fj2)) = x ∈ DN}
(4.40)
If i'=j, pi'j defines the bilateral distance density function for the feature outcomes of the same characteristic. The case i'≠j represents the probabilities between the two features i' (reference) and j (request) of disjoint characteristics. From the bilateral densities the global genuine and global impostor distance densities can be derived:
pI(x) ≡ 
1
N(N−1)
N
∑
i=1
 
N
∑
j=1
j≠i
pi'j(x)
(4.41)
pG(x) ≡ 
1
N
N
∑
i=1
pi'i(x)
(4.42)
Generally, the failure rates can be calculated from these densities. For the global FNMR and FMR we get
FNMR(th)
 = 1 −
 
 
∑
x∈DN
x≤th
pG(x) with
 
∑
x∈DN
pG(x) = 1
(4.43)
and
FMR(th)
 =
 
 
∑
x∈DN
x≤th
pI(x) with
 
∑
x∈DN
pI(x) = 1
(4.44)
The bilateral and individual rates can be defined correspondingly. If D is a continuous set D = [0,K] with K ∈ ℝ, such that pG, pI, FNMR, FMR are functions D → [0,1], the sums have to be replaced by integrals:
FNMR(th)
D= 1 − [0,th] pG with DDpG = 1 and th ∈ D
(4.45)
and
FMR(th)
D= [0,th] pI with DDpI = 1 and th ∈ D
(4.46)
The two FNMR(th) and FMR(th) curves can be combined into one curve when using the threshold variable th as common implicit parameter. A plot of FNMR against FMR is usually called Detection Error Tradeoff (DET) curve. If CMR is plotted against FMR, the diagram is usually called Receiver Operating Characteristic [Wikipedia]. This term has originally been created in radar signal detection theory. The advantage of the DET is that it shows a global view of the performance of a recognition system without scaling effects of the threshold variable.

An important feature of the DET is its monotonicity which is a direct consequence of the monotonicity of FMR and FNMR and the positivity of the integrated probabilities:
x ≥ y FNMR(x) ≥ FNMR(y)
(4.47)
x ≥ y FMR(x) ≤ FMR(y)
(4.48)
FMR(x) ≤ FMR(y) ⇒x ≥ y ⇒ FNMR(x) ≥ FNMR(y)
(4.49)
That is, the DET curve decreases monotonically.

4.2.1 Separability measures

Since FNMR and FMR monotonically depend on a threshold parameter, different requirements can be met by changing the threshold. In most applications, e.g., it is more important to have a small FMR than a small FNMR. To describe the system behavior, always the two failure rates are to be considered. In certain cases, however, one looks for a single number which describes some global performance independent of the threshold. The DET already delivers a tool to judge the performance globally. If one of two DET curves is lower than the other one nearly everywhere, it can be claimed that for the corresponding system separability is always better - provided no FTE and FTA failures occur. In practice, DET curves of different systems tend to intersect anywhere so that other comparison measures are to be used.

From the DET curve two such measures can be defined: The Equal Error Rate (EER) and the area under the curve which has been called ROCA in [BioFAQ]. Within the DET where FNMR is plotted as "function" of FMR, EER is defined by:
FMNR(EER) = EER
(4.50)
If FNMR and FMR are both functions of the threshold th, the definition of EER is given by
FNMR(thEER) = FMR(thEER)
(4.51)
thEER is the point where FMR and FNMR intersect. Unfortunately, this point usually will not exist in every case when D is a discrete set. In such cases, linear interpolation might help:
EERint =
FNMRFMR+ − FNMR+ FMR
FNMR + FMR+ − FNMR+ − FMR
(4.52)
Here, the superscript + denotes the values at the right side of the intersection and the superscript - the values at the left side.

Since the DET curve completely eliminates the threshold parameter, notation could be simplified in the discrete case without changing the curve. This can be achieved by replacing the set of distances DK by an index set comprising only natural numbers without gaps. This would enable us to use a simple sum notation even in the case where the discrete distance values are discrete real numbers with unequal distances. Furthermore, we introduce an negative threshold value for symmetry reasons. Although this value is impossible and thus virtual, it will help us to simplify the discrete case formulas and to show the relationship to the continuous case if we define the output of a negative threshold in such a way that it approximates the continuous case most naturally. This can be attained when defining the match rate MRij : D ⋃ {-ε} → [0,1] by
MRij(-ε) ≡ 0 for every ε > 0 and any feature combination in F
(4.53)
As a result we get
pI(-ε) = 0
pG(-ε) = 0
FMR(-ε) = 0
FNMR(-ε) = 1
(4.54)
The justification comes from the fact that the continuous case confirms the formulas as limit case for threshold 0. Generally, the value of a threshold does in no way contribute to separability. In fact, only the values of FNMR and FMR at a certain threshold are of practical importance. The statement "The systems shows an FMR of 0.0001% at an FMR of 10%" is completely describing separability in one operating point. Threshold values are only needed for the operator of a specific system to adjust the operating point. However, the threshold value will also depend on the distance measure and thus differ between systems of different origin. Hence, every bijective map between different threshold scalings should be allowed without changing appropriate separability measures. This especially affects transforms between scores and distances for which the relation holds: the higher the score, the lower the distance (and vice versa).

Occasionally, we will not plot the distance against the values of D. Instead, we use partly a bijective map between D and an index set I which otherwise shows all features of D. In the discrete case, the index set usually will be a set of natural numbers {0,1,2,3, ... ,K} such that 0 represents the virtual threshold -ε and the other numbers from 1 to K represent the K different distance values in DK in increasing order, e.g.,
{-ε} ∪ DK ≔ {-ε} ∪ {0, α2, α3, α4,,,αK} →  IK ≡  {0, 1, 2, 3,,,K}
(4.55)
where 0 < α1 < α2 < α3 <<<< αK and -ε ↦ 0, 0 ↦ 1, α2 ↦ 2, α3 ↦ 3,,,, αK ↦ K. We will use this definition throughout the paper. The corresponding functions will be underlined to indicate that they are different:
pI({-ε} ∪ DK) ≡ pI(IK)
pG({-ε} ∪ DK) ≡ pG(IK)
FMR({-ε} ∪ DK) ≡ FMR(IK)
FNMR({-ε} ∪ DK) ≡ FNMR(IK)
(4.56)
resulting in
pI(0) = pI(-ε) = 0
pG(0) = pG(-ε) = 0
FMR(0) = FMR(-ε) = 0
FNMR(0) = FNMR(-ε) = 1
(4.57)
with K being the smallest threshold such that
FMR(K) = 1
FNMR(K) = 0
(4.58)
With these conventions, Roca (ROCA) can be written for discrete D as
Roca
 
K
∑
n=1
FNMR(n-1)pI(n) with FNMR(0) 1 and FNMR(K) = 0
(4.59)
All other functions affected will be redefined accordingly:
FNMR(th)
 = 1 −
 
 
∑
x∈DK
x≤th
pG(x) = FNMR(n) = 1
n
∑
k=1
pG(k)
(4.60)
FMR(th)
 =
 
 
∑
x∈DN
x≤th
pI(x) = FMR(n) = 
n
∑
k=1
pI(k)
(4.61)
Independent of the use of direct thresholds or threshold indices, the DET curve will always be the same! From (4.61) and (4.60) it can be deduced that
pI(n) = FMR(n) - FMR(n-1)
(4.62)
and
pG(n) = FNMR(n-1) - FNMR(n)
(4.63)
In the continuous case there is no significant difference between the two notations, except perhaps if one chooses the common interval [0,1] with a bijective map D → I1 ≔ [0,1[. The interval may be half open in those cases where D = . Equations (4.62) and (4.63) are written as
pI = FMR'
(4.64)
pG = − FNMR'
(4.65)
in the continuous case where ' denotes the derivative. The continuous Roca is given by
Roca
  II1FNMRpI with FNMR(0) = 1 and FNMR(1) ≔ 0
(4.66)
Another measure for separability can be derived when using linear interpolation instead of calculating the ROC area by rectangles. We will call this measure DET area or shortly Deta. The desired interpolation is achieved when replacing FNMR(n-1) in (4.59) by the mean value of neighboring FNMR values, i.e., by ½(FNMR(n) + FNMR(n-1)):
Deta  ≡ ½
K
∑
k=1
(FNMR(k)+FNMR(k-1))pI(k) with FNMR(0) ≡ 1 and FNMR(K) = 0
(4.67)
As a result, Deta delivers the same value in all cases where all the points in a DET curve lie on a straight line such as in the trivial case of non-separability. In all other cases, Deta delivers more comparable values for differing number of distance points. If D is continuous, Deta is always equal to Roca (4.66). For that reason, no extra definition is required in the continuous case.
Example 4.11: DET formulas for the continuos case DK = [0,K]
All proofs: Integration by parts
DKFNMRn ⋅ pG = DKFMRn ⋅ pI
1
n+1
(4.68)
DKFNMR² ⋅ pI = 2DKFNMR ⋅ FMR ⋅ pG
(4.69)
DKFNMR² ⋅ pG = 2DKFNMR ⋅ FMR ⋅ pI
(4.70)
Roca = Deta = DKFNMR ⋅ pI = DKFMR ⋅ pG
(4.71)
DKFNMR² ⋅ FMR ⋅ pI = DKFNMR ⋅ FMR² ⋅ pG
(4.72)
"⋅" denotes the pointwise multiplication between functions.
Example 4.12: DET formulas for the discrete case IK = {0, ... ,K}
Deta  ≡ ½
K
∑
k=1
(FNMR(k)+FNMR(k-1))pI(k) = ½
K
∑
k=1
(FMR(k)+FMR(k-1))pG(k)
(4.73)
Roca  ≡ 
K
∑
k=1
FNMR(k-1)pI(k) = 
K
∑
k=1
FMR(k-1)pG(k)
(4.74)
Deta = Roca − ½
K
∑
k=1
pI(k)pG(k)
(4.75)
K
∑
k=1
(FNMR(k)+FNMR(k-1))pI(k) = 
K
∑
k=1
(FMR(k)+FMR(k-1))pG(k) = 1
(4.76)
Proof 1: Writing FNMR and  FMR as sum of pI resp. pG and rearranging the sum such that a product of sums over all density values appears which are 1 by definition.
Proof 2: The case of perfect non-separability is characterized by Deta = ½ (4.88) and pI = pG . Inserting this into the definition for Deta (4.67) immediately shows the result.
The relationship between EER and Deta is best given by the inequalities
EER² < Deta < (2 EER)EER
(4.77)
This can be shown by constructing the two worst case DET curves which are compliant with monotonicity and which meet the same EER. As a result, the smallest area DET curve belongs to an area of EER² and the largest area DET curve to an area of (2-EER)EER. If the DET curve even does not exceed the straight lines between (FMR,FNMR) = (0,1) and EER and between EER and (FMR,FNMR) = (1,0), an even tighter approximation is possible:
EER² < Deta ≤ EER
(4.78)
A further measure which is able to describe separability via the sum or mean of errors, namely
SOE ≡ FNMR+FMR
(4.79)
MOE ≡ ½(FNMR+FMR)
(4.80)
In the case of ideal separability, both SOE and MOE are zero in at least one point. Ideal Non-separability is characterized by SOE(x) = 1 and MOE(x) = ½, respectively, for every x ∈ D.

In the continuous case, it can be shown that a minimum of SOE or MOE at a point min implies pI(min) = pG(min). The reverse is not true. Especially, there may be multiple local minima. This behavior is unknown to the EER which is unique in value but may be spread over a  neighboring subset of D. The advantage of SOE and MOE however is their existence for all distances x ∈ D. If there is a global minimum, this minimum ("MSOE" and "MMOE") will be a global performance or separability measure like Deta.
MSOE ≡ min {SOE(x) | x ∈ D}
(4.81)
MMOE ≡ min {MOE(x) | x ∈ D}
(4.82)
Using the same worst case DET curves as for (4.77)  we get similar inequalities for MOE:
½EER < MOE < EER
(4.83)
In those cases where MMOE = EER, a further refinement of inequality (4.78) is possible:
MMOE = EER 2EER² < Deta ≤ EER
(4.84)
In the case of perfect non-separability, we have the functional equation FNMR = 1 - FMR where 1 denotes the identity function 1(x) = 1 for every x. As a result we get
FNMR = 1 − FMR SOE = 1 MSOE = 1
(4.85)
FNMR = 1 − FMR MOE = ½1 MMOE = ½
(4.86)
If the equal error rate point xEER exits, the MOE at this point is equal to EER:
FNMR(xEER) = FMR(xEER) = EER MOE(xEER) = EER
(4.87)
Example 4.13: Ideal non-separability
To see how the different separability measures behave, we consider the case of complete non-separability which is defined by the situation where the genuine and impostor densities completely overlap: pg = pI. If the densities are both continuous, which allows us to replace the sums by integrals, the results are easily seen to be:
Continuous case: EER = Roca = Deta = MMOE = ½
(4.88)
since FNMR = 1 FMR in the case of ideal non-separability (Example 4.7). The situation changes when D is a discrete finite set of values. Then no closed curve can be drawn, the DET curve is given as a sequence of points. 
FMR
 
EER   F
N
M
R
Fig. 4.5: Detection Error Tradeoff curve with discrete index
FNMR(i) FMR(i)
i
Fig. 4.6: Discrete FNMR and FMR versus threshold index

In the cases where continuous functions are sampled, the ROC area (Roca) defined in [BioFAQ] is only an approximation to the continuous case since it defines the area as sum of rectangles. If the DET diagram only comprises a finite set of points, the area may be "overestimated", i.e., the area is greater than ½. Let N+1 be the number of points including FNMR=1. If we assume that the FNMR/FMR points are equidistant, the ROC area (4.59) can be calculated as
Roca = ½(1 + 1/N)
(4.89)
This expression belongs to the distance densities pg(n) = pI(n) = 1/N for every n ∈ D. As we can see, Roca approaches the desired value ½ slowly and never reaches it, while the (interpolated) EER always assumes the value ½. The following diagram (Figure 4.7) shows two DET curves with the same Roca and the same Deta. If calculating Roca of the sampled DET points on the blue curve which represents non-separability, see Fig. 4.7, its area is exactly equal to the area under the red curve which comes from Fig. 4.8 and which does not represent exact non-separability. That is, we have the same Roca for significantly different areas. This is a result of disregarding the sampling theorem [Wikipedia] for continuous curves. The blue curve stands for exact non-separability in the continuous case. The corresponding area is exactly what we expect: ½. The EER is ½ in all three cases although the red curve has deserved a value larger than ½. Only Deta behaves "reasonable" in all three cases since Deta matches the continuous case exactly if the continuous curve connects the sample points by straight lines.

FMR
 
EER   F
N
M
R
Fig. 4.7: Detection Error Tradeoff curve with discrete D

We get

Blue line cont. Blue line sampled Red line cont. Red line sampled
EER
0.5
0.5
0.5
0.5
Roca
0.5
0.55
0.55
0.55
Deta
0.5
0.5
0.55
0.55
MMOE
0.5
0.5
0.5
0.5

Only bold values are in accordance with intuition. Since natural curves usually cannot be represented by piecewise straight lines, artifacts for all  three separability measures can be constructed.

FNMR(i) FMR(i)
i
Fig. 4.8: Discrete FNMR and FMR versus threshold index (constructed example)

In this constructed example we can observe the rare case where Roca and Deta coincide in the discrete case. According to (4.75) from Example 4.12, this would require the product of all density values being zero. This is indeed the case since pG and pI are zero where the other one is nonzero and vice versa.

Example 4.14: Simple feature space with stochastical features - cont'd
If we calculate EER, Roca, and Deta from Example 4.9, we see that EER in the nonseparable case E1 and E4 can only be calculated by interpolation. Linear interpolation indeed delivers the expected values ½. For E2 and E3, EER delivers the only reasonable values. However, Roca is larger than ½ in all cases although it confirms the tendency of EER that smaller values yield better systems.
 
  f1'
f2'
pG(0) pG(1) pI(0) pI(1) FNMR(0) FNMR(1) FMR(0) FMR(1) EER Roca Deta
E1
NH
NH
0.65
0.35
0.65
0.35
0.35
0
0.65
1
(0.5) 0.7725 0.5
E2
H
NH
0.55
0.45
0.45
0.55
0.45
0
0.45
1
0.45 0.6975 0.45
E3
NH
H
0.45 0.55 0.55 0.45
0.55
0
0.55
1
0.55 0.7975 0.55
E4
H
H
0.35 0.65 0.35 0.65
0.65
0
0.35
1
(0.5) 0.7725 0.5
FNMR(i)
FMR(i)
i
Fig. 4.9: Discrete FNMR and FMR versus threshold index (hat)
FMR
 
EER   F
N
M
R
Fig. 4.10: Detection Error Tradeoff curve (hat)
  f1'
f2'
pG(0) pG(1) pI(0) pI(1) FNMR(0) FNMR(1) FMR(0) FMR(1) EER Roca Deta
E1
NJ
NJ
0.55
0.45
0.55
0.45
0.45
0
0.55
1
(0.5) 0.7525 0.5
E2
J
NJ
0.15
0.85
0.85
0.15
0.85
0
0.85
1
0.85 0.9775 0.85
E3
NJ
J
0.85 0.15 0.15 0.85
0.15
0
0.15
1
0.15 0.2775 0.15
E4
J
J
0.45 0.55 0.45 0.55
0.55
0
0.45
1
(0.5) 0.7525 0.5
FNMR(i)
FMR(i)
i
Fig. 4.11: Discrete FNMR and FMR versus threshold index (jacket)
FMR
 
EER   F
N
M
R
Fig. 4.12: Detection Error Tradeoff curve (jacket)

The reason for this unwanted behavior which may prevent comparability between different systems is the discrete nature, i.e., the difference between sums and integrals. The situation can be mitigated, although not completely solved by using Deta instead of Roca. At least in this example, EER and Deta deliver quite satisfying values.

Example 4.15: Simple feature space with stochastical features and different enrolments (generalization)
Let us assume two individuals ⓘ1 and ⓘ2 with the following properties: ⓘ1 shows a feature with probability ψ1 and ⓘ2 carries the same feature with probability ψ2. We try to distinguish these two individuals solely by the availability of the feature. Furthermore we use the discrete metric and consider all four enrolment states for the reference features f1' and f2'.
  f1'
f2'
Reference Distance d(f1',f2')
E1
NA
NA
0
E2
A
NA
1
E3
NA
A
1
E4
A
A
0
(4.90)
The two individuals share the same feature with different probabilities which are represented by two points in a one dimensional feature space
F = {A, NA}
(4.91)
which contains the values: A = "feature present" and NA = "feature not present". The set of distances and indices are then given by
D = {0,1}
I = {0,1,2}
(4.92)
With this knowledge, the probabilities for equality and inequality can be determined straightforward:
 
  f1'
f2'
p1'1(1) p1'2(1) p2'1(1) p2'2(1) p1'1(2) p1'2(2) p2'1(2) p2'2(2)
E1
NA
NA
1-ψ1
1-ψ2
1-ψ1
1-ψ2
ψ1
ψ2
ψ1
ψ2
E2
A
NA
ψ1
ψ2
1-ψ1
1-ψ2
1-ψ1
1-ψ2
ψ1
ψ2
E3
NA
A
1-ψ1
1-ψ2
ψ1
ψ2
ψ1
ψ2
1-ψ1
1-ψ2
E4
A
A
ψ1
ψ2
ψ1
ψ2
1-ψ1
1-ψ2
1-ψ1
1-ψ2

The pi'j(0) are 0 by definition. Correspondingly, pi'j(1) = pi'j(0) and pi'j(2) = pi'j(1), see (4.55).
 

 
f1'
f2'
pG(1)
pG(2)
pI(1)
pI(2)
FNMR(0)
FNMR(1)
FNMR(2)
FMR(0)
FMR(1)
FMR(2)
E1
NA
NA
1-(ψ12)/2
12)/2
1-(ψ12)/2
12)/2
1
12)/2
0
0
1-(ψ12)/2
1
E2
A
NA
(1+ψ12)/2
(1-ψ12)/2
(1-ψ12)/2
(1+ψ12)/2
1
(1-ψ12)/2
0
0
(1-ψ12)/2
1
E3
NA
A
(1-ψ12)/2
(1+ψ12)/2
(1+ψ12)/2
(1-ψ12)/2
1
(1+ψ12)/2
0
0
(1+ψ12)/2
1
E4
A
A
12)/2
1-(ψ12)/2
12)/2
1-(ψ12)/2
1
1-(ψ12)/2
0
0
12)/2
1

This example also shows the large influence of enrolment. The red entries belong to a reference distance of zero. As a result, no selectivity can be expected. This is clearly shown by the fact that pI = pG which is characteristic for the case of ideal non-separability (Example 4.13). We get the equal error rates:
EER(E2) = (1 − ψ1 + ψ2)/2
(4.93)
EER(E3) = (1 + ψ1 − ψ2)/2
(4.94)
Since it is our interest to keep the EER below ½, the following restrictions hold for the corresponding enrolment:
E2: ψ1> ψ2
(4.95)
E3: ψ1 < ψ2
(4.96)
These are the conditions for the optimum enrolment. If we assume, without loss of generality, that E3 is the optimum enrolment, then the conditions for ideal separability are
EER = 0 ψ1= 0 and ψ2 = 1
(4.97)
If ψ2 = 1 − ψ1, ERR simplifies to
ψ2 = 1 − ψ1 EER = ψ1
(4.98)
The other separability measures are seen to be (E3)
MMOE = EER
Deta = EER
(4.99)

4.2.2 Weak Universality

Not every biometric characteristic will be available for each individual all the time. In such cases, no feature can be extracted and comparisons cannot take place. This leads to a new kind of error, called Failure to Acquire (FTA) [BioFAQ]. FTAs cannot be represented directly in the feature space.

4.3 Intra- Versus Inter-Characteristic Failure Rates

The preceding inter-characteristic approach to define biometric failure rates is completely different from the intra-characteristic approach in [Bromba2009]. In [Bromba2009] probabilities for the comparison of two samples from the same characteristic and from two different characteristics are considered. Here, we define the failure rates as mean values of the results of N² comparisons in a feature space with N features. In the first instance, assuming ideal biometric characteristics, the comparison results are deterministic and no failures occur. Only if we introduce a threshold, deterministic failures may be produced if the distance between two different features becomes smaller than the threshold.

Depending on the biometric modality, features may be distributed randomly or according to certain rule - at least theoretically. In the case of random distributions as suggested in Fig. 3.4, the distances will also behave random. Otherwise it should be possible to consider only a small number of feature pairs to evaluate the whole failure behavior, e.g., when the features are arranged in a regular grid.

In the intra-characteristic approach, only 1 pair of characteristics (same or different) is considered, well knowing that in practice the result is only valid for this specific pair at a certain time instant. From this view, the bilateral failure rates defined above correspond to the intra-characteristic failure rates. Thus, the natural extension would be to combine bilateral failure rates according to the definition above to get individual and global failure rates. For illustration, Figs. 3.4 and 4.3 may help. In Fig. 3.4, the features are distributed randomly. Each distance is more and less different and follows no rule except for the triangle inequality. This case refers to inter-characteristic variations. In Fig. 4.2, if we only consider one feature pair, there is a nonzero fluctuation of the distance for each measurement. This is the intra-characteristic case.

When using the score value statistics, both cases (intra- and inter-characteristic) cannot be separated in practice. However, by repeated measurements of the same features, mean values can be built such that the sum statistics (intra- and inter-characteristic) can be reduced to an inter-characteristic statistics. This has been shown in [Bromba2001, Bromba2002] for fingerprint. This way, the two cases can be separated under certain assumptions.
Example 4.16: Equidistant 1D features: Inter-characteristic FMR
Let us assume N+1 one-dimensional features such as human height but with constant Euclidean distance 1 between neighboring features, such that h0 = 0, h1=1, ..., hN=N. Then the (N+1)x(N+1) distance matrix with the bilateral distances dij is given by
 
h0
h1
h2
.
hN
=
0
1
2
.
N
d00
d01
d02
...
d0N
d10
d11
d12
...
d1N
d20
d32
d22
...
d2N
.
.
.
...
.
dN0
dN1
dN2
...
dNN
=
0
1
2
...
N
1
0
1
...
N-1
2
1
0
...
N-2
.
.
.
...
.
N
N-1
N-2
...
0

Now, in analogy to the failure rates, we may calculate the individual and global "distances" di and d, where we average over all available distances
di ≔ 
1
N
N
∑
j=1
dij = ½ (N+1) − i +
N
(4.100)
d ≔ 
1
N+1
N
∑
j=1
di = (N+1) +
2
3
(4.101)
As the number of features (N+1) increases, the mean individual and global distance will also increase. More interesting is the development of failure rates in dependence on the threshold th. Obviously, if th=0, no failures will occur. If th=1, neighboring features cannot be distinguished any more. The following match rate matrix shows this case:
 

MR00
MR01
MR02
MR03
...
MR0N
MR10
MR11
MR12
MR13
...
MR1N
MR20
MR21
MR22
MR23
...
MR2N
MR30
MR31
MR32
MR33
...
MR3N
.
.
.
.
...
.
MRN0
MRN1
MRN2
MRN3
...
MRNN
=
1
1
0
0
...
0
1
1
1
0
...
0
0
1
1
1
...
0
0
0
1
1
...
1
.
.
.
.
...
.
0
0
0
0
...
1

The bilateral FMRij for this case is equal to 1 if j=i±1 and 0 else. (The cases i=j represent the CMRi which are always 1 in this case.) Then the individual FMRi are given by FMRi = 1/N if i = 1 or N and 2/N else. The reason for this behavior is that the outer features only have 1 direct neighbor and thus are preferred. If the threshold increases, more and more neighbors are to be considered. A few simple-to-calculate general individual FMRs are given for i=0, N/2 (provided N is odd), or N . Then we get:
FMR0(th) = FMRN(th) = 
th
N
(4.102)
FMRN/2(th) = min{2
th
N
,1}
(4.103)
FMRN/2(th) is also the largest value among all individual FMRs. Note the linear growth with the normalized parameter th/N.

For the global FMR we get
FMR(th) = 
th(2N+1− th)
N(N+1)
(4.104)
For large N, the following approximation holds:
FMR(th) ~ 
th
N
(2 − 
th
N
)
(4.105)

Fig. 4.13: Continuous inter-characteristic FMR versus threshold

Figure 4.13 shows the extreme cases with maximum and minimum individual FMR. The border features (i=0 and i=N) show the lowest individual FMR and the center feature at i=N/2 the largest one. As expected, the global FMR which is the mean value of all individual FMRs, is lying between these two extremes. The FNMR is 0 for every threshold (except for the virtual threshold -ε) unless we expect stochastical errors.

Example 4.4 (human height) shows a typical example for a pure intra-characteristic FMR and FNMR. As usual, no threshold value can be chosen such that both errors becomes zero. This is due to overlapping of the probability densities for both height features.

The examples given above could be extended to show the mixed case which would be the best approximation to reality. Instead of fixed feature points in Fig. 3.4, we allow all features to behave like a cloud (Fig. 4.2) as shown for one pair in Fig. 4.4. Then we built the mean value to get the inter-characteristic representations.

5. Performance Parameters

This chapter discusses which influences are able to improve the recognition performance, i.e., reduce recognition failures. Most results are based on examples only, hence no general statements can be derived. The best way to find an optimum is to model reality stochastically and then trying to minimize failures. However, in many cases nature cannot be simulated exactly or is too difficult to be described by a mathematical model. In this cases heuristic methods are a common way to find a solution. To support this trial-and-error process, the following hints may be helpful. Additionally, the following considerations may help to find a deeper understanding of optimum solutions if available [Zhang].

5.1 Enrolment strategy

Example 5.1: Human height: Optimum enrolment
From Example 4.6 the question arises, how FMR and FNMR depend on the reference feature h'1. Obviously, FMR will be lowest, if h'1 has the largest possible distance to the mean value of h2 and if we regard the bilateral false match rate. If we consider the individual FMR, we have to take account all the other features which may be randomly distributed around feature h2. This could have the effect that a h'1, which has maximum distance to h2, will become unnecessarily close to a third feature, say h3. As a result, depending on the feature distribution in F, one cannot compulsorily expect an optimal value of h'1.

For FNMR the situation is easier to manage.  Obviously, the best h'1will be that one which maximizes the integral (4.29). At least if we have a symmetric probability density fh'1-h̃2 , this is the case if  fh'1-h̃2 has its maximum at zero and h'1 is equal to the mean value of 2. This suggests the following optimizing enrolment procedure: Take N samples of h1, say h1i (i=1,...,11), and calculate the mean value (as far as possible) to get the best reference:
h'1 ≔ 
1
N
N
∑
i=1
h1i
(5.1)
In many realistic cases, h'1 should minimize the individual FNMR while being neutral with respect to global FMR.

Example 5.2: Human height: Measurements with repeated enrolments
This example, which is based on Example 4.4, shows the relationship between the one-dimensional feature densities, the feature distance densities, and the failure rates depending on the decision threshold. No explicit enrolment is expected. This is equivalent to repeating enrolment with each new measurement.
p11=p22
fh1 p12=p21 fh2
Fig. 5.1: Density functions of the height features fh1 and fh2 and their distances p12 and p11

Fig. 5.1 shows the symmetric densities of the two height features fh1 and fh2. As a result the densities p12 and p21, which represent the broader distance densities, are equal and their peak locations represent the true difference h1−h2. Since the area of the densities must be 1, its peak height is smaller than the height of the feature densities. The densities p11=p22 represent the failure densities of two subsequent independent measurements of h1 or h2, respectively. Their height is twice as large as of p12=p21 as only positive values can occur.

From p11 and p12 FMR and FNMR can be calculated using formulae (4.24) and (4.25). We get the following diagram which shows the dependence of FMR and FNMR of the threshold value th.

FNMR(th) FMR(th)
th
Fig. 5.2: FNMR and FMR versus threshold th

The DET in Fig. 5.3 shows the corresponding dependence of FMR from FNMR with eliminated parameter th. As we can see, no threshold value can be chosen such that the error becomes zero. This is due to the overlap of the feature densities for both height features.

F
N
M
R
FMR
Fig. 5.3: Detection Error Tradeoff curve
Example 5.3: Human height: Measurements with optimum one-time enrolment
Based on Example 4.4, the effect of optimum enrolment is shown here. We assume a one-time enrolment with fixed reference feature h1' or h2' such that both values represent the peak location of their corresponding feature densities. Since the references are deterministic, the resulting distance densities have the same width as the corresponding feature densities. As a result, the overlap between p11 and p12 is smaller than in Example 5.2.
p1'1=p2'2
h1 p1'2=p2'1 h2
Fig. 5.4: Feature densities and distance densities for optimum enrolment

Smaller overlap, however, is equivalent to smaller failure rates, as is shown in the next diagram and the following DET curve.

FNMR(th) FMR(th)
th
Fig. 5.5: FNMR and FMR versus threshold th for optimum enrolment
F
N
M
R
FMR
Fig. 5.6: Detection Error Tradeoff curve for optimum enrolment

Comparing Figure 5.6 with Figure 5.3, an optimized enrolment strategy shows a significantly improved recognition performance!

Characteristic-dependent thresholds

Figure 4.3 shows a typical behavior of the features of different biometric characteristics. There may be characteristics which show large stochastical deviations while other behave very reproducible. If it is possible to estimate a characteristics' feature distribution during enrolment, it may suggest to introduce a variable threshold to decrease FNMR in the first case by increasing the threshold or FMR in the second one by decreasing it. The threshold is then to be stored together with the reference feature vector and can be used for verification as well as for identification. In the first case, however, this should only be done if additionally the distance to other features (genuine or impostors) is large enough. This has also to be checked during enrolment, provided a sufficiently large feature database is available. Otherwise the threshold decrease will increase FMR and thus may deteriorate security.

5.2 Adding feature components

Example 5.4: Combination of hat (Example 4.9) with jacket (Example 4.10)
This example treats the case of two individuals with two feature components each. As we shall see, the result greatly depends on the distance measure.

Discrete Metric δ

D3 = {0, 1} I3 = {0, 1, 2}
 
f1'
f2'
pG(1) pG(2) pI(1) pI(2) FNMR(1) FMR(1) EER Roca Deta
(H,NJ)
(NH,J)
0.46
0.54
0.06
0.94
0.54
0.06
(0.36̅4̅8̅) 0.5676
0.3
FNMR(i)
FMR(i)
i
Fig. 5.7: Discrete FNMR and FMR versus threshold index (hat & jacket, discrete metric)
FMR
 
EER   F
N
M
R
Fig. 5.8: Detection Error Tradeoff curve (hat & jacket, discrete metric)

Hamming metric d(f1,f2) ≔ δ(g1,g2) + δ(h1,h2), see (2.23)

f ≔ (g,h), g ∈ {H,NH}, h ∈ {J,NJ}
D3 = {0, 1, 2} I3 = {0, 1, 2, 3}
 
f1'
f2'
pG(1) pG(2) pG(3) pI(1) pI(2) pI(3) FNMR(1) FNMR(2) FMR(1) FMR(2) EER Roca Deta
(H,NJ)
(NH,J)
0.46
0.48
0.06
0.06
0.48
0.46
0.54
0.06
0.06
0.54
(0.3) 0.3468 0.204
FNMR(i)
FMR(i)
i
Fig. 5.9: Discrete FNMRand FMR versus threshold index (hat & jacket, Hamming metric)
FMR
 
EER   F
N
M
R
Fig. 5.10: Detection Error Tradeoff curve (hat & jacket, Hamming metric)

4-valued metric d1(f1,f2) ≔ αδ(g1,g2) + (1-α)δ(h1,h2), 0 < α < ½

f ≔ (g,h), g ∈ {H,NH}, h ∈ {J,NJ}
D3 = {0, α, 1-α, 1} I3 = {0, 1, 2, 3, 4}
 
f1'
f2'
pG(1) pG(2) pG(3) pG(4) pI(1) pI(2) pI(3) pI(4) FNMR(1) FNMR(2) FNMR(3) FMR(1) FMR(2) FMR(3) EER Roca Deta
(H,NJ)
(NH,J)
0.46
0.39
0.09
0.06
0.06
0.09
0.39
0.46
0.54
0.15
0.06
0.06
0.15
0.54
0.15
0.1947
0.132
FNMR(i)
FMR(i)
i
Fig. 5.11: Discrete FNMR and FMR versus threshold index (hat & jacket, 4-valued metric d1)
FMR
 
EER   F
N
M
R
Fig. 5.12: Detection Error Tradeoff curve (hat & jacket, 4-valued metric d1)

4-valued metric d2(f1,f2) ≔ αδ(g1,g2) + (1-α)δ(h1,h2), ½ < α < 1

f ≔ (g,h), g ∈ {H,NH}, h ∈ {J,NJ}
D3 = {0, 1-α, α, 1} I3 = {0, 1, 2, 3, 4}

This case corresponds to non-optimal weighting. i.e., the less distinctive feature component (hat) is weighted stronger.
 

f1'
f2'
pG(1) pG(2) pG(3) pG(4) pI(1) pI(2) pI(3) pI(4) FNMR(1) FNMR(2) FNMR(3) FMR(1) FMR(2) FMR(3) EER Roca Deta
(H,NJ)
(NH,J)
0.46
0.09
0.39
0.06
0.06
0.39
0.09
0.46
0.54
0.45
0.06
0.06
0.45
0.54
0.45
0.3387
0.276
FNMR(i)
FMR(i)
i
Fig. 5.13: Discrete FNMR and FMR versus threshold index (hat & jacket, 4-valued metric d2)
FMR
 
EER   F
N
M
R
Fig. 5.14: Detection Error Tradeoff curve (hat & jacket, 4-valued metric d2)

     
EER   F
N
M
R
FMR
Fig. 5.15: Detection Error Tradeoff curve (comparison, β ≔ 1-α)
Obviously, a weighted distance measure which favors the most distinctive feature component delivers the best results which is the jacket in this case, see Example 4.14:
 
Method EER MMOE Deta
Hat only 0.45 0.45 0.45
Jacket only 0.15 0.15 0.15
Hat & jacket, discrete metric (0.36̅4̅8̅) 0.3 0.3
Hat & jacket, Hamming metric (0.3) 0.3 0.204
Hat & jacket, weighted metric d1 0.15 0.15 0.132
Hat & jacket, weighted metric d2 0.45 0.3 0.276

Interestingly, only Deta sees an advantage between the jacket-only case and the hat & jacket combination with optimal metric and delivers a unique minimum!

Example 5.5: 2-component features space with stochastical features and weighted distance measures (generalization)
Let us assume two individuals ⓘ1 and ⓘ2 which distinguish by two stochastical properties c1 and c2 ("components") each of which are either available (A) or not (NA) with different probabilities 
ψij ≔ P(property cj is available for individual ⓘi)
c1 c2
1 ψ11 ψ12
2 ψ21 ψ22
(5.2)
The feature space now has two "dimensions" and can be composed such that 
F = F1 x F2 =  {fi = (c1i,c2i)| i ∈ {0,1}} where
F1= {c1| c1 ∈ {A, NA}} where A denotes "available" and NA "not available"
F2= {c2| c2 ∈ {A, NA}} where A denotes "available" and NA "not available"
(5.3)
Hence, F can be represented by its elements:
F = {(NA,NA), (NA,A), (A,NA), (A,A)}
(5.4)
As distance measure we choose the composite measure 
d(f1,f2) ≔ αd1(c11,c21) + βd2(c12,c22)
(5.5)
where d1 and d2 are the discrete metrics on F1 and F2, respectively, with f1, f2F, c11, c21F1, and c12, c22F2.

For the distance and index sets we get, depending on the weighting constants α and β:
α<β   D = {0, α, β, α+β}, I = {0,1,2,3,4}
(5.6)
α=β   D = {0,α,2α}, I = {0,1,2,3}
(5.7)
α>β   D = {0, β, α, α+β}, I = {0,1,2,3,4}
(5.8)
As in Example 5.4, we try to distinguish the two individuals solely by the presence of the feature components or properties. For enrolment, 16 possibilities exist, starting with f1' = (NA,NA) and f2' = (NA,NA) and ending with f1' = (A,A) and f2' = (A,A). We exclude all trivial enrolments, i.e., enrolments with f1' = f2'. Among the remaining 12 possibilities we intuitively select those ones which represent the largest probability, e.g.,
f1' = (A,NA) and f2' = (NA,A) if ψ11 > ψ21 and  ψ22 > ψ12
(5.9)
The two individuals 1 and ⓘ2 share the two properties with different probabilities and are represented by four points in the two-dimensional feature space F. To be more concrete, each reference feature yields one point in F, as well as each measurement will be represented by one point. But this may not always be the same point; each of the available four points may be affected with different probability.

As results for the bilateral distance densities pi'j(n), n∈D, (which are the probabilities that a certain distance value occur), we get under the assumption of stochastical independence:
p1'1(0) =
ψ11(1−ψ12)
p1'1(α) = (1−ψ11)(1−ψ12)
p1'1(β) =
ψ11ψ12
p1'1(α+β) =
(1−ψ1112
p1'2(0) =
ψ21(1−ψ22)
p1'2(α) = (1−ψ21)(1−ψ22)
p1'2(β) =
ψ21ψ22
p1'2(α+β) =
(1−ψ2122
p2'1(0) =
(1−ψ1112
p2'1(α) =
ψ11ψ12
p2'1(β) = (1−ψ11)(1−ψ12)
p2'1(α+β) =
ψ11(1−ψ12)
p2'2(0) =
(1−ψ2122
p2'2(α) =
ψ21ψ22
p2'2(β) = (1−ψ21)(1−ψ22)
p2'2(α+β) =
ψ21(1−ψ22)
(5.10)
Using formula (4.41) and (4.42), the global densities can be calculated as mean values:
Distance
Index
pG(Index)
pI(Index)
0
0
0
0
1
½(ψ1122−ψ11ψ12−ψ21ψ22)
½(ψ1221−ψ11ψ12−ψ21ψ22)
α
2
½(1−ψ11−ψ1211ψ1221ψ22)
½(1−ψ21−ψ2211ψ1221ψ22)
β
3
½(1−ψ21−ψ2211ψ1221ψ22)
½(1−ψ11−ψ1211ψ1221ψ22)
α+β
4
½(ψ1221−ψ11ψ12−ψ21ψ22)
½(ψ1122−ψ11ψ12−ψ21ψ22)
(5.11)
The calculation of FNMR and FMR depends on the order of the distance values. We distinguish the three cases α<β, α>β, and α=β and start with
α<β
n
FNMR(n)
FMR(n)
1
0
0
1−½(ψ1122−ψ11ψ12−ψ21ψ22)
½(ψ1221−ψ11ψ12−ψ21ψ22)
α
½ − ½(ψ22 − ψ12)
½ − ½(ψ22 − ψ12)
β
½(ψ1221−ψ11ψ12−ψ21ψ22)
1−½(ψ1122−ψ11ψ12−ψ21ψ22)
α+β
0
1
(5.12)
At the distance α, FNMR and FMR become equal and thus define the EER, i.e.,
EER = ½ − ½(ψ22 − ψ12)
(5.13)
It is striking that EER exclusively depends on the probabilities of the second component c2! Since β > α, this may be due to the stronger weight of the second component. The assumption ψ22 > ψ12 takes care for the requirement that EER < ½. A more differentiating view of the capability of multicomponent recognition delivers Deta which after some heavy, but straightforward calculations, is seen to be
Deta = ¼(2+ψ21−ψ11) + ¼(ψ12−ψ22)(3 11122122) + 2(ψ11ψ1221ψ22))
(5.14)
Here, also the other probabilities have an effect. This effect can be studied when making one component ideal, i.e., unique for each individual. This can be done by giving the first component of the two individuals 100% certainty for availability, resp. non-availabilty. That is we assume ψ11 = 1 and ψ21 = 0. This would deliver
EER =  ½(1 + ψ12 − ψ22)
(5.15)
Deta = (½(1+ψ12−ψ22))² = EER² ≤ EER
(5.16)
As already stated, EER is not affected although both individuals are perfectly unique and thus EER should be zero! This is an impressive example how comparison algorithms may behave suboptimal. Comparing (5.15) with the one-dimensional case (4.94), we see that no improvement is delivered by the additional component! When calculating
Deta = (½(1 + ψ12− ψ22))² = EER² ≤ EER
(5.17)
and comparing this with the one-dimensional case (4.99) at least an improvement can be stated which is expressed by the reduction effect of the squaring operation. But this result is not satisfying since we again expect a Deta of zero. The solution is to transfer the property of perfect uniqueness to the second component such that ψ12 = 0 and  ψ22 = 1. Only in this case we get what we expect:
Deta = 0 = EER
(5.18)
The systems behaves best, when ψ12 − ψ22 < ψ21 − ψ11, combined with a stronger weighting of the second component (β > α). This situation changes if α > β. In that case, the system becomes optimal if ψ12 − ψ22 > ψ21 − ψ11.
α>β
n
FNMR(n)
FMR(n)
1
0
0
1−½(ψ1122−ψ11ψ12−ψ21ψ22)
½(ψ1221−ψ11ψ12−ψ21ψ22)
β
½ − ½(ψ11ψ21)
½ − ½(ψ11ψ21)
α
½(ψ1221−ψ11ψ12−ψ21ψ22)
1−½(ψ1122−ψ11ψ12−ψ21ψ22)
α+β
0
1
(5.19)
For completeness, also the special case α=β is given here:
α=β
n
FNMR(n)
FMR(n)
1
0
0
1−½(ψ1122−ψ11ψ12−ψ21ψ22)
½(ψ1221−ψ11ψ12−ψ21ψ22)
α
½(ψ1221−ψ11ψ12−ψ21ψ22)
1−½(ψ1122−ψ11ψ12−ψ21ψ22)
0
1
(5.20)
A simple calculation delivers the interpolated equal error rate:
EERint = ½(1 + ½12− ψ22) + ½21 − ψ11))
(5.21)
This case corresponds to a metric without weighting. It seems to behave optimal when ψ12−ψ22 and ψ21−ψ11 are nearly equal but without being better than the weighted cases, provided the mismatch is small. However, it should be kept in mind that the interpolated EER need not be a reliable measure for comparing separability.

The following two examples show the influence of feature probabilities on Deta:
ψ11= 0.7, ψ12= 0.3, ψ21= 0.3, ψ22= 0.7 Deta = 0.216
(5.22)
ψ11= 0.4, ψ12= 0.1, ψ21= 0.3, ψ22= 0.8 Deta = 0.132
(5.23)

Example 5.6: Maximum distance features with N feature components
We consider a feature space F where the features fi of two individuals comprise N feature components cn such that 
f = (c1, c2, c3, ..., cN)
(5.24)
Each feature component cn may randomly assume the values 0 and 1 with probability 1- ψ and ψ, respectively, for each component of individual ⓘ1, and ψ and 1- ψ for individual ⓘ2, i.e.,
ψ ≤ ½
ci ∈ {0,1}
1: P{ci = 1} = 1 − P{ci = 0} = ψ
2: P{ci= 1} = 1 − P{ci = 0} = 1 − ψ
(5.25)
Since ψ ≤ ½,  we try as reference features
f1' ≔ (0, 0, 0, ..., 0)
f2' ≔ (1, 1, 1, ..., 1)
(5.26)
which allow a simple calculation of the corresponding error rates. As metric we take the Hamming distance d which counts the number of component disagreements between two features. The individuals ⓘ1 and ⓘ2 show the maximum possible distance of their references:
d(f1',f2') = N
(5.27)
D is seen to be D = {0, 1, 2, 3, ..., N} and the index set I = {0, 1, 2, 3, ..., N+1}. The density functions P{d(f,g) = k} are given by the binomial expression [Wikipedia]
kp1'1(k) = p2'2(k) ≔ P{d(f2',f2) = k} =  (
N
k
) ψk(1 − ψ)N-k ≕ p(k)
(5.28)
kp1'2(k) = p1'2(k) ≔ P{d(f1',f2) = k} =  (
N
k
) (1 − ψ)kψN-k ≕ q(k)
(5.29)
We then get
p1'1(k) ≔ P{d(f1',f1) = k} = p(k)
p1'2(k) ≔ P{d(f1',f2) = k} = q(k)
p2'1(k) ≔ P{d(f2',f1) = k} = q(k)
p2'2(k) ≔ P{d(f2',f2) = k} = p(k)
(5.30)
pG(k) = p(k)
pI(k) = q(k)
(5.31)
kFNMR(n) = 1 − 
n
∑
k=1
(
N
k
) ψk(1 − ψ)N-k
(5.32)
kFMR(n) =
n
∑
k=1
(
N
k
) (1 − ψ)kψN-k
(5.33)
FNMR(i) FMR(i)
i
Fig. 5.16: Discrete FNMR and FMR versus threshold index for 7 feature components
FMR
 
EER   F
N
M
R
Fig. 5.17: Detection Error Tradeoff curve for 7 feature components
The following animations show the behavior for varying N.
FNMR(i) FMR(i)
i
Fig. 5.18: Discrete FNMR and FMR versus threshold index
   
EER   F
N
M
R
FMR
Fig. 5.19: Detection Error Tradeoff curve with discrete index
Feature Components
ψ
EER
MMOE
Deta
1
0.3
0.3
0.3
0.3
2
0.3
(0.3)
0.3
0.216
3
0.3
0.216
0.216
0.16308
4
0.3
(0.216)
0.216
0.126036
5
0.3
0.16308
0.16308
0.09880866
6
0.3
(0.16308)
0.16308
0.078224791
7
0.3
0.126036
0.126036
0.062375212
Feature Components
ψ
EER
MMOE
Deta
7
0.5
0.5
0.5
0.5
7
0.4
0.289792
0.289792
0.228843953
7
0.3
0.216
0.216
0.16308
7
0.2
0.033344
0.033344
0.007003561
7
0.1
0.002728
0.002728
9.92855E-05
7
0.05
0.000193578
0.000193578
1.02554E-06
7
0.01
3.4167E-07
3.4167E-07
1.62789E-11
7
0.001
3.49161E-11
3.49161E-11
-

As expected, the recognition performance increases with increasing number of feature components, provided we consider the maximum distance features. Furthermore, while EER and MMOE change with the same order for increasing number of feature components or "individuality" (ψ vs. 1−ψ), Deta vanishes more rapidly.

Example 5.7: Minimum distance features with N feature components
Just as in Example 5.6 we consider a system where the features f of two individuals comprise N feature components ci such that 
f = (c1, c2, c3, ..., cN)
(5.34)
Each feature component ci may randomly assume the values 0 and 1 with probability 1- ψ and ψ, respectively, for each component of individual ⓘ1. Individual ⓘ2 changes probabilities for the first component c1, i.e.,
ψ ≤ ½
ci ∈ {0,1}
1: P{ci = 1} = 1 − P{ci = 0} = ψ
2: P{ci = 1} = 1 − ψ for i = 1 and P{ci = 1} = ψ for all i > 1
(5.35)
Since ψ ≤ ½,  we try as reference features
f1' ≔ (0, 0, 0, ..., 0)
f2' ≔ (1, 0, 0, ..., 0)
(5.36)
As metric we take the Hamming distance d which counts the number of component disagreements between two features. The individuals ⓘ1 and ⓘ2 show the minimum reasonable distance of their references:
d(f1',f2') = 1
(5.37)
D is given as D = {0, 1, 2, 3, ..., N} and the index set I = {0, 1, 2, 3, ..., N+1}. The density functions P{d(f,g) = k} are given by the binomial expression
kp1'1(k) = p2'2(k) ≔ P{d(f2',f2) = k} =  (
N
k
) ψk(1 − ψ)N-k ≕ p(k)
(5.38)
kp1'2(k) = p1'2(k) ≔ P{d(f1',f2) = k} =  (
N−1
k
) ψk+1(1 − ψ)N-k-1 + ( (
N
k
) + (
N
k
) k-1(1 − ψ)N-k+1  ≕ q(k)
(5.39)
We then get
p1'1(k) ≔ P{d(f1',f1) = k} = p(k)
p1'2(k) ≔ P{d(f1',f2) = k} = q(k)
p2'1(k) ≔ P{d(f2',f1) = k} = q(k)
p2'2(k) ≔ P{d(f2',f2) = k} = p(k)
(5.40)
pG(k) = p(k)
pI(k) = q(k) 
(5.41)
kFNMR(n) = 1 − 
n
∑
k=1
p(k)
(5.42)
kFMR(n) =
n
∑
k=1
q(k)
(5.43)
FNMR(i) FMR(i)
i
Fig. 5.20: Discrete FNMRand FMR versus threshold index
FMR
 
EER   F
N
M
R
Fig. 5.21: Detection Error Tradeoff curve with discrete index
FNMR(i)
FMR(i)
i
Fig. 5.22: Discrete FNMR and FMR versus threshold index (hat & jacket)
   
EER   F
N
M
R
FMR
Fig. 5.23: Detection Error Tradeoff curve with discrete index
Feature Components
ψ
EER
MMOE
Deta
1
0.3
0.3
0.3
0.3
2
0.3
(0.384)
0.36
0.342
3
0.3
(0.40984615)
0.402
0.36636
4
0.3
(0.414975)
0.4118
0.382341
5
0.3
(0.42238435)
0.41768
0.393758532
6
0.3
(0.43264246)
0.42797
0.402413305
7
0.3
(0.43630738)
0.435173
0.409257623

In contrast to Example 5.6 where we consider two features with maximum distance, this case shows the effect that the recognition performance decreases with the number of feature components. 

5.3 Multiple sampling during recognition

The same procedure which is common for enrolment can also be taken to improve recognition performance. Typically, the capturing process is repeated until recognition is successful or a limit is reached. The limit is required to prevent brute force attacks by repeatedly trying new characteristics. In the ideal case (the characteristic has not changed since enrolment and the trials are stochastically independent), FNMR will significantly be improved. At the same time FMR may increase, too. If the genuine repeats a trial, statistics is only affected by the intra-characteristic probability density. Since this density does not overlap too much with other densities, FMR will only be slightly affected. However, multiple recognition trials can also be used by impostors to try multiple characteristics. For that reason, other methods may be preferable. For example, one may try to build the mean value of all repetitions and then use it for comparison, unless variation is too large. This is excessively used by variable characteristics such as key stroke biometrics where it is essential to hit enough feature components, and if possible, to decrease their variance by averaging [Bartmann].

Generally, multiple trials can be considered as information fusion, using different samples of the same biometric characteristic. This fusion can obviously be performed at different stages of the system (Figure 2.1):

Depending on which method is used and on the quality of enrolment, different improvements are expected. An improvement, however is not self-evident: While FMR may decrease, FNMR may increase and vice versa. This will be illustrated with several examples.

To simplify the calculation, stochastical independence is expected. But it should be noted that most practical situations do not imply independence. Evaluations may force special measures to keep independence for the sake of comparability. This may, however, not necessarily boost usability in practice!

5.3.1 Feature layer fusion

Manipulations in the feature space mostly encompass linear and non-linear averaging operations on the individual feature components. This may require an extension of the feature space.
Example 5.8: Jacket with Euclidean metric
Instead of using the discrete metric as in Example 4.10 we now try the Euclidean metric for determining the distance between reference features and sample features.

As references we do not choose the symbols J and NJ as in Example 4.10, but numbers such as 1 for "jacket" and 0 for "no jacket". As we use the same stochastical conditions as in Example 4.10, we are now able to use a kind of optimal enrolment which corresponds to the mean value of the features. The feature samples f1 and f2 can assume the values 0 and 1. Hence, we try

f1' ≔ 0.1 and f2' ≔ 0.8 

as values for the references since these are the probabilities for the individuals ⓘ1 and ⓘ2 to wear a jacket. The feature space is then given by F = {0, 0.1, 0.8, 1} and the space of enrolled features by F' = {0.1, 0.8}.

D = {0, 0.1, 0.2, 0.8, 0.9}
I = {0, 1, 2, 3, 4}
Since the distance value 0 does only appear in theory here, we can omit the virtual distance -ε and map 0 ∈ D to 0 ∈ I.

dij ≔ |fi - fj| (Euclidean metric)

A general improvement by using the "optimized" enrolment cannot be expected here. Indeed, the general situation does not change since there are fixed relationships between the distances such that the most probable feature meets the shortest distances. This should only change the distances, not their probabilities. There is one difference, however: The probabilities are spread over more different distance values! This may slightly change our separability values.
 

p1'1(0.1) = 0.9
p1'1(0.9) = 0.1
p1'2(0.1) = 0.2
p1'2(0.9) = 0.8
p2'1(0.2) = 0.1
p2'1(0.8) = 0.9
p2'2(0.2) = 0.8
p2'2(0.8) = 0.2
Distance Index pG FNMR pI FMR MOE
0 0 0 1 0 0 0.5
0.1 1 0.45 0.55 0.1 0.1 0.325