For
the proper display of all symbols in this document the font "Arial Unicode
MS" must be installed
Performance Measures for Biometric Recognition
Manfred U. A. Bromba
First release: 2010-08-31 Status: 2012-01-15
| This
paper describes the performance of biometric recognition of individuals
with the aid of distance measures in metric spaces. The difference between
intra- and inter-characteristic definitions is discussed as well as its
influence on the definition of biometric failure rates. On the basis of
these definitions it is shown with the help of numerous examples how recognition
performance can be influenced by proper enrolment, by adding individuality
information (fusion), and by multiple sampling during recognition. |
|
1. Introduction
One of the main tasks
of biometric recognition systems is the comparison of biometric features.
If the distance, measured between two features, is small enough, the features
are attributed to the same individual. This suggests to represent biometric
features as (multidimensional) points in metric spaces and to use the well-established
term metric as measure for the distance (or inversely: similarity)
between two features.
This paper extends
the elementary intra-characteristic derivation of biometric failure rates
to the case where the pseudo-statistical properties of an ensemble of biometric
features from different biometric subjects are considered. While the intra-characteristic
consideration only treats the stochastical properties of single comparisons
between sample pairs of the same or two different characteristics, the
inter-characteristic analysis extends the definitions to all pairs of different
characteristics within the ensemble. This greatly helps to understand the
fact that different pairs of features show quite different properties in
real life. The main targets are to prepare the basics for optimum biometric
recognition procedures and the cost-efficient evaluation of their performance.
2. Biometric Features
and Metric Spaces
Automated recognition using biometric characteristics
is based on the availability of biometric features which fulfill the following
requirements as far as possible [BioFAQ]:
|
BCR1. |
|
Uniqueness (distinctness between individuals) |
|
BCR2. |
|
Measurability |
|
BCR3. |
|
Universality |
|
BCR4. |
|
Permanence |
A typical biometric recognition system
can be separated into four function blocks:
-
Capturing using an electronic sensor
-
Modality detection
-
Feature extraction
-
Feature comparison with reference features
|
Biometric
sample
|
|
Biometric
feature
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Biometric
enrolment database
|
Biometric
templates
|
|
|
|
|
|
|
|
|
|
Fig.
2.1: Typical biometric recognition system
|
|
During capturing a sample of the biometric
characteristic is recorded which must "contain" the biometric feature.
The Modality Detection is responsible for the availability of the desired
modality (finger, face, iris, etc.) and may refuse images which do not
meet predefined quality and other requirements, such as liveness. During
feature extraction those information is omitted as far as possible which
does not contribute to the requirements of uniqueness, universality, and
permanence. The extracted feature is the basis for comparison. It can be
expressed mathematically as a multi-dimensional vector or at least as a
list of property parameters.
The main task of feature extraction can
alternatively be described as a method to reduce dimensionality. For example,
many biometric systems start with capturing a 2D image of the biometric
characteristic using a digital black & white camera. If the image sensor
has N x M pixels, ideally 1 dimension per pixel, i.e., NM dimensions, can
be considered, where each dimension typically may represent 256 gray values.
If the image represents a fingerprint, the dimension will reduce, e.g.,
to the number of extracted minutia multiplied by the number of minutia
properties. The number of minutia properties is typically 4: horizontal
position, vertical position, direction, and type (ending or bifurcation).
An important measure to reduce dimensionality
is alignment. Alignment can be expressed as a projection operator with
the property that already aligned data will not change if alignment is
repeated. For example, the most common 1-dimensional feature of human is
height. In a 3-dimensional Euclidean space, an advantageous alignment is
the projection of the feet to the coordinate (x1,y1,z1)=(0,0,0)
and the top of the head to (x2,y2,z2)=(0,0,h)
where h is the height of the person. In this case, the coordinate z bears
all information required. This alignment can be attained in at least two
ways: Either the person is asked to stand upright at a certain point or
the coordinates of feet and head are measured and then projected to (0,0,h)
by using two translations and one rotation. This way the number of relevant
dimensions reduces from 6 to 1. This example also demonstrates that alignment
is not only a matter of feature extraction but also of the capturing process.
In cases where alignment is not completely possible, even the comparison
function is affected. For example, while iris images can simply be aligned
in horizontal and vertical directions, rotation cannot be compensated for,
especially if only one eye is captured. In such cases, the comparison stage
has to try all rotations until the "distance" between the features approaches
a minimum.
Sometimes, the camera itself delivers
unwanted projections. For example, faces are embedded in the 3 dimensional
Euclidean space. If the image is taken by a 2D camera, certain rotations
are difficult to compensate, if the face is not aligned mechanically in
front of the camera. Using 3D cameras, such unintentional rotations can
be simply compensated in the digital domain. Additionally, the features
to be extracted are hardly affected by illumination in the 3D case [3DFaceProject].
For fingerprints, alignment in the third
direction is done mechanically by pressing the finger on the 2D sensor
surface. During feature extraction, alignment is more difficult. Horizontal
and vertical alignment is possible, if the fingerprint pattern would show
a common property for orientation. Such a
property
could be the core of the ridge pattern flow. Unfortunately, small sensors
may not show the center at all and many fingerprints
show more than 1 center.
But even if a core is found, this only
allows translational alignment. Rotations remain a problem which can be
solved by finding the best orientation during comparison, e.g., by trial
and error. This will, however, increase computation time significantly.
Beside their distinctive
properties, most biometric characteristics also show common properties
which can be used to detect their presence after capturing. Such properties
are only distinctive between different biometric modalities, not between
different characteristics of the same modality. Although these properties
cannot be used for recognition of individuals they are advantageous especially
for alignment. Examples are the ridge structure of fingerprint, the presence
of eyes, nose and mouth in faces, the general outline of a hand with five
fingers and the special shape of an eye with pupil, iris and surrounding.
Ultimately, detection and recognition use the same mathematical procedures.
In this context, we will understand Modality Detection as recognition
based on known common properties with respect to different individuals
of a biometric characteristic (which define the modality), whereas Feature
Detection is based on the known distinct properties with respect
to different individuals of the same biometric characteristic. If said
common properties are not present, no feature processing need follow, and
the recognition fails anyway.
Since property detection
and property estimation are mathematically similar tasks, detection may
be viewed as an add-on of estimation. (Usually, detection requires that
some measure of the parameters exceeds a certain threshold.) Generally,
it must be regarded that the functional blocks capturing, detection, extraction,
and comparison cannot always be separated exactly.
2.1 Feature Spaces
We start with the set
of subjects which shall be distinguished only by their biometric (or other)
features. This set shall contain K ∈ ℕ subjects sn
with 1 ≤ n ≤ K and is written
| S
≔ {s1, s2, s3,
..., sK} |
|
(2.1)
|
|
such that |S|
= K. From this set S which defines the set of all possible
subjects we can build the subset S of subjects which shall be known
to the biometric system ("subjects under consideration"). For simplicity
we assume that just the first N members of S are also the
members of S such that |S| = N ≤ K:
| S ≔
{s1, s2, s3,
..., sN} |
|
(2.2)
|
|
Usually,
the subjects sn are indistinguishable (except for
the virtual indices n), but not the same. To make them distinguishable,
we next consider the space A ("attributes") of measurable properties
which can be attributed to a subject. More concretely, A shall not
only contain all possible elementary properties but also all combinations
of properties as ordered n-tuples [Wikipedia],
usually created as Cartesian products [Wikipedia]
of sets. A contains all possible properties, i.e., properties which
are required to distinguish individuals, properties which define a biometric
or non-biometric modality, and so on.
To be able to distinguish
individuals, uniqueness is required. For this purpose we attribute
certain properties an ∈ A to a subject sn
∈ S such that an individual ⓘn is created:
| ⓘn
≔ (sn, an) ∈ ℑ ("I
Fraktur") |
|
(2.3)
|
|
where
is a set of individuals.
The trick is now that we claim all elements of A to be measurable.
As a result, we are able to determine the "difference" between two individuals
ⓘn and ⓘm by measuring the
distance d between their attributes as d(an, am).
Only if their attributes are different, subjects can be distinguished and
become individuals. In this sense, uniqueness is only defined for individuals,
not for attributes and means that there is no second individual with the
same attributes. This requires, of course, that there are at least as many
different attributes as subjects.
Here we understand
an individual to be separable, not necessarily to be unique as this is
defined, e.g., in [Wikipedia].
Separability is guaranteed by the subject's virtual index only. This index
is not measurable. For that reason we need measurable attributes to distinguish
individuals practically.
The pairing (sn,
an) may be of different strength and of different
origin. Generally, individuality may be awarded artificially (password,
RFID, clothes) or may be a naturally fixed part of the subject's body (face,
iris, fingerprints).
As already mentioned,
the space A itself may be a Cartesian product of elementary attribute
spaces. This allows us to describe an attribute an
as vector with arbitrary many components, e.g.,
| an
= (color, height, name, history, ...) |
|
(2.5)
|
|
helping us to create
more combinations so that we have enough different attributes an
to distinguish all individuals in ℑ.
On the other hand,
attributes which are common for a non-empty set of subjects, may be of
interest. If ℑ is the set of all individuals, e.g., all living
humans, then a population is a subset ℘ (P script) of ℑ
which is united by certain common properties of its members. Such properties
could be profession, gender, race, hobby, name, etc. Of special interest
are populations which have a measurable influence on biometric recognition.
Regarding non-idealities, special populations may show better or worse
failure rates, such as the population of bricklayers which may have problems
with fingerprint recognition. When determining failure rates for biometric
recognition systems, the properties of a population are essential when
comparing the rates between different applications.
Common properties
can also be used to define biometric modalities like behavior type, face
geometrics, fingerprint pattern, iris pattern, or any other property like
knowledge (passwords), possession (smartcard, key, wear), or marks like
scars and tattoos. Here the focus is on common properties: What is common
to all faces or to all fingerprints or to all passwords? The question,
"What distinguishes all password from all faces or from noise?" can
be used for modality detection to decide whether the desired biometric
(or other) characteristic has been measured or not. For modality detection,
the distinctive properties of a characteristic are irrelevant.
The distinctive
properties required for recognition are collected as feature vector fn.
As feature space F we define a set of those attributes fn
which can be used for comparison with other members of F.
What can be compared is only depending on the capabilities of the comparison
stage in Figure 2.1. Consequently, the feature space
F
is a subspace of all attributes and their combinations
A:
To prevent loss of
generality, we also allow raw signals, coming directly from the sensor,
to be compared by an appropriate comparison stage and to define a generalized
feature space. The raw or sample signals create the sample space Σ.
However, using Σ as generalized feature space may not necessarily
help vividness. The intention is to allow the comparison function to provide
optimum processing in a mathematical sense, at least theoretically.
In practice, not
enough a priori knowledge about the biometric characteristic and the inherent
character of the noise which deteriorates the ideal biometric feature may
be available to define the optimum algorithm or the problem is too complicated
to result in a simple solution. For that reason, most practical systems
follow the block diagram of Figure 2.1. If so, the
feature space contains all biometric features which come from a biometric
characteristic or are stored as biometric references, resp. templates.
Each feature fn as an element of F
may comprise subfeatures (more elementary properties) or feature components.
Nevertheless, if we talk about features as plural, we don't mean the subfeatures
of one feature vector but different elements of F. We note
that
F need not be a vector space. Generally,
More precisely, we
regard F as the space of all possible features within
a specific comparison-oriented representation. That is, if the comparison
block of Figure 2.1 cannot process a feature, the
feature cannot be a member of F.
From F
the subset F of all used features may be separated.
"Used" means, each feature is used by at least one subject from S.
In the ideal case, for each known subject sn ∈ S
there is exactly one feature fn and a bijective
map ⓘn ∈ ℑ : sn
↔
fn called individual. In this case
If features are noisy,
F
becomes a random variable with usually different realizations
F(ω)
for different ω. In this case we get
since different features
may accidentally fall together. If individuals are not unique even in the
absence of noise, we may even get
For example, purely
genotypic features may be equal for monozygotic twins, making them indistinguishable.
The case |S|
< |F| would mean that there are more used features than subjects.
This will be excluded by the requirement that at most one feature from
F
is assigned to one subject and is equivalent to the definition of functions
[Wikipedia].
Example 2.1: Human height embedded in three
dimensions
Each individual is characterized by two points:
the top of the head and the bottom of the feet. In the 3D Euclidean space
this requires two points with three coordinates each, resulting in 6 numbers
per individual. If we assume no quantization in the lengths, F
comprises infinite many 6-tuples, which can be compared using a suitable
comparison algorithm. Alternatively, an alignment can be performed by preprocessing
as indicated in Figure 2.1.
Then F will be the space of all human heights, which can
be compared with each other by very simple methods (metrics). |
Example 2.2: Human height, aligned
If the individuals are already aligned in
3D space, one individual can be characterized by a single number, the distance
between the two 3D points given in Example 2.1.
This can be done by direct measurement or by calculating the Euclidean
distance between the two points given in Example
2.1. In this example, alignment is performed as part of the capturing
process, while in the previous one, alignment is done after capturing,
either by the comparison stage or by the preprocessing steps. |
Example 2.3: Fingerprint images
If fingerprints are compared without preprocessing,
the raw images form the feature space F. Its dimension is
given by the number of pixels. The range of each pixel is typically given
by the size of one Byte, e.g., 0...255. The number of elements is easily
computed, e.g., as |F| = L x M x 255 where L and M are the
number of pixels in horizontal and vertical direction. In comparison to
other representations, this one is computationally more intensive. One
reason for this fact is that the reference must also be stored as image.
This may require numerous computation steps to be repeated for each new
comparison. |
Example 2.4: Fingerprint minutiae
Since it is well known that the minutiae
patterns are the carrier of uniqueness, most systems only store the minutiae
properties as reference. This requires preprocessing to extract the minutiae
from the fingerprint image. Remaining is a list of a variable number of
minutiae which are each described by their horizontal and vertical position,
the minutia type (ridge bifurcation or ending), and the angle of the ridge
structure at the location of a minutia. The dimension
of F is 4K if K is the maximum number of minutiae of one
feature. While position and angle are expressed as a quantized real
number, the minutia type only requires 1 bit. In this case, the feature
components need not have the same structure. |
Example 2.5: Hair color
Though hair color of an individual may not
be a very distinctive feature, it may nonetheless serve as a good example.
This feature is by far not perfect. It may change over time, can simply
be altered artificially, is not available for all individuals, may change
locally etc. The feature space may be a set of hair colors such as F
≔ {brown, black, blond, auburn, red, white}. More technically, F
may be defined as the space of all optical hair spectra or more practically
as a 3-component color vector. |
Example 2.6: Keystroke
For keystroke recognition, the raw information
are the timing components (and sometimes also the pressure) of certain
key successions. The resulting features, together with their stochastical
properties, are very distinctive between different persons. |
2.1.1 Combination
of Features
It has already been
mentioned that features can be decomposed into subfeatures until elementary
features remain. Conversely, there may be different feature spaces, e.g.,
F1
and F2 with different biometric modalities
which may be combined to a new feature space such that
| F ≔ F1
x F2 ≡ { f1 ∈
F1,
f2
∈ F2 | f = (f1,
f2)} |
|
(2.12)
|
|
Alternatively, F1
and F2 may come from different statistical
measurements
| Fa
≔ F(ω1) x F(ω2)
= { f1 ∈ F(ω1),
f2
∈ F(ω2) |
f = (f1,
f2)} |
|
(2.13)
|
|
or from different time
instants
| Ft
≔ F(t1) x F(t2)
= { f1 ∈ F(t1),
f2
∈ F(t2) |
f = (f1,
f2)} |
|
(2.14)
|
|
In certain cases it may be very useful to
combine two (or more) features in such a way that the resulting feature
remains a member of the original feature space F. Especially,
if features of the same individual are varying due to noise, noise reducing
processing may help to improve recognition performance significantly. Such
processing methods are building the mean or median value. To keep the result
of such processing in the original feature space, the feature space must
be equipped with an appropriate structure. For example, to build mean values
requires that the sum and the multiplication by a scalar is allowed in
the feature space. In this case, a vector space [Wikipedia]
would meet the requirements. In other cases where the components of a feature
only indicate the presence (=1) or absence (=0) of a binary property of
an individual, a mean value may not be appropriate. In such cases the median
value may deliver improved features since only the most frequent value
is to be taken and this value is in the feature space by definition.
Unfortunately, real-life situations are
more complicated. For example, a fingerprint feature may have continuous
and binary components. The location of a minutia is such a continuous component
which may easily be treated as vector space component. On the other hand,
minutia type is a binary feature component which requires different treatment.
Often, certain minutiae are not available. In this case it makes no sense
to replace them by a null vector since then averaging would deliver useless
results.
Furthermore, features should be aligned
before averaging. Practically, this can be handled by aligning two features
by calculating the minimum distance for all possible alignments. If this
minimum is zero or below a threshold, it is indicated that both features
come from the same biometric characteristic. If so, a mean value may be
taken.
Example 2.7: Binary features and median
Let F be a feature space such
that
F = {0,1}. Let fn ∈ F(ωn)
be a random feature with realizations
| f1 = 0, f2
= 0, f3 = 1, f4 = 0, f5
= 1 |
|
(2.15)
|
|
The median [Wikipedia]
value is seen to be
| median{f1, f2
, f3, f4, f5}
= median{0, 0, 1, 0, 1} = median{0, 0, 0, 1, 1} = 0 |
|
(2.16)
|
|
That is, the value with the higher frequency
is taken. Note: If we have an even number of elements to be combined, the
median is not unique as the number of 0s and 1s may be equal. In such a
case, any value (0 or 1) may be taken as result. Statistically this should
have little impact. (Sometimes, for the non-unique case, ½ is defined
as median. We avoid this since ½ is not an element of F
and thus may confuse the biometric comparison operation). |
Example 2.8: Adding a unique ID as "feature
component"
If biometric features are not sufficiently
unique (this may also hold for passwords), some unification may be advised.
This can most favorably be attained by adding a unique feature component
to the other components of a biometric feature. The trick is that the unique
component may be public while the non-unique one may represent a secret
or something that can only be delivered by the dedicated individual. (Otherwise
the unique component would be sufficient!) For passwords this method is
very common since the very beginning of its use by combining the password
with a user ID which can be measured easily with little errors. A user
ID may also be a solution for biometric features since user IDs can be
more rapidly be identified than biometric features.
It is important to use an appropriate
distance measure for the new combination to get all the advantages one
expects. Otherwise, the unification may fizzle out as we shall see in Example
5.5. |
2.2 Distance measurements
To check whether a biometric sample belongs
to the desired modality or whether two features are coming from the same
biometric characteristic, distances have to be determined. In the case
of comparing features, a general distance measure d over the space
F
of features which is a functional d: F x F
→ D ⊂ ℝ⊕ should be defined for every f,
g,
h
∈ F by the following properties:
|
DIS1. |
|
(i) |
|
d(f,g) ≥ 0 |
|
non-negativity |
|
DIS2. |
|
(ii) |
|
f = g ⇒ d(f,g)
= 0 |
|
|
|
DIS3. |
|
(iii) |
|
d(f,g) = 0 ⇒ f
= g |
|
|
|
DIS4. |
|
(iv) |
|
d(f,g) = d(g,f) |
|
symmetry |
|
DIS5. |
|
(v) |
|
d(f,g) ≤ d(f,h)
+ d(h,g) |
|
triangle inequality |
|
DIS6. |
|
(vi) |
|
d(f,g) ≤ max{d(f,h),d(h,g)} |
|
|
Note that DIS1 is not independent of
the other conditions. D is a set of non-negative numbers which may be countable
or uncountable, finite or infinite but always being a subset of ℝ⊕.
Depending on
which conditions hold, different definitions can be given [Wikipedia]:
|
DIS1
|
DIS2
|
DIS3
|
DIS4
|
DIS5
|
DIS6
|
| premetric |
✓
|
✓
|
|
|
|
|
| pseudoquasimetric |
✓
|
✓
|
|
|
✓
|
|
| semimetric |
✓
|
✓
|
✓
|
✓
|
|
|
| pseudometric |
✓
|
✓
|
|
✓
|
✓
|
|
| quasimetric |
✓
|
✓
|
✓
|
|
✓
|
|
| metric |
✓
|
✓
|
✓
|
✓
|
✓
|
|
| ultrametric |
✓
|
✓
|
✓
|
✓
|
✓
|
✓
|
Which distance measure is taken depends
on the concrete realization. For modality detection,
the feature space
F should be replaced by a suitable subspace
of A. The distance measure over this subspace may be completely
different from that over F.
Example 2.9: Human height: Euclidean metric
After having reduced the dimensions to 1,
a suitable distance measure for measuring the difference of two heights
f and g is the Euclidean metric d(f,g) ≔ |f − g| where
f, g ∈ ℝ and D = ℝ⊕. |
Example 2.10: Discrete metric
Let F be the space of all PINs
including zero: F = ℕ0 ≡ ℕ⋃{0}.
Then the distance measure
| δ(fi,fj)
≔ |
|
| 0 |
|
if fi = fj |
|
1
|
|
if fi ≠
fj |
|
|
(2.17)
|
|
is a metric called discrete metric
δ: F x F → D = {0,1} |
Example 2.11: Correlation coefficient
Let F be a Hilbert space [Wikipedia]
with scalar product <•,•> and norm ‖•‖= √<•,•>. Then
the correlation coefficient χ(f,g) can be defined
as
for all nonzero f, g ∈ F.
χ is a measure for the similarity of two vectors f and g
which is independent of the amplitude since χ(αf,βg) =
χ(f,g). χ can be written as
If f = αg for any α≠0,
then χ(f,αg) = 1 which represents the maximum due to Schwartz's
inequality. In the case of minimum similarity we get χ(f,g)
= 0 which is reached if f is a symmetric vector and g an
antisymmetric one. Since similarity intuitively behaves inversely
to distance we define a distance measure as
This measure is only
a pseudometric since DIS3
is not fulfilled, e.g., in the case g ≔ αf ≠ f
where 0≠α∈ℝ, we get d(f,g) = 0. |
2.2.1 Composite distance
measures
Let F1
and F2 be two feature spaces with distance
measures d1 and d2 such that
d1: F1 x F1
→ D1 and d2: F2
x F2 → D2. Furthermore
, let α and β be real numbers with α > 0 and β > 0. Finally let F
be a composite feature space such that F ≔ F1
x F2. Then d: F x F → D = αD1⊕
βD2 with
| d(z1,z2)
≔ αd1(x1,x1)
+ βd2(y2,y2) |
|
(2.21)
|
|
is a distance measure:
|
COMPDIS1. |
|
d(z1,z2)
≥ 0 |
|
if d1
and d2 fulfill DIS1 |
|
COMPDIS2. |
|
z1
= z2 ⇒ d(z1,z2)
= 0 |
|
if d1
and d2 fulfill DIS2 |
|
COMPDIS3. |
|
d(z1,z2)
= 0 ⇒ z1 = z2 |
|
if d1
and d2 fulfill DIS3 |
|
COMPDIS4. |
|
d(z1,z2)
= d(z2,z1) |
|
if d1
and d2 fulfill DIS4 |
|
COMPDIS5. |
|
d(z1,z2)
≤ d(z1,z3) +
d(z3,z2) |
|
if d1
and d2 fulfill DIS5 |
The weighting factors α
and β may have influence on D. For α > β, the ordering of the corresponding
distance values in a discrete D is different from α < β. If α = β,
this may reduce |D|. Both has a strong influence on the failure rates to
be defined as we shall see!
Composite distance
measures may be required when a feature comprises feature components of
different type, for example real components such as length and binary components
such as "available" and "missing".
Example 2.12: Hamming metric
The Hamming metric is well known as Hamming
distance [Wikipedia]
and describes the number of different symbols when comparing two strings.
Since the equality of two symbols can be measured by the discrete metric
δ (Example 2.10), we can describe the Hamming
metric as composite metric using the discrete metric. Let F
be a feature space with N different feature components. Each feature component
is characterized by one symbol (which may be a number or a character etc.)
and is represented by a feature subspace
Fn
such that
| F = F1
x F2 x F3
x ... x FN |
|
(2.22)
|
|
Then the Hamming metric d can be written
as
| d(f,g) = δ(f1,g1)
+ δ(f2,g2) + δ(f3,g3)
+ ... + δ(fN,gN) |
|
(2.23)
|
|
where f = (f1,
f2, f3, ... , fN)
∈ F, g ∈ F, and fn
∈ Fn. δ is the discrete metric (Example
2.10). |
3. Biometric
Recognition in Metric Spaces
3.1 Recognition
Assuming perfect uniqueness
and permanence of biometric features and the absence of measurement errors
we can expect that each sample of the same biometric characteristic delivers
exactly the same feature vector, while the samples of different characteristics
result in different feature vectors. That is, each characteristic is represented
by a distinct point in the feature space. When comparing two features to
examine their origin, only two results are possible:
1. the features are equal and
belong to the same characteristic or
2. the features are unequal and thus
belong to different characteristics.
Although no failures can occur in the ideal
case, it would be of interest to learn more about the distribution of,
say N, features within the feature space F. This can best be done
by comparing each feature with each other feature, denoting the results
as dij ≔ d(fi,fj)
which can be shown in matrix form:
|
d11
|
d12
|
d13
|
d14
|
...
|
d1N
|
|
d21
|
d22
|
d23
|
d24
|
...
|
d2N
|
|
d31
|
d32
|
d33
|
d34
|
...
|
d3N
|
|
d41
|
d42
|
d43
|
d44
|
...
|
d4N
|
|
.
|
.
|
.
|
.
|
...
|
.
|
|
dN1
|
dN2
|
dN3
|
dN4
|
...
|
dNN
|
|
|
(3.1)
|
|
If the distance measure d fulfills requirement
DIS4,
we get dij = dji. As a result,
the gray values in the matrix do not carry additional information. The
blue values in the main diagonal represent comparisons of two equal samples
from the same biometric characteristic. If DIS2 holds,
these values must all be zero. All values above the main diagonal will
be > 0 if conditions DIS1 and DIS3
hold and fi ≠ fj.
If the triangle inequality DIS5 holds, the values of
the first upper secondary diagonal (d12, d23,
d34,..., dN-1,N)
represent the only independent values. All other values are bound
by the triangle inequality like d13 ≤ d12
+ d23, d14 ≤ d13
+ d34, and so on.
Example 3.1: Human Height: Dependence of
distance values using the Euclidean metric
The following example of a feature space
F
=
{h1, h2, h3,
h4, h5}
shows the effect of different height arrangements combined with the Euclidean
metric
| dij ≔ d(hi,hj)
= |hi − hj| |
|
(3.2)
|
|
Of special interest are the restrictions
of the triangle inequality on the distance distributions. As we can see,
the dependencies are only tight since the triangle inequality only puts
an upper bound on the growths of values in direction of the upper right
(resp. low left) corner.
|
|
= |
|
⇒ |
|
d11
|
d12
|
d13
|
d14
|
d15
|
|
d21
|
d22
|
d23
|
d24
|
d25
|
|
d31
|
d32
|
d33
|
d34
|
d35
|
|
d41
|
d42
|
d43
|
d44
|
d45
|
|
d51
|
d52
|
d53
|
d54
|
d55
|
|
= |
|
0
|
15
|
17
|
17
|
35
|
|
15
|
0
|
8
|
8
|
20
|
|
17
|
8
|
0
|
0
|
28
|
|
17
|
8
|
0
|
0
|
28
|
|
35
|
20
|
28
|
28
|
0
|
|
|
|
= |
|
⇒ |
|
d11
|
d12
|
d13
|
d14
|
d15
|
|
d21
|
d22
|
d23
|
d24
|
d25
|
|
d31
|
d32
|
d33
|
d34
|
d35
|
|
d41
|
d42
|
d43
|
d44
|
d45
|
|
d51
|
d52
|
d53
|
d54
|
d55
|
|
= |
|
0
|
15
|
35
|
7
|
0
|
|
15
|
0
|
20
|
8
|
15
|
|
35
|
20
|
0
|
28
|
35
|
|
7
|
8
|
28
|
0
|
7
|
|
0
|
15
|
35
|
7
|
0
|
|
The following example is a worst case, where
all inequalities become equalities. As a result the maximum growth is achieved:
|
|
= |
|
⇒ |
|
d11
|
d12
|
d13
|
d14
|
d15
|
|
d21
|
d22
|
d23
|
d24
|
d25
|
|
d31
|
d32
|
d33
|
d34
|
d35
|
|
d41
|
d42
|
d43
|
d44
|
d45
|
|
d51
|
d52
|
d53
|
d54
|
d55
|
|
= |
|
0
|
10
|
20
|
30
|
40
|
|
10
|
0
|
10
|
20
|
30
|
|
20
|
10
|
0
|
10
|
20
|
|
30
|
20
|
10
|
0
|
10
|
|
40
|
30
|
20
|
10
|
0
|
|
or using the reverse order
|
|
= |
|
⇒ |
|
d11
|
d12
|
d13
|
d14
|
d15
|
|
d21
|
d22
|
d23
|
d24
|
d25
|
|
d31
|
d32
|
d33
|
d34
|
d35
|
|
d41
|
d42
|
d43
|
d44
|
d45
|
|
d51
|
d52
|
d53
|
d54
|
d55
|
|
= |
|
0
|
10
|
20
|
30
|
40
|
|
10
|
0
|
10
|
20
|
30
|
|
20
|
10
|
0
|
10
|
20
|
|
30
|
20
|
10
|
0
|
10
|
|
40
|
30
|
20
|
10
|
0
|
|
In higher dimensional feature spaces equality
is rather an exceptional case. Nevertheless, a type of dependence is introduced
which may lead to the unpleasant effect that statistical significance does
not increase as expected if the results of all cross-compared distances
with i ≠ j are averaged. That is, statistical significance will not increase
1:1 with the total number of comparisons (generally N(N−1) for N features).
In practice, it will rather be ruled by a quantity between N−1 and N(N−1).
That is, it would be better to increase N to N² than to count on depending
N(N−1) cross-comparisons (which is much cheaper, of course). |
Example 3.2: Human height: Absolute Values
Fig. 3.1: Biometric
height features as points in a 1D feature space (F = ℝ⊕)
|
|
One of the most
interesting properties of the space of features are the distributions of
distances between the features. To build a statistics one may start with
a certain feature and determine the distances to its neighbor features.
In an N dimensional feature space, all feature points within an N-dimensional
ball around the actual feature are to be counted. With increasing radius
of the ball, the number of features included may increase faster or slower,
depending on their local distribution. This is similar to stars in universe.
Stars may be crowded in galaxies or the space may be nearly empty over
large distances.
Example 3.3: Human
height: Distances from a gregarious feature
When the one-dimensional (1D) feature space
F
contains N features, 2N distances (with N−1 non-trivial distances) from
one feature to all other features can be calculated. (Non-trivial means:
excluding the zero distance of a feature to itself and assuming the two
back and forth distance values between two different features being equal
according to DIS4.) Depending on which feature is selected
as point of origin, different distance distributions may result. In this
example, a feature amid an aggregation of features is chosen, thus delivering
a lot of small distance values. Figure 3.2 shows the distribution which
represents points in the distance space D ⊂ ℝ⊕.
Fig. 3.2: Distance
distribution of the height features Fig. 3.1 from a gregarious feature
|
|
Example 3.4: Human
height: Distances from a lonely feature
If we start with the same feature space F
as in Example 3.3, but choose a lonely feature
as origin, a completely different distribution of distance values may result
which are far from zero. This is an expression of the fact that lonely
features are easier to separate than features within a crowd and confirms
the practical observation that different features show different recognition
failure behavior.
Fig. 3.3: Distance
distribution of the height features Fig. 3.1 from a lonely feature
|
|
Intuitively, the
closer two features are located, the easier it may be to "confuse" the
two in the presence of errors, e.g., errors produced during the determination
of the feature's location or if features do slightly change. There may
be singular features with large distance to their neighbors and other features
which are densely crowded. In the first case, these may be candidates for
'sheep', i.e., features which are easily recognizable [Doddington,
Sutton].
The distances alone and the number after tightly neighboring features do
not allow a classification due to Doddington's zoo. This situation changes,
if one adds individual error balls around each feature point which characterize
all errors due to non-ideal behavior of the features and their measurement.
This will be considered later.
Fig. 3.4: Biometric
features as points in a 2D feature space
|
If the number of
dimensions of the feature space is larger than the number of feature components,
it is even possible that each feature has the same distance to all of its
neighbors. In this case the geometrics can be described by an equilateral
K-simplex. This is one case of perfect equality between all features. Using
the discrete metric is another method to create equality between all features,
irrespective of dimensionality, but possibly accompanied with other drawbacks.
If the dimension
is small compared to the number of features and the Euclidean metric is
chosen, at least an approximate constant distance to the nearest neighbors
should be possible. In this case the number of features in a K-dimensional
ball increases with the power of K of the ball radius.
3.2 Enrolment - Verification
- Identification
In the ideal case, without any error, enrolment
simply means to store the desired biometric feature, extracted from an
individual's biometric characteristic, as a reference in a database. During
a recognition transaction the reference is compared with the actually measured
feature. If the distance is zero, we assume identity between the subject
enrolled and the subject to be recognized. If more than one reference is
available in the feature space, theoretically two ways for recognition
may be performed. During verification, the index (= address)
of the reference used to distinguish all members of a set is known a priori
so that only this reference is to be compared with the feature of the associated
subject. If this index is not known, an identification is
to be performed, i.e., the feature is to be compared with all references
until a match (i.e., distance = 0) is found. The result of this first method
is either no identification or the index of the identified subject.
Throughout this paper, we approach the
reality by a second way which may meet practice more consequently. Instead
of using the virtual index, a user ID is assigned and stored during enrolment.
This user ID is nothing but an additional feature component belonging to
a subject, however with different usage. For example, user IDs may be public
and usually are unique. This way every verification may be performed at
least partly as an identification: The subject has to present the user
ID as first and the biometric feature as second component. For entering
the user ID, a keyboard or a smartcard may be used among other ways. Then
the user ID is compared with all stored reference user IDs to see if one
matches. If so, in the next step the accompanying biometric part of the
reference is compared with the presented biometric feature component to
verify that the biometric feature components do match,
too. In practice, most verifications are performed this way. The main reason
for this fact is that text strings for user IDs are found and checked much
faster than the biometric part of a feature. For that reason, a verification
is usually much faster than a pure biometric identification where the biometric
part of all reference features is to be compared with a candidate until
a match is found. [Bromba2007]
In the presence of errors, enrolment may
be more sophisticated than simply storing the first available feature sample.
Especially stochastical errors may lead to the circumstance that all captured
features from the same biometric characteristic are more and less different.
As a consequence, the question arises which of the incoming features of
subsequent capturings will deliver the best recognition rates.
This question is treated later.
The most important property of a (digitally
stored) reference feature is its stability over time (unless update strategies
are used to improve the reference with each recognition). While stochastical
errors in new samples of a characteristic will change with each capturing,
the stochastical errors in the reference are stored and thus fixed. For
that reason, optimal enrolment strategies try to find the stochastically
best reference, typically by processing a series of capturings. As a consequence,
enrolment transactions will nearly always take more time than a recognition
transaction. For the feature space
F
of all possible features the consequence is the introduction of the subset
F'
of reference features. Each reference feature should be as near as possible
to the true ("error-free") feature of a certain characteristic. To indicate
that a feature is a reference, we sign it or its index by an apostrophe,
e.g.,
fi' or di'j where
feature no. i' is the reference belonging to characteristic no. i. Generally,
the following relations hold:
where F is the set of all used features.
Only in the ideal case we have F = F'.
4. Biometric Performance
Measures
4.1
Definition of Match and Non-Match
For a biometric recognition system we define
a
match as successful recognition if the distance between two
separately measured features fi, fj
∈ F falls below a certain threshold th ∈ ℝ⊕
such that dij ≔ d(fi,fj)
≤ th. Otherwise we have a rejection, i.e., if dij
> th. In the ideal case, the threshold may be between 0 and the minimum
distance between two features, provided the minimum is greater than 0.
F is the space of all features
being comparable by a specific recognition system as defined above. d is
any distance measure over F with values in D. Subjects si
∈
S
shall have the same index i as the corresponding feature fi
∈ F when defining an individual ⓘi,
see (2.3). Finally, let F' be the space of all
enrolled features and F the space of all features, enrolled subjects
are able to assume.
As bilateral match rate
MRij we define the probability of a successful recognition
when comparing features fi(ω1)
and fj(ω2) where ω1
and ω2 are independent outcomes of the sample space
Ω
[Wikipedia].
| MRij ≡ MR(fi,fj)
≡ P{dij ≔ d(fi(ω1),
fj(ω2))
≤
th | fi(ω1),
fj(ω2)
∈
F; ω1, ω2
∈
Ω} |
|
(4.1)
|
|
In the absence of any errors ("noise"), all
distances are deterministic, i.e., the probabilities are either 1 or 0:
Generally, we have symmetry:
|
MR11
|
MR12
|
MR13
|
MR14
|
...
|
MR1N
|
|
MR21
|
MR22
|
MR23
|
MR24
|
...
|
MR2N
|
|
MR31
|
MR32
|
MR33
|
MR34
|
...
|
MR3N
|
|
MR41
|
MR42
|
MR43
|
MR44
|
...
|
MR4N
|
|
.
|
.
|
.
|
.
|
...
|
.
|
|
MRN1
|
MRN2
|
MRN3
|
MRN4
|
...
|
MRNN
|
|
= |
|
1
|
0
|
0
|
0
|
...
|
0
|
|
0
|
1
|
0
|
0
|
...
|
0
|
|
0
|
0
|
1
|
0
|
...
|
0
|
|
0
|
0
|
0
|
1
|
...
|
0
|
|
.
|
.
|
.
|
.
|
...
|
.
|
|
0
|
0
|
0
|
0
|
...
|
1
|
|
|
(4.5)
|
|
Until now we have considered the term match
rate without interpretation. There are four possibilities to define:
- If two samples of the feature of the
same subject match, this is a correct match
- If two features of two different subjects
match, this is a false match
- If two samples of the feature of the
same subject do not match, this is a false non-match
- If two features of two different subjects
do not match, this is a correct non-match
The terms false and correct
obviously do not mainly depend on the features but on the subjects, i.e.,
the subjects belonging to the features compared. Let us assume that for
individual ⓘm ≔ (sm,f'm)
the reference feature f'm is stored during enrolment.
We have 4 possibilities for comparisons (⇆ means that the corresponding
subject on the left tries a recognition with the stored subject's feature
on the right):
1. (sm,fm)
⇆ (sm,f'm) - fm
∈
F,
f'm ∈
F',
subject
sm tries to get recognized
2. (sm,fn)
⇆ (sm,f'm) - fn
∈
F,
f'm
∈ F', subject sm tries to get not
recognized (hiding)
3. (sn,fm)
⇆ (sm,f'm) - fm
∈
F,
f'm ∈
F',
subject
sn tries to get recognized (counterfeiting, impersonation)
4. (sn,fn)
⇆ (sm,f'm) - fn
∈
F,
f'm
∈ F', subject sn tries to get recognized
(using his genuine feature for the "wrong account")
The following definitions only refer to
the cases 1 and 4!
Using the previous definitions, if i=j,
the bilateral match rate is equivalent to the individual correct
match rate CMRii of characteristic no. i.
If i≠j, an acceptance would be a failure and thus the probability will
be called bilateral false match rate FMRij.
As bilateral non-match
rate NMRij we define the probability of a
failing recognition when comparing features i and j. If i=j, we have a
failure and the bilateral non-match rate is equivalent to the individual
false non-match rate FNMRii of characteristic
i. If i≠j, a non-match is expected and thus will be called bilateral
correct non-match rate CNMRij.
If we take the mean value over the individual
rates of all N comparisons, the global correct match rate
CMR and the global non-match rate FNMR results.
| CMR ≡ |
|
|
N
|
|
|
i=1
|
|
CMRii |
|
(4.8)
|
|
| FNMR ≡ |
|
|
N
|
|
|
i=1
|
|
FNMRii |
|
(4.9)
|
|
It makes no sense to
choose N (in this case the number of features compared) larger than the
number of possible features (|F|). On the other hand, N should
be much larger than the number of subjects enrolled to the recognition
system (|S|) since we are also interested in false matches caused
by the features of "external" subjects:
Typically, the size
N will be chosen differently for the determination of CMR and FNMR. Since
we have defined the match rates as probabilities, noisy features
require a practical estimation. This is done by independently repeating
a comparison as often as reasonable, say K times, and build the ratio between
the number of matches, say M, and the number of trials (K) such that
If we take the mean value with respect to
j over the bilateral rates of all N-1 independent comparisons with different
index i≠j, the individual false match rate FMRi
of characteristic i and the individual correct non-match rate
CNMRi of characteristic i results.
| FMRi ≡ |
|
|
N
|
|
|
j=1,i≠j
|
|
FMRij |
|
(4.12)
|
|
| CNMRi ≡ |
|
|
N
|
|
|
j=1,i≠j
|
|
CNMRij |
|
(4.13)
|
|
The sums can be simplified, if we set CNMRii
= FMRii = 0.
Finally, a global false match rate
FMR and a global correct non-match rate CNMR can be defined
by taking the mean value over all N individual false match and correct
non-match rates.
| FMR ≡ |
|
|
N
|
|
|
i=1
|
|
FMRi |
|
(4.14)
|
|
| CNMR ≡ |
|
|
N
|
|
|
i=1
|
|
CNMRi |
|
(4.15)
|
|
Generally, we have the relationships
|
FNMRii
= 1 − CMRii
|
|
FNMR = 1 −
CMR
|
|
(4.16)
|
|
|
FMRij = 1 −
CNMRij
|
|
FMRi = 1 −
CNMRi
|
|
FMR = 1 − CNMR
|
|
(4.17)
|
|
In the case of stochastical
behavior of a feature, enrolment may have the effect to create a new element
in the feature space. The main property of this element is that it usually
is purely deterministic and thus may have completely different stochastical
properties. As a result, the match rate definition (4.1)
must be modified such that
| MRi'j
≡ MR(fi',fj)
≔ P{di'j ≔ d(fi',
fj(ω))
≤ th | ω ∈ Ω } |
|
(4.18)
|
|
The trick is to pretend
that fi' is deterministic since it does not
change during normal operation of the biometric recognition system. However,
for comparing different systems this may lead to misinterpretations since
every new enrolment may deliver new probabilities MRi'j.
Thus, when estimating the failure rates, this effect must not be forgotten.
A generalization can be given by the following definition:
| MRi'j
≡ MR(fi',fj)
≡ P{di'j ≔ d(fi'(ω1),
fj(ω2))
≤ th | ω1∈Ω1⊂Ω2,
ω2∈Ω2} |
| MRij'
≡ MR(fi,fj')
≡ P{dij' ≔ d(fi(ω1),
fj'(ω2))
≤ th | ω1∈Ω1, ω2∈Ω2⊂Ω1} |
|
(4.19)
|
|
For MRi'j,
(4.1) is covered by the situation that Ω1
= Ω2, while (4.18) matches
the case whereΩ1 only contains one element
of Ω2. Note that the probability MRi'j
itself is stochastic if Ω1
is a random subset of Ω2!
To determine the
failure rates in ordinary biometric recognition systems, the distance between
enrolled reference (i.e., fixed) feature and request feature is measured
as in Example 3.1. This yields a different distance
matrix with di'j ≔ d(fi',fj),
where we mark the reference feature by '. fi'
and fj are the reference
and actually measured features, respectively.
|
d1'1
|
d1'2
|
d1'3
|
d1'4
|
...
|
d1'N
|
|
d2'1
|
d2'2
|
d2'3
|
d2'4
|
...
|
d2'N
|
|
d3'1
|
d3'2
|
d3'3
|
d3'4
|
...
|
d3'N
|
|
d4'1
|
d4'2
|
d4'3
|
d4'4
|
...
|
d4'N
|
|
.
|
.
|
.
|
.
|
...
|
.
|
|
dN'1
|
dN'2
|
dN'3
|
dN'4
|
...
|
dN'N
|
|
|
(4.20)
|
|
In contrast to matrix
(3.1), di'j will generally
only be equal to dj'i (i.e., symmetric) if fi'
=
fi and fj'
= fj. Hence, in
the stochastical case, the distance matrix will not be symmetric. As a
result, the match rate matrix will generally not be symmetric, too:
|
MR1'1
|
MR1'2
|
MR1'3
|
MR1'4
|
...
|
MR1'N
|
|
MR2'1
|
MR2'2
|
MR2'3
|
MR2'4
|
...
|
MR2'N
|
|
MR3'1
|
MR3'2
|
MR3'3
|
MR3'4
|
...
|
MR3'N
|
|
MR4'1
|
MR4'2
|
MR4'3
|
MR4'4
|
...
|
MR4'N
|
|
.
|
.
|
.
|
.
|
...
|
.
|
|
MRN'1
|
MRN'2
|
MRN'3
|
MRN'4
|
...
|
MRN'N
|
|
|
(4.21)
|
|
This is a consequence
of the general case which need not be symmetric in this sense:
| MRji' = MRi'j
≠ MRj'i = MRij' |
|
(4.22)
|
|
To keep our individual
and global failure rate definitions unique, we always assume the first
index to be enrolled (if at all).
In the following examples we show the
impact of non-ideal biometric features regarding properties BCR1
- BCR4.
Example 4.1: Weak uniqueness: Twins
If the biometric characteristic is completely
genotypic [BioFAQ],
monozygotic (identical) twins theoretically may show exactly the same features.
This could especially be the case in DNA investigations. For the failure
rates this may have the following impact. Let SN
be the set of N subjects SN
≔ {s1, s2, s3,
..., sN} and si and sj,
i ≠ j, being monozygotic twins with the same feature f≔fi
= fj and without random errors such that ⓘi
= (si,
f) and ⓘj = (sj,
f).
Then the distance dij ≔
d(fi,fj) = d(f,f)
is equal to zero although a non-zero result would be required for authentication
systems. That is, even with a threshold of th=0, ⓘi
and ⓘj are indistinguishable and we get a false
match case with a bilateral FMRij = FMRji
= 1! All other bilateral false match rates are zero. If N subjects are
expected, the individual false match rate of twin ⓘi
would obviously be FMRi = 1/(N-1). The same holds
true for twin ⓘj. If N approaches large values,
FMRi will approach zero.
To get the global FMR, the mean value
over all N individual false match rates has to be taken. The result is
FMR = 2/(N(N-1)) if there are only two twins (i.e., one twin pair) in the
ensemble. If there are more twins, the result will differ. If we expect
1 average monozygotic twin pair among 243 people
(this seems to be a good estimation to real life [Daugman]),
there are 2N/243 twins in the ensemble which all have an individual FMRi
of 1/(N-1). This results in a global FMR of (2N/243)/(N(N-1)) = 2/(243(N-1)).
That is, for large N, FMR again approaches 0 although not with order
N² as in the single-twin pair case. This is in contrast to the minimum
FMR of 0.82% stated in [Daugman]. |
If the condition of ideal permanence is replaced
by time-dependence this means that when capturing a biometric characteristic
at different time instants even without measurement failures, more and
less slightly different features may result. If so, the biometric recognition
system may fail to recognize the characteristic if the threshold is selected
too low. On the other hand, other characteristics may also vary. This will
have two effects: 1. The minimum threshold should be larger than
the greatest variation of the feature of a certain biometric characteristic
and 2. the minimum distance between features of different characteristics
will increase. If both begin to overlap, no threshold can be chosen anymore
without producing recognition errors. Permanence properties may differ
for different characteristics.
Fig. 4.1: Stochastical
fluctuations of the features in a 2D feature space
|
|
The most apparent impact of non-perfect measurability
is the introduction of errors, usually considered as stochastical errors.
Such errors introduce stochastical position errors of a feature point in
a K-dimensional feature space F, just in the same way as
time-depending features will do. Such errors can be described by a K-dimensional
cloud which may follow, e.g., a K-dimensional Gaussian distribution. Rather
than a point, a feature now should be visualized as a cloud with declining
intensity. The intensity then will be a measure for the probability of
presence of the feature point at a certain location in F.
Such a cloud may overlap with the clouds of other features. This way, no
threshold can be found without producing non-zero recognition errors.
Fig. 4.2: When
averaged over measurements, feature locations may be represented as spatial
distributions
|
Furthermore, the distribution will depend
on the individual characteristic. This will lead to clouds of different
size and shape, as Figure 4.3 indicates. This is the worst case with respect
to examination of failure rates since most failure rate definitions or
at the least the approximation of their statistical significance make idealizing
assumptions which do not hold in reality.
Fig. 4.3: Different
features may show different distributions
|
Compared to weak permanence, weak measurability
does not depend deterministically on time. That is, for weak permanence,
two measurements at the same time instant will exactly show the same result
while for weak measurability this need not be the case. However, a real
biometric characteristic is affected by both shortcomings. Additionally,
measurability may change over time. All combined effects can be described
by a time-varying multi-dimensional joint probability distribution. |
Example 4.4: Human height: Distance errors
between two individuals
Let us assume two individuals with different
heights. The feature space F can be defined as F
= ℝ⊕. The distance
space D, assuming the Euclidean metric, also equals or is a subset of ℝ⊕.
For determining the FNMR, the first characteristic is captured twice. For
determining the FMR, the second one is captured once. The measurements
of the height features h1 and h2
are error-prone and show a continuous joint probability density fh1,h2
and the continuous singular densities fh1 and fh2,
respectively. We assume stochastical independence, such that fh1,h2(x,y)=
fh1(x)fh2(y).
|
|
|
Fig.
4.4: Heights of two individuals with feature densities and mean values
h1,
h2
|
|
We consider two
cases:
1. One
sample each of two different characteristics h1 and
h2 are compared to determine the bilateral FMR12.
2. Two samples
each of the same characteristic, respectively their features h1
and h2, are compared to yield the FNMR.
Both cases assume that
enrolment is repeated with each measurement and does not differ from sampling
for recognition. In the case of different characteristics, the distance
between the two samples is given by the random variable
| d12≔
d(h1,h2) ≔ |h1(ω1)−h2(ω2)|
= |h2(ω2)−h1(ω1)|
= d(h2,h1) |
|
(4.23)
|
|
The bilateral FMR12
is then given by
| FMR12(th) |
≔ P{d12
≔ |h1(ω1)−h2(ω2)|
≤ th | ω1, ω2 ∈ Ω
} |
|
≔ P{d12
= h1(ω1)−h2(ω2)
≤ th | ω1, ω2 ∈ Ω}
− P{d12 = h1(ω1)−h2(ω2)
≤ −th | ω1, ω2 ∈ Ω
} |
|
= ∫]-∞,th]fh1-h2
− ∫]-∞,-th]fh1-h2 |
|
= (∫]-∞,th]
− ∫]-∞,-th])(fh1
∗ f-h2) |
|
= ∫]-th,th](fh1
∗ f-h2) |
|
= ∫]-th,th](fh1
∗ Ⓢfh2) |
|
(4.24)
|
|
where ∗ denotes the
convolution of the two distance density functions fh1
and f-h2 and Ⓢ is the mirror operator
such that Ⓢf(x) ≔ f(-x) for every f and x. As a consequence,
FMR12(0) = 0 and FMR12(∞) =
1. Case 2 is treated in Example 4.5. |
Example
4.4 shows us a possibility how to reduce failure rates. Using the same
threshold th, FMR12 can be minimized, if the densities
fh1 and fh2 of the features h1
and h2 are as narrow as possible and the threshold
is smaller than die distance of the true values of the features.
Example 4.5: Human height: Distance errors
between two measurements of the same individual
In the same way as
shown in Example 4.4, the individual
Correct
Accept Rate CMRi may be determined under the assumption
that two subsequently taken samples h1(ω1)
and h1(ω2) of the
same
characteristic are stochastically independent with probability density
fh1,h1' (x,y) = fh1(x)fh1'(y)
and fh1 = fh1'. This corresponds
to the case where the first sample is taken during enrolment and the second
one for authentication. However, measurements and processing for enrolment
and recognition are exactly equal and are repeated both for each comparison.
Then
| 1 − FNMR11(th)
≕ CMR11(th) |
= P{d11
≔ |h1(ω1) − h1(ω2)|
≤ th | ω1, ω2 ∈ Ω} |
|
= ∫]-th,th](fh1
∗ f-h1) |
|
= ∫]-th,th](fh1∗
Ⓢfh1) |
|
(4.25)
|
|
As a result, FNMR11(0)
= 1 and FNMR11(∞) = 1. A similar
result may be derived for h2, which need not have
the same density as h1. |
Since usually one
of the features stems from enrolment, an optimized enrolment would be able
to reduce the FNMR significantly, if the enrolled feature is nearby the
"true" feature.
Example 4.6: Human height: Enrolment
In Example 4.4
and 4.5, no explicit enrolment was assumed, i.e.,
two arbitrary stochastically independent samples were directly compared
and no special enrolment processing was performed. For each measurement
experiment completely new samples were taken. This is not common practice
in biometric systems. Most biometric systems use one specific initial enrolment
which is retained for every new comparison. To model this procedure we
assume that a single sample of h1,
say h'1, is taken as
reference and use this specific sample for each new comparison. As a consequence,
the reference h'1 must
be regarded as deterministic. Nevertheless, its stable value is arbitrary
and cannot be predetermined before enrolment. As a consequence,
its distribution can be approximated by the density distribution
where δh'1
is a delta distribution with peak at location h'1.
Recycling the results from Example 4.4, we get
| FMR1'2(th)
= ∫]-th,th](δh'1
∗ f-h2) |
|
(4.27)
|
|
where h'1
and h2 stem from different individuals. For the special
case that the mean value E{-h2} expresses as translation
of the density f (example: Gaussian density), we get
| FMR1'2(th)
= ∫]-th,th](δh'1
∗ f-h2) =∫]-th,th]fh'1-h2 |
|
(4.28)
|
|
If we furthermore assume
that h1 and h2 come from the
same individual, i.e., h2 = h1,
the result would be
| 1 − FNMR1'1(th)
=
∫]-th,th](δh'1
∗ f-h1) =∫]-th,th]fh'1-h1 |
|
(4.29)
|
|
From (4.29)
we can conclude at least for Gaussian densities that FNMR1'1
can be minimized if h'1 equals the expectation
of h1 (= E{h1}). |
Example 4.7: Human height: Enrolment: Monozygotic
twins
For monozygotic twins we expect exactly the
same height features and feature densities. From (4.24)
and (4.25) we then can conclude that the bilateral
FMR12(th)
is equal to CMR11(th) ≔ 1−FNMR11(th)
if all measurements are stochastically independent. In the case
of a continuous error density, FMR12(th) will not
only assume the discrete values 0 and 1 as for the error-free situation
but any value between 0 and 1 depending on threshold.
Now let us assume that individual ⓘ1
has been enrolled using a fixed sample h'1. If we
take one sample each from individual ⓘ1 and ⓘ2
to determine FNMR1'1(th) and
FMR1'2(th), we see from Example
4.6 that the relationship FMR1'2(th) = 1−FNMR1'1(th)
remains valid. Although the FMR will change with varying threshold,
FNMR will change in the opposite direction. At least for this case, enrolment
will have no general influence in the case of monozygotic twins!
Generally, if we choose a threshold unequal
0 for the monozygotic twin case, realistic densities yield a FMR smaller
than 1. This may lead to the wrong interpretation that under certain circumstances,
individuals can even be separated if their true features are equal.
From the definition of equal error rate
(EER12 = FMR12(the)
= FNMR1(the), where the
is the unique threshold where equality appears, if possible) we conclude
that EER always is ½. This is indeed the worst case EER for recognition,
see [BioFAQ]. |
Example 4.8: Feature
space with discrete metrics: Passwords as limit case of a behavioral biometric
characteristic
Let S be the set of all possible
and S the subset of enrolled subjects. F shall be
the feature space of all possible passwords of a desired class and F'
⊂
F the subset of all enrolled passwords.
The feature space
F of passwords
is a discrete space which can be equipped here with the discrete
metrics
| δ(hi,hj)
≔ |
|
| 0 |
|
if hi = hj |
|
|
|
|
1
|
|
if hi ≠
hj |
|
|
(4.30)
|
|
Passwords may be unique
or not, depending on enrolment. There are different enrolment methods:
1. Passwords may
be selected randomly by the user and then be passed to the service provider
2. Passwords may
be assigned to the user by the service provider
3. Passwords may
be selected randomly by the user and are then checked for uniqueness by
the service provider
Obviously, method
1 may create multiple occurrences of the same password such that hi
= hj for subjects si ≠
sj, while methods 2 and 3 can simply avoid this situation,
provided the number of possible features |F|
is equal to or larger than the number of enrolled subjects |S|.
In the case of unique
passwords, no failures are introduced by enrolment. However, failures may
occur during identification. There are two types of failures:
-
The authorized subject
(genuine) sn mistypes or confuses the password hn
-
A non-authorized subject
(impostor) tries a random or "stolen" password
If the genuine mistypes
a password, the desired access may fail, producing a false non-match. In
a scenario where the password is the only identifier without user ID, the
false password at the same time may match a foreign one, leading to a false
match with another genuine. That is, a genuine may be an "impostor", too,
even during the same transaction. Obviously, a system without user IDs
which is only based on unique passwords will be more reliable, if the number
of possible subjects is much smaller than the number of possible passwords:
|S| ≕ K ≪ |F|
(where F is the feature space of all possible passwords with
certain properties such as length or character set).
In the hypothetic
error-free case, the failure rates for enrolment situation 2 (passwords
are uniquely assigned to the user by the service provider) would be given
by
| FMRij(th) = 0 if
th < 1, FMRij(th) = 1 if th ≥ 1 |
|
(4.31)
|
|
| FMRi(th) = 0 if
th < 1, FMRi(th) = 1 if th ≥ 1 |
|
(4.32)
|
|
| FMR(th) = 0 if
th < 1, FMR(th) = 1 if th ≥ 1 |
|
(4.33)
|
|
Since D = {0,1}, the best (and only useful)
threshold th here is 0 which is selected from now on. In practice, several
FMR failure types are to be considered, e.g.:
2A. One of |S|
unique passwords becomes known to one impostor
2B. All impostors
select a random password fn ∈ F
to match a certain user ⓘm = (sm,
fm)
where fm ∈ F'
In case 2A, we
denote the impostor as Ⓘnm ≔ (sn,
fm)
with n ≠ m. I.e., subject sn ∉ S is "stealing"
feature fm which actually belongs to subject
sm ∈ S, resp., genuine Ⓖm
≔ ⓘm = (sm, fm).
As a result, the bilateral false match rate FMRij(0)
is easily seen to be 1 if (i = m, j = n, i ≠ j) and 0 else. For the individual
and global false match rates this means
| FMRi(0)
= 1/(N-1) if i = m and 0 else |
|
(4.34)
|
|
where we choose N ≔
|S| = |F'| ≤ |F|.
The resulting failure rates only meet the special case 2A and cannot be
generalized. This is easily seen when trying to understand the FMR definition
in this case. For example, FMRi expects that password
fi
is tested against all other genuine passwords
fj
with i ≠ j. Only if i coincides with the index of the stolen password
(m) a match happens. In this scenario we obviously expect one impostor
with stolen password and all other impostors with their genuine password.
This situation can be compared with the example of non-distinguishable
twins where only one twin pair is present in the whole ensemble S
(Example 4.1). The difference is a factor 2 which
comes from the assumption that the impostor has stolen one password while
the victim has not done the same with the password of the identity thief.
In situation 2B,
randomness is introduced by guessing passwords. The match rates are then
given by the probability that two passwords coincide. Especially,
This probability is
equal to zero if either fi or fj
∉ F'. If both passwords belong to F', it depends
on the way how the service provider has assigned the passwords and what
the impostor knows about it. If we expect a uniform distribution, the probability
is simply given by
FMRij(0)
= p(fi = fj) =
|F'|-1 = 1/K if fi
and fj ∈ F'
FMRij(0)
= 0 else |
|
(4.37)
|
|
To determine the individual
and global failure rates, we have first to determine the number N of comparisons
used for taking the average. If the number of possibilities is finite,
N should be set equal to the number of possibilities which is defined by
|F|. For example, if F is the space of 4-digit
numbers, then N ≔ |F| = 10000. Generally, we get for enrolled
individuals
FMRi(0)
= (1-1/K)/(N-1) if fi ∈ F' and
FMRi(0)
= 0 else |
|
(4.38)
|
|
If K = N, the well-known
result FMRi(0) = 1/N is given.
If K = N, again FMR(0)
= 1/N results.
Refinements of this
consideration encompass cases where attacking passwords are not selected
purely random but with prior knowledge about the probability of being used
or not. (We had assumed a uniform distribution.)
|
Example 4.9: Simple feature space with stochastical
features: Hat
Let us assume two individuals ⓘ1
and ⓘ2 with the following
properties: ⓘ1 carries
a hat with 40% probability and ⓘ2
carries a hat with 30% probability. This may be interpreted as a kind of
behavioral biometric characteristic. We try to distinguish these two individuals
solely by the feature components "carrying a hat" and "carrying no hat".
Furthermore we use the discrete metric and consider all four enrolment
states for the reference features f1' and f2'
| |
f1' |
f2'
|
Reference Distance
d(f1',f2') |
|
E1
|
NH
|
NH
|
0
|
|
E2
|
H
|
NH
|
1
|
|
E3
|
NH
|
H
|
1
|
|
E4
|
H
|
H
|
0
|
The two individuals share both features
with different probabilities which are represented by two points in a one
dimensional feature space F which contains two values: "hat" and
"no hat".
| |
f1' |
f2'
|
p1'1(0) |
p1'2(0) |
p2'1(0) |
p2'2(0) |
p1'1(1) |
p1'2(1) |
p2'1(1) |
p2'2(1) |
|
E1
|
NH
|
NH
|
0.6
|
0.7
|
0.6
|
0.7
|
0.4
|
0.3
|
0.4
|
0.3
|
|
E2
|
H
|
NH
|
0.4
|
0.3
|
0.6
|
0.7
|
0.6
|
0.7
|
0.4
|
0.3
|
|
E3
|
NH
|
H
|
0.6
|
0.7
|
0.4
|
0.3
|
0.4
|
0.3
|
0.6
|
0.7
|
|
E4
|
H
|
H
|
0.4
|
0.3
|
0.4
|
0.3
|
0.6
|
0.7
|
0.6
|
0.7
|
| |
f1' |
f2'
|
pG(0) |
pG(1) |
pI(0) |
pI(1) |
FNMR(0) |
FNMR(1) |
FMR(0) |
FMR(1) |
|
E1
|
NH
|
NH
|
0.65
|
0.35
|
0.65
|
0.35
|
0.35
|
0
|
0.65
|
1
|
|
E2
|
H
|
NH
|
0.55
|
0.45
|
0.45
|
0.55
|
0.45
|
0
|
0.45
|
1
|
|
E3
|
NH
|
H
|
0.45 |
0.55 |
0.55 |
0.45 |
0.55
|
0
|
0.55
|
1
|
|
E4
|
H
|
H
|
0.35 |
0.65 |
0.35 |
0.65 |
0.65
|
0
|
0.35
|
1
|
This example also shows the large influence
of enrolment. Intuitively, E2 delivers the best enrolment when regarding
the failure rates. From the reference distances in cases E1 and
E4,
no selectivity is to be expected because both subjects are enrolled with
the same feature vector.
E2 tries to consider the fact that
ⓘ1
is more often met with hat than ⓘ2.
E3
is something like the converse of E2. |
Example 4.10: Simple feature space with stochastical
features: Jacket
Let us assume two individuals ⓘ1
and ⓘ2 with the following
properties: ⓘ1 carries
a jacket with 10% probability and ⓘ2
carries a jacket with 80% probability. This may be interpreted as a kind
of behavioral biometric characteristic. We try to distinguish these two
individuals solely by the feature components "carrying a jacket" and "carrying
no jacket". Furthermore we use the discrete metric and consider all four
enrolment states for the reference features f1'
and f2'
| |
f1' |
f2'
|
Reference Distance d(f1',f2') |
|
E1
|
NJ
|
NJ
|
0
|
|
E2
|
J
|
NJ
|
1
|
|
E3
|
NJ
|
J
|
1
|
|
E4
|
J
|
J
|
0
|
The two individuals share both features
with different probabilities which are represented by two points in a one
dimensional feature space F which contains two values: "jacket"
and "no jacket".
| |
f1' |
f2'
|
p1'1(0) |
p1'2(0) |
p2'1(0) |
p2'2(0) |
p1'1(1) |
p1'2(1) |
p2'1(1) |
p2'2(1) |
|
E1
|
NJ
|
NJ
|
0.9
|
0.2
|
0.9
|
0.2
|
0.1
|
0.8
|
0.1
|
0.8
|
|
E2
|
J
|
NJ
|
0.1
|
0.8
|
0.9
|
0.2
|
0.9
|
0.2
|
0.1
|
0.8
|
|
E3
|
NJ
|
J
|
0.9
|
0.2
|
0.1
|
0.8
|
0.1
|
0.8
|
0.9
|
0.2
|
|
E4
|
J
|
J
|
0.1
|
0.8
|
0.1
|
0.8
|
0.9
|
0.2
|
0.9
|
0.2
|
| |
f1' |
f2'
|
pG(0) |
pG(1) |
pI(0) |
pI(1) |
FNMR(0) |
FNMR(1) |
FMR(0) |
FMR(1) |
|
E1
|
NJ
|
NJ
|
0.55
|
0.45
|
0.55
|
0.45
|
0.45
|
0
|
0.55
|
1
|
|
E2
|
J
|
NJ
|
0.15
|
0.85
|
0.85
|
0.15
|
0.85
|
0
|
0.85
|
1
|
|
E3
|
NJ
|
J
|
0.85 |
0.15 |
0.15 |
0.85 |
0.15
|
0
|
0.15
|
1
|
|
E4
|
J
|
J
|
0.45 |
0.55 |
0.45 |
0.55 |
0.55
|
0
|
0.45
|
1
|
This example also shows the large influence
of enrolment. Intuitively, E3 delivers the best enrolment when regarding
the failure rates. From the reference distances in cases E1 and
E4,
no selectivity is to be expected because both subjects are enrolled with
the same feature vector.
E3 tries to consider the fact that
ⓘ1
is less often met with a jacket than ⓘ2.
E3
is something like the converse of E2. |
4.2 Separability
In the ideal case without errors, separability
is perfect, i.e., each characteristic from different subjects maps to a
unique point in the feature space F. In the presence of noise, the
features of all subject may vary such that their possible location can
only stochastically be described by something like a multidimensional cloud
which is represented by the multidimensional probability density. If these
densities overlap, failures may be introduced and separability may suffer.
Since multidimensional probability density
functions are more difficult to handle, we focus on the one-dimensional
densities of the distance between two features. The probability density
function of the distances depend on the character of the noise, the distance
measure, the number of features, and the feature dimension.
We start with the case of finite many
different distances in F. For example, when using the discrete metric,
only the values 0 and 1 can appear. For each of these values their frequency
can be determined either by knowledge of the stochastical properties or
by measurements.
Let d: F x F ↦ DN
⊂ ℝ⊕ be a distance measure and DN
be a discrete space with N different positive real numbers. Then the bilateral
distance density functions are defined by
| pi'j(x)
≡ P{ω1, ω2 ∈ Ω
| di'j ≔ d(fi'(ω1),
fj(ω2))
= x ∈ DN} |
|
(4.40)
|
|
If i'=j, pi'j defines
the bilateral distance density function for the feature outcomes
of the same characteristic. The case i'≠j represents the probabilities
between the two features i' (reference) and j (request) of disjoint characteristics.
From the bilateral densities the global genuine and global
impostor distance densities can be derived:
| pI(x) ≡ |
|
|
N
|
|
|
i=1
|
|
| |
|
N
|
|
|
j=1
|
| j≠i |
|
pi'j(x) |
|
(4.41)
|
|
| pG(x) ≡ |
|
|
N
|
|
|
i=1
|
|
pi'i(x) |
|
(4.42)
|
|
Generally, the failure rates can be calculated
from these densities. For the global FNMR and FMR we get
|
FNMR(th)
|
= 1 − |
|
|
|
|
|
|
x∈DN
|
|
x≤th
|
|
pG(x) |
|
with |
|
|
|
|
|
x∈DN
|
|
pG(x)
= 1 |
|
(4.43)
|
|
and
|
FMR(th)
|
= |
|
|
|
|
|
|
x∈DN
|
|
x≤th
|
|
pI(x) |
|
with |
|
|
|
|
|
x∈DN
|
|
pI(x)
= 1 |
|
(4.44)
|
|
The bilateral and individual rates can be
defined correspondingly. If D is a continuous set D = [0,K] with K ∈
ℝ⊕, such that pG,
pI,
FNMR,
FMR
are functions D → [0,1], the sums have to be replaced by integrals:
|
FNMR(th)
|
D=
1 − ∫[0,th] |
pG |
|
with |
D∫DpG
= 1 |
|
and |
|
th ∈ D |
|
(4.45)
|
|
and
|
FMR(th)
|
D=
∫[0,th] |
pI |
|
with |
D∫DpI
= 1 |
|
and |
|
th ∈ D |
|
(4.46)
|
|
The two FNMR(th) and
FMR(th) curves can be combined into one curve when using the threshold
variable th as common implicit parameter. A plot of FNMR against FMR is
usually called Detection Error Tradeoff (DET) curve. If CMR is plotted
against FMR, the diagram is usually called Receiver Operating Characteristic
[Wikipedia].
This term has originally been created in radar signal detection theory.
The advantage of the DET is that it shows a global view of the performance
of a recognition system without scaling effects of the threshold variable.
An important feature
of the DET is its monotonicity which is a direct consequence of the monotonicity
of FMR and FNMR and the positivity of the integrated probabilities:
| x ≥ y ⇔
FNMR(x) ≥ FNMR(y) |
|
(4.47)
|
|
| FMR(x) ≤ FMR(y)
⇒x ≥ y ⇒ FNMR(x) ≥ FNMR(y) |
|
(4.49)
|
|
That is, the DET curve
decreases monotonically.
4.2.1
Separability measures
Since FNMR and FMR monotonically depend on
a threshold parameter, different requirements can be met by changing the
threshold. In most applications, e.g., it is more important to have a small
FMR than a small FNMR. To describe the system behavior, always the two
failure rates are to be considered. In certain cases, however, one looks
for a single number which describes some global performance independent
of the threshold. The DET already delivers a tool to judge the performance
globally. If one of two DET curves is lower than the other one nearly everywhere,
it can be claimed that for the corresponding system separability is always
better - provided no FTE and FTA failures occur. In practice, DET curves
of different systems tend to intersect anywhere so that other comparison
measures are to be used.
From the DET curve two such measures can
be defined: The Equal Error Rate (EER) and the area
under the curve which has been called ROCA in [BioFAQ].
Within the DET where FNMR is plotted as "function" of FMR, EER is defined
by:
If FNMR and FMR are both functions of the
threshold th, the definition of EER is given by
thEER is the point where
FMR and FNMR intersect. Unfortunately, this point usually will not exist
in every case when D is a discrete set. In such cases, linear interpolation
might help:
| EERint
= |
|
FNMR−FMR+
− FNMR+ FMR−
|
|
|
FNMR−
+ FMR+ − FNMR+ − FMR−
|
|
|
(4.52)
|
|
Here, the superscript + denotes the values
at the right side of the intersection and the superscript - the values
at the left side.
Since the DET curve completely eliminates
the threshold parameter, notation could be simplified in the discrete case
without changing the curve. This can be achieved by replacing the set of
distances DK by an index set comprising only natural
numbers without gaps. This would enable us to use a simple sum notation
even in the case where the discrete distance values are discrete real numbers
with unequal distances. Furthermore, we introduce an negative threshold
value for symmetry reasons. Although this value is impossible and thus
virtual, it will help us to simplify the discrete case formulas and to
show the relationship to the continuous case if we define the output of
a negative threshold in such a way that it approximates the continuous
case most naturally.
This can be attained when defining
the match rate MRij : D ⋃ {-ε} → [0,1] by
| MRij(-ε)
≡
0 for every ε > 0 and any feature combination in F |
|
(4.53)
|
|
As a result we get
| pI(-ε)
= 0 |
| pG(-ε)
= 0 |
| FMR(-ε)
= 0 |
| FNMR(-ε)
= 1 |
|
(4.54)
|
|
The justification comes
from the fact that the continuous case confirms the formulas as limit case
for threshold 0. Generally, the value of a threshold does in no way contribute
to separability. In fact, only the values of FNMR and FMR at a certain
threshold are of practical importance. The statement "The systems shows
an FMR of 0.0001% at an FMR of 10%" is completely describing separability
in one operating point. Threshold values are only needed for the operator
of a specific system to adjust the operating point. However, the threshold
value will also depend on the distance measure and thus differ between
systems of different origin. Hence, every bijective map between different
threshold scalings should be allowed without changing appropriate separability
measures. This especially affects transforms between scores and distances
for which the relation holds: the higher the score, the lower the distance
(and vice versa).
Occasionally, we
will not plot the distance against the values of D. Instead, we use partly
a bijective map between D and an index set I which otherwise shows all
features of D. In the discrete case, the index set usually will be a set
of natural numbers {0,1,2,3, ... ,K} such that 0 represents the virtual
threshold -ε and the other numbers from 1
to K represent the K different distance values in DK
in increasing order, e.g.,
| {-ε}
∪ DK ≔ {-ε} ∪
{0, α2, α3, α4,,,αK}
→ IK ≡ {0, 1, 2, 3,,,K} |
|
(4.55)
|
|
where 0 < α1
< α2 < α3 <<<<
αK and -ε ↦ 0, 0
↦ 1, α2 ↦ 2, α3 ↦ 3,,,,
αK ↦ K. We will use this definition throughout
the paper. The corresponding functions will be underlined to indicate
that they are different:
| pI({-ε}
∪ DK) ≡ pI(IK) |
| pG({-ε}
∪ DK) ≡ pG(IK) |
| FMR({-ε}
∪ DK) ≡ FMR(IK) |
| FNMR({-ε}
∪ DK) ≡ FNMR(IK) |
|
(4.56)
|
|
resulting in
| pI(0)
= pI(-ε) = 0 |
| pG(0)
= pG(-ε) = 0 |
| FMR(0) =
FMR(-ε) = 0 |
| FNMR(0)
= FNMR(-ε) = 1 |
|
(4.57)
|
|
with K being the smallest
threshold such that
With these conventions,
Roca (≡ ROCA) can be written for discrete
D as
|
Roca
|
≡ |
|
K
|
|
|
n=1
|
|
FNMR(n-1)pI(n) |
|
with FNMR(0)
≡
1 and FNMR(K) = 0 |
|
(4.59)
|
|
All other functions affected will be redefined
accordingly:
|
FNMR(th)
|
= 1 − |
|
|
|
|
|
|
x∈DK
|
|
x≤th
|
|
pG(x) |
|
= |
FNMR(n) = 1 − |
|
n
|
|
|
k=1
|
|
pG(k) |
|
(4.60)
|
|
FMR(th)
|
= |
|
|
|
|
|
|
x∈DN
|
|
x≤th
|
|
pI(x) |
|
= |
FMR(n) = |
|
n
|
|
|
k=1
|
|
pI(k) |
|
(4.61)
|
|
Independent of the use of direct thresholds
or threshold indices, the DET curve will always be
the same! From (4.61) and (4.60)
it can be deduced that
| pI(n)
= FMR(n) - FMR(n-1) |
|
(4.62)
|
|
and
| pG(n)
= FNMR(n-1) - FNMR(n) |
|
(4.63)
|
|
In the continuous case
there is no significant difference between the two notations, except perhaps
if one chooses the common interval [0,1] with a bijective map D → I1
≔ [0,1[. The interval may be half open in those cases where D = ℝ⊕.
Equations (4.62) and (4.63) are
written as
in the continuous case
where ' denotes the derivative. The continuous Roca is given by
|
Roca
|
≡ |
I∫I1FNMR
⋅ pI |
|
with FNMR(0)
= 1 and FNMR(1) ≔ 0 |
|
(4.66)
|
|
Another measure for
separability can be derived when using linear interpolation instead
of calculating the ROC area by rectangles. We will call this measure DET
area or shortly Deta. The desired interpolation is achieved
when replacing FNMR(n-1) in (4.59)
by the mean value of neighboring FNMR values, i.e., by ½(FNMR(n)
+
FNMR(n-1)):
| Deta |
≡ ½ |
|
K
|
|
|
k=1
|
|
(FNMR(k)+FNMR(k-1))pI(k) |
|
with FNMR(0)
≡ 1 and FNMR(K) = 0 |
|
(4.67)
|
|
As a result, Deta delivers the same value
in all cases where all the points in a DET curve lie on a straight line
such as in the trivial case of non-separability. In all other cases, Deta
delivers more comparable values for differing number of distance points.
If D is continuous, Deta is always equal to Roca (4.66).
For that reason, no extra definition is required in the continuous case.
Example 4.11: DET formulas for the continuos
case DK = [0,K]
All proofs: Integration
by parts
| ∫DKFNMRn
⋅ pG = ∫DKFMRn
⋅ pI = |
|
|
(4.68)
|
|
| ∫DKFNMR²
⋅ pI = 2∫DKFNMR
⋅ FMR ⋅ pG |
|
(4.69)
|
|
| ∫DKFNMR²
⋅ pG = 2∫DKFNMR
⋅ FMR ⋅ pI |
|
(4.70)
|
|
| Roca = Deta = ∫DKFNMR
⋅ pI = ∫DKFMR
⋅ pG |
|
(4.71)
|
|
| ∫DKFNMR²
⋅ FMR ⋅ pI = ∫DKFNMR
⋅ FMR² ⋅ pG |
|
(4.72)
|
|
"⋅" denotes the pointwise
multiplication between functions. |
The relationship between EER and Deta is
best given by the inequalities
| EER² < Deta < (2 −
EER)EER |
|
(4.77)
|
|
This can be shown by constructing the two
worst case DET curves which are compliant with monotonicity and which meet
the same EER. As a result, the smallest area DET curve belongs to an area
of EER² and the largest area DET curve to an area of (2-EER)EER. If the
DET curve even does not exceed the straight lines between (FMR,FNMR) =
(0,1) and EER and between EER and (FMR,FNMR) = (1,0), an even tighter approximation
is possible:
A further measure which
is able to describe separability via the sum or mean of errors, namely
In the case of ideal
separability, both SOE and MOE are zero in at least one point. Ideal Non-separability
is characterized by SOE(x) = 1 and MOE(x) = ½, respectively, for every
x ∈ D.
In the continuous
case, it can be shown that a minimum of SOE or MOE at a point min implies
pI(min) = pG(min). The reverse
is not true. Especially, there may be multiple local minima. This behavior
is unknown to the EER which is unique in value but may be spread over a
neighboring subset of
D. The advantage of SOE and MOE however is
their existence for all distances x ∈ D. If there is a global minimum,
this minimum ("MSOE" and "MMOE") will be a global performance or separability
measure like Deta.
| MSOE ≡ min {SOE(x)
| x ∈ D} |
|
(4.81)
|
|
| MMOE ≡ min {MOE(x)
| x ∈ D} |
|
(4.82)
|
|
Using the same worst
case DET curves as for (4.77) we get similar
inequalities for MOE:
In those cases where
MMOE = EER, a further refinement of inequality (4.78)
is possible:
| MMOE = EER ⇒
2EER² < Deta ≤ EER |
|
(4.84)
|
|
In the case of perfect
non-separability, we have the functional equation FNMR = 1 - FMR
where 1 denotes the identity function 1(x) = 1 for every
x. As a result we get
| FNMR = 1
− FMR ⇒ SOE = 1 ⇒
MSOE = 1 |
|
(4.85)
|
|
| FNMR = 1
− FMR ⇒ MOE = ½1 ⇒
MMOE = ½ |
|
(4.86)
|
|
If the equal error
rate point xEER exits, the MOE at this point is equal
to EER:
| FNMR(xEER)
= FMR(xEER) = EER ⇒ MOE(xEER)
= EER |
|
(4.87)
|
|
Example 4.13: Ideal non-separability
To see how the different separability measures
behave, we consider the case of complete non-separability which is defined
by the situation where the genuine and impostor densities completely overlap:
pg = pI. If the densities are
both continuous, which allows us to replace the sums by integrals,
the results are easily seen to be:
| Continuous case: EER = Roca = Deta =
MMOE = ½ |
|
(4.88)
|
|
since FNMR = 1 −
FMR in the case of ideal non-separability (Example
4.7). The situation changes when D is a discrete finite set of values.
Then no closed curve can be drawn, the DET curve is given as a sequence
of points.
|
|
|
Fig. 4.5: Detection
Error Tradeoff curve with discrete index
|
|
|
|
|
Fig. 4.6: Discrete
FNMR
and
FMR versus threshold index
|
|
In the cases where continuous functions
are sampled, the ROC area (Roca) defined in [BioFAQ]
is only an approximation to the continuous case since it defines the area
as sum of rectangles. If the DET diagram only comprises a finite set of
points, the area may be "overestimated", i.e., the area is greater than
½. Let N+1 be the number of points including FNMR=1. If we assume that
the FNMR/FMR points are equidistant, the ROC area (4.59)
can be calculated as
This expression belongs to the distance densities
pg(n) = pI(n) = 1/N for every
n ∈ D. As we can see, Roca approaches the desired value ½ slowly and
never reaches it, while the (interpolated) EER always assumes the value
½. The following diagram (Figure 4.7) shows two DET curves with the same
Roca and the same Deta. If calculating Roca of the sampled DET points on
the blue curve which represents non-separability, see Fig. 4.7, its area
is exactly equal to the area under the red curve which comes from Fig.
4.8 and which does not represent exact non-separability. That is, we have
the same Roca for significantly different areas. This is a result of disregarding
the sampling theorem [Wikipedia]
for continuous curves. The blue curve stands for exact non-separability
in the continuous case. The corresponding area is exactly what we expect:
½. The EER is ½ in all three cases although the red curve has deserved
a value larger than ½. Only Deta behaves "reasonable" in all three cases
since Deta matches the continuous case exactly if the continuous curve
connects the sample points by straight lines.
|
|
|
Fig. 4.7: Detection
Error Tradeoff curve with discrete D
|
|
We get
|
Blue line cont. |
Blue line sampled |
Red line cont. |
Red line sampled |
| EER |
0.5
|
0.5
|
0.5
|
0.5
|
| Roca |
0.5
|
0.55
|
0.55
|
0.55
|
| Deta |
0.5
|
0.5
|
0.55
|
0.55
|
| MMOE |
0.5
|
0.5
|
0.5
|
0.5
|
Only bold values are in accordance with
intuition. Since natural curves usually cannot be represented by piecewise
straight lines, artifacts for all three separability measures
can be constructed.
|
|
|
Fig. 4.8: Discrete
FNMR
and
FMR versus threshold index (constructed example)
|
|
In this constructed example we can observe
the rare case where Roca and Deta coincide in the discrete case. According
to (4.75) from Example 4.12,
this would require the product of all density values being zero. This is
indeed the case since
pG
and pI are zero
where the other one is nonzero and vice versa. |
Example 4.14: Simple feature space with stochastical
features - cont'd
If we calculate EER, Roca, and Deta from
Example
4.9, we see that EER in the nonseparable case E1 and E4 can only be
calculated by interpolation. Linear interpolation indeed delivers the expected
values ½. For E2 and E3, EER delivers the only reasonable values. However,
Roca is larger than ½ in all cases although it confirms the tendency of
EER that smaller values yield better systems.
| |
f1' |
f2'
|
pG(0) |
pG(1) |
pI(0) |
pI(1) |
FNMR(0) |
FNMR(1) |
FMR(0) |
FMR(1) |
EER |
Roca |
Deta |
|
E1
|
NH
|
NH
|
0.65
|
0.35
|
0.65
|
0.35
|
0.35
|
0
|
0.65
|
1
|
(0.5) |
0.7725 |
0.5 |
|
E2
|
H
|
NH
|
0.55
|
0.45
|
0.45
|
0.55
|
0.45
|
0
|
0.45
|
1
|
0.45 |
0.6975 |
0.45 |
|
E3
|
NH
|
H
|
0.45 |
0.55 |
0.55 |
0.45 |
0.55
|
0
|
0.55
|
1
|
0.55 |
0.7975 |
0.55 |
|
E4
|
H
|
H
|
0.35 |
0.65 |
0.35 |
0.65 |
0.65
|
0
|
0.35
|
1
|
(0.5) |
0.7725 |
0.5 |
|
|
|
Fig. 4.9: Discrete
FNMR
and
FMR versus threshold index (hat)
|
|
|
|
|
Fig. 4.10: Detection
Error Tradeoff curve (hat)
|
|
| |
f1' |
f2'
|
pG(0) |
pG(1) |
pI(0) |
pI(1) |
FNMR(0) |
FNMR(1) |
FMR(0) |
FMR(1) |
EER |
Roca |
Deta |
|
E1
|
NJ
|
NJ
|
0.55
|
0.45
|
0.55
|
0.45
|
0.45
|
0
|
0.55
|
1
|
(0.5) |
0.7525 |
0.5 |
|
E2
|
J
|
NJ
|
0.15
|
0.85
|
0.85
|
0.15
|
0.85
|
0
|
0.85
|
1
|
0.85 |
0.9775 |
0.85 |
|
E3
|
NJ
|
J
|
0.85 |
0.15 |
0.15 |
0.85 |
0.15
|
0
|
0.15
|
1
|
0.15 |
0.2775 |
0.15 |
|
E4
|
J
|
J
|
0.45 |
0.55 |
0.45 |
0.55 |
0.55
|
0
|
0.45
|
1
|
(0.5) |
0.7525 |
0.5 |
|
|
|
Fig. 4.11: Discrete
FNMR
and
FMR versus threshold index (jacket)
|
|
|
|
|
Fig. 4.12: Detection
Error Tradeoff curve (jacket)
|
|
The reason for this unwanted behavior
which may prevent comparability between different systems is the discrete
nature, i.e., the difference between sums and integrals. The situation
can be mitigated, although not completely solved by using Deta instead
of Roca. At least in this example, EER and Deta deliver quite satisfying
values. |
Example 4.15: Simple feature space with stochastical
features and different enrolments (generalization)
Let us assume two individuals ⓘ1
and ⓘ2 with the following properties: ⓘ1
shows a feature with probability ψ1 and ⓘ2
carries the same feature with probability ψ2. We
try to distinguish these two individuals solely by the availability of
the feature. Furthermore we use the discrete metric and consider all four
enrolment states for the reference features
f1'
and f2'.
| |
f1' |
f2'
|
Reference Distance d(f1',f2') |
|
E1
|
NA
|
NA
|
0
|
|
E2
|
A
|
NA
|
1
|
|
E3
|
NA
|
A
|
1
|
|
E4
|
A
|
A
|
0
|
|
|
(4.90)
|
|
The two individuals share the same feature
with different probabilities which are represented by two points in a one
dimensional feature space
which contains the values: A = "feature present"
and NA = "feature not present". The set of distances and indices are then
given by
With this knowledge, the probabilities for
equality and inequality can be determined straightforward:
| |
f1' |
f2'
|
p1'1(1) |
p1'2(1) |
p2'1(1) |
p2'2(1) |
p1'1(2) |
p1'2(2) |
p2'1(2) |
p2'2(2) |
|
E1
|
NA
|
NA
|
1-ψ1
|
1-ψ2
|
1-ψ1
|
1-ψ2
|
ψ1
|
ψ2
|
ψ1
|
ψ2
|
|
E2
|
A
|
NA
|
ψ1
|
ψ2
|
1-ψ1
|
1-ψ2
|
1-ψ1
|
1-ψ2
|
ψ1
|
ψ2
|
|
E3
|
NA
|
A
|
1-ψ1
|
1-ψ2
|
ψ1
|
ψ2
|
ψ1
|
ψ2
|
1-ψ1
|
1-ψ2
|
|
E4
|
A
|
A
|
ψ1
|
ψ2
|
ψ1
|
ψ2
|
1-ψ1
|
1-ψ2
|
1-ψ1
|
1-ψ2
|
The pi'j(0)
are 0 by definition. Correspondingly, pi'j(1)
= pi'j(0) and pi'j(2)
= pi'j(1), see (4.55).
|
|
f1'
|
f2'
|
pG(1)
|
pG(2)
|
pI(1)
|
pI(2)
|
FNMR(0)
|
FNMR(1)
|
FNMR(2)
|
FMR(0)
|
FMR(1)
|
FMR(2)
|
|
E1
|
NA
|
NA
|
1-(ψ1+ψ2)/2
|
(ψ1+ψ2)/2
|
1-(ψ1+ψ2)/2
|
(ψ1+ψ2)/2
|
1
|
(ψ1+ψ2)/2
|
0
|
0
|
1-(ψ1+ψ2)/2
|
1
|
|
E2
|
A
|
NA
|
(1+ψ1-ψ2)/2
|
(1-ψ1+ψ2)/2
|
(1-ψ1+ψ2)/2
|
(1+ψ1-ψ2)/2
|
1
|
(1-ψ1+ψ2)/2
|
0
|
0
|
(1-ψ1+ψ2)/2
|
1
|
|
E3
|
NA
|
A
|
(1-ψ1+ψ2)/2
|
(1+ψ1-ψ2)/2
|
(1+ψ1-ψ2)/2
|
(1-ψ1+ψ2)/2
|
1
|
(1+ψ1-ψ2)/2
|
0
|
0
|
(1+ψ1-ψ2)/2
|
1
|
|
E4
|
A
|
A
|
(ψ1+ψ2)/2
|
1-(ψ1+ψ2)/2
|
(ψ1+ψ2)/2
|
1-(ψ1+ψ2)/2
|
1
|
1-(ψ1+ψ2)/2
|
0
|
0
|
(ψ1+ψ2)/2
|
1
|
This example also
shows the large influence of enrolment. The red
entries belong to a reference distance of zero. As a result, no selectivity
can be expected. This is clearly shown by the fact that pI
= pG which is
characteristic for the case of ideal non-separability (Example
4.13). We get the equal error rates:
| EER(E2)
=
(1 − ψ1 + ψ2)/2 |
|
(4.93)
|
|
| EER(E3)
=
(1 + ψ1 − ψ2)/2 |
|
(4.94)
|
|
Since it is our interest
to keep the EER below ½, the following restrictions hold for the corresponding
enrolment:
These are the conditions for the optimum
enrolment. If we assume, without loss of generality, that E3
is the optimum enrolment, then the conditions for ideal separability are
| EER = 0 ⇔ ψ1=
0 and ψ2 = 1 |
|
(4.97)
|
|
If ψ2 = 1 − ψ1,
ERR simplifies to
The other separability measures are seen
to be (E3)
|
4.2.2
Weak Universality
Not every biometric
characteristic will be available for each individual all the time. In such
cases, no feature can be extracted and comparisons cannot take place. This
leads to a new kind of error, called Failure to Acquire (FTA) [BioFAQ].
FTAs cannot be represented directly in the feature space.
4.3
Intra- Versus Inter-Characteristic Failure Rates
The preceding inter-characteristic approach
to define biometric failure rates is completely different from the intra-characteristic
approach in [Bromba2009].
In [Bromba2009] probabilities
for the comparison of two samples from the same characteristic and from
two different characteristics are considered. Here, we define the failure
rates as mean values of the results of N² comparisons in a feature space
with N features. In the first instance, assuming ideal biometric characteristics,
the comparison results are deterministic and no failures occur. Only if
we introduce a threshold, deterministic failures may be produced if the
distance between two different features becomes smaller than the threshold.
Depending on the biometric modality, features
may be distributed randomly or according to certain rule - at least theoretically.
In the case of random distributions as suggested
in Fig. 3.4, the distances
will also behave random. Otherwise it should be possible to consider only
a small number of feature pairs to evaluate the whole failure behavior,
e.g., when the features are arranged in a regular grid.
In the intra-characteristic approach,
only 1 pair of characteristics (same or different) is considered, well
knowing that in practice the result is only valid for this specific pair
at a certain time instant. From this view, the bilateral failure rates
defined above correspond to the intra-characteristic failure rates. Thus,
the natural extension would be to combine bilateral failure rates according
to the definition above to get individual and global failure rates. For
illustration, Figs. 3.4 and 4.3
may help. In Fig. 3.4, the features are distributed
randomly. Each distance is more and less different and follows no rule
except for the triangle inequality. This case refers to inter-characteristic
variations. In Fig. 4.2, if we only consider one feature
pair, there is a nonzero fluctuation of the distance for each measurement.
This is the intra-characteristic case.
When using the score value statistics,
both cases (intra- and inter-characteristic) cannot be separated in practice.
However, by repeated measurements of the same features, mean values can
be built such that the sum statistics (intra- and inter-characteristic)
can be reduced to an inter-characteristic statistics. This has been shown
in [Bromba2001, Bromba2002]
for fingerprint. This way, the two cases can be separated under certain
assumptions.
Example 4.16: Equidistant 1D features: Inter-characteristic
FMR
Let us assume N+1 one-dimensional features
such as human height but with constant Euclidean distance 1 between neighboring
features, such that h0 = 0, h1=1,
..., hN=N. Then the (N+1)x(N+1) distance matrix with
the bilateral distances dij is given by
|
|
= |
|
⇒ |
|
d00
|
d01
|
d02
|
...
|
d0N
|
|
d10
|
d11
|
d12
|
...
|
d1N
|
|
d20
|
d32
|
d22
|
...
|
d2N
|
|
.
|
.
|
.
|
...
|
.
|
|
dN0
|
dN1
|
dN2
|
...
|
dNN
|
|
= |
|
0
|
1
|
2
|
...
|
N
|
|
1
|
0
|
1
|
...
|
N-1
|
|
2
|
1
|
0
|
...
|
N-2
|
|
.
|
.
|
.
|
...
|
.
|
|
N
|
N-1
|
N-2
|
...
|
0
|
|
Now, in analogy to the failure rates,
we may calculate the individual and global "distances" di
and d, where we average over all available distances
| di
≔ |
|
|
N
|
|
|
j=1
|
|
dij
= ½ (N+1) − i + |
|
|
(4.100)
|
|
| d ≔ |
|
|
N
|
|
|
j=1
|
|
di
= (N+1) + |
|
|
(4.101)
|
|
As the number of features (N+1) increases,
the mean individual and global distance will also increase. More interesting
is the development of failure rates in dependence on the threshold th.
Obviously, if th=0, no failures will occur. If th=1, neighboring features
cannot be distinguished any more. The following match rate matrix shows
this case:
|
MR00
|
MR01
|
MR02
|
MR03
|
...
|
MR0N
|
|
MR10
|
MR11
|
MR12
|
MR13
|
...
|
MR1N
|
|
MR20
|
MR21
|
MR22
|
MR23
|
...
|
MR2N
|
|
MR30
|
MR31
|
MR32
|
MR33
|
...
|
MR3N
|
|
.
|
.
|
.
|
.
|
...
|
.
|
|
MRN0
|
MRN1
|
MRN2
|
MRN3
|
...
|
MRNN
|
|
= |
|
1
|
1
|
0
|
0
|
...
|
0
|
|
1
|
1
|
1
|
0
|
...
|
0
|
|
0
|
1
|
1
|
1
|
...
|
0
|
|
0
|
0
|
1
|
1
|
...
|
1
|
|
.
|
.
|
.
|
.
|
...
|
.
|
|
0
|
0
|
0
|
0
|
...
|
1
|
|
The bilateral FMRij
for this case is equal to 1 if j=i±1 and 0 else. (The cases i=j represent
the CMRi which are always 1 in this case.) Then the
individual FMRi are given by FMRi
= 1/N if i = 1 or N and 2/N else. The reason for this behavior is that
the outer features only have 1 direct neighbor and thus are preferred.
If the threshold increases, more and more neighbors are to be considered.
A few simple-to-calculate general individual FMRs are given for i=0, N/2
(provided N is odd), or N . Then we get:
FMRN/2(th) is also the
largest value among all individual FMRs. Note the linear growth with the
normalized parameter th/N.
For the global FMR we get
For large N, the following approximation
holds:
|
|
Fig. 4.13: Continuous
inter-characteristic FMR versus threshold
|
|
Figure 4.13 shows the extreme cases with
maximum and minimum individual FMR. The border features (i=0 and i=N) show
the lowest individual FMR and the center feature at i=N/2 the largest one.
As expected, the global FMR which is the mean value of all individual FMRs,
is lying between these two extremes. The FNMR is 0 for every threshold
(except for the virtual threshold -ε) unless we expect stochastical errors. |
Example 4.4
(human height) shows a typical example for a pure intra-characteristic
FMR and FNMR. As usual, no threshold value can be
chosen such that both errors becomes zero. This is due to overlapping of
the probability densities for both height features.
The examples given above could be extended
to show the mixed case which would be the best approximation to reality.
Instead of fixed feature points in Fig. 3.4, we allow
all features to behave like a cloud (Fig. 4.2) as
shown for one pair in Fig. 4.4. Then we built the
mean value to get the inter-characteristic representations.
5. Performance
Parameters
This chapter discusses which influences are
able to improve the recognition performance, i.e., reduce recognition failures.
Most results are based on examples only, hence no general statements can
be derived. The best way to find an optimum is to model reality stochastically
and then trying to minimize failures. However, in many cases nature cannot
be simulated exactly or is too difficult to be described by a mathematical
model. In this cases heuristic methods are a common way to find a solution.
To support this trial-and-error process, the following hints may be helpful.
Additionally, the following considerations may help to find a deeper understanding
of optimum solutions if available [Zhang].
5.1 Enrolment
strategy
Example 5.1: Human height: Optimum enrolment
From Example
4.6 the question arises, how FMR and FNMR depend on the reference feature
h'1. Obviously, FMR will be lowest, if h'1
has the largest possible distance to the mean value of h2
and if we regard the bilateral false match rate. If we consider the individual
FMR, we have to take account all the other features which may be randomly
distributed around feature h2. This could have the
effect that a h'1, which has maximum distance
to h2, will become unnecessarily
close to a third feature, say h3. As a result, depending
on the feature distribution in F, one cannot compulsorily expect
an optimal value of h'1.
For FNMR the situation
is easier to manage. Obviously, the best h'1will
be that one which maximizes the integral (4.29). At
least if we have a symmetric probability density fh'1-h̃2
, this is the case if fh'1-h̃2 has its maximum
at zero and h'1 is equal to the mean value
of h̃2. This suggests the following
optimizing enrolment procedure: Take N samples of h1,
say h1i (i=1,...,11), and calculate the mean value
(as far as possible) to get the best reference:
| h'1
≔ |
|
|
N
|
|
|
i=1
|
|
h1i |
|
(5.1)
|
|
In many realistic cases, h'1
should minimize the individual FNMR while being neutral with respect to
global FMR. |
Example 5.2: Human height: Measurements with
repeated enrolments
This example, which
is based on Example 4.4, shows the relationship
between the one-dimensional feature densities, the feature distance densities,
and the failure rates depending on the decision threshold. No explicit
enrolment is expected. This is equivalent to repeating enrolment with each
new measurement.
|
|
|
Fig.
5.1: Density functions of the height features fh1
and fh2 and their distances p12
and p11
|
|
Fig. 5.1 shows the
symmetric densities of the two height features fh1
and fh2. As a result the densities
p12 and p21, which represent
the broader distance densities, are equal and their peak locations represent
the true difference h1−h2.
Since the area of the densities must be 1, its peak height is smaller than
the height of the feature densities. The densities p11=p22
represent the failure densities of two subsequent independent measurements
of h1 or h2, respectively. Their
height is twice as large as of p12=p21
as only positive values can occur.
From p11
and p12 FMR and FNMR can be calculated using formulae
(4.24) and (4.25). We get the following
diagram which shows the dependence of FMR and FNMR of the threshold value
th.
|
|
|
Fig. 5.2: FNMR
and FMR versus threshold th
|
|
The DET in Fig.
5.3 shows the corresponding dependence of FMR from FNMR with eliminated
parameter th. As we can see, no threshold value can be chosen such that
the error becomes zero. This is due to the overlap of the feature densities
for both height features.
|
|
|
Fig. 5.3: Detection
Error Tradeoff curve
|
|
|
Example 5.3: Human height: Measurements with
optimum one-time enrolment
Based on Example
4.4, the effect of optimum enrolment is shown here. We assume a one-time
enrolment with fixed reference feature h1' or h2'
such that both values represent the peak location of their corresponding
feature densities. Since the references are deterministic, the resulting
distance densities have the same width as the corresponding feature densities.
As a result, the overlap between p11 and p12
is smaller than in Example 5.2.
|
p1'1=p2'2 |
|
|
|
|
|
|
h1 |
p1'2=p2'1 |
h2 |
|
|
|
Fig. 5.4: Feature
densities and distance densities for optimum enrolment
|
|
Smaller overlap,
however, is equivalent to smaller failure rates, as is shown in the next
diagram and the following DET curve.
|
|
|
Fig. 5.5: FNMR
and FMR versus threshold th for optimum enrolment
|
|
|
|
|
Fig. 5.6: Detection
Error Tradeoff curve for optimum enrolment
|
|
Comparing Figure 5.6 with Figure
5.3, an optimized enrolment strategy shows a significantly improved
recognition performance! |
Characteristic-dependent
thresholds
Figure 4.3 shows a typical
behavior of the features of different biometric characteristics. There
may be characteristics which show large stochastical deviations while other
behave very reproducible. If it is possible to estimate a characteristics'
feature distribution during enrolment, it may suggest to introduce a variable
threshold to decrease FNMR in the first case by increasing the threshold
or FMR in the second one by decreasing it. The threshold is then to be
stored together with the reference feature vector and can be used for verification
as well as for identification. In the first case, however, this should
only be done if additionally the distance to other features (genuine or
impostors) is large enough. This has also to be checked during enrolment,
provided a sufficiently large feature database is available. Otherwise
the threshold decrease will increase FMR and thus may deteriorate security.
5.2
Adding feature components
Example 5.4: Combination of hat (Example
4.9) with jacket (Example 4.10)
This example treats the case of two individuals
with two feature components each. As we shall see, the result greatly depends
on the distance measure.
Discrete Metric δ
D3
= {0, 1} ⇒ I3 = {0, 1, 2}
|
f1'
|
f2'
|
pG(1) |
pG(2) |
pI(1) |
pI(2) |
FNMR(1) |
FMR(1) |
EER |
Roca |
Deta |
|
(H,NJ)
|
(NH,J)
|
0.46
|
0.54
|
0.06
|
0.94
|
0.54
|
0.06
|
(0.36̅4̅8̅) |
0.5676 |
0.3
|
|
|
|
Fig. 5.7: Discrete
FNMR
and
FMR versus threshold index (hat & jacket, discrete metric)
|
|
|
|
|
Fig. 5.8: Detection
Error Tradeoff curve (hat & jacket, discrete metric)
|
|
Hamming
metric d(f1,f2) ≔ δ(g1,g2)
+ δ(h1,h2), see (2.23)
f ≔ (g,h),
g ∈ {H,NH}, h ∈ {J,NJ}
D3
= {0, 1, 2} ⇒ I3 = {0, 1, 2,
3}
|
f1'
|
f2'
|
pG(1) |
pG(2) |
pG(3) |
pI(1) |
pI(2) |
pI(3) |
FNMR(1) |
FNMR(2) |
FMR(1) |
FMR(2) |
EER |
Roca |
Deta |
|
(H,NJ)
|
(NH,J)
|
0.46
|
0.48
|
0.06 |
0.06
|
0.48
|
0.46 |
0.54
|
0.06
|
0.06
|
0.54
|
(0.3) |
0.3468 |
0.204 |
|
|
|
Fig. 5.9: Discrete
FNMRand
FMR
versus threshold index (hat & jacket, Hamming metric)
|
|
|
|
|
Fig. 5.10: Detection
Error Tradeoff curve (hat & jacket, Hamming metric)
|
|
4-valued metric d1(f1,f2)
≔ αδ(g1,g2) + (1-α)δ(h1,h2),
0 < α < ½
f ≔ (g,h),
g ∈ {H,NH}, h ∈ {J,NJ}
D3
= {0, α, 1-α, 1} ⇒ I3 = {0,
1, 2, 3, 4}
|
f1'
|
f2'
|
pG(1) |
pG(2) |
pG(3) |
pG(4) |
pI(1) |
pI(2) |
pI(3) |
pI(4) |
FNMR(1) |
FNMR(2) |
FNMR(3) |
FMR(1) |
FMR(2) |
FMR(3) |
EER |
Roca |
Deta |
|
(H,NJ)
|
(NH,J)
|
0.46
|
0.39
|
0.09
|
0.06
|
0.06
|
0.09
|
0.39
|
0.46
|
0.54
|
0.15
|
0.06
|
0.06
|
0.15
|
0.54
|
0.15
|
0.1947
|
0.132
|
|
|
|
Fig. 5.11: Discrete
FNMR
and
FMR versus threshold index (hat & jacket, 4-valued metric
d1)
|
|
|
|
|
Fig. 5.12: Detection
Error Tradeoff curve (hat & jacket, 4-valued metric d1)
|
|
4-valued metric d2(f1,f2)
≔ αδ(g1,g2) + (1-α)δ(h1,h2),
½ < α < 1
f ≔ (g,h),
g ∈ {H,NH}, h ∈ {J,NJ}
D3
= {0, 1-α, α, 1} ⇒ I3 = {0,
1, 2, 3, 4}
This case corresponds
to non-optimal weighting. i.e., the less distinctive feature component
(hat) is weighted stronger.
|
f1'
|
f2'
|
pG(1) |
pG(2) |
pG(3) |
pG(4) |
pI(1) |
pI(2) |
pI(3) |
pI(4) |
FNMR(1) |
FNMR(2) |
FNMR(3) |
FMR(1) |
FMR(2) |
FMR(3) |
EER |
Roca |
Deta |
|
(H,NJ)
|
(NH,J)
|
0.46
|
0.09
|
0.39
|
0.06
|
0.06
|
0.39
|
0.09
|
0.46
|
0.54
|
0.45
|
0.06
|
0.06
|
0.45
|
0.54
|
0.45
|
0.3387
|
0.276
|
|
|
|
Fig. 5.13: Discrete
FNMR
and
FMR versus threshold index (hat & jacket, 4-valued metric
d2)
|
|
|
|
|
Fig. 5.14: Detection
Error Tradeoff curve (hat & jacket, 4-valued metric d2)
|
|
|
|
|
Fig. 5.15: Detection
Error Tradeoff curve (comparison, β ≔ 1-α)
|
|
Obviously, a weighted distance measure which
favors the most distinctive feature component delivers the best results
which is the jacket in this case, see Example 4.14:
| Method |
EER |
MMOE |
Deta |
| Hat only |
0.45 |
0.45 |
0.45 |
| Jacket only |
0.15 |
0.15 |
0.15 |
| Hat & jacket,
discrete metric |
(0.36̅4̅8̅) |
0.3 |
0.3 |
| Hat & jacket,
Hamming metric |
(0.3) |
0.3 |
0.204 |
| Hat & jacket,
weighted metric d1 |
0.15 |
0.15 |
0.132 |
| Hat & jacket,
weighted metric d2 |
0.45 |
0.3 |
0.276 |
Interestingly, only Deta sees an
advantage between the jacket-only case and the hat & jacket combination
with optimal metric and delivers a unique minimum! |
Example 5.5: 2-component features space with
stochastical features and weighted distance measures (generalization)
Let us assume two individuals ⓘ1
and ⓘ2 which distinguish by two stochastical properties
c1 and c2 ("components") each
of which are either available (A) or not (NA) with different probabilities
| ψij ≔ P(property
cj is available for individual ⓘi) |
|
c1 |
c2 |
| ⓘ1 |
ψ11 |
ψ12 |
| ⓘ2 |
ψ21 |
ψ22 |
|
|
(5.2)
|
|
The feature space now
has two "dimensions" and can be composed such that
F =
F1
x F2 = {fi
= (c1i,c2i)| i ∈ {0,1}} where
F1=
{c1| c1 ∈ {A, NA}} where A
denotes "available" and NA "not available"
F2=
{c2| c2 ∈ {A, NA}} where A
denotes "available" and NA "not available" |
|
(5.3)
|
|
Hence, F can
be represented by its elements:
| F = {(NA,NA),
(NA,A), (A,NA), (A,A)} |
|
(5.4)
|
|
As distance measure
we choose the composite measure
| d(f1,f2)
≔ αd1(c11,c21)
+ βd2(c12,c22) |
|
(5.5)
|
|
where d1
and d2 are the discrete metrics on F1
and F2, respectively, with f1,
f2
∈ F, c11, c21 ∈ F1,
and c12, c22 ∈ F2.
For the distance
and index sets we get, depending on the weighting constants α and β:
| α<β ⇒
D = {0, α, β, α+β}, I = {0,1,2,3,4} |
|
(5.6)
|
|
| α=β ⇒
D = {0,α,2α}, I = {0,1,2,3} |
|
(5.7)
|
|
| α>β ⇒
D = {0, β, α, α+β}, I = {0,1,2,3,4} |
|
(5.8)
|
|
As in Example
5.4, we try to distinguish the two individuals solely by the presence
of the feature components or properties. For enrolment, 16 possibilities
exist, starting with f1' = (NA,NA) and f2'
= (NA,NA) and ending with f1' = (A,A) and f2'
= (A,A). We exclude all trivial enrolments, i.e., enrolments with f1'
= f2'. Among the remaining 12 possibilities
we intuitively select those ones which represent the largest probability,
e.g.,
| f1'
= (A,NA) and f2' = (NA,A) if ψ11
> ψ21 and ψ22 > ψ12 |
|
(5.9)
|
|
The two individuals
ⓘ1
and ⓘ2 share the two properties
with different probabilities and are represented by four points in the
two-dimensional feature space F. To be more concrete, each reference
feature yields one point in F, as well as each measurement will
be represented by one point. But this may not always be the same point;
each of the available four points may be affected with different probability.
As results for the
bilateral distance densities pi'j(n), n∈D, (which
are the probabilities that a certain distance value occur), we get under
the assumption of stochastical independence:
| p1'1(0) |
= |
ψ11(1−ψ12)
|
| p1'1(α) |
= |
(1−ψ11)(1−ψ12) |
| p1'1(β) |
= |
ψ11ψ12
|
| p1'1(α+β) |
= |
(1−ψ11)ψ12
|
|
| p1'2(0) |
= |
ψ21(1−ψ22)
|
| p1'2(α) |
= |
(1−ψ21)(1−ψ22) |
| p1'2(β) |
= |
ψ21ψ22
|
| p1'2(α+β) |
= |
(1−ψ21)ψ22
|
|
| p2'1(0) |
= |
(1−ψ11)ψ12
|
| p2'1(α) |
= |
ψ11ψ12
|
| p2'1(β) |
= |
(1−ψ11)(1−ψ12) |
| p2'1(α+β) |
= |
ψ11(1−ψ12)
|
|
| p2'2(0) |
= |
(1−ψ21)ψ22
|
| p2'2(α) |
= |
ψ21ψ22
|
| p2'2(β) |
= |
(1−ψ21)(1−ψ22) |
| p2'2(α+β) |
= |
ψ21(1−ψ22)
|
|
|
|
(5.10)
|
|
Using formula (4.41)
and (4.42), the global densities can be calculated
as mean values:
|
Distance
|
Index |
pG(Index)
|
pI(Index)
|
|
-ε
|
0
|
0
|
0
|
|
0
|
1
|
½(ψ11+ψ22−ψ11ψ12−ψ21ψ22)
|
½(ψ12+ψ21−ψ11ψ12−ψ21ψ22)
|
|
α
|
2
|
½(1−ψ11−ψ12+ψ11ψ12+ψ21ψ22)
|
½(1−ψ21−ψ22+ψ11ψ12+ψ21ψ22)
|
|
β
|
3
|
½(1−ψ21−ψ22+ψ11ψ12+ψ21ψ22)
|
½(1−ψ11−ψ12+ψ11ψ12+ψ21ψ22)
|
|
α+β
|
4
|
½(ψ12+ψ21−ψ11ψ12−ψ21ψ22)
|
½(ψ11+ψ22−ψ11ψ12−ψ21ψ22)
|
|
|
(5.11)
|
|
The calculation of
FNMR and FMR depends on the order of the distance values. We distinguish
the three cases α<β, α>β, and α=β and start with
α<β
|
n
|
FNMR(n)
|
FMR(n)
|
|
-ε
|
1
|
0
|
|
0
|
1−½(ψ11+ψ22−ψ11ψ12−ψ21ψ22)
|
½(ψ12+ψ21−ψ11ψ12−ψ21ψ22)
|
|
α
|
½ − ½(ψ22
− ψ12)
|
½ − ½(ψ22
− ψ12)
|
|
β
|
½(ψ12+ψ21−ψ11ψ12−ψ21ψ22)
|
1−½(ψ11+ψ22−ψ11ψ12−ψ21ψ22)
|
|
α+β
|
0
|
1
|
|
|
(5.12)
|
|
At the distance α, FNMR and FMR become equal
and thus define the EER, i.e.,
It is striking that EER exclusively depends
on the probabilities of the second component c2! Since
β > α, this may be due to the stronger weight of the second component.
The assumption ψ22 > ψ12
takes care for the requirement that EER < ½. A more differentiating
view of the capability of multicomponent recognition delivers Deta which
after some heavy, but straightforward calculations, is seen to be
| Deta = ¼(2+ψ21−ψ11)
+ ¼(ψ12−ψ22)(3 −
(ψ11+ψ12+ψ21+ψ22)
+ 2(ψ11ψ12+ψ21ψ22)) |
|
(5.14)
|
|
Here, also the other probabilities have an
effect. This effect can be studied when making one component ideal, i.e.,
unique for each individual. This can be done by giving the first component
of the two individuals 100% certainty for availability, resp. non-availabilty.
That is we assume ψ11 = 1 and
ψ21 = 0. This would deliver
| Deta = (½(1+ψ12−ψ22))²
= EER² ≤ EER |
|
(5.16)
|
|
As already stated,
EER is not affected although both individuals are perfectly unique and
thus EER should be zero! This is an impressive example how comparison algorithms
may behave suboptimal. Comparing (5.15) with the one-dimensional
case (4.94), we see that no improvement is delivered
by the additional component! When calculating
| Deta = (½(1 +
ψ12− ψ22))² = EER² ≤
EER |
|
(5.17)
|
|
and comparing this
with the one-dimensional case (4.99) at least an improvement
can be stated which is expressed by the reduction effect of the squaring
operation. But this result is not satisfying since we again expect a Deta
of zero. The solution is to transfer the property of perfect uniqueness
to the second component such that ψ12 = 0 and
ψ22 = 1. Only in this case we get what we expect:
The systems behaves best, when ψ12
− ψ22 < ψ21 − ψ11,
combined with a stronger weighting of the second component (β >
α). This situation changes if α > β. In
that case, the system becomes optimal if ψ12
− ψ22 > ψ21 − ψ11.
α>β
|
n
|
FNMR(n)
|
FMR(n)
|
|
-ε
|
1
|
0
|
|
0
|
1−½(ψ11+ψ22−ψ11ψ12−ψ21ψ22)
|
½(ψ12+ψ21−ψ11ψ12−ψ21ψ22)
|
|
β
|
½ − ½(ψ11
− ψ21)
|
½ − ½(ψ11
− ψ21)
|
|
α
|
½(ψ12+ψ21−ψ11ψ12−ψ21ψ22)
|
1−½(ψ11+ψ22−ψ11ψ12−ψ21ψ22)
|
|
α+β
|
0
|
1
|
|
|
(5.19)
|
|
For completeness, also the special case α=β
is given here:
α=β
|
n
|
FNMR(n)
|
FMR(n)
|
|
-ε
|
1
|
0
|
|
0
|
1−½(ψ11+ψ22−ψ11ψ12−ψ21ψ22)
|
½(ψ12+ψ21−ψ11ψ12−ψ21ψ22)
|
|
α
|
½(ψ12+ψ21−ψ11ψ12−ψ21ψ22)
|
1−½(ψ11+ψ22−ψ11ψ12−ψ21ψ22)
|
|
2α
|
0
|
1
|
|
|
(5.20)
|
|
A simple calculation delivers the interpolated
equal error rate:
| EERint = ½(1 +
½(ψ12− ψ22)
+ ½(ψ21 − ψ11)) |
|
(5.21)
|
|
This case corresponds
to a metric without weighting. It seems to behave optimal when ψ12−ψ22
and ψ21−ψ11 are nearly
equal but without being better than the weighted cases, provided the mismatch
is small. However, it should be kept in mind that the interpolated EER
need not be a reliable measure for comparing separability.
The following two examples show the influence
of feature probabilities on Deta:
| ψ11=
0.7, ψ12= 0.3, ψ21= 0.3, ψ22=
0.7 ⇒ Deta = 0.216 |
|
(5.22)
|
|
| ψ11=
0.4, ψ12= 0.1, ψ21= 0.3, ψ22=
0.8 ⇒ Deta = 0.132 |
|
(5.23)
|
|
|
Example 5.6: Maximum distance features with
N feature components
We consider a feature space F where
the features fi of two individuals comprise
N feature components cn such that
| f = (c1,
c2, c3, ..., cN) |
|
(5.24)
|
|
Each feature component cn
may randomly assume the values 0 and 1 with probability 1- ψ and ψ, respectively,
for each component of individual ⓘ1, and ψ and
1- ψ for individual ⓘ2, i.e.,
| ψ ≤ ½ |
| ci ∈ {0,1} |
| ⓘ1:
P{ci = 1} = 1 − P{ci = 0} =
ψ |
| ⓘ2:
P{ci= 1} = 1 − P{ci = 0} =
1 − ψ |
|
(5.25)
|
|
Since ψ ≤ ½, we try as reference
features
| f1' ≔ (0,
0, 0, ..., 0) |
| f2' ≔ (1,
1, 1, ..., 1) |
|
(5.26)
|
|
which allow a simple calculation of the corresponding
error rates. As metric we take the Hamming distance d which counts the
number of component disagreements between two features. The individuals
ⓘ1 and ⓘ2 show the
maximum possible distance of their references:
D is seen to be D = {0, 1, 2, 3, ..., N}
and the index set I = {0, 1, 2, 3, ..., N+1}. The density functions P{d(f,g)
= k} are given by the binomial expression [Wikipedia]
| kp1'1(k)
= p2'2(k) ≔ P{d(f2',f2)
= k} = |
( |
|
) |
ψk(1 − ψ)N-k
≕ p(k) |
|
(5.28)
|
|
| kp1'2(k)
= p1'2(k) ≔ P{d(f1',f2)
= k} = |
( |
|
) |
(1 − ψ)kψN-k
≕ q(k) |
|
(5.29)
|
|
We then get
| p1'1(k) ≔ P{d(f1',f1)
= k} = p(k) |
| p1'2(k) ≔ P{d(f1',f2)
= k} = q(k) |
| p2'1(k) ≔ P{d(f2',f1)
= k} = q(k) |
| p2'2(k) ≔ P{d(f2',f2)
= k} = p(k) |
|
(5.30)
|
|
| pG(k) = p(k) |
| pI(k) = q(k) |
|
(5.31)
|
|
| kFNMR(n)
= 1 − |
|
n
|
|
|
k=1
|
|
( |
|
) |
ψk(1
− ψ)N-k |
|
(5.32)
|
|
| kFMR(n)
= |
|
n
|
|
|
k=1
|
|
( |
|
) |
(1 − ψ)kψN-k |
|
(5.33)
|
|
|
|
|
Fig. 5.16: Discrete
FNMR
and
FMR versus threshold index for 7 feature components
|
|
|
|
|
Fig. 5.17: Detection
Error Tradeoff curve for 7 feature components
|
|
The following animations
show the behavior for varying N.
|
|
|
Fig. 5.18: Discrete
FNMR
and
FMR versus threshold index
|
|
|
|
|
Fig. 5.19: Detection
Error Tradeoff curve with discrete index
|
|
|
Feature Components
|
ψ
|
EER
|
MMOE
|
Deta
|
|
1
|
0.3
|
0.3
|
0.3
|
0.3
|
|
2
|
0.3
|
(0.3)
|
0.3
|
0.216
|
|
3
|
0.3
|
0.216
|
0.216
|
0.16308
|
|
4
|
0.3
|
(0.216)
|
0.216
|
0.126036
|
|
5
|
0.3
|
0.16308
|
0.16308
|
0.09880866
|
|
6
|
0.3
|
(0.16308)
|
0.16308
|
0.078224791
|
|
7
|
0.3
|
0.126036
|
0.126036
|
0.062375212
|
|
Feature Components
|
ψ
|
EER
|
MMOE
|
Deta
|
|
7
|
0.5
|
0.5
|
0.5
|
0.5
|
|
7
|
0.4
|
0.289792
|
0.289792
|
0.228843953
|
|
7
|
0.3
|
0.216
|
0.216
|
0.16308
|
|
7
|
0.2
|
0.033344
|
0.033344
|
0.007003561
|
|
7
|
0.1
|
0.002728
|
0.002728
|
9.92855E-05
|
|
7
|
0.05
|
0.000193578
|
0.000193578
|
1.02554E-06
|
|
7
|
0.01
|
3.4167E-07
|
3.4167E-07
|
1.62789E-11
|
|
7
|
0.001
|
3.49161E-11
|
3.49161E-11
|
-
|
As expected, the recognition performance
increases with increasing number of feature components, provided we consider
the maximum distance features. Furthermore, while EER and MMOE change with
the same order for increasing number of feature components or "individuality"
(ψ vs. 1−ψ), Deta vanishes more rapidly. |
Example 5.7: Minimum distance features with
N feature components
Just as in Example
5.6 we consider a system where the features f of two individuals
comprise N feature components ci such that
| f = (c1,
c2, c3, ..., cN) |
|
(5.34)
|
|
Each feature component ci
may randomly assume the values 0 and 1 with probability 1- ψ and ψ, respectively,
for each component of individual ⓘ1. Individual
ⓘ2 changes probabilities for the first component
c1, i.e.,
| ψ ≤ ½ |
| ci ∈ {0,1} |
| ⓘ1:
P{ci = 1} = 1 − P{ci = 0} =
ψ |
| ⓘ2:
P{ci
= 1} = 1 − ψ for i = 1 and P{ci = 1} = ψ for all
i > 1 |
|
(5.35)
|
|
Since ψ ≤ ½, we try as reference
features
| f1' ≔ (0,
0, 0, ..., 0) |
| f2' ≔ (1,
0, 0, ..., 0) |
|
(5.36)
|
|
As metric we take the Hamming distance d
which counts the number of component disagreements between two features.
The individuals ⓘ1 and ⓘ2
show the minimum reasonable distance of their references:
D is given as D = {0, 1, 2, 3, ..., N} and
the index set I = {0, 1, 2, 3, ..., N+1}. The density functions P{d(f,g)
= k} are given by the binomial expression
| kp1'1(k)
= p2'2(k) ≔ P{d(f2',f2)
= k} = |
( |
|
) |
ψk(1 − ψ)N-k
≕ p(k) |
|
(5.38)
|
|
| kp1'2(k)
= p1'2(k) ≔ P{d(f1',f2)
= k} = |
( |
|
) |
ψk+1(1 − ψ)N-k-1
+ ( |
( |
|
) |
+ |
( |
|
) |
)ψk-1(1 − ψ)N-k+1
≕ q(k) |
|
(5.39)
|
|
We then get
| p1'1(k) ≔ P{d(f1',f1)
= k} = p(k) |
| p1'2(k) ≔ P{d(f1',f2)
= k} = q(k) |
| p2'1(k) ≔ P{d(f2',f1)
= k} = q(k) |
| p2'2(k) ≔ P{d(f2',f2)
= k} = p(k) |
|
(5.40)
|
|
| pG(k) = p(k) |
| pI(k) = q(k) |
|
(5.41)
|
|
| kFNMR(n)
= 1 − |
|
n
|
|
|
k=1
|
|
p(k) |
|
(5.42)
|
|
| kFMR(n)
= |
|
n
|
|
|
k=1
|
|
q(k) |
|
(5.43)
|
|
|
|
|
Fig. 5.20: Discrete
FNMRand
FMR
versus threshold index
|
|
|
|
|
Fig. 5.21: Detection
Error Tradeoff curve with discrete index
|
|
|
|
|
Fig. 5.22: Discrete
FNMR
and
FMR versus threshold index (hat & jacket)
|
|
|
|
|
Fig. 5.23: Detection
Error Tradeoff curve with discrete index
|
|
|
Feature Components
|
ψ
|
EER
|
MMOE
|
Deta
|
|
1
|
0.3
|
0.3
|
0.3
|
0.3
|
|
2
|
0.3
|
(0.384)
|
0.36
|
0.342
|
|
3
|
0.3
|
(0.40984615)
|
0.402
|
0.36636
|
|
4
|
0.3
|
(0.414975)
|
0.4118
|
0.382341
|
|
5
|
0.3
|
(0.42238435)
|
0.41768
|
0.393758532
|
|
6
|
0.3
|
(0.43264246)
|
0.42797
|
0.402413305
|
|
7
|
0.3
|
(0.43630738)
|
0.435173
|
0.409257623
|
In contrast to Example
5.6 where we consider two features with maximum distance, this case
shows the effect that the recognition performance decreases with the number
of feature components. |
5.3
Multiple sampling during recognition
The same procedure which is common for enrolment
can also be taken to improve recognition performance. Typically, the capturing
process is repeated until recognition is successful or a limit is reached.
The limit is required to prevent brute force attacks by repeatedly trying
new characteristics. In the ideal case (the characteristic has not changed
since enrolment and the trials are stochastically independent), FNMR will
significantly be improved. At the same time FMR may increase, too. If the
genuine repeats a trial, statistics is only affected by the intra-characteristic
probability density. Since this density does not overlap too much with
other densities, FMR will only be slightly affected. However, multiple
recognition trials can also be used by impostors to try multiple characteristics.
For that reason, other methods may be preferable. For example, one may
try to build the mean value of all repetitions and then use it for comparison,
unless variation is too large. This is excessively used by variable characteristics
such as key stroke biometrics where it is essential to hit enough feature
components, and if possible, to decrease their variance by averaging [Bartmann].
Generally, multiple trials can be considered
as information fusion, using different samples of the same biometric characteristic.
This fusion can obviously be performed at different stages of the system
(Figure 2.1):
-
Feature layer: combining feature vectors
-
Distance layer: combining distances as the
direct result of comparisons
-
Decision layer: combining decision results
Depending on which method is used and on
the quality of enrolment, different improvements are expected. An improvement,
however is not self-evident: While FMR may decrease, FNMR may increase
and vice versa. This will be illustrated with several examples.
To simplify the calculation, stochastical
independence is expected. But it should be noted that most practical situations
do not imply independence. Evaluations may force special measures to keep
independence for the sake of comparability. This may, however, not necessarily
boost usability in practice!
5.3.1 Feature
layer fusion
Manipulations in the feature space mostly
encompass linear and non-linear averaging operations on the individual
feature components. This may require an extension of the feature space.
Example 5.8: Jacket with Euclidean metric
Instead of using the discrete metric as in
Example
4.10 we now try the Euclidean metric for determining the distance between
reference features and sample features.
As references we do not choose the symbols
J and NJ as in Example 4.10, but numbers such
as 1 for "jacket" and 0 for "no jacket". As we use the same stochastical
conditions as in Example 4.10, we are now able
to use a kind of optimal enrolment which corresponds to the mean value
of the features. The feature samples f1 and f2
can assume the values 0 and 1. Hence, we try
f1' ≔ 0.1 and f2'
≔ 0.8
as values for the references since these
are the probabilities for the individuals ⓘ1 and
ⓘ2 to wear a jacket. The feature space is then given
by F = {0, 0.1, 0.8, 1} and the space of enrolled features by F'
= {0.1, 0.8}.
D = {0, 0.1, 0.2, 0.8, 0.9}
I = {0, 1, 2, 3, 4}
Since the distance value 0 does only
appear in theory here, we can omit the virtual distance -ε and map 0 ∈
D to 0 ∈ I.
dij ≔ |fi
- fj| (Euclidean metric)
A general improvement by using the "optimized"
enrolment cannot be expected here. Indeed, the general situation does not
change since there are fixed relationships between the distances such that
the most probable feature meets the shortest distances. This should only
change the distances, not their probabilities. There is one difference,
however: The probabilities are spread over more different distance values!
This may slightly change our separability values.
| p1'1(0.1) = 0.9 |
| p1'1(0.9) = 0.1 |
|
| p1'2(0.1) = 0.2 |
| p1'2(0.9) = 0.8 |
|
| p2'1(0.2) = 0.1 |
| p2'1(0.8) = 0.9 |
|
| p2'2(0.2) = 0.8 |
| p2'2(0.8) = 0.2 |
|
| Distance |
Index |
pG |
FNMR |
pI |
FMR |
MOE |
| 0 |
0 |
0 |
1 |
0 |
0 |
0.5 |
| 0.1 |
1 |
0.45 |
0.55 |
0.1 |
0.1 |
0.325
| |