On the reconstruction of biometric raw data from template data

Manfred Bromba
http://www.bromba.com/contacte.htm
2006-12-23
(first release: 2003-04-20)
Permanent address for citation: urn:nbn:de:0125-2008041402

Summary

Privacy activists concerned about the protection of stored or transmitted biometric data are often reassured by the statement that biometric raw data cannot be reconstructed from stored biometric templates. This paper shows that a more differentiated view is necessary. In particular, it shows that there is strong evidence that raw data can be reconstructed from template data, at least partially. Furthermore, misuse of templates does not necessarily require a reconstruction of raw data.

Introduction

It is often argued that privacy is guaranteed, or at least improved, when biometric raw data cannot be reconstructed from the biometric templates that are stored and transmitted for the purpose of biometric authentication.

In this paper I will pose two questions:

  • Can biometric raw data be reconstructed from template data?
  • Does non-reconstructibility help privacy?
The aim is to initiate a discussion on whether non-reconstructibility makes sense with respect to privacy. The requirement of non-reconstructibility may turn out to belong in the same category as the claim that biometric features cannot be faked.

Definitions

Biometric authentication system

To follow the discussion below, it helps to define the biometric authentication system first. A biometric authentication system mainly comprises the following functional units:
  • Sensor device for acquisition of biometric raw data
  • Feature extraction for template creation
  • Matcher to compare the current biometric template with the stored reference templates
  • Reference archive for storing the biometric reference templates
Sensor Device --(raw data)--> Feature Extraction --(template data)--> Matcher --> output
                                                                         |
                                                         Reference Archive (template data)

Fig. 1: Block diagram of biometric authentication system

All units may be placed at different locations. The matcher output (the "score" value) is used to decide whether the current template matches a reference template.

Raw data

By raw data we mean the unmodified output of the sensor device. This can be the image of a fingerprint, a face, or an iris, or a sound from a microphone. Some preprocessing is allowed in the definition, provided that no information or redundancy is added or dropped.

Template data

By template data we mean the data that are compared in the matcher unit. Normally, templates contain only the information necessary for comparison. However, what is necessary for comparison is not fixed.
    Example fingerprint: Three formats for fingerprint data are presently being discussed within ISO/IEC JTC1/SC37 for standardization: minutiae, pattern based, and image based. All three formats can be used as template formats, the last also as a raw data format. Appropriate matchers are available for all three formats:
    • "Minutiae matcher": Here the template should contain only minutiae information such as position, type, and angle.
    • "Pattern matcher": Here the template should contain only pattern information (ridge structure).
    • "Image matcher": Here the raw data are matched directly. Template and raw data may be identical.
    • Many matcher implementations combine the matchers above.
    Note that the amount and type of template information differ in each case!

What kind of information is available in the raw data?

Regarding privacy, the kind of information determines the potential for misuse. There are various classification schemes possible. Let us try the following one (A):
A1: Information usable for authentication
A2: Information not usable for authentication
Another one (B) is given by the origin of biometric features:
B1: Genotypic information
B2: Randotypic information (sometimes called phenotypic, but without genetic parts)
B3: Behavioral information
B4: Information about "unchanging marks"
Genotypic information is completely determined by genetics. Randotypic information is completely random, and behavioral information is completely determined by training. Unchanging marks may be scars, tattoos, or chronic disease. Naturally, biometric features are always composed of B1, B2, B3, and B4, with a weighting that depends on the type of feature.

While the first classification scheme (A) by definition covers the whole information content of the raw data, say I(RD), the second one (B) obviously does not. However, it is assumed that it completely covers the biometric part usable for authentication. If this is true, we have the following relations, where "+" denotes the union of two sets:

  • I(RD) = A1 + A2
  • A1 is a subset of B1 + B2 + B3 + B4
Remember that permanence is one of the basic requirements for a useful biometric feature. So, if the raw data comprise information about acute disease, this information will be a subset of A2 by definition. Chronic disease information, if available in the raw data, may be a subset of A1.

Can biometric raw data be reconstructed from template data?

Case 1. From the definition of template data a limit case may be constructed for which template data and raw data are equal. In this case the answer is trivial:
If template data equals raw data, reconstruction is trivial. In this case the information content is the same: I(RD) = I(T).
This case is not purely theoretical, although such systems are rather rare.

Case 2. Suppose the raw data contain only information usable for authentication, i.e., I(RD) = A1. In this case the feature extraction need not extract anything: raw data may equal template data, and case 1 applies. Now the question arises whether any transformation could take the place of the feature extraction that is mathematically not invertible but neither reduces information nor degrades matcher performance.

This question is left open, but my conjecture is that raw data will always be reconstructible from template data if I(RD) = A1!
In practice, the property of non-invertibility may be reduced to "computationally very difficult to reverse".

Case 3. Under the assumption that the raw data contain information not usable for authentication, we have I(RD) = A1 + A2 with A2 non-empty. An ideally working feature extraction unit will completely remove A2 without removing useful information, i.e., I(T) = A1. A2 contains information not necessary for matching, e.g., information caused by or, in rare cases, even about acute disease.

Information removed by feature extraction cannot be reconstructed from template data. In this case, missing information can only be guessed.
The chance for guessing the lost information depends on the number of discrete possibilities.
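As a rough illustration of this dependence, the following Python sketch assumes that the feature extraction discards, for every sample, one of a fixed number of equally likely values. The uniform distribution and the helper name guess_probability are assumptions made for this example only; real raw data are far from uniformly distributed.

```python
from fractions import Fraction


def guess_probability(levels_per_sample: int, samples: int) -> Fraction:
    """Chance that one uniform random guess restores the removed information.

    Assumes (hypothetically) that each sample lost one of `levels_per_sample`
    equally likely values and that the samples are independent.
    """
    return Fraction(1, levels_per_sample ** samples)


# Example: 3 bits (8 levels) dropped from each of 16 samples.
p = guess_probability(levels_per_sample=8, samples=16)
print(p)
```

Under these assumptions, every additional sample multiplies the number of discrete possibilities, so the chance of a correct guess shrinks exponentially with the amount of removed information.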
Case 3 includes the following very realistic situation: Let us assume that the matcher uses less information than is available for authentication. That is, the feature extraction may remove not only A2 but also parts of A1 (those parts the matcher will not use). For example, a fingerprint matcher may use only minutiae information; information about ridges, pores, etc. is not processed.
My conjecture: If the template contains less information for authentication than is available (I(T) < A1), biometric performance is worse than necessary. As a result, the probability (~ False Acceptance Rate) that two templates from different fingers become indistinguishable will increase. Conversely, it will become more difficult to reconstruct the complete A1 by chance.
Standardization of Templates. To make biometric systems interoperable, a template standard may be necessary so that templates can be exchanged between different systems without losing much performance. One example is templates to be processed in international ID cards.

Assuming the same raw data, the simplest way to standardize templates is to take the feature extraction unit/algorithms from a proven system and freeze them. Obviously, this will be the end of further improvements in biometric performance (or the end of the standard, if future improvement is large enough). A more flexible way is to define the template for a set of test raw data that form a complete basis covering most real raw data.

With the first method, which fixes the algorithms, the question of non-reconstructibility can easily be answered by investigating the algorithm. In the second case, with some a priori knowledge, reconstruction could be possible even without knowing the algorithm.

How to reconstruct raw data from templates. Let us suppose the biometric authentication system (Fig. 1) is available and the template to be reconstructed is stored in the reference archive. Furthermore, the input of the feature extraction unit is accessible, and the output value (= score) of the matcher is available with sufficient resolution / granularity. We can then use the hill-climbing attack to reconstruct the raw data iteratively. We start with an initial guess for the raw image. An authentication is attempted and the score value at the matcher's output is observed. Now the first raw data set is slightly modified and presented for authentication again. If the new score value indicates an improved similarity between the iterated raw data / template and the reference template, the modification was successful and is extended; otherwise another modification is tried. This process has to be continued (perhaps several million times) until the score value is sufficient for acceptance. A more detailed description of hill-climbing attacks has been given by Soutar.

Now we have to discuss what kind of "raw data" the hill-climbing attack delivers. From the discussion above we know that information that has been removed from the template cannot be reconstructed. That is, if any information has been removed, the final iteration of the raw data will be ambiguous although the resulting templates are similar! (In particular, the final iteration result may depend on the initial guess.) Conversely, if the templates are defined such that they equal the raw data, the result should normally be unique.
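The hill-climbing procedure and the ambiguity of its result can be sketched in a few lines of Python. Everything in this sketch is a toy assumption chosen for brevity: raw data are 16 samples in the range 0..255, the feature extraction (extract_template) simply sums neighbouring pairs of samples, and the matcher score is the negative template distance. The attacker only calls the black-box score function and never looks inside the feature extraction.

```python
import random


def extract_template(raw):
    """Toy feature extraction (an assumption): sum each pair of neighbours."""
    return [raw[i] + raw[i + 1] for i in range(0, len(raw), 2)]


def make_matcher(reference_template):
    """Return a black-box score function; 0 is a perfect match, lower is worse."""
    def score(raw):
        candidate = extract_template(raw)
        return -sum(abs(c - r) for c, r in zip(candidate, reference_template))
    return score


def hill_climb(score, length, rng, max_iters=200_000):
    """Change one sample at a time and keep only changes that raise the score."""
    guess = [128] * length              # initial guess for the raw data
    best = score(guess)
    for _ in range(max_iters):
        if best == 0:                   # score is sufficient for acceptance
            break
        i = rng.randrange(length)
        old = guess[i]
        guess[i] = min(255, max(0, old + rng.choice((-1, 1))))
        new = score(guess)
        if new > best:
            best = new                  # keep the successful modification
        else:
            guess[i] = old              # undo it and try another one
    return guess, best


rng = random.Random(0)
original = [rng.randrange(256) for _ in range(16)]
score = make_matcher(extract_template(original))
reconstructed, final_score = hill_climb(score, len(original), rng)
print(final_score, reconstructed)
```

In this toy setting the reconstructed samples yield the same template as the original, but they will in general differ from the original samples, because the pairwise sums discard how each sum was split between its two samples.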

For this reconstruction method it is not necessary to know how the algorithm works! Furthermore, if the template format is standardized, any biometric system may be used to try to reconstruct the raw image via hill-climbing attack. Superficially, this may be considered a drawback of standardization.

More information about reconstruction methods and their realization can be found in the thesis of Hill and a paper by Adler. Hill describes and carries out a way to reconstruct images from fingerprint minutiae templates. Adler found an efficient method to reconstruct face images. His method needs only several thousand iterations to generate an image which can be confused with the original image when using the associated algorithm.

Conclusion. It is indeed possible to reconstruct at least that part of the raw data information which is used for authentication. The better this information (A1) is utilized, the better the reconstruction will work. Only information which is not usable for authentication, and is thus removed from the template, is not reconstructible. Insofar as information unsuitable for authentication is critical for privacy (e.g., acute disease information), this does not seem to be a problem for a well-designed system. However, the question remains whether the information used for authentication, which is reconstructible, is critical with respect to privacy. This cannot be denied for biometric features which are mainly genotypic or behavioral. With respect to privacy, genotypic information seems to be the most critical since it might reveal relationships to other persons, race, or even potential disease.

Does non-reconstructibility help privacy?

Privacy is strongly related to security. In many applications, biometric authentication is used to enhance privacy by preventing unauthorized access to personal data. On the other hand, biometric systems make use of personal data to secure values such as (other) personal data. In security applications it is essential to know the potential damage that could be done to the value to be secured and the ways in which this could happen (commonly known as attacks). However, since not all kinds of attacks are known in advance, it may be more realistic, at least for privacy, to allow something like a right to self-determination: nobody has to give reasons for refusing the use of his personal data. Here we will focus on the question of whether non-reconstructibility can prevent misuse. I will now show that some kinds of misuse cannot be prevented (provided that an attacker has access to a template)!

Let us assume that only a partial reconstruction is possible. For example, suppose that only the positions of the minutiae from a raw fingerprint image are stored in the reference template. With the knowledge of this (in future standardized!) data, it is possible to create many fingerprint trial images with different ridge structures, but all with the same minutiae positions. Thus the raw data cannot be reconstructed uniquely from the template. However, any of these trial fingerprints, when processed by the feature extraction, delivers exactly the same template. That is, if I have the template data, I am able to fool at least the system from which the original template originates. And this is at least a security problem! Similarly, if the minutiae locations are available, it should be possible without any problems to identify the template under consideration against a large template database, e.g., for the purpose of law enforcement, using only a trial raw data set. This trial raw data set need only share with the unknown original raw data set the information that the system will use for identification. The rest will be irrelevant!

Even if the raw data were not reconstructible, there is no guarantee that template data cannot be misused.
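The construction of trial raw data from a known template can be sketched as follows. The sketch assumes a deliberately simplified model that is not taken from any real fingerprint system: an image is a small grid of integers, the hypothetical value MINUTIA marks a minutia, all other values stand for ridge texture, and the template consists only of the minutiae positions.

```python
import random

MINUTIA = 9  # hypothetical marker value; 0..8 stand for ridge texture


def extract_template(image):
    """Toy feature extraction: keep only the minutiae positions."""
    return sorted(
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value == MINUTIA
    )


def synthesize_trial(template, size, rng):
    """Build a trial image with random ridge texture and the given minutiae."""
    image = [[rng.randrange(MINUTIA) for _ in range(size)] for _ in range(size)]
    for r, c in template:
        image[r][c] = MINUTIA
    return image


rng = random.Random(1)
stolen_template = [(1, 4), (3, 2), (6, 6)]
trials = [synthesize_trial(stolen_template, 8, rng) for _ in range(5)]
for trial in trials:
    print(extract_template(trial))
```

Each trial image carries a different random texture, yet the toy feature extraction maps all of them to the same template, which is all a matcher of this kind would ever see.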
Impact of the publicity of biometric features. Biometric features are more (= open) or less (= covered) public. Open features such as face, gait, or voice are easier to capture without the knowledge or activity of the owner (= covert), whereas covered features such as fingerprint, iris, or retina need the interaction of the owner (= overt). If one considers coveredness an advantage with respect to security and privacy and wants to retain this advantage, non-reconstructibility becomes more essential for covered than for open features.

Biometric data and hash operations

It is well known from the processing of passwords, that it is more advantageous with respect to security to store not the password itself but the "hashed" value of the password. Hashing is an operation which may be described by the following properties:
  1. Hashing the same password delivers the same results ("hash values")
  2. Hash values have a fixed number of digits
  3. Hashing even slightly different passwords delivers totally different hash values
  4. The hash operation is extremely difficult to reverse, i.e., it is nearly impossible to reconstruct the password from its hash value
To compare the entered password with the stored reference password, it is sufficient to compare the hash value of the entered password with the stored hash value of the reference password. If these two are equal, the passwords must be equal, too. (If the password is longer than the hash value, property 3 can no longer hold strictly. However, cases where different inputs deliver the same hash value are so rare that they can be neglected, provided that the hash value is long enough.) If we assume that the specific hash algorithm behaves as desired, the only ways to compromise the password are to tap the entered password while observing the result of the comparison or to perform a brute-force attack by trying all possibilities until the result equals the stored hash value of the reference password. (Such an attack may take millions of years, depending on the specific hash algorithm and the available processing power!)
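The comparison of hash values described above can be sketched with Python's standard hashlib module. SHA-256 is used here only as a convenient example of a hash function, and the function names are chosen for this illustration; practical password storage would normally add a per-user salt and a deliberately slow key derivation function, which this sketch omits.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """Return the fixed-length SHA-256 hash value of a password as hex text."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def verify(entered: str, stored_hash: str) -> bool:
    """Compare the hash of the entered password with the stored hash value."""
    return hmac.compare_digest(hash_password(entered), stored_hash)


# Only the hash value is stored, never the password itself.
stored_hash = hash_password("correct horse")

print(verify("correct horse", stored_hash))   # same password
print(verify("correct horsf", stored_hash))   # one character changed
```

The stored value always has the same length regardless of the password length, and the reference password never has to be kept in readable form.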

The experience with passwords suggests storing only the hashed biometric template data, making a reconstruction of the original template data, and thus of the raw data, extremely difficult. Unfortunately, this method only makes sense in those trivial cases where the matcher is a simple comparator based on mathematical subtraction. But even if this should be the case, template data normally show variations due to unavoidable variations in the raw data. As a result, the hash values of biometric data will never be the same, even if the raw data come from multiple samples of the same biometric feature. That is, hashing cannot be used for biometric templates. It is really one-way, unfortunately not only for attackers. (If two raw data sets are really equal, it must be assumed that one template is the digital copy of the other, e.g., originating from a replay attack. In biometric systems such equalities should be used to trigger an alarm rather than to enable a successful authentication!)
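A small Python sketch may make the problem concrete. It assumes a made-up template of six byte values, a distance matcher with an arbitrary threshold, and SHA-256 as the hash function; none of these choices comes from a real biometric system.

```python
import hashlib


def hash_template(template):
    """Hash a template given as a list of byte values (0..255)."""
    return hashlib.sha256(bytes(template)).hexdigest()


def distance_match(sample, reference, threshold=4):
    """Classical matcher: accept if the summed absolute difference is small."""
    return sum(abs(s - r) for s, r in zip(sample, reference)) <= threshold


reference = [12, 200, 37, 90, 155, 64]
sample = [12, 201, 37, 89, 155, 64]   # same finger, slightly different capture

print(distance_match(sample, reference))                   # tolerant comparison
print(hash_template(sample) == hash_template(reference))   # exact comparison
```

The tolerant matcher accepts the second capture, whereas the two hash values have nothing in common, so a comparison of hash values would reject the legitimate user.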

Nevertheless, researchers are investigating alternative authentication methods to overcome this problem. Several techniques have been proposed or are under consideration which effectively reduce the matching process to a simple subtraction, i.e., the reference template as well as the request (sample) template can be represented as simple numbers or vectors which can be hashed. Today, all these methods still suffer from restricted biometric performance. It is still open whether they will ever reach the performance of classical realizations.

Conclusion

The global statement that biometric raw data cannot be reconstructed from template data is a very weak statement with respect to privacy and security for at least three reasons:
  • There are cases where raw data are very similar to template data by definition and therefore can hardly be distinguished.
  • Often the reconstruction is possible to a degree which is sufficient for misuse.
  • Even if reconstruction should not be possible in specific cases, misuse of templates remains possible.
Only the removal of sensitive information which is not usable for authentication can be properly substantiated.

In real life, however, it is less important whether privacy can theoretically be infringed. What matters is the effort necessary to achieve this and the possibilities a concerned person has to prevent it. From my point of view, the reconstructibility of raw data from template data has only a small impact on privacy compared to other flaws. It remains an academic discussion unless concrete scenarios are addressed.

Acknowledgment

I would like to thank Paul Reid, James Wayman, and James Reisman for valuable suggestions!

Publications

Soutar, C.: "Biometric System Security", in: Secure No. 5, 2002, pp. 46-49. Direct download: http://www.silicontrust.com/pdf/secure_5/46_techno_4.pdf (site no longer available)

Hill, C.J.: "Risk of Masquerade Arising from the Storage of Biometrics", B.S. thesis, Australian National University, 2001. Download from page: http://chris.fornax.net/biometrics.html

Adler, A.: "Sample images can be independently restored from face recognition templates", University of Ottawa, 2003. Download from page: http://www.sce.carleton.ca/faculty/adler/publications/publications.html



Download former releases:
2003-04-23 2003-06-02 2003-07-09