On the reconstruction of biometric raw data from template data
http://www.bromba.com/contacte.htm
2006-12-23
Summary

Privacy activists concerned about the protection of stored or transmitted biometric data are often reassured by the statement that biometric raw data cannot be reconstructed from stored biometric templates. This paper shows that a more differentiated consideration is necessary. In particular, it shows that there is strong evidence that raw data can be reconstructed from template data, at least partially. Furthermore, misuse of templates does not necessarily require a reconstruction of raw data.

Introduction

It is often argued that privacy is guaranteed, or at least improved, when biometric raw data cannot be reconstructed from the biometric templates which are stored and transmitted for the purpose of biometric authentication.

In this paper I will pose two questions:
Definitions

Biometric authentication system

To follow the discussion below, it helps to define the biometric authentication system. A biometric authentication system mainly comprises the following functional units:
All units may be placed at different locations. The matcher output (the "score" value) is used to decide whether the actual template matches a reference template.

Raw data

By raw data we mean the unmodified output of the sensor device. This can be the image of a fingerprint, a face, or an iris, or a sound from a microphone. Some preprocessing is allowed in the definition, provided that neither information nor redundancy is added or dropped.

Template data

By template data we mean the data which are compared in the matcher unit. Normally, templates contain only the information necessary for comparison. However, what is necessary for comparison is not fixed.
What kind of information is available in the raw data?

Regarding privacy, the kind of information determines the potential for misuse. Various classification schemes are possible. Let us try the following one (A):

A1: Information usable for authentication
A2: Information not usable for authentication

Another one (B) is given by the origin of biometric features:

B1: Genotypic information
B2: Randotypic information
B3: Behavioral information
B4: Unchanging marks

Genotypic information is completely determined by genetics. Randotypic information is completely random, and behavioral information is completely determined by training. Unchanging marks may be scars, tattoos, or chronic diseases. Naturally, biometric features are always composed of B1, B2, B3, and B4, with different weighting depending on the type of feature.

While the first classification scheme (A)
by definition covers the whole information content of the raw data, say
Can biometric raw data be reconstructed from template data?

Case 1. From the definition of template data, a limiting case can be constructed in which template data and raw data are equal. In this case the answer is trivial:

If template data equals raw data, reconstruction is trivial.

In this case the information content is the same. This case is not purely theoretical, although such systems are rather rare.

Case 2. Suppose the raw data only
contain information usable for authentication, i.e., This question is left open, but my conjecture is that raw data will always be reconstructible from template data if

In practice, the property of non-invertibility may be reduced to "computationally very difficult to reverse".

Case 3. Under the assumption that
the raw data contain information not usable for authentication, we have
Information removed by feature extraction cannot be reconstructed from template data. In this case, missing information can only be guessed.

The chance of guessing the lost information correctly depends on the number of discrete possibilities.

Case 3 includes the following very realistic situation: Let us assume that the matcher uses less information than is available for authentication. That is, the feature extraction may remove not only A2 but also parts of A1 (those parts which the matcher will not utilize). For example, a fingerprint matcher may use only minutiae information; information about ridges, pores, etc. is not processed.

My conjecture: If the template contains less information for authentication than is available

Standardization of Templates. To make biometric systems interoperable, it may be necessary to have a template standard which allows templates to be exchanged between different systems without losing much performance. Examples are templates to be processed in international ID cards. Assuming the same raw data, the simplest way to standardize templates is to take the feature extraction unit and algorithms from a proven system and fix them. Obviously, this will be the end of further improvements in biometric performance (or the end of the standard, if future improvement is large enough). A more flexible way is to define the template for a set of test raw data which forms a complete basis covering most real raw data. For the first method, which fixes the algorithms, the question of non-reconstructibility can easily be answered by investigating the algorithm. In the second case, with some a priori knowledge, reconstruction could be possible even without knowing the algorithm.

How to reconstruct raw data from templates.
Let us suppose the biometric authentication system

Now we have to discuss what kind of "raw data" the hill-climbing attack delivers. From the discussion above we know that information which has been removed from the template cannot be reconstructed. That is, if any information has been removed, the final iteration of the raw data will be ambiguous although the resulting templates are similar! (In particular, the final iteration result may depend on the initial guess.) Again, if the templates are defined such that they equal the raw data, the result should normally be unique. For this reconstruction method it is not necessary to know how the algorithm works! Furthermore, if the template format is standardized, any biometric system may be used to try to reconstruct the raw image via a hill-climbing attack. Superficially, this may be considered a drawback of standardization.

More information about reconstruction methods and their realization can be found in the thesis of Hill and a paper by Adler. Hill describes and carries out a way to reconstruct images from fingerprint minutiae templates. Adler found an efficient method to reconstruct face images. His method needs only several thousand iterations to generate an image which can be confused with the original image when using the associated algorithm.

Conclusion. It is indeed possible to reconstruct at least those parts of the raw data information which are used for authentication. The better this information (A1) is utilized, the better the reconstruction will work. Only information which is not usable for authentication, and thus is removed from the template, is not reconstructible. Insofar as information unsuitable for authentication is critical for privacy (e.g., acute disease information), this seems not to be a problem for a well-designed system. However, the question remains whether the information used for authentication, which is reconstructible, is critical with respect to privacy.
This cannot be denied for biometric features which are mainly genotypic or behavioral. With respect to privacy, genotypic information seems to be most critical since it might reveal relationships to other persons, race, or even potential disease.

Does non-reconstructibility help privacy?

Privacy is strongly related to security. In many applications, biometric authentication is used to enhance privacy by preventing unauthorized access to personal data. On the other hand, biometric systems make use of personal data to secure values such as (other) personal data. In security applications it is essential to know the potential damage to the value to be secured and the ways in which this damage could occur (commonly known as attacks). However, since not all kinds of attacks are known in advance, it may be more realistic, at least for privacy, to allow something like a right to self-determination. Nobody should have to give reasons for refusing the use of his personal data. Here we will focus on the question of whether non-reconstructibility can prevent misuse. I will now show that some kinds of misuse cannot be prevented (provided that an attacker has access to a template)!

Let us assume that only a partial reconstruction is possible. For example, suppose that only the positions of the minutiae from a raw fingerprint image are stored in the reference template. With the knowledge of this (in future standardized!) data, it is possible to create many trial fingerprint images with different ridge structures, but all with the same minutiae positions. In this way, the raw data cannot be reconstructed uniquely from the template. However, any of these trial fingerprints, when processed by the feature extraction, delivers exactly the same template. That is, if I have the template data, I am able to fool at least the system from which the original template comes. And this is at least a security problem!
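The ambiguity described above, and the dependence of a hill-climbing result on the initial guess, can be illustrated with a toy model. The following Python sketch is not taken from Hill or Adler and models no real biometric system. The names `ToySystem`, `extract`, and `hill_climb`, the eight-value "raw data", and the squared-distance score are all assumptions chosen for illustration. The feature extraction keeps the first four values (standing in for A1) and drops the rest (standing in for A2). The attacker sees only the matcher score.

```python
import random

def extract(raw):
    """Toy feature extraction (assumption): keep 4 values, drop the rest."""
    return raw[:4]

class ToySystem:
    """Hypothetical system that exposes only a match score to the caller."""

    def __init__(self, enrolled_raw):
        self._reference = extract(enrolled_raw)  # stored reference template

    def match(self, raw):
        # Score: 0.0 for identical templates, more negative for worse matches.
        template = extract(raw)
        return -sum((a - b) ** 2 for a, b in zip(template, self._reference))

def hill_climb(system, seed, steps=20000, step_size=0.05):
    """Generic hill climbing: perturb one value, keep it if the score rises."""
    rng = random.Random(seed)
    guess = [rng.random() for _ in range(8)]  # arbitrary initial guess
    best = system.match(guess)
    for _ in range(steps):
        trial = list(guess)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step_size, step_size)
        score = system.match(trial)
        if score > best:  # strict improvement only
            guess, best = trial, score
    return guess, best

original = [0.1, 0.9, 0.4, 0.7, 0.3, 0.8, 0.2, 0.6]
system = ToySystem(original)

a, score_a = hill_climb(system, seed=1)
b, score_b = hill_climb(system, seed=2)

print("original:", original)
print("guess a :", [round(x, 3) for x in a], "score", score_a)
print("guess b :", [round(x, 3) for x in b], "score", score_b)
```

In this sketch both runs recover the four values that enter the template, while the four dropped values remain at whatever the initial guess happened to be. Changing a dropped value never changes the score, so the climb has no way to correct it. The two results therefore differ as "raw data" but produce practically the same template.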
Similarly, if the minutiae locations are available, it should be possible without difficulty to identify the template under consideration against a large template database, e.g., for the purpose of law enforcement, using only a trial raw data set. This trial raw data set need only share with the unknown original raw data set that information which the system uses for identification. The rest is irrelevant!

Even if the raw data were not reconstructible, there is no guarantee that template data cannot be misused.

Impact from publicity of biometric features. Biometric features are more (= open) or less (= covered) public. Open features such as face, gait, or voice are easier to capture without the knowledge or cooperation of the owner (= covert), whereas covered features such as fingerprint, iris, or retina require interaction by the owner (= overt). If one regards coveredness as an advantage with respect to security and privacy and wants to retain this advantage, non-reconstructibility becomes more essential for covered than for open features.

Biometric data and hash operations

It is well known from password processing that it is more secure to store not the password itself but the "hashed" value of the password. Hashing is an operation which may be described by the following properties:
The experience with passwords suggests storing only the hashed biometric template data, making a reconstruction of the original template data, and thus of the raw data, extremely difficult. Unfortunately, this method only makes sense in those trivial cases where the matcher is a simple comparator based on mathematical subtraction. But even if this should be the case, template data normally show variations due to unavoidable variations in the raw data. As a result, the hash values of biometric data will never be the same, even if the raw data come from multiple samples of the same biometric feature. That is, hashing cannot be used for biometric templates. It is really one-way, unfortunately not only for attackers. (If two raw data sets are really equal, it must be assumed that one is a digital copy of the other, e.g., originating from a replay attack. In biometric systems such equalities should trigger an alarm rather than enable a successful authentication!)

Nevertheless, researchers are investigating alternative authentication methods to overcome the problem. Several techniques have been proposed or are under consideration which are able to virtually reduce the matching process to a simple subtraction, i.e., the reference template as well as the request (sample) template can be represented as simple numbers or vectors which can be hashed. Today, all these methods still suffer from restricted biometric performance. It is still open whether they will ever reach the performance of classical realizations.

Conclusion

The global statement that biometric raw data cannot be reconstructed from template data is a very weak statement with respect to privacy and security for at least three reasons:
For real life, however, it is less important whether privacy can be infringed theoretically (or not). What matters is the effort necessary to achieve this and the possibilities for a concerned person to prevent it. From my point of view, the reconstructibility of raw data from template data has only a small impact on privacy compared to other flaws. It remains an academic discussion unless concrete scenarios are addressed.

Acknowledgment

I would like to thank Paul Reid, James Wayman, and James Reisman for valuable suggestions!

Publications

Soutar, C.: "Biometric System Security", in: Secure No. 5, 2002, pp. 46-49. Direct download: http://www.silicontrust.com/pdf/secure_5/46_techno_4.pdf (Site has gone)

Hill, C.J.: "Risk of Masquerade Arising from the Storage of Biometrics", B.S. Thesis, Australian National University, 2001. Download from page: http://chris.fornax.net/biometrics.html

Adler, A.: "Sample images can be independently restored from face recognition templates", University of Ottawa, 2003. Download from page: