How vulnerable are biometric Big Data systems: causes of errors and metrics for measuring them
In the previous article, we discussed the largest data leaks from biometric Big Data systems. Today we look at the vulnerabilities specific to biometrics: the natural limitations of machine learning (ML) methods for identifying people, and targeted attacks.
2 main vulnerabilities of biometric Big Data systems based on Machine Learning
First of all, biometric systems share the risk factors of any Big Data project. Data leaks are mainly caused by people (external hackers or internal users), infrastructure problems, and vulnerabilities in software or third-party services. Beyond these, however, biometrics has specific problems that stem directly from the ML-based person recognition algorithms themselves, which is why they are called the natural limitations of biometric identification methods. They show up in the confusion matrix as errors of the first and second kind (Type I and Type II errors):
• false match, when an intruder deceives the ML recognition algorithm by posing as another user: a False Positive (FP), or error of the first kind;
• false non-match and denial of service, when the ML model fails to recognize a legitimate user because it finds no suitable digital template in the database for the submitted biometric personal data (BPD): a False Negative (FN), or error of the second kind.
An error of the second kind is mainly related to the quality of the recognition algorithms and/or the input data. Errors of the first kind, as a rule, result from a spoofing attack, in which the biometric feature used by the ML algorithms is falsified: for example, with an artificial finger bearing the required fingerprints, a three-dimensional face mask, or even a real body part cut from a legitimate user. This is what happened in Malaysia in 2005 to the owner of a premium car, whom criminals maimed while stealing his car. However, cybercriminals also succeed with less traumatic ways of making fake biometric carriers. In particular, hackers imitate fingerprints using silicone films, graphite powder and superglue, and imitate faces using photos, plaster copies of the head and masks. Such methods can deceive the simple biometric identification systems found in smartphones, which have relatively unsophisticated ML algorithms and/or not the most sensitive sensors.
In fact, both types of error are highly undesirable, because they entail either unlawful actions with information (in the case of FP) or user dissatisfaction (in the case of FN), which leads to reputational losses and increases the likelihood of customer churn.
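To make the two error types concrete, here is a minimal Python sketch that tallies a confusion matrix from a log of recognition attempts. The function names and the eight-attempt log are made up for illustration; a real system would take these labels from its own audit data.

```python
def confusion_counts(is_genuine, accepted):
    """Tally TP/FP/FN/TN from two parallel lists of booleans.

    is_genuine[i] -- True if attempt i came from the legitimate user
    accepted[i]   -- True if the system accepted attempt i
    """
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for genuine, ok in zip(is_genuine, accepted):
        if genuine and ok:
            counts["TP"] += 1   # legitimate user recognized
        elif genuine and not ok:
            counts["FN"] += 1   # legitimate user rejected (second kind)
        elif not genuine and ok:
            counts["FP"] += 1   # intruder accepted (first kind)
        else:
            counts["TN"] += 1   # intruder rejected
    return counts


def error_rates(counts):
    """Return (false positive rate, false negative rate)."""
    impostor_attempts = counts["FP"] + counts["TN"]
    genuine_attempts = counts["FN"] + counts["TP"]
    fp_rate = counts["FP"] / impostor_attempts if impostor_attempts else 0.0
    fn_rate = counts["FN"] / genuine_attempts if genuine_attempts else 0.0
    return fp_rate, fn_rate


# Hypothetical log of eight attempts, for illustration only.
is_genuine = [True, True, True, False, False, False, False, True]
accepted = [True, False, True, False, True, False, False, True]

counts = confusion_counts(is_genuine, accepted)
print(counts)
print(error_rates(counts))
```

In this toy log, one intruder gets in and one legitimate user is turned away, so both error rates come out at one attempt in four.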
Which biometrics are most effective: analysis of metrics and factors
Machine Learning-based biometric systems do not work by exactly matching a previously stored template. Typically, the comparison algorithm decides whether the data match based on how close the presented sample is to the template. The two key metrics are FAR (False Acceptance Rate, the share of false matches) and FRR (False Rejection Rate, the share of false non-matches). Developers of an ML recognition model therefore look for a balance between FAR and FRR by varying the proximity threshold. For example, lowering the threshold produces fewer false non-matches but more false matches; a high threshold reduces FAR but increases FRR. This balance is characterized by the EER (Equal Error Rate), the point at which false acceptances and false rejections occur with equal probability. Systems with a low EER are considered more accurate. It is also worth noting the upward trend in the sensitivity of biometric devices, which reduces FAR but increases FRR.
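The threshold trade-off can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the similarity scores are invented, a sample is accepted when its score is at or above the threshold, and the EER is approximated by scanning a grid of thresholds for the point where FAR and FRR are closest. The function names are hypothetical.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR and FRR when a sample is accepted if score >= threshold."""
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


def approximate_eer(genuine_scores, impostor_scores, thresholds):
    """Return (threshold, EER estimate) where |FAR - FRR| is smallest."""
    best = None
    for t in thresholds:
        far, frr = far_frr(genuine_scores, impostor_scores, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, far, frr)
    _, t, far, frr = best
    return t, (far + frr) / 2


# Invented similarity scores in [0, 1], for illustration only.
genuine = [0.91, 0.85, 0.78, 0.66, 0.95, 0.72, 0.88, 0.59, 0.81, 0.93]
impostor = [0.12, 0.35, 0.48, 0.61, 0.22, 0.40, 0.55, 0.70, 0.30, 0.18]

print(far_frr(genuine, impostor, 0.5))  # low threshold: more false matches
print(far_frr(genuine, impostor, 0.8))  # high threshold: more false rejections

grid = [i / 100 for i in range(101)]
print(approximate_eer(genuine, impostor, grid))
```

On real data the two rates rarely cross exactly on a grid point, which is why the sketch reports the average of FAR and FRR at the closest threshold rather than claiming an exact EER.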
However, at the same FAR, the biometric system with the lower FRR is better. The FAR and FRR values determine how many users the system can serve effectively without annoying them with errors. This number is usually inversely proportional to the square root of the metric in question. For example, with a FAR of 0.01% and an acceptable error level of no more than one per day, the biometric system is advisable for companies with up to 100 employees. The Unified Biometric System introduced in Russia in 2018 implies a recognition accuracy of 1 in 10,000,000, i.e. one recognition error is possible per 10 million cases. The Unified Biometric System uses two biometric parameters to identify a person: three-dimensional face scanning and voice.
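The rule of thumb above can be written out as a short calculation. The sketch below assumes the simplest reading of it, namely that the user count is taken as 1 / sqrt(FAR) with FAR expressed as a fraction; the article states the rule without a derivation, and the function name is hypothetical.

```python
import math


def rule_of_thumb_users(far):
    """Approximate user count as 1 / sqrt(FAR), with FAR as a fraction.

    This is an assumed reading of the rule of thumb in the text,
    not a standardized formula.
    """
    if not 0 < far <= 1:
        raise ValueError("FAR must be a fraction in (0, 1]")
    return 1 / math.sqrt(far)


# FAR of 0.01% is 0.0001 as a fraction.
print(round(rule_of_thumb_users(0.0001)))
```

With a FAR of 0.01% this gives roughly 100 users, which matches the company size quoted in the example.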
It is worth remembering that recognition success, and therefore FAR, FRR and the other metrics used to evaluate a biometric system, depend on the nature and amount of data used. Of course, multifactor systems that combine several biometric parameters, for example palm vein patterns, iris features and gait, are more reliable. However, such an integrated approach increases the complexity and, consequently, the cost of implementation. In addition, when choosing biometric methods, the application context and operating conditions of the Big Data system should be taken into account. In the next article, we will talk about the most popular methods of biometric identification and touch on some “exotic” ways of identifying a person: by smell, heartbeat and internal vibrations.