Abstract: Biometrics is an emerging technology that is increasingly present in daily life. However, building biometric systems requires a large amount of data that may be difficult to collect. Collecting such sensitive data is also very time-consuming and is constrained by regulation such as the GDPR. In the case of keystroke dynamics, existing databases have fewer than 200 users. For these reasons, we aim to generate a synthetic keystroke dynamics dataset. This paper presents the generation of keystroke data from known users as a f…
“…This invited article supports and improves the results of the original "Analysis of Keystroke Dynamics For the Generation of Synthetic Datasets" [10].…”
Biometrics is an emerging technology that is increasingly present in daily life. However, building biometric systems requires a large amount of data that may be difficult to collect. Collecting such sensitive data is also very time-consuming and is constrained by regulation such as the GDPR in Europe. In the case of keystroke dynamics, most existing databases have fewer than 200 users. For these reasons, it is crucial for this biometric modality to be able to generate a large and realistic synthetic dataset of keystroke dynamics samples. In this paper we propose an original approach for generating synthetic keystroke data from samples of known users, as a first step towards the generation of synthetic datasets. Experimental results show that the proposed statistical model can generate realistic samples from existing datasets in the literature.
“…To our best knowledge, this is the first systematic attempt to compare several distributions for fitting keystroke dynamics timing profiles when the text is not short and fixed, as in a password or a passphrase. Attempting to overcome the limitations in existing datasets, Migdal and Rosenberger [11,12] have carried out a detailed comparison of almost twenty candidate distributions for the generation of synthetic datasets using statistical models; the Gumbel distribution provided the best overall fit. Our approach differs in the target tasks that were considered and the evaluation criteria; while theirs, using the GREYC dataset [25], represents short fixed texts like usernames and passwords that the user has typed repeatedly, ours is focused on free text composition and transcription tasks.…”
Section: Previous Studies (mentioning; confidence: 99%)
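The citation statement above reports that the Gumbel distribution gave the best overall fit for synthetic generation on short fixed texts. As a rough illustration only, the sketch below fits a Gumbel distribution to a handful of timings and draws synthetic values from it. The function names, the method-of-moments fit, and the sample hold times are assumptions made for this example; none of them are taken from the cited papers, whose actual procedure is not described here.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(samples):
    """Method-of-moments estimate of (location, scale) for a right-skewed Gumbel.

    Uses mean = loc + gamma * scale and variance = (pi * scale)^2 / 6.
    """
    mean = statistics.fmean(samples)
    std = statistics.stdev(samples)
    scale = std * math.sqrt(6) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale


def sample_gumbel(loc, scale, n, rng):
    """Draw n values by inverting the CDF F(x) = exp(-exp(-(x - loc) / scale))."""
    values = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() never returns 1.0
            u = rng.random()
        values.append(loc - scale * math.log(-math.log(u)))
    return values


# Hypothetical hold times in milliseconds for one key of one user.
observed = [92.0, 105.0, 88.0, 131.0, 97.0, 110.0, 101.0, 150.0, 95.0, 99.0]
loc, scale = fit_gumbel_moments(observed)
synthetic = sample_gumbel(loc, scale, 5000, random.Random(0))
```

A Gumbel variate is unbounded below, so a practical generator would probably need to reject or clip non-positive timings. That step is left out to keep the sketch short.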
“…Going beyond authentication, [28] and [29] employ the sigma-lognormal model of rapid human movements to detect the age group of users based on their interaction with a touch screen, while [30] leverages different distributions to discriminate a human user from a bot. No other systematic comparison of distributions for the task of fitting keystroke timings histograms was found other than the aforementioned [21], [22], and [11].…”
Section: Previous Studies (mentioning; confidence: 99%)
“…No claim is made about the shape of timing distributions generated by other types of writing tasks; in particular, password typing and short fixed texts were not considered. The reader interested in these cases is referred to [11,12].…”
Section: Limitations Of This Study (mentioning; confidence: 99%)
“…Considering that most distance metrics and classification methods are sensitive to discrepancies between the assumed model and the empirical data, it is puzzling that a systematic study of histogram shapes was not an early step in the discipline. Not long ago a systematic comparison of a large number of candidates has been carried out [11,12] but, unfortunately, it is restricted to fixed text.…”
Keystroke dynamics is a soft biometric trait. Although the shape of the timing distributions in keystroke dynamics profiles is central to accurately modeling a user's behavioral patterns, a common simplification has been to presuppose normality. Careful consideration of the individual shapes of the timing models could lead to improvements in the error rates of current methods or possibly inspire new ones. The main objective of this study is to compare several heavy-tailed, positively skewed candidate distributions and rank them according to their merit for fitting timing histograms in keystroke dynamics profiles. Results are summarized in three ways: counting how many times each candidate distribution provides the best fit and ranking the candidates in order of success, measuring average information content, and ranking candidate distributions according to the frequency of hypothesis rejection with an Anderson-Darling goodness-of-fit test. Seven distributions with two parameters and seven with three were evaluated against three publicly available free-text keystroke dynamics datasets. The results confirm that the log-normal distribution, in its two- and three-parameter variations and as established in the research community, is an excellent choice for modeling the shape of timing histograms in keystroke dynamics profiles. However, the log-logistic distribution emerges as a clear winner among all two- and three-parameter candidates, consistently surpassing the log-normal and all the rest under the three evaluation criteria for both hold and flight times.
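To make the idea of ranking candidate distributions concrete, here is a minimal sketch that scores two-parameter log-normal and log-logistic fits on the same timings by log-likelihood. It is a stand-in and not the study's procedure: the abstract's criteria are best-fit counts, average information content, and Anderson-Darling rejections, and its fitting method is not stated. The function names, the moment-based log-logistic fit on log-timings, and the sample flight times are all assumptions made for illustration.

```python
import math
import statistics


def fit_lognormal(samples):
    """Maximum-likelihood (mu, sigma) of ln(x) for a two-parameter log-normal."""
    logs = [math.log(x) for x in samples]
    return statistics.fmean(logs), statistics.pstdev(logs)


def lognormal_loglik(samples, mu, sigma):
    """Sum of log-densities under LogNormal(mu, sigma)."""
    total = 0.0
    for x in samples:
        z = (math.log(x) - mu) / sigma
        total += -math.log(x) - math.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z
    return total


def fit_loglogistic(samples):
    """Moment-based fit: ln(x) is logistic with mean m and variance (pi * s)^2 / 3.

    Returns (m, s); the usual log-logistic scale is exp(m) and its shape is 1 / s.
    This is a simple approximation, not a maximum-likelihood fit.
    """
    logs = [math.log(x) for x in samples]
    m = statistics.fmean(logs)
    s = statistics.pstdev(logs) * math.sqrt(3) / math.pi
    return m, s


def loglogistic_loglik(samples, m, s):
    """Sum of log-densities under a log-logistic with log-scale parameters (m, s)."""
    total = 0.0
    for x in samples:
        z = abs((math.log(x) - m) / s)  # the logistic density is symmetric in z
        total += -math.log(x) - math.log(s) - z - 2 * math.log1p(math.exp(-z))
    return total


# Hypothetical flight times in milliseconds.
timings = [120.0, 95.0, 210.0, 140.0, 88.0, 175.0, 132.0, 410.0, 101.0, 150.0]
mu, sigma = fit_lognormal(timings)
m, s = fit_loglogistic(timings)
scores = {
    "log-normal": lognormal_loglik(timings, mu, sigma),
    "log-logistic": loglogistic_loglik(timings, m, s),
}
best = max(scores, key=scores.get)  # higher log-likelihood means a better fit
```

Both candidates have two parameters, so ranking by log-likelihood gives the same order as ranking by AIC. Which candidate wins on a sample this small says nothing about the study's findings.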
The exponential growth in the use of smartphones means that users must constantly be concerned about the security and privacy of mobile data, because the loss of a mobile device could compromise personal information. To address this issue, continuous authentication systems have been proposed, in which users are monitored transparently after initial access to the smartphone. In this study, the authors address the problem of user authentication by considering human activities as behavioural biometric information. The authors convert the behavioural biometric data (considered as time series) into a 2D colour image. This transformation preserves all the characteristics of the behavioural signal: the time series is not filtered, and the method is reversible. The signal-to-image transformation allows the authors to use 2D convolutional networks to build efficient deep feature vectors, which they then compare with the reference template vectors to compute the performance metric. The authors evaluate the performance of the authentication system in terms of Equal Error Rate on the benchmark University of California, Irvine Human Activity Recognition dataset, and they show the efficiency of the approach.