Linear Regression-based Classifier for audio visual person identification

Alam, Mohammad Rafiqul; Togneri, Roberto; Sohel, Ferdous; Bennamoun, Mohammed; Naseem, Imran

doi:10.1109/iccspa.2013.6487281

Cited by 11 publications

(8 citation statements)

References 15 publications


“…MFCCs have been widely used for AV person recognition [6], [7], [16], [23], [35], [46], [69], [73], [89], [94], [123], [125], [144], [145]. Classification methods based on Gaussian Mixture Models (GMMs) and Vector Quantization (VQ) have displayed a consistent speaker recognition performance using MFCCs.…”

Section: Cepstral Coefficients (mentioning)

confidence: 99%

“…When the face is more occluded, Haar cascade classifiers are used for detecting the eye portion of the image. An integral image representation that reduces time complexity, together with Haar-based features, is used to perform AV person identification in [6]. Further, the K-SVD (Singular Value Decomposition) algorithm is used to create a dictionary for every video sample [105] by taking advantage of high redundancy between the video frames.…”

Section: Convolution Kernel Based Features (mentioning)

confidence: 99%

“…The fused method achieved an EER of 22.7% for males, 19.3% for females, and an average of 21.6%, which are far better than the EERs of individual cues. In [6], a novel Linear Regression Gaussian Mixture Models along with Universal Background Model (LRC-GMM-UBM) is used for speaker recognition. For complementing the voice utterance, a Linear Regression-based Classifier (LRC) is used for face recognition.…”

Section: Post-mapping or Late Fusion (mentioning)

confidence: 99%


Audio-Visual Biometric Recognition and Presentation Attack Detection: A Comprehensive Survey

Mandalapu, Ramachandra, et al. (2021). IEEE Access


Biometric recognition is a trending technology that uses unique characteristics data to identify or verify/authenticate security applications. Amidst the classically used biometrics, voice and face attributes are the most propitious for prevalent applications in day-to-day life because they are easy to obtain through restrained and user-friendly procedures. The pervasiveness of low-cost audio and face capture sensors in smartphones, laptops, and tablets has made the advantage of voice and face biometrics more exceptional when compared to other biometrics. For many years, acoustic information alone has been a great success in automatic speaker verification applications. Meantime, the last decade or two has also witnessed a remarkable ascent in face recognition technologies. Nonetheless, in adverse unconstrained environments, neither of these techniques achieves optimal performance. Since audio-visual information carries correlated and complementary information, integrating them into one recognition system can increase the system's performance. The vulnerability of biometrics towards presentation attacks and audio-visual data usage for the detection of such attacks is also a hot topic of research. This paper made a comprehensive survey on existing state-of-the-art audio-visual recognition techniques, publicly available databases for benchmarking, and Presentation Attack Detection (PAD) algorithms. Further, a detailed discussion on challenges and open problems is presented in this field of biometrics.

Index Terms: Biometrics, audio-visual person recognition, presentation attack detection.

“…We used the LRC-GMM-UBM and LRC-ROI-RAW frameworks that we previously used in our works in [23,24] as the matchers of the audio and visual modalities, respectively. The main concept is that the samples from a specific user lie on a linear subspace, and therefore the task of person identification is considered to be a linear regression problem [25].…”

Section: System (mentioning)

confidence: 99%
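The linear-subspace idea in the statement above can be illustrated with a minimal sketch of a linear regression-based classifier. It is not the cited authors' implementation: the function name `lrc_predict`, the toy gallery, and the use of NumPy least squares are assumptions made for illustration. Each class is represented by a matrix whose columns are that class's training vectors; a probe is regressed onto every class matrix and assigned to the class with the smallest reconstruction residual.

```python
import numpy as np


def lrc_predict(class_samples, y):
    """Assign y to the class whose training vectors reconstruct it best.

    class_samples: dict mapping label -> array of shape (d, n_i), one
                   training vector per column (hypothetical layout).
    y:             probe vector of shape (d,).
    Returns the winning label and the residual norm for every class.
    """
    distances = {}
    for label, X in class_samples.items():
        # Least-squares coefficients for y ~ X @ beta.
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        # Distance between the probe and its projection onto the class subspace.
        distances[label] = float(np.linalg.norm(y - X @ beta))
    best = min(distances, key=distances.get)
    return best, distances


# Toy data (synthetic, for illustration only): two classes, three samples each.
rng = np.random.default_rng(0)
d = 20
gallery = {name: rng.normal(size=(d, 3)) for name in ("alice", "bob")}

# Probe built as a noisy linear combination of the "bob" training vectors.
probe = gallery["bob"] @ np.array([0.5, 0.3, 0.2]) + 0.01 * rng.normal(size=d)

label, dists = lrc_predict(gallery, probe)
print(label, dists)
```

In an audio-visual setting the same routine could be run once per modality, with the per-class residuals then serving as matching scores for a later fusion step; how those scores are combined is outside this sketch.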

A confidence-based late fusion framework for audio-visual biometric identification

Alam, Bennamoun, Togneri, et al. (2015). Pattern Recognition Letters


This paper presents a confidence-based late fusion framework and its application to audio-visual biometric identification. We assign each biometric matcher a confidence value calculated from the matching scores it produces. Then a transformation of the matching scores is performed using a novel confidence-ratio (C-ratio), i.e., the ratio of a matcher confidence obtained at the test phase to the corresponding matcher confidence obtained at the training phase. We also propose modifications to the highest rank and Borda count rank fusion rules to incorporate the matcher confidence. We demonstrate by experiments that our proposed confidence-based fusion framework is more robust compared to the state-of-the-art late (score- and rank-level) fusion approaches.


“…In the field of visual-audio dual-modality biometrics, the general pipelines follow three stages including raw feature extraction, feature fusion for joint representation, and identity decision. Modality fusion [5] can be categorized as feature-level (early) [6], [7], classifier-level (intermediate) [8], [9], [10], [11] or score/decision-level (late) [12], [13], [14] fusion. Existing visual-audio biometrics algorithms are listed in Table I.…”

Section: A Related Work (mentioning)

confidence: 99%

Dual-modality Talking-metrics: 3D Visual-Audio Integrated Behaviometric Cues from Speakers

Zhang, Richmond, Fisher (2018). 2018 24th International Conference on Pattern Recognition (ICPR)


Face-based behaviometrics focus on dynamic biological signatures generated from face behaviors, which are informative and subject-specific for identity recognition. Most existing face behaviometrics rely on 2D visual features and thus are sensitive to pose or intensity variations. This paper presents a dual-modality behaviometrics algorithm (talking-metrics) that integrates 3D video and audio cues from a human face speaking a passphrase. Static and dynamic 3D face features are extracted algorithmically and audio features are transformed through a few learning models. We concatenate the top 18 discriminative 3D visual-audio features to represent the bi-modality and utilize a linear discriminant analysis (LDA) classifier for identity recognition. The experiments were conducted on a new publicly released dataset (S3DFM). Both qualitative feature distributions and quantitative comparison results show the feasibility of the proposed pipeline and its superiority over using each modality independently. A 98.5% cross-validation recognition rate over 60 subjects and 10 trials was achieved. An anti-spoofing test also demonstrates the robustness of the proposed method.
