Abstract: Digital media forensics can exploit the electric network frequency of audio signals to detect tampering. However, current electric network based audio forensic schemes are limited by their inability to obtain concurrent electric network frequency reference datasets from power grids. In addition, most forensic algorithms do not provide high detection precision in adverse signal-to-noise conditions. This chapter proposes an automated electric network frequency based audio forensic scheme that monitors abrupt muta…
“…The experimental results demonstrate that there is no proportional enhancement in the model's performance with an increase …”
mentioning
confidence: 89%
“…The study conducted experiments on audio data tampering detection by inserting with 1-second, 2-second, and 3-second segments, whose results showed the highest detection accuracy for 3-second insert tampering. Mao et al 19 proposed a two-dimensional convolutional neural network model for binary classification of original audio and tampered audio. Zeng et al 27 proposed an audio tampering detection method based on ENF phase sequence representation learning.…”
Section: Related Work
mentioning
confidence: 99%
“…The advent of deep convolutional neural networks provides a compelling alternative, enabling the automatic extraction of latent features from audio without enabling the. Mao et al put forward a two-dimensional convolutional neural network for binary classification of raw and tampered audio 19 . However, their approach focused solely on the fundamental ENF wave, and neglected the higher-order harmonic components, thereby omitting some distinctive characteristics.…”
The extensive adoption of digital audio recording has revolutionized its application in digital forensics, particularly in civil litigation and criminal prosecution. Electric Network Frequency (ENF) has emerged as a reliable technique in the field of audio forensics. However, the absence of comprehensive ENF reference datasets limits current ENF-based methods. To address this, this study introduces ATD, a blind audio forensics framework based on a One-Dimensional Convolutional Neural Network (1D-CNN) model. ATD can identify phase mutations and waveform discontinuities within the tampered ENF signal, without relying on an ENF reference database. To enhance feature extraction, the framework incorporates characteristics of the fundamental harmonics of ENF signals. In addition, a denoising method termed ENF Noise Reduction (ENR) based on the Variational Mode Decomposition (VMD) and Robust Filtering Algorithm (RFA) is proposed to reduce the impact of external noise on embedded Electric Network Frequency signals. This study investigates three distinct types of audio tampering—deletion, insertion, and replacement—culminating in the design of binary-class tampering detection scenarios and four-class tampering detection scenarios tailored to these tampering types. ATD achieves a tampering detection accuracy of over 93% in the four-class scenario and exceeds 96% in the binary-class scenario. The effectiveness, efficiency, adaptability, and robustness of ATD in the two and four classification scenarios have been confirmed by extensive experiments.
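The "phase mutation" cue mentioned in this abstract can be illustrated with a minimal NumPy sketch. This is not ATD's 1D-CNN and not the ENR denoiser; it is a classical frame-wise phase estimate on a synthetic tone, shown only to make the cue concrete. The 50 Hz nominal frequency, the 1 kHz sampling rate, the frame length, and the helper names `frame_phases` and `largest_phase_jump` are all assumptions chosen for illustration.

```python
import numpy as np

def frame_phases(signal, fs, f_nominal, frame_len):
    """Estimate the phase of a tone near f_nominal in each non-overlapping frame."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    # Absolute sample times, so an untampered tone keeps a constant phase.
    t = np.arange(n_frames * frame_len).reshape(n_frames, frame_len) / fs
    # Project each frame onto a complex exponential at the nominal frequency.
    projection = (frames * np.exp(-2j * np.pi * f_nominal * t)).sum(axis=1)
    return np.angle(projection)

def largest_phase_jump(phases):
    """Return the index and size (radians) of the largest frame-to-frame phase step."""
    steps = np.diff(np.unwrap(phases))
    idx = int(np.argmax(np.abs(steps)))
    return idx, float(steps[idx])

# Synthetic stand-in for an extracted ENF component (assumed values).
fs = 1000          # sampling rate in Hz
f_nominal = 50.0   # nominal grid frequency in Hz (60 Hz in some regions)
frame_len = 200    # 10 full cycles of 50 Hz per frame
n = np.arange(2000)
clean = np.cos(2 * np.pi * f_nominal * n / fs + 0.3)

# Simulate a deletion: drop 5 samples (a quarter cycle) at sample 1000.
tampered = np.concatenate([clean[:1000], clean[1005:]])

phases = frame_phases(tampered, fs, f_nominal, frame_len)
jump_index, jump_size = largest_phase_jump(phases)
print(jump_index, jump_size)
```

In this toy setting the deletion sits on a frame boundary, so the phase step appears between two adjacent frames. Real recordings carry noise and a drifting ENF, which is the situation the denoising and learned features described above are meant to handle.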
“…Lin and Kang [8] proposed a wavelet-filtered ENF signal to highlight the abnormal ENF variations and employed autoregressive coefficients to train the classifier under a supervised-learning framework. Mao et al [9] utilized the multiple ENF features as input eigenvectors to the convolutional neural networks for detecting spliced audio. Meng et al [4] used the spectral entropy method to determine the length of each syllable and calculated the variance of the background noise of each syllable, then judged whether there is an operation of the heterogeneous splicing tampering in the audio by comparing the similarities between the variance of the background noise of each syllable.…”
Section: Audio Splicing Detention
mentioning
confidence: 99%
“…However, when the signal-to-noise ratio between the spliced segments is close or even the same, the performance of the noise levels based audio splicing detection methods will decrease sharply. In addition, based on the fact that inserting an audio segment into another audio recording leads to anomalous variations of the electric network frequency (ENF) signal, several kinds of research [7][8][9] have shown that it is an efficient way to detect spliced audio via the analysis of ENF signal. Whereas due to legal restrictions, it is difficult to obtain concurrent reference datasets of power systems, which makes the ENF based audio splicing detection methods difficult to implement [10].…”
Audio splicing means inserting an audio segment into another audio recording, which presents a great challenge to audio forensics. In this paper, a novel audio splicing detection and localization method based on an encoder-decoder architecture (ASLNet) is proposed. First, an audio clip is divided into several small audio segments according to the size of the smallest localization region $L_{slr}$, and the acoustic feature matrix and corresponding binary ground truth mask are created from each audio segment. Then, we concatenate the acoustic feature matrices from all segments of an audio clip into a single acoustic feature matrix and send it to a fully convolutional network (FCN) based encoder-decoder architecture, which consists of a series of convolutional, pooling and transposed convolutional layers, to get a binary output mask. Next, the binary output mask is divided into small segments according to $L_{slr}$, and the ratio $\rho$ of the number of elements equal to one to the number of all elements in a small segment is calculated. Finally, we compare $\rho$ with the predetermined threshold $T$ to determine whether the corresponding audio segment is spliced. We evaluate the effectiveness of the proposed ASLNet on four datasets produced from publicly available speech corpora. Extensive experiments show that the best detection accuracy of ASLNet for the intra-database and cross-database evaluations reaches 0.9965 and 0.9740, respectively, which outperforms the state-of-the-art method.
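The final decision step described in this abstract (split the binary output mask into segments of the smallest localization region, compute the ratio ρ of ones in each segment, compare with the threshold T) can be sketched as follows. This covers only the post-processing, not the FCN encoder-decoder. The 2-D mask layout (feature rows by time columns), the strict `>` comparison, the dropping of a trailing partial segment, and the function name `segment_decisions` are assumptions; the abstract does not specify them.

```python
import numpy as np

def segment_decisions(mask, l_slr, threshold):
    """Compute rho per segment of width l_slr and flag segments with rho > threshold."""
    mask = np.asarray(mask, dtype=float)
    n_rows, n_cols = mask.shape
    n_segments = n_cols // l_slr
    if n_segments == 0:
        raise ValueError("mask is shorter than one segment")
    # Assumption: columns that do not fill a whole segment are dropped.
    trimmed = mask[:, : n_segments * l_slr]
    blocks = trimmed.reshape(n_rows, n_segments, l_slr)
    # The mean of a 0/1 block equals (number of ones) / (number of elements).
    rho = blocks.mean(axis=(0, 2))
    return rho, rho > threshold

# Toy mask: 4 feature rows, 12 time columns, segments 4 columns wide.
mask = np.zeros((4, 12))
mask[:, 4:8] = 1   # middle segment predicted as spliced
mask[0, 0] = 1     # one stray positive in the first segment
rho, spliced = segment_decisions(mask, l_slr=4, threshold=0.5)
print(rho, spliced)
```

Thresholding a per-segment ratio rather than individual mask elements makes the decision tolerant of isolated false positives, such as the single stray element in the first segment of the toy mask.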