“…The results of up-sampling are fused with the bottom-up generated feature maps of the same scale, that is, added pixel by pixel. In the experiment, bilinear interpolation is selected for up-sampling. X is obtained from the attention module, and the number of channels is reduced to 256 by 1 × 1 convolution to obtain X′ through (6):…”
Section: Feature Fusion
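The fusion step quoted above (bilinear 2× up-sampling of the top-down map, then pixel-wise addition with the same-scale bottom-up map) can be sketched in NumPy. This is a minimal single-channel illustration, not the paper's implementation; the function names and the half-pixel-center sampling convention are assumptions.

```python
import numpy as np

def bilinear_upsample_2x(fm):
    """2x bilinear up-sampling of a single-channel feature map (half-pixel centers)."""
    h, w = fm.shape
    ys = np.clip((np.arange(2 * h) + 0.5) / 2 - 0.5, 0, h - 1)
    xs = np.clip((np.arange(2 * w) + 0.5) / 2 - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = fm[y0][:, x0] * (1 - wx) + fm[y0][:, x1] * wx   # interpolate along x (upper rows)
    bot = fm[y1][:, x0] * (1 - wx) + fm[y1][:, x1] * wx   # interpolate along x (lower rows)
    return top * (1 - wy) + bot * wy                      # interpolate along y

def fuse(top_down, lateral):
    """Pixel-wise addition of the up-sampled top-down map and the same-scale lateral map."""
    return bilinear_upsample_2x(top_down) + lateral
```

In the actual network this happens per channel after the 1 × 1 convolution has brought both maps to 256 channels, so the shapes match for the element-wise addition.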
“…The undetectable manipulation of digital speech avatars poses substantial threats to judicial processes, political fields, and social security. Contemporary speech forensics techniques are pivotal in ensuring the integrity of digital avatars and focus on detecting tampering facilitated by audio editing software, such as deletion, insertion, copy-move, splicing, resampling, and recompression of audio clips [4][5][6][7]. It is worth noting that in speech content forensics there are many methods for detecting deletion, copy-move, splicing, and similar tampering [8][9][10], but relatively few for resampling forensics, even though those tampering operations are often accompanied by resampling.…”
Speech forgery and tampering, increasingly facilitated by advanced audio editing software, pose significant threats to the integrity and privacy of digital speech avatars. Speech resampling is a common post-processing step in many speech-tampering operations, so its forensic detection is of great significance. Most previous work on resampling detection relied on hand-crafted feature extraction and traditional classifiers to distinguish original from forged speech. Exploiting the powerful feature-extraction ability of deep learning, this paper converts the speech signal into a spectrogram with time-frequency characteristics and uses a feature pyramid network (FPN) with a Squeeze-and-Excitation (SE) attention mechanism to learn speech resampling features. The proposed method combines low-level location information with high-level semantic information, which markedly improves the detection performance for speech resampling. Experiments were carried out on a resampling corpus built from the TIMIT dataset. The results indicate that the proposed method significantly improves the detection accuracy for various resampled speech; for tampered speech with a resampling factor of 0.9, the detection accuracy increases by nearly 20%. In addition, a robustness test demonstrates that the proposed model has strong resistance to MP3 compression, and its overall performance is better than that of existing methods.
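The SE attention mechanism named in the abstract can be sketched as follows: squeeze (global average pooling per channel), excitation (a two-layer bottleneck with ReLU and sigmoid), then channel-wise rescaling. The weight matrices `w1` and `w2` stand in for learned parameters and are illustrative assumptions, not values from the paper.

```python
import numpy as np

def se_attention(x, w1, w2):
    """Squeeze-and-Excitation channel attention (sketch).
    x: feature map of shape (C, H, W); w1: (C // r, C), w2: (C, C // r)
    for some reduction ratio r."""
    z = x.mean(axis=(1, 2))                    # squeeze: global average pool per channel
    e = np.maximum(w1 @ z, 0.0)                # excitation: FC + ReLU bottleneck
    s = 1.0 / (1.0 + np.exp(-(w2 @ e)))        # FC + sigmoid -> channel weights in (0, 1)
    return x * s[:, None, None]                # rescale each channel by its weight
```

The sigmoid keeps every channel weight in (0, 1), so the module reweights channels rather than adding new information; with trained `w1`/`w2` it emphasizes the channels most indicative of resampling traces.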
“…Dhiman et al. [29] used GLCM and LBP for content-based image retrieval on the CORAL dataset. Suleman et al. [30] and Zeeshan et al. [31] used contextual techniques to find similarities in 1D and 2D signals. Sukhjeet et al. [32] used a hybrid approach that combines color space and quaternion moment vectors to create a unique feature vector.…”
This paper presents a novel feature descriptor termed the principal component analysis (PCA)-based Advanced Local Octa-Directional Pattern (ALODP-PCA) for content-based image retrieval. Conventional approaches compare each pixel of an image with certain neighboring pixels, providing discrete image information. The descriptor proposed in this work utilizes the local intensity of pixels in all eight directions of their neighborhood. The local octa-directional pattern yields two patterns, magnitude and directional, each quantized into a 40-bin histogram. A joint histogram is created by concatenating the directional and magnitude histograms. To measure similarity between images, the Manhattan distance is used. Moreover, to contain the computational cost, PCA is applied to reduce dimensionality. The proposed methodology is tested on a subset of the Multi-PIE face dataset, which contains almost 800,000 images of over 300 people with different poses and a wide range of facial expressions. Results were compared with state-of-the-art local patterns, namely the local tri-directional pattern (LTriDP), local tetra-directional pattern (LTetDP), and local ternary pattern (LTP). The proposed model outperforms these previous methods in terms of precision, accuracy, and recall.
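The descriptor pipeline described above — two 40-bin histograms concatenated into a joint descriptor, Manhattan-distance matching, and PCA for dimensionality reduction — can be sketched as follows. The pattern-code extraction itself (the octa-directional comparison) is omitted; the helper names and the SVD-based PCA are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def joint_histogram(mag_codes, dir_codes, bins=40):
    """Concatenate 40-bin magnitude and directional pattern histograms
    into one 80-D joint descriptor."""
    hm, _ = np.histogram(mag_codes, bins=bins, range=(0, bins))
    hd, _ = np.histogram(dir_codes, bins=bins, range=(0, bins))
    return np.concatenate([hd, hm]).astype(float)

def manhattan(a, b):
    """L1 (Manhattan) distance used to rank retrieved images."""
    return np.abs(a - b).sum()

def pca_reduce(X, k):
    """Project row-vector descriptors X onto their top-k principal components."""
    Xc = X - X.mean(axis=0)                          # center the descriptors
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                             # coordinates in the top-k subspace
```

At query time, each database image's reduced descriptor is compared against the query's with `manhattan`, and the nearest descriptors are returned as retrieval results.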
“…In addition, the use of other advanced technologies such as artificial intelligence (AI), big data analytics (BDA), and machine learning (ML), among other emerging tools, helps to effectively utilize the data collected from different sources in the network. Through this practice, the processed data can be used to improve system efficiency and performance [2,3]. To achieve a highly interactive, efficient, and secure network, various elements and factors are needed, such as data privacy, authentication, ease of use and maintenance, and high security standards against possible attacks.…”
Data security is a major issue for smart home networks, and existing tools and techniques have not proven highly effective at securing them. Blockchain is a promising technology because its distributed computing infrastructure, cryptographic signatures, and smart contracts make it difficult for hackers to intrude into the systems. In this paper, an architecture for smart home networks that guarantees data integrity, robust security, and protection of the validity of blockchain transactions is investigated. The system model is tested using realistic datasets of various sizes (30, 3 k, and 30 k transactions, representing small, medium, and large transaction volumes, respectively). Four consensus algorithms were considered: the conventional concatenated hash transactions (CHT) and Merkle hash tree (MHT) schemes, as well as the newly proposed odd and even modified MHT (O&E MHT) and modified MHT (MMHT). Moreover, 15 hash functions were examined and compared to understand the effect of each consensus algorithm on the execution time of the data integrity verification check and the time optimization provided by the proposed MMHT algorithm. The results show that although the CHT algorithm gives the lowest execution time, it is impractical for a blockchain implementation because it requires copying the entire blockchain ledger in real time. Meanwhile, the O&E MHT offers no tangible benefit in execution time. The proposed MMHT, however, offers at least a 30% gain in time optimization over the conventional MHT algorithm typically used in blockchains. This work shows that the proposed MMHT consensus algorithm not only identifies malicious code but also improves data integrity check performance in smart homes, all while ensuring network stability.
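The conventional Merkle hash tree (MHT) baseline that the proposed MMHT modifies can be sketched as follows. The paper's MMHT and O&E MHT variants are not detailed here, so this shows only the standard construction; SHA-256 and the duplicate-last-node rule for odd levels are assumptions for illustration.

```python
import hashlib

def merkle_root(transactions):
    """Compute a standard Merkle hash tree root over a list of transaction strings.
    An odd node count at any level is handled by duplicating the last node."""
    level = [hashlib.sha256(tx.encode()).hexdigest() for tx in transactions]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])          # pad odd levels by repeating the last hash
        level = [hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
                 for i in range(0, len(level), 2)]
    return level[0]
```

A data integrity check then amounts to recomputing the root from the stored transactions and comparing it to the root recorded in the block header: any altered transaction changes the root.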