In many areas, multimedia technology has made its way into the mainstream. In the case of digital audio, this is manifested in numerous online music stores having turned into profitable businesses. The widespread user adoption of digital audio, both on home computers and on mobile players, shows the size of this market. Thus, ways to automatically process and manage the growing private and commercial collections become increasingly important, and along with them a need to make music interpretable by computers. The most obvious representation of audio files is their sound; there are, however, more ways of describing a song, for instance its lyrics, which describe songs in terms of content words. The lyrics of a piece of music may be orthogonal to its sound, and differ greatly from other texts in their (rhyme) structure. Consequently, exploiting these properties has potential for typical music information retrieval tasks such as musical genre classification; so far, however, there has been a lack of means to efficiently combine these modalities. In this paper, we present findings from investigating advanced lyrics features such as the frequency of certain rhyme patterns, several part-of-speech features, and statistical features such as words per minute (WPM). We further analyse to what extent a combination of these features with existing acoustic feature sets can be exploited for genre classification, and provide experiments on two test collections.
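The WPM statistic mentioned above can be illustrated with a minimal sketch; the function name, the whitespace tokeniser, and the normalisation by song duration are assumptions for illustration, not the paper's exact definition:

```python
def words_per_minute(lyrics: str, duration_seconds: float) -> float:
    """Words-per-minute (WPM): token count of the lyrics, normalised
    by song length in minutes.

    A naive whitespace tokeniser stands in for whatever preprocessing
    the original feature extraction might use.
    """
    word_count = len(lyrics.split())
    return word_count / (duration_seconds / 60.0)

# 300 words sung over a 3-minute track give a WPM of 100.
print(words_per_minute("la " * 300, 180.0))
```

Features like this are cheap to compute per track and can simply be concatenated with acoustic feature vectors before classification.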
Unsupervised learning is very important in the processing of multimedia content, as clustering or partitioning of data in the absence of class labels is often a requirement. This chapter begins with a review of the classic techniques of k-means clustering and hierarchical clustering. Modern advances in clustering are covered with an analysis of kernel-based clustering and spectral clustering. One of the most popular unsupervised learning techniques for processing multimedia content is the self-organizing map, so a review of self-organizing maps and their variants is also presented. Since the absence of class labels in unsupervised learning makes the question of evaluation and cluster quality assessment more complicated than in supervised learning, the chapter additionally includes a comprehensive analysis of cluster validity assessment techniques.
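As a concrete anchor for the classic techniques reviewed here, k-means (Lloyd's algorithm) can be sketched in a few lines of pure Python; this is a generic textbook version under simplifying assumptions, not the chapter's implementation:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: alternately assign each point to its nearest
    centroid, then move each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Keep a centroid in place if its cluster became empty.
        centroids = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters

# Two well-separated 2-D blobs: the centroids converge to the blob means.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres, _ = kmeans(data, k=2)
```

Judging whether such a partition is any good without ground-truth labels is exactly where the cluster validity measures discussed at the end of the chapter come in.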
1. Procedures have been developed for the synthesis of 5,6-diamino-2,4-dihydroxypyrimidine and 4-hydroxy-2,5,6-triaminopyrimidine bisulfite in appreciably better yields and involving fewer isolations of intermediate products than previously reported. 2. These compounds have been condensed with several dicarbonyl compounds to yield pyrimido[4,5-b]pyrazines symmetrically substituted in the 6- and 7-positions. 3. Ultraviolet absorption spectra of alkaline solutions of the compounds have been measured. Ithaca, N. Y.
High dependence on web services and service-oriented architecture affects not only business solutions but also scientific research. Web services may be delivered by third parties and are thus candidates for outsourcing. However, they also represent a source of risks that can jeopardise the robustness of processes. Hence, there is a need for actions that can contribute to mitigating possible threats to the continuity of processes. In this paper, risks affecting processes are classified, followed by a discussion of the particular changes stemming from web services. Three distinct approaches to improvement are described: a newly proposed web-services monitoring framework supported by a software solution; the concept of resilient web services, which specifies new design requirements for web services; and digital preservation strategies, which, apart from their long-term benefits, can support the sustainability of currently running processes.
The re-usability and repeatability of e-Science experiments is widely understood as a requirement for validating and reusing previous work in data-intensive domains. Experiments are, however, often complex chains of processing involving a number of data sources, computing infrastructures, software tools, or external and third-party services, rendering repeatability a challenging task. Another important aspect of many experiments lies in the social and organisational dimension: very often, knowledge of how experiments are performed is tacit and remains with the researcher, and the collaborative and distributed aspects, especially of larger experiments, add to this challenge. Therefore, a number of approaches have tackled this issue from various angles: initiatives for data sharing, code versioning and publishing as open source, the use of workflow engines to formalise the steps taken in an experiment, and ways to describe the complex environment an experiment is executed in, e.g. via Research Objects. In this paper, we present a model with a specific focus on the technical infrastructure that forms the basis of a research experiment. We demonstrate how this model can be applied to describe e-Science experiments, and align and compare it with Research Objects.
With the recent advances and increasing activity in data mining and analysis, protecting the privacy of individuals is crucial. Several approaches address this concern, from techniques like data anonymisation to secure, non-disclosive computation, each of which has specific strengths and weaknesses depending on the requirements at hand. A slightly different approach is the generation of synthetic data, which tries to preserve the overall properties and characteristics of the original data without revealing information about actual individual data samples. The promise is that, for most purposes, models trained on the synthetic data instead of the real data do not show a significant loss of performance. In this paper, we give an overview of currently available approaches for synthetic data generation, and empirically evaluate the utility of the generated synthetic data by testing it on a number of supervised machine learning tasks on several publicly available datasets.
With ever-increasing capacity for collecting, storing, and processing data, there is also a high demand for intelligent data analysis methods. While there have been impressive advances in machine learning and related domains in recent years, these also give rise to concerns regarding the protection of personal and otherwise sensitive data, especially if it is to be analysed by third parties. Besides anonymisation, which becomes challenging with high-dimensional data, one approach to privacy-preserving data mining lies in the use of synthetic data, which comes with the promise of protecting the users' data while producing analysis results close to those achieved on real data. In this paper, we analyse a number of different approaches for creating synthetic data, and study the utility of the created datasets for regression tasks, i.e. the prediction of a numeric value. We further investigate the similarity of real and synthetic data samples. Finally, we contribute to privacy assessments and measurements of the risk of attribute disclosure on synthetic data by extending an approach developed for categorical data.
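The utility comparison described above can be sketched as a train-on-synthetic, test-on-real (TSTR) protocol. The tiny closed-form 1-D linear regression and the toy datasets below are illustrative assumptions, not the paper's actual models or data:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b in one dimension (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def mse(model, xs, ys):
    """Mean squared error of a fitted line on a dataset."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy example: the real data follows y = 2x; the synthetic data mimics it.
real_x, real_y = [0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0]
synth_x, synth_y = [0.5, 1.5, 2.5], [1.0, 3.0, 5.0]

trtr = mse(fit_line(real_x, real_y), real_x, real_y)    # train-on-real baseline
tstr = mse(fit_line(synth_x, synth_y), real_x, real_y)  # train-on-synthetic
```

If the synthetic data preserves the relevant structure, the TSTR error stays close to the train-on-real baseline; a large gap signals a loss of utility.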