Motor-activity-related mental tasks are widely adopted for brain-computer interfaces (BCIs) as they are a natural extension of movement intention, requiring no training to evoke brain activity. The ideal BCI aims to eliminate neuromuscular movement, making motor imagery tasks, or imagined actions with no muscle movement, good candidates. This study explores cortical activation differences between motor imagery and motor execution for both upper and lower limbs using functional near-infrared spectroscopy (fNIRS). Four simple finger- or toe-tapping tasks (left hand, right hand, left foot, and right foot) were performed with both motor imagery and motor execution and compared to the resting state. Significant activation was found during all four motor imagery tasks, indicating that they can be detected via fNIRS. Motor execution produced higher activation levels, a faster response, and a different spatial distribution compared to motor imagery, which should be taken into account when designing an imagery-based BCI. When comparing left versus right, upper limb tasks were the most clearly distinguishable, particularly during motor execution. Left and right lower limb activation patterns were found to be highly similar during both imagery and execution, indicating that higher-resolution imaging, advanced signal processing, or improved subject training may be required to reliably distinguish them.
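To make the activation comparison concrete, the following is a minimal sketch, with synthetic data and an assumed channel count, of how a task-versus-rest contrast on oxygenated-hemoglobin (HbO) changes might be tested channel by channel; it is an illustration, not the study's actual analysis pipeline.

```python
# Hypothetical sketch: testing for task-related fNIRS activation against rest.
# Data shapes, channel count, and block structure are illustrative assumptions,
# not the protocol used in the study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n_channels = 20, 16

# Mean HbO change per trial and channel (micromolar), for task blocks and
# matched rest blocks.
hbo_task = rng.normal(loc=0.15, scale=0.4, size=(n_trials, n_channels))
hbo_rest = rng.normal(loc=0.00, scale=0.4, size=(n_trials, n_channels))

# Paired t-test per channel: does HbO during the task differ reliably from rest?
t_vals, p_vals = stats.ttest_rel(hbo_task, hbo_rest, axis=0)

# Simple Bonferroni correction across channels.
alpha = 0.05 / n_channels
active = np.where(p_vals < alpha)[0]
print("channels with significant task-related activation:", active)
```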
A statistical pattern-recognition technique was applied to the classification of musical instrument tones within a taxonomic hierarchy. Perceptually salient acoustic features, related to the physical properties of source excitation and resonance structure, were measured from the output of an auditory model (the log-lag correlogram) for 1023 isolated tones over the full pitch ranges of 15 orchestral instruments. The data set included examples from the string (bowed and plucked), woodwind (single, double, and air reed), and brass families. Using 70%/30% splits between training and test data, maximum a posteriori classifiers were constructed based on Gaussian models arrived at through Fisher multiple-discriminant analysis. The classifiers distinguished transient from continuant tones with approximately 99% correct performance. Instrument families were identified with approximately 90% performance, and individual instruments were identified with an overall success rate of approximately 70%. These preliminary analyses compare favorably with human performance on the same task and demonstrate the utility of the hierarchical approach to classification.
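As an illustration of the classification stage, the sketch below pairs a Fisher (linear) discriminant projection with a Gaussian maximum a posteriori classifier on a 70%/30% split, using placeholder features in place of the correlogram-derived measurements; the feature dimensionality and pipeline details are assumptions.

```python
# Minimal sketch: Fisher multiple-discriminant projection followed by a
# Gaussian MAP classifier on a 70%/30% train/test split. The feature matrix
# is random placeholder data standing in for the correlogram-derived features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_tones, n_features, n_instruments = 1023, 31, 15
X = rng.normal(size=(n_tones, n_features))          # placeholder acoustic features
y = rng.integers(0, n_instruments, size=n_tones)    # placeholder instrument labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# LDA supplies the discriminant projection; a Gaussian classifier with class
# priors then makes the maximum a posteriori decision.
model = make_pipeline(LinearDiscriminantAnalysis(n_components=n_instruments - 1),
                      GaussianNB())
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```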
In developing automated systems to recognize the emotional content of music, we are faced with a problem spanning two disparate domains: the space of human emotions and the acoustic signal of music. To address this problem, we must develop models for both data collected from humans describing their perceptions of musical mood and quantitative features derived from the audio signal. In previous work, we have presented a collaborative game, MoodSwings, which records dynamic (per-second) mood ratings from multiple players within the two-dimensional arousal-valence representation of emotion. Using these data, we present a system linking models of acoustic features and human data to provide estimates of the emotional content of music according to the arousal-valence space. Furthermore, in keeping with the dynamic nature of musical mood, we demonstrate the potential of this approach to track the emotional changes in a song over time. We investigate the utility of a range of acoustic features based on psychoacoustic and music-theoretic representations of the audio for this application. Finally, a simplified version of our system is re-incorporated into MoodSwings as a simulated partner for single players, providing a potential platform for furthering perceptual studies and modeling of musical mood.
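A simplified sketch of the regression step appears below: per-second acoustic feature vectors are mapped to arousal-valence coordinates, and predicting over consecutive seconds yields an emotional trajectory for a track. The feature set, dimensions, and regressor are illustrative assumptions rather than the system described above.

```python
# Illustrative sketch: mapping per-second acoustic feature vectors to
# arousal-valence (A-V) coordinates. Feature contents and dimensions are
# placeholder assumptions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
n_seconds, n_features = 600, 20

X = rng.normal(size=(n_seconds, n_features))   # per-second acoustic features
av = rng.uniform(-1, 1, size=(n_seconds, 2))   # human A-V labels in [-1, 1]^2

# One regressor per output dimension (arousal, valence).
model = MultiOutputRegressor(Ridge(alpha=1.0)).fit(X, av)

# Predicting over consecutive seconds traces the song's emotional trajectory.
trajectory = model.predict(X[:30])             # first 30 seconds of a track
print(trajectory.shape)                        # (30, 2): arousal, valence per second
```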
The task of automatically annotating music with text tags (referred to as autotagging) is vital to creating a large-scale semantic music discovery engine. Yet for an autotagging system to be successful, a large and cleanly annotated data set must exist to train the system. For this reason, we have collected a data set, called Swat10k, which consists of 10,870 songs annotated using a vocabulary of 475 acoustic tags and 153 genre tags from Pandora's Music Genome Project. The acoustic tags are considered "acoustically objective" because they can be consistently applied to songs by expert musicologists. To develop an autotagging system, we use the Swat10k data set in conjunction with two new sets of content-based audio features obtained using the publicly available Echo Nest API. The Echo Nest Timbre (ENT) features represent a song using a collection of short-time feature vectors. Compared with Mel-frequency cepstral coefficients (MFCCs), ENTs provide a more compact representation of music and improve autotagging performance. We also evaluate the Echo Nest Song (ENS) feature vector, which is a collection of mid-level acoustic features (e.g., beats per minute, average loudness). While the ENS features generally perform worse than the ENTs, they increase the performance of several individual tags. Furthermore, we plan to publicly release our song annotations and corresponding Echo Nest features so that other researchers will be able to use Swat10k to develop and compare alternative autotagging algorithms.
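The sketch below illustrates a generic autotagging baseline in the spirit of this setup: short-time timbre-like vectors are pooled into a song-level representation and one binary classifier is trained per tag. All features, tag counts, and data are synthetic placeholders, not the Swat10k annotations or the Echo Nest features themselves.

```python
# Hedged sketch of an autotagging baseline: short-time feature vectors are
# pooled into a song-level vector, then one binary classifier per tag is
# trained. Everything here is synthetic placeholder data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
n_songs, n_frames, feat_dim, n_tags = 200, 300, 12, 25

# Per-song stack of short-time feature vectors (e.g., timbre coefficients).
frames = rng.normal(size=(n_songs, n_frames, feat_dim))

# Pool mean and standard deviation over time to get one vector per song.
X = np.concatenate([frames.mean(axis=1), frames.std(axis=1)], axis=1)
Y = rng.integers(0, 2, size=(n_songs, n_tags))   # binary song-tag matrix

tagger = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
print(tagger.predict(X[:3]))                     # predicted tags for three songs
```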
Research on creative cognition reveals a fundamental disagreement about the nature of creative thought, specifically, whether it is primarily based on automatic, associative (Type-1) or executive, controlled (Type-2) processes. We hypothesized that Type-1 and Type-2 processes make differential contributions to creative production that depend on domain expertise. We tested this hypothesis with jazz pianists whose expertise was indexed by the number of public performances given. Previous fMRI studies of musical improvisation have reported that domain expertise is characterized by deactivation of the right-dorsolateral prefrontal cortex (r-DLPFC), a brain area associated with Type-2 executive processing. We used anodal, cathodal, and sham transcranial direct current stimulation (tDCS) applied over r-DLPFC with the reference electrode on the contralateral mastoid (1.5 mA for 15 min, except for sham) to modulate the quality of the pianists' performances while they improvised over chords with drum and bass accompaniment. Jazz experts rated each improvisation for creativity, esthetic appeal, and technical proficiency. There was no main effect of anodal or cathodal stimulation on ratings compared to sham; however, a significant interaction between anodal tDCS and expertise emerged such that stimulation benefitted musicians with less experience but hindered those with more experience. We interpret these results as evidence for a dual-process model of creativity in which novices and experts differentially engage Type-1 and Type-2 processes during creative production.
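For readers unfamiliar with the stimulation-by-expertise contrast, the sketch below shows how such an interaction might be tested with an ordinary least squares model on simulated ratings; it is a toy illustration, not the analysis reported in the study.

```python
# Minimal sketch of the key contrast: a stimulation-by-expertise interaction
# on improvisation ratings. Data are simulated; the model is a simple OLS
# illustration, not the study's exact analysis.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 90
df = pd.DataFrame({
    "stim": rng.choice(["sham", "anodal", "cathodal"], size=n),
    "expertise": rng.normal(size=n),   # e.g., standardized count of performances
})
# Simulate ratings where anodal helps low-expertise and hurts high-expertise players.
df["rating"] = (5
                - 1.0 * (df["stim"] == "anodal") * df["expertise"]
                + rng.normal(scale=1.0, size=n))

model = smf.ols("rating ~ C(stim, Treatment('sham')) * expertise", data=df).fit()
print(model.summary().tables[1])       # inspect the anodal-by-expertise term
```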
The medium of music has evolved specifically for the expression of emotions, and it is natural for us to organize music in terms of its emotional associations. But while such organization is a natural process for humans, quantifying it empirically proves to be a very difficult task, and as such no dominant feature representation for music emotion recognition has yet emerged. Much of the difficulty in developing emotion-based features is the ambiguity of the ground truth. Even using the smallest time window, opinions on the emotion are bound to vary and reflect some disagreement between listeners. In previous work, we have modeled human response labels to music in the arousal-valence (A-V) representation of affect as a time-varying, stochastic distribution. Current methods for automatic detection of emotion in music seek performance increases by combining several feature domains (e.g., loudness, timbre, harmony, rhythm). Such work has focused largely on dimensionality reduction for minor classification performance gains, but has provided little insight into the relationship between audio and emotional associations. In this new work, we seek to employ regression-based deep belief networks to learn features directly from magnitude spectra. While the system is applied to the specific problem of music emotion recognition, it could easily be applied to any regression-based audio feature learning problem.
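As a rough illustration of learning features directly from magnitude spectra, the sketch below uses a single restricted Boltzmann machine layer followed by a linear regressor as a stand-in for the deep belief network; the data, layer sizes, and training settings are all assumptions.

```python
# Rough sketch: unsupervised feature learning on magnitude spectra followed by
# regression to an affect dimension. A single RBM layer stands in for the deep
# belief network; all data are synthetic placeholders.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
n_frames, n_bins = 500, 128

spectra = np.abs(rng.normal(size=(n_frames, n_bins)))   # placeholder magnitude spectra
arousal = rng.uniform(-1, 1, size=n_frames)             # placeholder affect target

# Scale to [0, 1] for the RBM, learn hidden features, then regress.
features = make_pipeline(MinMaxScaler(),
                         BernoulliRBM(n_components=64, learning_rate=0.05,
                                      n_iter=10, random_state=0))
H = features.fit_transform(spectra)
reg = Ridge(alpha=1.0).fit(H, arousal)
print("learned feature dimensionality:", H.shape[1])
```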
This work investigates the potential of a four-class motor-imagery-based brain-computer interface (BCI) using functional near-infrared spectroscopy (fNIRS). Four motor imagery tasks (right hand, left hand, right foot, and left foot tapping) were executed while motor cortex activity was recorded via fNIRS. Preliminary results from three participants suggest that this could be a viable BCI, with two subjects achieving 50% accuracy. fNIRS is a noninvasive, safe, portable, and affordable optical brain imaging technique used to monitor cortical hemodynamic changes. Because of its portability and ease of use, fNIRS is amenable to deployment in more natural settings. Electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) BCIs have already been used with up to four motor-imagery-based commands. While fNIRS-based BCIs are relatively new, success with EEG and fMRI systems, as well as signal characteristics similar to fMRI and complementary to EEG, suggests that fNIRS could serve to build or augment future BCIs.
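A minimal sketch of a four-class decoding pipeline of this kind is given below, using per-channel mean HbO features and a linear classifier with cross-validation on synthetic trials; the channel count, trial structure, and classifier choice are assumptions rather than the system evaluated here.

```python
# Illustrative sketch of four-class decoding on fNIRS trials: per-channel
# mean-HbO features and a linear classifier with cross-validation. Channel
# counts and trial numbers are assumptions; data are synthetic.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_channels, n_samples = 80, 16, 100   # 20 trials per class, hypothetical

# Trial-wise HbO time series and class labels (0=RH, 1=LH, 2=RF, 3=LF).
hbo = rng.normal(size=(n_trials, n_channels, n_samples))
labels = np.repeat(np.arange(4), n_trials // 4)

# Feature: mean HbO change per channel over the task window.
X = hbo.mean(axis=2)

scores = cross_val_score(LinearDiscriminantAnalysis(), X, labels, cv=5)
print("cross-validated accuracy: %.2f (chance = 0.25)" % scores.mean())
```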