We propose a deep graph approach to the task of speech emotion recognition. Graphs are a compact, efficient, and scalable way to represent data. Following the theory of graph signal processing, we propose to model the speech signal as a cycle graph or a line graph. Such a graph structure enables us to construct a Graph Convolutional Network (GCN)-based architecture that performs an accurate graph convolution, in contrast to the approximate convolution used in standard GCNs. We evaluated the performance of our model for speech emotion recognition on the popular IEMOCAP and MSP-IMPROV databases. Our model outperforms standard GCNs and other relevant deep graph architectures, indicating the effectiveness of our approach. Compared with existing speech emotion recognition methods, our model achieves performance comparable to the state of the art with significantly fewer learnable parameters (∼30K), indicating its applicability to resource-constrained devices. Our code is available at github.com/AmirSh15/Compact_SER.
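As a rough illustration of the idea, the sketch below (not the authors' released code) models a sequence of speech frames as a cycle graph and applies an exact spectral graph convolution via the eigendecomposition of the graph Laplacian; the frame count, feature dimension, and per-frequency filter are illustrative assumptions.

```python
# Minimal sketch, assuming frame-level features on a cycle graph; the exact
# convolution filters in the graph Fourier basis instead of using the
# Chebyshev-style approximation of standard GCNs.
import numpy as np

def cycle_adjacency(n):
    """Adjacency matrix of a cycle graph: frame i <-> i+1, with wrap-around."""
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, (idx + 1) % n] = 1.0
    A[(idx + 1) % n, idx] = 1.0
    return A

def exact_graph_conv(X, A, theta):
    """Filter node features X in the exact graph Fourier basis given by the
    eigenvectors of the graph Laplacian."""
    L = np.diag(A.sum(axis=1)) - A      # combinatorial Laplacian
    lam, U = np.linalg.eigh(L)          # lam: graph frequencies, U: Fourier basis
    return U @ np.diag(theta) @ U.T @ X

n_frames, feat_dim = 120, 40            # e.g. 120 frames of 40-d filterbank features (assumed)
X = np.random.randn(n_frames, feat_dim)
theta = np.random.randn(n_frames)       # stands in for learned spectral coefficients
H = exact_graph_conv(X, cycle_adjacency(n_frames), theta)
print(H.shape)                          # (120, 40)
```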
Large-scale databases with high-quality manual annotations are scarce in the audio domain. We therefore explore a self-supervised graph approach to learning audio representations from highly limited labeled data. Considering each audio sample as a graph node, we propose a subgraph-based framework with novel self-supervision tasks that can learn effective audio representations. During training, subgraphs are constructed by sampling from the entire pool of available training data, exploiting the relationship between labeled and unlabeled audio samples. During inference, we use random edges to reduce the overhead of graph construction. We evaluate our model on three benchmark audio databases and two tasks: acoustic event detection and speech emotion recognition. Our semi-supervised model performs better than or on par with fully supervised models and outperforms several competitive existing models. Our model is compact (240k parameters) and produces generalized audio representations that are robust to different types of signal noise.
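The sketch below illustrates, under assumed details, the two graph-construction regimes the abstract describes: a training-time subgraph sampled from the pool of labeled and unlabeled audio embeddings (k-nearest-neighbour edges are an assumption, not necessarily the paper's choice), and an inference-time graph with random edges that skips neighbour search. All function names and sizes are hypothetical.

```python
# Illustrative sketch, assuming precomputed audio embeddings and a label
# vector in which -1 marks unlabeled samples.
import numpy as np

def sample_subgraph(embeddings, labels, n_nodes=32, k=4):
    """Sample n_nodes audio samples (labeled and unlabeled, mixed) and connect
    each node to its k nearest neighbours within the subgraph."""
    idx = np.random.choice(len(embeddings), size=n_nodes, replace=False)
    X, y = embeddings[idx], labels[idx]
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)                   # exclude self-edges
    nn = np.argsort(d, axis=1)[:, :k]
    A = np.zeros((n_nodes, n_nodes))
    A[np.repeat(np.arange(n_nodes), k), nn.ravel()] = 1.0
    return X, y, np.maximum(A, A.T)               # symmetrized adjacency

def random_edge_graph(n_nodes, p=0.1):
    """Inference-time alternative: random edges avoid the cost of building a
    neighbourhood graph."""
    A = np.triu(np.random.rand(n_nodes, n_nodes) < p, 1).astype(float)
    return A + A.T

pool = np.random.randn(200, 128)                  # pretend pool of audio embeddings
lab = np.where(np.random.rand(200) < 0.1, np.random.randint(0, 4, 200), -1)
X, y, A_train = sample_subgraph(pool, lab)
A_infer = random_edge_graph(32)
```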
In this paper we present our hardware architecture for a highly scalable, shared-memory, Monte Carlo Tree Search (MCTS)-based Blokus Duo solver. In the proposed architecture, each MCTS solver module contains a centralized MCTS controller, which can also be implemented using soft cores with true dual-port access to a shared memory called the main memory, and a multitude of MCTS engines, each containing several simulation cores. This highly flexible architecture guarantees optimized solver performance regardless of the actual FPGA platform used. Our design is inspired by parallel MCTS algorithms and is potentially capable of extracting the maximum possible parallelism from the MCTS algorithm. In addition, our design combines MCTS with pruning heuristics to improve both memory and logic element (LE) utilization. The results show that our architecture can run at up to 50 MHz on the DE2-115 platform, where each simulation core requires 11K LEs and the MCTS controller requires 10K LEs.
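Since the paper describes an FPGA design rather than software, the following is only a software analogue of the algorithmic pattern: a shared MCTS tree (standing in for the dual-port main memory) updated by several parallel workers (standing in for the MCTS engines and their simulation cores). The toy 8-move game, the lock granularity, and all parameters are assumptions.

```python
# A minimal sketch of shared-memory parallel MCTS, assuming a dummy game with
# random outcomes; locks stand in for the hardware's memory-port arbitration.
import math
import random
import threading

class Node:
    def __init__(self, moves):
        self.moves = list(moves)          # untried moves from this position
        self.children = {}                # move -> Node
        self.visits, self.wins = 0, 0.0
        self.lock = threading.Lock()

def ucb(parent, child, c=1.4):
    if child.visits == 0:
        return float("inf")               # always try unvisited children first
    return child.wins / child.visits + c * math.sqrt(
        math.log(max(parent.visits, 1)) / child.visits)

def engine(root, legal_moves, rollout, n_iters):
    """One MCTS engine: UCB selection, expansion, random playout, backprop."""
    for _ in range(n_iters):
        node, path = root, [root]
        while not node.moves and node.children:                 # selection
            node = max(node.children.values(), key=lambda ch: ucb(node, ch))
            path.append(node)
        with node.lock:                                         # expansion
            if node.moves:
                child = Node(legal_moves())
                node.children[node.moves.pop()] = child
                path.append(child)
                node = child
        reward = rollout()                                      # simulation
        for n in path:                                          # backprop
            with n.lock:
                n.visits += 1
                n.wins += reward

# Toy run: 4 engines share one tree; dummy game with 8 moves per state.
root = Node(range(8))
threads = [threading.Thread(target=engine,
                            args=(root, lambda: range(8), random.random, 500))
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print("most visited move:", max(root.children, key=lambda m: root.children[m].visits))
```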