Detecting the sentiment expressed by a document is a key task for many applications, e.g., modeling user preferences, monitoring consumer behaviors, assessing product quality. Traditionally, the sentiment analysis task primarily relies on textual content. Fueled by the rise of mobile phones that are often the only cameras on hand, documents on the Web (e.g., reviews, blog posts, tweets) are increasingly multimodal in nature, with photos in addition to textual content. A question arises whether the visual component could be useful for sentiment analysis as well. In this work, we propose Visual Aspect Attention Network or VistaNet, leveraging both textual and visual components. We observe that in many cases, with respect to sentiment detection, images play a supporting role to text, highlighting the salient aspects of an entity, rather than expressing sentiments independently of the text. Therefore, instead of using visual information as features, VistaNet relies on visual information as alignment for pointing out the important sentences of a document using attention. Experiments on restaurant reviews showcase the effectiveness of visual aspect attention, vis-à-vis visual features or textual attention.
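A minimal sketch of this idea (not the authors' implementation; the layer sizes, the additive attention form, and the pooling over images are illustrative assumptions) is to let each image representation attend over sentence representations, so images select salient sentences rather than being fed in as extra features:

```python
# Sketch: visual aspect attention -- image vectors attend over sentence vectors.
# All dimensions and module names are assumptions for illustration only.
import torch
import torch.nn as nn


class VisualAspectAttention(nn.Module):
    def __init__(self, sent_dim=256, img_dim=512, att_dim=128, num_classes=5):
        super().__init__()
        self.proj_sent = nn.Linear(sent_dim, att_dim)
        self.proj_img = nn.Linear(img_dim, att_dim)
        self.score = nn.Linear(att_dim, 1)
        self.classifier = nn.Linear(sent_dim, num_classes)

    def forward(self, sent_vecs, img_vecs):
        # sent_vecs: (batch, n_sents, sent_dim)  sentence encodings
        # img_vecs:  (batch, n_imgs, img_dim)    image encodings (e.g., CNN features)
        s = self.proj_sent(sent_vecs).unsqueeze(1)           # (B, 1, S, A)
        v = self.proj_img(img_vecs).unsqueeze(2)             # (B, I, 1, A)
        scores = self.score(torch.tanh(s + v)).squeeze(-1)   # (B, I, S)
        alpha = torch.softmax(scores, dim=-1)                 # per-image attention over sentences
        # one attended document vector per image, then average over images
        doc_per_img = torch.einsum("bis,bsd->bid", alpha, sent_vecs)
        doc = doc_per_img.mean(dim=1)                          # (B, sent_dim)
        return self.classifier(doc)


model = VisualAspectAttention()
logits = model(torch.randn(2, 7, 256), torch.randn(2, 3, 512))
print(logits.shape)  # torch.Size([2, 5])
```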
In recent years, studies on automatic speech recognition (ASR) have shown outstanding results that reach human parity on short speech segments. However, there are still difficulties in standardizing the output of ASR, such as capitalization and punctuation restoration for long-speech transcription. These problems make it hard for readers to understand the ASR output semantically and also cause difficulties for downstream natural language processing models such as NER, POS tagging, and semantic parsing. In this paper, we propose a method to restore punctuation and capitalization in long-speech ASR transcription. The method is based on Transformer models and chunk merging, which allows us to (1) build a single model that performs punctuation and capitalization restoration in one pass, and (2) decode in parallel while improving prediction accuracy. Experiments on the British National Corpus showed that the proposed approach outperforms existing methods in both accuracy and decoding speed.
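The chunk-merging step can be sketched independently of the tagging model: the transcript is split into overlapping chunks, each chunk is tagged (in parallel), and overlapping predictions are merged by keeping the label produced farthest from a chunk boundary, where the model has the most context. The chunk and overlap sizes, the label format, and the exact merging rule below are assumptions for illustration, not the procedure from the paper:

```python
# Sketch of overlapping-chunk decoding with merging for long ASR transcripts.
# The Transformer tagger itself is abstracted as `predict_chunk`.
from typing import Callable, List


def restore_long_transcript(tokens: List[str],
                            predict_chunk: Callable[[List[str]], List[str]],
                            chunk_size: int = 64,
                            overlap: int = 16) -> List[str]:
    """Tag every token with a punctuation/capitalization label.

    Chunks overlap so that no token is predicted only near a chunk edge;
    overlapping predictions are merged by keeping the label that was
    produced farthest from a chunk boundary.
    """
    stride = chunk_size - overlap
    labels = [None] * len(tokens)
    best_margin = [-1] * len(tokens)  # distance of the kept label from a chunk edge

    for start in range(0, max(len(tokens) - overlap, 1), stride):
        chunk = tokens[start:start + chunk_size]
        chunk_labels = predict_chunk(chunk)  # one label per token, e.g. "COMMA|CAP"
        for i, lab in enumerate(chunk_labels):
            pos = start + i
            margin = min(i, len(chunk) - 1 - i)  # distance to nearest chunk edge
            if margin > best_margin[pos]:
                best_margin[pos] = margin
                labels[pos] = lab
    return labels


# Toy usage with a dummy tagger that predicts "O" (no change) for every token.
dummy = lambda chunk: ["O"] * len(chunk)
print(restore_long_transcript(["hello", "world", "how", "are", "you"], dummy))
```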
Speech-to-speech translation (S2ST) is a technology that translates speech across languages, which can remove barriers in cross-lingual communication. In conventional S2ST systems, the linguistic meaning of speech is translated, but paralinguistic information conveying other features of the speech, such as emotion or emphasis, is ignored. In this paper, we propose a method to translate paralinguistic information, specifically focusing on emphasis. The method consists of a series of components that can accurately translate emphasis using all acoustic features of speech. First, linear-regression hidden semi-Markov models (LR-HSMMs) are used to estimate a real-valued emphasis level for every word in an utterance, resulting in a sequence of values for the utterance. The emphasis translation module then translates the estimated emphasis sequence into a target-language emphasis sequence using a conditional random field (CRF) model that considers features of emphasis levels, words, and part-of-speech tags. Finally, the speech synthesis module synthesizes emphasized speech with LR-HSMMs, taking into account the translated emphasis sequence and transcription. The results indicate that our translation model can translate emphasis information, correctly emphasizing words in the target language with 91.6% F-measure in an objective evaluation. A listening test with human subjects further showed that they could identify the emphasized words with 87.8% F-measure, and that the naturalness of the audio was preserved.
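To make the middle step concrete, the following sketch treats only the emphasis-translation stage as a sequence-labeling problem: each target word is described by features (aligned source emphasis level, word, POS tag) and a CRF predicts a discretized target emphasis label. The feature set, the low/high discretization, and the use of the sklearn-crfsuite library are assumptions for illustration; the paper's full pipeline also includes LR-HSMM emphasis estimation and HSMM-based synthesis, which are not shown.

```python
# Sketch of the CRF-based emphasis translation step only (illustrative, not the
# paper's exact features or labels).
import sklearn_crfsuite


def word_features(sent, i):
    word, pos, src_emph = sent[i]
    return {
        "word": word.lower(),
        "pos": pos,
        "src_emphasis": f"{src_emph:.1f}",  # emphasis level estimated on the source side
        "prev_pos": sent[i - 1][1] if i > 0 else "BOS",
        "next_pos": sent[i + 1][1] if i < len(sent) - 1 else "EOS",
    }


# Toy training data: (target word, POS, aligned source emphasis) -> emphasis label.
train_sents = [
    [("i", "PRON", 0.1), ("really", "ADV", 0.9), ("like", "VERB", 0.2), ("it", "PRON", 0.1)],
]
train_labels = [["low", "high", "low", "low"]]

X = [[word_features(s, i) for i in range(len(s))] for s in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, train_labels)
print(crf.predict(X))  # e.g. [['low', 'high', 'low', 'low']]
```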