This paper describes our work on the development of an audio segmentation, classification, and clustering system applied to a Broadcast News task for the European Portuguese language. We developed a new algorithm for audio segmentation that is both accurate and requires fewer computational resources than other approaches. Our speaker clustering module uses a modified BIC algorithm that performs substantially better than the standard KL2 and is much faster than the full BIC. Finally, we developed a scheme for tagging certain speaker clusters (anchors) using trained cluster models. A series of tests showed the advantages of the new algorithms. This system is part of a prototype that processes the main news show of the national Portuguese broadcaster daily.
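The abstract does not detail the modified BIC criterion, but the standard BIC merge test it builds on can be illustrated. The sketch below, assuming NumPy and full-covariance Gaussian models, computes the ΔBIC value for two feature segments: a negative value favours modelling both segments with a single Gaussian, i.e. merging the clusters. The penalty weight `lam` and the use of full covariances are conventional choices, not details taken from the paper.

```python
import numpy as np

def delta_bic(x, y, lam=1.0):
    """BIC merge criterion for two acoustic feature segments.

    x, y: (n_frames, n_dims) arrays. Returns a negative value when a
    single Gaussian explains both segments well, suggesting that they
    come from the same speaker and the clusters should be merged.
    """
    n1, d = x.shape
    n2 = y.shape[0]
    z = np.vstack([x, y])
    # log-determinants of the full covariances under each hypothesis
    ld = np.linalg.slogdet(np.cov(z, rowvar=False))[1]
    ld1 = np.linalg.slogdet(np.cov(x, rowvar=False))[1]
    ld2 = np.linalg.slogdet(np.cov(y, rowvar=False))[1]
    # penalty for the extra parameters of the two-Gaussian model
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n1 + n2)
    return 0.5 * ((n1 + n2) * ld - n1 * ld1 - n2 * ld2) - penalty
```

Evaluating this criterion for every candidate pair is what makes full BIC expensive; fast variants typically prune candidate pairs or approximate the covariance terms before running the exact test.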
In this paper, we build a corpus of tweets from Twitter annotated with keywords using crowdsourcing methods. We identify key differences between this domain and others, such as news, that prevent existing approaches for automatic keyword extraction from generalizing well to Twitter data. These differences include the small amount of content in each tweet, the frequent use of lexical variants, and the high variance in the number of keywords present in each tweet. We propose methods for addressing these issues, which lead to solid improvements on this dataset for this task.
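The abstract does not specify the extraction methods proposed; as a point of reference, the sketch below shows the common TF-IDF baseline that such work typically improves on, assuming tweets arrive as token lists. It also illustrates why short texts are hard: with only a handful of tokens per tweet, term frequencies are nearly uniform and the IDF term carries almost all of the signal.

```python
import math
from collections import Counter

def tfidf_keywords(tweets, k=3):
    """Rank candidate keywords in each tweet by TF-IDF.

    tweets: list of token lists. Returns the top-k tokens per tweet,
    with ties broken alphabetically for determinism.
    """
    n = len(tweets)
    df = Counter()
    for toks in tweets:
        df.update(set(toks))  # document frequency: one count per tweet
    results = []
    for toks in tweets:
        tf = Counter(toks)
        scores = {t: tf[t] * math.log(n / df[t]) for t in tf}
        ranked = sorted(scores.items(), key=lambda p: (-p[1], p[0]))
        results.append([t for t, _ in ranked[:k]])
    return results
```

Note that a fixed `k` sidesteps, rather than solves, the high variance in keyword cardinality that the abstract identifies; methods addressing it would need to predict the cutoff per tweet.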
The subtitling of broadcast news programs is becoming a very attractive application due to the technological advances in Automatic Speech Recognition and associated technologies. However, building this kind of system requires advances both in the technological components and in the integration of the main blocks. In this paper, we present the overall architecture of a subtitling system running daily at RTP (the Portuguese public broadcast company). The goal is to integrate our components into a system for the subtitling of RTP programs. The global system covers the subtitling of both recorded and live programs.
In this article, we explore an event detection framework to improve multi-document summarization. Our approach is based on a two-stage single-document method that extracts a collection of key phrases, which are then used in a centrality-as-relevance passage retrieval model. We explore how to adapt this single-document method to multi-document summarization methods that are able to use event information. The event detection method is based on Fuzzy
In general, centrality-based retrieval models treat all elements of the retrieval space equally, which may reduce their effectiveness. In the specific context of extractive summarization (or important passage retrieval), this means that these models do not take into account that information sources often contain lateral issues, which are hardly as important as the description of the main topic, or are composed of mixtures of topics. We present a new two-stage method that starts by extracting a collection of key phrases that are then used to help the centrality-as-relevance retrieval model. We explore several approaches to the integration of the key phrases into the centrality model. The proposed method is evaluated using datasets that vary in noise (noisy vs. clean) and language (Portuguese vs. English). Results show that the best variant achieves relative performance improvements of about 31% on clean data and 18% on noisy data.
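The two-stage idea above can be sketched in miniature. This is not the paper's model: it assumes bag-of-words cosine similarity, plain degree centrality, and a simple linear interpolation (`alpha`) between centrality and key-phrase similarity as one illustrative way to bias the ranking toward the main topic.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def rank_passages(passages, key_phrases, alpha=0.5):
    """Stage 2 of a two-stage ranker: degree centrality over the
    passage graph, interpolated with similarity to the key phrases
    extracted in stage 1."""
    bows = [Counter(p.lower().split()) for p in passages]
    kp_bow = Counter(w for kp in key_phrases for w in kp.lower().split())
    scores = []
    for i, b in enumerate(bows):
        centrality = sum(cosine(b, o) for j, o in enumerate(bows)
                         if j != i) / max(len(bows) - 1, 1)
        scores.append((1 - alpha) * centrality + alpha * cosine(b, kp_bow))
    order = sorted(range(len(passages)), key=lambda i: -scores[i])
    return [passages[i] for i in order]
```

With `alpha=0`, this degenerates to the plain centrality model that treats all passages equally; raising `alpha` lets the key phrases demote lateral-issue passages that happen to be well connected.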
This paper describes our work on the development of a large vocabulary continuous speech recognition system applied to a Broadcast News task for the European Portuguese language in the scope of the ALERT project. We start by presenting the baseline recogniser AUDIMUS, which was originally developed with a corpus of read newspaper text. This is a hybrid system that uses a combination of phone probabilities generated by several MLPs trained on distinct feature sets. The paper details the modifications introduced in this system, namely the development of a new language model, the vocabulary and pronunciation lexicon, and the training on new data from the ALERT BN corpus currently available. The system trained with this BN corpus achieved 18.4% WER when tested on the F0 focus condition (studio, planned, native, clean), and 35.2% when tested on all focus conditions.
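The combination of phone probabilities from several MLPs can be sketched as follows. The abstract does not state the exact combination rule, so this sketch assumes weighted averaging of the per-frame posteriors in the log domain (a geometric mean when the weights are equal), followed by renormalisation, with NumPy as the only dependency.

```python
import numpy as np

def combine_posteriors(streams, weights=None):
    """Merge per-frame phone posteriors from several MLP streams.

    streams: list of (n_frames, n_phones) arrays, one per feature set.
    Averages the log-posteriors with the given weights (uniform by
    default) and renormalises so each frame sums to one.
    """
    streams = [np.asarray(s, dtype=float) for s in streams]
    if weights is None:
        weights = [1.0 / len(streams)] * len(streams)
    # small epsilon guards against log(0) for phones an MLP rules out
    log_mix = sum(w * np.log(s + 1e-12)
                  for w, s in zip(weights, streams))
    p = np.exp(log_mix)
    return p / p.sum(axis=-1, keepdims=True)
```

A property of the log-domain rule is that a stream assigning near-zero probability to a phone can veto it regardless of the other streams, which is often desirable when the feature sets capture complementary evidence.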