Paweł Cyrta scite author profile

Paweł Cyrta

4 Publications

23 Citation Statements Received

71 Citation Statements Given

Publications

Speaker Diarization Using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings

Cyrta, Trzciński, Stokowiec (2017)

In this paper we propose a new method of speaker diarization that uses a deep learning architecture to learn speaker embeddings. In contrast to traditional approaches, which build speaker embeddings from hand-crafted spectral features, we train a recurrent convolutional neural network for this purpose and apply it directly to magnitude spectrograms. To compare our approach with the state of the art, we collect and publicly release an additional dataset of over 6 hours of fully annotated broadcast material. Our evaluation on the new dataset and three other benchmark datasets shows that the proposed method significantly outperforms competing methods, reducing the diarization error rate by more than 30% with respect to the baseline.
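The diarization error rate (DER) is conventionally defined as the sum of false-alarm, missed-speech and speaker-confusion time divided by the total reference speech time. The sketch below illustrates that standard definition and how a relative reduction over a baseline is computed. It assumes the component durations have already been produced by a scoring tool; the class and function names and the example durations are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class DiarizationErrors:
    """Error durations in seconds, assumed to come from an external scoring tool."""
    false_alarm: float        # non-speech hypothesised as speech
    missed_speech: float      # reference speech hypothesised as non-speech
    speaker_confusion: float  # speech attributed to the wrong speaker
    total_reference: float    # total duration of reference speech


def diarization_error_rate(e: DiarizationErrors) -> float:
    """DER = (false alarm + missed speech + speaker confusion) / reference speech time."""
    if e.total_reference <= 0:
        raise ValueError("total_reference must be positive")
    return (e.false_alarm + e.missed_speech + e.speaker_confusion) / e.total_reference


def relative_reduction(baseline_der: float, system_der: float) -> float:
    """Fraction by which system_der improves on baseline_der (0.3 means 30%)."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return (baseline_der - system_der) / baseline_der


if __name__ == "__main__":
    # Hypothetical durations, for illustration only (not results from the paper).
    baseline = DiarizationErrors(30.0, 20.0, 50.0, 400.0)
    system = DiarizationErrors(20.0, 15.0, 30.0, 400.0)
    b, s = diarization_error_rate(baseline), diarization_error_rate(system)
    print(f"baseline DER={b:.4f}, system DER={s:.4f}, "
          f"relative reduction={relative_reduction(b, s):.0%}")
```

A phrase such as "over 30% with respect to the baseline" most naturally reads as a relative reduction, which is what `relative_reduction` computes; an absolute drop in percentage points would instead be `baseline_der - system_der`.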

Extracting Textual Overlays from Social Media Videos Using Neural Networks

Słucki, Trzciński, Bielski et al. (2018)

Textual overlays are often used in social media videos because viewers who watch them without sound would otherwise miss essential information conveyed in the audio stream. Extracting these overlays can therefore provide an important metadata source, e.g. for content classification or retrieval tasks. In this work, we present a robust method for extracting textual overlays from videos that builds on multiple neural network architectures. The proposed solution relies on several processing steps: keyframe extraction, text detection and text recognition. The main component of our system, the text recognition module, is inspired by a convolutional recurrent neural network architecture, and we improve its performance using a synthetically generated dataset of over 600,000 text images prepared by the authors specifically for this task. We also develop a filtering method that uses Levenshtein distance to reduce the number of overlapping text phrases, which further boosts the system's performance. The final accuracy of our solution exceeds 80%, on par with state-of-the-art methods.
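The abstract mentions filtering overlapping text phrases with Levenshtein distance but does not say how the distance is applied. The sketch below assumes a simple greedy pass: a recognized phrase is dropped when its length-normalized Levenshtein distance to an already kept phrase is at or below a threshold. The threshold of 0.3, the case-insensitive comparison and all function names are illustrative assumptions, not details from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (unit-cost insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the rolling row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if characters match)
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the length of the longer string."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def filter_overlapping(phrases: list[str], threshold: float = 0.3) -> list[str]:
    """Greedily keep phrases that are not near-duplicates of an already kept phrase.

    Hypothetical filtering rule: a phrase is treated as an overlap when its
    normalized distance to any kept phrase is <= threshold.
    """
    kept: list[str] = []
    for phrase in phrases:
        text = phrase.strip()
        if not text:
            continue  # skip empty recognitions
        if all(normalized_distance(text.lower(), k.lower()) > threshold for k in kept):
            kept.append(text)
    return kept


if __name__ == "__main__":
    # Hypothetical recognizer output from consecutive keyframes.
    recognized = [
        "Breaking news: storm hits city",
        "Breaking news: storm hits citv",   # OCR error on a later keyframe
        "Watch until the end",
        "Breaking news storm hits city",    # same overlay, missing colon
    ]
    print(filter_overlapping(recognized))
```

Normalizing by the longer string's length keeps one threshold meaningful for both short captions and long sentences; a raw edit distance would treat a single wrong character in a three-letter word the same as one in a thirty-character line.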

SocialML: machine learning for social media video creators

Trzciński, Bielski, Cyrta et al. (2018)

Preprint

Extracting textual overlays from social media videos using neural networks

Słucki, Trzciński, Bielski et al. (2018)

Preprint

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
