Shubhi Tyagi scite author profile

ReFinED: An Efficient Zero-shot-capable Approach to End-to-End Entity Linking

We introduce ReFinED, an efficient end-to-end entity linking model which uses fine-grained entity types and entity descriptions to perform linking. The model performs mention detection, fine-grained entity typing, and entity disambiguation for all mentions within a document in a single forward pass, making it more than 60 times faster than competitive existing approaches. ReFinED also surpasses state-of-the-art performance on standard entity linking datasets by an average of 3.7 F1. The model is capable of generalising to large-scale knowledge bases such as Wikidata (which has 15 times more entities than Wikipedia) and of zero-shot entity linking. The combination of speed, accuracy and scale makes ReFinED an effective and cost-efficient system for extracting entities from web-scale datasets, for which the model has been successfully deployed. Our code and pre-trained models are available at https://github.com/alexa/ReFinED.


Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems

Tyagi, Bonafonte, Lorenzo-Trueba et al. 2021

Preprint


Developing Text Normalization (TN) systems for Text-to-Speech (TTS) on new languages is hard. We propose a novel architecture to facilitate it for multiple languages while using training data less than 3% of the size of that used by the state-of-the-art results on English. We treat TN as a sequence classification problem and propose a granular tokenization mechanism that enables the system to learn the majority of the classes and their normalizations from the training data itself. This is further combined with minimal pre-coded linguistic knowledge for other classes. We publish the first results on TN for TTS in Spanish and Tamil and also demonstrate that the performance of the approach is comparable with the previous work done on English. All annotated datasets used for experimentation will be released at https://github.com/amazon-research/proteno.


Adenotonsillectomy in Arnold Chiari Malformation: Navigating Surgical Complexities

Shashidhar, Bajwa, Tyagi et al. 2023

Indian J Otolaryngol Head Neck Surg




Shubhi Tyagi

Dynamic Prosody Generation for Speech Synthesis Using Linguistics-Driven Acoustic Embedding Selection

ReFinED: An Efficient Zero-shot-capable Approach to End-to-End Entity Linking

Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems

Adenotonsillectomy in Arnold Chiari Malformation: Navigating Surgical Complexities
