Ying Sheng scite author profile

Ying Sheng

5 Publications

52 Citation Statements Received

67 Citation Statements Given

Affiliations

Huazhong University of Science and Technology, Beijing University of Technology, Google (United States)

Publications

FreeDOM: A Transferable Neural Architecture for Structured Information Extraction on Web Documents

Lin, Sheng, et al. 2020

Extracting structured data from HTML documents is a long-studied problem with a broad range of applications like augmenting knowledge bases, supporting faceted search, and providing domain-specific experiences for key verticals like shopping and movies. Previous approaches have either required a small number of examples for each target site or relied on carefully handcrafted heuristics built over visual renderings of websites. In this paper, we present a novel two-stage neural approach, named FreeDOM, which overcomes both these limitations. The first stage learns a representation for each DOM node in the page by combining both the text and markup information. The second stage captures longer range distance and semantic relatedness using a relational neural network. By combining these stages, FreeDOM is able to generalize to unseen sites after training on a small number of seed sites from that vertical without requiring expensive hand-crafted features over visual renderings of the page. Through experiments on a public dataset with 8 different verticals, we show that FreeDOM beats the previous state of the art by nearly 3.7 F1 points on average without requiring features over rendered pages or expensive hand-crafted features.

RiSER: Learning Better Representations for Richly Structured Emails

Kocayusufoglu, Sheng, et al. 2019

Anatomy of a Privacy-Safe Large-Scale Information Extraction System Over Email

Sheng, Tata, Wendt, et al. 2018

Extracting structured data from emails can enable several assistive experiences, such as reminding the user when a bill payment is due, answering queries about the departure time of a booked flight, or proactively surfacing an emailed discount coupon while the user is at that store. This paper presents Juicer, a system for extracting information from email that is serving over a billion Gmail users daily. We describe how the design of the system was informed by three key principles: scaling to a planet-wide email service, isolating the complexity to provide a simple experience for the developer, and safeguarding the privacy of users (our team and the developers we support are not allowed to view any single email). We describe the design tradeoffs made in building this system, the challenges faced and the approaches used to tackle them. We present case studies of three extraction tasks implemented on this platform (bill reminders, commercial offers, and hotel reservations) to illustrate the effectiveness of the platform despite challenges unique to each task. Finally, we outline several areas of ongoing research in large-scale machine-learned information extraction from email.

Research Sharing-Oriented Functional Neuroimaging Named Entity Recognition

Sheng, Lin, Gao, et al. 2019

A Multi-domain Named Entity Recognition Method Based on Part-of-Speech Attention Mechanism

Zhang, Sheng, Gao, et al. 2019
