Large-scale learning problems require a plethora of labels that can be efficiently collected from crowdsourcing services at low cost. However, labels annotated by crowdsourced workers are often noisy, which inevitably degrades the performance of large-scale optimization methods, including the prevalent stochastic gradient descent (SGD). Specifically, these noisy labels adversely affect updates of the primal variable in conventional SGD. To address this challenge, we propose a robust SGD mechanism called progressive stochastic learning (POSTAL), which naturally integrates the learning regime of curriculum learning (CL) with the update process of vanilla SGD. Our inspiration comes from the progressive learning process of CL, namely learning from "easy" tasks to "complex" tasks. Through the robust learning process of CL, POSTAL aims to yield robust updates of the primal variable on an ordered label sequence, namely, from "reliable" labels to "noisy" labels. To realize the POSTAL mechanism, we design a cluster of "screening losses," which sorts all labels from the reliable region to the noisy region. To sum up, POSTAL using screening losses ensures robust updates of the primal variable on reliable labels first, then on noisy labels incrementally until convergence. In theory, we derive the convergence rate of POSTAL realized by screening losses. Meanwhile, we provide a robustness analysis of representative screening losses. Experimental results on UCI (University of California Irvine) simulated and Amazon Mechanical Turk crowdsourcing data sets show that POSTAL using screening losses is more effective and robust than several existing baselines.
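The ordering idea above can be sketched in code. This is a minimal illustration, not the paper's implementation: it assumes a truncated (clipped) hinge loss as a stand-in for the paper's screening losses, a linear model, and hypothetical function names (`screening_loss`, `postal_sgd`). Each epoch sorts samples from low screening loss ("reliable") to high ("noisy") and performs SGD updates in that order; the clipping bounds the influence of badly mislabeled samples.

```python
import numpy as np

def screening_loss(margin, tau=2.0):
    """Truncated hinge loss: min(max(0, 1 - margin), tau).
    Bounded above, so samples with large negative margins (likely
    label noise) cannot dominate. A stand-in for the paper's
    screening losses (assumption, not the authors' exact choice)."""
    return np.minimum(np.maximum(0.0, 1.0 - margin), tau)

def postal_sgd(X, y, epochs=20, lr=0.1, tau=2.0):
    """Progressive SGD sketch: each epoch, order samples from
    reliable (low screening loss) to noisy (high), then update a
    linear model w on that ordered sequence."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        order = np.argsort(screening_loss(margins, tau))  # reliable first
        for i in order:
            m = y[i] * (X[i] @ w)
            # Subgradient of the truncated hinge is nonzero only on
            # 1 - tau < m < 1; strongly misclassified (noisy) samples
            # outside this band contribute no update.
            if 1.0 - tau < m < 1.0:
                w += lr * y[i] * X[i]
    return w
```

On synthetic data with a fraction of flipped labels, the clipped update band is what gives the robustness: once `w` is roughly correct, flipped labels sit at large negative margins and stop generating gradients.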
Early detection and treatment are regarded as the most effective ways to prevent suicidal ideation and potential suicide attempts—two critical risk factors resulting in successful suicides. Online communication channels are becoming a new way for people to express their suicidal tendencies. This paper presents an approach to understanding suicidal ideation through online user-generated content, with the goal of early detection via supervised learning. Analysing users' language preferences and topic descriptions reveals rich knowledge that can be used as an early warning system for detecting suicidal tendencies. Suicidal individuals express strong negative feelings, anxiety, and hopelessness; their thoughts often involve family and friends; and the topics they discuss cover both personal and social issues. To detect suicidal ideation, we extract several informative sets of features, including statistical, syntactic, linguistic, word embedding, and topic features, and we compare six classifiers, including four traditional supervised classifiers and two neural network models. An experimental study demonstrates the feasibility and practicability of the approach and provides benchmarks for suicidal ideation detection on two active online platforms: Reddit SuicideWatch and Twitter.
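The feature-based pipeline described above can be sketched as follows. This is a toy illustration under loud assumptions: the tiny `NEGATIVE` and `SOCIAL` lexicons are hypothetical stand-ins for the statistical and linguistic feature sets in the paper (real systems would use LIWC-style dictionaries, word embeddings, and topic models), and plain logistic regression stands in for the six classifiers being compared.

```python
import numpy as np

# Hypothetical mini-lexicons (stand-ins for the paper's feature resources).
NEGATIVE = {"hopeless", "worthless", "alone", "pain", "die", "end"}
SOCIAL = {"family", "friends", "mother", "father"}

def extract_features(text):
    """Simple statistical/linguistic features for one post:
    negative-lexicon rate, social-lexicon rate, first-person-pronoun
    rate, and log length."""
    toks = text.lower().split()
    n = max(len(toks), 1)
    return np.array([
        sum(t in NEGATIVE for t in toks) / n,
        sum(t in SOCIAL for t in toks) / n,
        sum(t in {"i", "me", "my"} for t in toks) / n,
        np.log1p(len(toks)),
    ])

def train_logreg(X, y, lr=0.5, epochs=500):
    """Plain full-batch logistic regression (one stand-in classifier)."""
    X = np.c_[X, np.ones(len(X))]          # append bias column
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)   # gradient of log loss
    return w

def predict(w, texts):
    X = np.array([extract_features(t) for t in texts])
    X = np.c_[X, np.ones(len(X))]
    return (1.0 / (1.0 + np.exp(-(X @ w))) > 0.5).astype(int)
```

In practice, each classifier in the comparison would be trained and evaluated on the same feature matrix, with held-out data rather than the training set used for the benchmark numbers.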