Chuancong Gao scite author profile

Chuancong Gao

5 Publications

76 Citation Statements Received

37 Citation Statements Given

Affiliations

Simon Fraser University, Tsinghua University, Saarland University

Publications

Efficient mining of frequent sequence generators

Gao, Wang, et al. 2008

Sequential pattern mining has attracted great interest in the data mining research field in recent years. However, to the best of our knowledge, no existing work studies the problem of frequent sequence generator mining. In this paper we present a novel algorithm, FEAT (abbr. Frequent sEquence generATor miner), to perform this task. Experimental results show that FEAT is more efficient than traditional sequential pattern mining algorithms while generating a more concise result set, and that it is very effective for classifying Web product reviews.

Direct mining of discriminative patterns for classifying uncertain data

Gao, Wang 2010

Efficient itemset generator discovery over a stream sliding window

Gao, Wang 2009

Home sweet home: Quantifying home court advantages for NCAA basketball statistics

Bommel, Bornn, Chow-White, et al. 2021, JSA

Box score statistics are the baseline measures of performance for National Collegiate Athletic Association (NCAA) basketball. Between the 2011-2012 and 2015-2016 seasons, NCAA teams performed better at home than on the road in nearly all box score statistics across both genders and all three divisions. Using box score data from over 100,000 games spanning the three divisions for both women and men, we examine the factors underlying this discrepancy. The prevalence of neutral location games in the NCAA provides an additional angle through which to examine the gaps in box score statistic performance, which we believe has been underutilized in the existing literature. We also estimate a regression model to quantify the home court advantages for box score statistics after controlling for other factors such as number of possessions and team strength. Additionally, we examine the biases of scorekeepers and referees. We present evidence that scorekeepers tend to have greater home team biases when observing men compared to women, higher divisions compared to lower divisions, and stronger teams compared to weaker teams. Finally, we present statistically significant results indicating referee decisions are impacted by attendance, with larger crowds resulting in greater bias in favor of the home team.

Top-k interesting phrase mining in ad-hoc collections using sequence pattern indexing

Gao, Michel 2012

In this paper we consider the problem of mining frequently occurring interesting phrases in large document collections in an ad-hoc fashion. Ad-hoc refers to the ability to perform such analyses over text corpora that can be an arbitrary subset of a global set of documents. Most of the time, the identification of these ad-hoc document collections is driven by a user- or application-defined query with the aim of gathering statistics describing the sub-collection, as a starting point for further data analysis tasks. Our approach to mining the top-k most interesting phrases consists of a novel indexing technique, called Sequence Pattern Indexing (SeqPattIndex), that benefits from the observation that phrases often overlap sequentially. We devise a forest-based index for phrases and a further improved version with additional redundancy elimination power. The actual top-k phrase mining algorithm operating on these indices combines a simple merge join with ideas from the pattern-growth framework of the data mining community, making use of early termination and search space pruning techniques that enhance the runtime performance. Overall, our approach has on average a lower index space consumption as well as a lower runtime for the top-k phrase mining task, as we demonstrate in the experimental evaluation using real-world data.
