Differential Predictive Modeling for Racial Disparities in Breast Cancer

Palit, Indranil; Reddy, Chandan K.; Schwartz, Kendra

doi:10.1109/bibm.2009.89

Cited by 3 publications

(4 citation statements)

References 13 publications

“…The binary datasets are represented by the triplet (dataset, attributes, instances). The UCI datasets used are (blood, 5, 748), (liver, 6, 345), (diabetes, 8, 768), (gamma, 11, 19020), and (heart, 22, 267). Synthetic datasets used in our work have 500,000 to 1 million tuples.…”

Section: Results (mentioning)

confidence: 99%

“…(1) Dataset Distribution Differences: Despite the importance of the problem, only a small amount of work is available in describing the differences between two data distributions. Earlier approaches for measuring the deviation between two datasets used simple data statistics after decomposing the feature space into smaller regions using tree based models [22,12]. However, the final result obtained is a data-dependent measure and does not give any understanding about the features responsible for measuring that difference.…”

Section: Related Work (mentioning)

confidence: 99%

“…In other words, the experts want to find the locations where the difference in the predictive (cancer) models for the two racial groups is higher and the locations where such difference is negligible. Depending on such information, more health care initiatives will be organized in certain locations to reduce the racial discriminations in cancer patients [22].…”

Section: Introduction (mentioning)

confidence: 99%

Constrained Logistic Regression for Discriminative Pattern Mining

Anand

Reddy

2011

Machine Learning and Knowledge Discovery in Databases

Self Cite

Analyzing differences in multivariate datasets is a challenging problem. This topic was earlier studied by finding changes in the distribution differences either in the form of patterns representing conjunction of attribute value pairs or univariate statistical analysis for each attribute in order to highlight the differences. All such methods focus only on change in attributes in some form and do not implicitly consider the class labels associated with the data. In this paper, we pose the difference in distribution in a supervised scenario where the change in the data distribution is measured in terms of the change in the corresponding classification boundary. We propose a new constrained logistic regression model to measure such a difference between multivariate data distributions based on the predictive models induced on them. Using our constrained models, we measure the difference in the data distributions using the changes in the classification boundary of these models. We demonstrate the advantages of the proposed work over other methods available in the literature using both synthetic and real-world datasets.
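The idea in the abstract above, comparing two datasets through the classification boundaries of models trained on them, can be illustrated with a rough sketch. This is not the constrained logistic regression model proposed in the paper: it assumes plain L2-regularised logistic regression fitted by gradient descent, and it uses cosine distance between weight vectors as a stand-in measure of boundary change. All function names, parameters, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=2000, l2=1e-3):
    """Fit plain L2-regularised logistic regression by gradient descent.

    Assumption: an unconstrained stand-in, not the paper's constrained model.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))        # predicted probabilities
        grad = Xb.T @ (p - y) / len(y) + l2 * w    # mean log-loss gradient + L2
        w -= lr * grad
    return w

def boundary_difference(w_a, w_b):
    """Cosine distance between weight vectors (0 means same direction)."""
    cos = (w_a @ w_b) / (np.linalg.norm(w_a) * np.linalg.norm(w_b))
    return 1.0 - cos

def make_data(rng, true_w, n=500):
    """Draw a synthetic binary dataset whose labels follow a logistic model."""
    X = rng.normal(size=(n, 2))
    prob = 1.0 / (1.0 + np.exp(-(X @ true_w)))
    y = (rng.random(n) < prob).astype(float)
    return X, y

rng = np.random.default_rng(0)
X_a, y_a = make_data(rng, np.array([2.0, 0.0]))
X_b, y_b = make_data(rng, np.array([2.0, 0.0]))  # same underlying boundary as A
X_c, y_c = make_data(rng, np.array([0.0, 2.0]))  # rotated boundary

w_a = fit_logistic(X_a, y_a)
w_b = fit_logistic(X_b, y_b)
w_c = fit_logistic(X_c, y_c)

d_same = boundary_difference(w_a, w_b)  # datasets sharing a boundary
d_diff = boundary_difference(w_a, w_c)  # datasets with different boundaries
print(f"same boundary: {d_same:.3f}, different boundary: {d_diff:.3f}")
```

Under these assumptions, the distance for the two datasets drawn from the same model should come out much smaller than the distance for the pair with a rotated boundary, which is the kind of supervised difference signal the abstract describes.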

“…Haller et al. (2012); gene expression classification using SVM (Vanitha et al., 2015); ortholog detection in yeast species using imbalanced classification approaches including SVM (Galpert et al., 2015); evolutionary feature selection using SVM and other techniques using MapReduce (Peralta et al., 2015); genomic feature learning using the SVM-recursive feature elimination algorithm (Anaissi et al., 2016). Decision trees: employing decision tree learning for processing of large datasets (Hall et al., 1998); RainForest, a framework supporting construction of fast decision trees for classification of large datasets (Gehrke et al., 2000); predictive decision tree model for analysing racial disparities in breast cancer (Palit et al., 2009); a streaming parallel decision tree algorithm for classification of large-scale datasets and streaming data…”

Section: Logistic Regression (mentioning)

confidence: 99%

Emerging trend of big data analytics in bioinformatics: a literature review

Nagaraj

Sharvani

Sridhar

2018

IJBRA

Advancement of unparalleled data in bioinformatics over the years is a major concern for storage and management. Such massive data must be handled efficiently to disseminate knowledge. Computational advancements in information technology present feasible analytical solutions to process such data. In this context, the paper is an attempt to highlight the influence of big data in bioinformatics. Some of the concepts emphasised are definition of big data; architectural platforms supporting data analytics; followed by the application of above-mentioned analytical techniques towards complex problems in bioinformatics. The challenges and future prospects of big data analytics in bioinformatics are briefly discussed. This paper provides a comprehensive summary of several data analytical techniques available for bioinformatics researchers and computer scientists.

Adaptive Machine Learning Algorithm and Analytics of Big Genomic Data for Gene Prediction

Sarumi

Leung

2021

Intelligent Systems Reference Library
