2019
DOI: 10.1038/s41598-019-54519-x

TSLRF: Two-Stage Algorithm Based on Least Angle Regression and Random Forest in genome-wide association studies

Abstract: One of the most important tasks in genome-wide association studies (GWAS) is the detection of single-nucleotide polymorphisms (SNPs) that are associated with target traits. With the development of sequencing technology, traditional statistical methods struggle to analyze the resulting high-dimensional, massive SNP data. Recently, machine learning methods have become more popular in high-dimensional genetic data analysis because of their fast computation. However, most machine learning methods have s…

Cited by 8 publications (7 citation statements). References: 28 publications.
“…One of the study topics in statistical applications is the reduction of dimensionality since this is often the most effective method for working with large datasets of high complexity [7,8]. Over the last decade, the disciplines of machine learning and statistical methods have seen the introduction of many two-stage procedures, such as ranking variable selection (RVS) and classification algorithms, aimed at effectively tackling these challenges [5,9,10]. The application of RVS techniques in classification methodologies consistently improves performance by reducing data dimensionality through the exclusion of irrelevant genes (variables).…”
Section: Proposed Two-steps Procedures Of Classification High Dimensi... (mentioning)
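The two-stage pattern described in this quotation (rank the variables, keep the top few, then classify on the reduced set) can be sketched in plain Python. This is a minimal illustration built on assumptions of our own. Stage 1 ranks columns by absolute Pearson correlation with a binary class label, and stage 2 is a nearest-centroid classifier. Neither choice comes from the cited works, and the names `rank_variables` and `NearestCentroid` are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_variables(X: List[List[float]], y: List[int], k: int) -> List[int]:
    """Stage 1 (assumed): indices of the k columns with the largest |correlation| with y."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((abs(pearson(column, y)), j))
    scores.sort(key=lambda s: (-s[0], s[1]))  # strongest first, ties broken by index
    return [j for _, j in scores[:k]]


class NearestCentroid:
    """Stage 2 (assumed): predict the class whose mean vector on the kept columns is closest."""

    def fit(self, X: List[List[float]], y: List[int], cols: List[int]) -> "NearestCentroid":
        self.cols = cols
        self.centroids: Dict[int, List[float]] = {}
        for label in set(y):
            rows = [r for r, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
        return self

    def predict(self, X: List[List[float]]) -> List[int]:
        preds = []
        for r in X:
            x = [r[j] for j in self.cols]
            # Squared Euclidean distance to each class centroid; pick the smallest.
            preds.append(min(self.centroids,
                             key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))))
        return preds


# Toy data: columns 0 and 1 carry the class signal; columns 2..51 are random noise.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[float(t), float(1 - t)] + [rng.random() for _ in range(50)] for t in y]
kept = rank_variables(X, y, k=2)
model = NearestCentroid().fit(X, y, kept)
predictions = model.predict(X)
```

Ranking on the full data and then scoring the classifier on the same data gives optimistic accuracy. In practice, the ranking step would be repeated inside each cross-validation fold.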
“…Thus, the proposed methods apply to the multi-pollutant framework, and the method used should be based [on the] goal of the study, be it a prediction, effect estimation, or screening for significant predictors and their interactions. Jiali S. [12] proposed a two-stage algorithm based on Least Angle Regression (LARS) and Random Forests. They concluded that the proposed method significantly improved the model fitting and variable selection, requiring less calculation time.…”
Section: Zou and Hastie (mentioning)
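The snippet below is a rough sketch of the LARS-then-random-forest idea mentioned in this quotation. It assumes scikit-learn's `Lars` for stage 1, which keeps the first `n_screen` SNPs to enter the LARS path, and `RandomForestRegressor` for stage 2, which ranks those survivors by impurity-based importance. It is not the TSLRF authors' implementation. The function name `two_stage_lars_rf`, the default `n_screen=20`, the additive 0/1/2 genotype coding, and the simulated data are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lars


def two_stage_lars_rf(X, y, n_screen=20, n_trees=200, seed=0):
    """Hypothetical two-stage screen: LARS keeps n_screen SNPs, a random forest ranks them.

    Returns (snp_indices, importances), both sorted by decreasing importance.
    """
    # Stage 1: stop the LARS path once n_screen coefficients are non-zero.
    lars = Lars(n_nonzero_coefs=n_screen)
    lars.fit(X, y)
    kept = np.flatnonzero(lars.coef_)

    # Stage 2: fit a forest on the retained columns only and rank them by
    # impurity-based importance (these values sum to 1 across the kept columns).
    forest = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    forest.fit(X[:, kept], y)
    order = np.argsort(forest.feature_importances_)[::-1]
    return kept[order], forest.feature_importances_[order]


# Simulated data (assumption): 300 individuals and 1,000 SNPs coded 0/1/2.
# The quantitative trait is driven additively by SNPs 0-4 plus Gaussian noise.
rng = np.random.default_rng(0)
n, p = 300, 1000
X = rng.integers(0, 3, size=(n, p)).astype(float)
y = X[:, :5].sum(axis=1) + rng.normal(0.0, 1.0, size=n)

ranked, importance = two_stage_lars_rf(X, y)
print("top SNPs:", ranked[:5], "importances:", np.round(importance[:5], 3))
```

In this sketch the forest grows its trees on at most 20 columns rather than all 1,000. That is where a two-stage design of this kind can save computation compared with fitting the forest on every SNP.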
“…This approach of detecting key intersecting categories in a training sample and inputted into a subsequent analysis sample has been used in genome research (for which extremely high-dimensional data are the norm) for detecting the intersecting combinations of genes that predict diseases. Examples include Khan et al (2020), Sun et al (2019) and Wright et al (2016). Even if key predictor interactions could be detected by machine learning, it does not seem possible to determine whether any statistically unimportant effects are due to true absences of effects or an underpowered analysis due to few observations in the intersecting categories.…”
Section: Machine Learning and Big Data (mentioning)
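A small pure-Python sketch can illustrate the split-sample idea in this quotation: find a key intersecting category in a training sample, then estimate its effect in a separate analysis sample. The effect measure (mean trait in the intersecting cell minus mean trait in all other observations) and the helper names `cell_effect` and `detect_pair` are hypothetical choices for illustration. They are not the procedure of any of the cited studies. The returned cell size makes the quotation's power concern concrete, because the analysis-sample estimate rests only on the observations that fall in the intersecting category.

```python
import itertools
import random
from typing import List, Optional, Sequence, Tuple


def cell_effect(X: List[Sequence[int]], y: List[float], i: int, j: int,
                min_cell: int = 5) -> Optional[Tuple[float, int]]:
    """Mean trait where x_i == x_j == 1, minus the mean trait everywhere else.

    Returns (effect, cell_size), or None if either group has fewer than min_cell rows.
    """
    inside = [t for row, t in zip(X, y) if row[i] == 1 and row[j] == 1]
    outside = [t for row, t in zip(X, y) if not (row[i] == 1 and row[j] == 1)]
    if len(inside) < min_cell or len(outside) < min_cell:
        return None
    return sum(inside) / len(inside) - sum(outside) / len(outside), len(inside)


def detect_pair(X: List[Sequence[int]], y: List[float], min_cell: int = 5) -> Tuple[int, int]:
    """Training stage: the feature pair whose intersecting cell has the largest |effect|."""
    best_score, best_pair = -1.0, None
    for i, j in itertools.combinations(range(len(X[0])), 2):
        result = cell_effect(X, y, i, j, min_cell)
        if result is not None and abs(result[0]) > best_score:
            best_score, best_pair = abs(result[0]), (i, j)
    if best_pair is None:
        raise ValueError("no pair has enough observations in its intersecting cell")
    return best_pair


# Toy data: 8 replicates of a full 2^4 design of binary carrier indicators.
# The trait rises by 3 only when features 0 and 1 are both present.
rng = random.Random(42)
X, y = [], []
for _ in range(8):
    for combo in itertools.product((0, 1), repeat=4):
        X.append(list(combo))
        y.append(3.0 * combo[0] * combo[1] + rng.gauss(0.0, 0.3))

train_X, train_y = X[:64], y[:64]        # first 4 replicates: detection sample
analysis_X, analysis_y = X[64:], y[64:]  # last 4 replicates: analysis sample

pair = detect_pair(train_X, train_y)
effect, n_cell = cell_effect(analysis_X, analysis_y, *pair)
```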