2022
DOI: 10.24215/16666038.22.e14
Statistical analysis of the performance of four Apache Spark ML algorithms

Abstract: Feature selection (FS) techniques generally require repeatedly training and evaluating models to assess the importance of each feature for a particular task. However, due to the increasing size of currently available databases, distributed processing has become a necessity for many tasks. In this context, the Apache Spark ML library is one of the most widely used libraries for performing classification and other tasks with large datasets. Therefore, knowing both the predictive performance and efficiency of its mai…
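The abstract's opening point is that feature selection repeatedly trains and evaluates models to score each feature. A minimal leave-one-feature-out sketch in plain Python makes that cost concrete: with n features it runs n + 1 train/evaluate cycles. This is a hypothetical illustration, not the paper's method and not the Spark ML API. The nearest-centroid classifier, the accuracy metric, the toy data, and every function name (`train_centroids`, `predict`, `accuracy`, `drop_one_importance`) are assumptions chosen only to keep the example self-contained.

```python
from typing import Dict, List, Sequence

Vector = List[float]

def train_centroids(X: Sequence[Vector], y: Sequence[int]) -> Dict[int, Vector]:
    """Hypothetical stand-in model: one mean vector (centroid) per class."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(model: Dict[int, Vector], row: Vector) -> int:
    """Pick the class whose centroid has the smallest squared Euclidean distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, model[c])))

def accuracy(model: Dict[int, Vector], X: Sequence[Vector], y: Sequence[int]) -> float:
    """Fraction of rows whose predicted class matches the true label."""
    correct = sum(predict(model, row) == label for row, label in zip(X, y))
    return correct / len(y)

def select(X: Sequence[Vector], cols: List[int]) -> List[Vector]:
    """Keep only the given feature columns."""
    return [[row[j] for j in cols] for row in X]

def drop_one_importance(X_train, y_train, X_test, y_test) -> List[float]:
    """Importance of feature j = baseline accuracy - accuracy without feature j.
    Each feature costs one extra full train/evaluate cycle."""
    all_cols = list(range(len(X_train[0])))
    base = accuracy(train_centroids(X_train, y_train), X_test, y_test)
    scores = []
    for j in all_cols:
        cols = [k for k in all_cols if k != j]
        model = train_centroids(select(X_train, cols), y_train)
        scores.append(base - accuracy(model, select(X_test, cols), y_test))
    return scores

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X_train = [[0.0, 5.0], [0.2, 1.0], [1.0, 5.0], [1.2, 1.0]]
y_train = [0, 0, 1, 1]
X_test = [[0.1, 4.0], [1.1, 2.0]]
y_test = [0, 1]

scores = drop_one_importance(X_train, y_train, X_test, y_test)
print(scores)  # → [0.5, 0.0]
```

Dropping the informative feature lowers test accuracy from 1.0 to 0.5, while dropping the noise feature changes nothing. In a distributed setting each of those cycles would be a full model fit over the cluster, which is why the efficiency of the underlying library matters for FS.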

Cited by 1 publication (1 citation statement)
“…Also, the dataset used in [50] was from the SEER repository. In [51], the high-dimensional dataset for cancer prediction is used.…”
Section: Types Of Datasets Used In Related Work (mentioning)
confidence: 99%