Purpose of study: Small sample size is the most common limitation restricting the generalizability of research results, and this is true of many fields, including language testing. The current study seeks to show how well samples of different sizes predict the population mean, in order to determine the minimum sample size that can be considered adequate for a language test.
Methodology: The data for this quantitative study were 5,250 paper-based TOEFL test scores, treated as the population. The test comprises listening, structure, and reading sections, and it is the standardized test most familiar to EFL researchers. Because it is objectively scored, it leaves little room for biased scores. Scores ranged from 417 on the TOEFL scale (30.7%) to 653 (95.7%). The standard error was used as the criterion for deciding the proper sample size: the cut-off point was the sample size beyond which adding more cases produced no obvious change in the standard error. To locate this cut-off point, we used hierarchical agglomerative clustering with three clusters, a number chosen by majority rule across 30 indices.
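The cut-off procedure described in the methodology can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code. The population is synthetic (a hypothetical mean of 520 and SD of 50, clipped to the 310 to 677 paper-based TOEFL scale). The standard error is assumed to be the standard error of the mean, s/√n, averaged over repeated random draws. The linkage is Ward's, because the abstract does not name one. The number of clusters is fixed at three rather than voted on by 30 indices. The cut-off is read as the smallest n in the lowest-SE cluster. All function names are hypothetical.

```python
import math
import random

# HYPOTHETICAL population: stands in for the study's 5,250 TOEFL scores.
# Normal scores clipped to the paper-based TOEFL range 310-677; the mean,
# SD, and seed are illustrative assumptions, not values from the study.
def make_population(size=5250, mean=520.0, sd=50.0, seed=1):
    rng = random.Random(seed)
    return [min(677.0, max(310.0, rng.gauss(mean, sd))) for _ in range(size)]


def sample_sd(xs):
    """Sample standard deviation with the n - 1 denominator."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def se_curve(population, n_max=120, reps=50, seed=2):
    """Average standard error of the mean, s / sqrt(n), for n = 2..n_max.

    Each n is drawn `reps` times without replacement and the SEs are
    averaged to smooth the curve (an assumption of this sketch).
    """
    rng = random.Random(seed)
    curve = []
    for n in range(2, n_max + 1):
        ses = [sample_sd(rng.sample(population, n)) / math.sqrt(n)
               for _ in range(reps)]
        curve.append((n, sum(ses) / reps))
    return curve


def ward_clusters(values, k=3):
    """Naive agglomerative clustering of 1-D values with Ward's criterion.

    Starts with one cluster per value and repeatedly merges the pair whose
    merge increases the within-cluster sum of squares the least, until k
    clusters remain. Returns lists of indices into `values`.
    """
    clusters = [[i] for i in range(len(values))]
    while len(clusters) > k:
        cents = [sum(values[i] for i in c) / len(c) for c in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * (cents[a] - cents[b]) ** 2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is still valid
    return clusters


def cutoff(curve, k=3):
    """Return (first n, last n) of the cluster with the lowest mean SE.

    ASSUMPTION: the "flat" part of the curve is the low-SE cluster, and its
    smallest n is taken as the point where extra cases stop changing SE.
    """
    ns = [n for n, _ in curve]
    ses = [se for _, se in curve]
    groups = ward_clusters(ses, k)
    flat = min(groups, key=lambda g: sum(ses[i] for i in g) / len(g))
    members = sorted(ns[i] for i in flat)
    return members[0], members[-1]


if __name__ == "__main__":
    curve = se_curve(make_population())
    lo, hi = cutoff(curve)
    print(f"low-SE cluster spans n = {lo}..{hi}; candidate cut-off n = {lo}")
```

Because the population here is synthetic, the cut-off this sketch prints says nothing about the study's reported value; with real data, the actual score list would be passed to `se_curve` in place of `make_population()`.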
Main Findings: The cut-off point was found at a sample size of 52, within a range of 46 to 59. Taking the lower bound of this range, we conclude that the minimum proper sample size for a research study involving a language test is n = 46.
Application of this study: The results of this study apply to English language teaching and testing, although they may also apply to tests in other languages.
Novelty/Originality of this study: The result of this study should be treated as statistical evidence for a proper sample size, helping language-teaching researchers who analyze test scores avoid inaccurate or conflicting results.