With the development of modern information and communication technologies (ICTs), Industry 4.0 has driven the growth of big data analysis, natural language processing (NLP), and artificial intelligence (AI). Corpus analysis is one form of big data analysis. In many studies that adopt statistics-based corpus techniques to analyze English for specific purposes (ESP), researchers extract critical information by retrieving domain-oriented lexical units. However, even though corpus software implements measures such as log-likelihood tests, log ratios, and BIC scores, the machine still cannot understand linguistic meaning. In many ESP cases, function words reduce the efficiency of corpus analysis, yet many studies still eliminate them manually. Manual annotation is inefficient and time-consuming, and it can easily distort information. To enhance the efficiency of big textual data analysis, this paper proposes a novel statistics-based corpus machine processing approach for refining big textual data. This paper uses COVID-19 news reports as a simulated example of big textual data to verify the efficacy of the machine optimization process. The refined data show that the proposed approach can rapidly remove function words and meaningless words by machine processing and provide decision-makers with domain-specific corpus data for further use.
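The abstract does not spell out its refinement pipeline. The following is a minimal Python sketch, assuming a stoplist-based filter (the `FUNCTION_WORDS` set is hypothetical and heavily abbreviated) combined with the standard keyness measures the abstract names: log-likelihood (G2), Log Ratio, and a BIC approximation (LL minus ln N for one degree of freedom). The 3.84 cut-off is a conventional choice (p < 0.05, 1 df), not one taken from the paper.

```python
import math

# Hypothetical, abbreviated stoplist; a real function word list is much longer.
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was", "for", "on", "that", "with"}


def log_likelihood(a: int, b: int, c1: int, c2: int) -> float:
    """G2 for a word seen a times in a target corpus of c1 tokens
    and b times in a reference corpus of c2 tokens."""
    e1 = c1 * (a + b) / (c1 + c2)  # expected frequency in the target corpus
    e2 = c2 * (a + b) / (c1 + c2)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def log_ratio(a: int, b: int, c1: int, c2: int, zero_adj: float = 0.5) -> float:
    """Binary log of the ratio of relative frequencies; zero counts are
    replaced by a small adjustment so the logarithm stays defined."""
    a_adj = a if a > 0 else zero_adj
    b_adj = b if b > 0 else zero_adj
    return math.log2((a_adj / c1) / (b_adj / c2))


def bic(ll: float, c1: int, c2: int) -> float:
    """BIC approximation with one degree of freedom: LL - ln(N)."""
    return ll - math.log(c1 + c2)


def keywords(target: dict, reference: dict, min_ll: float = 3.84) -> list:
    """Return (word, LL, Log Ratio, BIC) rows for positive keywords,
    after removing function words by machine rather than by hand."""
    c1, c2 = sum(target.values()), sum(reference.values())
    rows = []
    for word, a in target.items():
        if word.lower() in FUNCTION_WORDS:
            continue  # machine-based function word removal
        b = reference.get(word, 0)
        ll = log_likelihood(a, b, c1, c2)
        # keep words over-used in the target corpus and above the LL cut-off
        if a / c1 > b / c2 and ll >= min_ll:
            rows.append((word, round(ll, 2), round(log_ratio(a, b, c1, c2), 2), round(bic(ll, c1, c2), 2)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


target = {"the": 50, "virus": 30, "vaccine": 20}
reference = {"the": 60, "virus": 1, "market": 39}
print(keywords(target, reference))
```

Because the stoplist is applied before any statistic is computed, high-frequency function words such as "the" never reach the keyword list, which is the manual step the abstract proposes to automate.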
Purpose: Knowledge, attitude, and practice (KAP) models are often used by public health researchers to explore people's health behaviors. Accordingly, this study explored the relationships among participants' sociodemographic status, COVID-19 knowledge, affective attitudes, and preventive behaviors. Method: This cross-sectional study used an online survey of 340 participants (136 males and 204 females) to investigate the relationships among gender, age, COVID-19 knowledge, positive affective attitudes (emotional wellbeing, psychological wellbeing, and social wellbeing), negative affective attitudes (negative self-perception and negative perceptions of life), and preventive behaviors (hygiene habits, reducing public activities, and helping others to prevent the epidemic). Results: Most participants were knowledgeable about COVID-19: the mean COVID-19 knowledge score was 12.86 (SD = 1.34; range: 7–15; maximum possible score: 15), indicating a high level of knowledge. However, the key factor in whether participants adopted COVID-19 preventive behaviors was their affective attitudes, especially positive affective attitudes (β = 0.18 to 0.25, p < 0.01), rather than their COVID-19 disease knowledge (β = −0.01 to 0.08, p > 0.05). In addition, preventive behaviors differed clearly by sociodemographic status. Females showed better preventive behaviors than males in following epidemic-prevention hygiene habits (t = −5.08, p < 0.01), reducing public activities (t = −3.00, p < 0.01), and helping others to prevent the epidemic (t = −1.97, p < 0.05). Older participants were more inclined to adopt preventive behaviors, including epidemic-prevention hygiene habits (β = 0.18, p = 0.001, R² = 0.03), reducing public activities (β = 0.35, p < 0.001, R² = 0.13), and helping others to prevent the epidemic (β = 0.27, p < 0.001, R² = 0.07).
Conclusions: Having adequate COVID-19 knowledge was not linked to greater engagement in precautionary behaviors. Attitudes toward COVID-19 may play a more critical role in prompting individuals to undertake preventive behaviors, and different positive affective attitudes had different predictive relationships with preventive behaviors.
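The abstract reports standardized coefficients (β), R², and independent-samples t values. A minimal sketch of how such figures are obtained, assuming a single-predictor regression (for example, age alone predicting one behavior score) and Student's pooled-variance t-test; the paper's exact model specification is not given here, and the numbers in any usage are toy values. With one predictor, β equals Pearson's r, so R² equals β², which is roughly what the reported age coefficients show (for example, 0.27² ≈ 0.07).

```python
import math
import statistics


def standardized_beta(x, y):
    """Standardized slope of a single-predictor regression of y on x.
    With one predictor this equals Pearson's r, so R^2 = beta^2."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    n = len(x)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    return cov / (sx * sy)


def independent_t(group1, group2):
    """Student's independent-samples t statistic with pooled variance.
    A negative value means group1's mean is lower than group2's."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(group1) - statistics.mean(group2)) / se
```

Read this way, the negative t values in the abstract simply indicate that the group entered first (males) had the lower mean score.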
The COVID-19 epidemic has been confirmed as the largest-scale outbreak of atypical pneumonia since the severe acute respiratory syndrome (SARS) outbreak in 2003, and it has become a public health emergency of international concern. It has exacerbated public confusion and anxiety, and its impact on people needs to be better understood. Indeed, prior meta-analyses of longitudinal cohort studies comparing mental health before versus during the COVID-19 pandemic showed that public health policies (e.g., city lockdowns, quarantines, and avoiding gatherings) and COVID-19-related information circulating on new media platforms directly affected citizens' mental health and well-being. Hence, this research explores Taiwanese people's health status, anxiety, media sources for obtaining COVID-19 information, subjective well-being, and safety-seeking behavior during the COVID-19 epidemic, and how these factors are associated. Online surveys were conducted through new media platforms, and 342 responses were included in the analysis. The results indicate that participants experienced different aspects of COVID-19 anxiety, including COVID-19 worry and perceived COVID-19 risk. Among the media sources examined, the more participants searched for COVID-19 information on new media, the more they worried about COVID-19. Furthermore, COVID-19 worry was positively related to safety-seeking behavior, while perceived COVID-19 risk was negatively related to subjective well-being. This paper concludes by offering suggestions for future studies and pointing out the limitations of the present study.
A corpus is a massive body of structured textual data that is stored and processed electronically. It is usually combined with statistics, machine learning algorithms, or artificial intelligence (AI) technologies to explore the semantic relationships between lexical units, and it is beneficial when applied to language learning, information processing, translation, and so forth. In the face of a novel disease such as COVID-19, establishing medical-
Many education systems worldwide adopt an English proficiency test (EPT) as an effective mechanism for evaluating the comprehension levels of English as a Foreign Language (EFL) speakers. Similarly, Taiwan's military academy developed the Military Online English Proficiency Test (MOEPT) to assess EFL cadets' English comprehension levels. However, the difficulty level of MOEPT has not yet been assessed, information that would help facilitate future updates of its test banks and improve EFL pedagogy and learning. Moreover, it is almost impossible to carry out such an investigation effectively using previous corpus-based approaches. Hence, based on the lexical threshold theory, this research adopts a corpus-based approach to detect the difficulty level of MOEPT. The function word list and the Taiwan College Entrance Examination Center (TCEEC) word list (which includes the Common European Framework of Reference for Languages (CEFR) A2 and B1 level word lists) are adopted as the criteria for classifying the lexical items. The results show that MOEPT mainly corresponds to English for General Purposes (EGP) at the CEFR A2 level (lexical coverage = 74.46%). The findings offer implications for the academy's management and faculty in regulating the difficulty and content of MOEPT in the future, developing suitable EFL curricula and learning materials, and conducting remedial teaching for cadets who cannot pass MOEPT. By doing so, the overall English comprehension level of EFL cadets is expected to improve.
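Lexical coverage here means the percentage of running words (tokens) in a text that belong to a given word list. A minimal sketch, assuming the lists are checked in priority order (function words first, then A2, then B1) so that each token is counted once; the tiny lists used with it would be hypothetical stand-ins for the real function word and TCEEC lists, and the ordering rule is an assumption rather than the paper's stated procedure.

```python
from collections import Counter


def lexical_coverage(tokens, word_lists):
    """Percentage of running words covered by each word list.
    Lists are checked in insertion order and each token is assigned to
    the first list that contains it; unmatched tokens go to 'off-list'."""
    counts = Counter(t.lower() for t in tokens)
    total = sum(counts.values())
    covered = {name: 0 for name in word_lists}
    covered["off-list"] = 0
    for word, freq in counts.items():
        for name, words in word_lists.items():
            if word in words:
                covered[name] += freq
                break
        else:
            covered["off-list"] += freq  # no list matched this word
    return {name: 100 * n / total for name, n in covered.items()}


# Hypothetical miniature lists for illustration only.
lists = {
    "function": {"the", "on"},
    "A2": {"cat", "sat", "mat"},
    "B1": {"quickly"},
}
print(lexical_coverage("the cat sat on the mat quickly zebra".split(), lists))
```

Checking lists in a fixed order avoids double-counting a word that appears on more than one list, so the percentages always sum to 100.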
In the new era of artificial intelligence (AI), the education industry should develop toward intelligence and digitalization. In evaluating learners' academic performance, an English high-stakes test is not merely a means of measuring what English as a Foreign Language (EFL) stakeholders know or do not know; it is also likely to bring life-changing consequences. Hence, effective preparation for English high-stakes tests is crucial for those whose futures depend on attaining a particular score. However, traditional corpus-based approaches cannot simultaneously take a word's frequency and range into consideration when evaluating its importance, which makes their word-sorting results inaccurate. Thus, to effectively and accurately extract the critical words in English high-stakes tests and enhance EFL stakeholders' test performance, this paper integrates a corpus-based approach and a revised Importance-Performance Analysis (IPA) method to develop a novel frequency-range analysis (FRA) method. The Taiwan College Entrance Exam of English Subject (TCEEES) from 2001 to 2022 is adopted as an empirical case of an English high-stakes test and as the target corpus for verification. Results indicate that the critical words identified by the FRA method are concentrated in Quadrant I, which includes 1,576 word types accounting for over 60% of the running words in the TCEEES corpus. Compared with three traditional corpus-based approaches and the Term Frequency-Inverse Document Frequency (TF-IDF) method, the FRA method offers three significant contributions: (1) it uses a machine-based function word elimination technique to enhance efficiency; (2) it simultaneously takes words' frequency and range into consideration; and (3) it effectively conducts cluster analysis by categorizing words into four quadrants based on their relative importance.
The results will give EFL stakeholders a clearer picture of how to allocate their learning time and educational resources to acquiring critical words.
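The core FRA idea, placing each word on a frequency axis and a range axis and splitting the plane into quadrants, can be sketched as follows. This is a minimal sketch under stated assumptions: frequency is the total count across all test papers, range is the number of papers containing the word, the crosshair sits at the mean of each axis (ties count as high), frequency is the x-axis and range the y-axis, and Quadrant I is high on both. The paper's revised IPA may normalize or place the crosshair differently, and `FUNCTION_WORDS` is a hypothetical stoplist.

```python
from collections import Counter
from statistics import mean

# Hypothetical, abbreviated stoplist standing in for the paper's function word list.
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def frequency_range(documents):
    """Total frequency and range (number of documents containing the word)
    for every non-function word."""
    freq, rng = Counter(), Counter()
    for doc in documents:
        tokens = [t.lower() for t in doc.split() if t.lower() not in FUNCTION_WORDS]
        freq.update(tokens)
        rng.update(set(tokens))  # each document adds at most 1 to a word's range
    return freq, rng


def quadrants(freq, rng):
    """Assign each word to a quadrant using mean crosshairs
    (x = frequency, y = range; values equal to the mean count as high)."""
    f_bar, r_bar = mean(freq.values()), mean(rng.values())
    result = {}
    for word in freq:
        high_f, high_r = freq[word] >= f_bar, rng[word] >= r_bar
        if high_f and high_r:
            result[word] = "I"    # frequent and widespread: critical words
        elif high_r:
            result[word] = "II"   # widespread but infrequent
        elif not high_f:
            result[word] = "III"  # infrequent and narrow
        else:
            result[word] = "IV"   # frequent but concentrated in few documents
    return result


docs = ["virus vaccine virus", "virus mask", "vaccine virus the"]
print(quadrants(*frequency_range(docs)))
```

Using range alongside frequency stops a word that is repeated heavily in a single exam paper from ranking as highly as one that recurs across many years.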
In the current post-pandemic era of COVID-19, vaccine hesitancy is hindering the herd immunity that widespread vaccination would generate. It is critical to identify the factors that may cause COVID-19 vaccine hesitancy so that the relevant authorities can propose appropriate interventions to mitigate it. Keyword extraction, a sub-field of natural language processing (NLP) applications, plays a vital role in modern medical informatics. When traditional corpus-based NLP methods are used for keyword extraction, they consider only a word's log-likelihood value to determine whether it is a keyword, which raises concerns about the efficiency and accuracy of this technique. Specifically, such methods are unable to (1) optimize the keyword list by a machine-based approach, (2) effectively evaluate a keyword's importance level, or (3) integrate variables to conduct data clustering. To address these issues, this study integrated a machine-based word removal technique, the i10-index, and the importance–performance analysis (IPA) technique to develop an improved corpus-based NLP method for keyword extraction. The 200 most-cited Science Citation Index (SCI) research articles discussing COVID-19 vaccine hesitancy were adopted as the target corpus for verification. The results showed that the keywords in Quadrant I (n = 98) reached the highest lexical coverage (9.81%), indicating that the proposed method identified and extracted the most important keywords from the target corpus and thus achieved more domain-oriented and accurate keyword extraction results.
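The abstract does not say how the i10-index (originally, the number of an author's publications with at least ten citations) is applied to words. One hypothetical reading, used here purely for illustration, treats each article as a "publication" and a keyword's occurrences in it as its "citations", so a keyword's i10 score is the number of articles in which it occurs at least ten times. The sketch below assumes log-likelihood values have already been computed, uses mean crosshairs for the IPA split, and then measures the lexical coverage of the Quadrant I keywords; none of these choices is confirmed by the source.

```python
from collections import Counter
from statistics import mean


def i10_score(word, articles, threshold=10):
    """Hypothetical adaptation: number of articles (Counters of token counts)
    in which `word` occurs at least `threshold` times."""
    return sum(1 for counts in articles if counts[word] >= threshold)


def quadrant_one(ll_values, articles, threshold=10):
    """Keywords at or above the mean on both axes
    (x = log-likelihood, y = i10 score), sorted alphabetically."""
    scores = {w: i10_score(w, articles, threshold) for w in ll_values}
    ll_bar, s_bar = mean(ll_values.values()), mean(scores.values())
    return sorted(w for w in ll_values if ll_values[w] >= ll_bar and scores[w] >= s_bar)


def coverage(words, articles):
    """Percentage of all running words in the corpus accounted for by `words`."""
    total = sum(sum(counts.values()) for counts in articles)
    hits = sum(counts[w] for counts in articles for w in words)
    return 100 * hits / total


# Toy data for illustration only; not taken from the study.
articles = [
    Counter({"hesitancy": 12, "vaccine": 15, "study": 3}),
    Counter({"hesitancy": 11, "vaccine": 4, "study": 10}),
    Counter({"vaccine": 10, "trust": 2, "study": 8}),
]
ll_values = {"hesitancy": 120.0, "vaccine": 90.0, "study": 5.0, "trust": 8.0}
top = quadrant_one(ll_values, articles)
print(top, round(coverage(top, articles), 2))
```

Adding a document-level count as a second axis means a word must be both statistically salient and recurrent across many articles before it reaches Quadrant I.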