With the development of modern information and communication technologies (ICTs), Industry 4.0 has driven the growth of big data analysis, natural language processing (NLP), and artificial intelligence (AI). Corpus analysis is also a part of big data analysis. In many cases where statistics-based corpus techniques are adopted to analyze English for specific purposes (ESP), researchers extract critical information by retrieving domain-oriented lexical units. However, even if corpus software incorporates algorithms such as log-likelihood tests, log ratios, and BIC scores, the machine still cannot understand linguistic meanings. In many ESP cases, function words reduce the efficiency of corpus analysis, yet many studies still remove them manually. Manual annotation is inefficient and time-consuming, and can easily distort information. To enhance the efficiency of big textual data analysis, this paper proposes a novel statistics-based corpus machine processing approach to refine big textual data. Furthermore, this paper uses COVID-19 news reports as a simulation example of big textual data to verify the efficacy of the machine optimizing process. The refined data show that the proposed approach can rapidly remove function words and meaningless words by machine processing and provide decision-makers with domain-specific corpus data for further purposes.
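The machine-based removal of function words described above can be illustrated with a minimal sketch. It is not the paper's implementation: the `FUNCTION_WORDS` list, the tokenizing pattern, and the function names are all assumptions made for illustration, and a real stoplist would be far longer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption, not the paper's list).
FUNCTION_WORDS = {
    "the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on", "with", "by",
}


def refine_tokens(text, stoplist=FUNCTION_WORDS):
    """Lowercase the text, tokenize it, and drop function words and one-letter tokens."""
    # Keeps hyphenated forms such as "covid-19" together as one token.
    tokens = re.findall(r"[a-z]+(?:-[a-z0-9]+)*", text.lower())
    return [t for t in tokens if t not in stoplist and len(t) > 1]


def content_word_frequencies(text):
    """Count the tokens that survive the function-word filter."""
    return Counter(refine_tokens(text))


sample = "The spread of the virus is a risk for the public and the virus spreads in crowds."
frequencies = content_word_frequencies(sample)
print(frequencies.most_common(3))
```

Filtering before counting keeps the frequency list focused on content words, which is the effect the abstract attributes to the refinement step.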
Purpose: Knowledge, attitude, and practice (KAP) models are often used by researchers in the field of public health to explore people’s health behaviors. Therefore, this study mainly explored the relationships among participants’ sociodemographic status, COVID-19 knowledge, affective attitudes, and preventive behaviors. Method: This cross-sectional study used an online survey of 136 males and 204 females to investigate the relationships between variables including gender, age, COVID-19 knowledge, positive affective attitudes (emotional wellbeing, psychological wellbeing, and social wellbeing), negative affective attitudes (negative self-perception and negative perceptions of life), and preventive behaviors (hygiene habits, reducing public activities, and helping others to prevent the epidemic). Results: The majority of participants in the study were knowledgeable about COVID-19. The mean COVID-19 knowledge score was 12.86 (SD = 1.34, range: 7–15 with a full score of 15), indicating a high level of knowledge. However, the key factor determining whether participants adopted COVID-19 preventive behaviors was their affective attitudes, especially positive affective attitudes (β = 0.18–0.25, p < 0.01), rather than COVID-19 disease knowledge (β = −0.01–0.08, p > 0.05). In addition, preventive behaviors differed clearly by sociodemographic status: females had better preventive behaviors than males, such as adopting epidemic prevention hygiene habits (t = −5.08, p < 0.01), reducing public activities (t = −3.00, p < 0.01), and helping others to prevent the epidemic (t = −1.97, p < 0.05), while older participants were more inclined to adopt preventive behaviors, including epidemic prevention hygiene habits (β = 0.18, p = 0.001, R² = 0.03), reducing public activities (β = 0.35, p < 0.001, R² = 0.13), and helping others to prevent the epidemic (β = 0.27, p < 0.001, R² = 0.07).
Conclusions: Having adequate COVID-19 knowledge was not linked to higher involvement in precautionary behaviors. Attitudes toward COVID-19 may play a more critical role in prompting individuals to undertake preventive behaviors, and different positive affective attitudes had different predictive relationships with preventive behaviors.
The COVID-19 epidemic has been confirmed as the largest-scale outbreak of atypical pneumonia since the outbreak of severe acute respiratory syndrome (SARS) in 2003, and it has become a public health emergency of international concern. It exacerbated public confusion and anxiety, and the impact of COVID-19 on people needs to be better understood. Indeed, prior meta-analyses of longitudinal cohort studies that compared mental health before versus during the COVID-19 pandemic showed that public health policies (e.g., city lockdowns, quarantines, and avoiding gatherings) and COVID-19-related information circulating on new media platforms directly affected citizens’ mental health and well-being. Hence, this research aims to explore Taiwanese people’s health status, anxiety, media sources for obtaining COVID-19 information, subjective well-being, and safety-seeking behavior during the COVID-19 epidemic, and how these are associated. Online surveys were conducted through new media platforms, and 342 responses were included in the analysis. The results indicate that the participants experienced different aspects of COVID-19 anxiety, including COVID-19 worry and perceived COVID-19 risk. Among the given media sources, the more participants searched for COVID-19 information on new media, the more they worried about COVID-19. Furthermore, COVID-19 worry was positively related to safety-seeking behavior, while perceived COVID-19 risk was negatively related to subjective well-being. This paper concludes by offering some suggestions for future studies and pointing out limitations of the present study.
Military knowledge is an uncommon research field and is often classified as confidential information. Furthermore, when US military knowledge is adopted by English as a foreign language (EFL) countries, properly interpreting military texts poses challenges. Taking Asian militaries as examples of EFL countries, not every trooper has sufficient English proficiency to read and comprehend complicated military knowledge databases. In addition, given limited training time and a lack of suitable reference materials, it is difficult to popularise and improve the efficiency of courses that study US field manuals (FMs), which are important books that introduce US military combat tactics and strategies, military operation procedures, weapon systems, and other topics. Nevertheless, in many EFL countries, English learning is integrated into the education system to promote internationalisation and enhance global competitiveness. Thus, the English proficiency of nationals in most EFL countries is not negligible. Based on these considerations, this paper discusses how corpus software can be combined with cooperation between linguists and military experts to conduct syntax analysis and build a taxonomy of military terminology, so that EFL troopers with limited English proficiency can understand intricate US military domain knowledge, and to develop the military corpus as an auxiliary language training material. The US Army FMs on anti-tank missile systems are adopted as an empirical example to illustrate the proposed approach. The analytical findings will serve as critical reference indicators for defence language institutes (DLI) of EFL militaries in developing military English training materials and in processing military information.
In the new era of artificial intelligence (AI), the education industry should develop in the direction of intelligence and digitalization. For evaluating learners’ academic performance, English high-stakes tests are not merely a means of measuring what English as a Foreign Language (EFL) stakeholders know or do not know but are also likely to bring life-changing consequences. Hence, effective test preparation for English high-stakes tests is crucial for those whose futures depend on attaining a particular score. However, traditional corpus-based approaches cannot simultaneously take words’ frequency and range variables into consideration when evaluating their importance level, which makes the word sorting results inaccurate. Thus, to effectively and accurately extract critical words from English high-stakes tests and enhance EFL stakeholders’ test performance, this paper integrates a corpus-based approach and a revised Importance-Performance Analysis (IPA) method to develop a novel frequency-range analysis (FRA) method. The Taiwan College Entrance Exam of English Subject (TCEEES) papers from 2001 to 2022 are adopted as an empirical case of an English high-stakes test and as the target corpus for verification. Results indicate that the critical words evaluated by the FRA method are concentrated in Quadrant I, which includes 1,576 word types that account for over 60% of the running words of the TCEEES corpus. Compared with the three traditional corpus-based approaches and the Term Frequency-Inverse Document Frequency (TF-IDF) method, the significant contributions include: (1) the FRA method can use a machine-based function word elimination technique to enhance efficiency; (2) the FRA method can simultaneously take words’ frequency and range variables into consideration; (3) the FRA method can effectively conduct cluster analysis by categorizing the words into four quadrants based on their relative importance level.
The results will give EFL stakeholders a clearer picture of how to allocate their learning time and education resources to critical word acquisition.
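The idea of weighing frequency and range together and sorting words into four quadrants can be sketched as follows. The abstract does not give the details of the revised IPA, so this sketch makes two loud assumptions: the quadrant boundaries are the mean frequency and mean range (as in classic IPA), and the quadrant numbering (I = high frequency and high range) is chosen for illustration. Function names are hypothetical.

```python
from collections import Counter


def frequency_and_range(documents):
    """Return (frequency, range) counters for a list of tokenized documents.

    Frequency is the total number of occurrences of a word; range is the
    number of documents in which the word appears at least once.
    """
    frequency, doc_range = Counter(), Counter()
    for tokens in documents:
        frequency.update(tokens)
        doc_range.update(set(tokens))
    return frequency, doc_range


def assign_quadrants(frequency, doc_range):
    """Place each word in a quadrant using mean frequency and mean range as cut-offs.

    Assumed numbering: I = high frequency, high range; II = low frequency, high range;
    III = low frequency, low range; IV = high frequency, low range.
    """
    mean_freq = sum(frequency.values()) / len(frequency)
    mean_range = sum(doc_range.values()) / len(doc_range)
    quadrants = {}
    for word in frequency:
        high_freq = frequency[word] >= mean_freq
        high_range = doc_range[word] >= mean_range
        if high_freq and high_range:
            quadrants[word] = "I"
        elif high_range:
            quadrants[word] = "II"
        elif not high_freq:
            quadrants[word] = "III"
        else:
            quadrants[word] = "IV"
    return quadrants


# Toy corpus of three already-tokenized "exam papers" (invented data).
papers = [["exam", "test", "word"], ["exam", "test", "exam"], ["exam", "rare"]]
freq, rng = frequency_and_range(papers)
print(assign_quadrants(freq, rng))
```

Under these assumptions, words that are both frequent and widely distributed fall in Quadrant I, which matches the abstract's description of where the critical words concentrate.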
In the era of modern information and communication technology (ICT), seeking highly efficient and accurate corpus-based approaches to process natural language data (NLD) is critical. Traditional corpus-based approaches for processing a corpus (i.e., the collected NLD) mainly focus on quantifying and ranking words to assist humans in extracting keywords. However, traditional corpus-based approaches cannot identify the meanings behind the words and therefore cannot properly extract terminologies or their information. To address this issue, the main objective of this paper is to propose an integrated linguistic analysis approach that combines two corpus-based approaches and a rule-based natural language processing (NLP) approach to extract and identify terminologies and to create a text database, using the terminologies as channels to retrieve deeper domain-oriented core information from the target corpus. The military domain is an uncommon research field and is often classified as confidential, so little research has focused on it. Nevertheless, military information is vital to national security and should not be ignored. Hence, to verify the proposed approach in extracting terminologies and their information, the researchers adopt the US Army field manual (FM) 8-10-6 as the target corpus and empirical case. Compared with AntConc 3.5.8 and Tongpoon-Patanasorn’s hybrid approach, the results indicate that, from the perspectives of terminology identification, text database creation, and domain knowledge extraction, only the proposed approach can handle all of these issues.
The use of corpus assessment approaches to determine and rank keywords in corpus data is critical to information retrieval (IR) in Natural Language Processing (NLP); in the case of COVID-19, for example, it can determine whether people can rapidly obtain knowledge of the disease. The algorithms used for corpus assessment have to consider multiple parameters and integrate individuals’ subjective evaluation information simultaneously to meet real-world needs. However, traditional keyword-list-generating approaches determine and rank keywords based on only one parameter (i.e., the keyness value), which is insufficient. To improve on the traditional keyword-list-generating approach, this paper proposed an extended analytic hierarchy process (AHP)-based corpus assessment approach that first refines the corpus data and then uses the AHP method to compute the relative weights of three parameters (keyness, frequency, and range). To verify the proposed approach, this paper adopted 53 COVID-19-related environmental science research articles from the Web of Science (WOS) as an empirical example. Compared with the traditional keyword-list-generating approach and the equal weights (EW) method, the significant contributions are: (1) using a machine-based technique to remove function and meaningless words to optimize the corpus data; (2) being able to consider multiple parameters simultaneously; and (3) being able to integrate the experts’ evaluation results to determine the relative weights of the parameters.
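How AHP can turn expert pairwise judgments into relative weights for keyness, frequency, and range can be shown with a small worked sketch. Everything numeric here is an assumption: the pairwise comparison matrix is invented, the weights are derived with the row geometric mean approximation (the abstract does not say which derivation the paper uses), and the function names are hypothetical.

```python
import math

# Saaty's random consistency index for small matrix sizes.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix):
    """Approximate AHP priority weights using the row geometric mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Estimate the consistency ratio (CR); values below 0.1 are conventionally acceptable."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    # Estimate lambda_max as the average of (A w)_i / w_i.
    lambda_max = sum(
        sum(a * w for a, w in zip(row, weights)) / weights[i]
        for i, row in enumerate(matrix)
    ) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


def weighted_score(values, weights):
    """Combine parameter values (assumed already normalized to [0, 1]) into one score."""
    return sum(v * w for v, w in zip(values, weights))


# Hypothetical expert judgments, in the order keyness, frequency, range:
# keyness is judged 3x as important as frequency and 5x as important as range.
PAIRWISE = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 2.0],
    [1.0 / 5.0, 1.0 / 2.0, 1.0],
]
WEIGHTS = ahp_weights(PAIRWISE)
CR = consistency_ratio(PAIRWISE, WEIGHTS)
print([round(w, 3) for w in WEIGHTS], round(CR, 4))
```

Setting all three weights to 1/3 in `weighted_score` would correspond to the equal weights (EW) baseline mentioned in the abstract.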