Advances in computational linguistics and discourse processing have made it possible to automate many language- and text-processing mechanisms. We have developed a computer tool called Coh-Metrix, which analyzes texts on over 200 measures of cohesion, language, and readability. Its modules use lexicons, part-of-speech classifiers, syntactic parsers, templates, corpora, latent semantic analysis, and other components that are widely used in computational linguistics. After the user enters an English text, Coh-Metrix returns the measures requested by the user. In addition, a facility allows the user to store the results of these analyses in data files (such as plain text, Excel, and SPSS formats). Standard text readability formulas scale texts on difficulty by relying on word length and sentence length, whereas Coh-Metrix is sensitive to cohesion relations, world knowledge, and language and discourse characteristics.
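To make the contrast concrete, the "standard readability formulas" mentioned above reduce to simple arithmetic over word and sentence counts. Below is a minimal sketch of one classic example, the Flesch Reading Ease formula; the syllable counter is a rough vowel-group heuristic of my own (real readability tools use pronunciation lexicons), so the exact scores it produces are approximate.

```python
import re

def count_syllables(word):
    # Rough heuristic: count runs of consecutive vowels, after dropping
    # a (usually silent) trailing 'e'. Approximate by design.
    word = word.lower()
    if word.endswith("e") and len(word) > 2:
        word = word[:-1]
    vowel_groups = re.findall(r"[aeiouy]+", word)
    return max(1, len(vowel_groups))

def flesch_reading_ease(text):
    # Flesch Reading Ease:
    #   206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    # Higher scores indicate easier text. Note the formula sees only
    # word length and sentence length -- no cohesion, no discourse structure.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Two texts with identical word and sentence lengths but very different cohesion (say, with and without connectives and coreference) receive the same score under such a formula, which is precisely the limitation Coh-Metrix addresses.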
Coh-Metrix is among the broadest and most sophisticated automated textual assessment tools available today. Automated Evaluation of Text and Discourse with Coh-Metrix describes this computational tool, as well as the wide range of language and discourse measures it provides. Part I of the book focuses on the theoretical perspectives that led to the development of Coh-Metrix, its measures, and empirical work that has been conducted using this approach. Part II shifts to the practical arena, describing how to use Coh-Metrix and how to analyze, interpret, and describe results. Coh-Metrix opens the door to a new paradigm of research that coordinates studies of language, corpus analysis, computational linguistics, education, and cognitive science. This tool empowers anyone with an interest in text to pursue a wide array of previously unanswerable research questions.
This study examined the effects of providing reading strategy instruction to improve the effectiveness of self-explanation (i.e., explaining the meaning of information to oneself while reading). The effects of the reading strategy instruction, called Self-Explanation Reading Training (SERT), were examined both in terms of comprehension scores and self-explanation quality. Half of the participants (n = 42) received SERT, which included reading strategy instruction and self-explanation practice with 4 science texts (SERT condition). The remaining participants read aloud the 4 science texts (control condition). During this training phase, self-explanation, as compared to reading aloud, improved comprehension only for the most difficult of the 4 texts. Prior domain knowledge consistently improved comprehension performance, whereas reading skill and reading span had minimal effects. After training, both SERT and control participants self-explained a difficult text about cell mitosis. SERT improved comprehension and self-explanation quality only for participants with low domain knowledge. However, the effects of SERT on low-knowledge participants' comprehension emerged only for text-based questions and not for bridging-inference questions. Protocol analyses indicated that SERT helped these participants to use logic, or domain-general knowledge, rather than domain-specific knowledge to make sense of the text.
In this study, a corpus of expert-graded essays, based on a standardized scoring rubric, is computationally evaluated so as to distinguish the differences between those essays that were rated as high and those rated as low. The automated tool, Coh-Metrix, is used to examine the degree to which high- and low-proficiency essays can be predicted by linguistic indices of cohesion (i.e., coreference and connectives), syntactic complexity (e.g., number of words before the main verb, sentence structure overlap), the diversity of words used by the writer, and characteristics of words (e.g., frequency, concreteness, imageability). The three most predictive indices of essay quality in this study were syntactic complexity (as measured by number of words before the main verb), lexical diversity (as measured by the Measure of Textual Lexical Diversity), and word frequency (as measured by Celex, logarithm for all words). Of the 26 validated indices of cohesion from Coh-Metrix, none showed differences between high- and low-proficiency essays, and no indices of cohesion correlated with essay ratings. These results indicate that the textual features that characterize good student writing are not aligned with those features that facilitate reading comprehension. Rather, essays judged to be of higher quality were more likely to contain linguistic features associated with text difficulty and sophisticated language.
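The lexical diversity index named above, the Measure of Textual Lexical Diversity (MTLD), can be sketched briefly. MTLD counts how many "factors" a text divides into, where each factor is a stretch over which the running type-token ratio stays above a threshold (0.72 is the commonly cited default); the text length divided by the factor count gives the score, averaged over forward and backward passes. The following is a simplified sketch of that idea, not the reference implementation used by Coh-Metrix.

```python
def mtld_one_pass(tokens, threshold=0.72):
    # One directional pass: count "factors" -- stretches of text over which
    # the running type-token ratio (TTR) stays above the threshold.
    factors = 0.0
    types, count = set(), 0
    for tok in tokens:
        count += 1
        types.add(tok)
        if len(types) / count <= threshold:
            factors += 1          # factor complete: TTR dropped; reset
            types, count = set(), 0
    if count > 0:
        # Credit the leftover stretch as a partial factor.
        ttr = len(types) / count
        factors += (1 - ttr) / (1 - threshold)
    return len(tokens) / factors if factors > 0 else float(len(tokens))

def mtld(tokens, threshold=0.72):
    # Average the forward and backward passes to reduce order effects.
    return (mtld_one_pass(tokens, threshold)
            + mtld_one_pass(list(reversed(tokens)), threshold)) / 2
```

Unlike a raw type-token ratio, which shrinks as texts get longer, this factor-based construction makes the score comparable across essays of different lengths, which is why it is preferred for corpus work like the study described above.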