This study focused on the relationship between percentage of vocabulary known in a text and level of comprehension of the same text. Earlier studies have estimated the percentage of vocabulary necessary for second language learners to understand written texts as being between 95% (Laufer, 1989) and 98% (Hu & Nation, 2000). In this study, 661 participants from 8 countries completed a vocabulary measure based on words drawn from 2 texts, read the texts, and then completed a reading comprehension test for each text. The results revealed a relatively linear relationship between the percentage of vocabulary known and the degree of reading comprehension. There was no indication of a vocabulary "threshold," where comprehension increased dramatically at a particular percentage of vocabulary knowledge. Results suggest that the 98% estimate is a more reasonable coverage target for readers of academic texts.
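As a rough illustration of what "percentage of vocabulary known in a text" means, the sketch below computes token-level lexical coverage. It is not the instrument used in the study, which relied on a vocabulary measure built from words drawn from the two texts. The function name `lexical_coverage`, the simple regex tokenizer, and the sample sentence are all assumptions made for illustration.

```python
import re


def lexical_coverage(text, known_words):
    """Return the percentage of running words in `text` that appear in `known_words`.

    Hypothetical helper: tokenization is a naive lowercase regex split,
    not a lemmatizer or word-family count.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    known = sum(1 for token in tokens if token in known_words)
    return 100.0 * known / len(tokens)


# Illustrative data only: 10 running words, 1 of them ("cottage") unknown.
sample = "The cat sat on the mat near the old cottage"
known = {"the", "cat", "sat", "on", "mat", "near", "old"}
coverage = lexical_coverage(sample, known)
print(f"coverage: {coverage:.1f}%")
```

At this scale the two estimates discussed above differ noticeably: 95% coverage corresponds to about one unknown word in every 20 running words, while 98% corresponds to about one in every 50.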
The goal of reading assessments is to provide feedback on the skills, processes, and knowledge resources that represent reading abilities. Reading assessments are used for many purposes. However, any appropriate use of reading assessments begins from an understanding of the reading construct, an awareness of the development of reading abilities, and an effort to reflect the construct in assessment tasks. In this chapter, we will first define the construct of reading. Then we will present a straightforward framework that categorizes many uses and purposes for reading assessment, including standardized reading proficiency assessment, classroom reading assessment, assessment for learning, assessment of curricular effectiveness, and assessment for research purposes. For each category in the assessment framework, we will outline and describe a number of major assessment techniques. Finally, we will explore some innovative techniques for reading assessment and discuss challenges and issues for reading assessment.
This article outlines the similarities and differences between reading in a first language (L1) and reading in a second language (L2). L1 and L2 reading comprehension abilities involve the same set of component reading abilities (for example, phonological processing, word recognition, syntactic processing, text meaning integration, inferencing, background knowledge use). However, there are a number of very important differences between L1 and L2 reading. These include linguistic proficiency differences, differences in amount of reading exposure, differences due to background knowledge, and differences due to motivational factors and sociocultural expectations. As L2 proficiency increases, the impact of many of these differences decreases.
Motivation: Figures and captions convey essential information in biomedical documents. As such, there is a growing interest in mining published biomedical figures and in utilizing their respective captions as a source of knowledge. Notably, an essential step underlying such mining is the extraction of figures and captions from publications. While several PDF parsing tools that extract information from such documents are publicly available, they attempt to identify images by analyzing the PDF encoding and structure and the complex graphical objects embedded within. As such, they often incorrectly identify figures and captions in scientific publications, whose structure is often non-trivial. The extraction of figures, captions and figure-caption pairs from biomedical publications is thus neither well-studied nor yet well-addressed. Results: We introduce a new and effective system for figure and caption extraction, PDFigCapX. Unlike existing methods, we first separate text from graphical content, and then utilize layout information to effectively detect and extract figures and captions. We generate files containing the figures and their associated captions and provide those as output to the end-user. We test our system both over a public dataset of computer science documents previously used by others, and over two newly collected sets of publications focusing on the biomedical domain. Our experiments and results comparing PDFigCapX to other state-of-the-art systems show a significant improvement in performance, and demonstrate the effectiveness and robustness of our approach. Availability and implementation: Our system is publicly available for use at: https://www.eecis.udel.edu/~compbio/PDFigCapX. The two new datasets are available at: https://www.eecis.udel.edu/~compbio/PDFigCapX/Downloads
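To make the idea of using layout information concrete, here is a minimal sketch that pairs caption-like text blocks with the nearest figure region on a page. It is not PDFigCapX's algorithm, which the abstract does not specify in detail. It assumes that text blocks and graphical regions have already been separated and given bounding boxes, and the names `Box`, `CAPTION_PATTERN`, and `pair_captions` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box; y grows downward, as in most page layouts."""
    x0: float
    y0: float
    x1: float
    y1: float


# Assumed heuristic: captions start with "Fig", "Fig." or "Figure" plus a number.
CAPTION_PATTERN = re.compile(r"^(fig\.?|figure)\s*\d+", re.IGNORECASE)


def horizontal_overlap(a, b):
    """Width of the horizontal intersection of two boxes (0 if none)."""
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def vertical_gap(a, b):
    """Vertical distance between two boxes (0 if they overlap vertically)."""
    return max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))


def pair_captions(figures, text_blocks):
    """Pair each caption-like text block with the vertically nearest figure
    that shares some horizontal extent with it."""
    pairs = []
    for text, text_box in text_blocks:
        if not CAPTION_PATTERN.match(text.strip()):
            continue  # ordinary body text, not a caption
        candidates = [f for f in figures if horizontal_overlap(f, text_box) > 0]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda f: vertical_gap(f, text_box))
        pairs.append((nearest, text))
    return pairs


# Illustrative page: two figure regions, two captions, one body paragraph.
figures = [Box(50, 100, 300, 250), Box(50, 400, 300, 550)]
text_blocks = [
    ("Figure 1. Overview", Box(50, 260, 300, 280)),
    ("Body text paragraph.", Box(50, 290, 300, 390)),
    ("Fig. 2: Results", Box(50, 560, 300, 580)),
]
pairs = pair_captions(figures, text_blocks)
```

A real system would also need to handle multi-column pages, captions placed above or beside figures, and figures split into several graphical objects, none of which this sketch attempts.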