This paper presents an unsupervised method for assembling semantic knowledge from a part-of-speech tagged corpus using graph algorithms. The graph model is built by linking pairs of words which participate in particular syntactic relationships. We focus on the symmetric relationship between pairs of nouns which occur together in lists. An incremental cluster-building algorithm using this part of the graph achieves 82% accuracy at a lexical acquisition task, evaluated against WordNet classes. The model naturally realises domain- and corpus-specific ambiguities as distinct components in the graph surrounding an ambiguous word.
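The graph construction and incremental clustering described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the list pattern (noun, comma, noun), the greedy growth rule, and all function names are assumptions made for the example.

```python
from collections import defaultdict

def build_list_graph(tagged_sentences):
    """Link pairs of nouns that occur together in comma-separated lists.

    `tagged_sentences` is a list of (word, tag) sequences; the single
    pattern used here (noun, comma, noun) is a simplification of the
    paper's syntactic relationships.
    """
    graph = defaultdict(set)
    for sent in tagged_sentences:
        for (w1, t1), (w2, _), (w3, t3) in zip(sent, sent[1:], sent[2:]):
            if t1.startswith("NN") and w2 == "," and t3.startswith("NN"):
                graph[w1].add(w3)
                graph[w3].add(w1)
    return graph

def grow_cluster(graph, seed, size=4):
    """Incrementally add the candidate best connected to the cluster so far."""
    cluster = {seed}
    while len(cluster) < size:
        candidates = {n for w in cluster for n in graph[w]} - cluster
        if not candidates:
            break
        cluster.add(max(candidates, key=lambda c: len(graph[c] & cluster)))
    return cluster
```

A seed noun thus accretes the neighbours most tightly linked to the growing cluster, which is the intuition behind evaluating the resulting clusters against WordNet classes.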
Thoughts and ideas are multidimensional and often concurrent, yet they can be expressed surprisingly well sequentially through translation into language. This reduction of dimensions occurs naturally but requires memory and necessitates the existence of correlations, e.g., in written text. However, correlations in word appearance decay quickly, while previous observations of long-range correlations using random walk approaches yield little insight on memory or on semantic context. Instead, we study combinations of words that a reader is exposed to within a "window of attention," spanning about 100 words. We define a vector space of such word combinations by looking at words that co-occur within the window of attention, and analyze its structure. Singular value decomposition of the co-occurrence matrix identifies a basis whose vectors correspond to specific topics, or "concepts," that are relevant to the text. As the reader follows a text, the "vector of attention" traces out a trajectory of directions in this "concept space." We find that memory of the direction is retained over long times, forming power-law correlations. The appearance of power laws hints at the existence of an underlying hierarchical network. Indeed, imposing a hierarchy similar to that defined by volumes, chapters, paragraphs, etc. succeeds in creating correlations in a surrogate random text that are identical to those of the original text. We conclude that hierarchical structures in text serve to create long-range correlations, and use the reader's memory in reenacting some of the multidimensionality of the thoughts being expressed.

hierarchy | language | power laws | singular value decomposition

Language is a central link through which we interact with other people. As a channel of communication it is limited by our physical ability to speak only one word at a time.
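The construction just described can be illustrated concretely: count co-occurrences within a sliding window of attention, take the SVD of the resulting matrix to obtain concept directions, and project each window onto that basis to trace the trajectory of the attention vector. This is a hedged sketch, not the authors' code; the window length, the number of concepts `k`, and the function names are illustrative choices.

```python
import numpy as np

def concept_space(tokens, window=100, k=2):
    """SVD basis for a co-occurrence matrix built over a sliding
    'window of attention'. `window` and `k` are illustrative choices."""
    vocab = sorted(set(tokens))
    idx = {w: i for i, w in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)))
    for start in range(len(tokens) - window + 1):
        span = [idx[w] for w in tokens[start:start + window]]
        for i in span:
            for j in span:
                if i != j:
                    C[i, j] += 1
    U, s, Vt = np.linalg.svd(C)
    return vocab, U[:, :k]   # leading columns of U ~ 'concept' directions

def attention_trajectory(tokens, vocab, basis, window=100):
    """Project each attention window onto the concept basis."""
    idx = {w: i for i, w in enumerate(vocab)}
    traj = []
    for start in range(len(tokens) - window + 1):
        v = np.zeros(len(vocab))
        for w in tokens[start:start + window]:
            v[idx[w]] += 1
        traj.append(basis.T @ (v / np.linalg.norm(v)))
    return np.array(traj)
```

Long-range correlations of the kind the paper reports would then show up as slowly decaying autocorrelations along the rows of the returned trajectory.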
The question arises, therefore, how the complex products of our brain are transformed into the linear string of words that comprise speech or text. Since our mental processes are far from being one-dimensional, the use of memory is essential, as is the existence of some type of correlations in time. Such questions have a long and intense history. Bolzano (1) already noted the need for specific organization in scientific texts, while Ingarden devotes his book (2) to understanding the process by which a text is understood and assimilated. Modern methods (3, 4) combine the work of linguists with those of computer scientists, physicists, physiologists, and researchers from many other fields to cover a wide range of texts, from the phoneme (5), going on to words (6-9) and grammar (10, 11), and all of the way to global text analysis (12) and the evolution of language (13, 14). Recent interest has focused on applying methods of statistical physics to identify possible trends and correlations in text (15-18). In ref. 18, for example, the authors study the distribution of words across different works by the same authors, combining notions of information, entropy, and statistics to def...
This paper presents an unsupervised algorithm which automatically discovers word senses from text. The algorithm is based on a graph model representing words and relationships between them. Sense clusters are iteratively computed by clustering the local graph of similar words around an ambiguous word. Discrimination against previously extracted sense clusters enables us to discover new senses. We use the same data for both recognising and resolving ambiguity.
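The iterative procedure above, clustering the local graph around an ambiguous word and then discriminating against already extracted senses, can be sketched roughly as below. This is a simplified illustration under assumed conventions (a similarity graph given as a dict of neighbour sets, a fixed cluster size), not the paper's algorithm.

```python
def discover_senses(graph, target, cluster_size=3):
    """Iteratively peel sense clusters off the local graph of `target`.

    `graph` maps each word to its set of similar words. Each round grows
    a cluster from the remaining neighbour best connected to the others,
    then discriminates against it by removing its members before the next
    round -- a simplified version of the iterative procedure described.
    """
    remaining = set(graph[target])
    senses = []
    while remaining:
        seed = max(remaining, key=lambda w: len(graph[w] & remaining))
        cluster = {seed}
        while len(cluster) < cluster_size:
            cands = ({n for w in cluster for n in graph[w]} & remaining) - cluster
            if not cands:
                break
            cluster.add(max(cands, key=lambda c: len(graph[c] & cluster)))
        senses.append(cluster)
        remaining -= cluster
    return senses
```

Because each extracted cluster is removed from the candidate pool, later iterations are forced toward words belonging to other, not yet discovered senses, which is how the same data can serve both to recognise and to resolve ambiguity.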
The lack of an efficient modelling-simulation-analysis workflow for creating and utilising detailed subject-specific computational models is one of the key reasons why simulation-based approaches for analysing socket-stump interaction have not yet been successfully established. Herein, we propose a novel and efficient modelling-simulation-analysis workflow that uses commercial software for generating a detailed subject-specific, three-dimensional finite element model of an entire residual limb from Diffusion Tensor MRI images in <20 min. Moreover, to complete the modelling-simulation-analysis workflow, the generated subject-specific residual limb model is used within an implicit dynamic FE simulation of bipedal stance to predict the potential sites of deep tissue injury. For this purpose, a nonlinear hyperelastic, transversely isotropic skeletal muscle constitutive law containing a deep tissue injury model was implemented in LS-DYNA. To demonstrate the feasibility of the entire modelling-simulation-analysis workflow and the fact that detailed, anatomically realistic, multi-muscle models are superior to state-of-the-art, fused-muscle models, an implicit dynamic FE analysis of 2-h bipedal stance is carried out. By analysing the potential volume of damaged muscle tissue after donning an optimally-fitted and a misfitted socket, i.e., a socket whose volume was isotropically shrunk by 10%, we were able to highlight the differences between the detailed individual- and fused-muscle models. The results of the bipedal stance simulation showed that peak stresses in the fused-muscle model were four times lower when compared to the multi-muscle model. The peak interface stress in the individual-muscle model, at the end of bipedal stance analysis, was 2.63 times lower than that in the deep tissues of the stump. 
At the end of the bipedal stance analysis using the misfitted socket, the fused-muscle model predicted that 7.65% of the residual limb volume was injured, while the detailed model predicted 16.03%. The proposed approach is not limited to modelling residual limbs but also has applications in predicting the impact of plastic surgery and in detailed forward-dynamics simulations of normal musculoskeletal systems.
This paper describes a technique for extracting idioms from text. The technique works by finding patterns such as "thrills and spills", whose reversals (such as "spills and thrills") are never encountered. This method collects not only idioms, but also many phrases that exhibit a strong tendency to occur in one particular order, due apparently to underlying semantic issues. These include hierarchical relationships, gender differences, temporal ordering, and prototype-variant effects.
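A minimal version of this pattern-reversal test can be written as a few lines of Python. The regular expression, the frequency threshold, and the function name are assumptions for the sketch; the paper's technique operates over a large corpus rather than a single string.

```python
import re
from collections import Counter

def fixed_order_pairs(text, min_count=2):
    """Find 'X and Y' patterns whose reversal 'Y and X' never occurs.

    A frequent pair with an unseen reversal is a candidate idiom or
    semantically ordered phrase ('thrills and spills', 'mother and child').
    """
    counts = Counter(re.findall(r"\b(\w+) and (\w+)\b", text.lower()))
    return [
        (x, y) for (x, y), n in counts.items()
        if n >= min_count and counts[(y, x)] == 0
    ]
```

Pairs that occur in both orders are discarded, so only phrases with a strong one-directional ordering preference survive, matching the intuition behind the hierarchical, gender, temporal, and prototype-variant effects mentioned above.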