Abstract. Complex numbers are a fundamental aspect of the mathematical formalism of quantum physics. Quantum-like models developed outside physics have often overlooked the role of complex numbers. Specifically, previous models in Information Retrieval (IR) have ignored complex numbers. In this paper, we argue that in order to advance the use of quantum models of IR, one has to lift the constraint of real-valued representations of the information space, and package more information within the representation by means of complex numbers. As a first attempt, we propose a complex-valued representation for IR, which explicitly uses complex-valued Hilbert spaces, and thus where terms, documents and queries are represented as complex-valued vectors. The proposal consists of integrating distributional semantics evidence within the real component of a term vector, whereas ontological information is encoded in the imaginary component. Our proposal has the merit of lifting the role of complex numbers from a computational byproduct of the model to the very mathematical texture that unifies different levels of semantic information. An empirical instantiation of our proposal is tested in the TREC Medical Record task of retrieving cohorts for clinical studies.
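As a rough illustration of the representation described in this abstract, the sketch below packs distributional evidence into the real part and ontological evidence into the imaginary part of each vector component. The function names, the feature values, and the choice of a normalized Hermitian inner product as the matching score are all assumptions made for illustration; the abstract does not specify how vectors are built or compared.

```python
import math

def complex_term_vector(distributional, ontological):
    """Pack two real-valued feature lists into one complex-valued vector.

    Hypothetical helper: real part = distributional evidence,
    imaginary part = ontological evidence.
    """
    if len(distributional) != len(ontological):
        raise ValueError("both evidence vectors must have the same length")
    return [complex(re, im) for re, im in zip(distributional, ontological)]

def hermitian_inner(u, v):
    """Inner product on a complex vector space: sum of conj(u_k) * v_k."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def similarity(u, v):
    """Magnitude of the normalized inner product, a value in [0, 1].

    Assumed scoring function, not taken from the paper.
    """
    norm_u = math.sqrt(hermitian_inner(u, u).real)
    norm_v = math.sqrt(hermitian_inner(v, v).real)
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return abs(hermitian_inner(u, v)) / (norm_u * norm_v)

# Made-up feature values for two terms.
term_a = complex_term_vector([0.8, 0.1], [0.5, 0.0])
term_b = complex_term_vector([0.7, 0.2], [0.4, 0.1])
score = similarity(term_a, term_b)
```

Taking the magnitude of the inner product discards the relative phase between the two vectors; a model that wants the phase to carry meaning would keep the complex value instead.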
Information foraging connects optimal foraging theory in ecology with how humans search for information. The theory suggests that, following an information scent, the information seeker must optimize the tradeoff between exploration by repeated steps in the search space vs. exploitation, using the resources encountered. We conjecture that this tradeoff characterizes how a user deals with uncertainty and its two aspects, risk and ambiguity in economic theory. Risk is related to the perceived quality of the patch of information currently being visited, and can be reduced by exploiting and understanding the patch to a better extent. Ambiguity, on the other hand, is the opportunity cost of having higher quality patches elsewhere in the search space. The aforementioned tradeoff depends on many attributes, including traits of the user: at the two extreme ends of the spectrum, analytic and wholistic searchers employ entirely different strategies. The former type focuses on exploitation first, interspersed with bouts of exploration, whereas the latter type prefers to explore the search space first and consume later. Our findings from an eye-tracking study of experts' interactions with novel search interfaces in the biomedical domain suggest that user traits of cognitive styles and perceived search task difficulty are significantly correlated with eye gaze and search behavior. We also demonstrate that perceived risk shifts the balance between exploration and exploitation in both types of users, tilting it against vs. in favor of ambiguity minimization. Since the pattern of behavior in information foraging is quintessentially sequential, risk and ambiguity minimization cannot happen simultaneously, leading to a fundamental limit on how good such a tradeoff can be. This in turn connects information seeking with the emergent field of quantum decision theory.
Abstract. Spectral theory in mathematics is key to the success of application domains as diverse as quantum mechanics and latent semantic indexing, both relying on eigenvalue decomposition for the localization of their respective entities in observation space. This points at some implicit "energy" inherent in semantics and in need of quantification. We show how the structure of atomic emission spectra, and meaning in concept space, go back to the same compositional principle, and propose a tentative solution for the computation of term, document and collection "energy" content.
Education and training in morphology for medical students and professionals specializing in pediatric cardiology and surgery has traditionally been based on hands-on encounters with congenitally malformed cardiac specimens. Large international archives are no longer widely available due to stricter data protection rules, a reduced number of autopsies, the attrition rate of existing specimens, and most importantly a higher survival rate of patients. Our Cardiac Archive houses about 400 cardiac specimens with congenital heart disease. The collection spans almost 60 years and thus goes back to the pre-surgical era. Unfortunately, attrition due to desiccation has led to increased natural decay in recent years. The present multi-institutional project focuses on saving the collection by digitization. Specimens are scanned by high-resolution micro-CT/MRI. Virtual 3D models are segmented and a comprehensive database is built. We now report an initial feasibility study with six test specimens that provided promising results; however, adequate presentation of the intracardiac anatomy, including septa and cardiac valves, requires further refinements. Computer-assisted design methods are necessary to overcome the consequences of pathological examination, shrinkage and/or distortion of the specimens. As a next step, we anticipate an expandable web-based virtual museum with interactive reference and training tools. Web access for professional third parties will be provided by registration/subscription. In a future phase, segmental wall motion data could be added to virtual models. 3D-printed models may replace actual specimens and serve as hands-on surgical training to elucidate complex morphologies, promote surgical emulation, and extract more accurate procedural knowledge based on such a collection.
Abstract. With insight from linguistics that degrees of text cohesion are similar to forces in physics, and the frequent use of the energy concept in text categorization by machine learning, we consider the applicability of particle-wave duality to semantic content inherent in index terms. Wave-like interpretations go back to the regional nature of such content, utilizing functions for its representation, whereas content as a particle can be conveniently modelled by position vectors. Interestingly, wave packets behave like particles, lending credibility to the duality hypothesis. We show in a classical mechanics framework how metaphorical term mass can be computed.
Various lexical resource-based (Budanitsky and Hirst, 2006) and distributional measures (Mohammad and Hirst, 2005) have been proposed to measure semantic relatedness and distance between terms. Terms can be corpus- or genre-specific. Manually constructed general-purpose lexical resources include many usages that are infrequent in a particular corpus or genre of documents. For example, one of the 8 senses of company in WordNet is a visitor/visitant, which is a hyponym of person. This usage of the term practically never occurs in newspaper articles, hence distributional attributes should be taken into consideration. Composite measures that combine the advantages of both approaches have also been developed (Resnik, 1995; Jiang and Conrath, 1997). However, these measures focused on pairwise relations, and did not consider similarity between more than two terms. This paper proposes an algorithm for a semantic ordering of terms to support text representation, information retrieval (IR) and text classification. Ordering of terms based on semantic relatedness offers an answer to the simple question, can statistical term weighting be eclipsed? Namely, variants of weighting schemes based on term occurrences and co-occurrences dominate the IR and machine learning scenes (Manning et al., 2008). However, the connection between statistics and word semantics is in general not understood very well (Hofmann, 1999). In other words, a systematic discussion of mappings between theories of word meaning and modelling them by mathematical objects is missing for the time being. By assigning specific scalar values to terms in an ontology, terms represented by sets of geometric coordinates can be outdone. Such values result from a one-dimensional ordering based on the idea of a sense-preserving distance between terms in a conceptual hierarchy.
Such distance values can be combined with term occurrence and co-occurrence based weighting, modelling the unification of several major theories of word meaning, and thereby qualify for the adjective semantic weighting. Let V denote a set of terms {t_1, t_2, ..., t_n} and let d(t_i, t_j) denote the semantic distance between the terms t_i and t_j. Let G = (V, E) denote a weighted undirected graph, where the weights on the set E are defined by the distances between the terms. Finding a semantic ordering of terms can be translated to a graph problem: a minimum-weight Hamiltonian path of G gives the ordering by reading the nodes from one end of the path to the other. A nearest neighbor heuristic can be used to achieve computational feasibility. Integrating lexical resources into an upgraded semantic weighting scheme that could eventually replace, or at least augment, statistical term weighting is a prospect that cannot be overlooked while experimenting with the idea of the Semantic Web. One can foresee that such resources may become publicly available computing resources over time. By assigning specific scalar values to terms in an ontology, they can be conceived as e.g. band lines in a conceptua...
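The nearest neighbor heuristic for the semantic ordering described above can be sketched as follows. This is a minimal greedy approximation of a minimum-weight Hamiltonian path, not necessarily the authors' exact procedure: the function names, the choice of starting term, and the toy distance table are hypothetical, and a real system would derive d(t_i, t_j) from a lexical resource such as WordNet.

```python
def nearest_neighbor_ordering(terms, distance, start=None):
    """Greedy approximation of a minimum-weight Hamiltonian path.

    terms:    list of distinct terms (the node set V)
    distance: symmetric function d(t_i, t_j) returning a non-negative number
    start:    term to begin from; assumed to be in `terms`
              (defaults to the first term)
    """
    if not terms:
        return []
    current = terms[0] if start is None else start
    unvisited = [t for t in terms if t != current]
    ordering = [current]
    while unvisited:
        # Step to the closest term that has not been placed yet.
        nxt = min(unvisited, key=lambda t: distance(current, t))
        unvisited.remove(nxt)
        ordering.append(nxt)
        current = nxt
    return ordering

def path_weight(ordering, distance):
    """Total weight of the path that visits the terms in the given order."""
    return sum(distance(a, b) for a, b in zip(ordering, ordering[1:]))

# Toy distances (made up): terms placed on a line, distance = gap between them.
toy_position = {"dog": 0, "cat": 1, "car": 10, "truck": 12}

def toy_distance(a, b):
    return abs(toy_position[a] - toy_position[b])

toy_ordering = nearest_neighbor_ordering(
    ["dog", "car", "cat", "truck"], toy_distance, start="dog"
)
toy_weight = path_weight(toy_ordering, toy_distance)
```

The greedy walk needs O(n^2) distance evaluations, which is what makes it feasible for large vocabularies, but the resulting path depends on the starting term and is not guaranteed to be optimal.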
Scientific computations have been using GPU-enabled computers successfully, often relying on distributed nodes to overcome the limitations of device memory. Only a handful of text mining applications benefit from such infrastructure. Since the initial steps of text mining are typically data-intensive, and the ease of deployment of algorithms is an important factor in developing advanced applications, we introduce a flexible, distributed, MapReduce-based text mining workflow that performs I/O-bound operations on CPUs with industry-standard tools and then runs compute-bound operations on GPUs, which are optimized to ensure coalesced memory access and effective use of shared memory. We have performed extensive tests of our algorithms on a cluster of eight nodes with two NVidia Tesla M2050 GPUs attached to each, and we achieve considerable speedups for random projection and self-organizing maps.
Abstract. We introduce Claude Lévi-Strauss's canonical formula (CF), an attempt to rigorously formalise the general narrative structure of myth. This formula utilises the Klein group as its basis, but a recent work draws attention to its natural quaternion form, which opens up the possibility that it may require a quantum-inspired interpretation. We present the CF in a form that can be understood by a non-anthropological audience, using the formalisation of a key myth (that of Adonis) to draw attention to its mathematical structure. The future potential formalisation of mythological structure within a quantum-inspired framework is proposed and discussed, with a probabilistic interpretation further generalising the formula.