Text is a very important type of data within the biomedical domain. For example, patient records contain large amounts of text that has been entered in a non-standardized format, which poses many challenges for processing such data. For the clinician, the written text in the medical findings, rather than images or multimedia data, is still the basis for decision making. However, the steadily increasing volumes of unstructured information call for machine learning approaches to data mining, i.e., text mining. This paper provides a concise overview of selected text mining methods, focusing on statistical methods, i.e., Latent Semantic Analysis, Probabilistic Latent Semantic Analysis, Latent Dirichlet Allocation, Hierarchical Latent Dirichlet Allocation, Principal Component Analysis, and Support Vector Machines, along with some examples from the biomedical domain. Finally, we present some open problems and future challenges, particularly from the clinical domain, that we expect to stimulate future research.
Anticipating repliers in online conversations is a fundamental challenge for computer-mediated communication systems, which aim to make textual, audio, and/or video communication as natural as face-to-face communication. The massive amounts of data that social media generates have facilitated the study of online conversations on a scale unimaginable a few years ago. In this work we use data from Twitter to explore the predictability of repliers and to investigate the factors that influence who will reply to a message. Our results suggest that social factors, which describe the strength of relations between users, are more useful than topical factors. This indicates that Twitter users' reply behavior is influenced more by social relations than by topics. Finally, we show that a binary classification model, which differentiates between users who will and users who will not reply to a certain message, may achieve an F1-score of 0.74 when using social features.
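To make the reported metric concrete, the following is a minimal sketch of how an F1-score is computed for a binary replier classifier. The function name `f1_score` and the label vectors are hypothetical illustrations, not the authors' code or data; the label 1 is assumed to mean "user replies" and 0 "user does not reply".

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (assumed: 1 = replies, 0 = does not reply)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive prediction: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for eight candidate repliers (illustrative only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)
```

Because F1 is the harmonic mean of precision and recall, a score such as 0.74 indicates that both quantities are reasonably balanced, since a low value in either one pulls the score down.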
In view of the high mortality and complication rates of major surgical procedures worldwide, surgical safety is described as a substantial global public-health concern. Naturally, patient safety has become an international priority. The increasing amount of electronically available clinical documents holds great potential for the computational analysis of large repositories. However, most of this data is in textual form, and the clinical domain is a challenging field for the application of natural language processing. This is particularly the case for languages other than English, which have received little attention from the international research community. In this project, we are concerned with the utilization of a German-language operative report repository for the purpose of risk management and patient safety research. In this paper we focus on the description of our information retrieval approach. We investigated the thought process of a domain expert in order to derive the information of interest to him, and we describe a facet-based way to navigate this kind of information in the form of extracted phrases. Initial results and feedback have been very promising, but a formal evaluation is still missing.