With around 300 millions people worldwide suffering from depression, the detection of this disorder is crucial and a challenge for individual and public health. As with many diseases, early detection means better medical management; the use of social media messages as potential clues to depression is an opportunity to assist in this early detection by automatic means. This chapter is based on the participation of the CNRS IRIT laboratory in the early detection of depressive people (e-Risk) task at the CLEF evaluation forum. Early depression detection differs from depression detection in that it considers temporality; the system must make its decision about a user's possible depression with as little data as possible. In this chapter we reevaluate the models we have developed for our participation at e-Risk over the years on the different collections, to obtain a more robust comparison. We also add new models. We use well-established classification methods, such as Logistic regression, Random forest, and Support vector machine. The input data of the users, the system should detect if they are depressed, are represented as vectors composed of (a) various task-oriented features including depression related lexicons and (b) word and document embeddings, extracted from the users' posts. We perform an ablation study to analyze the most important features for our models. We also use BERT deep learning architecture for comparison purposes, both for depression detection and early depression detection. According to our results, well-established machine learning models are still better than more modern models for -early-detection of depression.
In information retrieval systems, search parameters are optimized to ensure high effectiveness based on a set of past searches and these optimized parameters are then used as the search configuration for all subsequent queries. A better approach, however, would be to adapt the parameters to fit the query at hand. Selective query expansion is one such an approach, in which the system decides automatically whether or not to expand the query, resulting in two possible search configurations. This approach was extended recently to include many other parameters, leading to many possible search configurations where the system automatically selects the best configuration on a per-query basis. One problem with this approach is the system training which requires evaluation of each training query with every possible configuration. In real-world systems, so many parameters and possible values must be evaluated that this approach is impractical, especially when the system must be updated frequently, as is the case for commercial search engines. In general, the more configurations, the greater the effectiveness when configuration selection is appropriate but also the greater the risk of decreasing effectiveness in the case of an inappropriate configuration selection. To determine the ideal configurations to use on a per-query basis in real-world systems we developed a method in which a restricted number of possible configurations is pre-selected and then used in a meta-search engine that decides the best search configuration on a per query basis. We define a risk-sensitive approach for configuration pre-selection that considers the risk-reward trade-off between the number of configurations kept, and system effectiveness. We define two alternative risk functions to apply to different goals. For final configuration selection, the decision is based on query feature similarities. We compare two alternative risk functions on two query types: ad hoc and diversity and compare these to more sophisticated machine learning-based methods. We find that a relatively small number of configurations (20) selected by our risk-sensitive model is sufficient to obtain results close to the best achievable results for each query. Effectiveness is increased by about 15% according to the P@10 and nDCG@10 evaluation metrics when compared to traditional grid search using a single configuration and by about 20% when compared to learning to rank documents. Our risk-sensitive approach works for both diversity- and ad hoc-oriented searches. Moreover, the similarity-based selection method outperforms the more sophisticated approaches. Thus, we demonstrate the feasibility of developing per-query information retrieval systems, which will guide future research in this direction.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.