Introduction: Psychiatric disorders are diagnosed according to diagnostic criteria such as the DSM-5 and ICD-11. In practice, psychiatrists elicit symptoms and make a diagnosis by conversing with patients. However, such processes often lack objectivity. At the same time, specific linguistic features can be observed in some psychiatric disorders, such as a loosening of associations in schizophrenia. The purposes of the present study are to quantify the language features of psychiatric disorders and neurocognitive disorders using natural language processing and to identify features that differentiate disorders from one another and from healthy subjects.
Methods: This study will have a multi-center prospective design. Patients with major depressive disorder, bipolar disorder, schizophrenia, anxiety disorders (including obsessive-compulsive disorder), and major or minor neurocognitive disorders, as well as healthy subjects, will be recruited. A psychiatrist or psychologist will conduct a 30-to-60-min interview with each participant, and these interviews will be recorded using a microphone headset. In addition, the severity of disorders will be assessed using clinical rating scales. Data will be collected from each participant at least twice and at most five times during the study period.
Discussion: The overall goal of the proposed study, Understanding Psychiatric Illness Through Natural Language Processing (UNDERPIN), is to use natural language processing to develop objective and easy-to-use biomarkers for diagnosing each psychiatric disorder and assessing its severity. As of August 2021, we have collected a total of >900 datasets from >350 participants. To the best of our knowledge, this data sample is one of the largest in this field.