Provision of mental health care is almost entirely built on a single medium: naturally occurring spoken-language conversations. However, datasets of spoken language from patients experiencing mental health issues are surprisingly difficult to obtain. In this commentary, we discuss some of the reasons behind this, highlight successful approaches adopted in other areas of clinical linguistics, and propose some ways forward, especially for the study of psychosis.
Barriers to sharing speech data

Across disciplines, researchers are rapidly adopting Open Science principles for data sharing. This movement encourages researchers, clinicians, and institutions to provide fully open access to research data, programs, and publications. For example, the National Institutes of Health's Strategic Plan for Data Science requires that newly funded research projects share data in accordance with the FAIR principles [1] for open access, and that their budget requests include the resources necessary to provide open access. Although many disciplines, funding agencies, researchers, journals, libraries, and institutions have adopted this new model, the movement has also encountered significant resistance, particularly to open sharing of spoken language data, including spoken language data from clinical populations (SLDCP). We can identify at least six barriers to open sharing of SLDCP [2]. Some of these barriers stem from how various institutions interpret regulations, while others reflect prevailing public perceptions of SLDCP. Here we consider each of these barriers and the ways in which systems such as TalkBank [3] and Databrary [4] manage to overcome them. With emerging collaborative efforts to study language in psychosis (e.g., https://discourseinpsychosis.org/), we anticipate that this commentary will eventually inform 'speech bank' infrastructures for psychiatric disorders.

1. Informed consent. A frequent objection to the sharing of SLDCP is that it violates participants' rights to privacy and confidentiality. Sharing would indeed be a violation if participants had not given informed consent for it; unfortunately, this is the case for many existing speech samples from clinical populations, which precludes retrospective sharing. In these cases, re-contacting participants to obtain consent for data sharing is an option, provided that consent for such re-contact is already in place.
In the absence of consent to re-contact, IRBs may be able to grant a waiver, i.e., a modification of the initial consent parameters (see https://conp.ca/ethics-toolkit/). Some national laws also provide alternatives to re-consenting for scientific purposes [5]. Explicitly stating in informed consent forms that the data will be made available to qualified researchers (those holding an identifiable position in an academic or research enterprise wherein research activities are governed by a code of conduct on academic integrity), and that the data can be removed from a sharing portal at the participant's request, will address this barrier. Qualif...