Many research designs require the assessment of inter-rater reliability (IRR) to demonstrate consistency among observational ratings provided by multiple coders. However, many studies use incorrect statistical procedures, fail to fully report the information necessary to interpret their results, or do not address how IRR affects the power of their subsequent analyses for hypothesis testing. This paper provides an overview of methodological issues related to the assessment of IRR with a focus on study design, selection of appropriate statistics, and the computation, interpretation, and reporting of some commonly-used IRR statistics. Computational examples include SPSS and R syntax for computing Cohen's kappa and intra-class correlations to assess IRR.
The assessment of inter-rater reliability (IRR, also called inter-rater agreement) is often necessary for research designs where data are collected through ratings provided by trained or untrained coders. However, many studies use incorrect statistical analyses to compute IRR, misinterpret the results from IRR analyses, or fail to consider the implications that IRR estimates have on statistical power for subsequent analyses. This paper will provide an overview of methodological issues related to the assessment of IRR, including aspects of study design, selection and computation of appropriate IRR statistics, and interpreting and reporting results. Computational examples include SPSS and R syntax for computing Cohen's kappa for nominal variables and intraclass correlations (ICCs) for ordinal, interval, and ratio variables. Although it is beyond the scope of the current paper to provide a comprehensive review of the many IRR statistics that are available, references will be provided to other IRR statistics suitable for designs not covered in this tutorial.
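As a rough illustration of the kind of statistic this tutorial covers, Cohen's kappa for two coders and a nominal variable can be sketched in a few lines. The paper itself provides SPSS and R syntax; the Python function below, its name `cohens_kappa`, and the ratings are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same subjects (nominal codes)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of subjects given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over codes.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings from two coders on ten subjects (illustration only).
coder_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
coder_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

Here the coders agree on 8 of 10 subjects (observed agreement 0.80) while chance agreement is 0.52, so kappa is noticeably lower than raw percent agreement.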
Background. Mobile apps for mental health have the potential to overcome access barriers to mental health care, but there is little information on whether patients use the interventions as intended and the impact they have on mental health outcomes. Objective. The objective of our study was to document and compare use patterns and clinical outcomes across the United States between 3 different self-guided mobile apps for depression. Methods. Participants were recruited through Web-based advertisements and social media and were randomly assigned to 1 of 3 mood apps. Treatment and assessment were conducted remotely on each participant’s smartphone or tablet with minimal contact with study staff. We enrolled 626 English-speaking adults (≥18 years old) with mild to moderate depression as determined by a 9-item Patient Health Questionnaire (PHQ-9) score ≥5, or if their score on item 10 was ≥2. The apps were (1) Project: EVO, a cognitive training app theorized to mitigate depressive symptoms by improving cognitive control, (2) iPST, an app based on an evidence-based psychotherapy for depression, and (3) Health Tips, a treatment control. Outcomes were scores on the PHQ-9 and the Sheehan Disability Scale. Adherence to treatment was measured as number of times participants opened and used the apps as instructed. Results. We randomly assigned 211 participants to iPST, 209 to Project: EVO, and 206 to Health Tips. Among the participants, 77.0% (482/626) had a PHQ-9 score >10 (moderately depressed). Among the participants using the 2 active apps, 57.9% (243/420) did not download their assigned intervention app but did not differ demographically from those who did.
Differential treatment effects were present in participants with baseline PHQ-9 score >10, with the cognitive training and problem-solving apps resulting in greater effects on mood than the information control app (χ²(2)=6.46, P=.04). Conclusions. Mobile apps for depression appear to have their greatest impact on people with more moderate levels of depression. In particular, an app that is designed to engage cognitive correlates of depression had the strongest effect on depressed mood in this sample. This study suggests that mobile apps reach many people and are useful for more moderate levels of depression. Clinical Trial. ClinicalTrials.gov NCT00540865; https://www.clinicaltrials.gov/ct2/show/NCT00540865 (Archived by WebCite at http://www.webcitation.org/6mj8IPqQr)
Background. The rate of participant attrition in alcohol clinical trials is often substantial and can cause significant issues with regard to the handling of missing data in statistical analyses of treatment effects. It is common for researchers to assume that missing data is indicative of participant relapse and under that assumption many researchers have relied on setting all missing values to the worst case scenario for the outcome (e.g., missing=heavy drinking). This sort of single imputation method has been criticized for producing biased results in other areas of clinical research, but has not been evaluated within the context of alcohol clinical trials and many alcohol researchers continue to use the missing=heavy drinking assumption. Methods. Data from the COMBINE study, a multisite randomized clinical trial, were used to generate simulated situations of missing data under a variety of conditions and assumptions. We manipulated the sample size (n = 200, n = 500, and n = 1000) and dropout rate (5%, 10%, 25%, 30%) under three missing data assumptions (missing completely at random, missing at random, missing not at random). We then examined the association between receiving naltrexone and heavy drinking during the first 10 weeks following treatment using five methods for treating missing data (complete case analysis, last observation carried forward, missing=heavy drinking, multiple imputation, and full information maximum likelihood). Results. Complete case analysis, last observation carried forward, and missing=heavy drinking produced the most biased naltrexone effect estimates and standard errors under conditions that are likely to exist in randomized clinical trials. Multiple imputation and maximum likelihood produced the least biased naltrexone effect estimates and standard errors. Conclusions. Assuming that missing=heavy drinking produces biased results of the treatment effect and should not be used to evaluate treatment effects in alcohol clinical trials.
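A small arithmetic sketch can show why the missing=heavy drinking rule distorts a treatment effect. It is not the authors' simulation: it covers only the simplest case (dropout completely at random and equal in both arms), and the function name `worst_case_rate` and all rates are hypothetical values chosen for illustration.

```python
def worst_case_rate(true_rate, dropout_rate):
    """Expected heavy-drinking rate after coding every dropout as heavy drinking.

    Assumes dropout is unrelated to drinking (missing completely at random),
    so completers still drink heavily at `true_rate`.
    """
    return true_rate * (1 - dropout_rate) + dropout_rate


# Hypothetical heavy-drinking rates and dropout rate (not COMBINE estimates).
placebo_rate = 0.60
naltrexone_rate = 0.40
dropout = 0.25

true_difference = placebo_rate - naltrexone_rate
imputed_difference = (
    worst_case_rate(placebo_rate, dropout)
    - worst_case_rate(naltrexone_rate, dropout)
)
print(round(true_difference, 2), round(imputed_difference, 2))
```

Under these assumptions the imputed rates become 0.70 and 0.55, so the risk difference shrinks from 0.20 to 0.15, that is, by the factor (1 - dropout rate). Unequal or outcome-dependent dropout can push the estimate in either direction.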
Background. Alcohol use disorder (AUD) is a highly prevalent public health problem associated with considerable individual and societal costs. Abstinence from alcohol is the most widely accepted target of treatment for AUD, but it severely limits treatment options and could deter individuals who prefer to reduce their drinking from seeking treatment. Clinical validation of reduced alcohol consumption as the primary outcome of alcohol clinical trials is critical for expanding treatment options. One potentially useful measure of alcohol treatment outcome is a reduction in the World Health Organization (WHO, International Guide for Monitoring Alcohol Consumption and Related Harm. Geneva, Switzerland, 2000) risk levels of alcohol use (very high risk, high risk, moderate risk, and low risk). For example, a 2‐shift reduction in WHO risk levels (e.g., high risk to low risk) has been used by the European Medicines Agency (2010, Guideline on the Development of Medicinal Products for the Treatment of Alcohol Dependence. UK) to evaluate nalmefene as a treatment for alcohol dependence (AD; Mann et al. 2013, Biol Psychiatry 73, 706–13). Methods. The current study was a secondary data analysis of the COMBINE study (n = 1,383; Anton et al., 2006) to examine the association between reductions in WHO risk levels and reductions in alcohol‐related consequences and mental health symptoms during and following treatment in patients with AD. Results. Any reduction in WHO risk drinking level during treatment was associated with significantly fewer alcohol‐related consequences and improved mental health at the end of treatment and for up to 1 year posttreatment. A greater reduction in WHO risk drinking level predicted a greater reduction in consequences and greater improvements in mental health. Conclusions. Changes in WHO risk levels appear to be a valid end point for alcohol clinical trials.
Based on the current findings, reductions in WHO risk drinking levels during treatment reflect meaningful reductions in alcohol‐related consequences and improved functioning.
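The idea of a one- or two-level shift can be made concrete by treating the four WHO risk levels named above as an ordered scale. The sketch below is illustrative only: the names `WHO_RISK_LEVELS` and `risk_level_reduction` are hypothetical, the consumption thresholds that define each level are deliberately left out, and abstinence is not modeled as a separate category.

```python
# The four WHO risk levels named in the abstract, ordered from lowest to highest.
WHO_RISK_LEVELS = ["low", "moderate", "high", "very high"]


def risk_level_reduction(baseline, follow_up):
    """Number of WHO risk levels dropped between baseline and follow-up.

    Positive values mean reduced risk, zero means no change, and
    negative values mean the risk level increased.
    """
    return WHO_RISK_LEVELS.index(baseline) - WHO_RISK_LEVELS.index(follow_up)


# The example from the abstract: high risk to low risk is a 2-level reduction.
print(risk_level_reduction("high", "low"))
```

A participant moving from very high to high risk would score 1 on this scale, and one moving from low to moderate risk would score -1.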
Some individuals who engage in heavy drinking following treatment for alcohol use disorder may function as well as those who are mostly abstinent with respect to psychosocial functioning, employment, life satisfaction and mental health.
Motivational Interviewing (MI) is an efficacious treatment for substance use disorders and other problem behaviors. Studies on MI fidelity and mechanisms of change typically use human raters to code therapy sessions, which requires considerable time, training, and financial costs. Natural language processing techniques have recently been utilized for coding MI sessions using machine learning techniques, rather than human coders, and preliminary results have suggested these methods hold promise. The current study extends this previous work by introducing two natural language processing models for automatically coding MI sessions via computer. The two models differ in the way they semantically represent session content, utilizing either (1) simple discrete sentence features (DSF model) or (2) more complex recursive neural networks (RNN model). Utterance- and session-level predictions from these models were compared to ratings provided by human coders using a large sample of MI sessions (N = 341 sessions; 78,977 clinician and client talk turns) from 6 MI studies. Results show that the DSF model generally had slightly better performance compared to the RNN model. The DSF model had “good” or higher utterance-level agreement with human coders (Cohen’s kappa > 0.60) for open and closed questions, affirm, giving information, and follow/neutral (all therapist codes); considerably higher agreement was obtained for session-level indices, and many estimates were competitive with human-to-human agreement. However, there was poor agreement for client change talk, client sustain talk, and therapist MI-inconsistent behaviors. Natural language processing methods provide accurate representations of human-derived behavioral codes and could offer substantial improvements to the efficiency and scale with which MI mechanisms of change research and fidelity monitoring are conducted.
Background. Abstinence and no heavy drinking days are currently the only Food and Drug Administration (FDA) approved endpoints in clinical trials for alcohol use disorder (AUD). Many individuals who fail to meet these criteria may substantially reduce their drinking during treatment and most individuals with AUD prefer drinking reduction goals. One- and two-level reductions in World Health Organization (WHO) drinking risk levels have been proposed as alternative endpoints that reflect reduced drinking and are associated with reductions in drinking consequences, improvements in mental health, and reduced risk of developing alcohol dependence. The current study examined the association between WHO drinking risk level reductions and improvements in physical health and quality of life in a sample of individuals with alcohol dependence. Methods. Secondary data analysis of individuals with alcohol dependence (n=1142) enrolled in the longitudinal, prospective COMBINE study (Anton et al. 2006), a multi-site randomized placebo-controlled clinical trial, examining the association between reductions in WHO drinking risk levels and change in blood pressure, liver enzyme levels, and self-reported quality of life following treatment for alcohol dependence. Results. One- and two-level reductions in WHO drinking risk level during treatment were associated with significant reductions in systolic blood pressure (p<0.001), improvements in liver enzyme levels (all p<0.01), and significantly better quality of life (p<0.001). Discussion. One- and two-level reductions in WHO drinking risk levels predicted significant improvements in markers of physical health and quality of life, suggesting that the WHO drinking risk level reduction could be a meaningful surrogate marker of improvements in how a person “feels and functions” (FDA, 2015) following treatment for alcohol dependence. 
The WHO drinking risk levels could be useful in medical practice for identifying drinking reduction targets that correspond with clinically significant improvements in health and quality of life.
Couple therapy for women with alcohol use disorders (AUDs) yields positive drinking outcomes, but many women prefer individual to conjoint treatment. The present study compared conjoint cognitive behavioral therapy for women with AUDs to a blend of individual and conjoint therapy. Participants were 59 women with AUDs (95% Caucasian, mean age = 46 years) and their male partners randomly assigned to 12 sessions of Alcohol Behavioral Couple Therapy (ABCT) or to a blend of five individual CBT sessions and seven sessions of ABCT (Blended-ABCT). Drinking and relationship satisfaction were assessed during and for one year post-treatment. Treatment conditions did not differ significantly on number of treatment sessions attended, percent of drinking days (PDD), or heavy drinking days (PDH), during or in the 12 months following treatment. However, effect size estimates suggested a small to moderate effect of Blended-ABCT over ABCT in number of treatment sessions attended, d=−.41, and first- and second-half within treatment PDD, d=−.41, d=−.28, and PDH, d=−.46, d=−.38. Moderator analyses found that women lower in baseline sociotropy had lower PDH across treatment weeks 1–8 in Blended-ABCT than ABCT and that women lower in self-efficacy had lower PDH during follow-up in Blended-ABCT than ABCT. The two treatment groups did not differ significantly in within-treatment or post-treatment relationship satisfaction. Results suggest that blending individual and conjoint treatment yields similar or slightly better outcomes than ABCT, is responsive to women’s expressed desire for individual sessions as part of their treatment, and decreases the challenges of scheduling conjoint sessions.
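For readers unfamiliar with the d values reported above, a standardized mean difference with a pooled standard deviation can be sketched as follows. The abstract does not say exactly how the authors computed d, so this is one common formulation rather than their method, and the function name `cohens_d` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_1, group_2):
    """Standardized mean difference (group_1 minus group_2) using the pooled SD."""
    n1, n2 = len(group_1), len(group_2)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)
    ) / (n1 + n2 - 2)
    return (mean(group_1) - mean(group_2)) / sqrt(pooled_variance)


# Hypothetical percent-drinking-days values for two small groups (illustration only).
blended = [10, 20, 30]
conjoint_only = [20, 30, 40]
d = cohens_d(blended, conjoint_only)
print(d)
```

The sign of d depends only on which group is listed first, so a negative value here simply means the first group had the lower mean.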