Objective. The personal statement is often an underutilized aspect of pediatric otolaryngology fellowship applications. In this pilot study, we use deep learning language models to cluster personal statements and elucidate their relationship to applicant rank position and postfellowship research output.
Study Design. Retrospective cohort.
Setting. Single pediatric tertiary care center.
Methods. Data and personal statements from 115 applicants to our fellowship program were retrieved from San Francisco Match. BERT (Bidirectional Encoder Representations From Transformers) was used to generate document embeddings for clustering. Regression and machine learning models were used to assess the relationship of personal statements to the number of postfellowship publications per year when controlling for publications, board scores, Alpha Omega Alpha status, gender, and residency.
Results. Document embeddings of personal statements clustered into 4 distinct groups by K-means clustering: 2 focused on "training/research" and 2 on "personal/patient anecdotes." Training clusters 1 and 2 were associated with applicant-organization fit by a single pediatric otolaryngology fellowship program on univariate but not multivariate analysis. Models using document embeddings alone predicted applicant-organization fit as well as models using applicant characteristics and personal statement clusters alone (receiver operating characteristic areas under the curve, 0.763 and 0.750 vs 0.419; P values >.05). All predictive models were poor predictors of postfellowship publications per year.
Conclusion. We demonstrate the ability of document embeddings to capture meaningful information in personal statements from pediatric otolaryngology fellowship applicants. A larger study can further differentiate personal statement clusters and assess the predictive potential of document embeddings.
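The clustering step named in the Methods and Results (K-means over document embeddings, yielding 4 groups) can be illustrated with a minimal sketch. This is not the study's pipeline: the abstract does not describe the BERT variant, the embedding dimensionality, or the K-means settings, so the 2-D `embeddings` below are hypothetical toy vectors standing in for real BERT document embeddings, and the function names and the farthest-point initialization are assumptions chosen only to keep the example deterministic and dependency-free.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, max_iterations=100):
    """Plain K-means (Lloyd's algorithm) with farthest-point initialization.

    Assumption: farthest-point seeding is used here only for determinism;
    the source does not say how the study initialized its clusters.
    """
    # Seed with the first vector, then repeatedly add the vector that is
    # farthest from all centroids chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iterations):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


# Hypothetical toy "embeddings": four well-separated groups of three points.
embeddings = [
    (0.0, 0.0), (0.5, 0.2), (0.1, 0.6),
    (10.0, 0.0), (10.4, 0.3), (9.8, 0.5),
    (0.0, 10.0), (0.3, 10.4), (0.6, 9.7),
    (10.0, 10.0), (10.2, 9.6), (9.7, 10.3),
]

labels, centroids = kmeans(embeddings, k=4)
for vector, label in zip(embeddings, labels):
    print(label, vector)
```

In practice the input would be one high-dimensional embedding per personal statement, and the number of clusters would be chosen and validated rather than fixed in advance as it is here.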