The purpose of the study was to develop a comprehensive program evaluation instrument. Following pilot work with residents, a 69-item instrument consisting of statements with 5-point strongly agree to strongly disagree response options was distributed to 107 residents; 104 responded. Psychometric analyses revealed no ceiling or floor effects; 9 items were deleted.

Resident-generated program evaluation can identify the strengths and weaknesses of a training program through residents' eyes, document the educational climate of the program, and give residents a voice in program structure.1-6 Despite the importance of evaluation, comprehensive instruments are difficult to find. Many of those that exist are directed at specific aspects of residency training, such as its stress or the impact of introducing night-float systems and other changes.3,7-11 We describe the development of a program evaluation instrument that expands on three previously identified domains (workload, education, and lifestyle) to include specific factors that might be targeted for intervention.12,13
METHODS

A MEDLINE search using the search words "residency" and "program evaluation" from 1991 to 1996 identified 145 citations. One evaluation instrument, developed by Seelig, was found among these citations.12,13 This 33-item questionnaire assessing resident satisfaction with workload, learning environment, and stress served as the foundation of our evaluation instrument. We supplemented this initial list with other questions based on our clinical and educational experiences. The initial draft instrument contained 63 items. All questions were written in statement form. Response options were formatted as a 5-point Likert-type scale, with strongly disagree to strongly agree scored from 1 to 5, respectively.

The questionnaire was pilot tested on 20 residents and faculty members to judge completeness, readability, and accuracy in reflecting the residents' perceptions. On the basis of the pilot test responses, 6 questions were added. The final instrument contained 69 evaluation items and 11 demographic questions (Appendix A). It was distributed in May 1995 to all 107 residents in one university program; 104 residents responded.

Psychometric analyses focused on examining the frequency distributions of items to identify those with large amounts of missing data, ensure that distributions were interpretable, identify items that were "reverse" scored, and look for items with ceiling effects. Following Seelig's conception of subscales, the workload subscale had 19 items, the educational environment subscale had 29 items, and the lifestyle subscale had 12 items. For each subscale we calculated an internal consistency reliability coefficient (Cronbach's α) and disattenuated correlations among the subscale scores.14 When analyzing subscale scores, we substituted for missing data the item mean values based on the responses of all who answered the item.

Exploratory analyses consisted of comparing the three subscale mean scores, shown as the percent...
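The psychometric steps described above can be sketched in Python with numpy. This is an illustrative reconstruction, not the authors' code: the function names and the synthetic data are ours, and the formulas are the standard ones for Cronbach's α, item-mean imputation of missing responses, and correlations disattenuated for unreliability.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x k_items) response matrix.

    Standard formula: (k / (k - 1)) * (1 - sum of item variances / variance of total score).
    """
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def impute_item_means(items):
    """Replace missing responses (NaN) with that item's mean over all who answered it,
    as done for the subscale-score analyses."""
    col_means = np.nanmean(items, axis=0)
    return np.where(np.isnan(items), col_means, items)

def disattenuated_r(x_scores, y_scores, alpha_x, alpha_y):
    """Correlation between two subscale scores corrected for each scale's unreliability:
    r_observed / sqrt(alpha_x * alpha_y)."""
    r = np.corrcoef(x_scores, y_scores)[0, 1]
    return r / np.sqrt(alpha_x * alpha_y)

# Hypothetical usage on a toy 4-respondent, 3-item subscale (1-5 Likert responses):
responses = np.array([[1., 1., 1.],
                      [2., 2., 2.],
                      [3., 3., 3.],
                      [5., 5., 5.]])
alpha = cronbach_alpha(responses)          # identical items -> alpha of 1.0

with_missing = np.array([[1., 2.],
                         [3., np.nan],
                         [5., 4.]])
filled = impute_item_means(with_missing)   # NaN replaced by that item's mean (3.0)
```

A perfectly reliable scale (α = 1 for both subscales) leaves the observed correlation unchanged; lower reliabilities inflate the corrected estimate, which is why disattenuated correlations are interpreted alongside the α values rather than in isolation.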