Abstract
Objective. The in-training evaluation report (ITER) is widely used to assess clinical skills, but has limited validity and reliability. The purpose of our study was to assess the feasibility, validity, reliability, and effect on feedback of using daily evaluation forms to evaluate residents in ambulatory rheumatology clinics.
Methods. An evaluation form was developed based on the Royal College of Physicians and Surgeons of Canada CanMEDS roles. There were 12 evaluation items including overall clinical competence. They were rated on a 5-point scale from unsatisfactory to outstanding. All internal medicine residents rotating on rheumatology were strongly encouraged to provide the form to their preceptor at the end of each clinic. A questionnaire was administered to residents and faculty.
Results. Seventy-three internal medicine residents completed a 1-month rotation at the University of Ottawa (n = 26) and McMaster University (n = 47). Faculty members completed a total of 637 evaluation forms, ranging from 2 to 16 (mean 8.73) per resident. At this average of 8.73 forms per resident, the reliability of the composite score was 0.71; 14 forms would be required for a reliability of 0.80. The correlation between the objective structured clinical examination scores and the forms was 0.48 (not statistically significant). Faculty and residents reported increased feedback following implementation of the forms.
Conclusion. The use of daily evaluation forms is feasible and provides very good reliability. Use of the evaluation forms increases feedback to residents on their performance. The forms were well received by faculty and residents.
In the present model of medical education, students and postgraduate trainees learn their clinical skills by rotating through various clinical settings. Evaluating trainee performance during these rotations has been challenging. Although several performance-based evaluation methods exist, the in-training evaluation report (ITER) is frequently used to document student or resident performance in day-to-day practice. An ITER is typically a rating form completed at the end of a rotation. ITER allow ongoing assessment of clinical practice performance, but have been noted to have limited reliability and validity. ITER are often completed by rotation supervisors who have had very little direct contact with the trainee, often weeks after the end of a rotation1. The retrospective character of the evaluation often results in a lack of specific examples of students’ strengths and weaknesses and in inadequate provision of feedback.
Various evaluation tools such as encounter cards, portfolios, and the mini-clinical evaluation exercise (mini-CEX) have been developed to overcome the deficiencies inherent in the ITER. Encounter cards or interaction cards have been used in various inpatient or combined inpatient and ambulatory rotations. This method involves repeated documentation of performance by multiple observers2. Several articles have reported on the reliability of encounter cards. Hatala and colleagues reported a reliability of 0.79 for 7.9 cards collected during an 8-week inpatient internal medicine rotation3. Another study, from Kuwait, found a reliability of 0.91 for 184 cards collected over multiple rotations (internal medicine, surgery, obstetrics and gynecology, and pediatrics) over 12 months4. A study of residents in obstetrics and gynecology found a reliability of 0.73 for 8 encounter cards5. In this latter study, all encounters were directly observed, as opposed to only 60% in Hatala’s study.
Other studies have evaluated the effect of encounter cards on feedback. Feedback plays an essential role in medical education. Effective feedback should be immediate, specific, and corrective, and occur regularly. The use of encounter cards has the potential to improve the delivery of feedback. Paukert, et al documented that student satisfaction with the feedback process improved significantly with the use of 40 encounter cards during a 12-week surgery rotation6.
In several ambulatory rotations, including rheumatology, trainees interact with multiple faculty members over a 4-week rotation. In many centers this differs from inpatient rotations, where trainees may work with only 1 or 2 preceptors. Several teaching programs still rely primarily on the ITER as an evaluation method. To our knowledge, no studies to date have examined the reliability of evaluation cards in a relatively short, strictly ambulatory setting while also evaluating both resident and faculty perceptions of the effect the forms have on feedback.
The purpose of our study was to assess the reliability and effect on feedback of using daily evaluation forms to evaluate internal medicine residents in ambulatory rheumatology clinics in several Canadian universities. Other outcomes included the feasibility and validity of this evaluation method.
MATERIALS AND METHODS
Potential collaborators in Canadian university rheumatology programs were approached. Five programs agreed to participate, but 2 dropped out early for logistical reasons and 1 university dropped out because of an interruption in secretarial support. The medical schools at the University of Ottawa and at McMaster University completed the study. Research ethics board approval was received at all centers.
An evaluation form was developed based on the Royal College of Physicians and Surgeons of Canada CanMEDS roles7. CanMEDS is a framework of essential physician competencies needed for medical education and practice. The roles form part of the objectives of training and accreditation standards for postgraduate education in Canada. Based on these competencies, the authors developed the form by consensus.
Figure 1 displays the version of the form that was used. There were 11 evaluation items (history, physical examination, clinical judgment, verbal communication, written records, humanistic qualities, collaborator, organization, scholar, advocate, procedural skills) plus a rating of overall clinical competence. A 5-point rating ranging from unsatisfactory to outstanding was associated with each item.
In addition to the 12 items on the form, faculty were asked to record the percentage of their evaluation that was based on direct observation, case review, or written note review. Also, at the top of the form residents were asked to list the diagnoses as well as any procedure performed during the encounter.
For 6 months prior to the implementation of the evaluation forms, a 6-item questionnaire was administered to teaching faculty and to internal medicine residents rotating through rheumatology, to determine their perceptions of the provision of feedback and direct observation, rated on a 5-point scale. Following implementation, a questionnaire was given to all rotating residents and faculty; it included questions from the pre-implementation questionnaire as well as questions assessing the usefulness, perceived fairness, and effect on feedback of the evaluation forms (Tables 2 and 3).
To assess the validity of the forms, the ratings from the evaluation forms were compared to the scores from an annual objective structured clinical examination (OSCE) that residents must complete. This OSCE is formative but mandatory for all internal medicine residents. It consists of 10 stations that test physical examination skills, communication skills, procedural skills, and the ability to manage typical general internal medicine scenarios. Measures of validity also included the perceptions of residents and faculty.
Most internal medicine residents complete a 1-month rotation in rheumatology during their core internal medicine training. During the rotation, residents work with multiple faculty members. All internal medicine residents rotating on rheumatology at both medical schools were strongly encouraged to provide the evaluation form to their preceptor at the end of each clinic. Faculty members were encouraged to complete all categories on the forms and to hand them in to the rotation coordinator. Clinical faculty were introduced to the form but not formally trained; however, all had many years of teaching experience with internal medicine residents and were familiar with the CanMEDS roles. Forms were collected over an 18-month period at the 2 universities. Residents continued to receive the end-of-rotation ITER, as these are a requirement of the respective programs.
The number of forms collected from each resident over the month at both sites was recorded. For each form, a composite score was created by averaging the ratings assigned to the 11 evaluation items, and a generalizability analysis using the composite score and the overall rating was conducted to determine the reliability of the forms and the number of forms per resident required to achieve a reliability of 0.80. An independent t-test was used to compare the ratings on the 6 items that were identical on the pre- and post-questionnaires.
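For readers who wish to replicate this type of analysis, the sketch below illustrates one way to estimate the variance components and generalizability (g) coefficient for a design with forms nested within residents, using a one-way random-effects ANOVA. It is a minimal illustration, not the software actually used in this study; the function and variable names are our own.

```python
# Minimal sketch of a one-facet nested G-study (forms nested within
# residents), estimated from a one-way random-effects ANOVA.
# Illustrative only; not the software used in the study.
import numpy as np

def g_study(scores_by_resident):
    """scores_by_resident: list of 1-D arrays, one per resident, each
    holding that resident's composite scores (mean of the 11 item ratings)."""
    k = len(scores_by_resident)                           # number of residents
    n_i = np.array([len(s) for s in scores_by_resident])  # forms per resident
    N = n_i.sum()
    grand_mean = np.concatenate(scores_by_resident).mean()

    # One-way ANOVA mean squares (handles unequal numbers of forms)
    ss_between = sum(len(s) * (s.mean() - grand_mean) ** 2
                     for s in scores_by_resident)
    ss_within = sum(((s - s.mean()) ** 2).sum() for s in scores_by_resident)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (N - k)

    # Variance components; n0 adjusts for the unbalanced design
    n0 = (N - (n_i ** 2).sum() / N) / (k - 1)
    var_resident = max((ms_between - ms_within) / n0, 0.0)  # true-score variance
    var_error = ms_within                                   # form-within-resident variance

    def g_coefficient(n_forms):
        """Reliability of a mean over n_forms forms per resident."""
        return var_resident / (var_resident + var_error / n_forms)

    return var_resident, var_error, g_coefficient
```

With variance components like those underlying the Results below, g_coefficient(8.73) would return approximately 0.71, and increasing n_forms until the coefficient reaches 0.80 reproduces the decision-study logic described in the Results.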
RESULTS
Seventy-three internal medicine residents completed a 1-month rotation in rheumatology at the University of Ottawa (n = 26) and at McMaster University (n = 47). At the University of Ottawa the percentage of first-year residents was 6.5%, second-year 36.5%, and third-year 57%. At McMaster the breakdown was 32.9%, 27.7%, and 39.5%, respectively. Faculty members completed a total of 637 evaluation forms for the 73 residents. The number of evaluation forms per resident ranged from 2 to 16. The mean number of forms collected at the University of Ottawa was 10.46 and at McMaster 7.76, for an overall mean of 8.73 forms per resident.
Table 1 displays the descriptive statistics for all the forms. As shown, not all items were filled out on each form. Most of the forms involved assessment of the first 6 items, with fewer forms having ratings for scholarly activity/literature reviews or procedural skills. The mean ratings for the items on the form ranged from 3.7 to 4.1, indicating that on average, the supervisors thought the residents’ performance was above expectations. That said, there was some variation within the items. As shown in Table 1, the ratings for each item ranged from either 2 to 5 or 3 to 5. More importantly, there was considerable variability in the correlations between ratings on the items. These results indicate that raters were willing to give relatively independent ratings for each of the 12 items.
To determine the reliability of the ratings on the forms, a composite score for each form was created by averaging the ratings on the first 11 items. A generalizability analysis was then conducted. For this analysis, the composite rating on each form was nested within resident, with resident treated as a between-subject factor. At an average of 8.73 forms per resident, the generalizability coefficient (g-coefficient) for the forms was 0.71. A decision study showed that it would require an average of 14 forms per resident to achieve a g-coefficient of 0.80. The generalizability analysis was repeated for the single item “overall rating of clinical competence.” The g-coefficient for this item was 0.50, and an average of 33 forms per resident would be required to achieve a g-coefficient of 0.80.
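The decision-study projection follows from the standard generalizability relation for the reliability of a mean of n forms. As a rough check using the rounded coefficient reported above (the published figures derive from the exact variance components):

```latex
% Reliability of a mean of n forms per resident:
%   g(n) = sigma^2_p / (sigma^2_p + sigma^2_{f:p}/n)
\[
  \frac{\sigma^2_{f:p}}{\sigma^2_p} \;=\; n\,\frac{1-g(n)}{g(n)}
  \;=\; 8.73 \times \frac{1-0.71}{0.71} \;\approx\; 3.57
\]
\[
  g(n) = 0.80 \;\Longrightarrow\;
  n \;=\; \frac{\sigma^2_{f:p}}{\sigma^2_p} \times \frac{0.80}{1-0.80}
  \;\approx\; 3.57 \times 4 \;\approx\; 14 \text{ forms}
\]
```

Applying the same relation to the single-item coefficient of 0.50 gives roughly 35 forms; the reported 33 presumably reflects the unrounded variance components.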
Faculty reported that only 10% of the evaluation was based on direct observation, with 80% resulting from case review and the remaining 10% from a review of the written note.
Resident pre-questionnaire responses (n = 27) were compared to resident post-responses (n = 70) for the first 6 items (see Table 2). Resident responses before the institution of the evaluation forms (pre-) and post-responses were not statistically different for any of the items. Table 2 also displays the percentage of residents who agreed or strongly agreed with the statements listed in the post-survey. It appears that residents felt the form was a fair evaluation of their skills and should continue to be used. Over 80% of residents felt that they received more feedback and more timely feedback as a result of the form. Only a small percentage of the residents felt the form was intimidating.
Resident comments on the forms also supported the improved feedback. A few illustrative comments include: “timely feedback on the same day was very helpful”; “probably more accurate assessment, timely feedback and faculty forced to consider evaluation immediately”; “liked most to receive feedback at the end of each clinic and to see the progress”; “I truly appreciate feedback being given in an immediate and constructive fashion... allowed me to improve over course of rotation, as opposed to getting a generic ITER 1–2 month later which has no direct relevance”; “the forms provided an avenue for constructive feedback on an ongoing basis so changes could be implemented during the rotation.”
Faculty pre- and post-questionnaires were compared for the first 6 items that appeared on both forms (see Table 3). There were no statistically significant differences noted except for the statement, “I provide feedback to residents on their clinical skills on a regular basis,” with a score of 3.45 before the institution of the evaluation forms (pre-) versus 4.27 post (p = 0.02). For the statements on the post-form only, the percentage of faculty that agreed or strongly agreed with the statements is shown in Table 3. Sixty-four percent of faculty felt they provided more feedback to the residents as a result of the form and 82% felt the form was well suited to the outpatient setting.
Eight residents involved in our study also completed an OSCE. The number of residents participating in the OSCE was low because many of the residents were completing electives outside the academic center when the OSCE was administered. The Pearson correlation coefficient comparing the overall OSCE score to the composite score from the evaluation forms was 0.48. Although this correlation was not statistically significant (p > 0.05), it is moderate in magnitude despite the small number of residents and suggests that the evaluation forms may be somewhat predictive of performance on the OSCE.
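The lack of statistical significance is what the sample size alone would predict. As an illustrative check (our calculation, not part of the original analysis), the usual t test for a Pearson correlation with n = 8 gives:

```latex
% t test for a Pearson correlation, df = n - 2 = 6
\[
  t \;=\; \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
  \;=\; \frac{0.48\sqrt{6}}{\sqrt{1-0.48^2}}
  \;\approx\; 1.34
  \;<\; t_{0.05(2),\,6} \approx 2.45
\]
```

With only 6 degrees of freedom, the correlation would need to exceed approximately 0.71 to reach p < 0.05, so even a moderate true association would rarely be detected in a sample of 8.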
DISCUSSION
Our study is unique because the data were obtained from an ambulatory 1-month rotation of internal medicine residents and the evaluation was primarily based on case review and not direct observation. Direct observation is generally encouraged and increases validity, but requires more faculty time. We believe our study reflects a more realistic representation of what is feasible and actually occurring with the use of encounter cards in many universities.
Several positive findings were discovered. First, the evaluation forms demonstrated a high degree of reliability (0.71) with an average of only 8.73 forms per resident collected over a 1-month rotation; 14 forms per resident would achieve a reliability of 0.80. These findings are similar to previous reports that have used encounter cards3–5. Feasibility was also demonstrated in our study. The number of forms required per resident is a realistic goal for most programs. In addition, faculty felt the forms were well suited to the ambulatory setting, and both universities involved in the study have continued to use the forms even though the study has ended. There were some cautions in terms of feasibility, however. Approximately 27% of faculty reported that the form was time-consuming, so ongoing encouragement may be required. In addition, to ensure continued use of the forms, we suggest that a dedicated administrative assistant and a faculty representative are important for programs that wish to implement a similar system. That said, once the system is established, it requires minimal time from administrative assistants and rotation coordinators.
The evaluation forms also appear to have face validity, as demonstrated by the favorable ratings provided by residents and faculty. For example, 79.1% of residents agreed or strongly agreed that the evaluation form was a fair evaluation of clinical skills. From the faculty perspective, 72.2% agreed or strongly agreed that the evaluation form overall was a valuable tool for evaluation. Over 70% of faculty and over 80% of residents agreed or strongly agreed that the evaluation forms should continue to be used. Although not statistically significant, the correlation between the composite score on the evaluation form and a formative OSCE score was moderate and could indicate a degree of criterion validity.
Finally, both residents and faculty reported increased feedback as a result of the forms. This concurs with data from a previous study6. This formative aspect is important and was reflected in resident comments. The form did not increase direct observation, indicating that if this is a primary objective, then another method of evaluation such as the mini-CEX should be considered8.
Our study does have limitations. It involved internal medicine residents completing an ambulatory rotation in rheumatology, and it is not clear whether the results will be transferable to other outpatient settings and other trainees. The forms, although useful, did not increase direct observation of clinical skills and should be only one of several methods of evaluation for any trainee. Faculty training would likely improve the performance of the forms.
Our study has demonstrated that the use of evaluation forms in a 1-month ambulatory clinical rotation is feasible, valid, and reliable and that it improved feedback on clinical performance. Evaluation forms provide an important method of evaluation for ambulatory rotations.
Footnotes
- Supported by The Ottawa Hospital Academic Enrichment Fund. Drs. Humphrey-Murto and Khalidi acknowledge the support received as recipients of The Arthritis Society Clinician Teacher Award.
- Accepted for publication December 9, 2008.