The UCCS Faculty Assembly Women’s Committee has reviewed the academic literature for evidence of bias in student evaluations of teaching (called Faculty Course Questionnaires, or FCQs, at UCCS). Bias in FCQs presents a serious hurdle for women faculty across the campus, whether they hold adjunct, non-tenure-track, or tenure-track positions.
We believe this is an important issue and have consolidated the current evidence here for reference by faculty, deans, directors, and other campus decision-makers. Below, we provide links to the primary sources (in the near future, we will also provide brief summaries compiled by FAWC board members).
Context and Summary
Faculty Course Questionnaires (FCQs) are used across the UCCS campus as a measure of teaching effectiveness, both in annual merit reviews and in promotion and tenure decisions. However, the literature on student evaluations of teaching (SETs) demonstrates that they are deeply flawed instruments for evaluating teaching effectiveness, for a variety of reasons, and that they are subject to biases that disadvantage certain protected classes, particularly women, raising concerns about equity. Furthermore, the literature indicates that SETs should not be the primary deciding factor in high-stakes personnel decisions. A recent arbitration decision between the Ryerson Faculty Association and Ryerson University provides an excellent synthesis of the problems with SETs and of how university administrators generally use them. Below is a quick guide to the limitations of SETs, based on the expert report that Richard L. Freishtat, Ph.D., prepared for this arbitration hearing, which was recently decided in favor of the Ryerson Faculty Association.
Biases based on characteristics of the faculty member (Freishtat pp. 5-7):
- SETs are influenced by gender, such that women tend to receive lower ratings than men
- This bias is evident even on seemingly “objective” measures, such as grading timeliness
- SETs are influenced by ethnicity and race, such that faculty of color tend to receive lower ratings
- Additionally, SETs reflect a bias against non-native English speakers
- SETs are influenced by attractiveness, such that more attractive faculty receive higher ratings
- SETs are influenced by other traits of the instructor, such as: attitude, perceived fairness, stereotypical fit (i.e., how well the faculty member matches the subjective image of an ideal teacher for that field), and likeability
Biases based on characteristics of the student (Freishtat pp. 2, 5-8):
- SETs are influenced by traits of the student, such as: motivation, attendance, prior ability and education, and fit with the course (i.e., whether the course is in their major)
- SETs are biased by both expected and actual final course grades (i.e., expectations of higher grades translate into higher SETs), even when SETs are administered before final grades are assigned
- SETs are negatively biased by feedback from other students (i.e., SETs are not independent assessments by each student)
Biases based on characteristics of the course (Freishtat pp. 7-8):
- SETs are lower for required courses, compared to electives and courses within a student’s major
- SETs are lower for more quantitative courses (physical science < social science < humanities)
- SETs are higher for faculty who teach smaller classes (< 40 students)
- Despite greater learning, students rate active and innovative courses lower than lecture courses (disincentivizing course changes that benefit students)
- SETs are often negatively biased in courses that cover controversial or sensitive topics
SET validity (Freishtat pp. 2-4, 10-14):
- SETs do not correlate with student learning or teaching effectiveness/quality
- SETs are primarily a measure of student satisfaction and liking
- Students lack the expertise to evaluate teachers (e.g., their methods, helpfulness, assessment/assignments) or courses (e.g., content, relevance, importance)
- Response rates are often lower than necessary to assume a representative sample of the class
- Factors unrelated to teaching, such as administration method (e.g., paper vs. online) and timing, impact SETs
- Numerical SETs are ordinal, not continuous – therefore averages should not be used (rather, look at distributions of scores)
- SETs should not be used to compare faculty or courses, due to confounding variables
- SETs are recommended as a formative, rather than summative, component of faculty evaluation, with merit and P&T decisions relying on more comprehensive teaching dossiers
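To make the point about averages concrete, here is a minimal Python sketch using two invented sets of ratings on an assumed 1-5 scale (hypothetical numbers, not FCQ data; the helper `distribution` is likewise hypothetical). Both sections have the same mean of 3, yet one class is uniformly neutral while the other is sharply split between the lowest and highest ratings, a difference that only the distribution reveals.

```python
from collections import Counter
from statistics import mean

# Hypothetical ratings on an assumed 1-5 scale (illustrative only, not real FCQ data)
section_a = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]  # every student gives a neutral rating
section_b = [1, 1, 1, 1, 1, 5, 5, 5, 5, 5]  # the class is split between the extremes

def distribution(ratings, scale=range(1, 6)):
    """Return the number of responses at each point of the rating scale."""
    counts = Counter(ratings)
    return {point: counts.get(point, 0) for point in scale}

for name, ratings in [("Section A", section_a), ("Section B", section_b)]:
    # Identical means, very different response patterns
    print(name, "mean:", mean(ratings), "distribution:", distribution(ratings))
```

Reporting the count at each scale point, as `distribution` does, preserves exactly the information that a single average discards.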
Ryerson University v Ryerson Faculty Association, 2018
In late June 2018, an arbitration award was made in favor of the Ryerson Faculty Association regarding its objection to the use of student evaluations of teaching (SETs). The award directed that 1) student evaluations not be used as a measure of teaching effectiveness in promotion and tenure processes and 2) decision-makers be educated about “the inherent and systematic biases” in student evaluations. The arbitration decision read:
“The expert evidence … demonstrates that the most meaningful aspects of teaching performance and effectiveness cannot be assessed by SETs. Insofar as assessing teaching effectiveness is concerned – especially in the context of tenure and promotion – SETs are imperfect at best and downright biased and unreliable at worst.”
“According to the evidence … numerous factors, especially personal characteristics … such as race, gender, accent, age and ‘attractiveness’ skew SET results. It is almost impossible to adjust for bias and stereotypes.”
The Ryerson Faculty Association commissioned two expert reports on the limitations of student evaluations of teaching, including their propensity for bias against women faculty. These reports, authored by Richard L. Freishtat and Philip B. Stark, provide clear and compelling evidence of the weaknesses and misuse of SETs. Both reports are extremely thorough and cover the broad literature on SETs.
Evidence of bias against women in student evaluations of teaching (and other performance assessments):
Arbuckle, J., & Williams, B. D. (2003). Students' perceptions of expressiveness: Age and gender effects on teacher evaluations. Sex Roles, 49(9-10), 507-516.
Boring, A. (2017). Gender biases in student evaluations of teaching. Journal of Public Economics, 145, 27-41.
MacNell, L., Driscoll, A., & Hunt, A. N. (2015). What’s in a name: Exposing gender bias in student ratings of teaching. Innovative Higher Education, 40, 291-303.
Mengel, F., Sauermann, J., & Zölitz, U. (2018). Gender bias in teaching evaluations. Journal of the European Economic Association, 1-32.
Evidence that student evaluations of teaching do not measure teaching effectiveness:
Boring, A., Ottoboni, K., & Stark, P. B. (2016). Student evaluations of teaching (mostly) do not measure teaching effectiveness. ScienceOpen Research, 1-11. [OPEN ACCESS]
Braga, M., Paccagnella, M., & Pellizzari, M. (2014). Evaluating students’ evaluations of professors. Economics of Education Review, 41, 71-88.
Hornstein, H. A. (2017). Student evaluations of teaching are an inadequate assessment tool for evaluating faculty performance. Cogent Education, 4, 1-8. [OPEN ACCESS]
Stark, P. B., & Freishtat, R. (2014). An evaluation of course evaluations. ScienceOpen Research, 1-7. [OPEN ACCESS]
Uttl, B., White, C. A., & Gonzalez, D. W. (2017). Meta-analysis of faculty’s teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22-42.
Evidence of bias against women in performance assessments:
Bohnet, I., van Geen, A., & Bazerman, M. (2016). When performance trumps gender bias: Joint vs. separate evaluation. Management Science, 62, 1225-1234.