The Weaponization of Student Evaluations of Teaching: Bullying and the Undermining of Academic Freedom

By Jason Rodriguez


Student evaluations of teaching (SETs) can be weaponized to justify undermining academic freedom and subjecting untenured and contingent faculty to surveillance and bullying. I use an autobiographical case study of my experience of the tenure process at a small liberal arts college to illustrate how SETs can enable these processes. SETs have the power they do, in part, because of an increasingly precarious academic job market and because SETs can be deployed to make forms of bullying appear to be based on data. Moreover, SETs are a particularly adaptable weapon because the data can be interpreted to justify a range of positions. As such, SETs can be wielded to exacerbate the asymmetrical character of the relations of power that untenured and contingent faculty experience and to counter diversification, interdisciplinarity, and other important transformations taking place in higher education.



The misuse of student evaluations of teaching (SET) is not limited to tenure and promotion decisions. SET is also misused in post-tenure evaluations, in students' recommendations, and in rumorware (e.g., Web sites such as

The author notes that there is a large body of academic research showing that SET is unreliable for evaluating teaching, leading to many bad decisions in tenure, promotion, and post-tenure evaluation. Indeed, SET is so unreliable that its use is harmful to universities, which, as institutions partly devoted to research, should learn from research not to use SET.

Dear Colleague,
I am writing from Portugal.
I'm a retired Asst. Professor of German Linguistics. I have read your article with utmost attention and care.
Although no two countries and situations are fully alike, I cannot but entirely agree with you. SET is being misused against demanding or unpopular faculty, or even as a remote-controlled missile by other faculty members.
If you'd like to talk further about this topic (and others that may arise), here's my email: <[email protected]>. I am also on Facebook (profile photo is of my dog and cover photo is of a votive inscription on stone in the language of our ancestors, the Lusitanians).
I look forward to hearing from you.
Francisco Espirito-Santo

First, and perhaps foremost, I’d like to apologize to Jason for the misery and hardship my generation has caused for many of those in his generation. I’d like to think that Jason’s experience is rare, but I’d be fooling myself – a lot goes on within departmental silos that administrators often ignore or plausibly deny. Whether the administration was simply ignorant of what was going on or maliciously conspired in the suppression of academic freedom and tenure-track torture, their policies created the administrative context within which Professors Hayes and Richard were operating.

Bullying is bad. Whether it is administrators bullying faculty members or faculty members bullying one another or students, it is a practice that is likely to be both ineffective and inappropriate. As French and Raven pointed out long ago, power (the ability to influence others) has many bases. These include reward, legitimacy, reference, and information. All these can be employed effectively to guide behavior toward common goals and reasonable expectations. However, the capacity to harm, injure, or punish (i.e., coercive power) is also a means of influence. Ironically, coercive power is often the first choice of those least capable of using any form of influence effectively. The use of coercion is almost always attended by collateral damage and unforeseen consequences.

I’m so old, I can recall when students almost never rated their teachers. Although I led the effort at the Air Force Academy to develop and deploy student evaluations of teaching, I recognize their inherent lack of absolute validity and reliability. Asking students the right questions is important. Asking faculty what they would like to know about their students’ experiences or perceptions is a good place to start. There are many things that can affect SET ratings – some of them have nothing to do with learning: begging your class for higher ratings to help you achieve tenure, remaining in the classroom and “circulating” while the students are completing the forms, or bringing in a box of donuts to share with the class the day they complete the SET forms; all are likely to increase average ratings significantly. Nonetheless, there are some things that influence ratings that are associated with learning. Typically, distributions of SET scores across faculty members have a decidedly negative skew: most of the scores are in the 4.0-4.5/5.0 range, but there are likely to be a few scores that are much lower (i.e., 2.5-3.5/5.0). Every time I’ve observed scores this low and investigated, I’ve found some sort of calamity that had clearly disrupted student learning. Student evaluations of teaching are an imperfect measure; however, just as a cracked mirror can be helpful, so can an imperfect instrument such as SET scores be used well under certain circumstances.

I served as a department chair at the Air Force Academy for nearly a decade. Compared to most of academia, the military academies employed far fewer lifelong professional educators. At the beginning of each fall semester, the overall average faculty teaching experience was about two years; at the end of the spring semester it was nearly three years. As my friend Tom Angelo once quipped, military academies are like academic fruit flies – the faculty turnover is so rapid that experiments can be conducted that other, more seasoned and professional faculties would thwart. At the Academy most of the classroom teaching was done by junior officers with master’s degrees. The courses themselves were often designed collaboratively with senior officers and/or experienced civilian professors. However, unlike many civilian institutions, most academic departments at the military academies prioritize the responsibility of developing junior faculty members. Becoming an effective teacher involves many of the same skills relevant to mid-level leadership. In this context, input from students was helpful and relevant.

As department chair, I tried to visit the classroom of each of the 45 assigned faculty members at least once a year. A classroom visit consisted of three parts: preparation, observation, and review, each of which lasted about an hour. During the preparation period, the instructor told me about themselves and the course they were teaching. Previous student ratings were a part of this conversation, but these were always shared and considered in context. Together, the instructor and I would select a specific class and lesson for me to visit and agree on any particular class dynamics I should be watching. After the class, we would meet and “debrief”: “What went well? What didn’t? How does this compare to other classes/courses you’ve taught? What is teaching this course teaching you? What would you like to do in the future? How can I help?” Our department was widely recognized for consistent excellence in classroom teaching. These sessions typically went well, but there were occasional exceptions. I realized that it was especially important for me to listen; being a teacher in a class that is just not going well is undoubtedly one of the most frightening (i.e., precarious) situations imaginable. There are many other ways to integrate feedback from students to help faculty become more effective classroom teachers, but giving faculty members themselves a larger role in interpreting and explaining that feedback is essential. Prescribing ubiquitous general standards and using them to evaluate courses with only five students is ludicrous. If you’re going to cut something with an ax, you should not measure it with a micrometer.

Grades and grading are also an important topic. Many students perceive grades the same way faculty members see SETs. At their best, both instruments can be used to provide objective information; at their worst, they are used to coerce and punish those with less power for insufficient obsequiousness. Once again, institutional administrators create the organizational rules that determine how these instruments will be used. At the Air Force Academy, there was a clear expectation that overall average grades would be about 2.70, and this was reiterated each semester as each department head presented the overall distribution of grades in his or her department to the dean and all the other department chairs. When I first began teaching four decades ago, instructors were given specific grade quotas for each section. Depending on the number of students enrolled, there was an absolute cap on the number of A and B grades that could be awarded. The caps went away, but the onus of providing evidence of increased learning or performance still fell to any department chair whose overall grade distribution exceeded the institutional average. I didn’t like the system, but it was not until I retired from the military and came to Berea College as Academic Vice President that I came to recognize its advantages.

One of the first tenure files I reviewed as college provost was that of a faculty member whose SET scores were about average, but, in the six years she had been teaching, it appeared that fewer than 10% of the grades she assigned were not As. I invited her for an interview and asked her to help me understand why the grades she awarded were so high. I learned that this was not only allowed but encouraged by her department and, to some, was seen to be a necessary adjustment to ensure that students could earn professional certification. Grades were not something anyone else wanted to discuss, including the college president.

Several years later, after I had returned to the classroom and the college had a new president, I was asked to co-chair a committee looking at the quality of liberal arts education we provided. As part of this effort, we conducted a comprehensive review of grades and grading by departments. Significant disparities remained. Over half the grades awarded in the Theater, English, and Education Departments were As. In contrast, the average proportion of As across the four physical science departments was less than one quarter, and, for the Political Science Department, the three-year average was only about half that. It seemed to me that these disparities must be distorting institutional processes, like students’ selection of courses, minors, and majors. Further investigation did not turn up much. In fact, the only other educational measure we found to be significantly related to the proportion of A grades awarded in departmental courses was the perceived quality of the program, as rated by students surveyed just prior to graduation. Surprisingly, the correlation was significantly negative (r = -.38, p<.05). Students enrolled in programs that gave fewer As rated the quality of these programs higher than students enrolled in programs that awarded a higher proportion of As.

Several times over the last decade, I’ve worked with students to examine the predictors of retention and academic success. Berea College does not admit students unless they show significant financial need. Consequently, Expected Family Contribution (one of the best predictors of subsequent “academic” success), high school GPA, and ACT scores together predict less than 10% of the variance in student retention. However, student performance in their first general studies course (as measured by the grade they received) predicted about a quarter of the variance in retention and also a significant portion of their eventual graduation GPA. What surprised everyone was the fact that the grades themselves were a much better predictor of retention and academic performance than a student’s rank order within their respective section. Good grades, especially when they’ve been earned, have positive consequences. I think this may also be true of SET scores.

So, it’s complicated. However, bullying is bad. SETs or grades can be used to punish or coerce rather than inform or enlighten. When they are, they are likely to be perceived as bullying. Understanding these things can help everyone become more effective learners and educators. I sincerely hope that Professor Rodriguez’s experiences will help him develop better policies and practices for those who will be following him.
