The Troubled History of Teaching Evaluation

By Derek Gottlieb

Grading the College: A History of Evaluating Teaching and Learning by Scott M. Gelber. Baltimore: Johns Hopkins University Press, 2020.

Near the end of every semester, in a ritual repeated across the higher education landscape, all the students in my courses receive a university-sponsored survey in their inboxes. There are thirteen items that ask students to rate me and my course on a scale of one to five and a pair of short-answer questions that ask students for the “greatest strengths” of the instructor and “constructive comments” to help me improve. In graduate courses, the response rate varies dramatically, but it can sometimes reach 80 percent. Undergraduate courses are lucky if they hit 50 percent. The numerical averages and the comments from these anonymous surveys are the bulk of what we discuss in our annual and biennial faculty evaluation processes and in tenure and promotion discussions.

In the final class meeting, I also have my students fill out another anonymous survey, this time for my own use. The first three questions ask students to assess on a one-to-five scale the difficulty of the course, the amount of required reading, and the value of what they learned. The rest are short-answer questions asking about the specific structure of the course, particular assignments and readings, and what they considered most and least useful. Because this survey is done in class, the response rate is 100 percent. The results will not come up in tenure and promotion considerations or in faculty evaluation processes. They are institutionally invisible.

Why is it that broadly aimed student surveys with shaky response rates play such a large role in evaluating the curriculum and the performance of individual faculty members yet provide such thin information that professors sometimes need to create another instrument to help improve their courses and pedagogy? Is this the best we can hope for?

Scott M. Gelber’s Grading the College examines the seesawing history of efforts to assess what happens in the classrooms of American colleges and universities. In doing so, he answers both of the questions I ask above. The dominance of student surveys should not be the limit of our aspirations, he shows, but we rely on them mainly because they have emerged as the “least bad” option. He endorses an anonymous riff on Winston Churchill’s famous remark about democracy as the worst form of government: “Student evaluations are the worst form of teaching evaluation, except for all the others.” Gelber’s history traces the inherent tension between the local validity of evaluation instruments—their utility for faculty professional development tied to disciplines, departments, and courses—and external demands for reliability and comparability, a single scale that can report on the quality of the “teaching” and “learning” that occurs in lecture halls, in seminars, and in small discussion sections across all fields of study from art history to zoology.

Grading the College is compact at 156 pages of main text, which belies the evident rigor of its research: the endnotes add 50 percent to the length of the book. The skill required to interweave so many archival sources and so much secondary literature into a highly readable account of higher education’s fraught relationship with evaluation from 1920 to 1980 (and beyond) is impressive, and the book itself is a major accomplishment.

The book features an introduction, six main chapters, and a conclusion. The main chapters are divided into three parts: Teaching (two chapters), Learning (three chapters), and Accountability (one chapter). Structuring the book this way serves Gelber’s primary thesis: while faculty members tend to associate evaluation with the rise of neoliberal, corporate-style accountability and the erosion of shared governance, evaluation efforts have actually been widespread since the early twentieth century—originally initiated, designed, and encouraged by professors themselves. Reminding the professoriate of the fact that we have historically led the evaluation charge, and for good reason, strikes me as the major aim of the book. Gelber emphasizes it in the introduction, in every one of its main chapters, and even in the book’s final sentence: “We have played our part.” That final line, as I take it, is meant to call us to our responsibilities—to acknowledge our implication in the current state of affairs and to reclaim a better vision of what evaluation might yet become.

Despite referencing that thesis throughout the book, Gelber turns his sustained attention to it only in chapter 6 and in the conclusion. Each of the early chapters focuses on the history of a particular object or method of evaluation: an overview of teacher evaluation is followed by the history of student surveys, which is followed by a history of testing, and so on. These are individual strands that Gelber draws out in order to weave them together in the book’s final two chapters. The early chapters reveal both what is new and what is recycled in post-1980 evaluation efforts, and they also contextualize the recommendations he offers in the conclusion. His ultimate hope is to overcome the persistent tension between the internal and external needs that evaluation serves, between what he sometimes calls “improvement” and “compliance.” To start down that path, faculty members must rediscover the potential of evaluations to enable meaningful improvement, which austerity, enrollment pressures, and institutional expectations of return on investment have obscured. In order to restore evaluation to its proper place in higher education, we must once more learn to see evaluation instruments as faculty tools for self-improvement, and not merely as administrative weapons for unending cuts.

This is a noble aim, to be sure. Defusing faculty antipathy toward evaluation—by reminding the professoriate not only of the hopes that attended the rise of evaluation generally but also of the fact that these hopes were theirs—might help to make evaluation a fulcrum for mobilizing faculty, a medium through which we could articulate a vision of our professional purposes beyond the limited imagination of the neoliberal university.

An odd effect of devoting the first five chapters to different facets of the history of evaluation, however, is that the book covers the same historical ground five different times, which reveals patterns other than the ones Gelber stresses. We find, for instance, that if there was a golden age of evaluation from a faculty perspective, it coincided with what many consider the golden age of the university in general: “the postwar era when enrollments surged and public confidence in American higher education ran high.” In the first five chapters, we see repeatedly that faculty support for evaluation correlates with two contextual features: (a) the limited power of external agencies to demand specific information on faculty, academic programs, or institutional performance; and (b) the resulting internal freedom to build evaluation instruments that prioritize validity over comparability. Gelber frequently notes that faculty find the evaluation process most useful when its instruments are “locally constructed” and “administered locally” for the purpose of “informing internal conversations.” I am not necessarily convinced that we have ever lost sight of evaluation’s potential, or the larger aims of the university, that Gelber calls us to remember. Through budget cuts, adjunctification, and political vilification, we may have lost only the power to make our internal, professional purposes count. Grading the College shows us that there has never been a time when internal and external needs or purposes converged; there has only ever been a time when structural forces beyond academia gave us the power to prioritize our own aims.

While every work of historical scholarship must wrestle with where to begin and end the period under study, and what exactly to focus on, Gelber glosses over two events that might have provided fuller context for this evaluation story. Of course, including them would also have expanded his project threefold, so their omission is not exactly a fault. I have in mind both the rise of the social sciences in the American academy—of which the techniques and methodological frameworks of evaluation are one articulation—and the transition to the research university model, which gained steam in the postwar era. The way we think about what evaluation is, and about the practices that constitute it, has been shaped by changing sensibilities around what it means to practice social science responsibly. Similarly, as the traditional American college—which Paul Reitter and Chad Wellmon described in their book Permanent Crisis as being “organized around a fixed curriculum designed to form good Protestant gentlemen”—has merged with the form and purposes of global research institutions, evaluation instruments and their relevance for tenure and promotion have generally reflected changing institutional priorities. Social expectations of the college experience rooted in the past have remained largely intact, though official expectations of faculty have privileged outcomes aligned with the elevation of research. Gelber’s decision to tell the story from within efforts to evaluate postsecondary education leaves institutional transformations—which might provide helpful context—hovering just beyond the frame.

Chronicling these shifts is not necessary, however, to the main point that Gelber’s history reveals: evaluation can be both tool and weapon. Gelber wants us to reclaim a Progressive Era optimism toward evaluation’s role in fulfilling our professional calling. But with the balance of power tilted to its current extent away from faculty, the kind of locally valid evaluations that we can use to improve our own practices seem likely to depend on ad hoc, institutionally invisible, professor-by-professor efforts. It is not that we have lost the ability to imagine evaluation as a tool; it is rather that it requires no imagination at all to understand what happens when our tools fall into the wrong hands. That, too, is a service that Gelber’s excellent book provides.

Derek Gottlieb is assistant professor of educational foundations and curriculum studies at the University of Northern Colorado. His books include Education Reform and the Concept of Good Teaching and A Democratic Theory of Educational Accountability.