A Proposed Revision of the UCO Classroom Evaluation Process for Tenured and Nontenured Faculty

by John R. Wood, Associate Professor, UCO¹

In this white paper, I make a case for the University of Central Oklahoma (UCO) to revise its classroom evaluation process for tenure-track and non-tenure-track faculty. The requirements need to change because summative evaluations of faculty create unneeded stress for faculty and students. Regrettably, many, if not most, UCO faculty are required to conduct summative classroom evaluations in every class, every year.

Simply put, the value of the quantitative measure in a summative evaluation is limited to “information needed to make a personnel decision – for example, hiring, promotion, tenure, merit pay” (Chism, 2007, 5). Theoretically, the correlation between higher grades and higher evaluations would indicate more learning; however, this is not necessarily the case (see Marsh and Roche 1997, 2000). Unfortunately, “an evaluation often tells more about a student’s opinion of a professor than about the professor’s teaching effectiveness” (Williams, 2007, 171). Student evaluations also tend to be more positive toward instructors who display a certain level of warmth in their behavior (Best & Addison, 2000). Moreover, faculty perceived as caring receive higher numerical outcomes on evaluations, and this also affects student perceptions of their own cognitive learning, signifying that an instructor’s personal characteristics can influence evaluations (Teven & McCroskey, 1997).

Education scholars also find that internal factors, such as the professor’s sexual orientation, ethnicity, or gender, the course content, or the nature of the course itself, can influence the numerical outcomes of these evaluations (Basow & Howe, 1987; Hobson, 1997; Sidanius & Crane, 1989; Williams, 2007; Flaherty, 2018). According to Centra (2003, 498), there is a bias “when a student, teacher, or course characteristic affects the evaluations made, either positively or negatively, but is unrelated to any criteria of good teaching, such as increased student learning.” For example, prior work has found that student evaluations are related to grading leniency (Greenwald & Gillmore, 1997) and the gender of the instructor (Basow & Silberg, 1987). Other factors can also influence these numerical outcomes, such as the rigor of the course and the faculty member’s teaching style. Likewise, the personal appearance of the instructor, including student perceptions of the instructor’s gender or race, can influence these evaluations. Findings in the broader literature support claims both for and against the notion that the appearance of the instructor influences student scores of instructor effectiveness (Campbell, Gerdes, & Steiner, 2005). There is also research suggesting that instructors who teach less demanding courses receive evaluations that are abnormally high (Overbaugh, 1998). In addition, not surprisingly, a student’s class grade can influence these evaluations (Greenwald & Gillmore, 1997; Tang, 1999).

In fact, the evidence opposing the validity of these evaluations is compelling.

There are two types of evaluation of teaching: summative and formative.

1.    Indirect assessment (summative) of teaching captures students’ perceptions of their learning (UCO’s SPIES). It is a proxy for student learning.

2.    Direct assessments (formative) of student learning reflect a demonstration of, or evidence of, student learning (Maki, 2010). Indirect measures are tests of validity in the sense that there is a discrepancy between the judgmental measure (usually a rating of achievement) and the criterion measure (a score on a standardized achievement test). Direct measures are a more direct test of validity in that teachers are directly asked to estimate the achievement test performance of their students. On the whole, the results revealed high levels of validity for the teacher-judgment measures. Best practices cited in Maki (2010) include teaching portfolios that contain a teaching philosophy and evidence of practice (samples of direct assessment).

While summative classroom evaluation is helpful for assessing faculty, it does little to support learning about and reflecting on how to improve teaching (Chism, 2007). A formative evaluation, by contrast, “describes activities that provide teachers with information that they can use to improve their teaching” (Chism, 2007, 5). In balance, Lewis and Benson (1998; see also Youmans & Lee, 2007) find that summative faculty evaluations persist to provide balance in the hotly debated question of whether student evaluations are valid or reliable indicators of either the faculty’s effectiveness or the course’s quality. As long as universities continue to regard student input as important, instruments measuring student perception, inadequate as they may be, will presumably remain part of the process. Even so, they should not be the primary or only process of class evaluation, which then presents a circular argument to support a second imperfect evaluative process, i.e., peer evaluation among faculty. Revising the nature of faculty peer evaluations would retain the feedback benefits to the observed while potentially also generating more instructional dialogue among faculty (Chism, 2007). Similarly, the American Association of University Professors (AAUP) argues that any system developed for evaluation is best directed toward constructive measures for improvement (Euben, 2005).


Thus, the following suggestions for revision should be considered:

Tenure-track and non-tenure-track faculty should be assessed with a summative classroom evaluation only in the 1st, 3rd, and 5th years, and every third year for post-tenure review and for promotion to full professor. Summative evaluation focuses on information “needed to make a personnel decision – for example, hiring, promotion, tenure, merit pay” (Chism, 2007, 5).

a. Often quantitative

b. For public inspection

c. Determine quality of teaching performance compared to peers

d. Conducted at given intervals, i.e., annual or promotion and tenure reviews


Faculty should supplement summative classroom evaluation with a formative classroom evaluation that combines peer review and a teaching portfolio. Formative evaluation provides teachers with information they can use to improve their teaching (Chism, 2007, 5).

a. For personal use, private and confidential.

b. Qualitative

c. Peer reviews & Portfolios

d. Focused on continuous learning & improvement

During the 2nd, 4th, and every other year in which summative classroom evaluations are not required, the faculty member would create a teaching portfolio that opens a conversation with their respective department chair and dean on how to promote continuous improvement.

The faculty member can use peer evaluation, their own questionnaire, etc. The teaching portfolio will not be a part of an official document unless the faculty member so desires.

In this way, a summative evaluation will satisfy administrators, who have accountability in mind and need easy numerical measures to manage a large number of faculty. In balance, a formative evaluation works best when faculty continuously learn about their students and facilitate innovation in the classroom.


Basow, Susan A., and Karen G. Howe. 1987. “Evaluations of college professors: Effects of professors’ sex-type, and sex, and students’ sex.” Psychological Reports 60:671-8.

Campbell, Heather, Karen Gerdes, and Sue Steiner. 2005. “What’s Looks Got to Do with It? Instructor Appearance and Student Evaluations of Teaching.” Journal of Policy Analysis and Management, 24 (3): 611-620.

Centra, John A. 2003. “Will Teachers Receive Higher Student Evaluations by Giving Higher Grades and Less Course Work?” Research in Higher Education 44 (5): 495-518.

Chism, Nancy Van Note. 2007. Peer Reviewing of Teaching: A Sourcebook. 2nd Ed. Indiana University-Purdue University Indianapolis: Anker Publishing.

Euben, Donna. 2005. “Post-Tenure Review: Some Case Law.” AAUP Counsel. https://www.aaup.org/issues/tenure/some-case-law (accessed May 30, 2020).

Flaherty, C. 2018. “Same Course, Different Ratings: Study says students rate male instructors more highly than women even when they’re teaching identical courses.” Inside Higher Ed. https://www.insidehighered.com/news/2018/03/14/study-says-students-rate-men-more-highly-women-even-when-theyre-teaching-identical

Greenwald, Anthony G., and Gerald M. Gillmore. 1997. “Grading leniency is a removable contaminant of student ratings.” American Psychologist 52:1209-1217. http://faculty.washington.edu/agg/pdf/Gwald_Gillmore_AmPsychologist_1997.OCR.pdf (accessed December 20, 2019).

Hobson, Suzanne M. 1997. “The impact of sex and gender-role orientation on student evaluations of teaching effectiveness in counselor education and counseling psychology.” Ed.D. diss. Western Michigan University, Kalamazoo.

Lewis, Jerry M., and Denzel E. Benson. 1998. “Section Eight. Course Evaluations.” Pp. 99-114 in Tips for Teaching Introductory Sociology, edited by Jerry M. Lewis. Belmont, CA: Wadsworth.

Maki, Peggy. 2010. Assessing for Learning: Building a Sustainable Commitment Across the Institution. 2nd Ed. Sterling, VA: Stylus Publishing.

Marsh, Herbert W., and Lawrence A. Roche. 2000. “Effects of grading leniency and low workload on students’ evaluations of teaching: Popular myth, bias, validity, or innocent bystanders?” Journal of Educational Psychology 92:202-28.

Overbaugh, Richard C. 1998. “The effect of course rigor on preservice teachers’ course and instructor evaluation.” Computers in the Schools 14:13-23.

Sidanius, Jim, and Marie Crane. 1989. “Job evaluation and gender: The case of university faculty.” Journal of Applied Social Psychology 19:174-97.

Stumpf, Steven A., and Richard D. Freedman. 1979. “Expected grade covariation with student ratings of instruction: Individual versus class effects.” Journal of Educational Psychology 71:293-302.

Tang, Shengming. 1999. “Student evaluation of teachers: Effects of grading at college level.” Journal of Research and Development in Education 32:83-8.

Teven, Jason J., and James C. McCroskey. 1997. “The relationship of perceived teacher caring with student learning and teacher evaluation.” Communication Education 46:1-9.

Williams, Dana A. 2007. “Examining the Relation between Race and Student Evaluations of Faculty Members: A Literature Review.” Profession: 168-173.

Youmans, Robert J., and Benjamin D. Lee. 2007. “Fudging the Numbers: Distributing Chocolate Influences Student Evaluations of an Undergraduate Course.” Teaching of Psychology 34 (4): 245-247.


¹ This white paper was written after a UCO CETTL workshop in fall 2018 on Nancy Van Note Chism’s book Peer Reviewing of Teaching: A Sourcebook (2007).
