ASA Connect


Teaching Evaluation Rubric

  • 1.  Teaching Evaluation Rubric

    Posted 09-10-2020 08:37
    Dear Colleagues, does anyone have a teaching evaluation rubric, specific to statistics classes, that they can share?  Thanks, John Kolassa

    ------------------------------
    John Kolassa
    Rutgers, the State University of NJ
    ------------------------------


  • 2.  RE: Teaching Evaluation Rubric

    Posted 09-11-2020 12:48

    There is always W. Edwards Deming's approach.

    Deming recommended that universities not solicit feedback from students about their professors until at least ten years after they had graduated, perhaps more. At that point, they would be in a position to know which professors had made a difference in their lives and which ones had not.


    In the meantime, how can a student possibly predict how valuable a professor's influence and teaching will turn out to be in his or her life?  Deming noted that the teacher he identified later in life as his most important, Sir Ronald Fisher, was a terrible lecturer who was very hard to understand. But what Fisher had to teach made up for everything, and Deming realized later that the difficulties were minor compared to Fisher's overall value.

    What criteria could a student use? How entertaining the teacher is? How easy? If a student finds what a teacher is saying easy to comprehend, that might be a sign of a good teacher. But it might also mean that the student's existing worldview and abilities are being coddled rather than fundamentally challenged and stretched. The problem is that we can only know later in life who was being difficult because they were being unnecessarily and unproductively obtuse, and who was being difficult because they were stretching you in a way you would only later recognize as life-changing.

    Perhaps every truly great teacher must sometimes be a difficult teacher. 


    In Deming's view, performance evaluations tend to downgrade difficult teachers and to favor the teachers who more easily and glibly fit into the students' existing framework of thought.

    The result, in his view, is a system that rewards mediocre teachers.


    Deming's views on education, like his views on management, were rather radical. But his arguments are always worth considering. 



    ------------------------------
    Jonathan Siegel
    Director Clinical Statistics
    ------------------------------



  • 3.  RE: Teaching Evaluation Rubric

    Posted 09-13-2020 10:16
    Dear Jonathan,
        Thanks for your thoughts.  I'm trying to construct a rubric for peer evaluation.  I take your point that relying on student ratings is problematic (although I still see them as an essential part of teaching evaluation, even, or perhaps particularly, when they have been critical of my own teaching).  We really need something that can be used to flag deficiencies within a semester; we can't wait for promotion or retention decisions to see long-term faculty effects.  John

    ------------------------------
    John Kolassa
    Rutgers University
    ------------------------------



  • 4.  RE: Teaching Evaluation Rubric

    Posted 09-14-2020 11:27
    I have some thoughts to share about teaching evaluation data that I examined many years ago, back in the late 1970s when I was a greenhorn, untenured assistant professor. It was time for my major review, and I knew that the tenure panel gave serious consideration to student evaluations. The evaluations that students filled out near the end of every course asked for reactions to many prompts, such as "Professor is prepared for every class," "Professor is able to explain abstract ideas clearly," and so on. For every prompt, a student bubbled in a rating from 1 to 6, where 6 was the most desirable from the professor's point of view. Section means and standard deviations for each prompt were reported back to the department head and professor shortly after the semester was over. Almost exclusively, tenure panels looked at "Question 16: Overall quality of instruction."

    When I learned this, I went to the department head, saying that I was concerned that they looked at only one question and that there were a handful of questions I thought were just as important. He said, "You're in statistics. I'll give you the data summaries for all sections of Calculus I and of Precalculus from last semester, with the names of the instructors omitted, and you tell me how you think we can do it better."

    My take from both courses was the same, and I will use the twenty or so sections of calculus to describe it. Let n be an integer from 1 to 20 or so. Different questions had different means and different standard deviations. But almost without exception, the instructor who ranked nth best on question 16 out of all sections also ranked nth best on every other question. On a question like 16, which allowed a student to somehow incorporate hating the professor's guts, section means ranged from around 2 up to around 6. The same students would admit the professor was prepared for each class, though, so the low section means on that question were around 4 instead. The questionnaires were indicating that the nth best prepared was also the nth best at explaining abstract ideas, the nth best overall, and the nth best at, basically, everything.

    After looking at the data, I told the department head that looking only at the responses to question 16 was OK, and that they could just as well have used any one of the other questions instead, because the questions all measured the same thing, just on different scales. My own view was that many students were not in a position to know if they got a good course … but they could tell you if they were happy with the course. "Happiness level" was my own inference as to what the questionnaire measured, something that is worth knowing, as long as one understands that the numbers don't reflect peer review of the quality of instruction, or a student's much-later realization of said quality.
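
    A minimal sketch, in Python, of the pattern described above, using invented numbers rather than the actual 1970s data: build section means for several questionnaire items that are all monotone in a single underlying "happiness" score, and check that every item ranks the twenty sections in nearly the same order. NumPy and SciPy are assumed to be available; the item names and scales are illustrative only.

    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(1)
    happiness = rng.uniform(2, 6, size=20)    # one latent "happiness" score per section

    # Hypothetical item means: different scales and spreads, same ordering of sections.
    q16_overall = happiness                                  # means roughly 2 to 6
    q_prepared  = 4 + (happiness - 2) / 4 * 2                # compressed to roughly 4 to 6
    q_explains  = happiness + rng.normal(0, 0.1, size=20)    # near-duplicate of Q16
    items = np.column_stack([q16_overall, q_prepared, q_explains])

    # Pairwise rank correlations between items; values near 1 say the section that
    # ranks nth on Q16 also ranks about nth on every other item.
    rho_matrix, _ = spearmanr(items)
    print(np.round(rho_matrix, 2))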

    ------------------------------
    Mick Norton
    Professor Emeritus
    College of Charleston
    ------------------------------



  • 5.  RE: Teaching Evaluation Rubric

    Posted 09-21-2020 09:34
    Reliability is necessary but not sufficient for validity. 

    I do think "Happiness Level" is much nearer to what is truly measured by these evals than is "Effective Teaching." This is to say nothing about the research on student bias in student evals.

    ------------------------------
    Robert Pearson
    Asst. Professor
    Grand Valley State University
    ------------------------------



  • 6.  RE: Teaching Evaluation Rubric

    Posted 09-13-2020 07:11
    John, I'd be happy to help you develop something based on the notion of a Customer Value tree. I've designed and implemented a number of processes based on similar instruments and they work well. Deming was strong on theory but some of his practical suggestions appeared to contradict his own philosophy.  If one is serious about continuous improvement, how can it make sense to wait 10 or 15 years to get feedback about what needs to be improved?
    Regards ... Nick.

    ------------------------------
    Nicholas Fisher
    Visiting Professor
    University of Sydney
    ------------------------------



  • 7.  RE: Teaching Evaluation Rubric

    Posted 09-13-2020 10:10
    Dear Nick, 
         Thanks for your offer.  To be specific, I'm looking for something that I can use for peer evaluation of teaching, and not for student evaluation.  If you can point me to a reference, I'd appreciate it.  Where should I go to get started?  Thanks, John

    ------------------------------
    John Kolassa
    Rutgers University
    ------------------------------



  • 8.  RE: Teaching Evaluation Rubric

    Posted 09-22-2020 12:22
    If you want to know how good a teacher is, look at how their students do in follow-up courses. Also look at what percent of students finish the class. 

    As an education researcher, I got to look at 100,000+ student records for various course curricula at 4 different colleges/universities. In a typical course, there will be some professors, say Professor A, who have 40% of their students pass the class (C or better), and other professors, say Professor B, who have 80% pass that same class. (I am also assuming there are large numbers to look at, like 100+ student records.)

    In a typical course sequence, say Algebra 1 to Algebra 2, an "A student" in Algebra 1 passes Algebra 2 about 80% of the time, a "B student" in Algebra 1 passes Algebra 2 60% of the time, and a "C student" in Algebra 1 passes Algebra 2 40% of the time. On average, a student who passes Algebra 1 will pass Algebra 2 about 55% of the time.

    With all of that knowledge, when you look at how Professor A's and Professor B's students do in follow-up courses, the students who move on pass at the same rate. Going from Algebra 1 to Algebra 2, it's about 55%. Given that this was true for most of the professors I looked at (very few had lower than average outcomes, and NO professor had better outcomes), how would you rate Professor A and Professor B?

    If Professor A's students pass the follow-up class at a much higher rate, like 65%, is that because Professor A is better at teaching the material, or because they scared away all the "C students"?
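
    A minimal sketch, in Python with pandas, of the two numbers being compared here, computed on made-up records rather than the real data: each instructor's pass rate among everyone who started their course, and the follow-up pass rate among their students who moved on. The column names and the "C or better" pass threshold are assumptions for illustration.

    import pandas as pd

    # Made-up records: Algebra 1 instructor, Algebra 1 grade, and Algebra 2 grade
    # (None means the student never took Algebra 2).
    records = pd.DataFrame({
        "student":    [1, 2, 3, 4, 5, 6, 7, 8],
        "alg1_prof":  ["A", "A", "A", "A", "B", "B", "B", "B"],
        "alg1_grade": ["C", "D", "B", "F", "A", "B", "C", "C"],
        "alg2_grade": [None, None, "C", None, "B", "C", "D", "C"],
    })
    passing = ["A", "B", "C"]   # "pass" means C or better

    # Pass rate in the instructor's own course, out of everyone who started.
    own = (records.assign(passed=records["alg1_grade"].isin(passing))
                  .groupby("alg1_prof")["passed"].mean())

    # Pass rate in the follow-up course, among only the students who went on to it.
    movers = records.dropna(subset=["alg2_grade"])
    follow = (movers.assign(passed=movers["alg2_grade"].isin(passing))
                    .groupby("alg1_prof")["passed"].mean())

    print(pd.DataFrame({"alg1_pass_rate": own, "alg2_pass_rate_of_movers": follow}))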

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------



  • 9.  RE: Teaching Evaluation Rubric

    Posted 09-23-2020 09:32
    Interesting thoughts.

    An excellent metric for college classes is the dropout rate for different sections (instructors) of required courses with common exams. 
    As was appropriately said, selection bias has a huge impact on who goes on to the follow-up course. Further, student evaluations and grades are biased by dropouts. Students who drop out harm themselves, and they take up slots that could have been filled by other students.

    Finally, peer review of teaching at every level is a vital component of good teaching. At the University of Florida, Department of Statistics, I did quite a bit of such peer review, with a random mix of announced and unannounced attendance. Good student evaluations and good peer-review evaluations do not always go hand in hand.

    ------------------------------
    Jon Shuster
    ------------------------------



  • 10.  RE: Teaching Evaluation Rubric

    Posted 09-24-2020 09:08
    What I haven't seen mentioned yet in this thread is the effect of other departments' schedules.  For example, the professor who teaches the section of Stat 101 that's offered at the same time the Math Dept teaches Calculus 101 won't have the Calc students in their section.

    ------------------------------
    Michael Lavine
    Army Research Office
    ------------------------------



  • 11.  RE: Teaching Evaluation Rubric

    Posted 09-23-2020 11:05
    Hi John:
    Good to see your question on teaching evaluation. Over the last fifty years of teaching graduate and undergraduate statistics classes, I have found that student responses are genuine only for small classes of no more than 10-20 students seriously interested in the subject. Responses to most of the questions are usually random. The questions that should be included are: Did you see the instructor when you had questions? If you did, were the answers satisfactory and helpful? Only the responses of the students answering these questions should be counted. I believe that every instructor tries very hard to provide excellent lectures. Good responses come from the genuine students. Negative responses come only from the irresponsible students and those who are interested only in getting the degree, but not in learning the subject. You can lead a horse to water, but you can't make it drink. I have been lucky to have taught good students throughout the years. Evaluations for Zoom classes need careful and relevant wording.
    Thank you

    ------------------------------
    S.R.S. Rao Poduri
    Professor of Statistics
    University of Rochester
    Rochester, NY 14627
    ------------------------------



  • 12.  RE: Teaching Evaluation Rubric

    Posted 09-23-2020 11:15
    Dear Sam, 
         Thanks for your reply, and it's good to hear from you after all these years.
         Thanks also for your thoughts on how the perspective of students matters in how we evaluate faculty, and on which student reactions will give us the most valid reflection of the lecturers' performance.
         I remain convinced that student reactions matter but, as most of those who responded have noted, these reactions must be interpreted with care and cannot be the full story.  The rubric I'm trying to create will be for peer, rather than student, feedback.  John

    ------------------------------
    John Kolassa
    Rutgers University
    ------------------------------



  • 13.  RE: Teaching Evaluation Rubric

    Posted 09-24-2020 13:46

    A challenge to the development of a meaningful rubric comes from the Desirable Difficulties literature.

    It turns out, time and again, that subjective ratings of effective teaching are poorly, sometimes inversely, correlated with objective measures.

    Things that are done to make the material easier to learn are generally the enemy of deep learning.  So using slide fonts that are a bit hard to read is good for learning.  A lecture that is somewhat disorganized is good for learning. 

    Somehow your rubric needs to capture that sort of information.  For example, on a scale of 1-5 for presentation organization, a "3" is better than a "5." 

    Other things are a bit easier to code for a faculty, as opposed to student, evaluation.  Requiring students to generate knowledge through activities is likely to map to a 1-5 scale relatively well.  (Many students will express frustration at being forced to teach themselves.)

    A brief intro to the desirable difficulties literature:  "Desirable Difficulties" can Lead to Deeper Learning and Better Retention 

    Michael



    ------------------------------
    Michael Granaas
    Univ. of South Dakota
    ------------------------------



  • 14.  RE: Teaching Evaluation Rubric

    Posted 09-25-2020 11:17
    I'm curious: no one has mentioned Kirkpatrick's four-level evaluation model.  Is that used at all in academic settings?

    If you're worried that Kirkpatrick's model is too oriented towards training instead of education, at least skim the entirety of Don Clark's article.  He makes the claim that it is about much more than training (including education in any form, for example), and he extends the model in ways that seem useful.

    Bill

    ------------------------------
    Bill Harris
    Data & Analytics Consultant
    Snohomish County PUD
    ------------------------------



  • 15.  RE: Teaching Evaluation Rubric

    Posted 09-28-2020 13:21
    Hi John,

    I'm in a combined math/stat department, but we do peer teaching evaluations. The eval form is short and to the point: there are open-response questions on preparation/organization, presentation of subject matter, and student involvement. There's also space for general comments and to comment on "overall effectiveness". 

    Our student eval form is similarly straightforward, but most of the questions are less open ended. I appreciate that the student eval form and peer eval form get at similar concepts! If you're looking to build your own peer eval rubric, you might start with the student eval questions.

    ------------------------------
    Lauren Cappiello
    ------------------------------



  • 16.  RE: Teaching Evaluation Rubric

    Posted 09-28-2020 14:58
    Are the questions that the faculty and students answer even relevant? 

    If you look at the data I posted in the data science education section (at least I think it was there), you'll find student-level data that allows you to track a student from their first math class to their last and see the effect of the professor on whether or not the student passes. Using a metric like the percentage of students who passed a class, out of the number who started, gives you far more insight into a faculty member's performance than some potentially biased reviewer. 

    The first time I was reviewed, the prof who sat in didn't like anything I did. At the end of the term, I had 32 of 35 students take the final, and my class average was 2 points lower than the department average. In the reviewer's class, they started with the same 35 students; 14 took the final. From what I heard, half of them usually don't pass. 

    So, if I get reviewed by someone who sends the vast majority of their students out of the classroom (drops and other things), and they tell me I'm not good, should I really care? Why is their opinion worth more than used toilet paper? 

    I think the metrics we as faculty need to be evaluated on are:

    1) How well did the students do in YOUR class?
    2) How well did the students do in follow up classes that use yours as a pre-req? 
    3) What percentage of the students you started with pass your class? 

    Give that data a review after the faculty member has taught a few sections of a course. 

    If you failed 7 of the 14 students who took the final, if your students are only average in the next class, and if only 25% of the students you started with pass, then you have issues. 

    If you want to use a common final as a benchmark for teaching effectiveness for part 1 above, fine. Just make sure the grades on that exam actually correlate with grades in follow-up classes. Since no one does that, using it as a benchmark or guide is meaningless.

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------



  • 17.  RE: Teaching Evaluation Rubric

    Posted 09-28-2020 15:19
    Thanks to Andrew and everyone else who replied.  My situation is somewhat different from Andrew's, in that
    1. None of the courses that I'm evaluating have a common final.  Maybe they should, but they don't.
    2. Many of the students we teach will not take another course in statistics, or may take another course only as an elective.  While we intend for students to carry information forward into their later coursework, use of this information is not concentrated tightly enough to make later grades a reliable indication of teaching quality.
    3. We have three sets of undergraduate courses that have a logical follow-on course: a first methods course that is followed by one of our own second methods courses, a different methods course that is followed by a business school methods course, and a probability course that is followed by a mathematical statistics course.  We have a probability/mathematical statistics sequence for MS students as well, and some PhD sequences.  Audiences here are different enough that cross-comparisons would be difficult, and instructor experience with these courses is not intensive enough for us to be able to assess everyone who needs assessment.  For example, in my almost 21 years at my current institution, I am only now teaching one of the first parts of such a sequence.

    I think that the things we can observe really are relevant.  Is the instructor organized?  Does the instructor explain things clearly?  Does the instructor address questions adequately?  Is the instructor courteous and respectful towards students?  Are the assignments relevant?  Are they of a length appropriate to the course?  Do the exams align with the stated goals for the course?  I understand that all of these assessments are subjective, but they can be observed more reliably than anything we could infer from performance in later courses, and they have the advantage that they can serve as feedback to improve teaching.
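
    As a minimal sketch, the observable items above could be organized into a simple peer-observation form, each scored 1-5 with room for comments. This is only an illustration in Python, not an established instrument; the item wording is paraphrased from the questions in the previous paragraph.

    # Hypothetical item wording paraphrasing the questions above; ratings are 1-5.
    CRITERIA = [
        "Instructor is organized",
        "Explanations are clear",
        "Questions are addressed adequately",
        "Instructor is courteous and respectful toward students",
        "Assignments are relevant to the course",
        "Assignment length is appropriate for the course",
        "Exams align with the stated goals of the course",
    ]

    def blank_form():
        """Return an empty peer-observation form for one class visit."""
        return {item: {"rating": None, "comment": ""} for item in CRITERIA}

    def summarize(form):
        """Average the ratings the observer actually filled in."""
        scores = [v["rating"] for v in form.values() if v["rating"] is not None]
        return sum(scores) / len(scores) if scores else None

    form = blank_form()
    form["Instructor is organized"] = {"rating": 4, "comment": "Clear agenda on the board."}
    form["Explanations are clear"]  = {"rating": 5, "comment": ""}
    print(summarize(form))   # 4.5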

    ------------------------------
    John Kolassa
    Rutgers University
    ------------------------------



  • 18.  RE: Teaching Evaluation Rubric

    Posted 09-29-2020 11:41
    Many years ago, while a faculty member in the Math Dept. at SUNY Cortland, I was part of the faculty evaluation committee. We had a very long (and tedious) meeting regarding student evaluation results (we had, on the table, literally dozens of teacher-course evaluations for all Department members and their courses for the year).

    I took a look at these forms and saw two entries that raised my interest: (1) what is your expected grade in the course; and (2) what is your overall evaluation of the Instructor. Both were rated 1 through 5, with 1 being a failing grade (E) and a terrible Instructor, and 5 being an A and a great Instructor.

    I took the trouble of recording the scores of such pairs from all the forms available and took them to my office. There, I entered the data into Minitab and ran the nonparametric rank correlation procedure. Needless to say, the correlation was significant at the 0.01 level (as expected).
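
    A minimal sketch, in Python rather than Minitab, of the same kind of rank correlation on (expected grade, overall instructor rating) pairs; the numbers below are invented purely for illustration, and SciPy is assumed to be available.

    from scipy.stats import spearmanr

    # Invented (expected grade, overall instructor rating) pairs, one per form,
    # both on the 1-5 scale described above.
    expected_grade    = [5, 4, 4, 3, 5, 2, 3, 1, 4, 5, 2, 3]
    instructor_rating = [5, 5, 4, 3, 4, 2, 3, 2, 4, 5, 1, 3]

    rho, p_value = spearmanr(expected_grade, instructor_rating)
    print(f"Spearman rho = {rho:.2f}, p-value = {p_value:.4f}")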

    My conclusion from this (non-representative but large) sample of student evaluations was that those students who were expecting a good grade in the course (and who most likely understood and liked the topic and worked hard) had a good opinion of the instructor, and vice versa.

    Hope this helps/jorge.

    ------------------------------
    Jorge L. Romeu
    Emeritus SUNY Faculty
    Adjunct Professor, Syracuse U.
    https://www.researchgate.net/profile/Jorge_Romeu
    ------------------------------