Top 10 Reasons You Can’t Fairly Evaluate Teachers on Student Test Scores

I’m a public school teacher.

Am I any good at my job?

There are many ways to find out. You could look at how hard I work, how many hours I put in. You could look at the kinds of things I do in my classroom and examine whether I’m adhering to best practices. You could look at how well I know my students and their families, and how well I’m attempting to meet their needs.

Or you could just look at my students’ test scores and give me a passing or failing grade based on whether they pass or fail their assessments.

It’s called Value-Added Measures (VAM), and at one time it was the coming fad in education. However, after numerous studies and lawsuits, the shine is fading from this particularly narrow-minded corporate policy.

Most states that evaluate their teachers using VAM do so because, under President Barack Obama, they were offered Race to the Top grants and/or waivers.

Now that the government isn’t offering cash incentives, seven states have stopped using VAM and many more have reduced the weight given to these assessments. The new federal K-12 education law – the Every Student Succeeds Act (ESSA) – does not require states to have educator evaluation systems at all. And if a state chooses to enact one, it does not have to use VAM.

That’s a good thing, because the evidence is mounting against this controversial policy. An evaluation released in June of 2018 found that a $575 million push by the Bill and Melinda Gates Foundation to make teachers (and thereby students) better through the use of VAM was a complete waste of money.

Meanwhile, a teacher fired from the Washington, DC, district because of low VAM scores just won a 9-year legal battle with the district and could be owed hundreds of thousands of dollars in back pay as well as getting his job back.

But putting aside the waste of public tax dollars and the threat of litigation, is VAM a good way to evaluate teachers?

Is it fair to judge educators on their students’ test scores?

Here are the top 10 reasons why the answer is an unequivocal no:

1) VAM was Invented to Assess Cows.

I’m not kidding. The process was created by William L. Sanders, a statistician in the college of business at the University of Tennessee, Knoxville. He thought the same kinds of statistics used to model genetic and reproductive trends among cattle could be used to measure growth among students and hold their teachers accountable. You’ve heard of the Tennessee Value-Added Assessment System (TVAAS), or TxVAAS in Texas, or PVAAS in Pennsylvania, or the more generically named EVAAS in states like Ohio, North Carolina, and South Carolina. That’s his work. The problem is that educating children is much more complex than feeding and growing cows. Not only is it insulting to assume otherwise, it’s incredibly naïve.

2) You can’t assess teachers on tests that were made to assess students.

This violates fundamental principles of both statistics and assessment. If you make a test to assess A, you can’t use it to assess B. That’s why many researchers have labeled the process “junk science” – most notably the American Statistical Association in 2014. Put simply, the standardized tests on which VAM estimates are based have always been, and continue to be, developed to assess student achievement – not growth in student achievement, nor growth in teacher effectiveness. They were never designed to estimate teachers’ effects. Using them that way is like assuming all healthy people go to the best doctors and all sick people go to the bad ones. If I fail a dental screening because I have cavities, that doesn’t mean my dentist is bad at his job. It means I need to brush more and lay off the sugary snacks.

3) There’s No Consistency in the Scores.

Reliable assessments produce consistent results. This is why doctors often run the same medical test more than once. If the first try comes up positive for cancer, say, they’ll run it again hoping for a negative. But if multiple runs of the same test produce the same result, the diagnosis gains credence. Unfortunately, VAM scores are notoriously inconsistent. When you evaluate teachers with the same test (but different students) over multiple years, you often get divergent results. And not just by a little. Teachers who do well one year may do terribly the next. This makes VAM estimates extremely unreliable. Teachers who should be (more or less) consistently effective are being classified in highly inconsistent ways over time. A teacher classified as “adding value” has a 25 to 50% chance of being classified as “subtracting value” the next year, and vice versa. That can make identifying an effective teacher no more accurate than the flip of a coin.
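
To see what that instability looks like, here’s a minimal simulation – an illustration only, not any state’s actual model – in which each teacher’s real effect is stable but the yearly estimate adds classroom-level noise (roster churn, test-form changes, plain sampling error). All the numbers are assumptions chosen to make the mechanism visible:

```python
import random

# Illustrative only: stable teachers, noisy yearly estimates.
random.seed(42)

TEACHERS = 10_000
TRUE_SD = 1.0    # spread of real teacher effects (assumed)
NOISE_SD = 2.0   # year-to-year estimation noise (assumed larger than signal)

flips = 0
for _ in range(TEACHERS):
    true_effect = random.gauss(0, TRUE_SD)
    year1 = true_effect + random.gauss(0, NOISE_SD)  # this year's estimate
    year2 = true_effect + random.gauss(0, NOISE_SD)  # next year's estimate
    # "Adding value" = estimate above average; count how often the label flips.
    if (year1 > 0) != (year2 > 0):
        flips += 1

print(f"Labels that flipped between years: {flips / TEACHERS:.0%}")
# With noise twice the size of the signal, roughly 40-45% of labels flip –
# squarely inside the 25 to 50% range the research reports.
```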

 

4) Changing the test can change the VAM score.

If you know how to add, it shouldn’t matter whether you’re asked to solve 2 + 2 or 3 + 3. As long as both tests evaluate the same learning at the same level of difficulty, changing the test shouldn’t change the result. But when you change the tests used in VAM assessments, scores and rankings can change substantially. Using a different model or a different test often produces a different VAM score. This may indicate a problem with value-added measures or with the standardized tests used in conjunction with them. Either way, it makes VAM scores invalid.

5) VAM measures correlation, not causation.

Sometimes A causes B. Sometimes A and B simply occur together. For example, most people in wheelchairs have been in an accident. That doesn’t mean being in a wheelchair causes accidents. The same goes for education. Students who fail a test didn’t learn the material, but that doesn’t mean their teacher didn’t try to teach them. VAM does not measure teacher effectiveness. At best it measures student learning. Effects – positive or negative – attributed to a teacher may actually be caused by other factors that are not captured in the model. For instance, the student may have a learning disability, the student may have been chronically absent, or the test itself may be an invalid measure of the learning that has taken place.
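
Here’s an equally minimal sketch of how that plays out – again purely illustrative, with invented numbers: two teachers teach identically well, but one roster has more chronically absent students, an unmeasured factor the model silently charges to the teacher:

```python
import random

# Illustrative only: identical teaching, non-random rosters.
random.seed(1)

def average_gain(chronic_absence_rate):
    """Mean score gain for a 100-student roster."""
    gains = []
    for _ in range(100):
        absent = random.random() < chronic_absence_rate
        teaching_effect = 10                    # same for both teachers
        absence_penalty = -15 if absent else 0  # unmeasured by the model
        gains.append(teaching_effect + absence_penalty + random.gauss(0, 5))
    return sum(gains) / len(gains)

print(f"Teacher A 'value added': {average_gain(0.05):+.1f}")
print(f"Teacher B 'value added': {average_gain(0.40):+.1f}")
# Both taught identically, yet Teacher B looks far worse – the absence
# effect gets attributed to the teacher because it isn't in the model.
```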

 

6) VAM Scores are Based on Flawed Standardized Tests.

When you base teacher evaluations on student tests, at the very least the student tests have to be valid. Otherwise, you’ll have unfairly assessed BOTH students AND teachers. Unfortunately, standardized tests are narrow, limited indicators of student learning. They leave out a wide range of important knowledge and skills, capturing only the easiest-to-measure parts of the math and English curricula. Test scores are not universal, abstract measures of student learning. They depend greatly on a student’s class, race, disability status and knowledge of English. Researchers have been decrying this for decades – standardized tests often measure the life circumstances of the students, not how well those students learn – and therefore by extension they cannot assess how well teachers teach.

7) VAM Ignores Too Many Factors.

When a student learns or fails to learn something, there is much more going on than a simple duality between student and teacher. Teachers cannot simply touch students’ heads and magically make learning take place. It is a complex process involving multiple factors, some of which are still poorly understood by psychology and neuroscience. There are inordinate amounts of inaccurate or missing data that cannot be easily replaced or disregarded, as well as variables that cannot be statistically controlled for, such as differential summer learning gains and losses, prior teachers’ residual effects, the impact of school policies such as grouping and tracking students, the impact of race and class segregation, etc. When so many variables cannot be accounted for, any measure returned by VAM remains essentially incomplete.

8) VAM Has Never been Proven to Increase Student Learning or Produce Better Teachers.

That’s the whole purpose behind using VAM. It’s supposed to do these two things, yet there is no research suggesting it has ever improved teachers’ instruction or students’ learning and achievement. You’d think we wouldn’t waste billions of dollars and generations of students on a policy that has never been proven effective. But there you have it. This is a faith-based initiative – the pet project of philanthrocapitalists, tech gurus and politicians. As a result, VAM estimates are typically of no informative, formative, or instructional value.

9) VAM Often Makes Things Worse.

Using these measures has many unintended consequences that adversely affect the learning environment. When you use VAM for teacher evaluations, you often end up changing the way the tests are viewed and, ultimately, the school culture itself. That is actually one of the intents of using VAM. However, the changes are rarely positive. It often leads to a greater emphasis on test preparation and specific tested content, to the exclusion of content that may produce better long-term learning gains or increase student motivation.

VAM also incentivizes teachers to wish for the most advanced students in their classes and to push struggling students onto someone else so as to maximize their own VAM scores. Instead of a collaborative environment where everyone works together to help all students learn, VAM fosters a competitive environment where innovation is hoarded rather than shared with the rest of the staff. It increases turnover and job dissatisfaction. Principals stack classes to make sure certain teachers are more likely to get better evaluations – or worse ones.

Finally, being unfairly evaluated disincentivizes new teachers from staying in the profession and discourages the best and the brightest from ever entering the field in the first place. You’ve heard about that “teacher shortage” everyone’s talking about. VAM is a big part of it.

10) An emphasis on VAM overshadows real reforms that actually would help students learn.

Research shows the best way to improve education is system-wide reform – not targeting individual teachers. We need to equitably fund our schools. We can no longer segregate children by class and race and give the majority of the money to the rich white kids while withholding it from the poor brown ones. Students need help dealing with the effects of generational poverty – food security, psychological counseling, academic tutoring, safety initiatives, a wide curriculum and anti-poverty programs. A narrow focus on teacher effectiveness overshadows all these other factors and sweeps them under the rug. Researchers calculate teacher influence on student test scores at about 14%. Out-of-school factors are the most important. That doesn’t mean teachers are unimportant – they are the single most important factor inside the school building. But we need to realize that what happens outside the school has a greater impact. We must learn to see the whole child and all her relationships – not just the student-teacher dynamic. Until we do so, we will continue to do these children a disservice with corporate privatization scams like VAM, which demoralize and destroy the people who dedicate their lives to helping them learn – their teachers.


NOTE: Special thanks to the amazingly detailed research of Audrey Amrein-Beardsley, whose Vamboozled website is THE online resource for scholarship about VAM.


 


Teachers Don’t Want All This Useless Data

One of the most frustrating things I’ve ever been forced to do as a teacher is to ignore my students and concentrate instead on the data.

I teach 8th grade Language Arts at a high-poverty, mostly minority school in Western Pennsylvania. During my double-period classes, I’m with these children for at least 80 minutes a day, five days a week.

During that time, we read together. We write together. We discuss important issues together. They take tests. They compose poems, stories and essays. They put on short skits, give presentations, draw pictures and even create iMovies.

I don’t need a spreadsheet to tell me whether these children can read, write or think. I know.

Anyone who has been in the room and paying attention would know.

But a week doesn’t go by without an administrator ambushing me at a staff meeting with a computer printout and a smile.

Look at this data set. See how your students are doing on this module. Look at the projected growth for this student during the first semester.

It’s enough to make you heave.

I always thought the purpose behind student data was to help the teacher teach. But it has become an end in itself.

It is the educational equivalent of navel gazing – turning all your students into projections and trying to teach them at that remove, not as living, breathing beings but as computer models.

It reminds me of this quote from Michael Lewis’ famous book Moneyball: The Art of Winning an Unfair Game:

“Intelligence about baseball statistics had become equated in the public mind with the ability to recite arcane baseball stats. What [Bill] James’s wider audience had failed to understand was that the statistics were beside the point. The point was understanding; the point was to make life on earth just a bit more intelligible; and that point, somehow, had been lost. ‘I wonder,’ James wrote, ‘if we haven’t become so numbed by all these numbers that we are no longer capable of truly assimilating any knowledge which might result from them.'”

The point is not the data. It is what the data reveals. However, some people have become so seduced by the cult of data that they’re blind to what’s right in front of their eyes.

You don’t need to give a child a standardized test to assess whether he or she can read. You can just have them read. Nor does a child need to fill in multiple-choice bubbles to show whether he or she understands what’s been read. They can simply tell you. In fact, these would be better assessments. Doing otherwise is like testing someone’s driving ability not by putting them behind the wheel but by making them play Mario Kart.

The skill is no longer important. The assessment of the skill is.

THAT’S what we use to measure success. It’s become the be-all and end-all, the ultimate indicator of both student and teacher success. But it perverts authentic teaching. When the assessment is all that’s important, we lose sight of the actual skills we were supposed to be teaching in the first place.

The result is a never-ending emphasis on test prep and poring over infinite pages of useless data and analytics.

As Scottish writer Andrew Lang put it, “He uses statistics as a drunken man uses lamp posts – for support rather than for illumination.”

Teachers like me have been pointing this out for years, but the only response we get from most lawmakers and administrators is to hysterically increase the sheer volume of data and use ever more sophisticated algorithms to interpret it.

Take the Pennsylvania Value Added Assessment System (PVAAS). This is the Commonwealth’s method of statistically analyzing students’ scores on the Pennsylvania System of School Assessment (PSSA) and the Keystone Exams, which students take in grades 3-8 and in high school, respectively.

It allows me to see:

  • Student scores on each test
  • Student scores broken down by subgroups (how many hit each 20-point marker)
  • Which subgroups are above, below or at the target for growth

But perhaps the most interesting piece of information is a prediction of where each student is expected to score the next time they take the test.

How does it calculate this prediction? I have no idea.
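
For what it’s worth, growth projections like this are usually some flavor of regression on a student’s prior scores. The sketch below is purely illustrative – invented numbers and a deliberately crude model, surely far simpler than whatever PVAAS actually runs. That opacity is exactly the point: we aren’t shown the model, so we can’t interrogate its assumptions.

```python
import numpy as np

# Purely illustrative: a toy "projected score" model that regresses this
# year's score on two prior-year scores, then applies the fit to a new
# student. PVAAS's actual model is proprietary; this only shows the shape.
rng = np.random.default_rng(0)

# Fake history: two prior PSSA scores -> current score, for 500 students.
prior = rng.normal(1000, 100, size=(500, 2))
current = 0.5 * prior[:, 0] + 0.4 * prior[:, 1] + rng.normal(0, 40, 500) + 100

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(len(prior)), prior])
coef, *_ = np.linalg.lstsq(X, current, rcond=None)

# "Projected" score for a new student whose prior scores were 950 and 980.
new_student = np.array([1.0, 950.0, 980.0])
print(f"Projected score: {new_student @ coef:.0f}")
```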

 

That’s the kind of information they don’t give to teachers. Or taxpayers, by the way. Pennsylvania has paid more than $1 billion for its standardized testing system in the last 8 years. You’d think lawmakers would have to justify that outlay of cash, especially when they’re cutting funding for just about everything else in our schools. But no. We’re supposed to just take that one on faith.

So much for empirical data.

Then we have the Classroom Diagnostic Tools (CDT). This is an optional computer-based test given three times a year in various core subjects.

If you’re lucky enough to have to give this to your students (and I am), you get a whole pile of data that’s supposed to be even more detailed than the PVAAS.

But it doesn’t really give you much more than the same information based on more data points.

I don’t gain much from looking at colorful graphs depicting where each of my students scored in various modules. Nor do I gain much by seeing this same material displayed for my entire class.

The biggest difference between the PVAAS and the CDT, though, is that the CDT lets me see examples of the kinds of questions individual students got wrong. So, in theory, I could print out a stack of look-alike questions and have them practice endless skill-and-drill exercises until they get them right.

And THAT’S education!

Imagine if a toddler stumbled walking down the hall, so you had her practice raising and lowering her left foot over and over again! I’m sure that would make her an expert walker in no time!

It’s ridiculous. This overreliance on data pretends that we’re programming robots, not teaching human beings.

Abstracted repetition is not generally the best tool for learning complex skills. If you’re teaching the times tables, fine. But most concepts require us to engage students’ interests, to make something real, vital and important to them.

Otherwise, they’ll just go through the motions.

“If you torture the data long enough, it will confess,” wrote economist Ronald Coase. That’s what we’re doing in our public schools. We’re prioritizing the data and making it say whatever we want.

The data justifies the use of data. And anyone who points out that circular logic is called a Luddite, a roadblock on the information superhighway.

Never mind that all the time I’m forced to pore over the scores and statistics is time I don’t have to actually teach the children.

Teachers don’t need more paperwork and schematics. We need those in power to actually listen to us. We need the respect and autonomy to be allowed to actually do our jobs.

As the saying often attributed to Albert Einstein goes, “Not everything that can be counted counts, and not everything that counts can be counted.”

Can we please put away the superfluous data and get back to teaching?

Data Abuse – When Transient Kids Fall Through the Cracks of Crunched Numbers

I was teaching my classes.

I was grading assignments.

I was procrastinating.

I should have been working on my class rosters.

My principals wanted me to calculate percentages for every student I had taught that year and submit them to the state.

How long had each student been in my grade book? What percentage of the year was each learner in my class before they took their standardized tests?
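
The arithmetic itself is trivial – days on my roster before testing, divided by instructional days in the year. Something like this sketch, with hypothetical dates and a simplified weekday count (not the state’s actual worksheet):

```python
from datetime import date, timedelta

# Hypothetical roster math – not Pennsylvania's actual worksheet, just the
# figure being demanded: the share of instructional days a student spent
# on my roster before testing began. All dates are made up.
YEAR_START = date(2014, 8, 25)  # assumed first day of school
TEST_DAY = date(2015, 4, 13)    # assumed first day of PSSA testing

def school_days(start, end):
    """Count weekdays between two dates (ignores holidays for simplicity)."""
    return sum(
        1
        for n in range((end - start).days)
        if (start + timedelta(days=n)).weekday() < 5
    )

def roster_percentage(entered, left=None):
    """Percent of the pre-test year a student spent in my class."""
    first = max(entered, YEAR_START)
    last = min(left or TEST_DAY, TEST_DAY)
    return 100 * school_days(first, last) / school_days(YEAR_START, TEST_DAY)

# A student who transferred in mid-November:
print(f"{roster_percentage(date(2014, 11, 17)):.0f}% of the year")
```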

If I didn’t accurately calculate this in the next few days, the class list generated by the computer would become final, and my evaluation would be affected.

But there I was standing before my students doing nothing of any real value – teaching.

I was instructing them in the mysteries of subject-verb agreement. We were designing posters about the Civil Rights movement. I was evaluating their work and making phone calls home.

You know – goofing off.

I must not have been the only one. Kids took a half-day and the district let us use in-service time to crunch our numbers.

Don’t get me wrong. We weren’t thrown to the wolves. Administrators were very helpful gathering data, researching exact dates for students entering the building and/or transferring schools. Just as required by the Commonwealth of Pennsylvania.

But it was in the heat of all this numerological chaos that I saw something in the numbers no one else seemed to be looking for.

Many of my students are transients. An alarming number of my kids haven’t been in my class the entire year. They either transferred in from another school, transferred out, or moved into my class from another one.

A few had moved from my academic level course to the honors level Language Arts class. Many more had transferred in from special education courses.

In total, these students make up 44% of my roster.

“Isn’t that significant?” I wondered.

I poked my head into another teacher’s room.

“How many transient students are on your roster?” I asked.

She told me. I went around from room to room asking the same question and comparing the answers.

A trend emerged.

Most teachers who presided over lower-level classes (like me) had about the same percentage of transients – approximately 40%. Teachers who taught the advanced levels had a much lower share – 10% or below.

Doesn’t that mean something?

Imagine if you were giving someone simple instructions. Let’s say you were trying to tell someone how to make a peanut butter and jelly sandwich. But in the middle of your instruction, a student has to leave the room and go right next door where someone is already in the middle of trying to explain how to do the same thing.

Wouldn’t that affect how well a student learned?

If someone were trying to give me directions somewhere under those circumstances, I’m willing to bet I’d get lost.

And this assumes the break between Teacher A and Teacher B is minimal, the instruction is disrupted at the same point and both teachers are even giving instruction on the exact same topics.

None of that is usually true.

I did some more digging. Across the entire building, 20% of our students left the district in the course of this school year. About 17% entered mid-year. Taken together, that means roughly 37% of our students were transients. That’s about 130 children.

The trend holds district-wide. Some schools have more or fewer transients, but across the board 35-40% of our students pop in and out over the year.

Taking an even broader view, student mobility is a national problem. Certainly the percentage of student transience varies from district to district, but it is generally widespread.

Nationally, about 13 percent of students change schools four or more times between kindergarten and eighth grade, according to a 2010 Government Accountability Office analysis. One-third of fourth graders, 19 percent of eighth graders, and 10 percent of twelfth graders changed schools at least once over two years, according to the 1998 National Assessment of Educational Progress (NAEP).

And it gets worse if we look at it over a student’s entire elementary or secondary career. In fact, more students moved than remained in a single school, according to a national longitudinal study of eighth graders.

This problem is even more widespread among poor and minority students. The type of school is also a factor. Large, predominantly minority, urban school districts attract the most student mobility. In Chicago public schools, for instance, only about 47 percent of students remained in the same school over a four-year period. Fifteen percent of the schools lost at least 30 percent of their students in only one year.

And this has adverse effects on children, both academically and psychologically.

Several studies at both the elementary and secondary levels conclude that student mobility decreases test scores and increases the dropout rate.

A 1990s Baltimore study found that “each additional move” was associated with a .11 standard deviation drop in reading achievement. A similar 1990s Chicago study concluded that students with four or more moves suffered a .39 standard deviation deficit. Highly mobile students were as much as four months behind their peers academically in fourth grade and as much as a full year behind by sixth grade, according to a 1993 Chicago study by David Kerbow.

It just makes sense. These students have to cope with starting over – fitting in to a new environment. They have to adjust to new peers and social requirements.

Moreover, transients have an increased likelihood of misbehaving and participating in violence. After all, it’s easier to act out in front of strangers.

What causes this problem? Most often it is due to parental job insecurity.

Parents can’t keep employment, or jobs dry up, resulting in the need to move on to greener pastures.

In my own district, one municipality we serve is mostly made up of low-cost housing, apartments and slums. It is a magnet for mobility. Few people who haven’t lived here their whole lives put down roots. We’re just another stop on a long and winding road.

“We should be doing something about this,” I thought.

Our legislators should help promote job security. We should make it easier to afford quality housing. We should try to encourage newcomers to become part of the community instead of remaining eternal outsiders.

At our schools, we need resources to help this population make the necessary adjustments. We should encourage them to participate in extra-curricular activities, provide counseling and wraparound services.

But we don’t do any of that.

Instead, we gather mountains of data.

We sort and sift, enter it into a computer and press “submit.”

And off it goes to the Pennsylvania Value Added Assessment System (PVAAS).

We don’t use it to help kids.

We use it to blame school teachers for things beyond their control.

Data has value, but that doesn’t mean all data is valuable.

We need to know what we’re looking for, what it means and how to use it to make our world a better place.

Otherwise it’s just a waste of precious class time.

And an excuse to continue ignoring all the children who fall through the cracks.


NOTE: This article also was published on the LA School Report and the Badass Teachers Association blog.