Greater Test Scores Often Mean Less Authentic Learning


 
The main goal of schooling is no longer learning.

 

It is test scores.

 

Raising them. Measuring growth. Determining what each score means in terms of future instruction, opportunities, class placement, special education services, funding incentives and punishments, and judging the effectiveness of individual teachers, administrators, buildings and districts.

 

We’ve become so obsessed with these scores – a set of discrete numbers – that we’ve lost sight of what they were always supposed to be about in the first place – learning.

 

In fact, properly understood, that’s the mission of the public school system – to promote the acquisition of knowledge and skills. Test scores are just supposed to be tools to help us quantify that learning in meaningful ways.

 
Somewhere along the line we’ve mistaken the tool for the goal. And when you do that, it should come as no surprise that you achieve the goal less successfully.

 

There are two kinds of standardized assessment – aptitude and achievement tests. Both are supposed to measure scholarship and skill – though in different ways.

 

Aptitude tests are designed to predict how well a student will do in the future. Achievement tests are designed to determine how much a student knows now.

 

There is, of course, intense overlap between these two types, because aptitude tests base their predictions on an assessment of achievement. They’re basically achievement tests that go one step further: their questions are designed to reveal not just a student’s present state, but whether that state is likely to lead to further progress in the future.

 

Either way, standardized assessments are supposed to be based on what students have learned. But the problem is that not all learning is equal.

 

For example, a beginning chef needs to know how to use the stove, how to handle a knife and how to chop an onion. But if you give her a standardized test, it might instead focus on how long to stir the risotto.

 

That’s not as important in everyday cooking, but the tests make it important by focusing on it.

 

The fact of the matter is that standardized tests do NOT necessarily focus on the most important aspects of a given task. They focus on obscurities – things that most students don’t know.

 

This is implicit in the design of these exams and is very different from the kinds of tests designed by classroom teachers.

 

When a teacher makes a test for her students, she’s focused on the individuals in her classes. She asks primarily about the most essential aspects of the subject and in such a way that her students will best understand. There may be a few obscure questions, but the focus is on whether the test takers have learned the material or not.

 

When psychometricians design a standardized test, on the other hand, they aren’t centered on the student. They aren’t trying to find out if the test taker knows the most important facts or has the most essential skills in each field. Instead, there is a tendency to eliminate the most important test questions so that the test – not the student – will be better equipped to make comparisons between students based on a small set of questions. After all, a standardized test isn’t designed for a few classes – it is one size fits all.

 

New questions are field tested. They are placed randomly on an active test but don’t count toward the final score. Test takers aren’t told which questions they’ll be graded on and which are just practice questions being tried out on students for the first time. So students presumably give their best effort to both types. Then when the test is scored, the results of the field test questions determine whether they’ll appear as graded questions on a subsequent test.

 

According to W. James Popham, professor emeritus at the University of California, Los Angeles, and a former president of the American Educational Research Association, standardized test makers take pains to spread out the scores. Questions answered correctly by too many students – regardless of their importance or quality – are often left off the test.

 

If 40 to 60 percent of test takers answer the question correctly, it might make it onto the test. But questions that are answered correctly by 80 percent or more of test takers are usually jettisoned.
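This selection rule can be made concrete with a short, purely illustrative simulation. Everything here is an assumption for the sketch – the 40-to-60-percent retention band, the shared 0-to-1 ability scale, and the invented item difficulties – not the workings of any real test:

```python
import random

random.seed(0)  # deterministic for illustration

def field_test(item_difficulties, student_abilities):
    """Keep an item only if its p-value (the fraction of students
    answering it correctly) falls inside the retention band."""
    kept = []
    for difficulty in item_difficulties:
        correct = sum(1 for ability in student_abilities if ability >= difficulty)
        p_value = correct / len(student_abilities)
        if 0.40 <= p_value <= 0.60:
            kept.append(difficulty)
    return kept

# Abilities and difficulties share an arbitrary 0-to-1 scale.
students = [random.random() for _ in range(1000)]
core_items = [0.10, 0.15, 0.20]      # essential, well-taught content: ~80-90% answer correctly
obscure_items = [0.45, 0.50, 0.55]   # content that splits the group near 50/50

print(field_test(core_items + obscure_items, students))
```

In this toy run every well-taught item is jettisoned and every near-coin-flip item survives – the dynamic Popham describes below: the better the teaching of essential content, the less of it remains on the test.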

 

He writes:

 

“As a consequence of the quest for score variance in a standardized achievement test, items on which students perform well are often excluded. However, items on which students perform well often cover the content that, because of its importance, teachers stress. Thus, the better the job that teachers do in teaching important knowledge and/or skills, the less likely it is that there will be items on a standardized achievement test measuring such knowledge and/or skills.”

 

Think about what this means.

 

We are engaged in a system of assessment that isn’t concerned with learning so much as weeding people out. It’s not about who knows what, but about which questions to ask that will achieve the predetermined bell curve.

 

We talk about leaving no child behind, and making sure all students do better on standardized tests, but these tests are norm-referenced. By definition, all students cannot score well no matter how great their knowledge or skills. If you gave a standardized test to a class of genius-level intellects, there would still be the same percentage of failures and outstanding scores, with the majority clustered in the middle. That’s how the tests are designed.
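A tiny sketch (with invented scores) illustrates the norm-referenced point: a percentile rank depends only on where a student sits relative to the cohort, so a uniformly brilliant class produces exactly the same spread of ranks as an ordinary one.

```python
def percentile_ranks(scores):
    """Rank each score against its own cohort, from 0 (lowest)
    to 100 (highest). Assumes distinct scores for simplicity."""
    ranked = sorted(scores)
    return [100 * ranked.index(s) / (len(scores) - 1) for s in scores]

ordinary_class = [60, 70, 80, 90, 100]
genius_class = [160, 170, 180, 190, 200]  # everyone knows far more

print(percentile_ranks(ordinary_class))  # [0.0, 25.0, 50.0, 75.0, 100.0]
print(percentile_ranks(genius_class))    # identical spread
```

No matter how much the genius class knows in absolute terms, someone still lands at the bottom of the curve.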

 

And if this highly suspect method of question selection, alone, doesn’t achieve that end, the test companies have a way to correct the scores at the end of the process through the way they grade them.

 

These tests are graded with cut scores. In other words, the state or the testing company or the graders, themselves, decide anew each year which scores are passing and which failing.

 

One year a 1200 might be proficient. Another year it’s basic. It all depends on what the decision makers come up with on a given year.
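A hypothetical sketch makes the point concrete – the cut scores below are invented for illustration, not any state’s actual thresholds – the same scaled score earns a different label depending on where the deciding body draws the lines that year:

```python
def label(scaled_score, cut_scores):
    """Map a scaled score to a performance label.
    `cut_scores` is a list of (threshold, label) pairs, highest first."""
    for threshold, name in cut_scores:
        if scaled_score >= threshold:
            return name
    return "below basic"

cuts_one_year = [(1150, "proficient"), (1000, "basic")]
cuts_another_year = [(1250, "proficient"), (1100, "basic")]

print(label(1200, cuts_one_year))      # proficient
print(label(1200, cuts_another_year))  # basic
```

Nothing about the student changed between the two calls; only the thresholds moved.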

 

What do they base this on? No one has ever given a definitive answer. In fact, I doubt there is one. In each case, the deciding body just makes it up.

 

We’ve seen countless times when state scores are criticized for being too low one year, and then they miraculously bounce up the next. It’s not that students score differently, it’s that the cut score was raised. Why? Perhaps to stifle questions about the test’s validity. After all, people are less angry when more students pass.

 

The goal is always getting the bell curve. That is what validates the tests. But it’s a human construction, not a function of assessment. It says less about the test takers than the test makers and their enablers.

 

This has huge implications for the quality of education being provided at our schools. Since most administrators have drunk deep of the testing Kool-Aid, they now force teachers to use test scores to drive instruction. So since the tests don’t focus on the most essential parts of Reading, Writing, Math, and Science, neither does much of our instruction.

 

We end up chasing the psychometricians. We try to guess which aspects of a subject they think most students don’t know and then we teach our students that to the exclusion of more important information. And since what students don’t know changes, we end up having to change our instructional focus every few years based on the few bread crumbs surreptitiously left for us by the state and the testing corporations.

 

That is not a good way to teach someone anything. It’s like teaching your child how to ride a bike based on what the neighbor kid doesn’t know.

 

It’s an endless game of catch up that only benefits the testing industry because they cash in at every level. They get paid to give the tests, to grade the tests and when students fail, they get paid to sell us this year’s remediation material before kids take the test again, and – you guessed it – the testing companies get another check!

 

It’s a dangerous feedback loop, a cycle that promotes artificially prized snippets of knowledge over constructive wholes. But this degradation of education isn’t even the worst part.

 

The same method of question selection also builds economic and racial bias into the very fabric of the enterprise.

 

According to Prof. Martin Shapiro of Emory University, when test makers select questions with the greatest gaps between high and low scorers, they are selecting against minorities. Think about it – if they keep the questions that the majority answers correctly and a minority answers incorrectly, who falls into that minority? In many cases, it’s a racial minority. In fact, this may explain why white students historically do better on standardized tests than black and Hispanic students.

 

This process may factor non-school learning and social background into the questions, which are grounded in the experiences of white middle- and upper-class children.

 

So when we continually push for higher test scores, not only are we ultimately dumbing down the quality of education in our schools, but we’re also explicitly lobbying for greater economic and racial bias in our curriculum, trickling down from our assessments.

 

As Ibram X. Kendi, author of “How to be an Antiracist” puts it:

 

“Standardized tests have become the most effective racist weapon ever devised to objectively degrade Black minds and legally exclude their bodies.”

 

 


Popham is less critical of high stakes testing. He sees more of a problem in using student test scores to assess teacher performance. But even he thinks the tests and the scores are being overvalued and misunderstood in a wider context.

 

He writes:

 

“Merely because these test scores are reported in numbers (sometimes even with decimals!) should not incline anyone to attribute unwarranted precision to them. Standardized achievement test scores should be regarded as rough approximations of a student’s status with respect to the content domain represented by the test.”

 

I’d go even further.

 

Standardized test scores are tools used by big business to make money. That is as far as their validity goes.

 

And the fact that we make so many vital educational decisions on them is nothing less than criminal.

 

The tests are bogus nonsense at best and a conspiracy against the poor and minorities at worst.

 

When well-meaning people let themselves get wrapped up in knots over low scores and what that means for student learning, they are actually hurting the very thing that they value.

 

Student learning is not bettered by higher test scores. It is often made worse by them.

 

High test scores don’t mean greater learning. They often mean learning the knowledge du jour to the detriment of what’s really important. They mean biased education against the poor and minorities.

 

And they make those with real concerns complicit in a sham being perpetrated on our children and our society.

 


 

 

Like this post? I’ve written a book, “Gadfly on the Wall: A Public School Teacher Speaks Out on Racism and Reform,” now available from Garn Press. Ten percent of the proceeds go to the Badass Teachers Association. Check it out!


 

16 thoughts on “Greater Test Scores Often Mean Less Authentic Learning”

  1. Steven Singer writes about one more pervasive and inflexible, yet undebated aspect of public education: standardized testing. Since NCLB was enacted, corporate reformers imposed the use of standardized testing with the argument that it would help to gather information about students and schools, and thus it would be a most necessary tool to judge and make critical decisions about interventions and funding. At this point in time, standardized testing has become so much a part of the public education system that making a case against it risks being considered a heretic in this neoliberal public education establishment.
    I appreciate Steven Singer’s pointing out a few interestingly neglected points. One is the wrongness of scores having trumped actual learning, which is the real goal of education. A twist that by extension elevated standardized tests to the unjustified status that makes them the immovable fixtures they are now. Another is the fallacy of standardized testing’s validity to measure or evaluate. Aptitude tests are not to be confused with standardized testing, as is arbitrarily happening right now. These two aspects would make a strong case against the current use of standardized testing.
    However, Singer’s third and most obscure point about the actual making of these tests presents a disturbing aspect of the standardized testing industry. It is easy to understand that a teacher’s test designed for students in a particular class and subject is definitely different from a psychometric test that does not look for particular knowledge. A teacher selects questions to help him evaluate learning. Can standardized test designers claim to share the same intention and pragmatic design? For most people, educators included, it is not so easy to detect whether the questions in a standardized test are valid. How do we know that these questions are valid, reliable, useful, and relevant? How can we be sure that these tests are not unfair or biased? The truth is that it is very difficult to find these answers!
    In the past fifteen years of using standardized testing as directed by the corporate reformers, standardized testing has done little to nothing to improve public education, schools, teaching, and learning. As a matter of fact, the use or abuse of standardized testing has been a most damaging instrument against public schools. The chronic low scores have served corporate reformers well, allowing them to capriciously exert undue stress and threats on all public school teachers, causing demoralization and even depression. As a direct result of using scores, public school teachers have been fired, and schools have been closed and turned into charter schools (which have not done any better). Indeed, perversely using the scores from standardized testing, corporate reformers have managed to weaken and even dismantle public school systems, as happened in New Orleans.
    What standardized testing has remarkably achieved in a few years is to build a billion dollar industry that takes most of its money from public education systems in a variety of ways – tests, books, and consultants among other things. Every state, every district, every school spends a serious amount of money on testing. From internet provision to the distribution of tablets, or having IT departments, the public school system now is spending regularly to maintain the standardized testing industry. Isn’t it time for public school advocates to do something about a flawed system that has been used arbitrarily, mostly to undermine public schools, and that is extremely expensive?
    Who wins, who loses, who cares?
    In solidarity
    Sergio Flores


  2. This is a comprehensive, yet succinct and accessible, explainer. I think I will be pointing to it in future organizing efforts. Thanks!


  3. Steven, I would encourage you and your readers to think critically about some of these claims.
    I’ll start with three and let you know that there are many other claims in this piece that are not true and lead to a Trump-like distortion field around how tests work. It’s fine to hate tests, but some of the theories you are positing here are as crazy as Pizza-Gate.

    1. “These tests are graded with cut scores. In other words, the state or the testing company or the graders, themselves, decide anew each year which scores are passing and which failing.”
    WRONG. Cut scores are set during standard setting, in the first year a new test is established.
    “One year a 1200 might be proficient. Another year it’s basic. It all depends on what the decision makers come up with on a given year.” Wrong. 1200 is a scaled score and the scaled scores have the same meaning every year… In Massachusetts 500 is always the lowest scaled score for “Meeting Expectations.” The knowledge, skills and abilities needed for a 500 were established in 2017 in grades 3-8 by content experts. That score connotes the same level of KSA each year. The number of questions needed to get a 500 can vary depending on students’ performance on matrix equating items that measure whether they are gaining or losing ability.

    “What do they base this on? No one has ever given a definitive answer. In fact, I doubt there is one. In each case, the deciding body just makes it up.” WRONG: this is all documented in each state or national test’s technical reports that are reviewed by national Technical Advisory Committees. The ones for Massachusetts are all here: http://www.doe.mass.edu/mcas/tech/?section=techreports

    Comment: Why on earth would measurement experts conspire with presumably evil commissioners of education and elected officials to create arbitrary standards as you suggested?

    2. You quote Jim Popham about how items are selected then claim, “If 40 to 60 percent of test takers answer the question correctly, it might make it onto the test. But questions that are answered correctly by 80 percent or more of test takers are usually jettisoned.”

    False; there are sometimes restrictions on using multiple-choice items answered correctly by fewer than 25-30% of students (you don’t get much information about a student’s knowledge, skills and ability if the population could get the same score by random guessing).
    There should also be a limitation on the number of items over 90% because they are very easy and don’t provide much information about the ability of top students. A good state test needs to include a variety of items with a wide variety of p-values (percent correct). The goal here is to get information at the top, middle and bottom of the ability spectrum. If all your items have p-values between 40-60% you will only be accurately measuring the middle- to top-level kids and you would get terrible accuracy and reliability. On such a test a low ability student would probably get nothing correct, which doesn’t happen on a balanced test. We allow a few items up to 97%, but if the whole test is made of 80-99% correct items you will have huge proportions getting a perfect score.

    It's not a problem per se if 1/3 of your students are getting a perfect score. I have a hard time believing there are any states with challenging frameworks that 1/3 of their students can get a perfect score on. The first principle of test design is to sample thoroughly from the curriculum frameworks, the second principle is to vary the difficulty.

    Comment: Do you really want curriculum frameworks and tests that are so easy that 20-50% of the students can answer every question? You could ask 3rd graders to identify the letters in the alphabet and get results like that. Is that the kind of curriculum we want?

    3. “Standardized tests have become the most effective racist weapon ever devised to objectively degrade Black minds and legally exclude their bodies.”…

    "And the fact that we make so many vital educational decisions on them is nothing less than criminal."
    "The tests are bogus nonsense at best and a conspiracy against the poor and minorities at worst."

    Perhaps these claims of criminal, racist conspiracy should be addressed. If such a conspiracy exists, it is exceedingly well hidden and ineffective.

    In the years since state and federal testing mandates were established, high school graduation rates and college completion rates have improved dramatically for African Americans.
    I'll let you research your own state's data from the past decade https://datacenter.kidscount.org/locations but I haven't seen any jurisdiction where high school graduation rates or college attendance rates are going down over the past 25 years. African American SAT scores are up from 862 to 933 between 2007 and 2019 (https://blog.prepscholar.com/average-sat-scores-over-time) as participation has increased and college attendance and completion rates have risen.

    The high school graduation rate for African Americans in Massachusetts in 2008 was 64.4%; last year it was 80.1%.
    http://profiles.doe.mass.edu/grad/grad_report.aspx?orgcode=00000000&orgtypecode=0&
    Meanwhile the proportion of African Americans attending college after high school reached 68.8%, up from 58.9% in 2004. http://profiles.doe.mass.edu/nsc/gradsattendingcollege_dist.aspx?orgcode=00000000&fycode=2018&orgtypecode=0&amp;

    The number of African Americans attending college after high school has risen 38% from 3,063 to 4,227 since 2006, during which time the number enrolled in each class beginning as 8th graders rose from 6,646 to 7,019.

    Many of the biggest gains in our state were made before 2006, but that was before our student tracking systems were in existence.

    Comment: You measure what you treasure and if the results are not to your satisfaction you take steps to address them.

    I suppose nobody would be concerned about achievement, high school completion and college graduation gaps if we didn't measure them, but that wouldn't make them disappear.


    • Hey, everyone. Meet Robert Lee, Chief Analyst at the Massachusetts Department of Education and co-author and manager of the Massachusetts Comprehensive Assessment System (MCAS), the test all public school students in the state have to pass in order to graduate. I’m always a bit surprised when people like him with such distinguished backgrounds and titles opt for anonymity when commenting on my blog instead of introducing themselves. It’s almost like he’s ashamed of having readers know that he’s responsible for many of the high stakes testing decisions in his state, like knowing that would hurt his credibility.

      Oh my, my, my. It appears I’m guilty of spreading falsehoods about Mr. Lee’s industry. Let’s see. He says cut scores are not changed from year to year but only from one test to another. Very interesting, since MA has had the same accountability system – the MCAS – since 1993 but has changed the test multiple times, basing it on various versions of the ever-changing Partnership for Assessment of Readiness for College and Careers (PARCC) exam. So in MA when they change the cut score they claim it’s a different version of the same test. That’s not the MCAS! It’s the MCAS.2! Sounds like double-talk to me, but I’m just a school teacher.

      What else have I gotten wrong? I said scaled scores could have different meanings depending on the cut score. He says the scaled scores are consistent between tests. It’s just when they modify the tests – as they did in MA as recently as 2017 – that they change. Again, doesn’t seem like much of a difference to me.

      Oh! Mr. Lee has given us a link to the MA Dept of Ed explaining in baroque detail how his state’s cut scores were determined this last time. And the answer is… to get the bell curve just like I said in the first place.

      When it comes to whether field testing selects questions with the highest difference between correct and incorrect answers, Mr. Lee confides that “there are sometimes restrictions” against this. Well that is a relief! Psychometricians do this occasionally, maybe routinely, maybe even a lot – BUT THEY DON’T DO IT FOR EVERY QUESTION! That must be a real comfort to minority kids labelled failing because of the color of their skin.

      Ah! So high school graduation and college acceptance rates are going up! So testing didn’t stop all these kids from pursuing a future! Of course, an increasing number of colleges are making the SAT and ACT optional every year. High schools are offering alternative paths to graduation that don’t require standardized test scores – portfolio projects, for instance. In fact, that’s why many in the testing industrial complex where Mr. Lee works are calling for “higher standards.” They want to stop just such shenanigans.

      Mr. Lee asks, “Why on earth would measurement experts conspire with presumably evil commissioners of education and elected officials to create arbitrary standards as you suggest?” Good question. Some clearly profit from the industry. They either take kickbacks or campaign contributions – you know, the way government works. Others have drunk the Kool-Aid. They truly believe this stuff. That doesn’t make them any less wrong.

      Mr. Lee worries that if we didn’t measure student achievement, nothing would get better. The racial proficiency gap would persist but be invisible. This is a straw man argument. No one is claiming we stop measuring student learning. People like me are suggesting we use a different tool than standardized testing. Use classroom grades, portfolio projects, things that authentically demonstrate learning, not testing.

      I suppose the biggest thing we can take away from Mr. Lee’s comment is that articles like the one above are getting under the skin of bureaucrats in the testing industrial complex. And for that knowledge, I am truly thankful.


        Keep in mind I’m just a parent, interested taxpayer, & Spanish special to PreK/K’s [no stdzd tests], but answer me this. Mr Lee says: “Do you really want curriculum frameworks and tests that are so easy that 20-50% of the students can answer every question? You could ask 3rd graders to identify the letters in the alphabet and get results like that. Is that the kind of curriculum we want?” I can hardly make heads or tails of it, but it seems a peculiar conflation of curriculum and standardized tests. My curriculum outlines what I intend to teach—obviously not including a review of what students already know cold. I will be overjoyed if all the tots learn all that I teach. Despite lack of stdzd tests, I assess how they’re doing in every class, & strive for best results. I have ways of ensuring that quick learners get pushed a little harder & strugglers get encouragement & extra 1-on-1. (Assess that, Mr Lee.) Let them all get an A+ regardless of what test-makers consider normal for every 3-4y.o. in the state. Mr Lee’s statement implies stdzd testing “is” the curriculum, which itself can be graded on how easy or hard students find test questions—yet his details do not disprove Singer’s illustration of how test-Q selection drives curriculum into the ditch. Stdzd testing is the tail wagging the dog these days, to every learner’s detriment.


  4. Not to get picky on semantics but we should absolutely stop trying to “measure” learning. It can’t be measured. There is no standard of measurement for learning. What is it? Who determines it? If it can be changed or manipulated in any way, then by definition it can’t be a standard. We need to stop kidding ourselves that these tests have any use other than weapons for privatization or the Almighty Dollar for Pearson.


    • Thanks for the comment, Oakland Mom. I see where you’re coming from, but I can’t entirely agree. I don’t think the problem is measurement, per se. It’s the degree of precision standardized tests pretend to have over the degree of learning that has or has not happened. We aren’t entirely ignorant of when learning takes place and even about how much has happened. But any attempt to quantify this can only be a loose approximation. And there are better approximations than standardized tests.


    • Oakland Mom is flipping a philosophical rock. Can we measure health? happiness? good government? In all cases the answer is yes, but.

      For health, you can measure many indicators of disease, and wellness (running speed, heart rate, reaction strength) but you may miss aspects of well-being.

      If you only measure health by blood oxygen rates during exercise you will miss a lot. If you measure what you value and are concerned about today, you may miss growing problems tomorrow.

      There are some great philosophical questions about the measurement of knowledge that rise from turning over this rock. One of the ones I’m most concerned with is whether behaviorism (right/wrong external signals about knowledge) can be carried too far and stunt constructivism (self-knowledge and ownership of learning).

      At most levels I assume that both behaviorists and constructivists are working together to pull children away from ignorance, naivete and superstition. But they often fight each other so fiercely that one gains the impression that they are at opposite ends and that the other is in favor of ignorance.


      • From an article I wrote called “Don’t Worry About Grade Inflation. Worry About Grading Fairly”:

        “…A test is a snapshot of student learning. It has its place, but the information it gives you is very limited.

        Most of my grades are based on projects, homework, essays, class discussion, creative writing, journaling, poetry, etc. Give me a string of data points from which I can extrapolate a fair grade – not just one high stakes data point.

        This may work to some degree because of the subject I teach. Language arts is an exceptionally subjective subject, after all. It may be more challenging to do this in math or science. However, it is certainly attainable because it is not really that hard to determine whether students have given you their best work.

        Good teaching practices lend themselves to good assessment.

        You get to know your students. You watch them work. You help them when they struggle. By the time they hand in their final product, you barely need to read it. You know exactly what it says because you were there for its construction.

        For me, this doesn’t mean I have no students who fail. Almost every year I have a few who don’t achieve. This is usually because of attendance issues, lack of sleep, lack of nutrition, home issues or simple laziness.

        I only have control over what happens in the classroom, after all. I can call home and try to work with parents, but if those parents are – themselves – absent, unavailable or unwilling to work with me, there’s little I can do.

        And before you start on about standardized testing and the utopia of “objectivity” it can bring, let me tell you about one such student I had who was not even trying in my class.

        He never turned in homework, never tried his best on assignments, rarely attended and sleepwalked through the year. However, he knew his only chance was the state mandated reading test – so for three days he was present and awake. The resulting test score was the only reason he moved on to the next grade.

        Was he smart? Yes. Did he deserve to go on to the next grade not having learned the important lessons of his classmates? No. But your so-called “objective” measure valued three days of effort over 180.

        The problem is that we are in love with certain academic myths.

        MYTH 1: Grading must be objective.

        WRONG! Grading will never be objective because it is done by subjective humans. These standardized tests you’re so in love with are deeply biased on economic and racial lines. Whether you pass or fail is determined by a cut score and a grading curve that changes from year-to-year making them essentially useless for comparisons and as valid assessments. They’re just a tool for big business to make money off the academic process…”

        Source: https://gadflyonthewallblog.com/2018/06/11/dont-worry-about-grade-inflation-worry-about-grading-fairly/

