Lies, damned lies, and standardized tests

Article by Gary Norton

Education – September 1997 – Colorado Central Magazine

My summer afternoon was disturbed when I got a call from Colorado Central — they had some 45 pages of school statistics, mostly achievement test scores, on seven school districts in Central Colorado.

The data had been collected by the Independence Institute and sent to the magazine, and Ed and Martha, and at least one educator they asked, had difficulty finding rhyme or reason in the numbers. Ed asked me to go over them and try to make some sense out of it all.

After going over the pages of statistics, I appreciated their quandary. I have been working regularly with such numbers for 17 years now as a school counselor.

I concluded it was a pretty vanilla report.

If one wanted to force a conclusion on the stack of papers, it would be that Central Colorado students are pretty average. The pages were certainly of no help to a parent trying to evaluate the quality of any particular school system. Still, I do have some reactions to those statistics, and some insights into them, that may be helpful to parents and laymen.

On the subject of student achievement in America, it is appropriate to challenge a misconception often printed in the media. Contrary to what one might think, student achievement in America is actually rising. Part of this confusion comes from SAT scores, which go up and down over the years. The downs get heavy publicity while the media ignore the ups, and the troubled SAT deserves further comment later in this article.

Back to all that local data. The pages of achievement test scores from Leadville in the 1980s are pretty useless now to parents or anyone else.

From Fairplay, I saw pages and pages of achievement test results from Park County schools. Graphs helped underscore that some grade levels performed well and others poorly. Someone who wasn’t a statistician could get lost in these data.

The five pages from the Moffat schools were similar; they were the complete accountability reports from 1992 and 1993. It appears that Moffat, like Salida, uses the California Achievement Test, which is being phased out and replaced by its publisher with newer tests. For Buena Vista, they had three pages of Stanford Achievement Test results from 1994 and the 1995–96 school year.

Perhaps the most insightful jewel came in the seven pages on the Custer County schools in Westcliffe. In addition to pages of achievement test results and college admissions test results, they provided the percentage of the senior class taking the ACT test.

This is useful information I will elaborate on later. Nationally, we hear a lot about schools not encouraging females to aspire to college as much as they encourage males. If this is a problem, Westcliffe may have the problem in reverse, with only 33% of males taking the ACT test compared with 86% of females.

The Cotopaxi pages were a Cotopaxi school newsletter dated October 1994, and the school improvement plans from 1994–95 and 1995–96. They are very confusing to laymen. Ed asked if they were trying to hide anything.

No, not in this report anyway. They chose to measure school improvement in student achievement by the percentages of students falling in the broad categories of high, average, and low compared to national norms. The product is a report that is very boring, even for us number crunchers.

I checked the “report card” issued by the Independence Institute for Salida High School. It had the scantiest amount of raw data of any of the schools — twelve numbers representing ninth-grade achievement test results for the three most recent school years.

Based on this, the Independence Institute gave Salida High a C+. The main problem is that all 12 numbers are wrong. Another school counselor and I administered those tests. I received the test results, compiled the report on them, and passed them along to the administration and accountability committee. The administration and accountability committee published the accurate figures in The Mountain Mail and elsewhere.

Ten of the twelve numbers reported by the Institute are lower than Salida freshmen actually scored on the tests; two are higher. As a statistician, I don’t believe the discrepancies are significant. However, using the Institute’s own bizarre grading scale, the real scores warrant a B, rather than a C+. Some people would feel that was very important. I am not too concerned about the discrepancies, but will probably see that the Institute gets more data, and accurate data, in the future.

Achievement test scores have always been reliable and valid for analyzing individual performance and individual improvement or decline over a period of years.

But it has never been appropriate to evaluate schools or teachers on the unreliable and often invalid group data. Test publishers always warn against these unethical uses of test data.

In the highly mobile American society of the 1990s, group data are even more worthless. In the middle and late 1980s, I reported to the Salida School Board that the high achievement test scores of Salida’s junior high and senior high students were also a tribute to the elementary curriculum and teachers.

The community was more stable then. Sadly, we are now as mobile as the rest of the country. In the summer of 1996, for the six grade levels 7–12 in the Salida schools, about 15% of each class moved away. Those students were replaced by nearly identical numbers of students moving in. It was uncanny that there wasn’t much variance in the 15% figure for each grade level and that the total class sizes remained basically unchanged. That reflects only summer moves. Students and their families continue to move in and out every month of the year.

That can quickly change a class. When you follow a class over a three- to five-year period, you often find a ninth- or tenth-grade class that is almost completely different from the group that was there in the fifth or sixth grade. In a highly mobile society like ours, such group scores and individual school district curriculum evaluations are almost completely useless.
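A rough illustration of how fast summer moves alone can reshape a class: the sketch below assumes, hypothetically, that the 15% summer turnover seen in 1996 repeats every year and that each student has the same independent chance of moving away. It ignores moves during the school year, so it understates real turnover. The function name share_remaining is illustrative only.

```python
def share_remaining(annual_turnover: float, years: int) -> float:
    """Fraction of an original class still enrolled after `years` summers.

    Simplifying assumptions (not measured data): every student has the same
    independent chance of moving away each summer, and mid-year moves are ignored.
    """
    return (1 - annual_turnover) ** years


# Assume the 15% summer turnover repeats each year.
for years in (3, 4, 5):
    remaining = share_remaining(0.15, years)
    print(f"{years} summers: {remaining:.0%} of the original class remains")
```

Under those assumptions, roughly 39% of a class would be replaced after three summers and about 56% after five, before counting any of the moves that happen during the school year.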

Colorado has been “discovered.” Since 1988, enrollment has grown 14.1%, and the number of students identified as “at risk” has increased 46.4%. The number of classroom teachers per 1,000 students has decreased 3.5%.

Contrary to what you may have read or heard in the media, achievement by students in America’s public schools is actually rising. Commercial tests of achievement are routinely recalibrated about every seven years. All of these tests demonstrate an upward spiral in student achievement.

In layman’s terms, about every seven or eight years a student has to answer one to three more questions correctly per section of the test just to remain “average.”

This is true of the California Achievement Test, Iowa Test of Basic Skills, the Stanford Achievement Test, the Metropolitan Achievement Test, the Comprehensive Tests of Basic Skills, and many others.

The publishers of the Iowa test declared in the mid-1980s that composite achievement was at an all-time high in all test areas. This contradicts conventional wisdom. In 1990, the Educational Testing Service reviewed findings from twenty years of the National Assessment of Educational Progress and concluded “achievement levels are quite stable.”

When the 1990 NAEP data on mathematics were released, they showed growth in average scores over every previous administration of the test.

Perhaps the myth of declining American student achievement stems from the ups and downs of the SAT college admissions test. Until quite recently, the SAT had, negligently, not been renormed since 1941! The point of reference, the “average” cited for decades, was a small group of about 10,000 test takers before World War II: mostly white, wealthy males, largely from Northeastern states, applying to the most prestigious colleges in the country. The average score for this group was 500. Over the years, the average SAT score drifted from 500 down to 424 for verbal and 478 for math.

In April, 1995, the College Board “recentered” the average based on the high school senior class of 1990, the most recent group on which the College Board had complete information when preparation began several years ago.

From 1950 to 1980, the college-bound population grew from 10 percent to nearly 50 percent of all high school graduates, as more students aspired to college. The SAT score “decline” does not reflect a decline in ability but rather a dramatic increase in the total percentage of the population taking the test. Only 10,000 took it in 1941, compared to more than 2 million now.

There are many problems with the SAT test. Students answer 138 multiple-choice questions (78 verbal, 60 math). The number of right answers is converted, by a process only a statistician could love, to a scale score from 200 to 800 points. A student who answered 77 of the 78 verbal questions correctly would receive a 750, a full 50 points below the perfect 800. As for the score decline itself, it was modest: it began in 1963, amounted to about 5%, and stopped in the mid-1970s.

The SAT was designed to predict success or failure for college freshmen. It is a voluntary test. In years when more students take the test, the average declines. In years when fewer students take the test, the average rises. There is an incredibly strong positive correlation between family income and SAT scores. The average SAT score earned by students goes down by fifteen points for each decrease of $10,000 in family income.

The SAT’s competitor, the American College Test, revises topics and questions each year to correlate with the evolving curricula at American colleges and universities. In my opinion, it is a better test.

That is, it’s better than the SAT at predicting how one student might do in college. It doesn’t tell you much about a high school, and when you aggregate it with a bunch of other test scores, it tells you even less.

Gary Norton has been the counselor at Kesner Junior High School in Salida since 1983.