Volume III, Number 9, October, 2005

Mountains out of Molehills:
Spinning the National Report Card to Make NCLB Look Good

By Jamie McKenzie

On July 14, the announcement came that 9 year olds had made modest gains in reading and math on the 2004 National Report Card, while 13 year olds had made modest gains in math but none in reading. The White House and the Ed Department promptly declared victory for NCLB and began tooting their horns gleefully as if they had achieved something amazing, even though NCLB had not even been in effect for most of the years measured.



The modest gains reported above look like molehills when the full scale of possible test scores is shown, but a favorite technique of propagandists is to cut off the base of a chart to make small changes appear bigger than they are.


It is important to read past the executive summaries of these reports to check out the methodology and read the warnings usually skipped in the executive summaries.

This Report Card shows that the White House and the Ed Department have actually failed AYP - their own test of Adequate Yearly Progress. The gains are too modest to warrant bragging rights. Indeed, the gains have been blown way out of proportion, and the announcements out of the Ed Department read more like propaganda than carefully worded educational research.

Note below how the chart issued in the report is manipulated so as to create the impression of dramatic change when change is modest, a mere ripple in the 35 year history of such tests. Compare the manipulated chart below with the one above. The top and bottom ranges are compressed so the range of movement in the middle looks larger.
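The effect of cropping a chart's axis can be put in numbers. The sketch below is only an illustration: the 7-point gain, the 0 to 500 full scale and the 200 to 230 cropped range are assumed figures chosen for the example, not values taken from the report, and the function name `axis_share` is hypothetical. It computes how much of the chart's height the same gain fills under each choice of axis.

```python
def axis_share(change, axis_min, axis_max):
    """Return the fraction of the vertical axis that a change occupies."""
    span = axis_max - axis_min
    if span <= 0:
        raise ValueError("axis_max must be greater than axis_min")
    return change / span


# Assumed, illustrative figures (not taken from the report).
GAIN = 7                   # hypothetical gain in scale points
FULL_SCALE = (0, 500)      # assumed full range of possible scores
CROPPED = (200, 230)       # hypothetical range shown after cutting off the base

full = axis_share(GAIN, *FULL_SCALE)
cropped = axis_share(GAIN, *CROPPED)

print(f"Share of chart height on the full scale: {full:.1%}")
print(f"Share of chart height on the cropped axis: {cropped:.1%}")
print(f"Visual exaggeration: {cropped / full:.1f} times")
```

Under these assumptions the same gain fills under 2 percent of the chart on the full scale and nearly a quarter of it on the cropped axis. The gain itself has not changed, only the frame around it.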

We should all applaud growth and improvement, but claiming victory for NCLB is an outrageous misuse of the data and the report. It is fuzzy math, propaganda and distortion. It is spinning.

The authors of the report specifically warn against drawing any such conclusions regarding causes:

Cautions in Interpretations

As previously stated, the NAEP reading and mathematics trend scales make it possible to examine relationships between students’ performance and various background factors measured by NAEP. However, a relationship between achievement and another variable does not reveal its underlying cause, which may be influenced by a number of other variables.

Source: NAEP 2004 TRENDS IN ACADEMIC PROGRESS - Page 117

When did NCLB begin and what years are covered by this testing?

Signed into law in January 2002, NCLB did not really begin to shift school practice until the 2002-2003 school year, but the Report Card covers the school years since the 1999 Report:

Years Covered by the Report Card    Years NCLB Was Affecting School Programs
1999-2000                           .
2000-2001                           .
2001-2002                           .
2002-2003                           .
2003-2004                           2003-2004
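The overlap shown in the table can be tallied in a few lines. This is a minimal sketch: the school years come from the table, while the variable and function names are hypothetical. Because the text dates the shift in school practice to 2002-2003 while the table marks only 2003-2004, both readings are computed.

```python
# School years covered by the Report Card, as listed in the table.
covered_years = ["1999-2000", "2000-2001", "2001-2002", "2002-2003", "2003-2004"]

# Years in which NCLB was affecting school programs, under two readings.
nclb_per_table = {"2003-2004"}               # as marked in the table
nclb_per_text = {"2002-2003", "2003-2004"}   # counting 2002-2003, per the text


def nclb_share(covered, affected):
    """Return the fraction of covered years in which NCLB was in effect."""
    overlap = [year for year in covered if year in affected]
    return len(overlap) / len(covered)


share_table = nclb_share(covered_years, nclb_per_table)
share_text = nclb_share(covered_years, nclb_per_text)

print(f"Per the table: {share_table:.0%} of the years covered")
print(f"Counting 2002-2003 as well: {share_text:.0%} of the years covered")
```

On either reading, NCLB was shaping school programs for less than half of the period the Report Card measures.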

Laid out coldly in black and white, it is obvious that the White House and the Ed Department have stretched the truth dramatically.

How Big are the Gains Really?

The point gains on the NAEP scaled scores amount to each child in the study getting one or two extra items correct compared to the last testing in 1999.

This is hardly a revolutionary shift in the reading and math performance of our students.

Conflicting Results

Issues of Methodology

The Report Card uses sampling techniques that are quite complicated and the methodology section is hard to read and understand, but it is time well spent in order to measure the reliability of the findings.

The 2004 Report Card emerges in two versions - a summary and a detailed report. While there have been some dramatic shifts in the way this report and its sampling were conducted since the 1999 Report, newspaper coverage ignored these changes and the potential validity issues emerging from such changes.

Although the authors warn that the Report's value depends upon consistent methods being applied across the years, they go on to report quite a few changes in this Report that could undermine confidence in the findings if one ever took the time to read about them.

Measuring trends of student achievement, or change over time, requires the precise replication of past procedures. Since their inception, the design and methodology of the NAEP long-term trend assessments have remained constant, to the extent feasible, thereby enabling the continuous monitoring of a fixed set of curriculum topics.


Source: NAEP 2004 TRENDS IN ACADEMIC PROGRESS - Page 91

Sadly, the methodology is rarely reviewed or considered before the findings are passed along as gospel. Even though the authors make the statement about precise replication above, they list many potentially troubling changes that were made in the 2004 Report. The 2004 Report is not a precise replication.

The following are examples of methodology issues drawn from a careful reading of the report:

  1. Sampling techniques have changed.
  2. The percentage of Hispanic 9 year olds rose dramatically since 1999.
  3. The assessment was explicitly scaled in a cross-age manner only in the base year (1971).
  4. Students answer far fewer items than they would in the real NAEP tests.
  5. Math items changed from year to year.
  6. The basis for geographic primary sampling units changed since 1999.
  7. Target population sample size was fewer than 15,000 for 9 year olds.
  8. Impact of the weighting system for sample selection.
  9. Nonpoststratified weights have been used in the 2004 analysis.
  10. Response rates for nonpublic schools selected for participation in the 2004 trend assessments failed to reach the necessary threshold for reporting.
  11. Item response theory (IRT) was used to estimate average proficiency for the nation and various student groups of interest within the nation.
  12. Degree of uncertainty.
  13. NAEP results, like those from all surveys, are also subject to other kinds of errors, including the effects of necessarily imperfect adjustments for student and school nonresponse and other largely unknowable effects associated with the particular instrumentation and data collection methods used.
  14. Nonsampling errors can be attributed to a number of sources.

© 2005, Jamie McKenzie, all rights reserved. This article may be e-mailed to individuals by individuals, but all other duplication, distribution, publication and use is prohibited without first receiving explicit permission. Contact for information.