Volume I, Number 5, May, 2003

False Claims About Phonemic Awareness, Phonics, Skills
vs. Whole Language, and Recreational Reading

by Stephen Krashen


There is insufficient evidence to support the National Reading Panel's claims that phonemic awareness training significantly improves children's reading, that systematic phonics instruction is superior to less intensive instruction, and that skills-based approaches are superior to whole language. Also, contrary to the conclusions of the National Reading Panel, there is abundant evidence that encouraging children to read more in school is beneficial.

© 2003, Stephen Krashen, all rights reserved.

The recent National Reading Panel report (National Reading Panel, 2000) contained a number of claims about reading and reading instruction. In this paper, I review four of them:

1. The claim that phonemic awareness training significantly improves children's reading ability.
2. The claim that systematic phonics instruction is more effective than less systematic phonics instruction.
3. The claim that "skills"-based approaches are superior to whole language approaches.
4. The claim that there is no clear evidence that encouraging children to read more in school improves reading achievement.

I argue below that there is sufficient evidence to challenge all four of these claims.


Phonemic awareness is the ability to divide a word into its component sounds. A number of studies have been done in which children are given direct teaching or "training" in phonemic awareness, and the claim has been made that such training is "clearly effective," that it helps children "learn to read and spell" and benefits reading comprehension as well as word reading (National Reading Panel, 2000, 2-40).

I recently performed a review of the research on phonemic awareness (PA) training studies (Krashen, 2001a). In this review, I attempted to find studies that met two conditions: (1) they were really studies of "pure" PA, not of PA combined with phonics (sometimes referred to as PA instruction taught with letters). By definition, PA is an aural ability. Many studies, however, combine PA training with phonics. (2) students were tested on reading comprehension, not just on tests of PA or on tests in which they read lists of words in isolation without meaningful context.

There were very few studies. After reviewing the research myself, asking colleagues on various listservs, and reviewing the voluminous report of the National Reading Panel, I was able to find only six published studies of pure PA training using tests of reading comprehension. These six studies contained a total of 11 comparisons of PA-trained children and non-trained children. Of the six studies, only three dealt with English-speaking children. Only one of these three was done in the United States. The three other studies were of children who spoke Spanish, Hebrew, and Norwegian, languages that happen to be phonetically much more regular than English.

The overall results were unimpressive. Effect sizes were calculated by subtracting the mean of the comparison group from the mean of the experimental group, then dividing the result by the pooled standard deviation. All effect sizes were weighted for sample size (Wolf, 1986). In no case were pretest scores of experimental and comparison groups obviously different. The average effect size for all eleven comparisons was +.35 in favor of phonemic awareness. An effect size of .35 is generally considered to be between a small (d = .2) and medium (d = .5) effect (Wolf, 1986). There were, however, a number of individual findings that should shake confidence in the value of PA training:

1. In one study (Weiner, 1994), the effect size was positive for one of two comparisons (.40) but negative for the other (-.41).
2. In three comparisons, effect sizes were very low, .13 or less (two from Defior and Tudela, 1994, one comparison from Hatcher, Helm and Ellis, 1994).
3. In four studies (six comparisons), the number of children who underwent PA training was very small: Bradley and Bryant, 1983, 13 children; Defior and Tudela, 1994, nine children; Weiner, 1994, five children; Kozminsky and Kozminsky, 1995, 15 children.
4. Only one study reported substantial effect sizes as well as statistically significant results in favor of those trained in phonemic awareness, a study done in Israel with Hebrew-speaking children, involving only 15 children who underwent PA training.

My summary is presented in Table 1, from Krashen (2001a).

Table 1. Effects of "Pure" PA Studies
study | n (exp, cont) | duration | control group | first test (effect size/significance) | delayed test (effect size/significance) | interval
Bradley & Bryant (1983) | 13, 26 | 2 yrs | conceptual training | .54/ns | . | .
Bradley & Bryant (1983) | 13, 13 | . | no training | .96/.05 | . | .
Hatcher, Helm & Ellis (1994) | 30, 31 | 20 wk | regular | .08/ns | .11/ns | 9 m
Defior & Tudela (1994) | 9, 12 | 6 m | manipulation | .05/ns | .00/ns | 2 m
Defior & Tudela (1994) | 9, 12 | . | classification | .13/ns | .13/ns | .
Weiner (1994): low achievers | 5, 13 | 6 wk | regular instruction | -.41/ns | . | .
Weiner (1994): middle achievers | 5, 13 | . | regular instruction | .40/ns | . | .
Lie (1991): positional | 45, 51 | 4 m | neutral activities | .21/ns | .33/ns | 1.5 yrs
Lie (1991): sequential | 51, 51 | . | neutral activities | .62/.05 | .41/.10 | .
Kozminsky & Kozminsky (1995) | 15, 15 | 8 m | general enrichment | .59/.05 | .61/.05 | 3 yrs
Kozminsky & Kozminsky (1995) | 15, 15 | . | unseen | .50/.05 | .79/.05 | .
"First test" given immediately after training, except for Kozminsky and Kozminsky (1 year delay) and Lie (1 semester delay). Interval: interval between end of training and administration of delayed test.
manipulation: cutting, coloring, etc.
positional: training on initial, final, medial sounds
sequential: training on sounds as they appear in sequential order
"unseen": investigators did not inspect comparison group treatment
n = sample size of experimental group, control group
from: Krashen (2001a)
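
The effect size calculation described before Table 1 (difference in means divided by the pooled standard deviation, then averaged with weights for sample size) can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, the group statistics passed to effect_size are hypothetical, and weighting each comparison by its total sample size is an assumption, since the exact weighting scheme is not spelled out here.

```python
import math

def pooled_sd(sd_exp, n_exp, sd_cont, n_cont):
    """Pooled standard deviation of two independent groups."""
    numerator = (n_exp - 1) * sd_exp ** 2 + (n_cont - 1) * sd_cont ** 2
    return math.sqrt(numerator / (n_exp + n_cont - 2))

def effect_size(mean_exp, mean_cont, sd_exp, n_exp, sd_cont, n_cont):
    """(experimental mean - comparison mean) / pooled standard deviation."""
    return (mean_exp - mean_cont) / pooled_sd(sd_exp, n_exp, sd_cont, n_cont)

def weighted_mean_effect(comparisons):
    """Mean effect size, weighting each comparison by its total sample size.

    comparisons: list of (d, n_exp, n_cont) tuples.
    ASSUMPTION: weights are n_exp + n_cont; other schemes are possible.
    """
    total_n = sum(n_e + n_c for _, n_e, n_c in comparisons)
    return sum(d * (n_e + n_c) for d, n_e, n_c in comparisons) / total_n

# First-test effect sizes and group sizes as listed in Table 1.
TABLE_1_FIRST_TEST = [
    (0.54, 13, 26), (0.96, 13, 13), (0.08, 30, 31), (0.05, 9, 12),
    (0.13, 9, 12), (-0.41, 5, 13), (0.40, 5, 13), (0.21, 45, 51),
    (0.62, 51, 51), (0.59, 15, 15), (0.50, 15, 15),
]

# Hypothetical group statistics, for illustration only (not from any study).
example_d = effect_size(55.0, 50.0, 10.0, 20, 10.0, 20)
table_1_average = weighted_mean_effect(TABLE_1_FIRST_TEST)
print(round(example_d, 2), round(table_1_average, 2))
```

Under this simple weighting, the Table 1 average comes out near .37 rather than exactly .35, which suggests the published figure used a somewhat different weighting. Either way the value falls between the small and medium benchmarks.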

In other words, I found no studies using English that were clearly and strongly supportive of PA training. I found only one study that was clearly and strongly supportive of PA training, and it was done with Hebrew with very few subjects.

One cannot conclude on the basis of this evidence, as many have, that PA training is essential, or even very important. Evidence supporting the PA hysteria that appears to have gripped the schools should be made of much sterner stuff.

The NRP Responds

The National Reading Panel devoted about sixty pages to reviewing the research on PA, and members of the panel published another version of their report in the Reading Research Quarterly (Ehri, Nunes, Willows, Schuster, Yaghoub-Zadeh, and Shanahan, 2001). They did not mention my 2001 paper, but this is understandable, because when their paper was written my article was probably not available to them. They did, however, attempt to represent my position, stating that I held that PA training helps in decoding nonsense words, but has no effect on tests of reading comprehension. The only citation given of my work was an email message posted on a listserv. I wrote the Reading Research Quarterly, asking if I could publish a letter with a fuller explanation of my position. The Quarterly agreed to do this, but added that the NRP researchers would have a chance to reply.

In my letter (Krashen, 2002a), I briefly reviewed some of the points presented above, focusing on the fact that I found only 11 comparisons, with some reporting very low effect sizes, many involving languages other than English, some with very few subjects, and only one presenting statistically significant results for both conditions of the study.

The NRP's response appeared in the same issue (Ehri, Shanahan, and Nunes, 2002). A careful reading of their response shows no real disagreement with my conclusions.

1. They claimed that I relied only on statistical significance and ignored the use of meta-analysis. Clearly, Ehri et al. had not yet read Krashen (2001a), in which I performed a meta-analysis. In fact, their overall results were identical to mine, with an average effect size of .35. My point was that this overall average, itself not very impressive, hides some embarrassing details.
2. Ehri et al. reported that for the studies involving only English-speaking subjects, the average effect size was +.28, which they note falls short of statistical significance. They conclude that this "supports Krashen's claim" but add that "more comparisons would yield a firmer conclusion" (p. 129). Of course I agree.
3. Ehri et al. claim that when PA is combined with phonics, the results are stronger. But when tests of reading comprehension are used, this is not the case. They report that for seven comparisons of the effect of PA plus phonics training on reading comprehension, the overall effect size was .28. This figure is statistically significant, but it is quite small. In fact, it is nearly identical to the effect size reported for the impact of intensive phonics on tests of reading comprehension (Ehri et al., 2001), d = .27, and even this relationship drops to .12 for children beyond grade 1 (Garan, 2001a).

Most important, the NRP scholars did not contest the claim that so few studies have been done even testing the impact of PA training on reading comprehension. Once again: There are only eleven comparisons, and only five comparisons deal with English-speaking children.

Comparing PA Results with Recreational Reading Results

The NRP found that students in in-school recreational reading programs did better on tests of reading comprehension than comparison children in four comparisons, and there was no difference in 10. They concluded from this that the evidence does not support in-school reading "in a clear and convincing manner" (National Reading Panel, 2000, 3-3), a conclusion I strongly disagree with (see below). Although this conclusion was not based on a meta-analysis with effect size calculations, it is obvious that the PA results are nearly identical: PA-trained children read significantly better in four comparisons and there was no difference in seven. Nevertheless, the NRP was convinced that PA training is "highly effective" (National Reading Panel, 2000, 2-3).
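
The two tallies just described can be put side by side with a small Python sketch. The counts are the ones given in the text. The helper name share_positive is made up for this illustration, and a simple proportion is only a rough way to compare the two bodies of evidence, not a substitute for effect size analysis.

```python
def share_positive(positive, no_difference):
    """Fraction of comparisons showing a significant positive result."""
    return positive / (positive + no_difference)

# Counts as stated in the text above.
recreational_reading = share_positive(positive=4, no_difference=10)
pa_training = share_positive(positive=4, no_difference=7)

print(f"in-school recreational reading: {recreational_reading:.0%} positive")
print(f"PA training: {pa_training:.0%} positive")
```

On this count the two sets of results are of the same order: roughly three in ten comparisons positive for in-school reading and between three and four in ten for PA training.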

Additional Evidence: Low PA Can Read OK

Not only does the PA training evidence fail to provide strong support, there are other reasons to suspect that PA is not a crucial element in learning to read: Children without PA or with very low PA often learn to read quite well. Bradley and Bryant (1985) reported that of a group of 316 children, 25 performed especially poorly on a test of PA (one standard deviation below their expected score, based on a test of verbal skills) at ages four and five. Of these, only seven turned out to be poor readers (scoring one standard deviation below their expected reading score, based on IQ) three years later. Thus, 72% of those with low PA (18 of 25) were not delayed in learning to read. Stuart-Hamilton (1986) found that 20 five-year-old children who demonstrated zero phonemic awareness performed adequately on a word identification task, and were judged by their teachers to be making near-normal progress in learning to read.

Also, some adults who are excellent readers do very poorly on tests of PA. Campbell and Butterworth's (1985) subject R.E. was a university student who "reads at least as well as her fellow undergraduates" (p. 436); she graduated from London University with second-class honors in psychology and performed above average on standardized tests of reading. She had great difficulty in reading nonsense words, and while she knew the names of all the letters, she had difficulty making the sounds corresponding to the letters. She also performed poorly on tests of phonemic awareness and phonemic segmentation. Campbell and Butterworth conclude that "Since R.E.'s word reading and spelling are good, strong claims based on the necessity of a relationship between phonemic segmentation and manipulation skills, on the one hand, and the development of skilled reading and writing, on the other, must be weakened" (p. 460). Additional studies of this kind are reviewed in Krashen (2001b).

PA Can Develop Without Training

There are good reasons to suspect that PA can develop quite nicely without training: Comparison groups in nearly all PA training studies show gains in PA (Ehri et al., 2001, p. 276), and several longitudinal studies reveal growth in PA without training (e.g., Fox and Routh, 1975).

PA: The Result of Reading

PA beyond the initial levels appears to be the result of reading, not the cause. This conclusion is consistent with studies showing low levels of PA among adult illiterates (Morais, Bertelson, Cary and Alegria, 1986; Lukatela, Carello, Shankweiler, and Liberman, 1995), and the observation that all but the most rudimentary aspects of phonemic awareness emerge at about the age children learn to read (Wagner and Torgesen, 1987).

The usual argument supporting the idea that PA is the cause of reading, or a prerequisite, rests on studies showing that "early" PA is a predictor of "later" reading, that is, PA measured at one point in time correlates positively with reading ability measured at a later time. If, however, one controls for "early" reading ability, this relationship disappears.

Lundberg, Olafsson and Wall (1980, cited in Wagner and Torgesen, 1987) reported a median correlation of .45 between measures of PA at kindergarten (age seven in Sweden) and reading achievement measured one year later. Wagner and Torgesen (1987) calculated partial correlations between PA and first grade reading on these data, with reading level at kindergarten held constant; some of the kindergarten children had already learned to read to some extent. The resulting correlation was only .06, consistent with the hypothesis that reading proficiency at kindergarten may have been the true predictor of both first grade reading and PA at kindergarten. Ellis (1990) also reported no relationship between PA measured at age 5 and reading at age 6, when reading at age 5 was included as a predictor. Similarly, PA at age 6 was not a predictor of reading at age 7 when reading at age 6 was included as a predictor. Several other studies report very low or reduced relationships between early PA and later reading when measures of early reading ability are taken into consideration (Wagner, Torgesen, Rashotte, Hecht, Barker, Burgess, Donahue and Garon, 1997; de Jong and van der Leij, 2002). The usual tests used to detect early reading, which often involve reading words in isolation, may, in addition, be too stringent to detect much of early literacy development (Barron, 1998, p. 158). More sensitive measures may result in more evidence that the relationship between early PA and later reading is spurious.
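
To see how a sizable correlation can nearly vanish once early reading is held constant, consider the standard first-order partial correlation formula, sketched here in Python. Only the .45 correlation between early PA and later reading comes from the text. The other two correlations are hypothetical values chosen for illustration, not figures from Lundberg et al. or from Wagner and Torgesen.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant (first-order partial)."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

r_pa_later_reading = 0.45     # from the text: early PA vs. later reading
r_pa_early_reading = 0.60     # HYPOTHETICAL: early PA vs. early reading
r_early_later_reading = 0.70  # HYPOTHETICAL: early reading vs. later reading

r_partial = partial_correlation(
    r_pa_later_reading, r_pa_early_reading, r_early_later_reading
)
print(round(r_partial, 2))
```

With these illustrative inputs the partial correlation falls to about .05. When early reading is strongly related to both early PA and later reading, little is left for PA to explain on its own.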

Evidence suggesting that reading experience alone, and not phonics instruction, may be the cause of the development of phonemic awareness comes from Foorman et al. (1993), who reported no difference in growth in PA during grade one between classes with more or less direct teaching of letter-sound correspondences, and Murray, Stahl, and Ivey (1996), in which gains in PA were seen from storybook reading alone. Neuman (1999) increased the number of books available to preschoolers and provided ten hours of training of staff with a focus on ways of encouraging interaction with books, especially reading aloud. The treatment lasted eight months. Neuman reported a clear improvement in the print environment at the end of treatment in the classrooms that received the "books aloud" treatment. On a delayed post test, six months later, participant children did significantly better than comparison children on tests of phonemic awareness (d = .57 and .54 for tests of rhyming and alliteration, my calculations). It is highly likely that the children had some direct instruction in letters and their sounds, but they certainly did not have the kind of "phonemic awareness training" some people are calling for. This result is consistent with the hypothesis that phonemic awareness is the result of reading and experiences with print.

I have informal evidence to add to this: I have asked audiences to perform the classic PA task of stripping the initial consonant from a word like "pit." Of course, everybody gets this right with no problem. Then I ask them to do the same with "split." After some hesitation, most people get it right. I then ask them how they did it. Universally, people report that they spelled the word in their mind's eye, removed the /p/ sound, and pronounced the remainder. This confirms that the ability to do complex PA activities is dependent on the ability to read. (See Tunmer and Nesdale, 1982, for evidence of similar behavior in first graders, and Ehri and Wilce, 1980, in fourth graders.)

The Unbearable Coolness of Phonemic Awareness

Why is there so much enthusiasm for PA training? Besides the obvious advantages to those who create PA programs, I suspect that a major reason is the fact that PA appears to fit so well into a bottom-up skill-building model of reading, one in which readers must first learn sound-spelling correspondences (phonics) in order to learn to read. If this is true, it makes intuitive sense that learning to isolate sounds is a prerequisite to phonics.

There is, however, another possibility, that we learn to read by reading, by making sense of what is on the page. We learn to read without nonsense, to paraphrase Frank Smith, not by first learning to read nonsense. While some knowledge of phonics can occasionally help make texts more comprehensible (see below), according to the "Comprehension Hypothesis" most of our knowledge of phonics, and our ability to perform complex PA tasks, emerges as a result of reading. Unfortunately, for many people, the skill-building hypothesis is not a hypothesis at all, it is an axiom. It is obviously true and is beyond question. PA training fits into the skill-building view very nicely, and is thus irresistible, despite the weak evidence in training studies and the additional evidence showing that PA is clearly not a prerequisite for learning to read.


The National Reading Panel claimed to find "solid support for the conclusion that systematic phonics instruction makes a bigger contribution to children's growth in reading than alternative programs providing unsystematic or no phonics instruction" (National Reading Panel, 2000, p. 2-92).

The NRP's conclusions on the impact of phonics actually show that systematic phonics instruction has a very limited advantage. The NRP reported an overall effect size of d = .46 in favor of programs that provided systematic, intensive phonics, as compared to programs providing less or no phonics. As seen in Table 2, the effect was dependent on the kind of measure used, with systematic phonics showing a greater effect on reading single regularly spelled words out loud, and a smaller effect on tests involving reading texts (for details, see Garan, 2001a, 2001b, 2002).

Table 2 - Impact of Intensive Phonics on Different Kinds of Tests
test effect size
regular words: .67
pseudo words: .60
oral reading of text: .25
comprehending text: .27
(from: National Reading Panel, 2000, Table 3, appendix E)

The .27 effect size for reading comprehension deserves more comment. The panel found an effect size of d = .51 between intensive phonics instruction and performance on tests of reading comprehension for younger children (K-1), and no significant relationship for older children (grades 2-6; d = .12). Garan (2001a, 2001b, 2002) argues that the stronger relationship between intensive phonics instruction and reading comprehension for younger children may be due to the fact that reading comprehension tests for young children contain very short passages with many phonetically regular words. The relationship is greatly diminished when passages become more complex and more words with irregular sound-spelling correspondences are included, as is the case with reading comprehension measures used with older children.

Shanahan (2001), in a response to Garan (2001b), notes that reading comprehension tests given to young children must be short and contain phonetically regular words, otherwise they would be too hard. Shanahan's statement tells us why the tests are the way they are; it does not, however, diminish the force of Garan's observation. It remains the case that intensive phonics instruction has little impact on tests of reading comprehension when passages and the words used become more complex.

The NRP also found that the effect of systematic phonics instruction faded with time. For studies that included both immediate post tests and delayed post tests, the effect size dropped from .51 to .27 (six comparisons; all measures combined). The time interval between the immediate and delayed tests ranged from four months to one year.

Before considering what these results imply for theory, it should be pointed out that they are not new (nor does the NRP claim they are). The systematic phonics advantage was also reported by Chall (1967), and others have reported that the superiority of intensive phonics fades with time, with the advantage disappearing by third grade (Chall, 1967; Dykstra, 1974).

There is, thus, a temporary advantage for systematic phonics, one that appears to be quite modest when tests involve reading real texts. The usual interpretation of this result is that this superiority provides support for the "Skill-Building Hypothesis," the view that language is learned by first mastering the parts, and then, through drill and exercise, working up to larger units. But the results of these studies are also consistent with the Comprehension Hypothesis, the view that language is acquired by understanding messages (Krashen, 1985); in the case of reading, this means the comprehension of texts (Smith, 1994; Goodman, 1982).

Smith (1994) explains that some conscious knowledge of phonics rules can help make texts more comprehensible. He provides the following example. A child is reading the sentence "The man is riding on the h ---." and cannot, at first, read the final word. But if the child knows what sound /h/ makes, this information, along with context, will help reduce the possibilities and thus help the child identify the word. The combination of some conscious knowledge of phonics, along with context, will not help every time, but it helps enough to make learning some phonics worthwhile. Thus, children who know more phonics will be somewhat better off than those who know less.

But the effect is limited, because there are severe limits as to how much phonics can be consciously learned. Many phonics rules are extremely complex and have numerous exceptions (Adams, 1990; Smith, 1994). Most of our knowledge of phonics, Smith argues, is the result of reading and not the cause. In terms of language acquisition theory, most of our knowledge of phonics is subconsciously acquired as the result of comprehensible input (reading).

Shanahan (2001) defends giving phonics instruction a major role, despite its limitations, because "more than 90 percent of English words are phonetically regular" (p. 70). Shanahan does not provide a source for this statistic, but even if it were true, in order to accomplish this, one needs a staggering collection of rules that include many that are highly complex. Berdiansky, Cronnell, and Koehler (1969, cited in Smith, 1994) concluded that 166 different rules were necessary to account only for the 6000 one- and two- syllable words in the comprehension vocabularies of six to nine year old children (children also knew another 3000 more complex words of three or more syllables that were not considered in the analysis). Moreover, even these 166 rules did not cover all the sound-spelling correspondences; 45 more correspondences were classified as exceptions. Adams (1990) provides a summary of research on the complexity (and unreliability) of many phonics rules. Adams also cites studies showing that those phonics rules that are the most reliable are generally those that apply to infrequent words, while those rules that apply to frequent words are typically the less reliable rules.

In addition, the effect of phonics is exaggerated in tests of reading comprehension. To see why this is so, consider what takes place when fluent readers encounter words that they do not recognize. Smith (1994) points out that fluent readers generally skip words that they do not know and try to confirm predictions about meaning without them. If skipping fails, that is, the text is incomprehensible without the skipped word, the fluent reader will attempt to guess the word from context. If the guess is correct, the passage will make sense. If the guess is not correct, the passage will not make sense, and the reader will try again. When guessing fails, the reader can look the word up or ask someone what it means. The last resort, Smith points out, is to "sound out" the word and to try to identify it from its pronunciation.

Now let us consider what happens when a young child does not recognize a word on a typical grade 1 or grade 2 reading comprehension test. Beginning readers are often taught to carefully examine every word, that is, not to skip. The context is usually so impoverished (a few sentences or even less) that guessing will not be very productive. The child can't ask anyone, because it's a test. "Sounding out," the last alternative in the real world, becomes the first alternative. Thus, a child who knows more phonics will have a bit of an edge on these tests, which is exactly what the research shows when reading comprehension is tested.

The NRP report does not show that intensive phonics is clearly superior to less intensive phonics. It shows only that there is a slight early advantage for those with more phonics, an advantage that fades with time. Moreover, this advantage is exaggerated by the kinds of tests used. The results are entirely consistent with the Comprehension Hypothesis.


Claim #3 is not correct. When one considers tests of reading comprehension, and the amount of real reading done by children, whole language emerges as the winner in these studies.

The National Reading Panel (2000) concluded that systematic phonics approaches were superior to whole language approaches, claiming that the average effect size in favor of phonics was .32 (based on twelve comparisons). In their analysis, however, effect sizes were not analyzed separately for each kind of measure used. Some were measures of reading single words in isolation, some involved real texts. Also, the issue is not whether a treatment is labeled "whole language" or "skills" but how much reading the children actually do. In Evans and Carr (1985), for example, the so-called "traditional" group actually did significantly more silent reading than the "whole language" group. The whole language group did more oral reading of stories the children had written themselves or dictated to the teacher, an activity that entails less new meaning and, most likely, more focus on form.

In Krashen (2002c), I re-analyzed these data with two alterations:
(1) Considering only tests of reading comprehension.
(2) Considering not whether a treatment is labeled "whole language" or "phonics" but whether the children in the treatment were actually doing more real reading than the children in the other treatment.

In addition, I included some studies that the NRP had missed.

My results were dramatically different. I found a small advantage favoring whole language on tests of reading comprehension (d = .17).

It should be noted that the studies reviewed by the NRP were not done with what I consider to be the crucial variable in mind: The amount of genuinely interesting, real reading that children did. Thus, my conclusions are post-hoc and are only suggestive. What is clear, however, is that the National Reading Panel's interpretation of the results is not the only possible one.

It is also interesting that studies done with older readers show the same thing: Students who participate in sustained silent reading and self-selected reading programs outperform comparison students (Krashen, 1993). (The NRP has disagreed with this conclusion as well, arguing that evidence is insufficient to arrive at a conclusion. Their analysis, however, omitted several important studies. For discussion, see Krashen, 2001c, and below.)


The National Reading Panel (2000) reached this startling conclusion on the basis of only 10 studies and 14 comparisons of sustained silent reading (SSR) with control groups. In SSR, some class time is set aside for free voluntary reading with little or no "accountability." Of these 10 studies, three had positive results, with the students who were engaged in free voluntary reading outperforming control groups. Another study showed positive results for one condition but not for other conditions, and the other studies showed no difference or no gains. Table 3 summarizes these outcomes.

Table 3 - Summary of National Reading Panel results: Duration of treatment and outcomes
Duration Positive No difference Negative
Less than seven months 2 8 0
7 months - 1 year 2 2 0

positive = students in sustained silent reading programs outperform comparisons
Results include ten studies, 14 comparisons

It is not clear whether additional studies were omitted because they did not meet the NRP's criteria or because the NRP scholars did not look very hard. (Some, but only a few, were clearly left out because they did not appear in a refereed journal. Shanahan (2000) claimed that most omitted studies of sustained silent reading using native speakers of English were unpublished dissertations. None of the studies included in Krashen (2001c) are unpublished dissertations. All were published in refereed journals, except for two studies published in the National Reading Handbook Yearbook, one from a book published by the International Reading Association, one study from the Claremont Conference Handbook (Cyrog), and one school district report.)

In Krashen (2001c) I presented additional studies of SSR and similar in-school recreational reading programs. Table 4 presents the "expanded" set of studies, using tests of reading comprehension. Many of the studies summarized in Table 4 meet the four criteria of the NRP and were apparently missed, but there were some "violations" of their criteria: A few were done with students slightly older than the age limit imposed by the NRP; in all of these cases, the subjects were undergraduate college students. Subjects in some of the studies were students of English as a second or foreign language. In several studies, students read in Spanish, not English; in these cases, the students were native speakers of Spanish. Finally, some studies were not published in refereed journals. See Krashen (2001c) for references. Table 4 includes studies included by the NRP as well as those that the NRP did not include.

Table 4 - Duration of treatment and outcomes of SSR studies: Expanded set
Duration Positive No difference Negative
Less than 7 months 7 13 3
7 months - 1 year 9 11 0
Greater than 1 year 8 2 0

In the studies in Table 4, SSR students did as well as or better than comparison students in 50 out of 53 comparisons. For longer term studies (those longer than one year), SSR students were superior in eight out of ten studies, and there was no difference in the other two. Note that the NRP did not include any studies lasting longer than one year.
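
The tallies quoted from Table 4 can be checked with a short Python sketch. The counts are copied from the table. The dictionary layout and the function name tally are only one convenient way to arrange them.

```python
# (positive, no difference, negative) counts per duration, copied from Table 4.
TABLE_4 = {
    "less than 7 months": (7, 13, 3),
    "7 months - 1 year": (9, 11, 0),
    "greater than 1 year": (8, 2, 0),
}

def tally(rows):
    """Return (comparisons where SSR did as well or better, all comparisons)."""
    positive = sum(row[0] for row in rows)
    no_difference = sum(row[1] for row in rows)
    negative = sum(row[2] for row in rows)
    return positive + no_difference, positive + no_difference + negative

as_well_or_better, total = tally(TABLE_4.values())
long_positive, long_same, long_negative = TABLE_4["greater than 1 year"]
long_total = long_positive + long_same + long_negative

print(f"{as_well_or_better} of {total} comparisons: SSR as good or better")
print(f"{long_positive} of {long_total} long-term comparisons: SSR better")
```

The three negative results all fall in the shortest duration band, which is consistent with the pattern that longer programs produce more reliable gains.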

Moreover, there are plausible reasons why the results were not even more positive, e.g., documented lack of fidelity of treatment in some studies and students who already read at high levels.

Even applying the NRP's stricter criteria, SSR does very well, with readers doing as well as or better than comparisons in 35 out of 36 comparisons. This suggests that the "violations" do not affect the central issue of whether encouraging recreational reading impacts literacy development. Even if one only allows studies that strictly meet the NRP's criteria, the result still favors recreational reading.

Misinterpreted Studies

In addition to excluding relevant studies, the NRP misinterpreted some of the studies that it did include, and included some that it should not have included. In one study (Carver and Liebert, 1995), students were limited to only 135 titles ("the regular library stacks were off limits during the study," p. 33), were provided with incentives, had to take tests on what they read, and had to read in two-hour blocks. Successful sustained silent reading programs allow access to any books readers want to read, do not use extrinsic motivators, do not make students accountable for what they read, provide a wide variety of books, and typically meet for a short time each day over a long period. In several other cases, the NRP report was simply inaccurate in its descriptions of both the programs and their results (for the gruesome details, see Krashen, 2001c).

Additional Evidence

It should also be pointed out that the case for reading does not rest entirely on studies of sustained silent reading. In "read and test" studies, subjects show clear gains in vocabulary and spelling after a brief exposure to comprehensible text (for a review, see Krashen, 1993). It is hard to attribute these gains to anything but reading.

There are, in addition, compelling case histories that cannot be easily explained on the basis of the competing Skill-Building Hypothesis. One is the case of Richard Wright, who credits reading with providing him with high levels of literacy development: "I wanted to write and I did not even know the English language. I bought English grammars and found them dull. I felt that I was getting a better sense of the language from novels than from grammars" (Wright, 1966, p. 275).

Or consider the case of Ben Carson (Carson, 1990), a neurosurgeon who says that his mother's insistence that he read two books a week (of his own choosing) when he was in the fifth grade was a turning point in his life. Carson credits reading with improving his reading comprehension, vocabulary, and spelling, and with helping him move from the bottom of his class in grade five to the top in grade seven. Yes, I know; there was no control group, no tests were given, and the results were not in a refereed journal. But it is hard to imagine any other source for this obvious improvement, and cases like these are not uncommon.

What does "no difference" mean?

The NRP concluded that "the handful of experimental studies" on voluntary reading "raise serious questions" about its efficacy. There are more than a handful of studies. Moreover, the addition of more studies to the analysis provides substantial evidence in support of the effectiveness of recreational reading.

Even a finding of "no difference" between free readers and students in traditional programs suggests that free reading is just as good as traditional instruction, which confirms that free reading does indeed result in literacy growth, an important theoretical and practical point. Because free reading is so much more pleasant than regular instruction (for both students and teachers), and because it provides students with valuable information and insights, a finding of no difference provides strong evidence in favor of free reading in classrooms. Recall also that the results of the NRP's own analysis of in-school reading are nearly identical to the results of their analysis (and mine) of phonemic awareness training; yet the NRP recommends PA training, but does not recommend devoting time in school to recreational reading.


The NRP's conclusions have virtually become "the law of the land." State and local reading plans mirror the NRP's conclusions, and federal funding requires allegiance to them. In fact, as noted earlier, they have become axiomatic, considered by some to be proven facts rather than hypotheses. Obviously, if the above arguments are correct, if the NRP's claims really are false, the implications are staggering. At a minimum, the concerns presented here should demote the status of the NRP's conclusions from axiom back to the level of hypothesis.



Adams, M. (1990). Beginning to read. Cambridge: MIT Press.

Barron, R. (1998). Proto-literate knowledge: Antecedents and influences on phonological awareness and literacy. In C. Hulme and R. M. Joshi (Eds.) Reading and spelling: Development and disorders (pp. 153-173). Mahwah, NJ: Erlbaum.

Bradley, L. & Bryant, P. (1985). Rhyme and reason in reading and spelling. Ann Arbor, MI: The University of Michigan Press.

Bradley, L. & Bryant, P. (1983). Categorizing sounds and learning to read - a causal connection. Nature, 301, 419-421.

Campbell, R. & Butterworth, B. (1985). Phonological dyslexia and dysgraphia in a highly literate subject: A developmental case with associated deficits and phonemic processing and awareness. The Quarterly Journal of Experimental Psychology, 37A, 435-475.

Carson, B. (1990). Gifted hands. Grand Rapids, MI: Zondervan Books.

Carver, R. & Liebert, R. (1995). The effect of reading library books at different levels of difficulty upon gains in reading. Reading Research Quarterly, 30, 26-48.

Chall, J. (1967). Learning to read: The great debate. New York: McGraw Hill. Updated edition, 1983.

Defior, S. & Tudela, P. (1994). Effect of phonological training on reading and writing acquisition. Reading and Writing, 6, 299-320.

DeJong, P. & van der Leij, A. (2002). Effects of phonological abilities and linguistic comprehension on the development of reading. Scientific Studies of Reading, 6(1), 51-77.

Dykstra, R. (1974). Phonics and beginning reading instruction. In C. Walcutt, J. Lamport, and G. McCracken (Eds.) Teaching reading: A psycholinguistics approach to developmental reading (pp. 373-397). New York: Macmillan.

Ehri, L. & Wilce, L. (1980). The influence of orthography on readers' conceptualization of the phonemic structure of words. Applied Psycholinguistics, 1, 371-385.

Ehri, L., Nunes, S., Willows, D., Schuster, B., Yaghoub-Zadeh, Z., & Shanahan, T. (2001). Phonemic awareness instruction helps children learn to read: Evidence from the National Reading Panel's meta-analysis. Reading Research Quarterly, 36, 250-287.

Ehri, L., Nunes, S., Stahl, S., & Willows, D. (2001). Systematic phonics instruction helps students learn to read: Evidence from the National Reading Panel's meta-analysis. Review of Educational Research, 71(3), 393-447.

Ehri, L., Shanahan, T., & Nunes, S. (2002). Response to Krashen. Reading Research Quarterly, 37(2), 128-129.

Ellis, N. (1990). Reading, phonological skills, and short-term memory: interactive tributaries of development. Journal of Research in Reading, 13, 107-122.

Evans, M. & Carr, T. (1985). Cognitive abilities, conditions of learning, and the early development of reading skill. Reading Research Quarterly, 20(3), 327-350.

Foorman, B., Jenkins, L., & Francis, D. (1993). Links among segmenting, spelling, and reading words in first and second grades. Reading and Writing, 5, 1-15.

Fox, B. & Routh, D. (1975). Analyzing spoken language into words, syllables, and phonemes: A developmental study. Journal of Psycholinguistic Research, 4(4), 331-342.

Garan, E. (2001a). Beyond the smoke and mirrors: A critique of the National Reading Panel report on phonics. Phi Delta Kappan, 82(7), 500-506.

Garan, E. (2001b). What does the report of the National Reading Panel really tell us about teaching phonics? Language Arts, 79(1), 61-70.

Garan, E. (2002). Resisting reading mandates. Portsmouth, NH: Heinemann.

Goodman, K. (1982). Language, literacy, and learning. London: Routledge & Kegan Paul.

Hatcher, P., Helm, C. & Ellis, A. (1994). Ameliorating early reading failure by integrating the teaching of reading and phonological skills: The phonological linkage hypothesis. Child Development, 65, 41-57.

Kozminsky, L. & Kozminsky, E. (1995). The effects of early phonological awareness training on reading success. Learning and Instruction, 5, 187-201.

Krashen, S. (1985). The input hypothesis: Issues and implications. Beverly Hills: Laredo Publishing Company.

Krashen, S. (1993). The power of reading. Englewood, CO: Libraries Unlimited.

Krashen, S. (2001a). Does "pure" phonemic awareness training affect reading comprehension? Perceptual and Motor Skills, 93, 356-358.

Krashen, S. (2001b). Low PA can read OK. Practically Primary, 6(3), 17-20.

Krashen, S. (2001c). More smoke and mirrors: A critique of the National Reading Panel report on fluency. Phi Delta Kappan, 83, 119-123.

Krashen, S. (2002a). Phonemic awareness training necessary? Reading Research Quarterly, 37(2), 128.

Krashen, S. (2002b). The great phonemic awareness debate. WSRA Journal (Wisconsin Reading Association), 44(2), 51-55.

Krashen, S. (2002c). The NRP comparison of whole language and phonics: Ignoring the crucial variable in reading. Talking Points, 13(3), 22-28.

Lukatela, K., Carello, C., Shankweiler, D., & Liberman, I. (1995). Phonological awareness in illiterates: Observations from Serbo-Croatian. Applied Psycholinguistics, 16, 463-487.

Lie, A. (1991). Effects of a training program for stimulating skills in word analysis in first grade children. Reading Research Quarterly, 26(3), 234-250.

Morais, J., Bertelson, P., Cary, L. & Alegria, J. (1986). Literacy training and speech segmentation. Cognition, 24, 45-64.

Murray, B., Stahl, S. & Ivey, M.G. (1996). Developing phoneme awareness through alphabet books. Reading and Writing, 8, 307-322.

National Reading Panel. (2000). Teaching children to read: an evidence-based assessment of the scientific research literature on reading and its implications for reading instruction. Washington: National Institute of Child Health and Human Development.

Neuman, S. (1999). Books make a difference: A study of access to literacy. Reading Research Quarterly, 34(3), 286-311.

Shanahan, T. (2000). Reading panel: A member responds to a critic. Education Week, May 31, 2000, 39.

Shanahan, T. (2001). Response to Elaine Garan. Language Arts, 79(3), 70-71.

Smith, F. (1994). Understanding reading. Fifth Edition. Hillsdale, NJ: Erlbaum.

Stuart-Hamilton, I. (1986). The role of phonemic awareness in the reading style of beginning readers. British Journal of Educational Psychology, 56, 271-285.

Tunmer, W. & Nesdale, A. (1982). The effects of digraphs and pseudowords on phonemic segmentation in young children. Applied Psycholinguistics, 3, 299-311.

Wagner, R. & Torgesen, J. (1987). The nature of phonological processing and its causal role in the acquisition of reading skills. Psychological Bulletin, 101, 192-212.

Wagner, R., Torgesen, J., Rashotte, C., Hecht, S., Barker, T., Burgess, S., Donahue, J., & Garon, J. (1997). Changing relations between phonological processing abilities and word-level reading as children develop from beginning to skilled readers: A five-year longitudinal study. Developmental Psychology, 33, 468-479.

Weiner, S. (1994). Effects of phonemic awareness training on low- and middle-achieving first graders' phonemic awareness and reading ability. Journal of Reading Behavior, 26(3), 277-300.

Wright, R. (1966). Black boy. New York: Harper and Row.

Yatvin, J. (2002). Babes in the woods: The wanderings of the National Reading Panel. In R. Allington (Ed.) Big brother in the national reading curriculum: How ideology trumped evidence (pp. 125-136). Portsmouth, NH: Heinemann.

About the Author

Stephen Krashen is Professor Emeritus of Education at the University of Southern California. After serving the Peace Corps in Ethiopia, he earned a Ph.D. in Linguistics from UCLA, and was a post-doctoral fellow at the UCLA Neuropsychiatric Institute. Before joining the USC School of Education, he was a professor of Linguistics at Queens College in New York and at USC.

Krashen is best known for his work in establishing a general theory of second language acquisition, as the co-founder of the Natural Approach, and as the inventor of sheltered subject matter teaching. His most recent books include Three Arguments Against Whole Language and Why They are Wrong (Heinemann), Foreign Language Education: The Easy Way (Language Education Associates), Condemned Without a Trial: Bogus Arguments Against Bilingual Education (Heinemann), and Explorations in Language Acquisition and Use: The Taipei Lectures (Heinemann). He also holds a black belt in Tae Kwon Do and was the 1978 Incline Bench Press Champion of Venice Beach, California.
