This study investigated the effects of mathematics textbook programs on student performance. Eighth-grade math students were assigned to either a study group (using the Scott Foresman-Addison Wesley Middle School Math, Course 3 ©1999 textbook program) or a control group (using their current mathematics textbook program). Students were tested at the start of the academic year with a nationally normed standardized test (the TerraNova™ CTBS Complete Battery Plus, Level 18, Form A). At the end of the full-year treatment period, students were retested with the same standardized test. Both the study students using the Scott Foresman-Addison Wesley Middle School Math, Course 3 ©1999 textbook program and the control students using other mathematics textbook programs showed significant learning improvement over the course of the school year.
The main objective of this project was to determine whether students who were enrolled in classes using the Scott Foresman-Addison Wesley Middle School Math, Course 3 ©1999 program significantly increased their mathematics knowledge and skills after using the prepublication chapters for a full school year. An additional objective was to measure whether students using their incumbent textbook program significantly increased their mathematics knowledge and skills during the same time period. The measures used were the NCE (Normal Curve Equivalent) and the OPI (Objectives Performance Index) of the TerraNova™ CTBS Complete Battery Plus.
The study was designed with a study group, to whom the Scott Foresman-Addison Wesley program was administered, and a control group, to whom the program was not administered. The control group used whatever mathematics program had previously been adopted for use in the school.
Both groups were tested in September 2000, prior to the program's introduction, and then again at the end of the school year in May 2001. Therefore, for both the study group and the control group there was a pretest score and a posttest score. Only students who completed both the pretest and the posttest were included in this analysis.
A total of ten eighth grade math classes comprising 185 students participated in the study:
| | Study Group: SFAW Middle School Math Textbook Program | Control Group: Other Textbook Programs |
|---|---|---|
| Pretest (administered during 1st week of school) | 5 classes, 100 students | 5 classes, 85 students |
| Posttest (administered at end of school year) | 5 classes, 100 students | 5 classes, 85 students |
In all cases, one teacher taught both study and control classes. Both study and control classes were in the same school building. Study and control classes were selected to be similar in student ability levels. Of the five schools participating in this research project, three schools were in suburban settings, and two were in urban settings. These schools had a range of sizes. Study participants were from three states: Colorado, New Jersey, and Washington.
Statistical tests were used to examine the following issues with regard to program effectiveness:
This step consisted of an analysis of whether or not there was a statistically significant difference between the study group and the control group on the pretest score. For this, a t-test on the difference between the study and control mean pretest scores was used. The measure used in this analysis was the NCE (Normal Curve Equivalent) of the TerraNova™ CTBS Complete Battery Plus.
The hypothesis of the t-test shown below is that the pretest means of the study and control groups are equal; the alternative hypothesis is that they are not equal, for which a two-tailed test is appropriate. If the significance of the t-value is less than or equal to .05, then the hypothesis is rejected at the 95% level of confidence (5% significance level) and the alternative hypothesis is accepted.
Since the significance of the t-value is greater than .05 (that is, 0.306, as shown below), we do not reject the hypothesis that the pretest means of the study and control groups are equal at the 95% level of confidence. The two groups therefore showed no significant difference at the starting point of the study.
| Subject, Grade | Study Pretest NCE (base) | Control Pretest NCE (base) | Absolute difference | t-value | Sig (t-value) |
|---|---|---|---|---|---|
| Math, 8th Grade | 47.06 (100) | 44.95 (85) | 2.11 | 1.03 | 0.306 |
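The decision rule above can be sketched numerically. The following is a minimal illustration, not the study's actual SPSS procedure: it assumes a pooled-variance test with 100 + 85 - 2 = 183 degrees of freedom (the source does not state the degrees of freedom), and the function names `t_pdf` and `two_tailed_p` are hypothetical helpers that integrate the Student's t density with Simpson's rule.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value, integrating the density over [0, |t|] by Simpson's rule."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return 1.0 - 2.0 * central_area

n_study, n_control = 100, 85
df = n_study + n_control - 2      # assumption: pooled-variance t-test
p = two_tailed_p(1.03, df)        # t-value taken from the table above
reject = p <= 0.05                # decision rule stated in the text
print(f"df={df}, p={p:.3f}, reject equal-means hypothesis: {reject}")
```

Under these assumptions the computed significance comes out close to the reported 0.306 (small differences are expected because the published t-value is rounded to two decimals), and the hypothesis of equal pretest means is not rejected.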
This step consisted of an analysis of whether or not there was a statistically significant difference between the pretest and posttest scores within the study group and within the control group. For each group, a t-test on the difference between pretest and posttest scores was used. The NCE was the measure used in this analysis.
The hypothesis of the t-tests shown below is that the pretest and posttest means are equal; the alternative hypothesis is that they are not equal. If the significance of the t-value is less than or equal to .05, then the hypothesis is rejected at the 95% level of confidence (5% significance level) and the alternative hypothesis is accepted.
For the study group, the significance of the t-value is less than .05 (that is, 0.001, as shown below). Therefore we accept the alternative hypothesis: the pretest and posttest means of the study group are not equal at the 95% level of confidence.
For the control group, the significance of the t-value is also less than .05 (that is, 0.003, as shown below). Therefore we likewise accept the alternative hypothesis: the pretest and posttest means of the control group are not equal at the 95% level of confidence.
| Subject, Grade | Study Pretest NCE (base) | Study Posttest NCE (base) | Absolute difference | t-value | Sig (t-value)^{1} |
|---|---|---|---|---|---|
| Math, 8th Grade | 47.06 (100) | 50.97 (100) | 3.91 | 3.30 | 0.001 |

| Subject, Grade | Control Pretest NCE (base) | Control Posttest NCE (base) | Absolute difference | t-value | Sig (t-value)^{1} |
|---|---|---|---|---|---|
| Math, 8th Grade | 44.95 (85) | 48.42 (85) | 3.47 | 3.07 | 0.003 |
^{1} Shaded values are significant at the 95% level of confidence or higher.
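As a rough consistency check on the two within-group tables, the reported differences can be recomputed from the means, and the standard error implied by each t-value can be backed out from the relation t = difference / standard error. This is only a sketch: the implied standard errors are derived quantities, not figures reported by the study, and the variable names are illustrative.

```python
# Reported values from the within-group tables: (pretest, posttest, t-value).
groups = {
    "study":   (47.06, 50.97, 3.30),
    "control": (44.95, 48.42, 3.07),
}

results = {}
for name, (pre, post, t_value) in groups.items():
    gain = round(post - pre, 2)     # mean NCE gain, pre to post
    implied_se = gain / t_value     # derived from t = gain / SE; not reported in the source
    results[name] = (gain, implied_se)
    print(f"{name}: gain={gain:.2f} NCE points, implied SE={implied_se:.2f}")
```

Both recomputed gains match the "Absolute difference" column, and the implied standard errors are a little over one NCE point in each group.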
In addition to NCE scores, performance on mathematics instructional objectives for the study and control groups was also analyzed. The measure used in this analysis was the OPI (Objectives Performance Index) of the TerraNova™ CTBS Complete Battery Plus.
Consistent with the overall improvement in test scores, both students using Scott Foresman-Addison Wesley Middle School Math, Course 3 ©1999 (the study group) and students using other textbook programs (the control group) showed significant gains in all key mathematics instructional areas over the course of a full school year. For both groups, the highest gains were achieved in the areas of Patterns, Functions, Algebra; Problem Solving & Reasoning; and Number & Number Relations.
| 8th Grade Mathematics Instructional Objectives | Study Mean Point Gain^{1} (pre to post) | Control Mean Point Gain^{1} (pre to post) |
|---|---|---|
| Number & Number Relations | +10.12 | +9.28 |
| Computation and Numerical Estimation | +8.97 | +9.02 |
| Measurement | +7.77 | +7.31 |
| Geometry & Spatial Sense | +9.27 | +6.45 |
| Data Analysis, Statistics & Probability | +8.32 | +8.11 |
| Patterns, Functions, Algebra | +12.00 | +10.13 |
| Problem Solving & Reasoning | +9.37 | +10.53 |
| TOTAL GAIN | +65.82 | +60.83 |
^{1} Shaded values are significant at the 95% level of confidence or higher.
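The totals and the "highest gains" statement can be reproduced directly from the OPI gains table. The sketch below simply re-enters the reported gains; the dictionary layout and the helper `top_three` are illustrative choices, not part of the original analysis.

```python
# Mean OPI point gains (pre to post) as (study, control), copied from the table.
gains = {
    "Number & Number Relations": (10.12, 9.28),
    "Computation and Numerical Estimation": (8.97, 9.02),
    "Measurement": (7.77, 7.31),
    "Geometry & Spatial Sense": (9.27, 6.45),
    "Data Analysis, Statistics & Probability": (8.32, 8.11),
    "Patterns, Functions, Algebra": (12.00, 10.13),
    "Problem Solving & Reasoning": (9.37, 10.53),
}

study_total = round(sum(study for study, _ in gains.values()), 2)
control_total = round(sum(control for _, control in gains.values()), 2)

def top_three(column: int) -> list[str]:
    """Return the three objectives with the largest gain (0 = study, 1 = control)."""
    return sorted(gains, key=lambda area: gains[area][column], reverse=True)[:3]

print("Study total:", study_total, "| top three:", top_three(0))
print("Control total:", control_total, "| top three:", top_three(1))
```

The sums agree with the TOTAL GAIN row, and the same three objectives rank highest in both groups, although in a different order.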
Students who used the Scott Foresman-Addison Wesley Middle School Math, Course 3 ©1999 program showed significant overall learning improvement over the course of a full school year. Students using other textbook programs also showed significant learning improvement over this same period.
Consistent with the overall findings, students in both the study and control groups showed significant improvement on all key mathematics instructional objectives. The largest gains were accomplished in the areas of Patterns, Functions, Algebra; Problem Solving and Reasoning; and Number & Number Relations.
For the purpose of completeness, we include here a brief discussion of the statistical tests and measures that were used. All of the statistical tests are in the classical statistical domain, and are broadly used across all disciplines including psychometrics. The programs that were used in the computational steps of this project were SPSS versions 6/9.
The t-test that was used tests the following hypothesis: the mean of one population is equal to the mean of another population when the variance is unknown. The hypothesis is that the means are equal, and the alternative hypothesis is that they are unequal, for which we use a two-sided test. The means and the variance are calculated and the t-statistic is computed. The significance of the t-statistic is then computed. If the significance of the t-statistic is less than or equal to .05, the hypothesis is rejected and the alternative hypothesis is accepted; otherwise, the hypothesis is not rejected.
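For reference, one common form of the two-sample t-statistic is sketched below. The source does not say whether a pooled-variance or unequal-variance version was used, so the pooled form shown here is an assumption; the symbols (sample means, sample variances s_1^2 and s_2^2, and group sizes n_1 and n_2) are introduced only for illustration.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2},
\qquad
\text{df} = n_1 + n_2 - 2
```

The hypothesis of equal means is rejected when the two-sided significance of t, evaluated with the stated degrees of freedom, is at or below .05.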
Normal Curve Equivalents: "Comparison of Scores across Tests. The normal curve equivalent (NCE) scale, ranging from 1 to 99, coincides with the national percentile scale (NP) at 1, 50, and 99. NCEs have many of the same characteristics as percentile ranks, but have the additional advantage of being based on an equal-interval scale. The difference between two successive scores on the scale has the same meaning throughout the scale. This property allows you to make meaningful comparisons among different achievement test batteries and among different tests within the same battery. You can compare NCEs obtained by different groups of students on the same test or test battery by averaging the test scores for the groups."^{2}
^{2} Quoted directly from: Teacher's Guide to TerraNova, page 138, CTB/McGraw-Hill 1999.
Objectives Performance Index: "The OPI is an estimate of the number of items that a student would be expected to answer correctly if there had been 100 similar items for that objective... The OPI scale runs from '0' for total lack of mastery to '100' for complete mastery. For CTB achievement tests, OPI scores between 0 and 49 are regarded as the Non-Mastery level. Scores between 50 and 74 are regarded as indications of Partial Mastery. Scores of 75 and above are regarded as the Mastery level."^{3}
^{3} Quoted directly from: Beyond the Numbers, A Guide to Interpreting and Using the Results of Standardized Achievement Tests, page 11, CTB/McGraw-Hill 1997.
All tests were scored by CTB/McGraw-Hill, the publisher of TerraNova. Statistical analyses were performed, and conclusions drawn, by an independent firm, Pulse Analytics Inc., Ridgewood, New Jersey.
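The relationship between NCEs and national percentiles described in the quotation can be illustrated with a short conversion sketch. It assumes the conventional definition of the NCE as a normalized score with mean 50 and a scaling constant of 21.06; that constant is not given in the source, and the function names are hypothetical.

```python
from statistics import NormalDist

NCE_SCALE = 21.06  # conventional NCE scaling constant; an assumption, not stated in the source
_STANDARD_NORMAL = NormalDist()

def nce_to_percentile(nce: float) -> float:
    """Convert a Normal Curve Equivalent to a national percentile rank."""
    return 100.0 * _STANDARD_NORMAL.cdf((nce - 50.0) / NCE_SCALE)

def percentile_to_nce(percentile: float) -> float:
    """Convert a national percentile rank (strictly between 0 and 100) to an NCE."""
    return 50.0 + NCE_SCALE * _STANDARD_NORMAL.inv_cdf(percentile / 100.0)

for nce in (1, 50, 99):
    print(f"NCE {nce} -> percentile {nce_to_percentile(nce):.1f}")
```

With this definition the two scales agree at 1, 50, and 99, as the quotation states, while diverging in between; this is why group means are computed on NCEs rather than on percentile ranks.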
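The mastery bands in the quotation translate into a simple lookup. The sketch below is illustrative only: the function name `opi_level` is hypothetical, and treating fractional scores below 50 as Non-Mastery and below 75 as Partial Mastery is an assumption, since the quoted bands are stated for whole-number scores.

```python
def opi_level(opi: float) -> str:
    """Map an OPI score (0 to 100) to the mastery level quoted above."""
    if not 0 <= opi <= 100:
        raise ValueError("OPI scores run from 0 to 100")
    if opi >= 75:
        return "Mastery"
    if opi >= 50:
        return "Partial Mastery"
    return "Non-Mastery"  # assumption: any score below 50, including fractions

for score in (49, 50, 74, 75):
    print(score, "->", opi_level(score))
```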