Mathematics Research

Program Efficacy Studies 1998–2003

2000 Course 3 Middle School Mathematics

Abstract

This study investigated the effects of mathematics textbook programs on student performance. Eighth-grade math students were assigned to either a study group (using the Scott Foresman-Addison Wesley Middle School Math, Course 3 ©1999 textbook program) or a control group (using their current mathematics textbook program). Students were tested at the start of the academic year with a nationally normed standardized test (the TerraNova™ CTBS Complete Battery Plus, Level 18, Form A). At the end of the full-year treatment period, students were re-tested with the same standardized test. Both the study students using the Scott Foresman-Addison Wesley Middle School Math, Course 3 ©1999 textbook program and the control students using other mathematics textbook programs showed significant learning improvement over the course of the school year.

Objective

The main objective of this project was to determine whether students who were enrolled in classes using the Scott Foresman-Addison Wesley Middle School Math, Course 3 ©1999 program significantly increased their mathematics knowledge and skills after using the pre-publication chapters for a full school year. An additional objective was to measure whether students using their incumbent textbook program significantly increased their mathematics knowledge and skills during the same time period. The measures that were used were the NCE (Normal Curve Equivalent) and the OPI (Objectives Performance Index) of the TerraNova™ CTBS Complete Battery Plus.

Methodology

The study compared a study group, which used the Scott Foresman-Addison Wesley program, with a control group, which did not. The control group used whatever mathematics program had previously been adopted for use in the school.

Both groups were tested in September 2000, prior to the program's introduction, and then again at the end of the school year in May 2001. Therefore, for both the study group and the control group there was a pre-test score and a post-test score. Only students who completed both the pre- and post-tests were included in this analysis.

A total of ten eighth grade math classes comprising 185 students participated in the study:

Test                                                Study Group                  Control Group
                                                    (SF-AW Middle School Math    (Other Textbook Programs)
                                                    Textbook Program)
Pre-Test (administered during 1st week of school)   5 classes, 100 students      5 classes, 85 students
Post-Test (administered at end of school year)      5 classes, 100 students      5 classes, 85 students

In all cases, one teacher taught both study and control classes. Both study and control classes were in the same school building. Study and control classes were selected to be similar in student ability levels. Of the five schools participating in this research project, three schools were in suburban settings, and two were in urban settings. These schools had a range of sizes. Study participants were from three states: Colorado, New Jersey, and Washington.

Statistical tests were used to examine the following issues with regard to program effectiveness:

  1. Whether the pre-test scores for the study group and the control group showed significant differences at the starting point of the study;
  2. Whether overall mathematics knowledge and skills increased, decreased, or stayed the same from the pre-test to the post-test among students using the Scott Foresman-Addison Wesley program (the study group) and among students using their incumbent textbook program (the control group); and
  3. Whether the students in the study and control groups showed significant learning improvement in key mathematics diagnostic areas.

Step 1.

This step consisted of an analysis of whether or not there was a statistically significant difference between the study group and the control group on the pre-test score. For this, a t-test on the difference between the study and control mean pre-test scores was used. The measure used in this analysis was the NCE (Normal Curve Equivalent) of the TerraNova™ CTBS Complete Battery Plus.

The hypothesis of the t-test shown below is that the pre-test means of the study and control groups are equal; the alternative hypothesis is that they are not equal, for which a two-tailed test is appropriate. If the significance of the t-value is less than or equal to .05, then the hypothesis is rejected at the 95% (5% significance) level and the alternative hypothesis is accepted.

Since the significance of the t-value is greater than .05 (that is, 0.306, as shown below), we do not reject the hypothesis that the pre-test means of the study and control groups are equal. At the 95% level of confidence, the two groups showed no significant difference at the starting point of the study.

Subject, Grade    Study Pre-test   Control Pre-test   Absolute     t-value   Sig (t-value)
                  NCE (base)       NCE (base)         difference
Math, 8th Grade   47.06 (100)      44.95 (85)         2.11         1.03      0.306
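
The significance value in the table above can be approximately reproduced from the reported t-value alone. The following is a minimal sketch, not the original SPSS computation: it assumes a pooled-variance test with n1 + n2 - 2 degrees of freedom (the report does not say which variant was used), and the function names t_pdf and two_tailed_p are illustrative. It integrates the Student's t density numerically so that only the Python standard library is needed.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value, computed as 1 - 2 * integral of the density over [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # Simpson's rule needs an even number of steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (h / 3.0) * total

n_study, n_control = 100, 85      # group sizes from the table
t_reported = 1.03                 # t-value from the table
df = n_study + n_control - 2      # assumption: pooled-variance test
p = two_tailed_p(t_reported, df)

print(f"df = {df}, two-tailed p = {p:.3f}")
print("reject equal means" if p <= 0.05 else "do not reject equal means")
```

Because the reported t-value is rounded to two decimals, the result should be expected to agree with the published 0.306 only to about two decimal places.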

Step 2.

This step consisted of an analysis of whether or not there was a statistically significant difference between the pre-test and post-test scores within the study group and within the control group. For each group, a t-test on the difference between pre-test and post-test scores was used. The NCE was the measure used in this analysis.

The hypothesis of the t-tests shown below is that the pre-test and post-test means are equal; the alternative hypothesis is that they are not equal. If the significance of the t-value is less than or equal to .05, then the hypothesis is rejected at the 95% (5% significance) level and the alternative hypothesis is accepted.

For the study group, the significance of the t-value is less than .05 (that is, 0.001, as shown below). Therefore we accept the alternative hypothesis: the pre-test and post-test means of the study group are not equal at the 95% level of confidence.

  • Thus, students using the Scott Foresman-Addison Wesley Middle School Math program showed significant improvement in overall test scores from the pre-test to the post-test.

For the control group, the significance of the t-value is also less than .05 (that is, 0.003, as shown below), so the alternative hypothesis is likewise accepted.

  • Thus, students using other textbook programs also showed significant improvement in overall test scores from the pre-test to the post-test.

Subject, Grade    Study Pre-test   Study Post-test   Absolute     t-value   Sig (t-value)¹
                  NCE (base)       NCE (base)        difference
Math, 8th Grade   47.06 (100)      50.97 (100)       3.91         3.30      0.001

Subject, Grade    Control Pre-test   Control Post-test   Absolute     t-value   Sig (t-value)¹
                  NCE (base)         NCE (base)          difference
Math, 8th Grade   44.95 (85)         48.42 (85)          3.47         3.07      0.003

1 Shaded values are significant at the 95% level of confidence or higher.
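
The reported gains and t-values also imply how much individual gains varied, if one assumes a paired (dependent-samples) t-test. That assumption is ours: the report describes "a t-test on the difference between pre-test and post-test scores" without naming the variant. Under it, t = mean gain / SE and SE = SD / sqrt(n), so both quantities can be backed out. The sketch below does only that arithmetic; the results are approximate because the inputs are rounded.

```python
import math

# (group, n, mean pre-to-post NCE gain, reported t-value), all from the tables above
rows = [
    ("study", 100, 3.91, 3.30),
    ("control", 85, 3.47, 3.07),
]

implied = {}
for group, n, gain, t_value in rows:
    std_error = gain / t_value               # from t = gain / SE
    sd_of_gains = std_error * math.sqrt(n)   # from SE = SD / sqrt(n)
    implied[group] = (std_error, sd_of_gains)
    print(f"{group}: df = {n - 1}, implied SE = {std_error:.2f}, "
          f"implied SD of gains = {sd_of_gains:.1f}")
```

These implied values are not reported in the study itself and should be read only as a consistency check on the published figures.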

Step 3.

In addition to NCE scores, performance on mathematics instructional objectives for the study and control groups was also analyzed. The measure used in this analysis was the OPI (Objectives Performance Index) of the TerraNova™ CTBS Complete Battery Plus.

Consistent with the overall improvement in test scores, both students using Scott Foresman-Addison Wesley Middle School Math, Course 3 ©1999 (the study group) and students using other textbook programs (the control group) showed significant gains in all key mathematics instructional areas over the course of a full school year. For both groups, the three highest gains were in Patterns, Functions, Algebra; Problem Solving & Reasoning; and Number & Number Relations.

8th Grade Mathematics                      Study               Control
Instructional Objectives                   Mean Point Gain¹    Mean Point Gain¹
                                           (pre to post)       (pre to post)
Number & Number Relations                  +10.12              +9.28
Computation and Numerical Estimation       +8.97               +9.02
Measurement                                +7.77               +7.31
Geometry & Spatial Sense                   +9.27               +6.45
Data Analysis, Statistics & Probability    +8.32               +8.11
Patterns, Functions, Algebra               +12.00              +10.13
Problem Solving & Reasoning                +9.37               +10.53
TOTAL GAIN                                 +65.82              +60.83

1 Shaded values are significant at the 95% level of confidence or higher.
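
As a quick arithmetic check on the table above, the short sketch below re-adds the per-objective OPI gains and ranks them for each group. The numbers are copied from the table; the variable names are illustrative only.

```python
# objective -> (study gain, control gain), copied from the table above
gains = {
    "Number & Number Relations": (10.12, 9.28),
    "Computation and Numerical Estimation": (8.97, 9.02),
    "Measurement": (7.77, 7.31),
    "Geometry & Spatial Sense": (9.27, 6.45),
    "Data Analysis, Statistics & Probability": (8.32, 8.11),
    "Patterns, Functions, Algebra": (12.00, 10.13),
    "Problem Solving & Reasoning": (9.37, 10.53),
}

study_total = round(sum(study for study, _ in gains.values()), 2)
control_total = round(sum(control for _, control in gains.values()), 2)

# Three largest gains per group, largest first
top3_study = sorted(gains, key=lambda k: gains[k][0], reverse=True)[:3]
top3_control = sorted(gains, key=lambda k: gains[k][1], reverse=True)[:3]

print("study total:", study_total, "| control total:", control_total)
print("study top 3:", top3_study)
print("control top 3:", top3_control)
```

The totals match the TOTAL GAIN row, and the same three objectives lead in both groups, although in a different order.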

Conclusions

Students who used the Scott Foresman-Addison Wesley Middle School Math, Course 3 ©1999 program showed significant overall learning improvement over the course of a full school year. Students using other textbook programs also showed significant learning improvement over this same period.

Consistent with the overall findings, students in both the study and control groups showed significant improvement on all key mathematics instructional objectives. The largest gains were in Patterns, Functions, Algebra; Problem Solving & Reasoning; and Number & Number Relations.

Technical Postscript

For the purpose of completeness, we include here a brief discussion of the statistical tests and measures that were used. All of the statistical tests are in the classical statistical domain, and are broadly used across all disciplines including psychometrics. The programs that were used in the computational steps of this project were SPSS versions 6/9.

The t-test that was used tests whether the mean of one population is equal to the mean of another population when the variance is unknown. The hypothesis is that the means are equal, and the alternative hypothesis is that they are unequal, for which we use a two-sided test. The means and the variance are calculated and the t-statistic is computed. The significance of the t-statistic is then computed. If the significance of the t-statistic is less than or equal to .05, the hypothesis is rejected and the alternative hypothesis is accepted; otherwise, the hypothesis is not rejected.
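
The computation described above can be sketched for two independent groups as follows. This is an illustration only: the scores are hypothetical and are not data from the study, the pooled-variance form is an assumption (the report does not state which form SPSS was asked to use), and pooled_t is an illustrative name. The decision step compares |t| with 2.228, the standard two-sided 5% critical value for 10 degrees of freedom, which is equivalent to checking whether the significance is at most .05.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_error, df

# Hypothetical NCE-like scores, for illustration only
group_a = [52, 47, 55, 41, 49, 58]
group_b = [45, 50, 39, 44, 48, 42]

t_stat, df = pooled_t(group_a, group_b)
T_CRIT_DF10 = 2.228  # two-sided 5% critical value of Student's t for df = 10

print(f"t = {t_stat:.3f}, df = {df}")
if abs(t_stat) >= T_CRIT_DF10:
    print("reject the hypothesis of equal means")
else:
    print("do not reject the hypothesis of equal means")
```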

Normal Curve Equivalents: "Comparison of Scores across Tests. The normal curve equivalent (NCE) scale, ranging from 1 to 99, coincides with the national percentile scale (NP) at 1, 50, and 99. NCEs have many of the same characteristics as percentile ranks, but have the additional advantage of being based on an equal-interval scale. The difference between two successive scores on the scale has the same meaning throughout the scale. This property allows you to make meaningful comparisons among different achievement test batteries and among different tests within the same battery. You can compare NCEs obtained by different groups of students on the same test or test battery by averaging the test scores for the groups."2

2 Quoted directly from: Teacher's Guide to TerraNova, page 138, CTB/McGraw-Hill 1999.
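
The relationship between national percentiles and NCEs described in the quotation can be sketched in a few lines. The scale constant 21.06 is the standard deviation conventionally used to define the NCE scale; it is not stated in the quoted passage, and the function names below are illustrative.

```python
from statistics import NormalDist

NCE_SD = 21.06  # conventional NCE standard deviation (not given in the quoted text)
_STD_NORMAL = NormalDist()

def percentile_to_nce(percentile: float) -> float:
    """Convert a national percentile (strictly between 0 and 100) to an NCE."""
    z = _STD_NORMAL.inv_cdf(percentile / 100.0)
    return 50.0 + NCE_SD * z

def nce_to_percentile(nce: float) -> float:
    """Convert an NCE back to the corresponding national percentile."""
    return 100.0 * _STD_NORMAL.cdf((nce - 50.0) / NCE_SD)

for pr in (1, 25, 50, 75, 99):
    print(f"percentile {pr:>2} -> NCE {percentile_to_nce(pr):.1f}")
```

With this constant the two scales coincide (to rounding) at 1, 50, and 99, as the quotation states, while differing in between because NCEs are spaced at equal intervals.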

Objectives Performance Index: "The OPI is an estimate of the number of items that a student would be expected to answer correctly if there had been 100 similar items for that objective... The OPI scale runs from '0' for total lack of mastery to '100' for complete mastery. For CTB achievement tests, OPI scores between 0 and 49 are regarded as the Non-Mastery level. Scores between 50 and 74 are regarded as indications of Partial Mastery. Scores of 75 and above are regarded as the Mastery level."3

3 Quoted directly from: Beyond the Numbers, A Guide to Interpreting and Using the Results of Standardized Achievement Tests, page 11, CTB/McGraw-Hill 1997.
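
The mastery bands in the quotation map directly onto a small classification function. The sketch below is illustrative: the function name is ours, and because the quotation gives whole-number bands only, treating fractional scores below 50 as Non-Mastery and below 75 as Partial Mastery is an assumption.

```python
def opi_mastery_level(opi: float) -> str:
    """Classify an OPI score (0 to 100) into the mastery bands quoted above."""
    if not 0 <= opi <= 100:
        raise ValueError("OPI must be between 0 and 100")
    if opi < 50:       # 0 to 49: Non-Mastery (fractional handling is an assumption)
        return "Non-Mastery"
    if opi < 75:       # 50 to 74: Partial Mastery
        return "Partial Mastery"
    return "Mastery"   # 75 and above

for score in (32, 50, 74, 75, 100):
    print(score, "->", opi_mastery_level(score))
```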

All tests were scored by CTB/McGraw-Hill, the publisher of TerraNova. The statistical analyses were performed, and the conclusions drawn, by an independent firm, Pulse Analytics Inc., Ridgewood, New Jersey.