experimental or control condition (k = 52). The other effect sizes come from studies that
assigned their subjects randomly by class (k = 4) or by matching (k = 12), used a non-random
assignment procedure (k = 1), or used some other procedure for the assignment of subjects (k = 1).
4.3.3 Effect Sizes
The sample size in the primary studies ranged from 24 to 463, with an average sample
size of 106.66 (SD = 102.83). Given the relatively small samples in some of the studies, a
correction for the upward small-sample bias was applied, which resulted in corrected effect
sizes denoted ES′ (Lipsey & Wilson, 2001). This meta-analysis contained 70 effect sizes obtained from 40
unique studies. The effect sizes ranged from -0.78 to 2.29. Table 4.2 shows the unweighted
effect sizes ordered from smallest to largest per feedback type.
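As a sketch of the small-sample correction referred to above (outside the chapter's own equation numbering): Lipsey and Wilson's (2001) correction multiplies each standardized mean difference by a factor that depends on the sample size N, where our reading of N as the study's total sample size is an assumption:

```latex
ES' = \left(1 - \frac{3}{4N - 9}\right) ES
```

For the smallest study in this meta-analysis (N = 24), the correction factor is 1 − 3/87 ≈ 0.97, so the adjustment shrinks the effect size only slightly; for larger samples the factor is closer to 1.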
As Table 4.2 indicates, the majority of the effect sizes concern the
effects on lower-order learning outcomes (k = 43). The remaining effect sizes (k = 27) concern the
effects on higher-order learning outcomes or a combination of lower-order and higher-order
learning outcomes. Furthermore, the results showed that the majority of the effect sizes were
concerned with immediate feedback (k = 58), and only a small number of the effect sizes were
associated with feedback that was delivered with a delay (k = 12).
In the majority of cases (k = 61), students received feedback on only one assessment occasion.
In two cases, students received feedback on their assessment results twice. In the remainder of
the cases (k = 7), students were given feedback three times or more.
Table 4.2 shows that 12 of the effect sizes were negative, and that for
each feedback type there were both negative and positive effect sizes. The distribution of the
effect sizes and their 90% CIs is shown in Figure 4.1. No cases deviated more
than three SDs from their respective group averages. Therefore, we decided not to exclude or adjust
any effect size.
The set of effect sizes was more heterogeneous than would be expected from sampling variance alone (χ² = 446, df =
69, p < 0.001). This result suggested that a fixed-effects model was not suitable. Therefore,
weights were supplemented by a random component (see Equation 6). No indication was
found of publication bias: in the case of this meta-analysis, 76 additional studies
with an overall effect size of 0 would be needed to reduce the mean weighted effect size to a small
value of 0.2 (Hattie, 2009). Furthermore, the ratio of the number of included effect sizes to the number of studies
(70/40) was considered too small to warrant multilevel analyses to account for the
non-independence of effect sizes within studies. For this reason, the analyses were performed
using regular linear regression.
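The homogeneity test and the random component referred to above can be sketched with the standard meta-analytic formulas (Lipsey & Wilson, 2001). The notation below (v_i for the sampling variance of effect size i) is ours, and we assume that Equation 6 in the chapter corresponds to the random-effects weight in the last expression:

```latex
Q = \sum_{i=1}^{k} w_i \left( ES'_i - \overline{ES'} \right)^2 ,
\qquad w_i = \frac{1}{v_i} ,
\qquad
\hat{\tau}^2 = \frac{Q - (k - 1)}{\sum_i w_i - \sum_i w_i^2 \big/ \sum_i w_i} ,
\qquad
w_i^{*} = \frac{1}{v_i + \hat{\tau}^2}
```

Under homogeneity, Q follows a χ² distribution with k − 1 = 69 degrees of freedom; the observed value of 446 far exceeds this, which is why the fixed-effects weights w_i were supplemented with the random component τ̂² to give the random-effects weights w*_i.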
We computed the mean weighted effect sizes for the relevant variables; the
results are shown in Table 4.3, which reports the mean effect size for each category
together with the corresponding 90% CI and p-value. Of the three
feedback types, the effects for KR were smallest and those for EF
were largest. With regard to feedback timing, the effects for immediate feedback were larger
than for delayed feedback. Furthermore, the effect sizes appeared to be larger for higher-order
learning outcomes than for lower-order learning outcomes. The effects found in university or
college settings were somewhat larger than those found in primary and high school settings.
The effects also appeared to differ across the subject areas from which they
were derived: the effects of studies in mathematics were very large, whereas those of
studies in social sciences and science were medium, and those in languages were small.