Kleij and Eggen (2013) study were reached by e-mail and asked to disseminate the
questionnaire about the redesigned reports in their schools. A call for participants was also
posted on the Cito website, asking users to fill out the online questionnaire. In total, 74 unique
and sufficiently complete responses were obtained: 36 respondents had filled out questionnaire 2,
version 1 (Q2 V1) (α = .91), and 38 had filled out Q2 V2 (α = .88).
Data analysis. The participants’ responses in Van der Kleij and Eggen’s (2013) study
on the multiple-response items were rescored as separate dichotomous (correct/incorrect)
items to obtain maximum information. Furthermore, respondents who had filled out less than
25% of the pre-test were removed from Van der Kleij and Eggen’s data set. This resulted in
data from 93 respondents and a questionnaire reliability of α = .95. Subsequently, the results
of both questionnaires were analysed using Classical Test Theory (CTT) in TiaPlus (2010).
The overall results in terms of proportion correct (P-values) of the interpretation of the
original and redesigned report versions were compared. Additional analyses were conducted
using IRT. Based on the item parameters from the model, a latent regression model using the
computer program SAUL (Verhelst & Verstralen, 2002) was estimated to determine any
differences in the abilities of the various user groups: teachers, internal support teachers, or
principals.
A differential item functioning (DIF) analysis was also performed, in which the item
versions of the redesigned reports were treated as identical to the items belonging to the
original reports. Subsequently, we analysed whether particular items functioned differently in
the original and redesigned reports.
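
As an illustration of the rescoring and reliability steps described under Data analysis, the following minimal sketch splits one multiple-response item into separate dichotomous scores and computes Cronbach's α. The scoring rule (one 0/1 score per answer option, correct when the respondent's choice matches the key), the function names, and the data are assumptions made for illustration; the study itself used TiaPlus, and its exact rescoring rule is not stated here.

```python
from statistics import pvariance


def rescore_multiple_response(selected, options, key):
    """Turn one multiple-response item into one 0/1 score per option.

    Assumed rule: an option scores 1 when the respondent's decision
    (ticked or not ticked) agrees with the key, and 0 otherwise.
    """
    return [int((opt in selected) == (opt in key)) for opt in options]


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix of scores."""
    k = len(scores[0])  # number of items
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: one item with options A-D, of which A and C are correct.
options, key = ["A", "B", "C", "D"], {"A", "C"}
answers = [{"A", "C"}, {"A"}, {"B", "D"}, {"A", "B", "C"}]

matrix = [rescore_multiple_response(a, options, key) for a in answers]
alpha = cronbach_alpha(matrix)
```

Scoring each option separately retains partial knowledge that an all-or-nothing score on the whole item would discard, which is one way to read "maximum information".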
The results of the items measuring users’ perceptions of the redesigned report,
compared to those of the original report, were examined using descriptive statistics. The
responses were scored from 0 (totally disagree) to 4 (totally agree). Additionally, ANOVA
was used to examine any differences amongst the perceptions of respondents from the various
user groups, and the relationships with various background variables.
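
The group comparison described above can be sketched as a one-way ANOVA on the 0 to 4 perception scores. This is a minimal illustration in plain Python, not the software used in the study; the function name and the scores are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical perception scores (0 = totally disagree, 4 = totally agree).
teachers = [2, 3, 2, 3]
support_teachers = [3, 4, 3, 4]
principals = [3, 3, 4, 4]

f_stat, df_b, df_w = one_way_anova_f([teachers, support_teachers, principals])
```

The resulting F value would then be compared with the F distribution on (df_b, df_w) degrees of freedom to obtain a p-value.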
Results. First, the overall results in terms of proportion correct (P-values) were
compared. Figure 6.9 shows the P-values of item Version 1 (original reports) versus item
Version 2 (redesigned reports) and their 95% CI.
Figure 6.9. P-values of the items in the original and redesigned versions of the reports.
Figure 6.9 shows that three items had significantly larger P-values in the redesigned
version of the report, suggesting that they are easier. These items concerned the alternative
pupil report and the group report. However, two items were significantly more difficult in the
redesigned version; these involved the ability growth report.
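
To make the comparison in Figure 6.9 concrete, the sketch below computes a P-value (proportion correct) with a 95% confidence interval for one item in each report version. The normal-approximation (Wald) interval and the non-overlap check are assumptions for illustration; the interval method used for the figure is not stated, and the scores are hypothetical.

```python
from math import sqrt


def p_value_with_ci(item_scores, z=1.96):
    """Proportion correct and a normal-approximation 95% CI (assumed method)."""
    n = len(item_scores)
    p = sum(item_scores) / n
    half_width = z * sqrt(p * (1 - p) / n)
    # Clip the interval to the valid range of a proportion.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def intervals_overlap(ci_a, ci_b):
    """True when two (lower, upper) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Hypothetical 0/1 scores on the same item in both report versions.
original = [1] * 20 + [0] * 30      # 20 of 50 correct
redesigned = [1] * 40 + [0] * 10    # 40 of 50 correct

p1, lo1, hi1 = p_value_with_ci(original)
p2, lo2, hi2 = p_value_with_ci(redesigned)
easier_in_redesign = p2 > p1 and not intervals_overlap((lo1, hi1), (lo2, hi2))
```

Non-overlapping intervals are a conservative visual criterion; a formal test of the difference between two proportions could reach significance even when the intervals overlap slightly.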
Using CTT, the differences in interpretation accuracy amongst various user groups
were examined. Table 6.2 shows the results. The results of Q1 and Q2 V2 suggest that
internal support teachers and principals interpret the score reports significantly more accurately
than teachers do. The results of Q2 V1 are in line with those of Q2 V2, but the interpretation
accuracy amongst principals is also higher than that of internal support teachers. However,
this result is unreliable, given that only three principals took Q2 V1.
A 2PL One Parameter Logistic Model (OPLM, 2009; Verhelst & Glas, 1995) was
fitted to the data of Q2 (R1c = 82.9, df = 85, p = .54), and the respondents’ abilities were
estimated. The results of the subsequent analysis in SAUL suggest that internal support
teachers and principals are significantly better able to interpret the reports from the
Computer Program LOVS than teachers are (ES = 0.9).
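
For reference, the item response function of OPLM for dichotomous items, as described by Verhelst and Glas (1995), can be written as follows. The symbols are notation chosen here for illustration and do not appear in the text above: θ_j is the ability of respondent j, β_i is the difficulty of item i, and a_i is a discrimination index that is fixed in advance as a known constant rather than estimated.

```latex
P(X_{ij} = 1 \mid \theta_j)
  = \frac{\exp\bigl(a_i(\theta_j - \beta_i)\bigr)}
         {1 + \exp\bigl(a_i(\theta_j - \beta_i)\bigr)}
```

Because the a_i are treated as known, only the difficulty parameters are estimated, while items are still allowed to differ in discrimination.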
An analysis in which the items from Q1 were treated as identical to those in Q2 V1
and Q2 V2—in fact, they were, except for the different image of the report—revealed DIF in
three items. This is partially consistent with the results from the CTT analysis. One item
belonging to the alternative pupil report was easier, and two items belonging to the ability
growth report were harder.
Furthermore, the relationship between various background variables and interpretation
ability was examined using SAUL. This computer program allows the structural analysis of a
univariate latent variable for complete cases only. Since the information about the various
background variables was incomplete for some participants, the analysis had to be carried out
separately for the background variables of interest; consequently, the sample sizes differed.
The effect sizes (ES) reported in the results of the analyses were computed using Equation 1.
There was a significant difference in terms of the number of respondents in each user
group who had received training in the use of the Computer Program LOVS in the last five
years, F(2, 70) = 7.47, p = .001. In the teachers’ group, only 19% had received training,
compared to 58% of the internal support teachers and 73% of the school principals.
However, respondents who had received training in the use of the Computer Program
LOVS did not perform significantly better in interpreting the redesigned reports (z = 0.56, p =
.809, ES = 0.14, n = 73).