
6.7. Conclusion and Discussion
The results from previous research suggest that users of the Computer Program LOVS
struggle in interpreting the reports that feed back results on student learning (Meijer, Ledoux,
& Elshof, 2011; Van der Kleij & Eggen, 2013). This study examined whether the
interpretability of the reports from the Computer Program LOVS could be improved. Using a
design research approach (McKenney & Reeves, 2012), the reports were redesigned in
various cycles of design, evaluation, and revision.
A questionnaire was administered to assess the users' interpretations of the redesigned
reports. No clear differences were found between the original and the redesigned versions of
the reports in terms of the users' interpretation accuracy. Only three items were significantly
easier in the redesigned version; these items related to the report on ability growth and the
group report. The relevant design principles were those of appropriate knowledge, salience,
discriminability, and perceptual organisation. Surprisingly, two items were found to be more
difficult in the revised version. This result might be explained by a change in the scoring
procedures for these items. In the report's original version, only growth in ability was shown,
whereas in the revised version, level indicators were also displayed. Nevertheless, the
report's primary purpose still focused on growth in ability. However, responses that did not
indicate that the redesigned report also showed information on pupils' level were scored as
incorrect. It must also be mentioned that, due to the relatively small sample sizes, the
confidence intervals were wide, making it unlikely that statistical significance would be
reached. Additionally, some of the changes made to the reports were only minor, which makes
large effects on interpretation accuracy unlikely.
Moreover, given the current lack of comparable studies in the literature, it is difficult to
establish expectations about interpretation accuracy for the redesigned versions of the reports.
Also, the reports from the Computer Program LOVS have been in use for years, which makes
it challenging to assess user interpretation without it being confounded by prior experience.
Nevertheless, this study's results showed no differences in interpretation accuracy between
less experienced and more experienced users. In line with the findings from previous research
(Staman et al., 2013; Van der Kleij & Eggen, 2013), the interpretation accuracy amongst
internal support teachers and principals was higher than that of teachers. Although the
proportion of teachers who had received training in the use of the Computer Program LOVS
was larger than in the Van der Kleij and Eggen (2013) study, the results suggest that
teachers show the greatest need for professional development in assessment literacy.
Although no clear effects were found in terms of users' interpretation accuracy, the
results regarding users' perceptions of the redesigned reports in both the focus group meetings
and key informant consultations were positive. Users, especially internal support teachers,
were particularly positive about the changes made to the report on ability growth. This report
was adapted using the principles of salience, discriminability, and perceptual organisation.
Furthermore, the principles of relevance and appropriate knowledge also appeared important for
improving interpretability as perceived by users. The meetings resulted in valuable advice for
upgrading the design solutions. Eventually, a final set of design solutions was proposed.
In the present study, design principles from the literature were applied throughout the
design process and found to be very helpful in guiding that process. Nevertheless, these
principles left room for various design solutions and possible variations in graphic design.
Therefore, we argue that it is an absolute necessity to involve experts and current/future users
in the design process to ensure the validity of the reports.
Although the researchers advocate the careful design of score reports in collaboration
with the users, it is evident that well-crafted reports can only partially compensate for the lack
of basic knowledge (Hambleton & Slater, 1997). The aspects that initially caused many
misinterpretations also posed the most difficulty in achieving a satisfactory design solution.
These mainly concerned the more complex issues that required statistical knowledge. An
example of a problem that could not be solved by redesigning the reports is the score interval
issue. The concept of score intervals seemed complicated to many users. Not surprisingly,
only a few actually use them (Van der Kleij & Eggen, 2013). These findings are consistent
with previous research results (Hambleton & Slater, 1997; Zenisky & Hambleton, 2012),
which indicated that confidence levels are often ignored by users who do not perceive them as
valuable. However, the Standards (AERA et al., 1999) do prescribe that confidence levels be
reported. Nonetheless, when addressing the principle of appropriate knowledge in this
situation, it is clear that not all users are familiar with confidence levels on test scores, not
even the more able and experienced users. Thus, the value of reporting score intervals is
questionable when they are neither understood nor used in the way the test developer intended
(Van der Kleij & Eggen, 2013).
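As a brief illustration of the concept that users found difficult (the exact procedure used in the Computer Program LOVS is not described here and may differ), a score interval is commonly constructed from an ability estimate and its standard error of measurement:

\[
\hat{\theta} \pm z_{1-\alpha/2}\,\mathrm{SE}(\hat{\theta}),
\]

where, for instance, $z_{1-\alpha/2} \approx 1.96$ yields a 95% interval. Read this way, the interval signals that, owing to measurement error, a pupil's true ability plausibly lies anywhere within the band rather than exactly at the reported point estimate.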
A limitation of this study involves its small sample sizes. However, given the nature of
the study, the researchers preferred depth over breadth of information. Thus, it was deemed
more useful to receive detailed feedback from a small group of users over several versions of
the design solution than general feedback from a large group. Close collaboration with the
experts was maintained throughout the design process to ensure the design's suitability for the
entire field of users. However, the present study only made it possible to predict the
effectiveness of the redesigned reports to a limited extent. The actual effectiveness in terms of
interpretation accuracy will become apparent when the redesigned reports are fully
implemented. Furthermore, the design solutions employed in this study will have to be
consistently applied to the redesign of the other reports in the Computer Program LOVS. It is
also not guaranteed that all the design solutions proposed in this study will be technically
feasible in the way the researchers advised. Aside from the report's development process,
design and format, and contents, the ancillary materials and dissemination efforts (Zenisky &
Hambleton, 2012) should also be aligned with the intended uses of the score reports. The
ancillary materials could provide additional support for the aspects of the reports that users
struggled with.
Although the reports from the Computer Program LOVS have been in use for years,
clearly not all users can interpret them correctly. There may well be other pupil-monitoring
systems in use whose reports are not well understood and are in need of redesign.
Moreover, we propose that the thoughtful design of score reports in collaboration with users
be an integral element of the general assessment design.
Future research is needed to clarify the extent of educators' professional development
needs in terms of the correct interpretation of score reports and test results in general.
Specifically, accurate interpretation of data about student learning is a necessary precondition
for successfully implementing DDDM, since it constitutes the first step in the evaluative cycle.
Nevertheless, research suggests that the subsequent steps, such as making decisions on how to
adapt instruction, might even be more challenging (Chahine, 2013; Hattie & Brown, 2008;
Heritage, Kim, Vendlinski, & Herman, 2009; Timperley, 2009). These skills have been called
“pedagogic data literacy” by Mandinach and Jackson (2012). Several researchers have
recently addressed the necessity of pedagogical content knowledge as a precondition for
effectively acting on assessment results (Bennett, 2011; Heritage et al., 2009; Timperley,
2009). Although a correct interpretation of assessment results is a prerequisite, it provides no
guarantees for their adequate use. Further research in this area is therefore warranted.
Finally, despite the growing body of research on effective score reporting (Zenisky &
Hambleton, 2012), little effort has focused on the users' actual interpretations of reports.
These types of studies are essential to ensure that the users interpret the test results as the test
developer intended, a necessary move towards valid reporting practices.
