
completing the items from the total time spent on the assessment for learning. In addition, the
questionnaire contained questions about whether students had read the feedback.
Checking functionality of the instruments. A pilot test was performed with a small
group of students (N = 8) enrolled in the same study programme as the participants in the
experiment, but at a different location. The aim of the pilot test was to investigate whether the
instruments functioned as intended. Students were asked to report any problems or mistakes
they found in the assessments.
Additionally, all parts of the assessment were evaluated several times before the
assessment was administered at the different locations. Some adaptations were made to the
assessments and the questionnaire after the pilot tests; for example, the on-screen
instructions were revised. Furthermore, students reported that they disliked being unable to
navigate through the items in the assessment for learning with immediate feedback, which
meant they had to answer the items in a fixed order, starting with item one. We were aware of
this disadvantage, but software limitations prevented us from changing it. To keep the
conditions in the three groups as identical as possible, it was therefore decided not to allow
the other groups to navigate in the assessment for learning either.
Quality of the assessments. The quality of the assessments was investigated by
applying Classical Test Theory (CTT) and Item Response Theory (IRT). The software
packages TiaPlus (TiaPlus, 2009) and the One Parameter Logistic Model (OPLM; Verhelst,
Glas, & Verstralen, 1995) were used to analyse the data from both a CTT and an IRT
perspective. Based on the quality indicators provided by TiaPlus, the assessment for learning
was judged to be of sufficient quality; for this assessment, Cronbach's alpha was α = .85. The
summative assessment was judged to be of insufficient quality; for this assessment, α = .40.
This value is considered too low, meaning the summative assessment was not reliable.
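
To make the reliability figures concrete, the following sketch (in Python, with
hypothetical data; the study itself used TiaPlus) computes Cronbach's alpha from an
item-score matrix using the standard formula α = k/(k − 1) · (1 − Σ σ²ᵢ / σ²ₓ), where k is
the number of items, σ²ᵢ the variance of item i, and σ²ₓ the variance of the total scores.

    import numpy as np

    def cronbach_alpha(scores):
        """Cronbach's alpha for an (n_students x n_items) 0/1 score matrix."""
        n_items = scores.shape[1]
        item_vars = scores.var(axis=0, ddof=1)       # variance of each item
        total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
        return n_items / (n_items - 1) * (1 - item_vars.sum() / total_var)

    # Hypothetical 0/1 scores for 8 students on 5 items, for illustration only.
    rng = np.random.default_rng(0)
    scores = rng.integers(0, 2, size=(8, 5)).astype(float)
    print(f"alpha = {cronbach_alpha(scores):.2f}")
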
To examine the underlying constructs of the assessments, a factor analysis was performed.
The results showed that the assessment for learning measured a single factor, whereas the
summative assessment measured more than one. This meant that the summative assessment
measured constructs other than those measured by the assessment for learning, and it was
therefore not a suitable instrument for measuring the learning gains of students in the
different groups. To overcome this problem, a selection of items from the summative
assessment was made based on the factor analysis; these items measured the same construct
as the items in the assessment for learning. Eleven items remained, with a combined
reliability of α = .66, which means that removing the other items increased the reliability of
the summative assessment. The reliability was, however, still low.
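
The item-selection step can be illustrated as follows. This sketch assumes the same
hypothetical score matrix and the cronbach_alpha helper from the previous sketch, and it
uses scikit-learn's FactorAnalysis in place of the TiaPlus analysis; the two-factor model
and the selection rule are illustrative assumptions, not the study's actual criteria.

    from sklearn.decomposition import FactorAnalysis

    # Fit a two-factor model to the hypothetical (n_students x n_items) matrix.
    fa = FactorAnalysis(n_components=2, random_state=0).fit(scores)
    loadings = fa.components_                 # shape: (2, n_items)

    # Keep items whose loading on the first factor dominates (illustrative rule).
    keep = np.abs(loadings[0]) > np.abs(loadings[1])
    selected = scores[:, keep]
    print(f"kept {keep.sum()} of {scores.shape[1]} items")
    print(f"alpha of selected items = {cronbach_alpha(selected):.2f}")
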
From the assessment for learning, one item was removed because it was too easy. Using IRT,
the ability of the students (θ) was estimated (R1c = 98.242; df = 78).
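
Under the one-parameter logistic (Rasch) model on which OPLM builds, the probability
that a student with ability θ answers an item with difficulty b correctly is
P(X = 1) = exp(θ − b) / (1 + exp(θ − b)), and θ can be estimated by solving the score
equation Σ(xᵢ − pᵢ) = 0 over the items. The sketch below is a simplified illustration with
hypothetical item difficulties, not the OPLM estimation performed in the study.

    import numpy as np
    from scipy.optimize import brentq

    def theta_mle(responses, difficulties):
        """Maximum-likelihood ability estimate under the Rasch (1PL) model.

        Note: the estimate does not exist for all-correct or all-incorrect
        response patterns.
        """
        def score(theta):
            p = 1.0 / (1.0 + np.exp(-(theta - difficulties)))
            return np.sum(responses - p)      # derivative of the log-likelihood
        return brentq(score, -6.0, 6.0)       # root of the score equation

    b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])   # hypothetical item difficulties
    x = np.array([1, 1, 1, 0, 0])               # one student's responses
    print(f"theta = {theta_mle(x, b):.2f}")
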
There appeared to be no difference in the initial ability of the students in the three
groups. Besides the difference in reliability between the two assessments, the results of the
CTT and IRT analyses also showed that the summative assessment was more difficult than
the assessment for learning.
