Working on educational improvement with test data

15 Sep

Posted at 13:01h in Assessment, Background comparative judgement, Cases by Tine Stoop


Roos Van Gasse, co-founder of Comproved, is a researcher and guest professor at the Faculty of Training and Education Sciences at the University of Antwerp. She is also team leader for School Feedback at the Support Center for Central Testing in Education. We sat down with her to discuss school feedback and the use of test data for educational improvement.

Where does your passion for education come from?

“My original plan was to teach. But during teacher training, I felt strongly that I wanted to know more about education, so I went on to study training and educational sciences. While writing my master’s thesis there, I became interested in research and in student orientation.”

“After my studies, I first did some policy-oriented research at the university. Then I joined D-PAC, the research project from which Comproved emerged. There I researched what teachers can learn from their students’ test performance to improve their teaching practice.”

What are you working on at the moment?

“At the moment I am doing research for the Support Center for Central Testing in Education. Because the quality of education in Flanders is declining, the minister has decided that from this school year, central tests will be organized throughout the region to better monitor the quality of our education.”

“An interuniversity support center has been established to shape these tests, and I have been appointed team leader of the school feedback section. We investigate which data from the test results we can use for school feedback, and we design dashboards that return that feedback to the schools. We also monitor how users engage with the feedback, and finally we are looking at how we can help schools build the professional skills to work with it.”

Will comparative judgement play a role in those centralized tests?

“The idea is that everything should be scored automatically. But automatic scoring of a competency like writing skills has never worked well for anyone. That is why we are running pilots with Comproved: we collect a large number of human assessments to see whether an algorithm can learn to make the comparisons. So in that sense, comparative judgement is being used. But I expect that eventually they will want to use comparative judgement with human assessors as well (laughs).”


As a teacher, how can you get started with test data yourself?

“The important thing is that you actively seek feedback. As a teacher you have a lot of data at your disposal, but there is no point in trying to act on all of it. To deal with that data meaningfully, you have to start from your objectives. Evaluation then becomes a touchstone for whether you have achieved your goals with your students. If you haven’t, you start looking for causes and solutions. That’s the idea behind using data for educational improvement.”

Can you give an example?

“Suppose a second-grade teacher is teaching his students the 7 times table. To see whether his students have automated it, he gives them a test. He wants to find out whether he has achieved his goal: ‘my students know the 7 times table’.”

“However, one student makes mistakes. The teacher then looks for the cause of those errors. Has the student not yet automated the table, is he having a bad day, or is he not motivated? The teacher can gather additional data, for example by sitting next to the student for a chat or by giving him some extra exercises.”

“That way, the teacher learns about his students and his class. He may come across problems he was not aware of before. Then he starts looking for a solution, for example by seeking advice from colleagues or consulting the literature. In this way, the teacher takes steps in his own professional development.”

Can you use Comproved to collect data for educational improvement?

“Yes, definitely. In higher education, for example, Comproved is widely used for peer assessments. Such peer assessments, in which students assess each other, are a useful means of teaching students what a particular competency entails and instilling in them a sense of quality. Moreover, they provide a lot of useful information.”

What kind of information is that exactly?

“In a peer assessment in Comproved, students are presented with their peers’ works in pairs. For each pair, they indicate which of the two works they think better meets the competency being assessed. Based on all the comparisons they make, the tool generates a ranking of the works.”
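For readers curious about what happens behind that ranking: the interview does not spell out Comproved’s exact model, but comparative judgement tools commonly fit a Bradley-Terry model, in which each work gets a strength s and the probability that work i is preferred over work j is s_i / (s_i + s_j). The Python sketch below fits such a model with the classic minorization-maximization (MM) update. The function names `bradley_terry` and `rank_order` and the (winner, loser) input format are illustrative assumptions, not Comproved’s API. It also assumes every work both wins and loses at least once and that the comparisons link all works together; without that, the estimates do not settle on finite values.

```python
from collections import defaultdict
from itertools import chain


def bradley_terry(comparisons, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs via MM updates."""
    works = sorted(set(chain.from_iterable(comparisons)))
    wins = defaultdict(int)         # number of comparisons each work won
    pair_counts = defaultdict(int)  # number of times each unordered pair was compared
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {w: 1.0 for w in works}
    for _ in range(iterations):
        updated = {}
        for i in works:
            # Sum of n_ij / (s_i + s_j) over every opponent j that i was compared with.
            denom = sum(
                n / (strength[i] + strength[j])
                for pair, n in pair_counts.items() if i in pair
                for j in pair if j != i
            )
            updated[i] = wins[i] / denom
        # Rescale so strengths average 1; only their ratios matter.
        total = sum(updated.values())
        strength = {w: len(works) * s / total for w, s in updated.items()}
    return strength


def rank_order(strength):
    """Works sorted from strongest to weakest."""
    return sorted(strength, key=strength.get, reverse=True)


# Hypothetical peer judgements on four works A-D.
comparisons = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"),
               ("C", "D"), ("B", "A"), ("D", "C"), ("A", "B"), ("C", "B")]
print(rank_order(bradley_terry(comparisons)))
```

Because the model works with all comparisons at once, a work can rank highly even if it lost a single pairing, as B did against C here.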

“That ranking is a source of information. It shows you which works students rate as good and which they don’t. Do they value the same aspects as you do, or not at all? And to what extent is that true of the class as a whole? Perhaps they arrive at a very consistent ranking among themselves, with high reliability, but one that differs completely from yours. Then you know you need to work with the whole group on that sense of quality. You could discuss the ranking in class and explain why a particular work is or is not a good representation of the competency.”
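One way to put a number on how far the class ranking differs from yours is Spearman’s rank correlation between the peer ranking and the teacher’s own ordering of the same works. The interview does not say Comproved reports this statistic; the sketch below, with the hypothetical helper `spearman_rho`, simply assumes both rankings list the same works without ties.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation between two tie-free orderings of the same works.

    Returns 1.0 for identical orderings and -1.0 for exactly reversed ones.
    """
    if sorted(rank_a) != sorted(rank_b):
        raise ValueError("both rankings must contain the same works")
    n = len(rank_a)
    position_b = {work: i for i, work in enumerate(rank_b)}
    # Sum of squared differences in rank position for each work.
    d_squared = sum((i - position_b[work]) ** 2 for i, work in enumerate(rank_a))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


peer_ranking = ["A", "B", "C", "D"]     # hypothetical ranking from the peer assessment
teacher_ranking = ["D", "C", "A", "B"]  # hypothetical ranking by the teacher
print(spearman_rho(peer_ranking, teacher_ranking))
```

Here the result is about -0.8: the class’s ranking is close to the reverse of the teacher’s, which is exactly the situation in which a class discussion about quality pays off.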

Interesting! Is there any other information you can get from Comproved?

“Definitely! Comproved also flags misfits. A misfit is an assessor (in a peer assessment, a student) whose choices differ strongly from those of their peers. You can then check whether those misfits are students who have not yet fully grasped the subject matter. Or the opposite: the group is not keeping up and you have some students who are already further along. You can then use those insights in class, for example by having the students work in groups and giving the stronger students a specific role in bringing their peers along.”
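The interview doesn’t describe how Comproved computes misfits. Comparative judgement studies often use a fit statistic built from the residuals between each assessor’s choices and the fitted Bradley-Terry model (where work i beats work j with probability s_i / (s_i + s_j)), and flag assessors more than two standard deviations above the group mean. The sketch below assumes that approach with a simple unweighted mean-square (outfit-style) residual; the strengths, assessor IDs and helper names `assessor_fit` and `flag_misfits` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def assessor_fit(judgements, strength):
    """Mean squared standardized residual of each assessor's choices.

    judgements: iterable of (assessor, winner, loser) tuples.
    strength: Bradley-Terry strength per work, e.g. from a fitted model.
    """
    residuals = defaultdict(list)
    for assessor, winner, loser in judgements:
        # Model probability that the work this assessor chose wins the pair.
        p = strength[winner] / (strength[winner] + strength[loser])
        # For an observed win (x = 1), (x - p)^2 / (p * (1 - p)) reduces to (1 - p) / p.
        residuals[assessor].append((1 - p) / p)
    return {a: mean(r) for a, r in residuals.items()}


def flag_misfits(fit, k=2.0):
    """Assessors whose fit value lies more than k population SDs above the group mean."""
    values = list(fit.values())
    threshold = mean(values) + k * pstdev(values)
    return sorted(a for a, v in fit.items() if v > threshold)


strength = {"A": 4.0, "B": 2.0, "C": 1.0, "D": 0.5}  # hypothetical fitted strengths
judgements = [(s, w, l) for s in ["s1", "s2", "s3", "s4", "s5"]
              for w, l in [("A", "B"), ("B", "C"), ("C", "D")]]
judgements += [("s6", "B", "A"), ("s6", "C", "B"), ("s6", "D", "C")]
print(flag_misfits(assessor_fit(judgements, strength)))
```

Here s6 consistently picks the weaker work in each pair, so only s6 is flagged. With very small groups a mean-plus-two-SD rule rarely triggers (no value can lie more than √(n − 1) population standard deviations above the mean of n values), so it is most meaningful across a whole class.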


We’re on a roll, what else?

“(laughs) You can also look at the time students spent on their comparisons. For example, did a misfit take much less time on his comparisons than his peers? That doesn’t necessarily mean he doesn’t understand the material well; he may simply have worked nonchalantly.”
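Spotting unusually quick assessors is simple arithmetic on the time data. The sketch below compares each assessor’s median time per comparison with the median across the group; the cut-off of half the group median and the helper name `fast_assessors` are arbitrary illustrative choices, not values from Comproved. As Van Gasse notes, a flag is a prompt to look closer, not proof of misunderstanding.

```python
from statistics import median


def fast_assessors(durations, fraction=0.5):
    """Assessors whose median time per comparison is below `fraction` of the group median.

    durations: dict mapping assessor -> list of seconds spent on each comparison.
    """
    per_assessor = {a: median(times) for a, times in durations.items()}
    group_median = median(per_assessor.values())
    return sorted(a for a, m in per_assessor.items() if m < fraction * group_median)


durations = {  # hypothetical seconds per comparison
    "s1": [40, 55, 60],
    "s2": [50, 45, 70],
    "s3": [35, 65, 52],
    "s4": [8, 10, 12],
}
print(fast_assessors(durations))
```

Medians are used rather than means so that one long pause or one accidental click does not dominate an assessor’s figure.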

“Precisely because you get a wealth of information with limited input, Comproved is such a nice tool. Take class tests, for example: you put a lot of work into developing them, administering them and marking them, and at the end you have a grade. With Comproved, you put some time into designing the task and into the assessment (which, by the way, is also a learning moment in itself), but what you get in return is huge: information about reliability, the comparisons, time per comparison, misfits, and each work’s position in the rank order, possibly relative to benchmarks. So it’s a relatively small investment in exchange for all that data you can use to improve your teaching practice.”

What about summative assessments in Comproved?

“In a summative assessment, the reliability of the ranking is very important. After all, big decisions are based on summative tests. With other assessment methods, you usually have no insight into the quality of the test and its assessment. We know from research that when work is assessed with a list of criteria, differences between assessors can reach up to 50%. That’s a lot of uncertainty for a summative assessment. The advantage of Comproved is that you can say something about a grade with a degree of certainty.”
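How can a ranking’s reliability be quantified at all? In the comparative judgement literature it is often summarized as scale separation reliability (SSR): the share of the observed variance in the works’ estimated scores that is not measurement error. The interview does not state how Comproved computes its reliability figure, so the sketch below is an assumption throughout: it takes the log of each work’s Bradley-Terry strength (the model commonly behind comparative judgement rankings) as its score, and approximates each work’s squared standard error as 1 / Σ p(1 − p) over the comparisons it took part in. The helper name and the data are hypothetical, and for simplicity the strengths are fixed rather than refitted from the comparisons.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


def scale_separation_reliability(comparisons, strength):
    """Approximate SSR of a Bradley-Terry rank order.

    comparisons: list of (winner, loser) pairs; only who met whom matters here.
    strength: fitted Bradley-Terry strength per work (every work must be compared).
    """
    scores = {w: math.log(s) for w, s in strength.items()}  # logit-scale scores
    information = defaultdict(float)
    for winner, loser in comparisons:
        p = strength[winner] / (strength[winner] + strength[loser])
        # Each comparison adds p(1 - p) of Fisher information to both works.
        information[winner] += p * (1 - p)
        information[loser] += p * (1 - p)
    mean_squared_error = mean(1 / information[w] for w in scores)
    observed_variance = pvariance(list(scores.values()))
    return (observed_variance - mean_squared_error) / observed_variance


strength = {"A": 4.0, "B": 2.0, "C": 1.0}  # hypothetical fitted strengths
pairs = [("A", "B"), ("B", "C"), ("A", "C")]
print(round(scale_separation_reliability(pairs * 10, strength), 2))
print(round(scale_separation_reliability(pairs * 100, strength), 2))
```

With ten rounds of comparisons the estimate comes out around 0.22, with a hundred rounds around 0.92: more comparisons per work shrink the standard errors, which is what lets you attach a degree of certainty to a position in the ranking.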

“Fortunately, many teachers are aware of that uncertainty. They know that a test is a snapshot and that they have to consider the full picture of a student to make a final judgment. Yet we find that grades still carry a lot of weight in summative decisions.”

Inspired to get started with test data for educational improvement yourself? Wondering how Comproved can support you? Contact us for an informal conversation!

Tags:

#comparative judgement, assessment tool, comparative judgement, educational improvement, pairwise comparisons, peer assessment, test data