As a team, we have accumulated considerable expertise in comparative judgement over the years. We have gathered insights from research and real-life cases to clarify the background of comparative judgement and Comproved for you.

For those who like to dive deep into the scientific research, we have also collected all the publications that have appeared in recent years.

Frequently asked questions

What is comparative judgement?

Comparative judgement = making comparisons. And you do so almost without thinking about it. For example, it is nearly impossible to estimate the altitude of a plane flying overhead. But if two planes fly over, you can quickly see which of the two is higher in the sky. The principle of comparative judgement is the same: you don’t judge pieces of work as separate products but in relation to each other. Out of two tasks, you choose the better one each time. By comparing all products in pairs this way, you arrive at a ranking from “least good” to “best”.

Comparative judgement reflects the consensus among assessors. And it does so without lengthy discussions, but simply through the statistical model underlying the method (Bradley-Terry-Luce). This model calculates a quality scale from inferior to superior quality. It also quickly becomes clear which evaluators deviate from the consensus, i.e. who often chooses differently, and about which products opinions are strongly divided. This provides useful information that can be explored further.
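To make the underlying model concrete, here is a minimal sketch (purely illustrative, not Comproved’s implementation) of how Bradley-Terry quality scores can be estimated from a list of pairwise decisions using a simple iterative algorithm:

```python
from collections import defaultdict

def bradley_terry(comparisons, iterations=200):
    """Estimate Bradley-Terry quality scores from pairwise outcomes.

    comparisons: list of (winner, loser) pairs.
    Returns a dict mapping each item to a score; higher = better quality.
    """
    items = sorted({p for pair in comparisons for p in pair})
    wins = defaultdict(int)       # total wins per item
    meetings = defaultdict(int)   # how often each unordered pair was compared
    for winner, loser in comparisons:
        wins[winner] += 1
        meetings[frozenset((winner, loser))] += 1

    scores = {i: 1.0 for i in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            # standard minorization-maximization update for Bradley-Terry
            denom = sum(
                meetings[frozenset((i, j))] / (scores[i] + scores[j])
                for j in items if j != i
            )
            updated[i] = wins[i] / denom if denom > 0 else scores[i]
        # rescale so the scores stay on a comparable overall level
        total = sum(updated.values())
        scores = {i: s * len(items) / total for i, s in updated.items()}
    return scores
```

Feeding the function all pairwise decisions of all assessors yields the shared quality scale; sorting items by score gives the ranking from “least good” to “best”.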

Comparative judgement takes a holistic approach: the task is assessed as a whole. This is in contrast to analytical methods that usually work with criteria lists: a piece of work is analyzed in detail by looking at partial aspects and partial competencies.

What are the main benefits of comparative judgement?

Reliable results

Research has shown that comparative judgement leads to very reliable results. This is because comparative judgement is cognitively a simpler task than, say, grading using a criteria list. When you, as a teacher, compare tasks in pairs, you will – thanks to your expertise – be able to answer effortlessly which piece of work is the better of the two.

Because comparing is easier, you will also make more consistent decisions. Therefore, the same piece of work will stand out in a comparison every time, regardless of the time of day or what tasks you’ve seen before. So as an evaluator, you can be pretty confident in your judgment.

High validity

Validity has to do with actually assessing the competency that needs to be assessed. The problem with complex competencies, however, is that they usually cannot be captured in a rigid framework. Research finds that with analytical assessment methods, there is a lot of overlap between the aspects that need to be distinguished. Moreover, you neglect the bigger picture by zooming in on the parts. All this makes assessment difficult as well as inefficient.

By assessing in a more holistic way, which is the case with comparative judgement, assessors automatically take more criteria into account. Even those criteria that are not made explicit in criteria lists or rubrics, but that are nonetheless relevant. Moreover, by assessing together with some colleagues and combining different perspectives, the different (minor) aspects of a competency are better scrutinized. Thus, you assess the competency as a whole.


Time savings

Another advantage of comparative judgement is that it saves time. Don’t get me wrong, grading is a time-consuming job no matter how you approach it. But if you think comparative judgement doubles your workload because you’re grading in pairs, you’re not quite right either. For pairs with a clear difference in quality, it is quickly obvious which product is the best. If the products are of similar quality, making a decision logically takes more time. But the overall assessment time will usually be no longer than with a criteria-list assessment.

The biggest time savings in comparative judgement? No criteria lists need to be developed, validated and calibrated. Comparative judgement relies on the expertise of assessors, which has proven to be very reliable. Moreover, it works intuitively. Assessors do not need to be trained to learn to look at (the same) aspects.

More learning opportunities for students

However you use the method, it provides great learning opportunities for students. For example, after all the products have been compared, you could have students look at the rank order of the works. This will help them better estimate where they themselves stand. After all, they get a chance to look at better and less good examples of the task and consequently they can discover why those differ from their own work in competency level.

When you use the method for peer assessment, even more learning opportunities are added. For example, (anonymously) assessing peer work is a learning opportunity in itself. Indeed, by weighing the works in a comparison, students learn bottom-up to recognize key aspects in quality tasks. By explicitly naming these in the feedback they give their fellow students, they activate this knowledge in themselves as well, which (hopefully) benefits their follow-up works.

Can you use comparative judgement formatively as well?

Comparative judgement can also be used formatively.

Do you assess with fellow teachers? Then the rich feedback itself creates learning opportunities. Moreover, because students can see the ranking of the works, they can better estimate where they themselves stand: they get a chance to look at better and less good examples of the task and in that way discover why those differ from their own work in competency level. These are two clear learning opportunities.

Are you using the method for peer assessment? Then even more learning opportunities are added. For example, (anonymous) peer assessment of work is a learning opportunity in itself. After all, by weighing the works in a comparison, students learn bottom-up to recognize the most important aspects in quality tasks. By explicitly naming these in the feedback they give their fellow students, they activate this knowledge in themselves as well, which (hopefully) benefits their follow-up works.

What is Comproved and how does it work?

Comproved is a spin-off that grew out of a research project at the University of Antwerp, Ghent University and imec. The central question of that research project was: what is the added value of comparative judgement for the evaluation of complex competencies?

Comproved wants to help teachers and assessors assess fairly and with high quality by providing them with knowledge and tools. We have developed a digital tool that supports comparative judgement. If you have to rank, say, 50 or 100 tasks from ‘least good’ to ‘best’, it is not practically feasible to compile random pairs yourself. The comparing tool automates this process and makes it possible to compare pairs of products quickly and reliably in an online environment.

How does Comproved work?

  • Students upload their work into the digital tool, completely anonymously.
  • The assessor is then presented with a series of pairs composed at random and chooses the best piece of work of the two each time. Each product is compared the same number of times: for a new pair, the algorithm always selects the product that has been compared the fewest times.
  • For even more certainty about which piece of work is the best, the tool uses multiple assessors. Each assessor gets to see the same products, but in different combinations.
  • In the end, the tool brings all this input together. The result: a quality scale that ranks the products.
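The pair-selection rule described above (always include the product that has been compared the fewest times) can be sketched in a few lines. This is a simplified illustration of that rule, not Comproved’s actual algorithm:

```python
import random

def next_pair(products, times_compared):
    """Return the next pair to judge: the least-compared product
    plus a randomly drawn opponent (simplified sketch)."""
    least = min(products, key=lambda p: times_compared[p])
    opponent = random.choice([p for p in products if p != least])
    return least, opponent
```

Repeatedly applying such a rule keeps the comparison counts balanced, so every product contributes equally to the final quality scale.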

What are the main advantages of Comproved?

  • The tool is user-friendly. Students upload their own work, then the tool constructs pairs and sends them out to the assessors. The tool calculates rankings and provides feedback to the students.
  • As an instructor, Comproved allows you to set up and manage assessments quickly and easily. Moreover, there are plenty of options to customize the assessment to your liking: how many comparisons do you want each assessor to make and what feedback do you request (with or without criteria)?
  • The tool gives teachers more guidance by basing scores on multiple judgements and combining the expertise of multiple assessors.
  • Guidance also comes in the form of a team of experts. The Comproved Academy shares its knowledge, insights and know-how through publications, presentations and customized advice. Teachers and educational teams who want to get started with the tool can count on substantive training and guidance. The Comproved Academy conducts webinars and workshops, both knowledge- and practice-oriented.
  • Also helpful: the comparing tool is available stand-alone in the cloud, but can also be integrated seamlessly into Learning Management Systems (LMS) such as Blackboard, Canvas or Brightspace.

What types of assessments can Comproved be used for?

  • Formative assessment focuses on the learning process and on the strengths and areas for improvement. Developmental feedback is especially important here. The comparing tool has a feature that enables comprehensive feedback. Students also learn a lot from giving feedback on each other’s work. The tool is therefore particularly suitable for peer assessments (set up in no time!), where students assess and provide feedback on each other’s products.
  • In summative assessment, where you determine whether a student is achieving the expected level of performance, the tool helps make fair, objective, high-quality judgements. The results are valid and more reliable than those of common methods, because multiple assessors provide multiple perspectives and each product is compared multiple times with other products. Students can review their own work, feedback and other works afterwards.
  • Live judging is done, for example, with physical products that cannot be digitized, such as an installation or a model. Each piece of work is assigned a code. Using Comproved, the jury is guided through the products “in real life” and must compare two products each time, indicate the best product and provide feedback. The results reflect the consensus of the judges.
  • With the comparing tool, assessors do not need to be physically together or assess at the same time. They can assess remotely: wherever and whenever it suits each of them, completely independent of time and place. No need to juggle distances or schedules.
  • The tool can be used in any context where complex assessments need to occur and selections need to be made. Consider, for example, the evaluation of grant and project proposals. In personnel recruitment, in turn, the tool can help to quickly and efficiently select the best candidates from a stack of resumes. Even when thinking about the vision or mission of a company or department, it can be a useful tool: which themes do we as a group consider most important?

Does it all remain a bit abstract? Read the stories of our users!

Learn more?

When is comparative judgement useful?
How many assessors do you need?
Does comparative judgement take longer?
Can you use comparative judgement for group works?

Read all the questions and answers

Scientific publications

Comproved is an evidence-based tool based on nearly 10 years of scientific research on comparative judgement. We have collected all scientific publications and highlighted the most recent articles.

Comparative approaches to the assessment of writing: Reliability and validity of benchmark rating and comparative judgement

Renske Bouwer, Marije Lesterhuis, Fien De Smedt, Hilde Van Keer & Sven De Maeyer (2023)

There are currently two ways to assess writing assignments comparatively: using so-called benchmarks (anchor texts) or by comparative judgement. In the first method, you compare each assignment to anchor texts that exemplify a particular level. Each anchor text also has a description indicating why it is better or worse than the next anchor text. Comparative judgement involves only comparing texts with one another, without anchor texts.

According to this study, both approaches produced consistent ratings, but using anchor texts seemed to cause raters to choose ratings in the middle of the scale more often, and the very good or very bad ratings less often. The authors suggest that perhaps a combination of both comparative methods should be used.

Read the entire article here.

Peer overmarking and insufficient diagnosticity: the impact of the rating method for peer assessment

Florence van Meenen, Liesje Coertjens, Marie-Claire Van Nes & Franck Verschuren (2022)

In this study, peers assess each other’s work both analytically (with a criteria list or rubric) and by comparative judgement. When the students’ assessments are compared with those of the teachers, the analytical method shows only slight agreement between the two: students do not recognize substandard essays. With comparative judgement, however, the agreement between student and teacher assessments is acceptable, and students do recognize the essays of substandard quality. The results show that, in this case, comparative judgement leads to better assessment.

Read the entire article here.

All publications


  • Bouwer, R., Goossens, M., Mortier, A. V., Lesterhuis, M., & De Maeyer, S. (2018). Een comparatieve aanpak voor peer assessment: Leren door te vergelijken. In D. Sluijsmans & M. Segers (Eds.), Toetsrevolutie: Naar een feedbackcultuur in het hoger onderwijs (p. 92-106). Culemborg, NL: Phronese.
  • Deneire, A., De Groof, J., Coertjens, L., Donche, V., Vanhoof, J., & Van Petegem, P., & De Maeyer, S. (2022). De kwaliteit van grootschalige ‘performance assessments’ gewikt en gewogen. Antwerpen: Edubron. Link
  • Settembri P., Van Gasse R., Coertjens L., De Maeyer S. (2018) Oranges and Apples? Using Comparative Judgement for Reliable Briefing Paper Assessment in Simulation Games. In: Bursens P., Donche V., Gijbels D., Spooren P. (eds), Simulations of Decision-Making as Active Learning Tools. Professional and Practice-based Learning, vol 22. Springer, Cham. Link

Scientific articles

  • Bouwer, R., Lesterhuis, M., Bonne, P., & De Maeyer, S. (2018, October). Applying criteria to examples or learning by comparison: Effects on students’ evaluative judgment and performance in writing. In Frontiers in Education (Vol. 3, p. 86). Link
  • Bouwer, R., Lesterhuis, M., De Smedt, F., Van Keer, H., & De Maeyer, S. (2023). Comparative approaches to the assessment of writing: Reliability and validity of benchmark rating and comparative judgement. Journal of Writing Research. Link
  • Coenen, T., Coertjens, L., Vlerick, P., Lesterhuis, M, Mortier, A. V., Donche, V., Ballon, P., & De Maeyer, S. (2018). An information system design theory for the comparative judgement of competences. European Journal of Information Systems, 27(2), 248-261. Link 
  • Coertjens, L., Lesterhuis, M., Verhavert, S., Van Gasse, R., & De Maeyer, S. (2017). Teksten beoordelen met criterialijsten of via paarsgewijze vergelijking: een afweging van betrouwbaarheid en tijdsinvestering. Pedagogische Studiën, 94(4), 283–303. Link
  • Coertjens, L., Lesterhuis, M., De Winter, B. Y., Goossens, M., De Maeyer, S., & Michels, N. R. (2021). Improving Self-Reflection Assessment Practices: Comparative Judgment as an Alternative to Rubrics. Teaching and Learning in Medicine, 33(5), 525-535. Link
  • Goossens, M., & De Maeyer, S. (2018). How to obtain efficient high reliabilities in assessing texts: rubrics vs comparative judgement. Proceedings of Communications in Computer and Information Science. Berlin: Springer-Verlag. Link
  • Lesterhuis, M. (2018). When teachers compare argumentative texts: Decisions informed by multiple complex aspects of text quality. Educational Studies in Language and Literature, 18(1). Link
  • Lesterhuis, M., Verhavert, S., Coertjens, L., Donche, V., & De Maeyer, S. (2017). Comparative judgement as a promising alternative to score competences. In Innovative practices for higher education assessment and measurement (pp. 119-138). IGI Global. Link
  • Lesterhuis, M., Donche, V., De Maeyer, S., Van Daal, T., Van Gasse, R., Coertjens, L., … & Van Petegem, P. (2015). Competenties kwaliteitsvol beoordelen: brengt een comparatieve aanpak soelaas? Tijdschrift voor Hoger Onderwijs, 33(2), 55-67. Link
  • Mortier, A., Brouwer, R., Coertjens, L., Volckaert, E., Vrijdags, A., Van Gasse, R., … & De Maeyer, S. (2019). De comparatieve beoordelingsmethode voor een betrouwbare en valide cv-screening: een vergelijking tussen experts en studenten. Gedrag & Organisatie, 32(2). Link
  • Mortier, A. V., Lesterhuis, M., Vlerick, P., & De Maeyer, S. (2015). Comparative judgement within online assessment: Exploring students feedback reactions. Proceedings of Communications in Computer and Information Science, 571, 69-79. Link
  • Van Daal, T., Lesterhuis, M., Coertjens, L., van de Kamp, M.-T., Donche, V., & De Maeyer, S. (2017). The Complexity of Assessing Student Work Using Comparative judgement: The Moderating Role of Decision Accuracy. Frontiers in Education, 2, 1–14. Link
  • van Daal, T., Lesterhuis, M., Coertjens, L., Donche, V., & De Maeyer, S. (2019). Validity of comparative judgement to assess academic writing: Examining implications of its holistic character and building on a shared consensus. Assessment in Education: Principles, Policy & Practice, 26(1), 59-74. Link
  • Van Meenen, F., Coertjens, L., Van Nes, MC., & Verschuren, F. (2022). Peer overmarking and insufficient diagnosticity: the impact of the rating method for peer assessment. Advances in Health Science Education 27, 1049–1066. Link
  • Van Gasse, R., Mortier, A., Goossens, M., Vanhoof, J., Van Petegem, P., Vlerick, P., & De Maeyer, S. (2016, October). Feedback opportunities of comparative judgement: An overview of possible features and acceptance at different user levels. In International Computer Assisted Assessment Conference (pp. 23-38). Springer, Cham. Link
  • Van Gasse, R., Bouwer, R., Goossens, M., & De Maeyer, S. (2017). Competenties kwaliteitsvol beoordelen met D-PAC. Examens: Tijdschrift voor de Toetspraktijk, 1(1), 11-17.
  • Van Gasse, R., Lesterhuis, M., Verhavert, S., Bouwer, R., Vanhoof, J., Van Petegem, P., & De Maeyer, S. (2019). Encouraging professional learning communities to increase the shared consensus in writing assessments: The added value of comparative judgement. Journal of Professional Capital and Community. Link
  • Verhavert, S., Bouwer, R., Donche, V., & De Maeyer, S. (2019). A meta-analysis on the reliability of comparative judgement. Assessment in Education: Principles, Policy & Practice, 26(5), 541-562. Link
  • Verhavert, S., De Maeyer, S., Donche, V., & Coertjens, L. (2017). Scale separation reliability: what does it mean in the context of comparative judgement? Applied Psychological Measurement, 9, 1-18. Link
  • Verhavert, S., Furlong, A., & Bouwer, R. (2022). The accuracy and efficiency of a reference-based adaptive selection algorithm for comparative judgement. In Frontiers in Education (p. 553). Link

Presentations, dissertations

  • Coertjens, L., Lesterhuis, M., De Winter, B., De Maeyer, S., & Michels, N. (2017). Assessing self-reflections in medical education using Comparative Judgement. In European Association for Research in Learning and Instruction (EARLI).
  • De Smedt, F., Lesterhuis, M., Bouwer, R., De Maeyer, S., & Van Keer, H. (2017). Het beoordelen van teksten: de beoordelingsschaal aan de hand van ankerteksten en de paarsgewijze vergelijking. In Onderwijs Research Dagen 2017.
  • Lesterhuis, M., Mortier, A., Donche, V., Coertjens, L., Vlerick, P., & De Maeyer, S. (2016). Feedback op schrijven: wat heeft de comparatieve methode te bieden?. In Onderwijs Research Dagen.
  • Mortier, A., Lesterhuis, M., Vlerick, P., & De Maeyer, S. (2015). Comparative judgment within online assessment. Presented at (Digitaal) toetsen en leren integreren, Utrecht, The Netherlands.
  • Verhavert, S. (2018). Beyond a mere rank order: The method, the reliability and the efficiency of comparative judgment (Doctoral dissertation, University of Antwerp).

Master’s theses

  • De Kinder, T. (2016). Generaliseerbaarheid van performance assessment met behulp van paarsgewijze vergelijking (Master’s thesis). University of Antwerp, Belgium.
  • Maquet, T. (2018). Beoordelingsprocessen van experten en novieten bij comparatief beoordelen van schrijfopdrachten (Master’s thesis). University of Antwerp, Belgium.