Misfit: statistics explained

Assessing with Comproved provides a lot of useful information. Often this information is self-explanatory, such as a product’s place in the ranking or the time it took an assessor to make a comparison. However, some aspects of the comparison tool are rather technical and thus require a bit more explanation. In this article, we explain the misfit statistic.

Comparative judgement relies on a shared understanding of what is good. In educational contexts, this is often about whether students’ work meets the quality criteria. Based on this conception of ‘a good product’, assessors then choose between two products: which product is better in the light of the particular competence they want to measure? The result of those judgements is a ranking. To arrive at a meaningful ranking, there must be a high degree of mutual agreement among assessors on what ‘a good product’ is.

Now, each assessor will deviate from this estimated model (the ranking) to a greater or lesser extent. A misfit reflects such a deviation from the model. On the one hand, we can calculate to what extent the assessors’ choices are in line with the estimated ranking. We call this the assessor’s (mis)fit. On the other hand, we can calculate the measure of agreement within the group of assessors with respect to a given product. We call that the (mis)fit of the product.

Assessor misfit

The rank order translates into probabilities: the higher a product is placed relative to another, the more likely assessors are to choose it as the better one in a direct comparison between the two. If the quality of two products is far apart, nearly all assessors will choose the same product as better, so the probability of them making the same decision is high.
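The article does not specify Comproved’s exact model, but comparative judgement tools commonly fit a Bradley-Terry-style model, in which the probability of choosing one product over another depends on the difference between their estimated quality scores. A minimal sketch of that idea (the function name and the logit-scale ability values are illustrative assumptions, not Comproved’s API):

```python
import math

def p_choose_a(ability_a: float, ability_b: float) -> float:
    """Bradley-Terry probability that product A is judged better than
    product B, given their estimated abilities on a logit scale."""
    return 1.0 / (1.0 + math.exp(-(ability_a - ability_b)))

# Products far apart in quality: assessors will almost all agree.
far_apart = p_choose_a(3.0, -1.0)   # close to 1

# Products close in quality: the choice is nearly a coin flip.
close_together = p_choose_a(0.2, 0.0)   # close to 0.5
```

With abilities four logits apart the predicted agreement is very high, which is why a single deviating choice on such a pair is much more informative than a deviating choice between two near-equal products.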

If an assessor repeatedly makes a different decision than the model would predict, that assessor’s fit with the estimated ranking decreases. Assessors who deviate significantly more than average are marked as misfits.
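One common way to quantify this is to average squared standardized residuals over an assessor’s judgements, as in Rasch-style fit statistics. The sketch below is a generic illustration of that approach, not necessarily Comproved’s exact formula:

```python
import math

def choice_probability(winner_ability: float, loser_ability: float) -> float:
    """Bradley-Terry probability of the product the assessor chose
    actually being the better one, given estimated abilities."""
    return 1.0 / (1.0 + math.exp(-(winner_ability - loser_ability)))

def assessor_misfit(judgements: list[tuple[float, float]]) -> float:
    """Mean squared standardized residual over an assessor's judgements.

    judgements: (ability_chosen, ability_other) pairs, where
    ability_chosen belongs to the product the assessor picked as better.
    Values near 1 indicate good fit; clearly larger values suggest the
    assessor often contradicts the estimated ranking.
    """
    z2 = []
    for chosen, other in judgements:
        p = choice_probability(chosen, other)  # expected P(this choice)
        residual = 1.0 - p                     # observed (1) minus expected
        variance = p * (1.0 - p)               # Bernoulli variance
        z2.append(residual ** 2 / variance)
    return sum(z2) / len(z2)

# An assessor who mostly agrees with the ranking scores low:
fit = assessor_misfit([(2.0, 0.0), (1.5, -0.5), (0.5, -1.0)])

# An assessor who often picks the lower-ranked product scores high:
misfit = assessor_misfit([(0.0, 2.0), (-0.5, 1.5), (0.5, -1.0)])
```

Note that unexpected choices between products far apart in quality inflate the statistic much more than choices between near-equal products, matching the intuition above.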

What can you do when an assessor is marked as misfit?
  • You can look at the time the assessor spent on the comparisons. If the choices are made very quickly, or much faster than those of the other assessors in the assessment, this may be an indication of sloppy work.
  • You can look at the feedback or arguments entered by the assessor in the comparisons. Are these arguments valid or not? Perhaps the assessor has good reasons for the choices he or she made and the other assessors can learn something.
  • If there are multiple misfits, you might want to check whether the assessors fall into distinct groups, e.g. experts vs. novices, or followers of particular ‘schools of thought’.

Product misfit

A misfit can also be calculated at the product level. In that case, when there is a significant deviation from the average, it means that there is less agreement within the group of assessors about that particular product.

Specifically, some assessors judge a certain product as better in a comparison, while other assessors would judge that product (or a similar product) as worse in a similar comparison.
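The same residual-based idea can be applied per product, aggregating over all comparisons that product was involved in. Again, this is an illustrative sketch of the principle, not Comproved’s exact computation:

```python
import math

def choice_probability(product_ability: float, opponent_ability: float) -> float:
    """Bradley-Terry probability that this product wins the comparison."""
    return 1.0 / (1.0 + math.exp(-(product_ability - opponent_ability)))

def product_misfit(comparisons: list[tuple[float, bool]],
                   product_ability: float) -> float:
    """Mean squared standardized residual over one product's comparisons.

    comparisons: (opponent_ability, won) tuples, where won is True if
    this product was chosen as better. High values mean assessors
    disagree about this product more than the model expects.
    """
    z2 = []
    for opponent_ability, won in comparisons:
        p = choice_probability(product_ability, opponent_ability)
        observed = 1.0 if won else 0.0
        z2.append((observed - p) ** 2 / (p * (1.0 - p)))
    return sum(z2) / len(z2)

# A product that beats stronger products but loses to weaker ones
# (assessors disagree about it) scores a high misfit:
contested = product_misfit([(2.0, True), (-2.0, False), (1.5, True)], 0.0)

# A product whose wins and losses follow the ranking scores low:
consistent = product_misfit([(2.0, False), (-2.0, True), (1.5, False)], 0.0)
```

A contested product like this is exactly the kind of case where reading the assessors’ feedback, as suggested below, pays off.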

What can you do when a product is marked as misfit?
  • Again, you can look at the feedback provided by the assessors for the comparisons in question.
  • You can analyse the work itself to see if it might contain misleading information. E.g. a very different layout, a lot of jargon, …
  • You can put the findings from the previous two steps together and discuss them with the assessors involved. That way, they gain insight into each other’s motivations, which can be very enlightening. If necessary, the grade given to the product in question can be corrected manually.