Respected expert Dr. William Sands responds to Andrew Thorton’s conclusion that the execution scores of the best gymnasts in the world have been in decline since the start of the open code and that, partly as a result, the execution scores of the top group of gymnasts in the world are ‘boxed’ between 8.5 and 9.0.
While the trends appear obvious, be aware that there is a fundamental flaw in evaluating judges: the lack of a “gold standard” with which to compare. People have tried for decades to establish an approach to evaluate judges. Gymnastics judging immediately faces two pairs of problems: reliability and validity, and cheating and incompetence. Some have argued that the actual score doesn’t matter much as long as the judges get the athletes in the right order (a sort of validity). Others have studied scores to see if the judges agree among themselves (inter-rater reliability). I would argue that the first is the most important: that the right athlete wins. The second, inter-rater reliability, is also important because when judges agree there is a tacit understanding that they’re seeing the same thing.
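To make the distinction between getting the order right and agreeing on the score concrete, here is a minimal sketch in Python. The scores are invented for illustration, and the choice of Spearman’s rank correlation (for order) and mean absolute difference (for score agreement) is an assumption of this example, not a method Dr. Sands proposes.

```python
def ranks(scores):
    """Return placements (1 = highest score). Assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    placement = [0] * len(scores)
    for place, i in enumerate(order, start=1):
        placement[i] = place
    return placement


def spearman(x, y):
    """Spearman rank correlation for two score lists without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def mean_abs_diff(x, y):
    """Average absolute score gap between two judges."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical execution scores for the same five gymnasts.
judge_a = [9.0, 8.9, 8.8, 8.7, 8.6]
judge_b = [8.7, 8.6, 8.5, 8.4, 8.3]  # 0.3 lower throughout, same order
judge_c = [8.9, 9.0, 8.7, 8.8, 8.6]  # within 0.1 of A, different winner

order_ab = spearman(judge_a, judge_b)
order_ac = spearman(judge_a, judge_c)
gap_ab = mean_abs_diff(judge_a, judge_b)
gap_ac = mean_abs_diff(judge_a, judge_c)
```

In this made-up panel, judge B is well “out of range” of judge A yet ranks the athletes identically, while judge C stays in range and still picks a different winner. Neither number says which judge is right; that would need the gold standard that is missing.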
Unfortunately, what you often find is that people can rationalize athlete placements just about any way they want. The arguments of both a score’s advocate and its detractor rapidly collapse to circularity because neither has a gold standard and both must rely largely on opinion. The problem of inter-rater reliability is the very old game of “stay in range.” I think judging rules also have to combat the second pair of problems: cheating and incompetence. Fundamentally, these are unlikely to be changed by statistics.
If judges are supposed to be basically like court stenographers, then I would suggest that the “stream” of deductions and other information be recorded against time on their computers. Clearly the technology now exists for this, and has existed for some time. Moreover, the tenets of systematic observation have been well known in scientific circles for decades. In this way, judges not only have to stay in range with their total score, they also have to be deducting for the same things (as seen in their time-based stream of data). I think editing should be allowed; in other words, the judge should be able to go back in the stream of his/her writing and add or change things that he/she couldn’t write fast enough, or about which he/she simply changed his/her mind at the end. The original data stream is always preserved, however, so that any changes are recorded as changes. Once the judge has finished and the scores are “locked,” the overall analysis of judges can proceed in several dimensions simultaneously and could serve as both an evaluative and an educational tool for judges. For fans, the stream could be displayed in real time and finally reduce some of the mystery of “how on earth did they get that score.”
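As a rough illustration of the time-based stream described above, here is a minimal Python sketch of an append-only log in which edits are stored as new entries rather than overwriting old ones. Every name in it (Event, DeductionStream, shared_deductions, the one-second matching window) and all of the sample data are hypothetical; it is one possible reading of the idea, not an existing judging system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    seq: int                        # position in the append-only log
    t: float                        # seconds from the start of the routine
    deduction: float                # e.g. 0.1, 0.3, 0.5
    note: str
    replaces: Optional[int] = None  # seq of the entry this one amends


class DeductionStream:
    def __init__(self, judge):
        self.judge = judge
        self._log = []
        self.locked = False

    def _append(self, t, deduction, note, replaces=None):
        if self.locked:
            raise RuntimeError("stream is locked")
        event = Event(len(self._log), t, deduction, note, replaces)
        self._log.append(event)
        return event.seq

    def record(self, t, deduction, note):
        return self._append(t, deduction, note)

    def amend(self, seq, deduction, note):
        """Add a corrected entry; the original stays in the history."""
        if any(e.replaces == seq for e in self._log):
            raise ValueError("entry already amended; amend the newer one")
        return self._append(self._log[seq].t, deduction, note, replaces=seq)

    def lock(self):
        self.locked = True

    def history(self):
        """Everything ever written, including superseded entries."""
        return tuple(self._log)

    def current(self):
        """Latest version of each entry, ordered by routine time."""
        superseded = {e.replaces for e in self._log if e.replaces is not None}
        live = [e for e in self._log if e.seq not in superseded]
        return sorted(live, key=lambda e: (e.t, e.seq))

    def total(self):
        return round(sum(e.deduction for e in self.current()), 3)


def shared_deductions(a, b, window=1.0):
    """Pair deductions from two judges that fall within `window` seconds."""
    pairs, unused = [], list(b.current())
    for ea in a.current():
        match = next((eb for eb in unused if abs(eb.t - ea.t) <= window), None)
        if match is not None:
            unused.remove(match)
            pairs.append((ea, match))
    return pairs


# Invented example: two judges, same total, different deductions.
a = DeductionStream("Judge A")
a.record(3.2, 0.1, "bent arms")
first = a.record(10.5, 0.3, "leg separation")
a.record(41.0, 0.3, "step on landing")
a.amend(first, 0.1, "leg separation, smaller on reflection")
a.lock()

b = DeductionStream("Judge B")
b.record(3.5, 0.1, "bent arms")
b.record(25.0, 0.3, "missed handstand")
b.record(41.2, 0.1, "hop on landing")
b.lock()

pairs = shared_deductions(a, b)
```

In this invented case both judges arrive at 0.5 in total deductions, so they are “in range,” yet only two of their three deductions line up in time. That is the kind of disagreement a total score alone would hide.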
I realize that it is naive to think this is likely to be implemented. However, if you check the judging literature, you’ll rapidly find that these same issues have plagued judging since at least 1951 (in my personal library), and probably before that.
All the best.
Wm A. Sands, PhD, FACSM, C-ARS, NR/WEMT
Bill’s now Director at the Monfort Family Human Performance Research Laboratory in Colorado.
On Vault I think we should judge exactly like they do in Diving. No paperwork. One number for execution is flashed immediately and transparently in real time.
On the other apparatus it might be possible to use an iPad or keyboard to record in real time. I recall an American MAG judge who used a keyboard to record both skills and execution on Horizontal Bar. It looked very accurate to me.
Bill’s citations on the gymnastics judging problem(s) are posted in the comments.