Bill Sands – judging in ‘real time’

Respected expert Dr. William Sands responds to Andrew Thorton’s conclusion that the execution scores of the best gymnasts in the world have been in decline since the start of the open code and that, partly as a result, the execution scores of the top group of gymnasts in the world are ‘boxed’ between 8.5 and 9.0.

While the trends appear obvious, be aware that there is a fundamental flaw in evaluating judges – the lack of a “gold-standard” with which to compare. People have tried for decades to establish an approach to evaluate judges. Gymnastics judging immediately faces four paired problems: reliability and validity – and – cheating and incompetence. Some have argued that the actual score doesn’t matter much as long as the judges get the athletes in the right order (a sort of validity). Others have studied scores to see if the judges agree among themselves (inter-rater reliability). I would argue that the first is the most important, that the right athlete wins. The second, inter-rater reliability, is also important because when judges agree there is a tacit understanding that they’re seeing the same thing.

Unfortunately, what you often find is that people can rationalize athlete placements just about any way they want. The arguments of both a score’s advocate and its detractor rapidly collapse into circularity because neither has a gold-standard and both must rely largely on opinion. The problem of inter-rater reliability is the very old game of “stay in range.” I think judging rules also have to combat the second pair of problems: cheating and incompetence. Fundamentally, these are unlikely to be changed by statistics.

If judges are supposed to be basically like court stenographers, then I would suggest that the “stream” of deductions and other information be recorded in terms of time on their computers. Clearly the technology now exists for this, and has existed for some time. Moreover, the tenets of systematic observation have been well known in scientific circles for decades. In this way, judges not only have to stay in range with their total score, they also have to be deducting for the same things (as seen in their time-based stream of data). I think editing should be allowed; in other words, the judge should be able to go back in the stream of his/her writing and add or change things that he/she couldn’t write fast enough, or about which he/she simply changed his/her mind at the end. The original data stream is always preserved, however, so that any changes are recorded as changes. Once the stream is completed by the judge and the scores are “locked,” the overall analysis of judges can proceed in several dimensions simultaneously and could serve as both an evaluative and educational tool for judges. For fans, the stream could be displayed in real time and finally reduce some of the mystery of “how on earth did they get that score.”
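A minimal sketch of what such a time-based stream might look like, offered only as an illustration of the idea and not as any existing judging system. The names `Event`, `JudgeStream`, `record`, `current` and `total` are hypothetical. The assumption is that every entry carries a routine time, that an edit is appended as a new entry pointing at the one it amends (so the original is never overwritten), and that nothing can be added once the stream is locked.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    t: float                        # seconds into the routine the entry refers to
    deduction: float                # e.g. 0.1 or 0.3
    note: str                       # what the judge saw
    replaces: Optional[int] = None  # index of the earlier entry this one amends


@dataclass
class JudgeStream:
    judge: str
    events: List[Event] = field(default_factory=list)
    locked: bool = False

    def record(self, t, deduction, note, replaces=None):
        """Append an entry; an edit is just a new entry that names the old one."""
        if self.locked:
            raise RuntimeError("stream is locked; no further entries allowed")
        if replaces is not None and not 0 <= replaces < len(self.events):
            raise IndexError("edit refers to an entry that does not exist")
        self.events.append(Event(t, deduction, note, replaces))
        return len(self.events) - 1

    def lock(self):
        """Freeze the stream once the judge has finished."""
        self.locked = True

    def current(self):
        """Entries still in force, i.e. not amended by a later entry."""
        superseded = {e.replaces for e in self.events if e.replaces is not None}
        return [e for i, e in enumerate(self.events) if i not in superseded]

    def total(self):
        """Sum of the deductions still in force."""
        return round(sum(e.deduction for e in self.current()), 2)
```

For example, a judge who records a 0.1 at 12.4 seconds, a 0.3 at 31.0 seconds, and then amends the first entry to 0.3 ends up with three entries in `events` but only two in `current()`. The full list is what an analysis of judges would compare across the panel; the amended view is what produces the score.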

I realize it is naive to think this is likely to be implemented. However, if you check the judging literature, you’ll rapidly find that these same issues have plagued judging since at least 1951 (in my personal library), and probably before that.

All the best.

Wm A. Sands, PhD, FACSM, C-ARS, NR/WEMT

Bill’s now Director at the Monfort Family Human Performance Research Laboratory in Colorado.

On Vault I think we should judge exactly like they do in Diving. No paperwork. One number for execution is flashed immediately and transparently in real time.

On the other apparatus it might be possible to use an iPad or keyboard to record in real time. I recall an American MAG judge who used a keyboard to record both skills and execution on Horizontal Bar. It looked very accurate to me.

Bill’s citations on the gymnastics judging problem(s) are posted in the comments.

10 comments ↓

#1 coach Rick on 11.26.10 at 11:14 am

From Bill Sands:

Here are some citations if anyone is interested in seeing how others have attacked the gymnastics judging problem(s).

1. Ansorge, C. J. and Scheer, J. K. International bias detected in judging gymnastic competition at the 1984 Olympic Games. Research Quarterly for Exercise and Sport. 1988; 59(2):103-107.
2. Ansorge, C. J.; Scheer, J. K.; Laub, J., and Howard, J. Bias in judging women’s gymnastics induced by expectations of within-team order. The Research Quarterly. 1978; 49(4):399-405.
3. Arkes, H. R. Costs and benefits of judgment errors: implications for debiasing. Psychological Bulletin. 1991; 110(3):486-498.
4. Bard, C.; Fleury, M.; Carriere, L., and Halle, M. Analysis of gymnastics judge’s visual search. Research Quarterly for Exercise and Sport. 1980; 51(2):267-273.
5. Bartczak, G. M. The use of decision rules in evaluation of gymnastic exercises. Biology of Sport. 1988; 5(1):65-78.
6. Bastuscheck, C. and Wettstone, G. A test for determining bias in judging. Aronson, R. M. The art and science of judging men’s gymnastics. Lowell, MA: Lowell Technological Institute; 1970; pp. 58-61.
7. Boen, F.; van Hoye, K.; Auweele, Y. V.; Feys, J., and Smits, T. Open feedback in gymnastics judge causes bias based on informational influencing. Journal of Sports Science. 2008; 26(6):621-628.
8. Borysowicz, M. A. Stress in the lives of high-level women’s gymnastics judges. Journal of Sport & Exercise Psychology. 1995; 17:S18.
9. Calkin, G. F. The future of computers in evaluating judges. Aronson, R. M. The art and science of judging men’s gymnastics. Lowell, MA: Lowell Technological Institute; 1970; pp. 98-99.
10. Cardinali, J. The gymnastics judge’s clinic. Aronson, R. M. The art and science of judging men’s gymnastics. Lowell, MA: Lowell Technological Institute; 1970; pp. 28-31.
11. Claessens, A. L.; Lefevre, J.; Beunen, G., and Malina, R. M. The contribution of anthropometric characteristics to performance scores in elite female gymnasts. Journal of Sports Medicine and Physical Fitness. 1999; 39(4):355-360.
12. Consonni, W. Nuovi orizzonti per le parallele asimmetriche. Gymnica. 1990 Oct; 22-5.
13. Davis, B. A. How to handle a conference after a meet. Aronson, R. M. The art and science of judging men’s gymnastics. Lowell, MA: Lowell Technological Institute; 1970; pp. 12-13.
14. Donovan, J. Karolyi questions fairness of judges. The Cincinnati Post. 1991 Jun 8; 1B, 5B.
15. Engeler, A. Fall from grace. Rolling Stone. 1988 Sep 22; (535):95-99.
16. Faulkner, J. and Loken, N. Objectivity of judging at the National Collegiate Athletic Association gymnastic meet: A ten-year follow-up study. The Research Quarterly. 1962; 33(3):485-486.
17. Fay, J. There he goes a’ Karolyi-ng. The Cincinnati Enquirer. 1991 Jun 8; B1, B6.
18. Flatten, E. K. A study of the relationship between amplitude scored by gymnastic judges and measured by cinematographic techniques: Indiana University; 1974 Doctorate.
19. Franks, I. M. The effects of experience on the detection and location of performance differences in a gymnastic technique. Research Quarterly for Exercise and Sport. 1993; 64(2):227-231.
20. George, G. S. Execution – the ultimate in performance. Aronson, R. M. The art and science of judging men’s gymnastics. Lowell, MA: Lowell Technological Institute; 1970; pp. 66-67.
21. Godbout, P.; Fink, H.; Lascari, A.; Mazéas, H., and Wilson, V. E. Issues in the judging of gymnastics: A panel. Salmela, J. H. The advanced study of gymnastics. Springfield, IL: Charles C. Thomas; 1976; pp. 167-182.
22. Hanscom, R. The beginning judge. Aronson, R. M. The art and science of judging men’s gymnastics. Lowell, MA: Lowell Technological Institute; 1970; pp. 6-7.
23. Hauser, M. Behind-the-scenes power struggle hurting women’s gymnastics? The Houston Post. 1988 Jul 10; 2C.
24. Holmes, B. Judging horizontal bar. Aronson, R. M. The art and science of judging men’s gymnastics. Lowell, MA: Lowell Technological Institute; 1970; p. 47.
25. Hudson, M. A. Ex-U.S. gymnastics coach admits fix. Los Angeles Times. 1988 Apr 28; 1,13.
26. —. Gymnastics unit to move on cheating. Los Angeles Times. 1988 May 4; 3,10.
27. —. World gymnastics officials say score fixing is hard to control. Los Angeles Times. 1988 Apr 29; 1,8.
28. Hunsicker, P. and Loken, N. The objectivity of judging at the National Collegiate Athletic Association gymnastic meet. The Research Quarterly. 1951; 22:423-426.
29. Huval, L. J. Poor judging fosters poor gymnastics. Aronson, R. M. The art and science of judging men’s gymnastics. Lowell, MA: Lowell Technological Institute; 1970; pp. 1-3.
30. International Gymnastics Federation. 1997 – 2000 Code of Points Women’s Artistic Gymnastics. Indianapolis, IN: International Gymnastics Federation; 1997.
31. Johnson, M. Objectivity of judging at the National Collegiate Athletic Association gymnastic meet: A twenty-year follow-up study. The Research Quarterly. 1971; 42:454-455.
32. Kjeldsen, E. A coach’s view on the compilation of scores during competition. Aronson, R. M. The art and science of judging men’s gymnastics. Lowell, MA: Lowell Technological Institute; 1970; pp. 52-55.
33. Leskosek, B.; Cuk, I.; Karacsony, I.; Pajek, J., and Bucar, M. Reliability and validity of judging in men’s artistic gymnastics at the 2009 University Games. Science of Gymnastics. 2010; 2(1):25-34.
34. Maloney, T. Form and technique and how it applies to leniency. Aronson, R. M. The art and science of judging men’s gymnastics. Lowell, MA: Lowell Technological Institute; 1970; pp. 105-106.
35. Massimo, J. What to look for and how to judge rings. Aronson, R. M. The art and science of judging men’s gymnastics. Lowell, MA: Lowell Technological Institute; 1970; pp. 40-42.
36. Massimo, J. L. The role of psychological factors in the evaluation process. Aronson, R. M. The art and science of judging men’s gymnastics. Lowell, MA: Lowell Technological Institute; 1970; pp. 92-94.
37. National Women’s Program Committee. 2010-2011 Women’s Program Rules and Policies. Indianapolis, IN: USA Gymnastics; 2009.
38. Osborn Bowers, C.; Klein Fie, J., and Bodo Schmid, A. Judging and coaching women’s gymnastics. Palo Alto, CA: Mayfield; 1981.
39. Plessner, H. Expectation biases in gymnastics judging. Journal of Sport & Exercise Psychology. 1999; 21:131-144.
40. Potts, K. Mental gymnastics. The NCAA News. 2002; 39(3):1, 18.
41. Robinson, J. Marsden tells of score-fixing at world gym meet. Deseret News. 1988 Apr 28; D1.
42. Sands, W. A. and Kipp, R. W. Gymnastics judging and the assessment of objectivity. Technique. 1992; 12(9):17-22.
43. Sands, W. A. and McNeal, J. R. Limits to performance Women’s NCAA Championships. Technique. 2000; 20(7):5-7.
44. Scheer, J. K. Effect of placement in the order of competition on scores of Nebraska high school students. The Research Quarterly. 1973; 44(1):70-85.
45. Scheer, J. K. and Ansorge, C. J. Influence due to expectations of judges: a function of internal-external locus of control. Journal of Sport Psychology. 1979; 1:53-58.
46. Scheer, J. K.; Ansorge, C. J., and Howard, J. Judging bias induced by viewing contrived videotapes: a function of selected psychological variables. Journal of Sport Psychology. 1983; 5:427-437.
47. Science News – Behavior. Memories trip up gymnastics scores. Science News. 1991; 139(10):159.
48. Ste-Marie, D. M. Expertise in women’s gymnastic judging: an observational approach. Perceptual and Motor Skills. 2000; 90:543-546.
49. Ste-Marie, D. M. International bias in gymnastic judging: conscious or unconscious influences? Perceptual and Motor Skills. 1996; 83:963-975.
50. Ste-Marie, D. M.; Valiquette, S. M., and Taylor, G. Memory-influenced biases in gymnastic judging occur across different prior processing conditions. Research Quarterly for Exercise and Sport. 2001; 72(4):420-426.
51. Ste-Marie, D. M.; Valiquette, S. M., and Taylor, G. Memory-influenced biases in gymnastics judging occur across different prior processing conditions. Research Quarterly for Exercise and Sport. 2001; 72(4):420-426.
52. Steeves, F. J. Discussion of parts of no value. Aronson, R. M. The art and science of judging men’s gymnastics. Lowell, MA: Lowell Technological Institute; 1970; pp. 86-88.
53. Stephenson, D. A. and Jackson, A. S. The effects of training and position on judge’s ratings of a gymnastic event. The Research Quarterly. 1977; 48(1):177-180.
54. Tonry, D. Judging the side horse event. Aronson, R. M. The art and science of judging men’s gymnastics. Lowell, MA: Lowell Technological Institute; 1970; pp. 35-37.
55. Uram, P. and McKinnis, D. How can a judge dictate the style of gymnastics? Aronson, R. M. The art and science of judging men’s gymnastics. Lowell, MA: Lowell Technological Institute; 1970; pp. 18-21.
56. Valiquette, S. and Downey, P. Effects of the performer’s body shape on gymnastics judging. Research Quarterly for Exercise and Sport. 1998; 69(1):A-117.
57. Weber, C. Mounts and dismounts, a commensurate part of an exercise. Aronson, R. M. The art and science of judging men’s gymnastics. Lowell, MA: Lowell Technological Institute; 1970; pp. 71-72.
58. Weiss, R. W. Two objective methods of rating gymnastics judges: University of Utah; 1979 Doctorate.
59. Wells, F. A. Judging the compulsory exercise. Aronson, R. M. The art and science of judging men’s gymnastics. Lowell, MA: Lowell Technological Institute; 1970; pp. 109-110.
60. Wilson, V. E. Judging gymnastic judging. Salmela, J. H. The advanced study of gymnastics. Springfield, IL: Charles C. Thomas; 1976; pp. 151-166.
61. —. Objectivity, validity, and reliability of gymnastic judging. The Research Quarterly. 1976; 47(2):169-173.
62. Wright, J. To hold or not to hold. Aronson, R. M. The art and science of judging men’s gymnastics. Lowell, MA: Lowell Technological Institute; 1970; pp. 78-83.

#2 Anon on 11.26.10 at 1:35 pm

Why not use the same system that Figure skating uses? They have slow-mo replay and can watch each element being performed if they need to. It’s a computerized system where they give marks for each element as it’s being performed but can go back and check jumps and whatnot. It still has its flaws but it’s working better than the system gymnastics has in place right now.

#3 Bill on 11.26.10 at 2:03 pm

Actually, I think we’re saying the same thing. From working with Figure Skating, I can say they’re not exactly thrilled with their system either, but your proposal is sound, and I picture (someday) a gymnastics performance occurring with the “crawler” below the screen showing the values and deductions as the performance proceeds. A replay could then show a video of the gymnast’s routine with the judges’ impressions below. A sort of tally could then be kept alongside the video to indicate how the score “developed.” I know it’s very unlikely that such a system would ever be implemented for a whole bunch of reasons, but from my view – what would be the harm?

#4 bc on 11.26.10 at 2:09 pm

I don’t like the idea of slow-motion replay. Slow motion distorts technique. I think the problem with the vault finals was that there were some very obvious execution problems (in real time) that weren’t deducted for. I thought Andy’s comparison of the E scores was very illustrative of the problem. The fact that their execution scores were even remotely close was kind of worrisome.

#5 coach Rick on 11.26.10 at 2:12 pm

One of the biggest complaints I heard in Rotterdam was that video replay was being used too often in the WAG meet.

Only Nellie Kim gets to see the video. She then phones down to the Head Judge.

It’s difficult to argue when she has the video. And you don’t.

How do we let all judges watch the video replay without slowing down the meet even more?

… I think it should be used RARELY. After a written protest. Or in case of no compromise in the panel.

#6 PolyisTCOandbanned on 11.27.10 at 1:41 am

I think the rules are that it’s not used for execution. No?

#7 Anna on 11.27.10 at 6:27 am

I’ve always thought in trampoline, each execution deduction should be immediately entered into a numberpad/computer system. There would only be a need for six fingers (possibly one hand records deductions 0.1 – 0.5, and the other hand uses space bar for 0.0).

Trampoline judging lends itself to an incredible amount of statistical study. I believe it is possible to set the “most accurate” deduction for a skill, and then check the standard deviation of judges’ scores against it, when looking for quality and consistency in judging.
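A rough sketch of that kind of check, assuming a reference (“most accurate”) deduction has somehow already been agreed for each skill, which is of course the hard part. The function name `judge_report` and the data layout are made up for illustration. For each judge it reports the average signed difference from the reference (too harsh or too lenient) and the standard deviation of those differences (how consistent the judge is).

```python
from statistics import mean, pstdev


def judge_report(reference, judged):
    """Compare each judge's per-skill deductions with the reference deductions.

    reference: list of reference deductions, one per skill
    judged:    dict mapping judge name -> list of that judge's deductions
    """
    report = {}
    for judge, deductions in judged.items():
        if len(deductions) != len(reference):
            raise ValueError(f"{judge}: expected {len(reference)} deductions")
        # Positive error = judge took more than the reference deduction.
        errors = [d - r for d, r in zip(deductions, reference)]
        report[judge] = {
            "bias": round(mean(errors), 3),      # average harshness or leniency
            "spread": round(pstdev(errors), 3),  # consistency around that bias
        }
    return report
```

A judge who is always 0.1 high would show a bias of 0.1 and a spread near zero (consistent but harsh), while a judge who is high on some skills and low on others could show almost no bias and still have a large spread.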

#8 TheHeatherSlayer on 11.27.10 at 6:13 pm

I don’t think that there is a problem. Our sport went through how many decades till we saw a 10.0? Why should we expect athletes to be scoring so high in execution with this new system that has only been in place for 4 years? Gymnasts and coaches are just starting to understand it.

I think Todd is just using this topic as an agenda to attack open scoring.

#9 PolyisTCOandbanned on 11.27.10 at 7:24 pm

I think in the NFL, they have a process of reviewing officials’ calls and giving them feedback (internally). It’s not expected that with all the rules and the complicated, fast play that officials will find every foul or be correct on every call. But there is still a feedback mechanism. And they can tell which officials are better than others. The system in general makes them all elevate their game.

#10 shergymrag on 11.27.10 at 8:23 pm

“I picture (someday) a gymnastics performance occurring with the “crawler” below the screen showing the values and deductions as the performance proceeds.”

I’ve wished this was how things were done myself.

“But there is still a feedback mechanism. And they can tell which officials are better than others.”

That’s great.
