gymnastics – 8.5 to 9.0 regardless

Andy Thornton has been holding fire to the feet of FIG judges better than anyone else.

He’s just posted some disturbing statistics on Execution score trends at the international level. Here are a few of the highlight quotes from his analysis:

scores seem to be “trapped” between an 8.5 and a 9.0 regardless of the performance …

With the exception of men’s vault, it would appear from the numbers that gymnasts in general are anywhere from three to seven tenths sloppier today than they were in 2006. Is this a fact, or a function of something else going on? …

Judging in general has become much more harsh, much more unreasonable …

Men’s high bar judging has perhaps become the most outrageous and unpredictable; sometimes the cleanest routine receives an 8.7 and sometimes the sloppiest routine receives an 8.9, but the rule is no one gets above a 9.0. I miss the days even four years ago when 9.5’s and 9.6’s were given to clean routines …

As we strive for a resurgence of artistry, stricter rules are not the solution; in fact, they’re part of the problem. Today’s execution standards have not created less subjectivity in our sport; they’ve created MORE subjectivity …

I miss the days when judges felt free to throw out a 9.8, a 9.9, or even a 10.0 when a gymnast was magnificent …

Me too.

See the stats and Andy’s very logical argument on American Gymnast - A fascinating look at scoring trends

He doesn’t mention this time, but has in the past, that one of the main causes of “boxed scores” (lowest Execution score too close to highest) is that judges fear being out-of-range.

It used to be that Women’s Gymnastics was far more guilty. (They’ve always listed many subjective deductions that are near impossible to evaluate consistently.) But what’s going on in Men’s Gymnastics? … Andy’s stats show the MAG execution scores dropping even more precipitously than WAG.

Detail, for example, the 1.0 in deductions on this Pommel bronze medal routine at Worlds:

results (PDF)

I have no confidence that either FIG Technical Committee has the leadership to fix boxed scores. That means, in most cases, the highest difficulty score will continue to win. There’s no incentive to try to improve execution.

Next? … I’d love to see Andy or THE ALL AROUND do a more detailed statistical analysis on this trend.

30 comments ↓

#1 Ono No Komachi on 11.25.10 at 8:47 am

Was the trend statistically significant?

#2 PolyisTCOandbanned on 11.25.10 at 9:07 am

I already told you guys the answer.
Judges need to say what they took off for specifically. If the scores are justified, fine. If not, then sunshine will fix that. Of course there will be some errors, still, but this will improve the bulk of them.

#3 Ono No Komachi on 11.25.10 at 9:12 am

Not like it matters. This isn’t cancer research. I’m just curious.

On the other hand, it’s possible the early E scores were higher because they were given when judges lacked experience with the system, and as judges gained more expertise, the scoring has become more accurate.

That Zou Kai Olympic FX routine got a 9.35 E score.

Kosmidis got a 9.1 in Rotterdam.

Well, actually all that shows is something is wrong somewhere. Neither one of those seems right.

#4 valentin on 11.25.10 at 9:37 am

Can someone explain to me how he got 8.96 in execution? I really don’t get it at ALL.
Berki touched the pommel on his Spindle Magyar, and Louis, even though he did well, was not as dynamic or extended. He most definitely did not deserve to beat Prashanth.

#5 coach Rick on 11.25.10 at 9:38 am

No. But you know it’s true.

What you and I don’t know is … why?

Why do judges score a 9.6 pommel routine 9.0?

#6 coach Rick on 11.25.10 at 9:40 am

I looked very closely at all three before putting up the BEST routine.

One thing I like about Smith is that he never brushes the leather with his legs. Not once. But in the final he and Berki both had small form breaks. Like 0.5 form breaks back when I was a judge.

#7 George N on 11.25.10 at 2:59 pm

Agreed Rick. The deductions seem unjustified and we were both FIG judges with a critical eye. I was under the impression that the reason for moving to the new code was so that the gymnasts doing the harder skills were actually rewarded for their efforts (the scores used to be boxed all around 9.7 to 10 back then anyway, no matter which code you used it seems) but lately it seems that the difficulty aspect truly is the only thing the athletes are getting rewarded for.

Perhaps we can therefore simplify the code even further by just dropping the execution portion altogether ;-)

#8 Blair Lowe on 11.25.10 at 5:44 pm

Wouldn’t it be nice if judges actually took for execution errors, especially in college meets?

#9 Jaaa on 11.25.10 at 6:13 pm

The women ALL have horrible form. Even Nastia had bad form. If today’s gymnasts had competed in the 80s and 90s, they would have been crucified for their leg separations and bent knees.

Nastia had leg separations and feet separations on every turn on bars. Her giants should be a huge deduction in itself. She deserved those execution scores and so does everyone these days.

These scores are correct for the women.

#10 PolyisTCOandbanned on 11.25.10 at 7:15 pm

This sort of criticism would be much more useful if you actually took a routine and cited what you would take off for, specifically. Just saying “it looks good” is moronic. It’s just general impressionism.

One of the things I liked about Nastiafan is he would actually score a routine. Only then can we discuss where judges are being too picky or lenient, where “benefit of the doubt to the gymnast” should rule etc. But as is now, I don’t REALLY know if big guys like you Rick, even know what the deductions in the code are.

At least having that discussion with a specific routine and scoring it, would give us some insights. Maybe the effort backs up your point of view. Or maybe we learn some new deductions and become more educated fans.

And I am really getting sick of Andy’s analysis where he pulls boners like talking about 0.2 off or the like.

#11 CoachReavis on 11.25.10 at 11:28 pm

An interesting solution would be real-time scoring. Execution errors run 0.1, 0.3, 0.5 and 1.0. Judges use a four-button remote. A judge presses the appropriate deduction button when an infraction is made. When multiple judges deduct at the same time, the average deduction is taken for each skill. The D panel’s job is to input what skills are being done. If nothing else it would make for a great graphic on the bottom of your television. You would see the difficulty build and exicution go down. It would make the sport much more friendly to people that don’t spend their time on gymnastics blogs.
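A rough sketch of how that remote-button idea might be tallied, in Python. The function name, the input format, and the choice to count a judge who pressed nothing as a zero deduction when averaging are all assumptions made for illustration; the comment above does not specify them.

```python
from collections import defaultdict

# The four deduction sizes named in the comment, one per remote button.
BUTTONS = (0.1, 0.3, 0.5, 1.0)


def execution_score(presses, num_judges, start=10.0):
    """Tally button presses into one E score.

    presses: iterable of (skill_index, judge_id, deduction) tuples.
    For each skill, every judge's presses are summed, then averaged
    across the whole panel (assumption: no press counts as 0.0).
    """
    per_skill = defaultdict(lambda: defaultdict(float))
    for skill, judge, deduction in presses:
        if deduction not in BUTTONS:
            raise ValueError(f"no button for {deduction}")
        per_skill[skill][judge] += deduction

    total = 0.0
    for by_judge in per_skill.values():
        total += sum(by_judge.values()) / num_judges
    return round(start - total, 3)


# Hypothetical three-judge panel; the numbers are invented.
example_presses = [
    (0, "J1", 0.1), (0, "J2", 0.3),                  # skill 0: two judges deduct
    (2, "J1", 0.5), (2, "J2", 0.5), (2, "J3", 0.3),  # skill 2: all three deduct
]
example_score = execution_score(example_presses, num_judges=3)
```

Averaging over the full panel rather than only the judges who pressed a button is a design choice: it keeps one harsh judge from setting the deduction for a skill alone.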

#12 CoachReavis on 11.25.10 at 11:32 pm

Execution. Excuse me.

#13 A J on 11.26.10 at 4:54 am

Were changes in the code considered in Thornton’s analysis? For example, falls going from 0.80 in 2006 to 1.00 now?

#14 JO on 11.26.10 at 9:32 am

I know we all talk about the 90′s and 80′s like it was perfect from an execution stand point but that means we are ignoring many errors. The cowboying was all over, leaps were often not even close to 180, and you could take steps and still get a 10.0.

#15 coach Rick on 11.26.10 at 10:32 am

Diving judges use a system like that. A hand held remote.

The score is up before the gymnast emerges from the water. Nice.

… Of course they input only the FINAL score.

I saw an American judge use a laptop keyboard to both record skills and deductions in real time. He could only do it on H Bar he told me.

#16 Bills Sands – judging in ‘real time’ — Gymnastics Coaching.com on 11.26.10 at 11:07 am

[...] expert Dr. William Sands responds to Andrew Thorton’s conclusion that at the final execution score of the best gymnasts in the world has been in decline since the [...]

#17 judge&coach on 11.26.10 at 11:07 am

A) In any other kind of contest or scoring system, 8 or 9 out of 10 would be considered a good number! Why have scores like 9.8 when there are many obvious little deductions AND there are 97 other scores below 9.8, 80% of which were never used?

B) Andy Thornton’s analysis is awful for so many reasons.

First, he looks only at the top execution marks without looking at the bottom ones, too. The scores are way more spread out now, and that’s great. It helps us better distinguish great routines from good routines from mediocre all the way down to awful, ugly routines.

Second, he even admits one of the reasons for worse execution scores: gymnasts are trying harder skills (perhaps harder than they ought to in many cases), which leads to more deductions. The other two main reasons are that the errors were always there and we just ignored them, and that by making medium deductions 0.3 and large deductions 0.5, we’re hitting gymnasts harder for more serious errors, thereby encouraging better execution.

Then he bemoans high bar judging, where “clean” routines can score 8.7. We all know that “clean” does not always mean great technique. The high bar mystery is not a tough one to solve: pirouettes past handstand. Gymnasts and coaches just don’t seem to understand the fact that finishing turns past 30° is a deduction, and that they’re doing so almost every time they do a full pirouette or more! Most scores would rise if gymnasts either removed or fixed their Rybalkos and (endo) Healys to el-grip. Very few guys are doing these skills properly, and they SHOULD be penalized.

Finally, there is a much wider range of scores than Thornton acknowledges. And there ARE scores above 9, quite a few at this Worlds, in fact, not even including vault.

C) Sellathurai’s routine is wonderful, but he does a scissor to handstand that stalls on the way up, and come on guys, as former FIG judges, I am sure you see all the skewing errors. He’s got about -0.7 for that alone—and only 0.1 at a time! Skewing is probably one of the most common deductions on PH right now, just like being past handstand on high bar pirouettes, not opening up out of roll-out skills on FX, and catching without control or opening up on double flips on PBars.

D) In short, racking up the difficulty is NOT the best way to get higher scores. Fixing the execution is, and that WILL bring more artistic routines. Is it any surprise that the most balanced, artistic gymnasts on the floor won the All-Arounds this year? The judges DO reward routines with great execution. They WANT to, and you all know that. (If Zhang Hongtao’s routine from last year were judged today, he’d still get roughly a 9.6.) The problem is that most gymnasts are racking up difficulty IN SPITE of their execution.

Athletes and coaches need to figure out that the place to get more points is the execution score. When compared to the values of the skills, is -0.3 a bit harsh for some of these errors? Maybe – THAT would be a good place to focus the debate. But we have to call a spade a spade and deduct the errors SOMEHOW, so that guys have INCENTIVE to learn how to do these skills properly. Once they do, we’ll see some of the scores go up, and the most outstanding gymnasts will be the ones with the best combination of difficulty, form, and technique.

#18 coach Rick on 11.26.10 at 11:13 am

There are a few experts like yourself who defend what’s happening at the international level. Want me to post your comments as I did Bill Sands?

I talked to one of the P Bar judges in Rotterdam. He told me that apparatus went perfectly, so far as he was concerned. No bias. No cheating. Good judging.

But overall I agree with Andy more than you. Taking 0.7 for skew on that routine is far too much. He’s quite square.

On the women’s side I guarantee that it’s fear of being out of range. I don’t know what’s going on in MAG. Horizontal Bar is a disaster in every way. Andy’s right.

#19 PolyisTCOandbanned on 11.26.10 at 1:13 pm

Again, your comments are qualitative. At least go through the routine and micro-analyze. You and Andy are really trying to be general impressionists!

I’ll be the first to be on the lookout for judges cheating (the sport’s history is SHAMEFUL) or using general impressionism. But your criticism is not even well described. The dude above has detail at least!

#20 judge&coach on 11.26.10 at 2:13 pm

Thanks for the venue, but I’d rather not put my name on this. I have athletes that I don’t want hit because other judges don’t like my opinions.

Not to feed the trolls, but TCO has a valid concern: your statements are too qualitative. For example, when you say, “taking 0.7 for skew on that routine is far too much,” I wonder why–do you disagree that there are 7 circles that are slightly skewed? Or do you think that even if there were 7 slightly skewed circles, that 0.7 is still too much?

#21 denn333 on 11.26.10 at 2:52 pm

A friend of mine directed me to this blog post, which is very interesting. I am somewhere in the middle of the debate. I think that lowering and spreading out scores is fine. I also think that because the errors ARE there, you have to deduct, because if you don’t, you won’t get better performances under an a-la-carte judging paradigm. But I agree with Andy Thornton and Rick McCharles that scoring doesn’t totally make sense, and it’s frustrating. I think I have a strong sense of why, but I don’t know how to fix it.

I think that what really is bothering Rick and Andy is that the meaning of the scores has changed, and moreover, changed in a way that takes away meaning from the final numbers. We used to know what scores MEANT. For example, 9.4 represented a good, decently-composed routine with several small errors, or maybe a couple small ones and a couple more blatant ones. This meaning was not described in any of the codes, but it was widely understood. Starting in 2006, the meaning of a 9.4 changed: it could now represent just a couple of medium errors. And in 2007, it changed more when deductions of 1/2/4/8 were changed to 1/3/5/10. (Did I remember that correctly? I am losing track nowadays!) And now, judges are explicitly being directed to be even harsher: to take more deductions and to take more of the medium and large deductions. Now, 9.4 could represent a perfect, virtuosic routine with a pirouette that was a little off axis and went too far past handstand.
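To make the shift in meaning concrete, here is a small Python sketch that scores the same set of errors under both deduction scales as recalled above. Reading 1/2/4/8 and 1/3/5/10 as tenths for small, medium, large and fall is an assumption, and the routine itself is invented for illustration.

```python
# Deduction sizes in tenths of a point, as recalled in the comment above.
# Assumption: the four values map to small, medium, large, fall.
OLD_SCALE = {"small": 1, "medium": 2, "large": 4, "fall": 8}
NEW_SCALE = {"small": 1, "medium": 3, "large": 5, "fall": 10}


def e_score(errors, scale):
    """Return the E score out of 10.0 for a list of error sizes."""
    tenths = sum(scale[size] for size in errors)
    return (100 - tenths) / 10  # work in tenths to avoid float drift


# Hypothetical routine: two small errors, one medium, one large.
routine = ["small", "small", "medium", "large"]
old_score = e_score(routine, OLD_SCALE)  # 10.0 - 0.8
new_score = e_score(routine, NEW_SCALE)  # 10.0 - 1.0
```

Under these assumptions the identical performance drops from 9.2 to 9.0 without the gymnast doing anything differently, which is the kind of change in what a number "means" that the paragraph describes.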

I think that Andy is right that judging is more subjective now. If a gymnast catches a release on high bar with quite-bent arms, do you deduct 0.1 or 0.3 or 0.5? It was otherwise a great release and the gymnast corrected the giant that followed. The decision in the past was “Should I deduct or not?” The decision now is “How do the rules tell me to deduct, and do I really want to follow those rules?” And I promise, that is what judges are asking themselves at most meets. Talk to any judge: most of them know they are being asked to take 0.5 off for certain errors, but almost none of them want to do it. It just feels wrong.

But WHY?!!! I think it’s because 0.5 already has two meanings. It’s the amount we USED to deduct for a fall. But that’s something we can move on from. 0.5 is also the amount your routine goes up by when you add an F skill (compared to an A skill that is already there). F skills are REALLY FREAKING hard!!! Deducting 0.5 feels like a LOT!!! This open code system would have worked much better with a clearer, more thoughtful analysis of what 0.1, 0.2, 0.3, etc. represent in terms of both difficulty and execution – and the resulting equivalences between the two. 0.1 is the difference between a double back and a full-in on most events. It’s also the same as slight skew in Sellathurai’s pommel horse routine.

I’m not sure, but I think we’d see a lot fewer gripes if there were a 0.2 deduction available — so 0.1 (slight error), 0.2 (___ error), 0.3 (large error), with 0.5 being reserved for the REALLY LARGE errors.

I also think that a skill should be deducted only twice – once for form and once for technique and completion. From my point of view, judges have a HARD time finding four errors in a single skill, but they can look at a skill and deduct a holistic amount between 1 and 10 very intuitively. (I sat by an experienced judge recently who did just that!) That would go well with Bill Sands’ proposal to use a touch screen. The left side would be a list of numbers 1 to 10 for execution. The right side would be numbers for technique and completion.

#22 coach Rick on 11.26.10 at 3:05 pm

I was an FIG judge for about 16 years. We talked a LOT about skew.

I’ve not been a judge since 2004 but since then I’ve heard a few (different) interpretations of skew. One from a member of the MTC who said there should be no deduction for skew on long travels. That it’s a technique.

0.7 on that routine would be 0.5 more than I’ve ever heard of in the past. A completely new interpretation.

That’s insanely unfair for a very slight “error”. Or a non error.

But the criticism is fair. I’m not an active international judge right now. The interpretation could have changed. I’ll check with some judges and post on skew.

… Seems Andy’s main point has been sidetracked with this discussion, though. He’s still right that winners are now determined by difficulty score, not execution. To say otherwise is lying. And the coaches know it.

#23 coach Rick on 11.26.10 at 3:48 pm

I emailed some judges. One who was in Rotterdam, but not on Pommels, says:

Prashanth Sellathurai’s pommel routine, I don’t see 0.7 in skew – not even close.

From this angle, it is hard to see more than 0.2

#24 PolyisTCOandbanned on 11.27.10 at 1:39 am

Miss me, Denn?

#25 Ono No Komachi on 11.27.10 at 9:57 am

The argument that winners are now determined mainly by D scores and not execution because E scoring is harsher now than it was a few years ago (and I’m not sure if that’s the one really being made by Thornton) is not logical.

Athletes are ranked by the difference in their scores.

In a field where every single man got a 10 E score, the end result would be determined entirely by the D score.
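That point can be checked with a few lines of Python. The three gymnasts and their scores below are hypothetical, and the helper names are made up for this sketch: shifting every E score by the same amount leaves the ranking unchanged, while giving everyone a 10.0 E score leaves only the D score to separate them.

```python
def rank(gymnasts, e_shift=0.0):
    """Return names ordered by D + E, highest total first.

    gymnasts: list of (name, d_score, e_score) tuples.
    e_shift: amount added to every E score (a uniformly harsher or
    more generous panel).
    """
    ordered = sorted(gymnasts, key=lambda g: g[1] + g[2] + e_shift, reverse=True)
    return [name for name, _, _ in ordered]


def all_tens(gymnasts):
    """Same field, but every gymnast is handed a 10.0 E score."""
    return [(name, d, 10.0) for name, d, _ in gymnasts]


# Hypothetical field, invented for illustration.
field = [("A", 6.8, 8.5), ("B", 6.2, 9.3), ("C", 6.5, 8.9)]

as_scored = rank(field)              # cleanest routine wins here
more_generous = rank(field, 0.5)     # every E score half a point higher
only_difficulty = rank(all_tens(field))
```

In this made-up field the order is B, C, A both as scored and with the uniform shift, and it flips to A, C, B once every E score is 10.0. What matters for rankings is the spread between E scores, not how high they sit.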

If people want to see a bunch of E scores over 9, watch the US Men’s NCAA. Those judges aren’t afraid of throwing out 9.5 E scores.

#26 denn333 on 11.27.10 at 10:30 am

I have been planning on doing a statistical analysis of scores from Worlds this week. Rick, I will forward it to you when it’s done. I strongly suspect that Andy is wrong, and that E-scores actually account for rankings more than D-scores. But we’ll see what is revealed by the data.

For Sellathurai, using the rules, I can ALMOST see how he got a 9.0. I tried to judge it in real time, without pausing and rewatching. Here’s what I got:

-0.1 for stopping upward movement on scissor to hs
-0.1 for using strength on it, too
(maybe one of these “should” be -0.3???)
-0.1 skew on loop in saddle after R1080
-0.1 rhythm on the spindle (nice save)
-0.2 for skew on 2 loops on end after the R1080
-0.1 for skew on Magyar (the saddle loop)
-0.1 for landing legs apart
So I get 9.2 instead of 9.0. I guess I’d be in range, though.
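For anyone who wants to check the arithmetic, the list above adds up like this in Python. The deduction values are copied from the list; the tuple layout and the skew flag are just a convenient way to write it down.

```python
# (description, deduction in tenths, counts as skew?) taken from the list above.
deductions = [
    ("stopping upward movement on scissor to handstand", 1, False),
    ("using strength on the scissor to handstand", 1, False),
    ("skew on loop in saddle after R1080", 1, True),
    ("rhythm on the spindle", 1, False),
    ("skew on 2 loops on end after the R1080", 2, True),
    ("skew on Magyar (the saddle loop)", 1, True),
    ("landing legs apart", 1, False),
]

# Work in tenths so the totals stay exact.
total_tenths = sum(tenths for _, tenths, _ in deductions)
skew_tenths = sum(tenths for _, tenths, is_skew in deductions if is_skew)

e_score = (100 - total_tenths) / 10   # execution score out of 10.0
skew_total = skew_tenths / 10         # portion of deductions due to skew
```

That gives 0.8 in total deductions, of which 0.4 is skew, so half of the deductions in this tally come from skew alone.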

There’s a new directive to deduct leg cuts for not having large split, so maybe they hit him for 0.1 there? Hmm… And I see 0.4 in skew. Maybe one of the deductions should be 0.3? It seems harsh to deduct more than 0.1 for skew, but that’s what the rulebook says. Ugh.

Wasn’t Steve Butcher apparatus supervisor for PH this year? Maybe someone can ask him, off the record.

#27 coach Rick on 11.27.10 at 10:36 am

Looking forward to your analysis. Certainly the judges are not idiots. They’ve been instructed to judge that harshly.

#28 PolyisTCOandbanned on 11.27.10 at 7:28 pm

I’m just a fan, but the legs being apart as he does the piro for dismount seems like it should be a deduction. I don’t know if it’s in the book, but it’s unattractive and also probably gives the gymnast a mechanical advantage.

#29 shergymrag on 11.27.10 at 8:38 pm

“As we strive for a resurgence of artistry, stricter rules are not the solution; in fact, they’re part of the problem. Today’s execution standards have not created less subjectivity in our sport; they’ve created MORE subjectivity …”

More rules doesn’t create more subjectivity. How can you be more subjective about a landing deduction, for example, when the rules define what is a .1, .3, or .5 deduction? The only problem with the FIG focusing on creating more rules, is that they would really get better results if they simply focused on less incompetence, cheating, and general blindness.

#30 valentin on 12.01.10 at 11:01 pm

I also think that 0.7 on skew for Sellathurai is far too much. From that angle, as in the quote from the judge that Rick posted, I can’t see much of a skew (beyond the 15deg allowable).

I have to say I can accept your deductions, denn333. However, when you apply that level of strictness (which I am totally fine with), then Berki and Louis should have been killed for lack of extension of circle, to say the least. (Or is that a global deduction? If it is, it should not be.)
