What exactly is “statistical significance”? What does it mean if one’s school’s progress score is described as “significantly” above or below national average?
The first thing to understand is that, in the world of statistics (unlike in everyday parlance), “significantly” does not mean “very”.
Significantly ≠ Very
“Significantly below average” does not mean “very much below average”. What it really means is “we can be (reasonably) confident that this result is below average”. (But it might be only a little bit below.)
So how does it work? How is this level of confidence produced?
It’s based on what statisticians call the “95% confidence interval”. If you want to know the formula for calculating the confidence interval, you will find it in the DfE technical guidance documents (KS2 and KS4) - but for the majority of users of school data it is enough to know the basic rule that the size of the confidence interval depends upon the number of pieces of data in the dataset (i.e. number of pupil results).
The more pupils you have, the smaller the confidence interval (i.e. the more confident you can be in the conclusion).
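To see that rule in action, here is a minimal sketch in Python. It assumes the common "plus or minus 1.96 standard errors" form of a 95% confidence interval, where the standard error is a pupil-level standard deviation divided by the square root of the number of pupils. The function name and the standard deviation of 6.0 are made up for illustration; the DfE technical guidance remains the place to check the exact calculation and the real figures.

```python
import math

Z_95 = 1.96  # multiplier for a 95% interval under a normal approximation


def ci_half_width(n_pupils, pupil_sd):
    """Half-width (the '+/-' part) of a 95% confidence interval.

    Assumes half-width = 1.96 * pupil_sd / sqrt(n_pupils).
    """
    return Z_95 * pupil_sd / math.sqrt(n_pupils)


# Hypothetical pupil-level standard deviation, NOT a published DfE figure.
ILLUSTRATIVE_SD = 6.0

for n in (10, 30, 60, 120, 240):
    print(f"{n:>3} pupils: +/- {ci_half_width(n, ILLUSTRATIVE_SD):.2f}")
```

Under this assumption the interval shrinks with the square root of the cohort size, so you need four times as many pupils to halve the width of the interval.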
In the example pictured below, taken from Compare School Performance, the school has a progress score of 3.6, but the full range of the confidence interval is from 0.9 to 6.3. (Based on the number of pupils in this school, the confidence interval is +/- 2.7.)
Ofsted’s Inspection Data Summary Report shows the confidence interval in a more graphical way, as the horizontal blue line running through the blue dot (which is the progress score of 3.6).
The 95% confidence interval tells us we can be 95% confident that the "true value" of the progress score for this school lies somewhere in the range of 0.9 to 6.3. We have no idea whereabouts in that range the true score lies. (The published figure of 3.6 is simply the mid-point of that range, but the “true value” could be anywhere within the range.)
Why “95%” confident?
Let’s consider it this way. Think of your school, and all the pupils within it, as a great big experiment - like a die-rolling or coin-tossing probability experiment. When conducting any experiment, in order to verify the result, we always want to repeat the experiment multiple times to see if we get the same outcome.
So imagine 20 parallel universes. In each of these universes exists your school, with the same pupils in it, sitting the same tests, having received exactly the same teaching. But of course there are all sorts of other factors, beyond our control, that affect how the pupils perform in the tests in these different universes - factors such as what mood each pupil is in, how much sleep they had, whether they had breakfast, whether they can remember that thing you taught them several months ago, whether they are reading the questions carefully or rushing and making mistakes etc etc… So, in each of the 20 universes, the pupils will make slightly different amounts of progress, and hence the school’s progress score will vary across the 20 parallel universes.
As a fraction, 95% is equivalent to 19/20. So, the 95% confidence interval means that in 19 cases out of 20, you would end up with a progress score within that same range. So in the example above, 19 times out of 20 the progress score would lie somewhere between 0.9 and 6.3. (But in the remaining case, it could be outside of that range - above or below.)
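The parallel universes idea can be tried out with a small simulation. This is only a sketch under stated assumptions: it invents a "true" progress score for the school, treats each pupil's result as that true score plus normally distributed random noise, and counts how often the confidence interval built in each universe contains the true score. The true score, the noise level (6.0), the cohort size and the function name are all hypothetical.

```python
import math
import random


def simulate_coverage(true_score, pupil_sd, n_pupils, n_universes, seed=0):
    """Fraction of simulated universes whose 95% interval contains true_score."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    half_width = 1.96 * pupil_sd / math.sqrt(n_pupils)
    hits = 0
    for _ in range(n_universes):
        # Same pupils, same teaching, different luck on the day of the test.
        results = [rng.gauss(true_score, pupil_sd) for _ in range(n_pupils)]
        school_score = sum(results) / n_pupils
        if school_score - half_width <= true_score <= school_score + half_width:
            hits += 1
    return hits / n_universes


# Using far more than 20 universes so the proportion settles down.
coverage = simulate_coverage(true_score=3.6, pupil_sd=6.0,
                             n_pupils=20, n_universes=2000)
print(f"Interval contained the true score in {coverage:.1%} of universes")
```

With enough universes the proportion should come out close to 95%, i.e. roughly 19 in every 20.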
Because, in this example, the entire range of the confidence interval is above the national average score (zero), that means we can say with 95% confidence that progress in this school is above national average. In 20 repetitions of this experiment, on 19 occasions the progress score would still come out as being above average. It could be as low as 0.9, or as high as 6.3 - but, on those 19 occasions, it would be above zero. So ‘significance’ is not an indicator of how much better than average, just an indicator of confidence that it is above average.
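The check described above boils down to asking whether the whole interval sits on one side of the national average. A minimal sketch in Python, using the 3.6 and +/- 2.7 from the example (the function name and the other two sets of figures are invented for illustration):

```python
def significance(score, half_width, national_average=0.0):
    """Classify a progress score by where its confidence interval sits."""
    lower = score - half_width
    upper = score + half_width
    if lower > national_average:
        return "sig+"  # whole interval above average
    if upper < national_average:
        return "sig-"  # whole interval below average
    return "not significant"  # interval straddles the average


print(significance(3.6, 2.7))   # the example school: interval 0.9 to 6.3
print(significance(3.6, 4.0))   # same score, smaller cohort, wider interval
print(significance(-1.2, 0.8))  # a modest negative score with a narrow interval
```

Note that the second call uses exactly the same progress score as the first; only the width of the interval has changed, and with it the conclusion.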
Broadly speaking, there are only three sensible conclusions you can draw about school progress scores - they are either:
- statistically significantly above average
- statistically significantly below average
- not statistically significantly different to average
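The published performance tables, however, use five judgements:
- well above average (and sig+)
- above average (and sig+)
- average (not statistically significantly different to average)
- below average (and sig-)
- well below average (and sig-)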
Any school progress score that is not statistically significantly different to average will be given the middle of those five judgements. This applied to approximately 63% of schools in 2017 KS2 data and about 40% of schools in 2017 KS4 Progress 8 data.
For schools with smaller cohorts (and for larger schools focusing on smaller groups of pupils, e.g. disadvantaged) there is a greater likelihood of finding yourself in the ‘average’ category, even though the value of your progress score could be quite high or quite low, because the 95% confidence interval will be wider.
A future blog post will explore further the issues for small schools.
If you’re interested to see how confidence intervals vary with number of pupils, this spreadsheet shows the value of the confidence intervals (in 2017, KS2 and KS4) for cohort sizes from 1 to 300 pupils.