Following on from this earlier post, explaining the concept of statistical significance, this post explores a couple of interesting situations that can arise in school data, where different data tools appear to be giving conflicting messages about the same set of results.
Both of these examples come from genuine queries that I have received from Headteachers. With their kind permission, I am sharing the queries, including some snapshots of their data, along with my explanations.
ASP and CSP show that my school’s progress in writing is Average (a category that incorporates about 66% of schools).
But in the IDSR, progress in writing is shown in the bottom 20% (5th quintile).
How can we be both ‘average’ and in the bottom 20%?
Your writing progress score (-2.9) is indeed in the bottom quintile of the national range of progress scores. In fact, it is at the 90th percentile, so just inside the bottom 10% of scores.
On page 9 of the IDSR, which shows the progress scores in a graphical form (image below), you can see that the blue circle (which represents your writing progress score) appears exactly on top of the dotted red line (which shows the 90th percentile i.e. the cusp of the bottom 10% of scores).
However – and this is the key point – the progress score is not statistically significantly different to average. You can see that the upper limit of the confidence interval is in the positive half of the chart (above zero). This is also shown in ASP and CSP – the confidence interval for Writing goes from -7.1 to +1.3. This is why in ASP and CSP the progress categorisation is ‘Average’ (the yellow zone). Any progress score that is not statistically significantly different to average is shown in this Average zone (even though the actual progress score itself is a very low score, in the lowest 10% of the national distribution of progress scores).
The fact that this low score is not significantly different to average is due to the small number of children (8 children counted in the progress measure).
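The confidence intervals quoted here are consistent with a calculation of the form "score ± 1.96 × sd ÷ √n", where sd is the national standard deviation of pupil-level writing progress scores. As a minimal sketch, assuming that formula and a value of roughly 6.06 for sd (a figure inferred from the numbers above, not an official constant):

```python
import math

# Sketch of the confidence-interval calculation, assuming the form
# CI = score ± 1.96 * sd / sqrt(n). The sd of ~6.06 for writing is
# inferred from the figures quoted above, not an official DfE constant.
def confidence_interval(score, n, sd=6.06):
    half_width = 1.96 * sd / math.sqrt(n)
    return score - half_width, score + half_width

low, high = confidence_interval(-2.9, 8)   # the school's 8-pupil cohort
print(round(low, 1), round(high, 1))       # -7.1 1.3, matching ASP/CSP
print(high > 0)                            # True: the interval straddles zero
```

Because the √n term sits in the denominator, the same score of -2.9 with a cohort of, say, 50 pupils would give a much narrower interval, entirely below zero, and hence a 'significantly below average' judgement.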
And, digging into the data a bit more using the scatterplots in ASP, I found that the overall progress score has been driven very largely by one child with an exceptionally low progress score. The other 7 children are much closer to the ‘average progress’ line (some above, some below).
To further illustrate how much the progress score has been driven by one child, FFT Aspire now includes the very useful facility of allowing you to edit out individual pupils from the data to explore the effect those pupils had on the overall scores. In this particular school’s case, removing this one child from the data changes the progress score from -2.9 to 0.0 (i.e. it becomes exactly in line with national average progress).
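The arithmetic behind this is easy to reproduce. The pupil scores below are hypothetical, chosen purely to match the pattern described (seven pupils close to average, one extreme outlier, cohort mean of -2.9); they are not the school's actual data:

```python
# Hypothetical pupil-level progress scores: seven pupils near average
# progress (summing to zero) plus one extreme outlier. Illustrative
# numbers only, not the school's real data.
scores = [1.5, -2.0, 3.0, -1.0, 0.5, -2.5, 0.5, -23.2]

cohort_mean = sum(scores) / len(scores)
print(round(cohort_mean, 1))          # -2.9: the published progress score

without_outlier = scores[:-1]         # edit out the one extreme pupil
revised_mean = sum(without_outlier) / len(without_outlier)
print(round(revised_mean, 1))         # 0.0: exactly in line with national average
```

In a cohort of 30 the same outlier would shift the mean by less than a point; in a cohort of 8 it shifts it by nearly three.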
The smaller the size of the cohort, the more it can be affected in this way by the performance of just one child. This is what is meant by being aware of the impact of ‘statistical outliers’ in the data.
And it is important to bear in mind that Ofsted’s September 2017 update on school inspection states:
“inspectors must be cautious in making any inferences about underperformance of small numbers of pupils in schools in any group”
This is very much an issue for small schools: it is far less likely for a small school to have a score that is statistically significantly different to average. This cuts both ways, though - harder to be significantly above average, but equally unlikely to be significantly below.
ASP shows our writing progress as well below average.
CSP shows the writing progress as average.
How can this be?
This is a curious one and it all comes down to the fact that CSP shows the figures to just 1 decimal place, whereas ASP uses 2 decimal places.
The actual progress score (i.e. the average of the 14 individual pupils’ progress scores) is, to 3 decimal places, -3.224.
The confidence interval, on the basis of 14 pupils, is “Progress Score +/- 3.168”, i.e. -6.392 to -0.056.
So, in ASP (using 2 decimal places) the progress score is shown as -3.22, but the full range of the confidence interval is from -6.39 to -0.06 (statistically significantly below average, as the full range is below zero).
CSP, however, appears to have rounded the progress score to -3.2 and rounded the confidence interval to +/-3.2, giving a range -6.4 to 0.0 (no longer statistically significantly below average, as the confidence interval now includes zero).
(Mathematically, this is an odd thing to do. Rounding should really only be done at the final stage of a calculation, not at intermediate points, as that can distort the outcome, as it has here. Logically, rounding to 1 decimal place should have given a confidence interval of -6.4 to -0.1: back to being ‘sig minus’.)
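The two rounding approaches can be compared directly, using the figures quoted above:

```python
score, half_width = -3.224, 3.168   # the 14-pupil figures quoted above

# Sound approach: compute exactly, round only the final endpoints.
exact_low, exact_high = score - half_width, score + half_width
print(round(exact_low, 1), round(exact_high, 1))   # -6.4 -0.1 (still wholly below zero)

# CSP's apparent approach: round the inputs first, then combine.
s, h = round(score, 1), round(half_width, 1)       # -3.2 and 3.2
early_low, early_high = s - h, s + h
print(early_low, early_high)                       # -6.4 0.0 (now touches zero)
```

The same underlying data thus yields 'significantly below average' one way and 'not significant' the other, purely because of where in the calculation the rounding happens.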
So, back to ASP. Because the whole range of the confidence interval is below zero, the progress score is statistically significantly below average. Coupled with that, the score of -3.22 is sufficiently low to place it in the ‘Well Below Average’ category. (This DfE guidance document tells us that, in writing, any score of -2.9 or lower, if statistically significant, is in the bottom 10% of scores and hence classified as ‘Well Below Average’.)
Whereas CSP places this score in the 'Average' band as it has found the score to be not statistically significantly different to average, owing to a slightly quirky mathematical methodology.
So an interesting situation exists here whereby the data that is in the public domain (CSP) shows progress in writing as average. But both ASP and the IDSR, which Ofsted would use, show a very different picture.
As I pointed out to the Headteacher in question, the figures shown in ASP and IDSR are definitely the ones to focus on. The fact that CSP implies something else is unlikely to improve the situation in an inspection.
I hope that these two examples may prove useful to other users of school data who are finding that different data tools appear to be giving conflicting messages about their results.