Educational RCTs

Around a decade ago I did a PGCE. Part of that PGCE involved a small-scale investigative study, from which I learnt a great deal about educational research, but not – I suspect – any of the intended learning outcomes.

I was teaching a year-7 class, whose graphing skills needed improvement. Students would often forget to give meaningful titles* to graphs. They would forget to label axes; they would join-the-dots rather than adding a line of best fit; and so on.

So I prepared a model scatter-plot crib for them to stick in the back of their exercise books, as an educational intervention. I intended to tell them to compare their own graphs with it each time they drew one. But then I thought, “might it not be better if I could highlight the things they forget most often too?” So I collected four sets of data from each student before the intervention so I could tailor each crib to the individual student.

Practice makes perfect [CC-BY-SA-3.0 Steve Cook]

The generic crib would remind Zainab-Zahra about the most commonly forgotten things; the specific crib would remind Billy-Bob to put in a line of best fit. I am quite aware this crib contains unironic use of Comic Sans. The shame still burns inside like thrush.

The hassle involved in collecting and classifying their errors to prepare individual cribs was large. Was it worth it?

The obvious answer was to randomly assign the students in the class to receive either the specific crib, or the generic crib, and then to measure the change (if any) in their graphing ability on the next four graphs they plotted. If the specific crib caused a marked improvement compared to the generic crib, and – particularly – if the time I spent correcting poor graphs in the generic crib group was larger than the time I spent preparing the specific cribs, then maybe the specific crib was worth the hassle.
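As a rough sketch of the trial described above (not a record of anything that was actually run), the random assignment and the before/after comparison might look like this in Python. The function names, student names and scores are all invented for illustration.

```python
import random
from statistics import mean


def assign_groups(students, seed=None):
    """Randomly split students into two groups of (near-)equal size."""
    rng = random.Random(seed)
    shuffled = list(students)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"specific": shuffled[:half], "generic": shuffled[half:]}


def mean_change(before, after, group):
    """Mean change in score (after minus before) for the students in a group."""
    return mean(after[s] - before[s] for s in group)


# Hypothetical data: each value is a student's mean score out of eight
# over four graphs, before and after receiving a crib.
before = {"Zainab-Zahra": 4.0, "Billy-Bob": 3.5, "Student C": 5.0, "Student D": 4.5}
after = {"Zainab-Zahra": 6.0, "Billy-Bob": 5.0, "Student C": 6.5, "Student D": 5.0}

groups = assign_groups(before.keys(), seed=1)
for name, members in groups.items():
    print(name, mean_change(before, after, members))
```

The quantity of interest is the difference between the two printed mean changes, set against the time each kind of crib costs the teacher.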

I was not allowed to run this trial. The reasons I was given by (some, by no means all) teachers, my PGCE tutors and (implicitly) by the educational literature made me despair:

It is unethical to give students the generic crib, because the specific crib will give an advantage to those students receiving it.

This is doubly begging the question. How can we possibly know what intervention improves their graphing without testing it? How can we possibly know whether the more student-effective intervention is also the more cost-effective?

I wanted to do this intervention in my year-7 classes, but the year-7 classes down the corridor would be receiving a completely different focus on graphing skills (or none at all!). It wasn’t as if I would leave the students in the disadvantaged group (should there even be one) high-and-dry afterwards, but the same could not be said of the students down the corridor.

You cannot measure a student’s learning by ticking off features on a graph.

Goodhart’s Law (closely related to the Lucas Critique and to Campbell’s Law) is the observation that if you put pressure on a surrogate measure of an outcome for regulatory purposes, the surrogate measure will lose its value as a measure of that outcome.

In educational terms, if you measure ‘educational success’ as the number of students gaining A*-to-C grade GCSEs, it is quite rational to do whatever is necessary to ensure more students get A*-to-C grades. This might mean better teaching across the board, but it could equally well mean a disproportionate focus on students heading for D’s to the detriment of those heading for E’s and A’s. The latter is a completely rational response, but the number of students getting A*-to-C grades will lose the value it once had as a measure of overall ‘educational success’.

It is quite true that reducing ‘graphing ability’ to ticking eight boxes:

  • correct placement of outcome variable on y and input variable on x
  • consistent axis scales
  • properly labelled axes
  • use of units if appropriate
  • meaningful title describing the trend
  • correct plotting
  • appropriate line of best fit, not merely a joining up of points
  • general clarity

cannot measure a student’s true ‘graphing ability’. Certainly, it is important that a student knows when a scatter-plot is appropriate (instead of a histogram or pie-chart), and that s/he grasps the point of plotting data graphically.
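To make the eight-box score concrete, here is a minimal sketch of how such a checklist could be tallied. The feature names paraphrase the list above; the function names and the example graph are hypothetical.

```python
CHECKLIST = (
    "variables on correct axes",
    "consistent axis scales",
    "labelled axes",
    "units where appropriate",
    "meaningful title",
    "correct plotting",
    "line of best fit",
    "general clarity",
)


def score_graph(ticks):
    """Score out of eight: one point per checklist feature that is present.

    `ticks` maps feature names to True/False; missing features count as False.
    """
    unknown = set(ticks) - set(CHECKLIST)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return sum(1 for feature in CHECKLIST if ticks.get(feature, False))


def missed_features(ticks):
    """Features a student forgot, in checklist order (the basis of a specific crib)."""
    return [feature for feature in CHECKLIST if not ticks.get(feature, False)]


# Hypothetical graph: everything present except the title and line of best fit.
example = {feature: True for feature in CHECKLIST}
example["meaningful title"] = False
example["line of best fit"] = False

print(score_graph(example))
print(missed_features(example))
```

The same tally that gives the score also lists what each student forgot, which is the information a tailored crib would need.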

However, labelling axes is something even my undergraduates frequently forget to do: it’s a trivial skill, but clearly one many students struggle with. A score out of eight for a graph is not going to perfectly capture ‘graphing ability’; but it is not meaningless, and the pressure to devalue ‘ticks on graph features’ as a surrogate measure of ‘graphing ability’ was already present from the tick-boxy nature of the GCSE coursework mark-schemes.

Oddly, objections I did not hear from anyone considering my proposal were:

  • The sample size would be very small (about 30). The difference between the two groups would need to be enormous to be detectable.
  • The study is unblinded: both the teacher and the students would know which group each student was in. This would introduce bias, and the students with the generic cribs might create their own specific cribs, reducing the effect-size further.
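The sample-size objection can be put in rough numbers. The sketch below uses the standard normal-approximation formula for comparing two group means, and it rests on assumptions that are not in the original proposal: a class of 30 split into two groups of 15, a two-sided significance level of 0.05, and 80% power. An exact t-test calculation would give a slightly larger figure.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n_per_group, alpha=0.05, power=0.8):
    """Smallest standardised difference (in standard deviations) between two
    equal-sized groups that a two-sided test would detect with the given power.

    Normal approximation: d = (z_{1 - alpha/2} + z_{power}) * sqrt(2 / n).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


# One class of about 30, split in half (assumed).
print(round(minimum_detectable_effect(15), 2))  # roughly 1.02

# Four such classes recruited to the trial (assumed).
print(round(minimum_detectable_effect(60), 2))  # roughly 0.51
```

Under these assumptions a single class could only pick up a difference of about one standard deviation, and recruiting four classes halves that.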

I had answers for these objections (mostly: recruit more year-7 classes to the trial), but never had to use them. The fact that a ‘small-scale investigative study’ was part of my PGCE at all indicates that unmeasurably small effects were not an overriding concern. I am still not entirely sure what the ‘small-scale investigative study’ was actually supposed to achieve, for the (school or PGCE) students, given the tiny sample size, and the reluctance of all concerned to do anything mildly scientific.

In reading the comments on Ben Goldacre’s piece on The Guardian’s website, I see the same objections being raised to the sort of RCTs currently being supported by the EEF as were raised against my modest (trivial!) proposal ten years ago. These misconceptions should be challenged.

I was not allowed to do my trial. Instead I was made to give all the students the time-intensive specific crib. This intervention was deemed a success because the students’ graphing ability improved after the intervention.

If you force a class of year-7s to draw eight graphs in eight weeks, it is almost inconceivable that they would get worse at it. This study-eviscerating flaw was not noted in the critique of my write-up for the PGCE, but the fact I had not really “reflected on my experience as a teacher” was.
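A toy simulation shows why a before/after comparison with no comparison group cannot separate the crib from the practice. Every number here is invented: the simulated crib has no effect at all, and the only real change is a one-point gain from practice plus some noise.

```python
import random
from statistics import mean


def simulate_changes(n_students=30, practice_gain=1.0, crib_effect=0.0,
                     with_control=True, seed=0):
    """Return (crib_changes, control_changes): each student's change in score.

    Every student improves by `practice_gain` plus noise; students given the
    crib also receive `crib_effect`. Without a control group, everyone gets
    the crib and the second list is empty.
    """
    rng = random.Random(seed)
    order = list(range(n_students))
    rng.shuffle(order)
    n_crib = n_students // 2 if with_control else n_students
    crib_group = set(order[:n_crib])

    crib_changes, control_changes = [], []
    for student in range(n_students):
        change = practice_gain + rng.gauss(0.0, 0.5)
        if student in crib_group:
            crib_changes.append(change + crib_effect)
        else:
            control_changes.append(change)
    return crib_changes, control_changes


# Everyone gets the crib: scores rise, although the simulated crib does nothing.
everyone, _ = simulate_changes(with_control=False)
print("mean change, no comparison group:", mean(everyone))

# Half get the crib: the difference between groups estimates the crib's effect.
crib, control = simulate_changes(with_control=True)
print("difference between groups:", mean(crib) - mean(control))
```

The first figure reflects practice alone, and only the second says anything about the crib.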

Perhaps this blog post is that reflection, a decade late.


* As an aside, “A graph to show how y varies with x” is not – and has never been – meaningful. I was made to teach this junk-title template. Judging by my undergraduates’ graph titles, someone is still getting teachers to teach this junk-title template. Whoever this malign force is – please stop. Please.
