Educational RCTs

Around a decade ago I did a PGCE. Part of that PGCE involved a small-scale investigative study, from which I learnt a great deal about educational research, but not – I suspect – any of the intended learning outcomes.

I was teaching a year-7 class, whose graphing skills needed improvement. Students would often forget to give meaningful titles* to graphs. They would forget to label axes; they would join-the-dots rather than adding a line of best fit; and so on.

So I prepared a model scatter-plot crib for them to stick in the back of their exercise books, as an educational intervention. I intended to tell them to compare their own graphs with it each time they drew one. But then I thought, “might it not be better if I could also highlight the things they forget most often?” So I collected four sets of data from each student before the intervention so I could tailor each crib to the individual student.

Practice makes perfect [CC-BY-SA-3.0 Steve Cook]

The generic crib would remind Zainab-Zahra about the most commonly forgotten things; the specific crib would remind Billy-Bob to put in a line of best fit. I am quite aware this crib contains unironic use of Comic Sans. The shame still burns inside like thrush.

The hassle involved in collecting and classifying their errors to prepare individual cribs was large. Was it worth it?

The obvious answer was to randomly assign the students in the class to receive either the specific crib, or the generic crib, and then to measure the change (if any) in their graphing ability on the next four graphs they plotted. If the specific crib caused a marked improvement compared to the generic crib, and – particularly – if the time I spent correcting poor graphs in the generic crib group was larger than the time I spent preparing the specific cribs, then maybe the specific crib was worth the hassle.

I was not allowed to run this trial. The reasons I was given by (some, by no means all) teachers, my PGCE tutors and (implicitly) by the educational literature made me despair:

It is unethical to give students the generic crib, because the specific crib will give an advantage to those students receiving it.

This is doubly begging the question. How can we possibly know what intervention improves their graphing without testing it? How can we possibly know whether the more student-effective intervention is also the more cost-effective?

I wanted to do this intervention in my year-7 classes, but the year-7 classes down the corridor would be receiving a completely different focus on graphing skills (or none at all!). It wasn’t as if I would leave the students in the disadvantaged group (should there even be one) high-and-dry afterwards, but the same could not be said of the students down the corridor.

You cannot measure a student’s learning by ticking off features on a graph.

Goodhart’s Law (a.k.a. the Lucas Critique, or Campbell’s Law) is the observation that if you put pressure on a surrogate measure of an outcome for regulatory purposes, the surrogate measure will lose its value as a measure of that outcome.

In educational terms, if you measure ‘educational success’ as the number of students gaining A*-to-C grade GCSEs, it is quite rational to do whatever is necessary to ensure more students get A*-to-C grades. This might mean better teaching across the board, but it could equally well mean a disproportionate focus on students heading for D’s to the detriment of those heading for E’s and A’s. The latter is a completely rational response, but the number of students getting A*-to-C grades will lose the value it once had as a measure of overall ‘educational success’.

It is quite true that reducing ‘graphing ability’ to ticking eight boxes:

  • correct placement of outcome variable on y and input variable on x
  • consistent axis scales
  • properly labelled axes
  • use of units if appropriate
  • meaningful title describing the trend
  • correct plotting
  • appropriate line of best fit, not merely a joining up of points
  • general clarity

cannot measure a student’s true ‘graphing ability’. Certainly, it is important that a student knows when a scatter-plot is appropriate (instead of a histogram or pie-chart), and that s/he grasps the point of plotting data graphically.

However, labelling axes properly is something even my undergraduates frequently forget to do: it’s a trivial skill, but clearly one many students struggle with. A score out of eight for a graph is not going to perfectly capture ‘graphing ability’; but it is not meaningless, and the pressure to devalue ‘ticks on graph features’ as a surrogate measure of ‘graphing ability’ was already present from the tick-boxy nature of the GCSE coursework mark-schemes.

Oddly, objections I did not hear from anyone considering my proposal were:

  • The sample size would be very small (about 30), so the effect size would need to be enormous for any difference to be detectable.
  • The study is unblinded: both the teacher and the students would know which group each student was in. This would introduce bias, and the students with the generic cribs might create their own specific cribs, reducing the effect size further.
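A quick back-of-the-envelope power calculation makes the first objection concrete. The sketch below (illustrative only, not part of the original study) uses the standard normal-approximation formula for a two-group comparison: the sample size per group needed to detect a standardised effect size d (Cohen’s d) at significance level α and a given power.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group needed to detect a
    standardised effect size d in a two-group comparison."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for two-sided test
    z_b = NormalDist().inv_cdf(power)          # quantile for desired power
    return ceil(2 * ((z_a + z_b) / d) ** 2)


def detectable_d(n, alpha=0.05, power=0.80):
    """Smallest standardised effect size detectable with n per group."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return sqrt(2) * (z_a + z_b) / sqrt(n)


print(n_per_group(0.5))   # a 'medium' effect needs ~63 students per arm
print(detectable_d(15))   # with 15 per arm (one class of 30), d must be ~1.0
```

In other words, a single class of 30 split into two arms of 15 could only reliably detect an effect of around one whole standard deviation — enormous by the standards of educational interventions — which is exactly why recruiting more classes would have been necessary.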

I had answers for these objections (mostly: recruit more year-7 classes to the trial), but never had to use them. The fact that a ‘small-scale investigative study’ was part of my PGCE at all indicates that unmeasurably small effects were not an overriding concern. I am still not entirely sure what the ‘small-scale investigative study’ was actually supposed to achieve, for the (school or PGCE) students, given the tiny sample size and the reluctance of all concerned to do anything mildly scientific.

In reading the comments on Ben Goldacre’s piece on The Guardian’s website, I see the same objections being raised to the sort of RCTs currently being supported by the EEF as were raised against my modest (trivial!) proposal ten years ago. These misconceptions should be challenged.

I was not allowed to do my trial. Instead I was made to give all the students the time-intensive specific crib. This intervention was deemed a success because the students’ graphing ability improved after the intervention.

If you force a class of year-7s to draw eight graphs in eight weeks, it is almost inconceivable that they would get worse at it. This study-eviscerating flaw was not noted in the critique of my write-up for the PGCE, but the fact I had not really “reflected on my experience as a teacher” was.

Perhaps this blog post is that reflection, a decade late.


* As an aside, “A graph to show how y varies with x” is not – and has never been – meaningful. I was made to teach this junk-title template. Judging by my undergraduate’s graph titles, someone is still getting teachers to teach this junk-title template. Whoever this malign force is – please stop. Please.



    Stephen Lindsay on 20/03/2013 at 16:04

    I recently started a PGC in HE and a similar idea was common in that. What mystified me was the implicit acceptance of much higher variance in the quality of a course taught year-on-year due to continuing professional development, whilst condemning the idea of randomisation of a single year.

    However, at the risk of sounding anti-science (I’m not – honest!) I do think that statistical significance is over-emphasised in this field. I like to think that what we are doing is path-finding, we are going to be teaching the topics we teach regardless of the presence or absence of studies supporting our approaches. There is no “neutral” approach to teaching, we have to select an appropriate approach and, given the lack of evidence, anything that suggests that one approach is better than another IS better than nothing. Just because you might not be able to get a statistically significant result with the number of students you have, that doesn’t mean you shouldn’t do the study or select one approach over another if it gets a better average result.

    1. If the difference in chosen outcome is not statistically significant, then there is nothing to choose between the two interventions, and you can do whichever you prefer; probably the cheapest/easiest one if this is not factored into the outcome directly.

      You can certainly argue that “statistically significant” (in the sense of p<0.05) is an arbitrary cut-off, but I'm not sure that's quite what you mean here.

    Col on 20/03/2013 at 16:05

    I’d be laughing if I weren’t crying. I had a very similar experience while doing a PGDE in Scotland. My interest then (as indeed now) was in developing literacy and communication skills within the mathematics class, but I was told by the school that it would be unethical to do a split test, even on such a small scale. Heaven only knows how they would have felt about a *random* controlled test. The research project could be qualitative but not quantitative. To be fair, my tutor was all for it but said I had to bow to the wishes of the school. One wonders how we ever make progress?

    1. I’m not terribly sure we do make progress. Most of the educational psychology literature I was made to study seemed to be a mixture of pseudoscientific bunk and the bleedin’ obvious, but to suggest that the Emperor’s willy was flapping in the breeze was verboten.

    Pete J on 21/03/2013 at 19:38

    I’m a current PGCE student (science) and am in the midst of this exact issue. I know that any results I collect as part of my ‘study’ (giving rich feedback, in the place of feedback and grades) will be absolutely worthless. I was also told that I can’t do an RCT, as it would not be ‘ethical’. Again, this presumes that we already know which treatment is best. In all honesty, I don’t have the time, will or ability, in the school I’ve been placed in, to actually do any of this, so I’m just going to fabricate my ‘results’, as anything I make up will have equal meaning and value to anything I found out by just giving a relatively small group some feedback, instead of grades, for a few weeks. I’m slightly worried that I’m already completely sick of the edu-babble and pseudoscience which seems to infect education and the morons who seem to swallow and regurgitate it wholeheartedly.

    1. I certainly wouldn’t condone faking results, but I understand your frustration. I collected the data I did for this study with a very heavy heart, knowing that it was essentially meaningless. The students got better at graphs, but whether they would have got just as much better at it with more easily produced cribs, or without my input at all, or – indeed – trapped within the confines of a Skinner box, I simply don’t know.

    Patrick Meehan on 21/03/2013 at 20:34

    I did a PGCE a few years ago. What you need to realise is that the academic element of the PGCE is meaningless twaddle designed to keep the educational academics in jobs. You’re no better a teacher knowing the “theory” than you would be if you just spent time in class with support from an experienced teacher.
    The reason for this is obvious – whilst academic research in education is clearly important, like most social science it is mainly just fashionable tripe and so best ignored. The results are always made to fit the ideology.

    1. The support of excellent teachers (which I did have) was indeed far more useful than the theory.

  1. The theme here seems to be a story of student teachers getting poor advice and supervision on one aspect of their PGCE courses. Given the meagre resources allocated to such courses it would be surprising if research methods experts could be assigned to every student project. It would be surprising too if there weren’t a lot of teachers in schools who were unable to speak authoritatively about the ethical issues. At heart, the problems here seem to be about trust and anxiety rather than about technical correctness.

    Can I encourage any research-hungry PGCE students to keep the fire in their bellies for as long as it takes? As qualified teachers with good colleagues in school posts (supported when parents and Governors and OFSTED come asking what the hell you think you are doing) you should be able (as I did) to conduct and publish half decent research and to embark on higher degrees where disputes about methodology, statistical techniques and ethics can be conducted with better qualified supervisors. If that is what you want to do.

    Do supplement Goldacre’s highly rhetorical piece with something more considered from the extensive US literature. ERIC is a useful resource. Try this one

    1. The time devoted to doing this pointless study was time that was taken away from actually learning how to teach. If you want to do a proper study, it requires a well thought-out methodology. A well thought-out educational trial of any sort is incompatible with the premise of a ‘small scale investigative study’, and is actively infuriating to someone with a research background.

  2. Have just bought a natty little book by Ennio Cipani called ‘Practical Research Methods for Educators’ which is an encouraging and informative read for practitioners who are realistically only in the position to carry out single-case research. Good variety of single-case designs for use in classroom research and written in a style that is accessible for coffee-break reading.

    1. Torgerson & Torgerson’s “Designing Randomised Trials in Health, Education and the Social Sciences: An Introduction” is useful, although it suffers from being rather reliant on just a few real-world examples.

    welf on 25/03/2013 at 15:00

    A very interesting post that I thoroughly enjoyed reading.

    A minor point in the context of the article certainly, but given your (very valid) footnote on graph titles, I thought I would comment on your suggestion that pie charts might sometimes be appropriate. It seems that teachers are still teaching this, “Whoever this malign force is – please stop. Please.”

    More thoughts, and links to reasoned argument:

    1. You might not want to look at this then. I don’t mind them that much, but angled-perspective ones are simply awful, and I’m becoming much more aware of the issues of colour-coding anything for about 10% of my audience.

    David H. on 22/04/2013 at 06:54

    ‘Judging by my undergraduate’s…’ Tell that undergraduate to stop.

    I never capitalise graph legends or table headings but lots of people seem to. Who’s right?

    1. Muphry’s Law (or some near relative) strikes again.

      I have no preference about capitalisation I’d want to impose. I generally don’t, but I rarely use words anyway: most of my axes are “[S] / µM” and “λ / nm”, etc.

  1. […] is to continually ask, about your own teaching, “does this actually work?”, and yet, as this blog post poignantly notes, that culture has not yet permeated all of education – there are still some who feel […]
