Often, so much time is dedicated to describing and interpreting scientific research that the experience of the individual practitioner receives less attention. Yet the end goal of applied research is to make the job of the individual practitioner more effective and to help guide their work. There is therefore a need to understand what a typical person will see, day to day, when using evidence-based interventions.

Because of the tendency to condense findings into a sentence or two, terms such as "works" and "doesn't work" are frequently used when talking about treatment effectiveness. When something "works", this usually means the intervention has been compared to a competing treatment and found to produce better outcomes, while "doesn't work" means the treatment could not be statistically distinguished from a treatment with no effect. The expectation is that practitioners will use treatments that work, and that these treatments will always benefit the client because they are based on research. In reality, a research-based treatment will sometimes leave a person unaffected, or worse off than when they began, while treatments with no scientific support sometimes appear to produce fantastic results. It is tempting to interpret this as the research being inaccurate in some way. As practitioners, however, we need to learn that these kinds of results should be expected.
What I want to highlight is that even when an intervention has a strong evidence base for its effectiveness, we should not expect this to translate into guaranteed improvement every single time, even if the treatment is delivered perfectly.
Using mean differences to demonstrate treatment effects
The average difference between two groups is commonly used to demonstrate how people respond to a given treatment. Using data simulation, I can generate two groups: one that receives a hypothetical intervention and one that does not. The two groups will be:
- The control group – randomly drawn from a population with a mean of 100
- The treatment group – randomly drawn from a population with a mean of 115
Both groups will have a standard deviation of 30.
This difference in scores (half a standard deviation; d = 0.5) was chosen because it is a common effect size among intervention studies in the social and education sciences (though it may differ depending on the field).
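As a minimal sketch of this setup in Python (the original post does not include its code, so the names here are my own):

```python
# Hypothetical population parameters for the simulation.
CONTROL_MEAN = 100
TREATMENT_MEAN = 115
SD = 30

# Standardised mean difference (Cohen's d).
d = (TREATMENT_MEAN - CONTROL_MEAN) / SD
print(d)  # 0.5
```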
Now that we have specified the differences between the intervention and control populations, it's time to draw some samples: 10,000 people from each population. Unsurprisingly, the 10,000 people in the control group obtained an average score of 99.87 (standard deviation 29.84), while the 10,000 who received the intervention scored 115.63 on average (standard deviation 29.77). So the people who received the intervention scored, on average, half a standard deviation higher.
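A sketch of this sampling step, assuming NumPy (the seed is arbitrary, so the exact values will differ slightly from those reported above):

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed for reproducibility

n = 10_000
control = rng.normal(loc=100, scale=30, size=n)
treatment = rng.normal(loc=115, scale=30, size=n)

print(f"Control:   mean = {control.mean():.2f}, sd = {control.std(ddof=1):.2f}")
print(f"Treatment: mean = {treatment.mean():.2f}, sd = {treatment.std(ddof=1):.2f}")
```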
Looking only at these numbers, we might intuitively expect every person to improve by about 15 points when the intervention is applied clinically, but a visualisation provides a different perspective.
Figure 1. Overlapping distributions of those who did not receive an intervention (red) and those who received an intervention (blue). Based on 20,000 randomly selected participants.

We can see that there is considerable overlap between the two distributions, and many people in the control group score higher than people in the intervention group.
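A figure along the lines of Figure 1 could be reproduced with matplotlib; this is a sketch under the same assumptions as above, with colours chosen to match the caption:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
control = rng.normal(100, 30, 10_000)
treatment = rng.normal(115, 30, 10_000)

# Overlapping histograms; alpha makes the overlap visible.
plt.hist(control, bins=60, alpha=0.5, color="red", label="Control")
plt.hist(treatment, bins=60, alpha=0.5, color="blue", label="Intervention")
plt.xlabel("Score")
plt.ylabel("Number of people")
plt.legend()
plt.show()
```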
What these distributions look like on a day-to-day basis for a practitioner
Each practitioner is not able to see 10,000 people… and plot all their data points… and then compare them to another 10,000 who did not receive any service. But we can simulate what it might look like when two people are randomly sampled from these populations, and see who comes out with the higher score. We know that, in general, people who receive the intervention will be better off, but this only holds in the aggregate, across many cases.
To demonstrate this, I'll lower the sample size to 1 per group (instead of 10,000) and plot each person's score.
Let’s take a pair and plot it on a graph:

As before, the red dot represents a person from the control group, and the blue dot represents a person who received the intervention. We can see that the person who received no treatment was better off than the person who received the intervention, despite the latter coming from a population whose mean is half a standard deviation higher.
I’ll simulate a few more to show how these numbers jump around:
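The repeated draws behind these plots can be sketched as a simple loop (the seed and number of repeats are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

# Draw one person from each population and report who scored higher.
for trial in range(1, 11):
    control_score = rng.normal(100, 30)
    treatment_score = rng.normal(115, 30)
    winner = "intervention" if treatment_score > control_score else "control"
    print(f"Pair {trial}: control = {control_score:.1f}, "
          f"intervention = {treatment_score:.1f} -> {winner} scored higher")
```

Run this a few times with different seeds and the winner flips frequently, even though the intervention population has the higher mean.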
I hope it is now clear why saying something "works" or "doesn't work" is, on its own, fairly meaningless.
How often should we expect superiority?
The next step is to see how often people in the intervention group make better progress than those in the control group. To do this, I simulated the paired draws above 20,000 times and recorded who obtained the higher score. Overall, when we choose two people at random, one from the control group and one from the intervention group, the person from the intervention group fares better about 63% of the time.
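This quantity is sometimes called the probability of superiority, and it can be estimated with a vectorised version of the pairing loop above (SciPy is assumed only for the closed-form check):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

n_pairs = 20_000
control = rng.normal(100, 30, n_pairs)
treatment = rng.normal(115, 30, n_pairs)

# Proportion of pairs where the intervention member scored higher.
p_superiority = (treatment > control).mean()
print(f"Simulated: {p_superiority:.0%}")

# For two normal distributions with equal SDs, the closed form is
# P(treatment > control) = Phi(d / sqrt(2)); for d = 0.5 this is ~0.64.
print(f"Theoretical: {norm.cdf(0.5 / np.sqrt(2)):.2f}")
```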
Conclusions
In practice, it can sometimes feel as though little positive impact has been achieved after applying an intervention. What I aim to communicate is that we can provide a well-designed treatment, deliver it with fidelity, and still end up with a result that is worse than if no intervention had been applied at all. However, providing treatments supported by evidence means that, over the long run, many more people will benefit than if we did not intervene.


