“The essential implementation question becomes not simply ‘what’s implementable and what works’ but what is implementable and what works for whom, where, when and why?”— Meredith Honig, New Directions in Education Policy Implementation: Confronting Complexity

Honig’s vision of education research methods is easier said than done: too often we use education studies to draw general conclusions about an intervention’s effectiveness without investigating the precise circumstances in which that intervention thrives or fails. It’s this very distinction that came to mind as I read this month’s flattering RAND study of Carnegie Learning’s Cognitive Tutor Algebra I (CTAI) program.

RAND found that the blended-learning Algebra program boosted the average student’s performance by approximately eight percentile points. Researchers emphasized that the study considered “authentic implementation” settings—they tested the program’s relative effectiveness in actual schools, across a diverse array of students and teachers. The results are exciting in part because Carnegie focuses on personalization: its product integrates traditional textbooks and teaching guides with automated tutoring software that provides self-paced, individualized instruction and aims to bring students to mastery of a topic before they progress to more advanced ones. The findings are among a limited set of early indicators that student-centered blended-learning models are poised to improve student outcomes at scale.

Randomized controlled studies like this are extremely rare, mostly because they are so costly. The U.S. Department of Education (DOE) invested $6 million in the study. Thomas Brock, Commissioner of the National Center for Education Research, spoke openly about how pleased the DOE is with the results.

Although I’m encouraged by the rigor of this research, good theory building demands that the research not stop here. When thorough studies demonstrate that a high enough proportion of students benefit from an intervention, we tend to double down on those promising signals. In turn, the fact that some portion of students didn’t fare well is treated as probabilistic noise from which statistically significant signals of efficacy must be isolated. As a result, little is learned from the trial beyond the probabilistic profile of side effects and the proportion of students for whom a given intervention works.

Are we asking the right questions about the results of the CTAI study? RAND intends to follow up with research on whether the software itself “caused” the boost in outcomes or whether other factors—such as the sample size, teaching methods, or textbooks—explain the results. This is a key question in light of the rapid growth of online-learning programs. RAND is also conducting a concurrent study of the program’s costs, which is relevant because its price tag exceeds that of other math curriculum providers.

These may be important questions, but we also need to think critically about what it means that average student performance went up among CTAI users. Said differently, an eight-point average gain is consistent with only a portion of the students who used the cognitive tutoring program responding favorably to it—and with others not benefiting at all. To establish a sound theory of precisely why and how CTAI can drive outcomes, the next stage of research should be what we call an enriched research trial. At this stage, researchers should dig into the anomalies hidden within the averages—students or schools for whom the intervention was not at all successful, or was wildly successful—to tease apart what was different about the circumstances in which those anomalies arose.
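The arithmetic behind this point is easy to see with a toy calculation. The numbers below are invented for illustration (they are not from the RAND study): two very different subgroup stories can produce the identical average gain.

```python
# Hypothetical illustration: the same average gain can arise from
# very different patterns of who actually benefits.
# All numbers are invented; none come from the RAND study.

def average_gain(groups):
    """Weighted average effect across subgroups.

    `groups` is a list of (share_of_students, percentile_gain) pairs
    whose shares sum to 1.0.
    """
    return sum(share * gain for share, gain in groups)

# Scenario A: every student gains about 8 percentile points.
uniform = [(1.0, 8.0)]

# Scenario B: 40% of students gain 20 points; 60% gain nothing.
split = [(0.4, 20.0), (0.6, 0.0)]

print(average_gain(uniform))  # 8.0
print(average_gain(split))    # 8.0 -- same average, very different story
```

A trial that reports only the average cannot distinguish these scenarios; only by examining the outliers—the schools and students behind Scenario B’s zeros and twenties—can researchers learn the circumstances under which the intervention works.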

The study itself hints at circumstances that might be salient. For example, because the study spanned two school years, researchers found no effect in the first year of implementation but strong evidence of a positive effect (on the order of the eight percentile points mentioned above) in the second year. There was no firm conclusion as to why the results varied so dramatically. The researchers speculated that during the first year teachers deviated more radically from traditional teaching methods, and that in the second year they reverted to some traditional methods that may actually have raised outcomes. Such unknowns point to the kinds of circumstances that could explain more about why and how CTAI helps students learn.

Many curriculum and software solutions are never tested in randomized controlled trials. Carnegie has honed a product that stood up to truly rigorous research. We should commend the company for its successes and applaud the federal government for investing in meaningful R&D in the blended-learning space. But educators looking for curriculum solutions should demand more information about the products they hire. To move this research forward one more step, the research community, foundations, and policymakers must frame these randomized controlled trials as an interwoven part of the research process rather than simply “tests” that occur at the end of the process.