Imagine that you’re a tutor. When you finish a session with a student, a dashboard on your laptop shares a summary of what went well and a strategy that could have made some parts stronger. It gives tips for the next session and points to training to help add tools to your tool kit. This isn’t a futurist fantasy—it’s a tool in development right now to leverage the power of artificial intelligence (AI) to bring out the best in tutors and enhance student learning. 

Over the last decade, research has demonstrated how tutoring programs can be organized and augmented with technology to be extremely cost-effective. We call this high-impact tutoring: it is delivered to small groups of students during the school day, and those students work with the same tutor all year. This type of tutoring is especially helpful for literacy in the early grades and for math in middle school. It offers the attention and nurturing that students fundamentally need to succeed, especially disadvantaged youth who don’t often have access to high-impact tutoring.

That’s why at Saga Education, we’re leveraging AI to help us understand the extensive data on interactions between individual tutors and their students, drawn from 50,000 hours of high-impact tutoring (HIT) sessions a year. With the support of AI, we can pinpoint where tutoring succeeds and where it falls short. We can determine which tutoring practices are most effective and for which students these practices work best.

Given the highly effective results that the best tutoring models deliver, we want to accelerate their adoption across our nation—AI can make that kind of scale possible. We want to share how we leverage AI—both large language models (LLMs) and state-of-the-art techniques for natural language processing (NLP)—to distill and measure high-impact tutoring and generate a new scalable resource to ensure that tutors are effective.

What AI is telling us about the components of successful tutoring

American students in grades six through nine often become disengaged in math and begin to feel like math is “not for them.” This happens in other subjects, too, but math is the most common driver of academic disengagement. When students lose a sense of belonging in school, or just in math, they’re more likely to drop out. Tutoring has impressive effects because it can combat and reverse this loss of engagement, but we don’t know exactly how or why it works for some kids and not for others.

The challenge has been that the complexity of tutor-student interactions makes it nearly impossible to measure what it means to tutor well. We know that what works for one student might not work for another. Randomized controlled trials, the gold standard of measurement, are designed to collect evidence on average treatment effects and to average out how individual differences between students or tutors might interact with program design.

The new AI tools that we’re developing allow us to explore these important differences at the individual level. 

For example, recent research from the University of Colorado Boulder, which was presented at the 25th International Conference on Artificial Intelligence in Education in early July, shows that how tutors talk to students matters. The study examines the impact of different tutoring styles on math achievement among ninth-grade students.

Leveraging AI-powered analysis, the study shows that when tutors help students think through problems deeply, students who are already successful with intelligent tutoring systems (ITS) do even better. For students who struggled more with the math content and performed lower in the ITS, the tutors’ use of “revoicing” (repeating a student’s response in slightly different words) predicted student achievement.
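
To make “revoicing” concrete, below is a minimal sketch of how transcript analysis might flag candidate revoicing moves by comparing a student turn with the tutor turn that follows it. It is an illustration only: the turn pairing, the tokenizer, and the 0.5 similarity threshold are assumptions for this example, not the method used in the University of Colorado Boulder study.

```python
# Illustrative sketch only: a simple lexical-overlap heuristic for flagging
# candidate "revoicing" moves (a tutor restating a student's response in
# slightly different words). The turn pairing, tokenizer, and 0.5 threshold
# are assumptions for this example, not the published study's method.
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set for a rough lexical comparison."""
    return set(re.findall(r"[a-z']+", text.lower()))


def is_revoicing(student_turn: str, tutor_turn: str, threshold: float = 0.5) -> bool:
    """Flag a tutor turn that largely restates the preceding student turn."""
    s, t = tokens(student_turn), tokens(tutor_turn)
    if not s or not t:
        return False
    overlap = len(s & t) / len(s | t)   # Jaccard similarity of word sets
    return threshold <= overlap < 1.0   # similar, but not a verbatim echo


# Example transcript: alternating (speaker, utterance) turns.
transcript = [
    ("student", "the slope is two because y goes up by two"),
    ("tutor", "So the slope is two because y goes up by two each time, right?"),
    ("student", "yes"),
    ("tutor", "Great. What does that tell us about the y-intercept?"),
]

# Pair each student turn with the tutor turn that immediately follows it.
revoicing_count = sum(
    is_revoicing(student_utt, tutor_utt)
    for (role_a, student_utt), (role_b, tutor_utt) in zip(transcript, transcript[1:])
    if role_a == "student" and role_b == "tutor"
)
print(f"Candidate revoicing moves: {revoicing_count}")  # -> 1
```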

Research to practice

Tutors, coaches, and districts are now broadly opting to have in-school tutoring sessions recorded, transcribed, and analyzed. Current LLMs are capable of analyzing and assessing the full context of a tutoring session transcript. While LLMs perform only about as well as human observers, that limitation is far outweighed by their ability to carry out these assessments rapidly, at scale, and at very low cost.

This LLM analysis supports human analysis, creates leading indicators that predict the effectiveness of a tutoring program’s implementation, and guides continuous improvement. With this comes the ability to give tutors timely feedback, delivered through their instructional coach or, soon, through recommendations from generative LLMs in lower-cost programs that deploy fewer instructional coaches.
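
To illustrate what such a pipeline can look like, here is a minimal sketch that scores session transcripts against a simple rubric with an LLM and rolls the scores up into tutor-level leading indicators. The rubric dimensions, the prompt wording, and the call_llm stand-in (which returns canned output so the sketch runs) are assumptions for this example, not Saga Education’s actual instrument or system.

```python
# Illustrative sketch: turn session transcripts into rubric scores and
# tutor-level leading indicators. The rubric, prompt, and call_llm stand-in
# are assumptions for this example, not Saga Education's actual pipeline.
import json
from collections import defaultdict
from statistics import mean

RUBRIC = ["checks_for_understanding", "encourages_student_talk", "gives_specific_praise"]


def call_llm(prompt: str) -> str:
    """Stand-in for a real model call (e.g., a hosted LLM API).
    Returns a JSON object with a 1-5 score per rubric dimension."""
    # Canned output keeps the sketch runnable; replace with a real API call in practice.
    return '{"checks_for_understanding": 4, "encourages_student_talk": 3, "gives_specific_praise": 5}'


def score_transcript(transcript: str) -> dict[str, int]:
    """Ask the model to rate one session transcript against the rubric."""
    prompt = (
        "You are rating a math tutoring session. For each dimension, return a 1-5 "
        f"score as JSON with keys {RUBRIC}.\n\nTranscript:\n{transcript}"
    )
    return json.loads(call_llm(prompt))


def leading_indicators(sessions: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Average rubric scores per tutor across their recent sessions."""
    per_tutor = defaultdict(lambda: defaultdict(list))
    for tutor_id, transcript in sessions:
        for dimension, score in score_transcript(transcript).items():
            per_tutor[tutor_id][dimension].append(score)
    return {
        tutor_id: {dim: round(mean(scores), 2) for dim, scores in dims.items()}
        for tutor_id, dims in per_tutor.items()
    }


if __name__ == "__main__":
    sessions = [("tutor_001", "Tutor: Let's look at problem 3...\nStudent: I got x = 4...")]
    print(leading_indicators(sessions))
```

In practice, the stand-in would be replaced by a call to whichever model provider a program uses, and the resulting scores would be reviewed alongside human observation rather than replacing it.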

To be successful, tutoring providers need deep institutional knowledge coupled with dedicated coaches for cohorts of tutors, a combination that is not easy for school districts to replicate. Districts that run their own high-impact tutoring programs might rely on part-time or retired teachers, or college students looking to earn extra money; many tutors in high-impact tutoring programs are not education specialists. With these new AI capabilities, we can clearly distill what makes tutoring effective, provide tutors with automated coaching, and train them over time to become effective tutors.

With the help of AI to enhance human-led instruction, we are learning what magic is at the core of effective tutoring practices. Perhaps the most significant potential of AI for education is unlocking a deeper understanding of what makes certain types of human-led instruction effective, and helping us deliver this most human-centric of interventions at scale.

The innovative potential (by Julia Freeland Fisher, director of education research for the Institute):

As Michael B. Horn and I wrote in our 2016 “Blueprint for breakthroughs,” research rarely keeps pace with innovation. To deeply personalize education, effective R&D needs to 1) take advantage of technology-enabled structural shifts to study what works for specific students in specific circumstances, 2) invest in efforts that make data collection more seamless and less burdensome for districts so that schools and researchers can gather better, more real-time data on what is actually happening in schools, and 3) support research that progresses past initial randomized controlled trials, or RCTs, and promotes alternative methods for unearthing what drives student outcomes in different circumstances.

Historically, technology-enabled instruction fell short of deeply personalized learning in part because the market did not reward individual student mastery and in part because data gleaned from online tools was often assessed in the aggregate to gauge efficacy. Marketing materials default to citing Bloom’s 2 sigma effect without interrogating the causal mechanisms behind it. 

That’s a thorny problem with real consequences for which innovations scale. Disruptive innovations improve along the metrics we hold them to. In an ideal world, those metrics are not only pegged to individual student mastery but are also paired with an understanding of what works for which students to drive outcomes. Saga’s approach is an exciting and long overdue use of AI’s analytical horsepower: not only unearthing more precise insights into how to drive better learning outcomes but also producing the kind of research needed to drive the edtech market toward true quality.

Authors

  • Krista Marks

    Krista Marks is the Strategic Advisor for Saga Education, which provides high-impact, in-school tutoring by leveraging both human capital and technology to accelerate educational equity.

  • Brent Milne

    Brent Milne is VP of Product Research and Development at Saga Education.