Building TafelTrainer #4: Evaluating learner performance in the Multiplication Trainer 

In the previous blog posts, we gave insight into the process of designing and developing the Multiplication Trainer, a multiplication app that helps learners automatise multiplications through repeated retrieval practice. As part of the design and implementation process, we conducted a pilot to test the system with over 500 primary school students. In this blog post, we show the final step in the process of designing, building and improving the Multiplication Trainer: evaluating its effectiveness by diving into the learners’ data. This blog post is a short, accessible version of a more comprehensive paper on the design, development and evaluation of the Multiplication Trainer.

Multiplication Trainer

The application enables learners to practice multiplication tables in three progressively more challenging levels. This design is intended to reflect a stepwise progression from the use of procedural knowledge to the recall of declarative knowledge. In all three levels, learners respond to a sequence of multiplication cues until they reach a mastery criterion. The levels differ in how items are scheduled and in how item mastery is assessed. We conducted the pilot in 12 primary schools in the Netherlands, with students aged 6–10 years old. Teachers were encouraged to include the application in their lesson plan during school hours, but determined themselves when and how much they used it. 

Intelligent scheduling algorithm

The design of the learning algorithm used in the Multiplication Trainer was explained in more detail in an earlier blog in this series. In short, learners started in level 1, where all multiplication items were introduced in a fixed order. Level 1 was primarily intended to introduce the items to learners: they finished this level once they had given a single correct response for each of the multiplication items. Next, learners could progress to levels 2 and 3, where an intelligent item scheduling algorithm was used. The algorithm is based on a computer model of human memory, and tries to estimate and predict memory retrieval performance for each individual learner and each individual multiplication item. Learners completed a level once the algorithm estimated that, for that learner, all multiplication items were unlikely to be forgotten in the next 24 hours. Level 3 was more challenging than level 2, as a time limit of 8 seconds was introduced to encourage learners to recall the correct answer for a multiplication item quickly.
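To make the 24-hour mastery criterion more concrete, here is a minimal sketch in Python. This post does not spell out the model's equations, so the sketch assumes a simplified power-law memory trace with a single decay value per fact. The function names, the threshold value and the use of seconds as time unit are all hypothetical, and this is not the scheduling algorithm as implemented in the application.

```python
import math

DAY_SECONDS = 24 * 60 * 60

def activation(encounter_times, decay, at_time):
    """Log of summed power-law traces (assumed model); higher means easier recall."""
    traces = [(at_time - t) ** -decay for t in encounter_times if t < at_time]
    return math.log(sum(traces)) if traces else float("-inf")

def level_complete(history, decays, now, threshold=-0.8):
    """True if every fact is predicted to stay at or above the (hypothetical)
    threshold 24 hours from now.

    history: fact -> list of encounter times in seconds
    decays:  fact -> assumed decay value (a stand-in for speed of forgetting)
    """
    return all(
        activation(history[fact], decays[fact], now + DAY_SECONDS) >= threshold
        for fact in history
    )

# Illustrative values only, not data from the pilot.
history = {"3x4": [0, 120, 240, 360, 480]}
print(level_complete(history, {"3x4": 0.1}, now=600))  # slow forgetting
print(level_complete(history, {"3x4": 0.5}, now=600))  # fast forgetting
```

In this sketch, a fact with a low decay value still has a high predicted activation a day later, while a fact with a high decay value does not, so the level would stay open until that fact has been practised more.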

Learners’ usage and evaluation of the system


Figure 1 provides an overview of how students used the application. Students were free to decide when and what to study; the developers had no influence over their choices. During the 106-day pilot period, 547 active learners engaged with the application, collectively completing 218,430 practice trials across 12,160 sessions (Figure 1a). On average, each learner spent approximately 45 minutes practicing multiplication over 17 sessions (Figure 1b). The duration of a session varied with the learner’s performance, and learners could end a session before completing a level.

Figure 1: Usage of the application. a the total number of sessions initiated each day, b the number of sessions initiated by each learner, per level, c the number of trials in each session, per level.

Figure 2 illustrates the level completion rates, indicating the proportion of learners who completed a level after logging at least 10 trials. Completion rates were notably high in Level 1, averaging 90%, and remained relatively high in Levels 2 (78%) and 3 (77%).
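The completion rate described above can be sketched as follows. The data layout (one pair of trial count and completion flag per learner) and the function name are assumptions made for this example, and the numbers are invented rather than taken from the pilot.

```python
def completion_rate(attempts, min_trials=10):
    """Proportion of learners who completed a level, counting only learners
    who logged at least `min_trials` trials in that level.

    attempts: list of (trials_logged, completed) pairs, one per learner.
    Returns None when no learner meets the trial minimum.
    """
    eligible = [completed for trials, completed in attempts if trials >= min_trials]
    if not eligible:
        return None
    return sum(eligible) / len(eligible)

# Illustrative values only: the learner with 4 trials is excluded.
level_attempts = [(25, True), (12, False), (4, False), (30, True)]
print(completion_rate(level_attempts))
```

Excluding learners with fewer than 10 trials keeps a quick look at a level from counting as a failed attempt.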

Figure 2: Level completion rate for each of the three levels, per multiplication table.


In the evaluation phase, the subjective experiences of students and teachers were assessed through questionnaires. Seventy-nine students from four different schools completed the User eXperience Kids Questionnaire (UXKQ) after approximately four weeks of using the application. Their responses, measured on a five-point scale covering hedonic aspects and perceived learning quality, yielded an average rating of 4.35, indicating a positive evaluation of their experience. Additionally, eight teachers evaluated the application’s usability while observing its use in their classrooms. The application received an average percentile score of 80.9 on the System Usability Scale (SUS), a widely used measure in the field of human-computer interaction, which reflects a positive rating of its user-friendliness for young children.

Model-based evaluations of learning performance

During practice, our memory model estimates the speed of forgetting for each fact a learner encounters, based on the accuracy and speed of their responses. With each new response, the model updates its estimate, incorporating the latest data and refining its accuracy. In this way, each response is interpreted in the context of the learner’s previous performance. The final speed of forgetting estimate, which accounts for the entire learning history of a fact, tells us how difficult that fact is for a learner. This measure allows us to compare fact difficulty and learner ability on a single scale.

Fact difficulty 

Fact difficulty can be assessed by aggregating final speed of forgetting estimates across learners. A higher average speed of forgetting indicates a more challenging fact. Figure 3 displays mean final estimates for each multiplication fact, derived from responses in Level 3, encompassing data from 47 to 403 individuals. The variability in estimated difficulty among facts is evident.
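As a rough illustration of this aggregation step, the sketch below averages final speed of forgetting estimates per fact. The record layout and the function name are assumptions made for this example, and the values are invented, not pilot data.

```python
from collections import defaultdict
from statistics import mean

def fact_difficulty(estimates):
    """Average the final speed of forgetting per fact across learners.

    estimates: iterable of (learner_id, fact, final_speed_of_forgetting).
    Returns a dict ordered from most to least difficult fact.
    """
    by_fact = defaultdict(list)
    for _learner, fact, speed in estimates:
        by_fact[fact].append(speed)
    averages = {fact: mean(values) for fact, values in by_fact.items()}
    return dict(sorted(averages.items(), key=lambda item: item[1], reverse=True))

# Illustrative values only.
records = [("a", "6x7", 0.40), ("b", "6x7", 0.50), ("a", "2x3", 0.20)]
print(fact_difficulty(records))
```

Because not every learner practised every fact, each average can rest on a different number of learners, which is why the estimates in Figure 3 draw on between 47 and 403 individuals.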

Comparing difficulty estimates across levels allows us to track changes in fact difficulty as learners progress. Figure 4a presents difficulty estimates based on mean speed of forgetting values from Levels 2 and 3. Despite independent estimation in each level, there’s a strong correlation in fact difficulty (r = 0.86, p < 0.001), indicating persistent relative differences. Additionally, facts generally exhibited lower speed of forgetting in Level 3 compared to Level 2, suggesting increased ease with experience.
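The correlation reported here is a Pearson correlation between two lists of per-fact means, one for each level. A minimal sketch of that calculation, with a hypothetical function name and invented values, looks like this:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Illustrative per-fact means for the same four facts in Levels 2 and 3.
level2_means = [0.30, 0.35, 0.42, 0.50]
level3_means = [0.28, 0.31, 0.40, 0.44]
print(pearson_r(level2_means, level3_means))
```

A value close to 1 means that facts that were relatively hard in one level were also relatively hard in the other, even if all facts became easier overall.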

Figure 3: Estimated difficulty of each multiplication fact. Each estimate is the final speed of forgetting for a fact, averaged over learners. Higher values reflect more difficult facts. 

Learner ability 

Assessing learner ability involves aggregating final speed of forgetting estimates within learners. Figure 4b shows speeds of forgetting in Levels 2 and 3, based on independently estimated values. Learners generally showed a lower average speed of forgetting in Level 3 than in Level 2, indicating overall improvement. The correlation between levels was weak (r = 0.29, p < 0.001), which suggests that the degree of improvement varied among learners.
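One way to look at this variation is to compute, for each learner, the change in mean speed of forgetting between the two levels. The sketch below assumes the per-learner means are already available as dictionaries; the function name and the values are made up for this example.

```python
def learner_change(level2_means, level3_means):
    """Per-learner drop in mean speed of forgetting from Level 2 to Level 3.

    Both arguments map learner_id -> mean speed of forgetting in that level.
    Only learners with estimates in both levels are included. A positive
    value means a lower (better) speed of forgetting in Level 3.
    """
    shared = level2_means.keys() & level3_means.keys()
    return {learner: level2_means[learner] - level3_means[learner] for learner in shared}

# Illustrative values only: learner "c" has no Level 2 estimate and is skipped.
level2 = {"a": 0.40, "b": 0.35}
level3 = {"a": 0.30, "b": 0.36, "c": 0.20}
print(learner_change(level2, level3))
```

In this invented example, learner "a" improves clearly while learner "b" stays at about the same level, which is the kind of spread a weak between-level correlation points to.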

Multiplication trainer: evaluating learner performance

The Multiplication Trainer is an adaptive learning system designed to help young students master multiplication facts through stepwise automatisation. This blog post aims to show how formal analysis techniques can shed light on the memory processes that underlie the use of the Multiplication Trainer, evaluate the effectiveness of the system, and suggest improvements. Overall, the pilot showed that the system was effective in facilitating learning, identifying individual learning differences, and promoting a transition from slower computational strategies to quicker retrieval methods.

It’s widely recognized that learners employ various strategies to tackle multiplication problems, with strategies evolving with experience. Distinguishing between retrieval and computation strategies poses challenges, yet our system accommodates a mix of strategies while nudging learners towards retrieval. By providing repeated exposure to problems without penalization in the initial levels and incentivizing quick responses in the advanced level, we observed improvements in both retrieval speed and computational efficiency.

Figure 4: Speed of forgetting in levels 2 and 3. a mean speed of forgetting by facts, b mean speed of forgetting by learner.

The individualized difficulty estimates obtained through our system reveal nuances among multiplication facts, aligning partially with canonical effects found in other studies. One example is the problem size effect: the difficulty of a multiplication item increases with larger operands, particularly when the second operand is large (e.g., see Campbell & Graham, 1985; Imbo, Duverne, & Lemaire, 2007). Notably, while individual differences in learners’ abilities were substantial, our adaptive approach proved fruitful in addressing these differences.

Improving the process of learning core mathematical skills like multiplication in young learners is of key importance, especially in light of declining mathematics performance trends in recent years. An adaptive learning system offers a promising avenue for addressing this challenge by supporting effective study methods and accommodating individual learning paces. In our study, both teachers and students reported positive experiences with the application, underscoring its potential in educational settings. Overall, the project demonstrates the effectiveness of an adaptive learning application grounded in computational cognitive models, offering insights into individual problem difficulty and learner ability, thus supporting personalized learning approaches.
