Imagine facing a task you have never encountered before. Perhaps a broken vacuum, a challenge unfamiliar yet oddly conquerable. Recalling how you tackled a clogged drain by clearing a blocked pipe, you then remember using a broom to retrieve your shoes from under the couch. Skill A meets Skill B, and you wield a stick to clear the vacuum tube, making it breathe anew (see Figure 1A).

In mathematics, a similar fusion of skills occurs. As students face increasingly complex exercises, the underlying connections between skills become more intricate. Educational content creators strive to ensure subject-based skills progress logically into more integrated tasks. However, the order in which the curriculum is set up does not always coincide with the order in which students actually learn skills. For example, in school, addition is taught from 1–10, then 10–20, followed by 20–1000. That subject order sounds logical, but calculations with big numbers turn out to be much easier for children than calculations that cross the tens (Straatemeier, 2014). How can educators ensure each student's underlying skill proficiency is met for an optimal path to attempting new combined skills?

Cognitive tutors aim to teach students based on a fair representation of the skill to be learned and of the student's current knowledge of that skill (Anderson, Corbett, Koedinger, & Pelletier, 1995). Representations of skill acquisition in cognitive tutors are often made by hand. Logically, simple addition equations and adding large numbers combine to make addition equations with large numbers possible (see Figure 1B). However, it is not always possible to determine these relationships by logic alone. How can you determine what the correct skill representations are?

## A data-driven graph approach to find the optimal path to skill proficiency

We set out to assess whether underlying skill representations can be discovered from a set of maths exam data from first-year MBO (mid-level vocational education) students. We set up a graph that represents categories of skills while also maintaining a difficulty hierarchy.

The underlying idea is this: say you have two exercises, C and D. If most students score better on exercise D and worse on C, C should sit hierarchically higher in the graph, and D should lead to C (see Figure 2B). In other words, the skill in the exercise that was easier for most people should be a prerequisite for the skill tested in the more difficult question. If scores are similar, both exercises should reside in the same node, suggesting that the two questions really test the same skill (see Figure 2A). If there is a very different scoring pattern, the exercises are not hierarchically related and might represent two separate skills entirely (see Figure 2C). An item × student matrix was created, and similarly scoring students were grouped using a k-means clustering algorithm (Lloyd, 1982).
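
As an illustration, the pairwise rule can be sketched in a few lines of Python. The thresholds and score vectors below are hypothetical choices of ours, not values from the actual system:

```python
import numpy as np

def classify_pair(scores_c, scores_d, same_tol=0.05, corr_min=0.3):
    """Classify exercises C and D from per-student score vectors (0..1).

    same_tol and corr_min are illustrative thresholds, not Graaf Tel's.
    """
    mean_c, mean_d = np.mean(scores_c), np.mean(scores_d)
    corr = np.corrcoef(scores_c, scores_d)[0, 1]
    if corr < corr_min:
        return "unrelated"            # very different scoring pattern (Figure 2C)
    if abs(mean_c - mean_d) <= same_tol:
        return "same skill"           # similar scores: same node (Figure 2A)
    # the easier exercise is a prerequisite for the harder one (Figure 2B)
    return "D -> C" if mean_d > mean_c else "C -> D"

# Hypothetical data: most students do better on D than on C.
c = np.array([0.2, 0.4, 0.3, 0.6, 0.5])
d = np.array([0.7, 0.9, 0.8, 1.0, 0.9])
print(classify_pair(c, d))  # D is easier, so D should lead to C
```

In the full system this comparison is made per cluster of similarly scoring students, rather than over all students at once.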

Next, we used simulated annealing, an optimization algorithm, to determine in which nodes of the graph exercises should be placed (Xiang, Gubian, Suomela, & Hoeng, 2013). Simulated annealing reduces the number of violations exercise pairs make with regard to the constraints in Figure 2. Based on the total number of violations, the optimal number of skills is decided.
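
A minimal sketch of that search, under assumptions of ours (a toy violation function standing in for the Figure 2 constraints, and generic cooling parameters): exercises are randomly reassigned to nodes, improvements are always kept, and worse moves are occasionally accepted with a probability that shrinks as the "temperature" cools.

```python
import math
import random

def anneal(n_exercises, n_nodes, count_violations, steps=5000, t0=1.0, cooling=0.999):
    """Assign exercises to nodes, minimising constraint violations.

    count_violations(assignment) -> int is the (domain-specific) number of
    hierarchy constraints the assignment breaks.
    """
    random.seed(0)
    assignment = [random.randrange(n_nodes) for _ in range(n_exercises)]
    cost = count_violations(assignment)
    t = t0
    for _ in range(steps):
        ex = random.randrange(n_exercises)
        old = assignment[ex]
        assignment[ex] = random.randrange(n_nodes)
        new_cost = count_violations(assignment)
        # accept improvements always; worse moves with probability exp(-delta/t)
        if new_cost <= cost or random.random() < math.exp((cost - new_cost) / t):
            cost = new_cost
        else:
            assignment[ex] = old  # revert the move
        t *= cooling
    return assignment, cost

# Toy constraints: exercises 0 and 1 must share a node, 2 must be elsewhere.
def toy_violations(a):
    return (a[0] != a[1]) + (a[2] == a[0])

best, cost = anneal(n_exercises=3, n_nodes=2, count_violations=toy_violations)
print(best, cost)
```

Running the search for a range of node counts and comparing the remaining violations is then what suggests the optimal number of skills.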

## Graaf Tel: A tutoring system in the classroom

During the summers of 2022 and 2023, a pilot was set up to visualize skill graphs based on this hierarchical organization of skill proficiency (Taatgen, Hoekstra, & Blankestijn, 2024). A teacher conducted their regular instruction, after which students would work individually with our cognitive tutor, Graaf Tel (see Figures 3 and 4).

The teacher started the lesson with a recap of the previous lesson and some new material. After this, the students received a short explanation from the teacher about Graaf Tel. The explanation included the idea that there are a number of skills present in the course materials (represented in the nodes) and that students can practice these skills by completing the assignments placed in those nodes. To make practice optimal and keep students motivated, we wanted to ensure that they were challenged enough, but not too much; this is called deliberate practice (Ericsson, 2006). Therefore, the preferred way was to start at the bottom of the graph (with a question from the current paragraph) and then keep working on assignments in that node until the progress bar was completely filled.

**Results**

Node scores significantly predicted the scores students obtained when attempting the exercises, indicating that the system has merit. Students found that the graph helped them decide which exercises to attempt and that it gave them insight into their own skill level. They did not, however, feel it motivated them much more than regular exercise practice. One reason for this might be that students had difficulty switching between the Noordhoff exercise environment and the Graaf Tel environment. Furthermore, there was no direct connection between scores on exercises and their graph, since the graph would only update the next lesson according to their performance.

The control and experimental groups were not large enough to perform statistical tests on, and there was no significant difference in final test performance between the groups. What is notable is that the experimental group had a passing rate of 58%, compared to 35% and 37% for the two control groups. Further investigation showed that this difference is partly attributable to both control groups having a larger dropout rate. Whether more students in the experimental group felt capable of passing the test, and therefore did not drop out, remains speculation. Further research will therefore be conducted to investigate the performance benefits of this graph-based, data-driven approach.

**Future directions: Using Elo ratings to determine skill proficiency – a chess game against maths exercises**

**A data-driven chess rating approach to tutoring skills**

As outlined, data-driven approaches can prevent handcrafting bias and uncover underlying skill representations. However, as with the current implementation of Graaf Tel, a data-driven approach typically needs data to train on: a pre-calibration step in which the difficulty of each question in a test is determined, which is a costly process.

Math Garden (rekentuin.nl; Klinkenberg, Straatemeier, & van der Maas, 2011), a maths learning application for children, circumvents this calibration step. Math Garden uses performance on previous exercises to decide which exercises to present to the student, and performs on-the-fly assessments of exercise difficulty and student ability.

To assess difficulty and ability, it employs the Elo rating system (Elo, 1978), a popular way to rate the proficiency of chess players that has since spread to other sports such as basketball and tennis. In chess, your opponent is chosen based on their rating compared to yours. In Math Garden, the opponent is an exercise. After every “match” between a student and a specific question, both the difficulty rating of the exercise and the ability rating of the student are updated. The upside is an on-the-fly measurement of skill ability and exercise difficulty, eliminating the need for pre-tests.
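
The standard Elo update behind this looks as follows (a sketch: the K-factor of 16 is a common chess value rather than Math Garden's, and Math Garden's actual model additionally incorporates response time, which is omitted here):

```python
def elo_update(student, exercise, correct, k=16.0):
    """One 'match' between a student rating and an exercise rating.

    correct: 1.0 if the student solved the exercise, 0.0 otherwise.
    k controls how strongly one result moves the ratings.
    """
    # expected probability that the student "beats" the exercise
    expected = 1.0 / (1.0 + 10 ** ((exercise - student) / 400.0))
    student += k * (correct - expected)
    exercise -= k * (correct - expected)  # the exercise loses what the student gains
    return student, exercise

# Hypothetical ratings: solving an exercise rated above you pays off well.
s, e = elo_update(student=1200.0, exercise=1400.0, correct=1.0)
```

Because the expected score depends on the rating gap, beating a harder exercise moves the student's rating by more than k/2, while beating a much easier one barely moves it.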

Math Garden sorts exercise difficulty on a single dimension: generalized mathematical ability (see Figure 5). To assign an opponent for a tennis match, generalized tennis ability is a fine measure to ensure both players have an equal chance of winning. However, to improve at tennis you perform specific skill-based exercises to fill in weaknesses. Likewise, to work on their weaknesses, students need to practice particular skills. Generalized mathematical ability does not account for performance on skills that have no relation to each other (see Figure 2C). To prevent this linearity in the order of exercises, one could organize the exercises in a graph (Falmagne, Koppen, Villano, Doignon, & Johannesen, 1990; as in Figure 1). The challenge is figuring out how well the graph represents the actual skills that need to be learned.

**Combining graphs and Elo ratings**

Currently, work is being done to combine the two approaches mentioned above: a graph setup that includes both skill dimensions and Elo ratings. Elo in Math Garden is one-dimensional; that is, one exercise is only generally more difficult than another, making it less likely that a student with a lower Elo rating can solve it (see Figure 5).

Teachers and students need to know not only how difficult something was, but also why it was difficult, so they can improve. A one-dimensional maths proficiency level does not take into account that an underlying skill requirement might not be met to warrant starting the exercise in the first place (see Figure 6). General maths proficiency is not the only determinant of performance on maths exercises. Two exercises requiring two different skills can be equally difficult on average; however, a student who possesses the skill required for one but not the other will be unable to solve the problems requiring the skill they lack. We will continue building this multidimensional Elo approach and will launch a new research project after the summer.
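
A minimal sketch of the idea, with hypothetical skills, names, and parameters of our own choosing (not the production model): each student keeps a separate Elo rating per skill, and an exercise is matched against the rating for the skill it requires, so strong performance on one skill does not inflate predictions for another.

```python
from collections import defaultdict

class SkillElo:
    """Per-skill Elo ratings: one rating per (student, skill) pair.

    Illustrative sketch of a multidimensional Elo, not a finished system.
    """
    def __init__(self, k=16.0, start=1000.0):
        self.k = k
        self.ratings = defaultdict(lambda: start)

    def expected(self, student, skill, difficulty):
        """Probability that the student solves an exercise of this skill."""
        r = self.ratings[(student, skill)]
        return 1.0 / (1.0 + 10 ** ((difficulty - r) / 400.0))

    def update(self, student, skill, difficulty, correct):
        e = self.expected(student, skill, difficulty)
        self.ratings[(student, skill)] += self.k * (correct - e)

model = SkillElo()
# Hypothetical student: strong on fractions, weak on percentages.
for _ in range(20):
    model.update("anna", "fractions", difficulty=1000.0, correct=1.0)
    model.update("anna", "percentages", difficulty=1000.0, correct=0.0)

# Two exercises of equal average difficulty, very different predictions:
print(model.expected("anna", "fractions", 1000.0))    # well above 0.5
print(model.expected("anna", "percentages", 1000.0))  # well below 0.5
```

A single general rating would average these two skills away; the per-skill ratings make visible where practice is most beneficial.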

All in all, education requires an unbiased data-driven approach that uncovers how skills are learned. A graph-based Elo rating approach helps learners practice at their optimal level of difficulty and gives insight into where practice is most beneficial. We look forward to testing our generalizable approach to help understand the underlying skill requirements for maths and improve maths learning.

## References

Anderson, J. R., Corbett, A. T., Koedinger, K. R., & Pelletier, R. (1995). Cognitive tutors: Lessons learned. *The Journal of the Learning Sciences, 4*(2), 167–207.

Elo, A. (1978). *The rating of chessplayers, past and present.* New York: Arco Publishers.

Ericsson, K. A. (2006). The influence of experience and deliberate practice on the development of superior expert performance. In *The Cambridge handbook of expertise and expert performance.* Cambridge University Press.

Falmagne, J.-C., Koppen, M., Villano, M., Doignon, J.-P., & Johannesen, L. (1990). Introduction to knowledge spaces: How to build, test, and search them. *Psychological Review, 97*(2), 201.

Klinkenberg, S., Straatemeier, M., & van der Maas, H. L. J. (2011). Computer adaptive practice of maths ability using a new item response model for on the fly ability and difficulty estimation. *Computers & Education, 57*(2), 1813–1824. https://doi.org/10.1016/j.compedu.2011.02.003

Lloyd, S. P. (1982). Least squares quantization in PCM. *IEEE Transactions on Information Theory, 28*(2), 129–137.

Straatemeier, M. (2014). *Math Garden: A new educational and scientific instrument* [Doctoral thesis, University of Amsterdam].

Taatgen, N. A., Hoekstra, C., & Blankestijn, J. A. (2024). Data-driven cognitive skills with an application in personalized education. *Proceedings of the Annual Meeting of the Cognitive Science Society.* Accepted for publication.

Xiang, Y., Gubian, S., Suomela, B., & Hoeng, J. (2013). Generalized simulated annealing for global optimization: The GenSA package. *The R Journal, 5*(1), 13–28.