Usability testing plays a crucial role in the product creation journey. It sheds light on whether users find your product easy and intuitive to use, which makes it an essential skill for every UX designer. In this blog, I will cover my personal experience of testing an educational app with children in a primary school classroom. While this is not a comprehensive guide to usability testing best practices, I hope that the experiences I describe, particularly from testing the pilot of Tafel Trainer, will offer valuable insights for those starting their first usability test.
What is usability testing?
When answering the question “What is usability testing?”, the most common answer is: observing users as they interact with a product. When done right, it improves your understanding of, and empathy for, your users. Testing with users gives you unbiased feedback on a product, pointing out the areas that need improvement to make for a smoother experience.
Case study: Tafel Trainer
Tafel Trainer is an educational app designed to help children between 6 and 12 years old learn their times tables. It uses MemoryLab’s adaptive algorithm, a scientifically proven method that schedules fact repetition based on individual parameters. When designing Tafel Trainer, our team kept balanced gamification in mind. For example, we included intrinsic motivators such as rewards in the shape of numbers, closely tied to achieving mastery in multiplication. We designed the application with three levels of difficulty, one of which includes a time bar that gives children a set time to answer (see Fig. 1). The time bar is intended to encourage them to answer faster, and aims to make the game more enjoyable and engaging. Moreover, we included encouraging feedback and avoided creating competition between children. Testing Tafel Trainer involved more than 600 children from 12 different schools. Usability testing played an important role in assessing its functionality and children’s overall engagement with the application.
Usability testing, how did we do it?
To achieve your usability goals, a well-structured plan is essential. In line with this, we started usability testing by defining clear goals and research questions to guide our observations, and we outlined the methods we would use to observe user behaviour. To clarify these goals, in the initial stages of the usability testing we focused on secondary research, concentrating mostly on understanding how to measure the intended behaviours and identifying current best practices in the field. Our objectives were to identify features that are less intuitive for children to use, assess the engagement level of design elements, and evaluate children’s interaction with the app (e.g. whether the lesson duration was appropriate). We also aimed to identify specific tasks or aspects within the app that lead to frustration, such as how children’s experience is affected by the time bar or by receiving feedback from one of the characters.
Usability testing can be conducted in various forms, from remote to face-to-face (Bastien, 2010). In the creation of Tafel Trainer, we opted for moderated usability testing, a method in which a UX designer conducts the testing in person, guiding users and observing their interactions. This was ideal for us since we were testing with children, who needed additional guidance. Alternatively, one can opt for unmoderated usability testing, where users are not guided during the process and complete the testing independently on a computer.
When it comes to methods used in usability testing, one can choose between qualitative and quantitative approaches. The most common qualitative method used in testing is the think-aloud method, where users are invited to verbalise what they are thinking while interacting with a product (McDonald, Edwards & Zhao, 2012), typically during interviews or direct observations in real-world settings. As for quantitative methods, most designers use key performance indicators (success rate, error rate, time on task, etc.) or scales aimed at measuring usability.
In this pilot, we adopted a combination of both approaches while asking users to perform specific tasks within the app (e.g. “Can you please log out?”). On one hand, we used qualitative methods such as the think-aloud method during direct observations with open-ended questions. This offered us great insights into how users use the product and how they feel about it. On the other hand, we used quantitative metrics such as completion rate (the percentage of users who successfully finished a task) and error rate (the percentage of users who made an error while completing a task). This helped us understand which tasks were not completed and where improvements were needed.
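To make the two metrics concrete, here is a minimal sketch of how completion rate and error rate can be computed from per-user task observations. The task name, data structure, and numbers are illustrative assumptions, not the pilot’s actual data.

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    user_id: int
    completed: bool  # did the child finish the task?
    errors: int      # number of mistakes made along the way

def completion_rate(results):
    """Percentage of users who successfully finished the task."""
    return 100 * sum(r.completed for r in results) / len(results)

def error_rate(results):
    """Percentage of users who made at least one error on the task."""
    return 100 * sum(r.errors > 0 for r in results) / len(results)

# Hypothetical observations for a "Can you please log out?" task:
log_out = [
    TaskResult(1, True, 0),
    TaskResult(2, True, 1),
    TaskResult(3, False, 2),
    TaskResult(4, True, 0),
]
print(completion_rate(log_out))  # 75.0
print(error_rate(log_out))       # 50.0
```

Keeping both numbers side by side is useful: a task can have a high completion rate yet a high error rate, which signals that users get there eventually but the path is confusing.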
Another quantitative method we used to measure usability was a questionnaire. For this, we administered an age-appropriate instrument, the UXKQ (User eXperience Kids Questionnaire; Wöbbekind et al., 2021), a scale designed to assess usability for young age groups. Here, children were presented with semantic differentials – two words with opposite meanings – describing the app, such as “Boring – Fun”. We tested a total of 8 semantic differentials (see Fig. 2). Children coloured in the number of stars that best represented their perception, with five stars indicating that the app was (for example) fun.
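Scoring a questionnaire like this comes down to averaging the star ratings per semantic differential and then across differentials. The sketch below shows one way to do that; the dimension labels and ratings are made-up examples, not the pilot’s data.

```python
# Star ratings (1-5) per semantic differential, one entry per child.
ratings = {
    "Boring - Fun":       [5, 4, 5, 3],
    "Hard - Easy":        [4, 5, 4, 4],
    "Bad - Good to learn": [5, 5, 4, 5],
}

# Mean score per differential, then the overall mean across differentials.
per_dimension = {dim: sum(stars) / len(stars) for dim, stars in ratings.items()}
overall = sum(per_dimension.values()) / len(per_dimension)

print(per_dimension)  # e.g. {'Boring - Fun': 4.25, ...}
print(round(overall, 2))
```

Reporting the per-differential means alongside the overall mean is what lets you single out a weaker dimension (such as “Boring – Fun”) even when the overall score looks strong.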
Additionally, we surveyed teachers and conducted interviews to gather more insights into their experiences. The surveys focused on four key aspects: teachers’ perspective on child usability, comparison of learning outcomes with other apps, engagement levels, and gamification elements (e.g. “Does Tafel Trainer’s content fit the age and curriculum of the children?”). For the first aspect, we used an adaptation of the SUS (System Usability Scale), a widely used questionnaire for assessing usability in human-computer interaction (Brooke, 1996). Because our target group was too young to complete the SUS themselves, we asked teachers to assess the usability of Tafel Trainer from their perspective while observing the children in the classroom. To achieve this, we formulated questions such as: “I found the app unnecessarily complex for children”.
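For readers unfamiliar with how a SUS score is derived: the standard scale has ten items answered on a 1-5 Likert scale, where odd-numbered items are positively worded and even-numbered items negatively worded, and the contributions are rescaled to a 0-100 score (Brooke, 1996). The response values below are hypothetical, not taken from our survey.

```python
def sus_score(responses):
    """Compute a System Usability Scale score (0-100) from ten 1-5 responses.

    Per Brooke (1996): odd items contribute (response - 1),
    even items contribute (5 - response); the sum is multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten responses")
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# A hypothetical teacher's responses:
print(sus_score([5, 1, 5, 2, 4, 1, 5, 1, 4, 2]))  # 90.0
```

Scores above roughly 68 are conventionally read as above-average usability, which is why a SUS score in the high 80s or 90s is reported as excellent.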
Insights from usability testing and exploring new possibilities
After collecting and plotting the results, it was time to uncover new possibilities for our product and answer key questions: What can be improved? What were the main themes we observed? This was an important moment that helped us refine our product to better support the needs of children.
Through direct observations and the think-aloud method we discovered that most of the tasks were completed successfully. However, some gamification elements we chose may not have functioned as intended. One example was the feedback given by a character: we observed that children tended to skip reading it or did not have enough time to read it, prompting us to shorten it. Furthermore, we identified accessibility elements that could enhance inclusivity, such as larger text and increased customisability of features.
From the quantitative methods, we learned that our product is perceived as usable. In fact, it received an excellent grade from teachers on the SUS scale. By assessing the UXKQ questionnaires from 79 children we learned that they feel enthusiastic about our app, with an average score of 4.35 across all the semantic differentials. This also shed light on possible improvements for the entertainment dimension (Boring/Fun) and indicated that the app is perceived as good for learning (see Fig. 3).
From the teacher interviews we learned that teachers were interested in our design choices, and in our algorithm in particular. In line with this, one teacher mentioned: “The nice difference is that there is no competitive element in TafelTrainer”. Another teacher added: “It proves to be very effective for memorisation. The pace is very nice, and I think it is important that this tool helps with learning fast and effectively. It is a useful examination/practice method.”
This pilot helped us comprehend how the MemoryLab algorithm can be effectively utilised by younger learners within an application designed for this age group. The high appreciation expressed by teachers and children in classrooms reflected the usability of the pilot. While indicating enthusiasm, it also highlighted specific areas that could be further improved.
References
Bastien, J. C. (2010). Usability testing: A review of some methodological and technical aspects of the method. International Journal of Medical Informatics, 79(4), e18-e23. https://doi.org/10.1016/j.ijmedinf.2008.12.004
Brooke, J. (1996). SUS: A “quick and dirty” usability scale. Usability Evaluation in Industry, 189(3), 189-194.
McDonald, S., Edwards, H. M., & Zhao, T. (2012). Exploring think-alouds in usability testing: An international survey. IEEE Transactions on Professional Communication, 55(1), 2-19. https://doi.org/10.1109/TPC.2011.2182569
Wöbbekind, L., Mandl, T., & Womser-Hacker, C. (2021). Construction and first testing of the UX Kids Questionnaire (UXKQ). https://doi.org/10.1145/3473856.3473875