Here, we explore whether our adaptive learning model can be improved by using information present in the speech signal during spoken retrieval attempts. We extracted high-level prosodic speech features, such as pitch dynamics, speaking speed, and intensity, from over 7,000 utterances. We demonstrate that some prosodic speech features are associated with the accuracy and response latency of retrieval attempts, and that memory models informed by these speech features predict future performance better than models that use only accuracy and response latency.