Can you learn 80% of Spanish in 2 months? – A case study

Introduction

How much would you understand of a book written in a language you don’t know? Not much. But if you learned just 1,000 of the most common words, could you read Harry Potter? Surprisingly, you would recognise about 80% of all the words in the book: the 1,000 most common words cover roughly 80% of Spanish fiction and 88% of spoken communication (Davies, 2005). So I put this to the test: I used MemoryLab to learn the 1,000 most common Spanish words over the course of two months, and then tried reading Harry Potter y la piedra filosofal.

Do 1,000 unique words really cover 80% of a text?

Zipf’s law states that the frequency of a word in a text is roughly inversely proportional to its frequency rank, a pattern that has been observed across many languages (e.g. 50 languages; Yu, Xu, & Liu, 2018). In English, the most common word, ‘the’, makes up about 7% of all words and appears twice as often as the second most frequent word, ‘of’ (3.5%). The third, ‘and’, occurs roughly a third as often as ‘the’ (2.9%, close to the ⅓ × 7% ≈ 2.3% the law predicts), and so forth (Kedia, 2017). To illustrate this power-law curve, Figure 1 shows an analysis of subtitles in five languages.

Figure 1: Percentage of total words versus unique word rank in subtitles in five languages (graph extended from Wells (2017))
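As a rough illustration of the pattern described above, the sketch below computes the share of all words that an idealised Zipf’s law (share proportional to 1/rank) would predict for a few ranks. The 7% starting value is the figure quoted for ‘the’; the function name `zipf_share` and the chosen ranks are only illustrative, and real texts deviate from this idealised curve.

```python
def zipf_share(rank: int, top_share: float) -> float:
    """Predicted share of all words for the word at `rank`,
    under an idealised Zipf's law: share = top_share / rank."""
    if rank < 1:
        raise ValueError("rank starts at 1")
    return top_share / rank


# 7% is the share quoted in the text for the most common English word, 'the'.
TOP_SHARE = 0.07

for rank in (1, 2, 3, 10, 100):
    print(f"rank {rank:>3}: {zipf_share(rank, TOP_SHARE):.2%}")
```

For rank 3 this gives about 2.33%, somewhat below the 2.9% observed for ‘and’, which shows that the law is an approximation rather than an exact rule.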

Similarly, I confirmed that the 1,000 most frequent words cover 80% of the text in Harry Potter part 1, La Piedra Filosofal (see Figures 2 and 3). The book contains 78,050 words in total, of which 8,234 are unique. Interestingly, just 81 of those unique words (1% of them!) account for 50% of the book.

Figure 2: Table of cumulative percentage of word use in Harry Potter part 1.
Source text: Harry Potter 1 – La Piedra Filosofal
Figure 3: Plot of cumulative percentage of word use in Harry Potter part 1
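A cumulative coverage table like the one in Figure 2 can be produced with a simple word count. The sketch below is not the exact script used for this case study; the tokenisation rule (lowercase, letters only) and the function names are assumptions, and the short sample sentence merely stands in for the full book text.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words (letters only, accents kept)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def coverage_of_top(tokens: list[str], n: int) -> float:
    """Fraction of all tokens accounted for by the n most frequent unique words."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(tokens)


def words_needed(tokens: list[str], target: float) -> int:
    """Smallest number of top-ranked unique words whose combined share reaches `target`."""
    counts = Counter(tokens)
    running = 0
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        running += count
        if running / len(tokens) >= target:
            return rank
    return len(counts)


# Stand-in sample; in practice `tokens` would come from the full book text.
sample = "el niño vio el gato y el gato vio el perro"
tokens = tokenize(sample)
print(f"top 1 word covers {coverage_of_top(tokens, 1):.1%} of the sample")
print(f"words needed for 50% coverage: {words_needed(tokens, 0.5)}")
```

Run on the whole book, `coverage_of_top(tokens, 1000)` would give the share covered by the 1,000 most frequent words, and `words_needed(tokens, 0.5)` the number of unique words needed for 50% coverage.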

But how much do the 1,000 most common words actually help us understand what we’re reading? I venture to guess that Longbottom isn’t the 1,000th most frequent word in the Spanish language (while it is in Harry Potter, see Figure 2). What proportion of all the words in the Spanish language do you actually know with 1,000 words?

Dictionary definition of number of words

Defining the number of words in a language is tricky. The official number of Spanish words in the dictionary is roughly 93,000 (Real Academia Española, 2014). But the dictionary lists most “words” in a single form. Verbs are listed in their infinitive form (e.g. to live = vivir). However, verbs vary in tense (e.g. past vs present), person (e.g. I, you) and number (i.e. singular vs plural), which can lead to as many as 50 different conjugated forms of a single verb (Spanish Academy Team, 2023; Wikipedia, 2025). Similarly, nouns and adjectives are listed in their (masculine) singular form.

If you count all those inflected forms separately, the total number of words increases substantially. For example, one of the biggest collections of analysable Spanish text, the Corpus del Español Actual (CEA), contains 635,000 unique word forms (Subirats & Ortega, 2012), almost seven times the dictionary count. This matters because I only focused on learning words in their base form.

Can you read a book knowing 1,000 words in simple form?

The Common European Framework of Reference for Languages does not prescribe vocabulary sizes for each proficiency level (Council of Europe, 2014). However, Milton & Alexiou (2009) estimate that 1,000 words would equate to level A2 of language comprehension. A2 covers simple conversation, but is it enough for reading fiction?

I wondered whether I could understand a text without learning any grammar. Researchers separate vocabulary knowledge into vocabulary breadth, i.e. the total number of words you know, and vocabulary depth, i.e. how well you know each word (its meaning, spelling, and sound). Since earlier research (Li & Kirby, 2015) showed that breadth is more strongly associated with reading comprehension than depth, my strategy of starting with only basic forms may be an efficient one.

Furthermore, Ludewig, Hübner and Schroeder (2023) found that text coverage (the percentage of the total word count you understand) explains text comprehension better than overall vocabulary knowledge. Interestingly, above 56% text coverage, comprehension increased much more steeply. In other words, if you know less than 56% of the words on a page, it’s going to be hard to understand. But once you know more than 56% of the words, each additional word you learn yields a bigger jump in understanding than it would have below that point. Why? The researchers suggested that after this critical point you’re familiar with enough words to guess or infer the meaning of other parts of the text. This threshold likely isn’t a magic number, but it is certainly interesting to put to the test.

When you learn 1,000 words in their simple form, how much text coverage do you achieve, and could you comprehend a Spanish book?

Methods and results

To evaluate the effectiveness of learning Spanish through the 1,000 most common vocabulary words, I converted this list into 44 MemoryLab lessons, each averaging about 22 vocabulary items. Completing a lesson required three sessions, with the system awarding a ‘Mastery Crown’ upon sufficient progress.

Over the course of two months I spent 30 minutes per day, earning 132 crowns and answering 13,460 vocabulary prompts. By the end, I had reached a 68.3% accuracy rate on the 1,000 vocabulary items, which I’ll equate to a 6.8 out of 10, more than enough for a passing grade in school.
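The effort figures above can be cross-checked with a little arithmetic. This is only a sketch under two assumptions that are not stated in the study itself: that “two months” means 60 study days, and that each completed session earns one crown.

```python
LESSONS = 44
SESSIONS_PER_LESSON = 3
PROMPTS_ANSWERED = 13_460
VOCABULARY_ITEMS = 1_000
MINUTES_PER_DAY = 30
STUDY_DAYS = 60  # assumption: "two months" taken as 60 days

# Assumption: one Mastery Crown per completed session.
crowns = LESSONS * SESSIONS_PER_LESSON
total_hours = MINUTES_PER_DAY * STUDY_DAYS / 60
prompts_per_day = PROMPTS_ANSWERED / STUDY_DAYS
prompts_per_item = PROMPTS_ANSWERED / VOCABULARY_ITEMS

print(f"crowns: {crowns}")
print(f"total study time: {total_hours:.0f} hours")
print(f"prompts per day: {prompts_per_day:.0f}")
print(f"prompts per vocabulary item: {prompts_per_item:.1f}")
```

Under these assumptions the 132 crowns match 44 lessons of three sessions each, and every vocabulary item was prompted roughly 13 times over about 30 hours of study.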

During the process I assessed my reading comprehension by reading a page of the book at three stages: prior to learning, after obtaining 50% of the Mastery Crowns, and after obtaining 100% of them (Figure 4). At the beginning, my text coverage was 47.2%, with unique word comprehension at 31.3%. By the end of the case study, I reached a text coverage of 59.3% (+12.1 percentage points), while understanding 49% (+17.7 percentage points) of the unique words.

Figure 4: Progress in Spanish reading comprehension on 3 different Harry Potter pages.
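The two measures reported above differ in what they count: text coverage counts every occurrence of a word on the page, while unique word comprehension counts each distinct word once. The sketch below shows one way to compute both. The function name, the tokenisation rule and the sample sentence are illustrative assumptions, not the scoring procedure actually used on the Harry Potter pages.

```python
import re


def page_coverage(page: str, known_words: set[str]) -> tuple[float, float]:
    """Return (text coverage, unique-word coverage) for a page.

    Text coverage: share of all word occurrences that are known.
    Unique-word coverage: share of distinct words that are known.
    """
    tokens = re.findall(r"[^\W\d_]+", page.lower())
    if not tokens:
        return 0.0, 0.0
    unique_words = set(tokens)
    known_tokens = sum(1 for token in tokens if token in known_words)
    known_unique = len(unique_words & known_words)
    return known_tokens / len(tokens), known_unique / len(unique_words)


# Illustrative sample page and known-word set.
page = "el niño vio el gato y el gato vio el perro"
known = {"el", "y", "gato"}
text_cov, unique_cov = page_coverage(page, known)
print(f"text coverage: {text_cov:.1%}, unique-word coverage: {unique_cov:.1%}")
```

Because frequent words such as el are counted every time they occur, text coverage is usually higher than unique-word coverage, which matches the gap between the two percentages in this case study.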

This increase was mainly theoretical, however. Reading through a page became easier, but I could still only grasp a small portion of what was written. For example, I understood roughly where people were and who was involved, maybe even what object was being used, but I could not understand what was happening. Words that once blurred together into a language soup started to take on meaning, but it felt like I still needed more context.

Discussion 

This case study confirmed that vocabulary coverage matters, but also that more than vocabulary is needed. At ~60% coverage, I recognised 3 out of every 5 words in a sentence. But that didn’t mean I understood every sentence. Why? Grammar, context, and sentence structure still posed challenges. For instance, knowing haber (to have) doesn’t mean you understand había habido (there had been). Furthermore, you might know the word vivir (to live), but you won’t always recognise it in a sentence as vivieron or vivido.
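The effect of inflected forms on coverage can be sketched in code. The example below counts a word as recognised only if it matches a known base form, and then repeats the count with a small lookup table that maps inflected forms back to their base form. The table `LEMMAS`, the sample sentence and the function name are hypothetical and hand-made for illustration; a real text would need a proper lemmatiser.

```python
# Hypothetical hand-made table mapping inflected forms to their base form.
LEMMAS = {
    "vivieron": "vivir",
    "vivido": "vivir",
    "había": "haber",
    "habido": "haber",
}


def coverage(tokens, known_base_forms, lemmas=None):
    """Share of tokens recognised, optionally mapping inflected forms to base forms first."""
    if not tokens:
        return 0.0
    lemmas = lemmas or {}
    hits = sum(1 for token in tokens if lemmas.get(token, token) in known_base_forms)
    return hits / len(tokens)


# Illustrative sentence: "ellos vivieron en la casa" (they lived in the house).
tokens = ["ellos", "vivieron", "en", "la", "casa"]
known = {"ellos", "vivir", "en", "la", "casa"}

print(f"base forms only: {coverage(tokens, known):.0%}")
print(f"with lemma lookup: {coverage(tokens, known, LEMMAS):.0%}")
```

Even with such a lookup, linking vivieron to vivir does not tell you the tense or who did the living, which is where grammar comes in.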

Text coverage versus reading comprehension 

Ludewig et al. (2023) found that text coverage, not just vocabulary size, best predicts reading comprehension, especially once learners reach a critical threshold of around 56% coverage. In my case, reading felt more approachable, and I could understand some sentences. But with 60% coverage I could not understand the book well enough to read it. However, as the paper’s authors suggest, I feel that short instructional texts with multiple-choice questions might actually be manageable to learn with. One other factor that influenced my comprehension was the direction in which I learned the vocabulary.

Learn from foreign-to-native for better reading comprehension

I learned the words from English to Spanish, but my reading test required understanding from Spanish to English. This mismatch matters: as is well known in the research literature (e.g. Morris et al., 1977), we recall information best in the same context or format we learned it in. You need to both understand and produce language to become a proficient user (Council of Europe, 2014). Moreover, this case study corroborates other research (e.g. Nakata, 2016) showing that learning a language in one direction doesn’t mean you have learned it both ways.

Conclusion

All in all, learning 1,000 words in Spanish gave me access to around 60% of the words in a typical fiction book, and a clearer understanding of how knowing vocabulary alone limits reading comprehension. Watching a TV show with Spanish subtitles would likely be a better fit for learning, thanks to the added audio, visuals, and emotion. That said, with just 30 minutes of practice each day, I built a foundation that made reading feel more approachable, if still a little mysterious. Retrieval-based learning works well, but 1,000 words may be just shy of what’s needed to comfortably dive into reading.

References

Council of Europe. (2014). Global scale – Table 1 (CEFR 3.3): Common Reference levels. Common European Framework of Reference for Languages (CEFR); Council of Europe. https://www.coe.int/en/web/common-european-framework-reference-languages/table-1-cefr-3.3-common-reference-levels-global-scale 

Davies, M. (2005). Vocabulary range and text coverage: Insights from the forthcoming Routledge frequency dictionary of Spanish. In D. Eddington (Ed.), Selected proceedings of the 7th Hispanic Linguistics Symposium (pp. 106–115). Cascadilla Proceedings Project. http://www.lingref.com/cpp/hls/7/paper1091.pdf

Kedia, N. (2017). Zipf’s Law: Introduction to Text Analytics. Analytics Tuts. https://www.analytics-tuts.com/zipfs-law-introduction-text-analytics/

Li, M., & Kirby, J. R. (2015). The effects of vocabulary breadth and depth on English reading. Applied Linguistics, 36(5), 611–634. https://doi.org/10.1093/applin/amu007

Ludewig, U., Hübner, N., & Schroeder, S. (2023). Vocabulary, text coverage, word frequency and the lexical threshold in elementary school reading comprehension. Reading and Writing, 36, 2409–2431. https://doi.org/10.1007/s11145-022-10385-0

Milton, J., & Alexiou, T. (2009). Vocabulary size and the Common European Framework of Reference for Languages. https://link.springer.com/chapter/10.1057/9780230242258_12

Morris, C. D., Bransford, J. D., & Franks, J. J. (1977). Levels of processing versus transfer appropriate processing. Journal of Verbal Learning and Verbal Behavior, 16(5), 519–533. https://doi.org/10.1016/S0022-5371(77)80016-9

Nakata, T. (2016). Effects of retrieval formats on second language vocabulary learning. International Review of Applied Linguistics in Language Teaching, 54(3). https://doi.org/10.1515/iral-2015-0022 

Real Academia Española. (2014). Diccionario de la lengua española (23rd ed.). Author. https://dle.rae.es/

Spanish Academy Team. (2023, February 1). Master the 18 Spanish tenses (and take our cheat sheet with you). Spanish Academy. https://www.spanish.academy/blog/master-the-18-spanish-tenses-and-take-our-cheat-sheet-with-you/

Subirats, C., & Ortega, M. (2012). Corpus del Español Actual. http://spanishfn.org/tools/cea/english

Wikipedia. (2025). Spanish irregular verbs. Retrieved April 3, 2025, from https://en.wikipedia.org/wiki/Spanish_irregular_verbs

Yu, S., Xu, C., & Liu, H. (2018). Zipf’s law in 50 languages: Its structural pattern, linguistic interpretation, and cognitive motivation. arXiv. https://arxiv.org/abs/1807.01855
