Effective vocabulary with Zipf's Law
Are you trying to learn Dutch but feeling overwhelmed by the sheer number of words you need to know? Well, you might not actually need all that many to hold a basic conversation. Insights from linguistics and mathematics show that you won’t need much more than 750 carefully selected words. And guess what, we’ve already collected them for you. It’s all about studying smarter. Let’s discuss Zipf’s Law and see why 750 is such a magical number.
What is Zipf’s Law?
Zipf’s law, named after linguist George Kingsley Zipf, is a principle in linguistics that describes how often words occur in a language. It seems, however, that French stenographer Jean-Baptiste Estoup noticed the pattern before Zipf popularized it (Fairthorne, 1969). According to Zipf’s law, the frequency of a word is inversely proportional to its rank in a frequency table. In other words, the most frequent word occurs roughly twice as often as the second most frequent word, three times as often as the third most frequent word, and so on.
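To make the "inversely proportional" relationship concrete, here is a minimal Python sketch. It assumes the idealised form of the law, where the word at rank r occurs 1/r times as often as the top word. The function name and the count of 60,000 for the top word are made up for illustration and do not come from any real corpus.

```python
def zipf_expected_frequency(rank: int, top_frequency: float = 1.0) -> float:
    """Expected frequency of the word at `rank` under an idealised Zipf's law.

    Assumes frequency is exactly proportional to 1 / rank, which real
    languages only follow approximately.
    """
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return top_frequency / rank


# Hypothetical example: suppose the most frequent word occurs 60,000 times.
for rank in (1, 2, 3, 10):
    print(rank, zipf_expected_frequency(rank, top_frequency=60_000))
```

Real word counts scatter around this curve rather than sitting exactly on it, so treat the output as a rule of thumb.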
Zipf’s law has been observed in various languages and has been found to hold true across a wide range of language corpora (Fagan & Gençay, 2010). The law has been used to analyze the vocabulary of languages, predict the difficulty of language learning, and even to design more efficient computer algorithms.
Zipf’s law has been observed in many natural languages, and even in constructed languages like Esperanto (Manaris et al., 2006), but the exact reason why it holds is still not fully understood.
Why does it matter?
It matters because this means that you can cheat your way to basic Dutch fluency by carefully selecting the vocabulary you study. Luckily, we can estimate how many words we need to speak and understand Dutch, at least in a basic form.
Estimates based on Zipf’s law suggest the following:
- The top 50 words of a language cover approximately 25% of the words you encounter in everyday speech and text.
- The top 100 words cover approximately 50%.
- The top 250 words cover approximately 65%.
- The top 500 words cover approximately 75%.
- The top 750 words cover approximately 80%.
The first few words account for a much larger share of everyday language than words that rank lower in frequency. From 50 to 100 words, the coverage jumps from 25% to 50%. However, adding another 250 words to the top 500 only adds 5% coverage. After that, each new word covers a smaller and smaller percentage of spoken or written Dutch.
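The diminishing returns described above can be sketched in a few lines of Python. This is a simplified model, not the calculation behind the percentages in this article: it assumes an idealised Zipf distribution (frequency proportional to 1/rank) and a hypothetical total vocabulary of 50,000 words. The exact percentages it prints will differ from the estimates listed above, but the shape of the curve is the same: each extra word adds less coverage than the one before.

```python
def zipf_coverage(top_n: int, vocabulary_size: int = 50_000) -> float:
    """Share of running text covered by the `top_n` most frequent words.

    Assumes an idealised Zipf distribution (frequency proportional to
    1 / rank) and a hypothetical vocabulary of `vocabulary_size` words.
    """
    if not 1 <= top_n <= vocabulary_size:
        raise ValueError("top_n must be between 1 and vocabulary_size")
    covered = sum(1.0 / rank for rank in range(1, top_n + 1))
    total = sum(1.0 / rank for rank in range(1, vocabulary_size + 1))
    return covered / total


# Print the coverage at each milestone and the gain since the previous one.
previous = 0.0
for top_n in (50, 100, 250, 500, 750):
    coverage = zipf_coverage(top_n)
    print(f"top {top_n:>3}: {coverage:.1%} (+{coverage - previous:.1%})")
    previous = coverage
```

Changing `vocabulary_size` shifts the percentages up or down, which is one reason published coverage figures vary from source to source.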
How to use Zipf’s law to speed up learning
By focusing on just 750 words, you can recognise about 80% of the words you encounter in common spoken and written Dutch. Don’t think you can manage 750 words? By learning just 250, you’ll already recognise about 65% of the words in spoken and written Dutch.
The first 250 words are mostly words that are absolutely core to the language: verbs like ‘hebben’ (to have), ‘zijn’ (to be), ‘willen’ (to want), ‘eten’ (to eat) and ‘drinken’ (to drink), nouns focussing mostly on family and work, and adjectives that express clear contrasts, such as ‘groot’ (big) and ‘klein’ (small) or ‘nieuw’ (new) and ‘oud’ (old). These words will be enough to understand very basic conversations and communicate simple ideas.
The next 250 words, bringing us up to 500 words, add only about 10% more coverage. However, they broaden your understanding with verbs such as ‘geven’ (to give), ‘helpen’ (to help), ‘proberen’ (to try), and ‘wachten’ (to wait). Nouns focus more on spaces, the body and other people. You’ll also see a bunch of words that are used to express emotion. These 250 words help you express your feelings and relate to other people and things.
The next 250 words, bringing us to a total of 750 words, add only about 5% more coverage of spoken and written Dutch. Still, these words add important new skills, such as words related to counting and family. Verbs such as ‘zoeken’ (to search), ‘vallen’ (to fall), and ‘slapen’ (to sleep) further expand your ability to have simple conversations.
Our Top Dutch Words To Learn
We’ve done a corpus analysis of over 20,000 hours of spoken Dutch in TV programmes. From all that data we’ve collected the 750 most frequently occurring words and put them together in a convenient list to help you study. You can get started with our Core Vocabulary lists for free on Quizlet.
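Curious what a frequency count like this looks like in practice? Below is a minimal Python sketch of the general idea: split text into words, count them, and keep the most frequent ones. It is an illustration only, not the pipeline used for our lists. The function name and the sample sentence are made up, and a real analysis would also need to handle things like contractions, names and different forms of the same word.

```python
import re
from collections import Counter


def top_words(text: str, n: int) -> list[tuple[str, int]]:
    """Return the `n` most frequent words in `text` with their counts.

    Tokenisation is deliberately simple: lowercase the text and keep
    runs of letters, so "z'n" would be split into "z" and "n".
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words).most_common(n)


# Hypothetical mini "transcript"; a real corpus would be far larger.
sample = "ik heb een hond en ik heb een kat en ik ben blij"
for word, count in top_words(sample, 3):
    print(word, count)
```

Run on a large enough body of transcripts, the counts from a script like this tend to follow the Zipf pattern described earlier.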
Want to actually put those new words into action? During our private lessons you’ll have more than enough time to practise. Personal one-on-one guidance from one of our tutors ensures you’ll make amazing progress in very little time. Read more about our online and offline Dutch lessons here.
Scientific references in this article
We believe science-backed education is the future. That’s why we always include references to the scientific research on which we base our methodology. The sources cited in this article are:
Fagan, S., & Gençay, R. (2010), “An introduction to textual econometrics”, in Ullah, Aman; Giles, David E. A. (eds.), Handbook of Empirical Economics and Finance, CRC Press, pp. 133–153.
Fairthorne, R.A. (1969). “Empirical Hyperbolic Distributions (Bradford‐Zipf‐Mandelbrot) for Bibliometric Description and Prediction”. Journal of Documentation. 25 (4): 319–343.
Manaris, B., Pellicoro, L., Pothering, G., & Hodges, H. (2006). “Investigating Esperanto’s statistical proportions relative to other languages using neural networks and Zipf’s law”. Artificial Intelligence and Applications. Innsbruck, Austria. pp. 102–108.