World English Corpus

For a detailed article about corpora and dictionaries by Editor-in-Chief Michael Rundell, see "From corpus to dictionary".

To find out how words are used, how common they are, and which other words they frequently occur with, lexicographers analyse a corpus. MacmillanDictionary.com is based on the World English Corpus, a unique corpus of over 200 million words from spoken and written sources.

How big is the corpus?

The corpus contains a total of around 220 million words of written and spoken text.

What are the major components of the corpus?

The World English Corpus is made up of the following:

  • a British English component
  • an American English component
  • a World English component
  • a corpus of learners' texts
  • a corpus of ELT materials

What types of texts are included in the different corpora?

  • academic discourse
  • print and broadcast journalism
  • fiction
  • recorded conversations (including telephone calls)
  • recorded business meetings
  • general non-fiction
  • answerphone messages
  • emails
  • legal texts
  • academic seminars
  • cultural studies texts
  • radio documentaries
  • broadcast interviews
  • ELT course books
  • texts written by learners of English, including essays and examination scripts

What is the ratio of written to spoken text in the corpus?

The ratio is about 9:1 (written:spoken).

How was the corpus used in creating the dictionary?

The corpus is our primary source of information about the way words behave. It forms the basis of our description of word meanings and of the way words combine with each other (syntactically and collocationally). It also provides information about frequency: of words, meanings, grammatical patterns, and collocations. Finally, it is the main source of the example sentences shown in the dictionary.
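As a rough illustration of the kind of frequency and collocation information described above, here is a minimal sketch. It assumes plain English text, a naive lowercase tokeniser and a fixed window of two words either side of the target word; the function names `tokenize`, `word_frequencies` and `collocates` and the window size are hypothetical and are not the tools used to build the dictionary.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokeniser (an assumption): lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens: list[str]) -> Counter:
    # Raw frequency of every word form in the sample.
    return Counter(tokens)


def collocates(tokens: list[str], node: str, window: int = 2) -> Counter:
    # Count words occurring within `window` positions either side of `node`.
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts


sample = ("She made a decision. They made a mistake. "
          "He made a decision quickly.")
tokens = tokenize(sample)
print(word_frequencies(tokens).most_common(3))  # [('made', 3), ('a', 3), ('decision', 2)]
print(collocates(tokens, "made").most_common(3))  # [('a', 3), ('decision', 3), ('mistake', 2)]
```

Raw co-occurrence counts like these are only a starting point: dedicated tools typically add part-of-speech tagging and statistical association measures so that frequent but uninformative words such as "a" do not dominate the results.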

What types of computer program were used to get information from the corpus?

Like most dictionary publishers, we use 'concordancing' software to investigate word behaviour and word patterns. In addition, we use 'lexical profiling' software, which gives us the most detailed and most reliable information about collocations that has ever been available.
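The core idea behind concordancing software is the "key word in context" (KWIC) display: every occurrence of a word is listed with a few words of context on each side, so that recurring patterns become visible. The following is a minimal sketch, assuming whitespace-tokenised text; the names `ConcordanceLine`, `concordance` and `show` are hypothetical and do not describe the software actually used for this dictionary.

```python
from typing import NamedTuple


class ConcordanceLine(NamedTuple):
    left: str   # context before the search word
    node: str   # the search word as it appears in the text
    right: str  # context after the search word


def concordance(tokens: list[str], node: str, width: int = 3) -> list[ConcordanceLine]:
    # Collect one line per (case-insensitive) occurrence of `node`,
    # with up to `width` tokens of context on each side.
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            hits.append(ConcordanceLine(
                " ".join(tokens[max(0, i - width):i]),
                tok,
                " ".join(tokens[i + 1:i + 1 + width]),
            ))
    # Sort by right-hand context so that repeated patterns line up together.
    return sorted(hits, key=lambda h: h.right.lower())


def show(lines: list[ConcordanceLine]) -> None:
    # Right-align the left context so the search word forms a central column.
    for h in lines:
        print(f"{h.left:>25} | {h.node} | {h.right}")


text = "They made a decision . We made progress . She made a mistake ."
tokens = text.split()
show(concordance(tokens, "made"))
```

Sorting on the right-hand context is a common concordancer option; here it groups "made a decision" and "made a mistake" together, which is the kind of pattern a lexicographer looks for when scanning thousands of lines.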