In creating the Macmillan School Dictionary, the team
has benefited from two sets of corpus data:
The World English Corpus
A unique modern database of over 200 million words revealing fresh
information on how words are used and natural examples of English
as it is written and spoken now! This is the most up-to-date corpus
created from British and American written and spoken text collected
from a range of media – books, magazines, newspapers, e-mails,
television and radio; written and spoken text from learners of English;
and ELT written and spoken materials.
The Macmillan Curriculum Corpus
A 20-million-word corpus specially developed for the Macmillan
School Dictionary. This unique corpus includes texts from coursebooks
of different levels and school subjects, from countries where English
is used as a second language, and from countries where English is
the medium of instruction in schools.
Frequently asked questions about the
World English Corpus
How big is the corpus?
The corpus contains a total of around 220 million words of written and
spoken text.
What are the major components of the corpus?
The World English Corpus is made up of the following:
- a British English component;
- an American English component;
- a World English component;
- a corpus of learners’ text;
- a corpus of ELT materials including coursebooks and readers.
What types of texts are included in the different corpora?
Academic discourse, print and broadcast journalism, fiction, recorded
conversations (including telephone calls), recorded business meetings,
general non-fiction, answerphone messages, emails, legal texts, academic
seminars, cultural studies texts, radio documentaries, broadcast interviews,
ELT coursebooks, and text written by learners of English, including essays
and examination scripts.
What is the ratio of written to spoken text in the corpus?
The ratio is about 9:1 (written:spoken).
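As a rough illustration of what that ratio means in word counts, the short sketch below splits the stated total by 9:1. It assumes the total is exactly 220 million words and the ratio is exact, although both figures are given only as approximations; the function name `split_by_ratio` is invented for this example.

```python
# Rough split of the corpus into written and spoken words.
# Assumptions: the total is taken as exactly 220 million (the FAQ says
# "around"), and the 9:1 ratio is treated as exact (the FAQ says "about").
TOTAL_WORDS = 220_000_000
WRITTEN_PARTS, SPOKEN_PARTS = 9, 1


def split_by_ratio(total, written_parts, spoken_parts):
    """Return (written, spoken) word counts for a written:spoken ratio."""
    parts = written_parts + spoken_parts
    written = total * written_parts // parts
    spoken = total - written
    return written, spoken


written, spoken = split_by_ratio(TOTAL_WORDS, WRITTEN_PARTS, SPOKEN_PARTS)
print(f"written: about {written:,} words")  # about 198,000,000
print(f"spoken:  about {spoken:,} words")   # about 22,000,000
```

On these assumptions, roughly 198 million words are written text and roughly 22 million are spoken.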
How was the corpus used in creating the dictionary?
The corpus is our primary source of information about the way words
behave. It forms the basis of our description of word meanings and of
the way words combine with each other (syntactically and collocationally).
It also provides information about frequency – of words, meanings,
grammatical patterns, and collocations. And finally, it is the main source
of the example sentences shown in the dictionary.
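To give a flavour of the kind of frequency information described above, here is a minimal sketch that counts individual words and adjacent word pairs in a tiny invented sample. It is not the software used for the dictionary: the sample text, the simple tokenizer, and the function names are all assumptions made for this illustration, and a real corpus would need far more careful processing.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its words (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_frequencies(tokens):
    """Count how often each word occurs."""
    return Counter(tokens)


def pair_frequencies(tokens):
    """Count adjacent word pairs, a crude stand-in for collocation counts.

    Punctuation is discarded by the tokenizer, so pairs can span
    sentence boundaries in this simplified version.
    """
    return Counter(zip(tokens, tokens[1:]))


# Invented sample text, used only for this illustration.
sample = (
    "She made a decision quickly. He made a decision too. "
    "They made a mistake, then made a decision."
)

tokens = tokenize(sample)
words = word_frequencies(tokens)
pairs = pair_frequencies(tokens)

print(words["made"], words["decision"])                # 4 3
print(pairs[("made", "a")], pairs[("a", "decision")])  # 4 3
```

Even in this toy sample, the counts suggest that "made" and "decision" tend to occur together, which is the sort of pattern a lexicographer looks for at much larger scale.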
What types of computer program were used to get information from the
corpus?
Like most dictionary publishers, we used ‘concordancing’
software to investigate word behaviour and word patterning. In addition,
the MED team used new, state-of-the-art ‘lexical profiling’
software, which gave us the most detailed and most reliable information
about collocations that has ever been available.
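For readers unfamiliar with concordancing, the sketch below shows the basic idea: every occurrence of a search word is listed with a fixed amount of context on either side, so that patterns around the word line up and become visible. This is a simplified illustration only, not the software described above; the function name `concordance`, the `width` parameter, and the sample text are assumptions made for this example, and it does not attempt anything like lexical profiling.

```python
import re


def concordance(text, keyword, width=30):
    """Return keyword-in-context lines for each whole-word match of keyword.

    Each line shows up to `width` characters of context on either side,
    padded so that the keyword starts in the same column on every line.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Invented sample text, used only for this illustration.
sample = (
    "She made a decision quickly. He made a decision too. "
    "They made a mistake, then made a decision."
)

for line in concordance(sample, "decision"):
    print(line)
```

Reading down the aligned column makes it easy to see which words regularly appear just before or after the search word.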