Corpora Web Guide

Introduction

When Samuel Johnson penned his Dictionary of the English Language, the
only arbiter of which words should or shouldn't be included was
Johnson himself. Nowadays the choice of words for inclusion in a
dictionary is far more objective and scientific. The use of corpora is
not limited to recording how frequently a word is used; it can also
reveal patterns of usage, making subtle differences in use and meaning
clearer.

With increased access to the Internet, this tool has become even more
powerful. It is now possible for all of us to become amateur
lexicographers, and the Internet provides us with an extremely useful
research and teaching tool.

What we have tried to do below is provide you with a guide to corpora
on the Internet. We have divided the sites into categories including
Corpora sites, Articles on corpora, and Materials for teaching. Of
course, there is some overlap: for example, the Cobuild site is both a
corpora site and a teaching aid, and some of the articles include
ideas that can be used in the classroom.

If you feel we have left out any sites that should be included, we
would be grateful if you could let us know. Send your ideas to .

Terminology

Don't
understand some of the terminology being used? Don't worry, help is at
hand!

http://donelaitis.vdu.lt/publikacijos/SDoCL.htm

One of the aspects of Corpus Linguistics that often makes it less
appealing than it could be is the large amount of terminology that is
used. This basic online dictionary is devoted to simple explanations
of the most common terminology employed in the field.

Corpora Sites

The
first section of this Web Guide takes a look at a variety of corpora
and concordance sites. These sites are designed to give you access,
although sometimes limited, to a variety of corpora and concordance
facilities. Each site mentioned has a brief description. For a more
detailed outline of what each site contains and what can be accessed,
click on the link and continue to read.

http://devoted.to/corpora
Added September 2003

The British National Corpus online. Follow the on-screen instructions
to use the facility. For the full service you can subscribe online at
a cost of around £50. If you wish to try out the BNC World Edition and
see what you can do with it, click on the "simple search" interface
icon. You will then be given a search box in which you can type your
query before clicking on the "Solve it!" button. A display of up to 50
hits will be shown.

http://titania.cobuild.collins.co.uk

Online
access to the Bank of English corpus. You can sample the concordance
programme free of charge or subscribe to the full version for a fee of
£500 per year.

For the sampler, click on "simple concordance demo" and then type the
word you want to check in the query box. Choose the sub-corpus you
would like to search (British books, American books, British
transcribed speech) and finally click on the "Show Concs" button to
get forty lines of concordance samples.

Another feature well worth visiting is Wordwatch, which takes a look
each week at a word or phrase using the Cobuild concordance program.

http://titania.cobuild.collins.co.uk/wordwatch.html

The site also contains an archive feature giving you access to over
300 articles.

http://www.ucl.ac.uk/english-usage/ice-gb/sampler/download.htm

This
site gives you a downloadable sampler from the International Corpus of
English. The sample corpus contains 10 texts (of over 20,000 words)
from the ICE-GB Corpus. This is split into five spoken texts
(dialogues, monologues, scripted and unscripted) and five written
texts (one each of student essay, academic writing, correspondence,
news report and instructional writing). To have access to the full
ICE-GB Corpus you need to purchase a CD-ROM, which contains a further
490 texts. The Corpus site also includes a number of facilities for
exploring the corpus in detail. These include a corpus map, text
browser, fuzzy tree fragments (FTFs), FTF matches and syntactic trees.

For more information on ICE and other research projects, go to
http://www.ucl.ac.uk/english-usage/

Mike
Scott's Web page includes various Wordsmith tools. There is a
downloadable demo of the six tools available, and a complete version
can be acquired for a fee. The six tools include:

Wordlist: generates wordlists in either alphabetical or frequency
order, enabling text comparison at a lexical level.

Concord: a fairly comprehensive concordance program.

Keywords: enables you to compare the frequency of words in a text with
their general frequency, and therefore helps define texts by genre
through lexis.

The other three tools are file management tools enabling you to
organise texts to make analysis easier.

http://www.longman-elt.com/dictionaries/corpus/lccont.html

Lots
of information on the Longman/Lancaster corpus, including some good
examples of the type of information used to create a corpus, for
example http://www.longman-elt.com/dictionaries/corpus/sound.html.
There are also links to and information on the Longman Learners'
Corpus, the Longman Written and Spoken Corpora, and samples from the
BNC. At this time there does not appear to be public access to the
corpora.

http://web.bham.ac.uk/johnstf/ddl_lib.htm

This
is the start of an online Virtual DDL Library being put together by
Tim Johns at Birmingham University. It contains samples of
concordance-based materials for teaching. The material contains some
interesting examples of concordance patterns drawn from corpora and
also offers a basis for practical classroom ideas.

More examples, produced by participants at a workshop run in North
Bohemia, can be seen at http://web.bham.ac.uk/johnstf/unl_ddl.htm

Also, 76 samples focusing predominantly on academic writing can be
found at http://web.bham.ac.uk/johnstf/timeap3.htm#revision

Also available are two pieces of freeware (Cloze and Context) that can
be downloaded and used to generate activities and exercises based on
corpus material. These programs can be found at
http://web.bham.ac.uk/johnstf/timcall.htm#cloze

This
site includes an Internet-based concordance program. Simply type the
word or phrase you are looking for into the search box and press "Go";
a few seconds later you will be shown examples of the word or phrase
from a large corpus of over 2 million texts.

There are also parallel corpora available, as well as Chinese, French
and Japanese corpora.

The site also contains a selection of other authoring tools; one of
these, the Cloze Maker, would be a useful tool to use in combination
with the concordance feature.

http://www.kamakuranet.ne.jp/~someya/

This
site has an online concordance program that is almost entirely devoted
to business letters. Simply click on "Online BLC KWIC Concordancer
(E)" to gain access to the program. The corpus is one million words
taken from a Business Letter Corpus. You can adjust the search type,
line width and sort type, and you can also access a few other corpora
(including some letters from famous people and a few works of
literature).

http://www.dundee.ac.uk/english/wics/wics.htm

Limited to the work of six of the most famous British poets, this
site, from Dundee University, gives you the opportunity to use
concordancing for the analysis of literary language.

http://nora.hd.uib.no/icame.html

This
address links you to ICAME (the International Computer Archive of
Modern and Medieval English), an international organisation of
linguists and information scientists working with machine-readable
texts. There are details here of many aspects of ICAME, including how
to access the corpora (and how to become a registered user).

http://www.nsknet.or.jp/~peterr-s/

Click on the "Concordancing" icon to gain access to a wealth of
information, from a section on terminology to suggestions on how to
make the best use of concordance programs. There is also a link to a
Java-based concordancer. Click on the relevant icon and then select
from a wealth of print texts from the Guardian and a few classic
novels.

Other Languages

Although
this site is predominantly concerned with English corpora, we felt
that some people may well wish to access corpora and concordance tools
for other languages. Here are just a few that we have found.

Maybe you want to use a concordance program with another European
language (not just English). The TRACTOR archive, collected and
collated by the Centre of Corpus Linguistics at Birmingham University,
provides monolingual and multilingual language resources online in the
following languages: Bulgarian, Croatian, Czech, Dutch, English,
Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian,
Lithuanian, Polish, Romanian, Russian, Serbian, Slovak, Slovene,
Swedish, Turkish, Ukrainian and Uzbek.

To use the TRACTOR archives you need to become a member of (TUC). This
can be done either by contributing to the database (in which case
membership is free) or on payment of a small fee. Details of all this
can be found on the site.

The Linguistic Data Consortium has a number of corpora for sale,
including Arabic and Chinese.

http://www.icp.grenet.fr/ELRA/

The
European Language Resources Association offers corpora resources for a
number of European languages. This association is mainly aimed at
institutions and, like many European organisation sites, is not easy
to navigate.

http://www.ruf.rice.edu/~barlow/corpus.html

This site, maintained by Mike Barlow, contains lots of links to
various corpora sites for a variety of languages.

Programs

Do
you want to create your own corpora, or perhaps make your own
concordance program? Well, here is a selection of online programs to
help get you started.

http://www.rjcw.freeserve.co.uk/

If you would like to design your own concordancer, then here is a site
which allows you to buy the necessary software (at a cost of $89).
Simply follow the online instructions.

http://www.marlodge.supanet.com/software.html

This site includes a number of freeware programs for exploring text
patterns that in many cases complement concordance ideas.

http://ourworld.compuserve.com/homepages/Christopher_Tribble/

This personal homepage includes links to a few articles on using
corpora in language learning, as well as a few programs that can be
downloaded. For the freeware, click on the "Using Corpora in English
Language Education" icon and then scroll down to the section entitled
"Word Macros for English Language Teachers".

Corpora Resources

What
do we mean by corpora resources? Well, one of the features of the
Programs section was the opportunity for you to design and create your
own concordance programs. In order to do this effectively you will
need to have access to corpora texts.

Project Gutenberg aims to provide online access to as many texts as
possible that no longer have copyright restrictions. Use these texts
as the basis for any concordancing program you create (note the Dundee
University concordance site mentioned in the Corpora Sites section of
this Web Guide).

http://harvest.rutgers.edu/ceth/etext_directory/

A directory giving you access to hundreds of sites containing
electronic texts that could be used as the basis for your own corpus.

http://info.ox.ac.uk/ctitext/service/workshop/etext/

This webpage gives some practical advice on how to find electronic
texts.

Articles

There
is a lot to learn about corpora, from the technical side of things to
the practical applications. Here is a series of links to articles that
should give you lots of food for thought as well as answering many of
your questions. If you thought corpora couldn't be interesting, just
click and read.

http://www.hltmag.co.uk/prev.asp

Lots of fascinating articles on corpora, most of them written by
Michael Rundell. To access them, click on "View by Categories", then
on "Corpora ideas" and then "Show". This will bring up a list of all
the articles from previous editions of this online ELT magazine.

http://helmer.hit.uib.no/icame.html

The International Computer Archive of Modern and Medieval English
Journal contains lots of articles on every aspect of corpora,
available online in Acrobat Reader format. If you have registered
using the CD-ROM you will also have access to the ICAME corpora
online.

http://www.hf.uib.no/i/Engelsk/colt/COLTinfo.html

At the bottom of this page you will find seven short articles based on
the Corpus of London Teenage Language (COLT). Read the articles and
compare the information here with your expectations of language use or
with examples taken from other corpora. COLT is available on CD-ROM.

http://www.bangkokpost.net/education/site2002/cvap0202.htm

Start
off by reading a general article on dictionaries and technology. The
article begins by talking about the Cobuild corpus and has a brief
interview with Ramesh Krishnamurthy, who still works on the Cobuild
project. This is followed by an interview with Gwyneth Fox in which
she gives a concrete example of how a corpus can be used to check
lexical usage. Finally, the article discusses the need for more
sophisticated software as corpora grow ever larger. There are lots of
other interesting articles and ideas for teaching on this site, some
linked to vocabulary while others cover other aspects of language
teaching.

http://www.longman-elt.com/dictionaries/corpus/lrcorpus1.html

An interesting article by Michael Rundell which looks at the
implications of corpora for ELT. Rundell mentions some limitations and
pitfalls of corpora as well as looking at the benefits.

http://www.ruf.rice.edu/~barlow/stevens.html

An article that looks at three key questions connected with using
concordancing with language learners: why, when and what? The article
first appeared in CAELL Journal, vol. 6, no. 2, Summer 1995, pp. 2-10.

http://www.ling.lancs.ac.uk/monkey/ihe/linguistics/contents.htm

This
site is designed to supplement Corpus Linguistics by McEnery, T. &
Wilson, A., Edinburgh University Press (2001). They take a look at
four main areas connected with the topic: Early Corpus Linguistics and
the Chomskyan revolution; What is a Corpus and what is in it?;
Quantitative data; and The use of Corpora in Language Studies.

Materials for teaching

"I've looked at everything so far and I'm still not convinced this is
for me. I'm a classroom teacher and I need to be able to apply this
stuff to my daily teaching." Well, here are a few links that bring the
use of corpora ever nearer to the chalk face (or computer screen).

http://www.onestopenglish.com/News/Magazine/Vocab/vocab2.htm
http://www.onestopenglish.com/News/Magazine/Vocab/collocationmain.htm

The
Onestopenglish Magazine contains a section on vocabulary teaching.
Here is a set of lessons focusing on metaphor in English and on
phrasal verbs and collocation. The lessons here are prepared using the
Macmillan English Dictionary.

http://www.plumbdesign.com/thesaurus/thinkmap.html

This "Virtual thesaurus" gives a new perspective on the relationship
between words. Once the homepage has loaded, click on the "loaded,
click to launch" icon to enter the display. Words will be displayed in
a mind map format with fine lines showing the relationships between
them. Click on a word to show words related to that particular one,
while those unrelated disappear from the display. Users can search for
any word or phrase by using a simple text-entry box. You can also
search for words based on part of speech, for example similar nouns or
verbs.

http://titania.cobuild.collins.co.uk

Click on "The Definitions Game" icon to play a wonderful word game
based on sentences from the corpus. You are given a sentence with a
word missing (####). Your task is to guess that missing word. Once
you've tried, click for the answer and then move on to another
sentence and another word.

http://www.worldwidewords.org/

A really interesting site all about words.

http://rdues.uce.ac.uk/newwords.shtml
Link updated December 2004

Are you interested in new words? Take a look at this site dedicated to
neologisms taken from the Independent newspaper in the UK.

Other

Some odds and ends that don't fit under any of our previous
categories, but that we think may still be of some interest.

http://www.marlodge.supanet.com/wscape/index.html

The results of concordancing are designed to show language patterns
but can also show other patterns. Wordscapes is an intriguing
demonstration of how artistic corpora and concordancing can be.

http://nora.hd.uib.no/fileserv.html

One of the many mailing lists concerned with corpora. Follow the
instructions to join.

http://www.ltg.ed.ac.uk/helpdesk/faq/index.html

This includes some useful links to FAQs (frequently asked questions)
on corpora and corpus analysis tools and software. Check this site out
as it may already contain answers to your questions.

http://www.telenex.hku.hk/telec/mainmenu.htm

An interesting site based in Hong Kong. Some parts have open access,
some need registration (which can be done for free) and other parts
are only open to teachers working locally.
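
If you are curious about what a concordance program does behind the
search box, the keyword-in-context (KWIC) display used by the sites in
this guide can be sketched in a few lines of Python. This is only a
minimal illustration under our own assumptions, not the method used by
any of the programs listed: the function name kwic, the width
parameter and the sample sentence are all invented for the example.

```python
import re


def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of keyword.

    `kwic` and `width` are hypothetical names chosen for this sketch.
    """
    lines = []
    # Collapse all whitespace so that each hit fits on one display line.
    flat = " ".join(text.split())
    # Match the keyword as a whole word, ignoring upper/lower case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


# Invented sample text, used only to demonstrate the display.
sample = (
    "The cat sat on the mat. A dog chased the cat across the garden, "
    "but the cat escaped over the wall."
)

for line in kwic(sample, "cat", width=20):
    print(line)
```

To try it on a real text, you could read a plain-text file (for
example one of the copyright-free texts mentioned under Corpora
Resources) into a string and pass it to kwic in place of sample.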