noun [uncountable]

a field of investigation which links cultural trends to a quantitative analysis of word use over a particular period of time

'Two hundred years of history in the form of 5,195,769 digitised books can now be probed for cultural trends using Google's new culturomics tool.'

The Guardian 17th December 2010

In January 2011 the American Dialect Society voted culturomics the neologism of 2010 'least likely to succeed'. Nevertheless, the term has sparked a flurry of discussion in recent months, engaging the attention of high-profile linguists across the world. With its strong claims about the link between language use and culture, the concept of culturomics is worth a second look.

In late 2010 it was announced that a team of researchers at Harvard University had put together a corpus of 500 billion words from a Google scanning project of more than 5 million texts. This represented an unprecedentedly large body of language data (including German, French, Spanish, Russian, Chinese and Hebrew as well as English), which was thought to constitute in the region of 4% of books ever printed. Alongside it, they had developed a tool, called the n-gram viewer, which plots as a graph how frequently a word or sequence of words (an n-gram) occurs within the corpus over the 200 years between 1800 and 2000.
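To make the idea of an n-gram frequency concrete, here is a minimal Python sketch of the kind of count such a graph could be built on. It is not the n-gram viewer's own code: the function names, the whitespace tokenisation and the tiny corpus are all invented for illustration, and the real tool works on a vastly larger, pre-processed dataset.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(texts_by_year, phrase):
    """Map each year to the share of that year's n-grams that match phrase.

    Assumption: texts are split on whitespace and lower-cased, which is
    far cruder than anything a real corpus tool would do.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, texts in sorted(texts_by_year.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text.lower().split(), n))
        total = sum(counts.values())
        result[year] = counts[target] / total if total else 0.0
    return result


# Invented toy data, for illustration only.
toy_corpus = {
    1900: ["she dreamt of the sea", "he dreamt of home"],
    2000: ["she dreamed of the sea", "he dreamt of home"],
}

dreamt_by_year = relative_frequency(toy_corpus, "dreamt")
for year, share in dreamt_by_year.items():
    print(year, round(share, 3))
```

Dividing by the total number of n-grams for each year, rather than reporting raw counts, is what would allow years with very different amounts of text to be compared on one graph.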

This experimental tool is very interesting because it shows how language evolved within a 200-year timeframe, indicating what lexical, or indeed grammatical, changes occurred. A graph comparing the use of the past participles dreamed and dreamt is one example of a grammatical swing.

But what's particularly significant is that the graphs can also give us a very engaging insight into the way language use depicts historical and cultural hotspots. The graph for the word ration, for example, has a dramatic peak in the early 1940s (World War II and aftermath), and a similar effect can be seen for nuclear (1980s) and Beatles (1960s). Other words show a more gradual ascent (e.g. mobile), or decline (e.g. petticoat).

Though by no means a new idea, this kind of language analysis, showing the intersection of language use and concepts of cultural significance, has been dubbed culturomics by the authors of this research. They argue that the graphs are windows into a broad spectrum of evolving cultures – for example fame, ethical issues, politics, religion or the adoption of technology. The words themselves, they claim, somehow represent a chunk of our cultural make-up, which they refer to as a culturome.

However, though these graphs can indeed be fascinating, the results, as with all such electronic language tools, are not perfect, and the cracks really begin to show when you ask the tool to analyse newer words. Searches for the words podcast and webinar, for example, suggest that they were thriving way back in the late 18th and early 19th centuries! Such anomalies can be put down to a variety of problems, including incorrect dating of sources, typographical errors, the limitations of OCR (scanning) technology (which can skew data by favouring certain text types), and the basic fact that computational tools just can't interpret the data as humans can (e.g. a 'spike' for podcast might have occurred simply because the words pod and cast happened to appear next to each other in a sufficient number of late 18th century texts).
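The pod and cast explanation is easy to picture with a small Python sketch. This is purely hypothetical: nothing here is taken from the n-gram viewer, and the function names and the sample sentence are invented. It simply contrasts a deliberately naive matcher, which throws away the spaces between words, with one that respects word boundaries.

```python
import re


def naive_hits(text, word):
    """Count matches after deleting all whitespace and hyphens.

    Deliberately naive: it mimics a pipeline that has lost track of
    where one word ends and the next begins.
    """
    squashed = re.sub(r"[\s\-]+", "", text.lower())
    return squashed.count(word)


def token_hits(text, word):
    """Count matches only where word stands alone as a whole word."""
    return len(re.findall(rf"\b{re.escape(word)}\b", text.lower()))


# Invented sentence of the sort an old text might contain.
sample = "The pea pod cast a long shadow."

print(naive_hits(sample, "podcast"))   # the naive matcher reports a hit
print(token_hits(sample, "podcast"))   # the boundary-aware matcher does not
```

A human reader sees at once that nobody here is talking about podcasts; a tool that has lost the word boundaries cannot.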

Nevertheless, it's a clever exploitation of a large body of language data, and though its main assertion is something we always knew anyway – that cultural and historical events are reflected in language use – there's something extremely engaging and slightly addictive about seeing facts brought to life in graph form.

Background – culturomics

There has been almost as much discussion about the word formation strategy underlying the new term culturomics as there has been about the concept itself. Though it looks as if it's modelled on terms like economics, ergonomics, etc., the clue to its evolution is that it is in fact pronounced with a long 'o' vowel, as in 'home'. Culturomics is modelled on the term genomics – the study of gene sequences within living organisms. Such gene sequences are referred to as genomes, and hence, by extension, culturomics makes reference to culturomes.

Having started life in the term genomics, the suffix -omics appears to be becoming increasingly productive, spawning a raft of related terms in the bio-sciences (Wikipedia lists some examples). Its use in the term culturomics is its first significant divergence into a domain outside of biology.

by Kerry Maxwell, author of Brave New Words

This article was first published on 25th March 2011.

