This article is from the archives of the UB Reporter.
Electronic Highways

Tracking cultural trends with Google

Published: February 3, 2011

Google Trends is a rather addictive site: it provides insight into what’s popular on the Internet or, in Google’s words, “a snapshot of what’s on the public’s collective mind.” To identify what’s “trending” at any given moment, the site monitors data on Internet activity, such as how frequently words and phrases are searched for and how often they have appeared in news stories, blogs, Twitter and so on. At the time of this writing, trending topics include tablet PCs, Oscar nominations and Taco Bell meat (eww!). You also can type in words or phrases to compare their trends as far back as 2004; for example, here’s the chart for SUV, hybrid, MPG, peak oil, Armageddon.

But what if you’re curious about what was on the public’s collective mind before the Internet, even as far back as the 1850s? Well, then Google’s latest entry in its encyclopedic offerings, the Books Ngram Viewer, is the site for you. The Ngram Viewer creates trend lines based on word frequencies in 5 million books published since 1800. The books come from Google Books, which so far contains online versions of 15 million of the estimated 130 million books that have been published since the Gutenberg Bible debuted in the 1450s.
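To make the idea of a frequency-based trend line concrete, here is a minimal Python sketch. It is not Google’s implementation: the function names, the whitespace tokenizer and the three-line toy corpus are all assumptions made up for illustration. It simply counts how often a phrase appears among all phrases of the same length in each year’s texts.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every run of n consecutive tokens as one space-joined string."""
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n])


def yearly_frequency(corpus, phrase):
    """Return {year: share of that year's n-grams that equal `phrase`}.

    `corpus` is a hypothetical list of (year, text) pairs standing in for
    digitized books. Matching is case-insensitive and splits on whitespace,
    a simplifying assumption rather than the Ngram Viewer's actual rules.
    """
    target = phrase.lower()
    n = len(target.split())
    hits = Counter()    # occurrences of the phrase, per year
    totals = Counter()  # all n-grams of the same length, per year
    for year, text in corpus:
        tokens = text.lower().split()
        for gram in ngrams(tokens, n):
            totals[year] += 1
            if gram == target:
                hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Toy data, invented for illustration only.
toy_corpus = [
    (1850, "the buffalo roamed the plains"),
    (1850, "the plains were wide"),
    (2000, "chicken wings and hot wings"),
]

print(yearly_frequency(toy_corpus, "the plains"))
```

Dividing by each year’s total, rather than plotting raw counts, keeps years with many books from dominating the chart, which is why a trend line of this kind shows relative frequency.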

The Ngram Viewer grew out of work that a group of researchers published in a Science article called “Quantitative Analysis of Culture Using Millions of Digitized Books.” The researchers envision their work as a new way to “rigorously study the evolution of culture on a grand scale” and have dubbed this new field “culturomics” (the word echoes the similarly data-driven “genomics,” not “economics”). An interesting finding from their work is that the English language is growing remarkably fast, much faster than is reflected in dictionaries: about half of the words in the Ngram Viewer (even with proper nouns excluded) are not in any dictionary. Perhaps this finding will give comfort to politicians like Sarah (“refudiate”) Palin and George (“misunderestimate”) Bush, who are prone to expressing themselves using such “lexical dark matter.”

While culturomics offers a new and potentially revelatory way to quantitatively analyze prodigious amounts of cultural data (and the data, 2 billion words and phrases, can be downloaded at the Culturomics website), the merits of this new field are hotly debated by social scientists and humanists, who point out the dangers of analyzing data out of context. Then again, one of the compelling things about the Ngram Viewer is that it’s fun to forget all about context and try a bunch of queries to see what the trend lines look like. For example, try University of Buffalo, University at Buffalo, SUNY Buffalo, SUNY at Buffalo; or buffalo wings, hot wings, chicken wings; or fired, laid off, sacked, canned. For more, check out the compilations of ngrams at N-teresting N-grams and The Atlantic.

It’s also fun to plug in people’s names to see how their fame trends. While Theodore Roosevelt is the most famous president according to the data, it’s more interesting to look at lesser-known presidents like Grover Cleveland, Millard Fillmore and Chester Arthur. (As an aside, isn’t it time Grover, Millard and Chester became trendy baby names? Well, there’s a site for that data, too.) According to the culturomists, people rise to fame more quickly and at a younger age than in the past, but their fame fades more precipitously. Perhaps a disheartening, though not unexpected, finding for people in academia is that “science is a poor route to fame” and that mathematicians find fame particularly hard to come by.

Of course, to find more accurate trends in scholarship, Web of Science is a good place to go (the UB Libraries subscription provides access to data back to 1965). Try searching on a topic and then click “Create Citation Report” or “Analyze Results” to view all sorts of trends in the data. Another emerging source for data on scholarly literature is JSTOR’s Data for Research site, which allows data mining of JSTOR’s deep archive of scholarly articles.

—Charles Lyons, University Libraries