
Do men write more about coffee?

Based on HathiTrust + NovelTM, BookCorpus, Blog Authorship Corpus, and TED Talks data.

When you read a lot of one genre, the tropes get heavy-handed fast. For me that genre is detective fiction, the Adrian McKinty series and Volker Kutscher’s Babylon Berlin novels in particular. The setup is always the same: a tough, male anti-hero arrives in a new city (fine), drinks whiskey (fine), scores girls (still fine), and reaches for coffee constantly (not fine). The constant coffee drinking felt like an inherently masculine thing, so I checked whether male authors actually write the word coffee more than female ones.

Turns out my hunch was wrong:

Coffee and tea are female beverages, beer and whiskey male.

The data sources

Before focusing on literature, we use four datasets of varying size and origin to check whether the pattern holds across contexts. For coffee, the answer is clear: in three of the four corpora, female authors use the word roughly 30% more often than male authors. TED talks are the outlier at 1.14×, which is plausible because they are spoken word and by far the smallest dataset.

Gender imbalance in beverage references across corpora
Four anglophone corpora, female-to-male reference ratio
[Chart: female-to-male ratio of coffee references per corpus. Below 1×, men reference coffee more often; above 1×, women do.]
Fiction books (125k books, 14.1B words, 1700-2010): 1.33×
Self-published novels (10k books, 614M words, c. 2010s): 1.31×
Internet blogs (19k blogs, 136M words, 2004): 1.30×
TED talks (2k talks, 4.2M words, 2006-2017): 1.14×
Source: HathiTrust + NovelTM, Blog Authorship Corpus, TED Talks, BookCorpus
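A ratio like the ones in this chart can be computed by normalizing each group's reference count by that group's total word count and then dividing the female rate by the male rate. The sketch below is a minimal illustration of that idea, not the exact pipeline behind the chart; the function names and the counts in the example are hypothetical.

```python
def reference_rate(references: int, total_words: int) -> float:
    """References per word written by one group of authors."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return references / total_words


def female_to_male_ratio(f_refs: int, f_words: int, m_refs: int, m_words: int) -> float:
    """Female reference rate divided by male reference rate.

    Values above 1 mean female authors use the word more often per word
    written; values below 1 mean male authors do.
    """
    m_rate = reference_rate(m_refs, m_words)
    if m_rate == 0:
        raise ValueError("male reference rate is zero, ratio is undefined")
    return reference_rate(f_refs, f_words) / m_rate


# Hypothetical counts, chosen only to illustrate the arithmetic.
ratio = female_to_male_ratio(f_refs=266, f_words=2_000_000,
                             m_refs=300, m_words=3_000_000)
print(f"{ratio:.2f}x")  # 1.33x
```

Normalizing by total words matters because the two groups are not the same size: a raw count comparison would mostly measure how many books each group published.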

Gendered beverages

From here we look at literature only, using the HathiTrust + NovelTM corpus. It covers about 125,000 anglophone novels from 1700 to 2010, with author gender inferred from first names using NomQuamGender.

Beverages split clearly by author gender:

Gender imbalance per beverage
125k anglophone books, 1700-2010
[Chart: female-to-male reference ratio per beverage. Below 1×, men reference the beverage more; above 1×, women do.]
tea: 1.78×
coffee: 1.33×
water: 0.85×
wine: 0.80×
beer: 0.69×
whiskey: 0.68×
Source: HathiTrust + NovelTM

This split has been stable across centuries. The use of the word coffee has been growing since 1820, and female authors stay ahead of male ones throughout. The one switch is wine: female authors overtake male ones around 1970.

Gender imbalance in beverage references over time
123k anglophone books, 1820-2010
[Chart: words per reference (axis ticks from 2k to 100k) by publication year, 1820 to 2000, with separate lines for women, men, and total.]
Source: HathiTrust + NovelTM

Anatomy of the gap

But maybe women write more romance, romance has more coffee scenes, and the gender imbalance is really a genre gap. To test this, we use the roughly 5,700 HathiTrust books that carry NovelTM genre tags.

The 12 biggest genres show the gap holds in most of them, and detective novels sit near the top for coffee references (about 33% above the corpus average).

Reference rates within genre, by author gender
12 fiction subgenres in 5,740 anglophone books, 1700-2010
[Chart: coffee references per 80,000-word book (axis 0 to 20), by author gender, for each genre. Reference lines: all men 10.4, all women 13.5.]
Genres shown: humor, mystery/detective, psychological fiction, suspense fiction, domestic fiction, love stories, bildungsroman, adventure, historical fiction, biographical fiction, science fiction, fantasy
Source: HathiTrust + NovelTM
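The genre chart reports references per 80,000-word book, while the time-series chart reports words per reference. The two are reciprocals up to the book length, so the averages of 10.4 (men) and 13.5 (women) can be converted directly. A minimal sketch follows; the function name is hypothetical.

```python
BOOK_LENGTH = 80_000  # words per "standard" book, as used in the genre chart


def words_per_reference(refs_per_book: float, book_length: int = BOOK_LENGTH) -> float:
    """Convert references per standard-length book into words per reference."""
    if refs_per_book <= 0:
        raise ValueError("refs_per_book must be positive")
    return book_length / refs_per_book


men, women = 10.4, 13.5  # averages from the genre chart

print(round(words_per_reference(men)))    # 7692
print(round(words_per_reference(women)))  # 5926
print(round(women / men, 2))              # 1.3
```

In this genre-tagged subset, then, a male-authored novel mentions coffee about once every 7,700 words and a female-authored one about once every 5,900 words, a ratio of about 1.30.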

One step past genre, we look at what other words show up on pages that mention coffee. For the 40 with the biggest male-female gap, we check whether female or male authors use each one more often.

Food, setting, and container words lean female (kitchen, breakfast, cream, sugar, cups, butter, milk). Consumption verbs lean male (drank, drinking, drink).

Words co-occurring with coffee, by author gender
125k anglophone books, 1700-2010
[Chart: for each word, the share of coffee pages containing it (axis 1% to 25%) against the female-to-male ratio (axis 0.7 to 2, with parity at F = M). Words are grouped as container, food, setting, verb, or other.]
Words shown: cup, cups, breakfast, table, kitchen, pot, sugar, sipped, drank, eggs, bacon, toast, tray, drinking, cream, mug, sipping, poured, hot, waiter, morning, sip, ate, sat, bread, drink, stove, counter, waitress, lunch, sandwiches, cigarette, milk, room, butter, tea, fried, restaurant, eat, plate
Source: HathiTrust + NovelTM
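One way to compute a co-occurrence comparison of this kind is to take only the pages that mention coffee, measure for each author gender the share of those pages that also contain a given word, and then divide the female share by the male share. The sketch below assumes each page is already tokenized into a set of words and labeled "F" or "M"; the data layout, function names, and toy pages are all hypothetical and are not taken from the original analysis.

```python
from collections import Counter


def cooccurrence_shares(pages, target="coffee"):
    """For each gender, the share of target-word pages containing each other word.

    pages: iterable of (gender, tokens) pairs, where gender is "F" or "M"
    and tokens is the set of words on that page.
    """
    target_pages = Counter()                       # pages mentioning the target, per gender
    word_pages = {"F": Counter(), "M": Counter()}  # of those, pages containing each word
    for gender, tokens in pages:
        if target not in tokens:
            continue
        target_pages[gender] += 1
        for word in tokens - {target}:
            word_pages[gender][word] += 1
    return {
        g: {w: c / target_pages[g] for w, c in word_pages[g].items()}
        for g in ("F", "M")
    }


def female_to_male_share_ratio(shares, word):
    """Female share divided by male share; above 1 means the word leans female."""
    f = shares["F"].get(word, 0.0)
    m = shares["M"].get(word, 0.0)
    return f / m if m else float("inf")


# Toy pages, invented purely for illustration.
pages = [
    ("F", {"coffee", "kitchen", "cup"}),
    ("F", {"coffee", "kitchen"}),
    ("F", {"tea", "kitchen"}),  # ignored: no coffee on this page
    ("M", {"coffee", "drank"}),
    ("M", {"coffee", "kitchen", "drank"}),
]
shares = cooccurrence_shares(pages)
print(female_to_male_share_ratio(shares, "kitchen"))  # 2.0
print(female_to_male_share_ratio(shares, "drank"))    # 0.0
```

Counting pages rather than raw word occurrences keeps a single page that repeats a word many times from dominating the share.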