Which accents does automatic speech recognition work best for?

July 11, 2016 ~ Rachael Tatman ~ 5 Comments

If your primary dialect is something other than Standardized American English (that sort of from-the-US-but-not-anywhere-in-particular type of English you hear a lot of on the news) you may have noticed that speech recognition software doesn’t generally work very well for you. You can see the sort of thing I’m talking about in this clip:

This clip is a little old, though (2010). Surely voice recognition technology has improved since then, right? I mean, we’ve got more data and more computing power than ever. Surely somebody’s gotten around to making sure that the current generation of voice-recognition software deals equally well with different dialects of English. Especially given that those self-driving cars that everyone’s so excited about are probably going to use voice-based interfaces.

To check, I spent some time on Youtube looking at the accuracy of automatic captions for videos of the accent tag challenge, which was developed by Bert Vaux. I picked Youtube automatic captions because they’re done with Google’s Automatic Speech Recognition technology–which is one of the most accurate commercial systems out there right now.

Data: I picked videos with accents from Maine (U.S.), Georgia (U.S.), California (U.S.), Scotland and New Zealand. I picked these locations because they’re pretty far from each other and also have pretty distinct regional accents. All speakers from the U.S. were (by my best guess) white and all looked to be young-ish. I’m not great at judging age, but I’m pretty confident no one was above fifty or so.

What I did: For each location, I checked the accuracy of the automatic captions on the word-list part of the challenge for five male and five female speakers. So I have data for a total of 50 people across 5 dialect regions. For each word in the word list, I marked it as “correct” if the entire word was correctly captioned on the first try. Anything else was marked wrong. To be fair, the words in the accent tag challenge were specifically chosen because they have a lot of possible variation. On the other hand, they’re single words spoken in isolation, which is pretty much the best case scenario for automatic speech recognition, so I think it balances out.
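For anyone who wants to replicate the scoring, here is a minimal sketch of the all-or-nothing marking described above, written in Python. The rows, speaker IDs and the `accuracy_by_dialect` helper are all made up for illustration; they are not the actual annotations, and the real dataset may well be laid out differently.

```python
from collections import defaultdict

# Hypothetical annotations: (dialect, speaker_id, word, captioned_correctly).
# These rows are invented for illustration and are not the post's data.
annotations = [
    ("California", "ca_01", "aunt", True),
    ("California", "ca_01", "roof", True),
    ("California", "ca_01", "route", False),
    ("Scotland", "sc_01", "aunt", False),
    ("Scotland", "sc_01", "roof", True),
    ("Scotland", "sc_01", "route", False),
]


def accuracy_by_dialect(rows):
    """Return the proportion of correctly captioned words per dialect."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for dialect, _speaker, _word, is_correct in rows:
        total[dialect] += 1
        if is_correct:
            correct[dialect] += 1
    return {dialect: correct[dialect] / total[dialect] for dialect in total}


for dialect, accuracy in sorted(accuracy_by_dialect(annotations).items()):
    print(f"{dialect}: {accuracy:.2f}")
```

A word only counts as correct if the whole thing was captioned right on the first try, so each row is a plain yes/no rather than a partial score.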

Ok, now the part you’ve all been waiting for: the results. Which dialects fared better and which worse? Does dialect even matter? First the good news: based on my (admittedly pretty small) sample, the effect of dialect is so weak that you’d have to be really generous to call it reliable. A linear model that estimated number of correct classifications based on total number of words, speaker’s gender and speaker’s dialect area fared only slightly better (p = 0.08) than one that didn’t include dialect area. Which is great! No effect means dialect doesn’t matter, right?

Weellll, not really. Based on a power analysis, I really should have sampled forty people from each dialect, not ten. Unfortunately, while I love y’all and also the search for knowledge, I’m not going to hand-annotate two hundred Youtube videos for a side project. (If you’d like to add data, though, feel free to branch the dataset on Github here. Just make sure to check the URL for the video you’re looking at so we don’t double dip.)

So while I can’t confidently state there is an effect, based on the fact that I’m sort of starting to get one with only a quarter of the amount of data I should be using, I’m actually pretty sure there is one. No one’s enjoying stellar performance (there’s a reason that they tend to be called AutoCraptions in the Deaf community) but some dialect areas are doing better than others. Look at this chart of accuracy by dialect region:

accuracyByDialect — Proportion of correctly recognized words by dialect area, color coded by country.

There’s variation, sure, but in general the recognizer seems to be working best on people from California (which just happens to be where Google is headquartered) and worst on Scottish English. The big surprise for me is how well the recognizer works on New Zealand English, especially compared to Scottish English. It’s not a function of country population (NZ = 4.4 million, Scotland = 5.2 million). My guess is that it might be due to sample bias in the training sets, especially if, say, there were some ’90s TV shows in there; there’s a lot of captioned New Zealand English in Hercules, Xena and related spin-offs. There’s also a Google outreach team in New Zealand, but not Scotland, so that might be a factor as well.

So, unfortunately, it looks like the lift skit may still be current. ASR still works better for some dialects than others. And, keep in mind, these are all native English speakers! I didn’t look at non-native English speakers, but I’m willing to bet the system is also letting them down. Which is a shame. It’s a pity that how well voice recognition works for you is still dependent on where you’re from. Maybe in another six years I’ll be able to write a blog post that says it isn’t.

What types of emoji do people want more of?

June 21, 2016 ~ Rachael Tatman ~ Leave a comment

So if you’re a weird internet nerd like me, you might already know that Unicode 9.0 was released today. The deets are here, but they’re fairly boring unless you really care about typography. What’s more interesting to me, as someone who studies visual, spoken and written language, is that there is a whole batch of new emoji. And it’s led to lots of interesting speculation about, for example, what the most popular new emoji is going to be (tldr: probably the ROFL face. People have a strong preference for using positive face emojis.) This led me to wonder: what obvious lexical gaps are there?

[I]n some cases it is useful to refer to the words that are not part of the vocabulary: the nonexisting words. Instead of referring to nonexisting words, it is common to speak about lexical gaps, since the nonexisting words are indications of “holes” in the lexicon of the language that could be filled.

Janssen, M. 2012. “Lexical Gaps”. The Encyclopedia of Applied Linguistics.

This question is pretty easy to answer about emoji– we can just find out what words people are most likely to use when they’re complaining about not being able to use emoji. There’s even a Twitter bot that collects these kinds of tweets. I decided to do something similar, but with a twist. I wanted to know what kinds of emoji people complain about wanting the most.

Boring technical details 💤

Yesterday, I grabbed 4817 recent tweets that contained both the words “no” and “emoji”. (You can find the R script I used for this on my Github.)
For each tweet, I took the two words occurring directly in front of the word “emoji” and created a corpus from them using the tm (text mining) package.
I tidied up the corpus–removing super-common words like “the”, making everything lower-case, and so on. (The technical term is “cleaning”, but I like the sound of tidying better. It sounds like you’re getting comfy with your data, not delousing it.)
I ranked these words by frequency, or how often they showed up. There were 1888 distinct words, but the vast majority (1280) showed up only once. This is completely normal for word frequency data and is modelled by Zipf’s law.
I then took all words that occurred more than three times and did a content analysis.
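The steps above can be sketched in a few lines of Python. To be clear, this is not the R script mentioned above: it swaps the tm package for the standard library, and the stop-word list, the example tweets and the helper names (`words_before_emoji`, `rank_terms`) are all assumptions made up for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "no", "is", "there", "for", "why", "we", "need"}


def words_before_emoji(tweet, n=2):
    """Return up to n lower-cased words directly in front of each 'emoji'."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    found = []
    for i, token in enumerate(tokens):
        if token == "emoji":
            found.extend(tokens[max(0, i - n):i])
    return found


def rank_terms(tweets):
    """Count candidate terms across tweets, ignoring stop words."""
    counts = Counter()
    for tweet in tweets:
        counts.update(w for w in words_before_emoji(tweet) if w not in STOP_WORDS)
    return counts


# Made-up tweets for illustration only.
tweets = [
    "Why is there no giraffe emoji",
    "still no giraffe emoji smh",
    "there is no bacon emoji and I am upset",
    "No shark emoji? Really?",
]

counts = rank_terms(tweets)
singletons = [w for w, c in counts.items() if c == 1]
frequent = [w for w, c in counts.most_common() if c > 3]  # "more than three times"
print(counts.most_common(3))
print(len(counts), "distinct words,", len(singletons), "seen only once")
```

With only four toy tweets nothing clears the more-than-three threshold, which is a small-scale version of the long tail you get with real word frequency data.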

Exciting results! 😄

At the end of my content analysis, I arrived at nine distinct categories. I’ve listed them below, with the most popular four terms from each. One thing I noticed right off is how many of these are emoji that either already exist or are in the Unicode update. To highlight this, I’ve italicized terms in the list below that don’t have an emoji.

animal: shark, giraffe, butterfly, duck
color: orange, red, white, green
face: crying, angry, love, hate
(facial) feature: mustache, redhead, beard, glasses
flag: flag, England, Welsh, pride
food: bacon, avocado, salt, carrot
gesture: peace, finger, middle, crossed
object: rifle, gun, drum, spoon
person: mermaid, pirate, clown, chef

(One note: the rifle is in Unicode 9.0, but isn’t an emoji. This has been the topic of some discussion, and is probably why it’s so frequent.)

Based on these categories, where are the lexical gaps? The three categories that have the most different items in them are, in order 1) food, 2) animals and 3) objects. These are also the three categories with the most mentions across all items.

So, given that so many people are talking about emojis for animals, food and objects, why aren’t the bulk of emojis in these categories? We can see why this might be by comparing how many different items get mentioned in each category to how many times each item is mentioned.

Rplot02 — Yeah, people talk about food a lot… but they also talk about a lot of different types of food. On the other hand you have categories like colors, which aren’t talked about as much but where the same colors come up over and over again.

As you can see from the figure above, the most popular categories have a lot of different things in them, but each thing is mentioned relatively rarely. So while there is an impassioned zebra emoji fanbase, it only comes up three times in this dataset. On the other hand, “red” is fairly common but shows up because of discussion of, among other things, flowers, shoes and hair color. Some categories, like flags, fall in a happy medium–lots of discussion and fairly few suggestions for additions.
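If you want to make the same comparison yourself, here is a rough sketch in Python of how the two quantities could be computed once each term has been hand-coded with a category. The counts below are invented for illustration (they are not the numbers behind the figure), and `summarize` is just a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical (term, category, mentions) rows; the counts are invented
# for illustration and are not taken from the post's dataset.
coded_terms = [
    ("bacon", "food", 6),
    ("avocado", "food", 5),
    ("carrot", "food", 4),
    ("salt", "food", 4),
    ("red", "color", 12),
    ("orange", "color", 9),
]


def summarize(rows):
    """Per category: number of distinct items and mean mentions per item."""
    items = defaultdict(int)
    mentions = defaultdict(int)
    for _term, category, count in rows:
        items[category] += 1
        mentions[category] += count
    return {c: (items[c], mentions[c] / items[c]) for c in items}


for category, (n_items, mean_mentions) in summarize(coded_terms).items():
    print(f"{category}: {n_items} items, {mean_mentions:.1f} mentions each")
```

A category with many items and a low mean is the "food" pattern; few items with a high mean is the "color" pattern.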

Based on this teeny data set, I’d say that if the Unicode consortium continues to be in charge of emoji standardization it’ll have its hands full for quite some time to come. There’s a lot of room for growth, and most of it is in food, animals and objects, which all have a lot of possible items, rather than gestures or facial expressions, which have far fewer.

Why do Canadians say ‘eh’?

May 31, 2016 ~ Rachael Tatman ~ 1 Comment

Perhaps it’s because Seattle is so close to Canada, but for some reason when I ask classes of undergraduate students what they want to know about language and language use, one question I tend to get a lot is:

Why do Canadians say ‘eh’?

Fortunately for my curious students, this is actually an active area of inquiry. (It’s actually one of those research questions where there was a flurry of work–in this case in the 1970’s–and then a couple quiet decades followed by a resurgence in interest. The ‘eh’ renaissance started in the mid-2000’s and continues today. For some reason, at least in linguistics, this sort of thing tends to happen a lot. I’ll leave discussing why this particular pattern is so common to the sociologists of science.) So what do we know about ‘eh’?

Is ‘eh’ actually Canadian?

‘Eh’ has quite the pedigree–it’s first attested in Middle English and even shows up in Chaucer. Canadian English, however, boasts a more frequent use of ‘eh’, which can fill the same role as ‘right?’, ‘you know?’ or ‘innit?’ for speakers of other varieties of English.

What does ‘eh’ mean?

The real thing that makes an ‘eh’ Canadian, though, is how it’s used. Despite some claims to the contrary, “eh” is far from meaningless. It has a limited number of uses (Elaine Gold identified an even dozen in her 2004 paper) some of which aren’t found outside of Canada. Walter Avis described two of these uniquely Canadian uses in his 1972 paper, “So eh? is Canadian, eh” (it’s not available anywhere online as far as I can tell):

Narrative use: Used to punctuate a story, in the same way that an American English speaker (south of the border, that is) might use “right?” or “you know?”
1. Example: I was walking home from school, eh? I was right by that construction site where there’s a big hole in the ground, eh? And I see someone toss a piece of trash right in it.
Miscellaneous/exclamation use: Tacked on to the end of a statement. (Although more recent work, presented by Martina Wiltschko and Alex D’Arcy at last year’s NWAV suggests that there’s really a limited number of ways to use this type of ‘eh’ and that they can be told apart by the way the speaker uses pitch.)
1. Example: What a litterbug, eh?

And these uses seem to be running strong. Gold found that use of ‘eh’ in a variety of contexts has either increased or remained stable since 1980.

That’s not to say there’s no change going on, though. D’Arcy and Wiltschko found that younger speakers of Canadian English are more likely than older speakers to use ‘right?’ instead of ‘eh?’. Does this mean that ‘eh’ may be going the way of the dodo or ‘sliver’ to mean ‘splinter’ in British English?

Probably not–but it may show up in fewer places than it used to. In particular, in their 2006 study Elaine Gold and Mireille Tremblay found that almost half of their participants feel negatively about the narrative use of ‘eh’ and only 16% actually used it themselves. This suggests this type of uniquely-Canadian usage may be on its way out.

A Linguistic Analysis of #PronouncingThingsIncorrectly

May 4, 2016 ~ Rachael Tatman ~ Leave a comment

One of the really cool things about the internet is that it’s a great medium to observe linguistic innovations. A lot of examples of linguistic play that would have been pretty ephemeral are now safely recorded and shared. (Can you imagine being able to listen to the first examples of Pig Latin? In addition to being cool, it might have told us even more about syllable structure than the game itself already does.)

One example that I’m pretty excited about is #PronouncingThingsIncorrectly, which is a language game invented by Chaz Smith. Smith is a Viner, Cinema Studies student at the University of Pennsylvania and advocate for sexual assault prevention. But right now, I’m mostly interested in his role as a linguistic innovator. In that role he’s invented a new type of language game, which you can see an example of here:

It’s been picked up by a lot of other viners, as well. You can see some additional examples here.

So why is this linguistically interesting? Because, like most other language games, it has rules to it. I don’t think Chaz necessarily sat down and came up with them (he could have, but I’d be surprised) but they’re there nonetheless. This is a great example of one of the big True Things linguists know about language: even in play, it tends to be structured. This particular game has three structures I noticed right away: vowel harmony, re-syllabification and new stress assignment.

Vowel Harmony

Vowel harmony is where all the vowels in a word tend to sound alike. It’s not really a big thing in English, but you may be familiar with it from the nursery rhyme “I like to eat Apples and Bananas”. Other languages, though, use it all the time: Finnish, Nez Perce, Turkish and Maasai all have vowel harmony.

It’s also part of this language game. For example, “tide” is pronounced so that it rhymes with “speedy” and “tomatoes” rhymes with “toe so toes”. Notice that both words have the same vowel sound throughout. Not all words have the same vowel all the way through, but there’s more vowel harmony in the #PronouncingThingsIncorrectly words than there is in the original versions.

Re-syllabification

Syllables are a way of chunking up words–you probably learned about them in school at some point. (If not, I’ve talked about them before.) But languages break words up in different places. And in the game, the boundaries get moved around. We’ve already seen one example: “tide”. It’s usually one chunk, but in the game it gets split into two: “tee.dee”. (Linguists like to put periods in the middle of words to show where the syllable boundaries are.)

You might have noticed that “tide” is spelled with a silent “e” on the end. My strong intuition is that spelling plays a big role in this word game. (Which is pretty cool! Usually language games like this rely mostly on sounds and not the letters used to write them.) Most words get each of the vowels in their spelling produced separately, which is where a lot of these resyllabifications come from. Two consonants in a row also tend to each get their own syllables. You can see some examples of each below:

Hawaiian -> ha.why.EE.an
Mayonnaise -> may.yon.nuh.ASS.ee
Skittles -> ski.TI.til.ees

New Stress Assignment

English stress assignment (how we pick which syllables in a word get the most emphasis) is a mess. It depends on, among other things, which language we borrowed the word from (words from Latin and words from Old English work differently), whether you can break the word down into smaller meaning bits (like how “bats” is “bat” + “s”) and what part of speech it is (the “compact” in “powder compact” and “compact car” have stress in different places). People have spent entire careers trying to describe it.

In this word game, however, Smith fixes English stress. After resyllabification, almost all words with more than one syllable have stress one syllable in from the right edge:

suc.CESS -> SUC.cess
pe.ROK.side -> pee.rok.SEED.dee
col.OGNE -> col.OG.nee
HON.ey stays the same

But if you’ve been paying attention, you’ll notice that there are some exceptions, like Skittles:

Skittles -> ski.TI.til.ees
Jalapenos -> djuh.LA.pen.os

Why are these ones different? I think it’s probably because they’re plural, and if the final syllable is plural it doesn’t really count. You can hear some more examples of this in the Vine embedded above:

bubbles -> BOO.buh.lees
drinks -> duh.RIN.uh.kus
bottles -> BOO.teh.less
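To show just how regular this is, here is a small Python sketch of the stress rule as I’ve described it: stress goes one syllable in from the right edge, and a plural final syllable doesn’t count. It assumes the word has already been resyllabified by hand and that you tell it whether the word is plural; the function name `assign_stress` and the treatment of one-syllable words are my own choices for illustration, and the plural exception is only my guess about what’s going on.

```python
def assign_stress(syllables, plural=False):
    """Capitalize the stressed syllable under the penultimate-stress rule.

    syllables: list of syllables after resyllabification.
    plural: if True, the final syllable is ignored when locating stress
            (this is the hypothesized plural exception, not a proven rule).
    """
    # Number of syllables that "count" for stress placement.
    countable = len(syllables) - 1 if plural else len(syllables)
    if countable < 2:
        stressed = 0  # assumption: very short words stress the first syllable
    else:
        stressed = countable - 2  # one syllable in from the right edge
    marked = [s.upper() if i == stressed else s.lower()
              for i, s in enumerate(syllables)]
    return ".".join(marked)


print(assign_stress(["suc", "cess"]))
print(assign_stress(["pee", "rok", "seed", "dee"]))
print(assign_stress(["ski", "ti", "til", "ees"], plural=True))
print(assign_stress(["boo", "buh", "lees"], plural=True))
```

The same two-line rule covers both the regular words and the plural "exceptions" listed above, which is a lot more than you can say for ordinary English stress.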

So what?

Ok, so why is this important or interesting? Well, for one thing it’s a great example of how humans can’t help but be systematic. This is very informal linguistic play that still manages to be pretty predictable. By investigating this sort of language game we can better characterize what it is to be a human using language.

Secondly, this particular language game shows us some of the pressures on English. While it’s my impression that the introduction of vowel harmony is done to be funny (especially since there are other humorous processes at work here–if a word can be pronounced like “booty” or “ass” it usually is) I’m really interested in the resyllabification and stress assignment–or is that ree.sill.luh.ah.bee.fee.ca.TEE.oin and STUH.rees ass.see.guh.nuh.MEN.tee? The way they’re done in this game is a real improvement over the current way of doing things, at least in terms of being systematic and easy to learn. Who knows? In a couple centuries maybe we’ll all be #PronouncingThingsIncorrectly.

Should you go to grad school for linguistics?

April 28, 2016 ~ Rachael Tatman ~ 5 Comments

So I’ve had this talk, in different forms, with lots of different people over the last couple of years. Mainly undergrads thinking about applying to PhD programs in linguistics but, occasionally, people in industry thinking about going back to school as well. Every single one of these people was smart, cool, dedicated, hard-working, a great linguist and would have been an asset to the field. And when they asked me, a current linguistics graduate student, whether it was a good idea to go to grad school in linguistics, I gave them all the same answer:

“But Rachael,” you say, “you’re going to grad school in linguistics and having all sorts of fun. Why are you trying to keep me from doing the same thing?” Two big reasons.

The Job Market for Linguistics PhDs

What do you want to do when you get out of grad school? If you’re like most people, you’ll probably say you want to teach linguistics at the college or university level. What you should know is that this is an increasingly unsustainable career path.

In 1975, 30 percent of college faculty were part-time. By 2011, 51 percent of college faculty were part-time, and another 19 percent were non–tenure track, full-time employees. In other words, 70 percent were contingent faculty, a broad classification that includes all non–tenure track faculty (NTTF), whether they work full-time or part-time.

More Than Half of College Faculty Are Adjuncts: Should You Care? by Dan Edmonds.

And most of these part-time faculty, or adjuncts, are very poorly paid. This survey from 2015 found that 62% of adjuncts made less than $20,000 a year. This is even more upsetting when you consider that you need a PhD and scholarly publications to even be considered for one of these posts.

(“But what about being paid for your research publications?” you ask. “Surely you can make a few bucks by publishing in those insanely expensive academic journals.” While I understand where you’re coming from–in almost any other professional publishing context it’s completely normal to be paid for your writing–authors of academic papers are not paid. Nor are the reviewers. Furthermore, authors are often charged fees by the publishers. One journal I was recently looking at charges $2,900 per article, which is about three times the funding my department gives us for research over our entire degree. Not a scam journal, either–an actual reputable venue for scholarly publication.)

Yes, there are still tenure-track positions available in linguistics, but they are by far the minority. What’s more, even including adjunct positions, there are still fewer academic posts than graduating linguists with PhDs. It’s been that way for a while, too, so even for a not-so-great adjunct position you’ll be facing stiff competition. Is it impossible to find a good academic post in linguistics? No. Are the odds in your (or my, or any other current grad student’s) favor? Also no. But don’t take it from me. In Surviving Linguistics: A Guide for Graduate Students (which I would highly recommend) Monica Macaulay says:

[It] is common knowledge that we are graduating more PhDs than there are faculty positions available, resulting in certain disappointment for many… graduates. The solution is to think creatively about job opportunities and keep your options open.

As Dr. Macaulay goes on to outline, there are jobs for linguists outside academia. Check out the LSA’s Linguistics Beyond Academia special interest group or the Linguists Outside Academia mailing list. There are lots of things you can do with a linguistics degree, from data science to forensic linguistics.

That said, there are degrees that will better prepare you for a career than a PhD in theoretical linguistics. A master’s degree in Speech Language Pathology (SLP) or Computational Linguistics or Teaching English to Speakers of Other Languages (TESOL) will prepare you for those careers far better than a general PhD.

Even if you’re 100% dead set on teaching post-secondary students, you should look around and see what linguists are doing outside of universities. Sure, you might win the job-lottery, but at least some of your students probably won’t, and you’ll want to make sure they can find well-paying, fulfilling work.

Grad School is Grueling

Yes, grad school can absolutely be fun. On a good day, I enjoy it tremendously. But it’s also work. (And don’t give me any nonsense about it not being real work because you do it sitting down. I’ve had jobs that required hard physical and/or emotional labor, and grad school is exhausting.) I feel like I probably have a slightly better than average work/life balance–partly thanks to my fellowship, which means I have limited teaching duties and don’t need a second job any more–and I’m still actively trying to get better about stopping work when I’m tired. I fail, and end up all tearful and exhausted, about once a week.

It’s also emotionally draining. Depression runs absolutely rampant among grad students. This 2015 report from Berkeley, for example, found that over two thirds of PhD students in the arts and sciences were depressed. The main reason? Point number one above–the stark realities of the job market. It can be absolutely gutting to see a colleague do everything right, from research to teaching, and end up not having any opportunity to do the job they’ve been preparing for. Especially since you know the same lies in wait for you.

And “doing everything right” is pretty Herculean in and of itself. You have to have very strong personal motivation to finish a PhD. Sure, your committee is there to provide oversight and you have drop-dead due dates. But those deadlines are often very far away and, depending on your committee, you may have a lot of independence. That means motivating yourself to work steadily while managing several ongoing projects in parallel (you’re publishing papers in addition to writing your dissertation, right?) and not working yourself to exhaustion in the process. Basically you’re going to need a big old double helping of executive functioning.

And oh by the way, to be competitive in the job market you’ll also need to demonstrate you can teach and perform service for your school/discipline. Add in time to sleep, eat, get at least a little exercise and take breaks (none of which are optional!) and you’ve got a very full plate indeed. Some absolutely iron-willed people even manage all of this while having/raising kids and I have nothing but respect for them.

Main take-away

Whether inside or outside of academia, it’s true that a PhD does tend to correlate with higher salary–although the boost isn’t as much as you’d get from a related professional degree. BUT in order to get that higher salary you’ll need to give up some of your most productive years. My spouse (who also has a bachelor’s in linguistics) got a master’s degree, found a good job, got promoted and has cultivated a professional social network in the time it’s taken me just to get to the point of starting my dissertation. The opportunity cost of spending five more years (at a minimum–I’ve heard of people who took more than a decade to finish) in school, probably in your twenties, is very, very high. And my spouse can leave work at work, come home on weekends and just chill. This month I’ve got four full weekends of either conferences or outreach. Even worse, no matter how hard I try to stamp it out, I’ve got a tiny little voice in my head that’s very quietly screaming “you should be working” literally all the time.

I’m being absolutely real right now: going to grad school for linguistics is a bad investment of your time and labor. I knew that going in–heck, I knew that before I even applied–and I still went in. Why? Because I decided that, for me, it was a worthwhile trade-off. I really like doing research. I really like being part of the scientific community. Grad school is hard, yes, but overall I’m enjoying myself. And even if I don’t end up being able to find a job in academia (although I’m still hopeful and still plugging away at it) I really, truly believe that the research I’m doing now is valuable and interesting and, in some small way, helping the world. What can I say? I’m a nerdy idealist.

But this is 100% a personal decision. It’s up to you as an individual to decide whether the costs are worth it to you. Maybe you’ll decide, as I have, that they are. But maybe you won’t. And to make that decision you really do need to know what those costs are. I hope I’ve helped to begin making them clear.

One final thought: Not going to grad school doesn’t mean you’re not smart. In fact, considering everything I’ve discussed above, it probably means you are.

What is linguistic discrimination?

April 18, 2016 (updated April 19, 2016) ~ Rachael Tatman ~ 1 Comment

Recently, UC Berkeley student Khairuldeen Makhzoomi was removed from his flight. The reason: he was speaking Arabic. And this isn’t the first time this has happened. Nor the second. These are all, in addition to being deeply disturbing and illegal, examples of linguistic discrimination.

What is linguistic discrimination?

Linguistic discrimination is discrimination based on someone’s language use. And it’s not restricted to the instances I discussed above:

African American English is often discriminated against. For example, Rachel Jeantel’s testimony in the Trayvon Martin case was largely dismissed by the white jury–not because of the content of the testimony, but because of her use of African American English, and there is a long history of landlords linguistically profiling and discriminating against African American and Latina/o prospective tenants.
Sometimes it’s the language itself under attack, as in this letter about American Sign Language, claiming that it’s unnecessary for deaf children to learn to sign. (Here’s a rebuttal from the Gallaudet Linguistics department, which is a major center of sign language research at the world’s only university “devoted to deaf and hard of hearing students.”)
In multilingual countries, it’s unfortunately common for speakers of marginalized languages to find themselves denied services in their language.

As I’ve talked about before, linguistic discrimination can be a way to discriminate against a specific group of people without saying so in so many words. Linguistic discrimination, in addition to being morally repugnant, is illegal in the U.S. under Titles VI and VII of the Civil Rights Act of 1964.

These are important legal protections and the number of people affected by them is huge: There are over 350 different languages spoken in the United States. In Seattle, where I live, over a fifth of people over age five speak a language other than English at home. That’s a lot of people! Further, most of these individuals are bilingual or multilingual; 90% of second-generation immigrants speak English. And since multilingualism has both neurological benefits for individuals and larger positive impacts on society, I see this as no bad thing. And I’m hardly the only one: how many people that you know are learning or want to learn another language?

Unfortunately, linguistic discrimination threatens this rich diversity, and every person who speaks anything other than the standardized variety of the dominant language.

What can you do?

Don’t participate in linguistic discrimination. It can be hard to retrain yourself to reduce the impact of negative stereotypes but, especially if you’re in a position of privilege (as I am), it’s literally the least you can do. Don’t make assumptions about people based on their language use.
Stand up for people who may be facing linguistic discrimination. If you see someone being discriminated against in the workplace (like being given lower performance evaluations for having a non-native accent) point out that this is illegal, and back up people who are being discriminated against.
Be patient with non-native speakers. Appreciate that they’ve gone through a lot of effort to learn your language. If possible, try and arrange for an interpreter (for face-to-face communication) or translator (for written communications). Sometimes non-native speakers are more comfortable with reading and writing than speaking; offer to communicate through e-mails or other written correspondence.

What’s the difference between frosting and icing?

March 29, 2016 ~ Rachael Tatman ~ 2 Comments

Fair warning: this post is full of pictures of baked goods. I can’t claim responsibility for any impulsive cake-baking that may result from reading further.

This is the second post in this series. The first half, here, focused on responses to whether “frosting” and “icing” were different things, or different words for the same thing. This post gets a little more in-depth. In the first part, I was just asking people what they thought they said. In the second part, I was asking them to pick words for specific pictures. It’s not a perfect design–by first asking people what they think they say, I primed them pretty heavily–but it does reveal some interesting patterns of usage.

The main thing I was interested in was this–did people who said frosting and icing were interchangeable for them actually use them as if they were the same? Why is this a good question to ask? Because it turns out that a lot of the time people aren’t the best judges of how they use language. Especially if there’s some sort of “rule” about how you’re “supposed” to do it. For example, there’s something of a running joke among linguists about how often people will use the passive voice while they’re telling people not to! I don’t think anyone would intentionally lie about their usage, but it’s possible that respondents aren’t always doing exactly what they think they are.

I split my dataset into people who said they thought the words “frosting” and “icing” meant the same thing and those who thought they were different. In the charts below these groups are labelled “same” and “different” respectively. For this stage of analysis, I left out people who weren’t sure; there weren’t a whole lot of them anyway.
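
For the curious, the split described above can be sketched in a few lines. This is a Python illustration, not the R code used for the actual analysis, and the rows are made up purely to show the shape of the data. The function name `tabulate` and the group labels are assumptions for the example, not the real survey columns.

```python
from collections import Counter, defaultdict

# Made-up rows for illustration only: (stated view, word chosen for one picture)
responses = [
    ("same", "frosting"), ("same", "icing"), ("different", "frosting"),
    ("different", "frosting"), ("not sure", "icing"), ("same", "frosting"),
]

def tabulate(rows, groups=("same", "different")):
    """Share of each word within each group, dropping groups not listed."""
    counts = defaultdict(Counter)
    for group, word in rows:
        if group in groups:  # leaves out the "not sure" respondents
            counts[group][word] += 1
    return {
        group: {word: n / sum(c.values()) for word, n in c.items()}
        for group, c in counts.items()
    }

shares = tabulate(responses)
```

Each chart below is essentially one of these tables: for a single picture, the share of each word within the “same” group next to the share within the “different” group.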

Cupcake

So this picture was a pretty canonical example of what people brought up a lot–it’s on a cake, and it’s been both whipped and piped. For a lot of people, then, this should be “frosting”. So what did people say?

cupcakeChart

The results here were pretty much what I expected. (Whew!) People who thought the words meant different things pretty much all thought this was “frosting”. And there was a pretty strong difference between the groups. But this still doesn’t answer some of my questions. Is it the texture that makes it “frosting” or, as the AP Stylebook suggests, the fact that it’s on a cake? After all, you can definitely put buttercream on a cookie, as evinced by Lofthouse.

Doughnuts

Next I had some doughnuts. A lot of people, when I first started asking around, brought up doughnuts as something that they thought were iced rather than frosted. So what did people say?

donughts

That does seem to hold true. There was no strong difference between the groups, but there were also a lot of write-in answers. (“Glaze” was especially popular, which, for the record, is probably what I’d say.) So there seems to be more variety in what people call doughnut toppings but there is a tendency towards “icing”.

Cake with fondant

Ok, so this image was a bit of a trick. The cake here is covered in fondant. Which, to me, isn’t really frosting or icing. But if it’s really “being on a cake” that makes something “frosting”, we should see a strong “frosting” bias from people with a distinction.

fondant

And that’s just not the case. There’s also a pretty big difference between the groups here. Interestingly, people who thought “frosting” and “icing” are different things were more likely to write in “fondant”. (Remember that level of baking knowledge had no effect on whether people said there was a difference or not, so it’s probably not just specialized knowledge.)

Bundt Cake

I included this image for a couple of reasons. Again, I’m poking at this “on a cake” idea. But I also had a lot of people tell me that, for them, the distinction between the words was texture-based. So responses here could have gone two ways: If anything on a cake is frosting, then we’d expect frosting to win. But, if frosting has to be fluffy/whipped, then we’d expect icing to win.

bundt

And icing wins! This is no surprise, given the written results summarized in my previous post and the responses for the cake pictures above, but for me it really puts the nail in the coffin of the “on cakes” argument. (Take note, AP Stylebook!) Even on this one, though, people with no distinction are much more likely to be able to use “frosting”.

Sweet Roll

So this is an interesting one. I included it because, for me, cinnamon rolls are synonymous with cream cheese frosting/icing. Since several people I talked to said specifically that cream cheese had to be frosting and not icing, I was expecting a large “frosting” response on this one.

cinnamonRoll

That was definitely not what I saw, though. (Although people with no distinction were much more likely to be able to say “frosting”, so I guess I came by it natural.) Most people, and especially people with a distinction, thought it was “icing”.

Overview

So there are two main takeaways here:

There’s a strong difference in usage between people who say that “frosting” and “icing” are different things and those who say they aren’t. (For most of the pictures, these groups responded significantly differently.)
If there is a difference, it’s got everything to do with texture and nothing to do with cake.

That’s not to say that these things will always hold true; no one knows better than linguists that language is in a constant state of flux. But for now, these generalizations seem to hold for most of the people surveyed. So if you’re going to make a usage distinction between these words, please make one that’s based on the actual usage and not some completely made-up rule!

A final note: if you’re interested in seeing the (slightly sanitized) data and the R code I used for analysis, both are available here.

Is there a difference between frosting and icing?

March 1, 2016 ~ Rachael Tatman ~ Leave a comment

So recently, the Associated Press Stylebook posted this on Twitter:

AP Style tip: Use “icing” to describe sugar decorations applied to cookies; “frosting” for cupcakes and cakes.

— AP Stylebook (@APStylebook) February 22, 2016

This struck me as 1) kind of a petty usage distinction and 2) completely at odds with my personal usage and what I knew about the dialectal research. The Dictionary of American Regional English, for example, notes that “Frosting” is “widespread, but chiefly North, North Midland, West”. “Icing”, on the other hand, is found all over, “but less freq North, Pacific”. As someone from Virginia but currently living in Seattle, I have no problem using either frosting or icing for a nice buttercream. I’m hardly the only one, either. This baking blog post even says “I use lots of different icings to frost cupcakes”.

Chai white chocolate cupcakes (2) — Frosting or icing, I’ll take a dozen.

BUT when I posted about this on Twitter, some people replied that they did have a very strong distinction between the two words. And the same thing happened when I brought it up with different groups of friends. A lot of people brought up texture, or that they’d say that some things are frosted and others are iced. This was really fascinating to me, both as a baker and a linguist, so I did what any social scientist would and set out to collect some data to get a better idea of what’s going on.

I set up a survey on Google forms and got 109 responses. First I collected info on where speakers were from, how old they were and how knowledgeable they were about baking. Then I asked them for both their general impression of use and then used pictures to ask what they’d call the sweet topping on a variety of baked goods. To avoid making this blog post absolutely huge, I’m going to split up data discussion. The first half (this one) will look at whether people make a distinction between frosting and icing and whether that’s related to any of their social characteristics. The second half (I’ll link it here when it’s done) will focus on responses to specific images.

Are “frosting” and “icing” different, or are they different words for the same thing?

The first question I asked people was whether frosting and icing were different, or just different words for the same thing. Most people (over 60%) thought that they were different things, while just over a quarter (27%) thought they were different words for the same thing, and the rest weren’t sure. So it does look like there’s some difference in how people use these words. But in and of itself, that’s not very interesting. What I want to know is this: how do people with different social characteristics use these words? (You may remember that I wrote a while ago that this is the central question in sociolinguistics.)

Region

The first thing I wanted to look at was region. I was expecting to see a pretty big difference here, and I wasn’t disappointed. Once I broke down the data by the states people were from, I found a definite pattern: people from the South were far more likely to say that frosting and icing were different words for the same thing. (Virginia isn’t really patterning with the rest of the South, here, but that may be due to a bit of sampling bias–I recruited participants through my social network, and a lot of my friends are from Northern Virginia, which tends not to pattern with the South.)

mapUseThisOne — Most people in the South thought frosting and icing were the same thing, while outside of the South more people thought they were different things. (The darker the blue, the more likely someone from that state was to say that they were different things–black states I didn’t get any respondents from.)

Why is there a distinction? Honestly, I’m not really sure. My intuition, though, is that people from the South probably have pretty wide exposure to both terms. (Since books, TV and movies tend to come from outside of the South, there are plenty of chances to come across other dialectal variants.) People from outside the South, however, historically had less exposure to one of the terms–icing–so when they started to come across it they decided that it must refer to something different. As a result, the meanings of both words changed to become more narrow. (This is actually a pretty common process in languages.) I don’t have strong evidence for this theory right now, though, so take it with a couple shakes of salt!

Age

Another thing I wanted to look at was whether the age of respondents played a role in how they used these words. If younger respondents seem to use the word differently than older respondents, it might be because there’s a change happening in the language. Given time, everyone might end up doing the same thing as the younger people.

age — While it looks like there’s a slight tendency for younger participants to say there’s a difference between frosting and icing, the effect isn’t strong enough to be reliable.

I didn’t find a strong pattern, though. Again, this might be due to sampling problems, since most of my respondents were roughly the same age (21-30). But it could also be that there’s simply not anything to find–that this is neither an ongoing change, nor one where younger people and older people do things differently.

Baking Knowledge

Ok, so it looks like people are varying by region, but not by age… but what about by level of baking knowledge? Maybe you don’t care about the difference if you almost never make or eat baked goods. It could be that people who know a lot about baking make a distinction, and it’s only people who don’t know a beater from a dough hook that are lumping things together.

bakingExp — Baking knowledge also isn’t closely tied to how people use these words. So it’s not just that people who don’t know a lot about baking say they’re the same.

But that’s not what I found. People at all levels of baking knowledge tended to have a pretty even balance between the two uses of the words.

Comments

I also collected comments from people, to get more information on what people thought in their own words. Two big themes emerged. One was that the most consistent thing people pointed to as the difference was texture. The other was that people tended to say that one of them was for the cake and the other wasn’t… but which one was which was pretty much random.

Just under half of the comments mentioned texture. I’ve compiled some of the differences below, but the general consensus seems to be that frosting is thick, fluffy and soft, while icing is thin and hard. Take note, AP Stylebook!

Frosting	Icing
creamy or buttery	syrupy, like a glaze
	plasticy looking
spread	squeezed or piped
thick and creamy	thin, hardens as it dries
thicker
thicker	clear crust, dried
fluffy	thin
	thin layer, smooth, glossy
more solid, less flowing	watery, gooey
stays soft	hardens once it sets
thicker, softer	thinner, harder
thick, textured	thin, flat

Six people did specifically mention how the words could be used for cake toppings in their comments. Two people said cakes could be either frosted or iced, two said that cakes could only be iced, and two said that cakes could only be frosted. Here’s an example of an “icing is for cakes” comment:

icing is for cakes! frosting is for all the other deliciousness. usually.

And someone who suggests frosting is for cakes:

I usually apply the word frosting solely to cakelike goods (cupcakes, regular cake) and then icing to everything else.

So… if you are going to claim there’s a difference between frosting and icing, pulling the “it goes on cakes” card is pretty likely to start a fight. You’re much safer talking about texture. Unless you’re in the South, of course; then you can pretty much say what you like.

Is there a difference between frosting and icing? It looks like the answer mainly depends on where you are. But there were also some pretty interesting differences between different baked goods, so stay tuned for that part of the analysis.

P.S. If you’re interested in seeing the (slightly sanitized) data and the R code I used for analysis, both are available here.

Does white noise really help you study?

February 15, 2016 ~ Rachael Tatman ~ Leave a comment

So midterms have started here at the University of Washington (already, I know!) and I’m starting to notice more stressed-out study sessions. Around this time of year I always think about all the crazy study hints and tips I’ve heard over the years. (My personal favorite tip is to drink sage tea while I’m reading over notes–it’s been shown to help improve memory.) But one tip that people often share is that listening to white noise can help you concentrate while studying. Being the sort of person I am (read: huge nerd) I decided to set out and see what the research has to say about it.

First things first: some noises can definitely be bad for learning. For example, one study which compared schools near major airports (which are a big source of noise pollution) and some which were not found that children who were in the noisier environment had reduced reading comprehension. An earlier, similar study showed that students in classrooms near a very noisy train track did worse academically than those that were not.

And noisy environments are bad for concentration, too. One survey of office workers found that 99% of participants were bothered by noises like ringing telephones and conversations, and that the negative effects of these noises didn’t fade over time. And we know that some types of speech noise–especially half of a telephone conversation–are incredibly distracting.

Ok, so we know that some noise can hurt both learning and concentration… so why fight fire with fire? Wouldn’t listening to white noise just be more of the same? Or even worse?

Well, not necessarily. The really distracting thing about noise is that it’s not predictable. It’s pretty easy to “tune out” a clock ticking because your brain can figure out when it’s going to tick again. When a new noise suddenly starts, however, or keeps happening in an unpredictable way, like a faucet dripping juuuust out of rhythm, your attention snaps to it. There’s actually a special set of “novelty detector neurons” that are looking for any new types of sounds that might show up. There are two ways to avoid this happening. One is to make sure that all your environmental sounds are ones you can easily ignore. The other is to cover them up. And white noise is very effective at covering up other noises.

White noise is random noise that covers a wide frequency spectrum, usually 20 to 20,000 Hz. That means that other sounds that are the same volume or quieter than the white noise can’t “get through”. As a result, you don’t hear anything surprising, your novelty detector neurons stay quiet, and you can focus on what you’re doing. And don’t take my word for it: this study shows that students who listened to a recording of office noises masked with white noise performed much better on tasks than those who listened to the office noises unmasked.
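
If you want a feel for what “random noise” means here, a minimal Python sketch follows. It assumes the simplest possible recipe: every sample is drawn independently, so no sample tells you anything about the next one, which is what spreads the energy evenly across frequencies. The function names, the 44.1 kHz sample rate and the amplitude of 0.1 are illustrative choices, not taken from any of the studies mentioned.

```python
import random

def white_noise(n_samples, amplitude=0.1, seed=0):
    """Return n_samples of uniform white noise in [-amplitude, amplitude]."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    return [rng.uniform(-amplitude, amplitude) for _ in range(n_samples)]

def lag_correlation(samples, lag=1):
    """Correlation between the signal and a copy of itself shifted by `lag`."""
    n = len(samples) - lag
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    cov = sum(
        (samples[i] - mean) * (samples[i + lag] - mean) for i in range(n)
    ) / n
    return cov / var

noise = white_noise(44100)  # one second at an assumed 44.1 kHz sample rate
```

A ticking clock would show a strong correlation at the lag matching its tick interval. For white noise the correlation at any nonzero lag should come out close to zero, which is another way of saying there is no pattern for your brain to latch onto.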

Now, keep in mind, just because a noise is “white” doesn’t mean it’s good for you. Volume, for one thing, is very important. Exposing rats to 100-dB white noise for 45 minutes was enough for them to undergo measurable stress-induced neurological changes. To be fair, that’s about as loud as a power mower but it does take you out of the “relaxed concentration” range. So grab your headphones and favorite white noise source (if you’ve no other options, a radio set to static will work just fine) but remember to keep the volume down!

The problem with the grammar police

February 8, 2016February 9, 2016 ~ Rachael Tatman ~ 1 Comment

I’ll admit it: I used to be a die-hard grammar corrector. I practically stalked around conversations with a red pen, ready to jump out and shout “gotcha!” if someone ended a sentence with a preposition or split an infinitive or said “irregardless”. But I’ve done a lot of learning and growing since then and, looking back, I’m kind of ashamed. The truth is, when I used to correct people’s grammar, I wasn’t trying to help them. I was trying to make myself look like a language authority, but in doing so I was actually hurting people. Ironically, I only realized this after years of specialized training to become an actual authority on language.

But what do I mean when I say I was hurting people? Well, like some other types of policing, the grammar police don’t target everyone equally. For example, there has been a lot of criticism of Rihanna’s language use in her new single “Work” being thrown around recently. But the fact is that her language is perfectly fine. She’s just using Jamaican Patois, which most American English speakers aren’t familiar with. People claiming that the language use in “Work” is wrong is sort of similar to American English speakers complaining that Nederhop group ChildsPlay’s language use is wrong. It’s not wrong at all, it’s just different.

And there’s the problem. The fact is that grammar policing isn’t targeting speech errors, it’s targeting differences that are, for many people, perfectly fine. And, overwhelmingly, the people who make “errors” are marginalized in other ways. Here are some examples to show you what I mean:

Misusing “ironic”: A lot of the lists of “common grammar errors” you see will include a lot of words where the “correct” use is actually less common than other ways the word is used. Take “ironic”. In general use it can mean surprising or remarkable. If you’re a literary theorist, however, irony has a specific technical meaning–and if you’re not a literary theorist you’re going to need to take a course on it to really get what irony’s about. The only people, then, who are going to use this word “correctly” will be those who are highly educated. And, let’s be real, you know what someone means when they say ironic and isn’t that the point?
Overusing words like “just”: This error is apparently so egregious that there’s an e-mail plug-in, targeted mainly at women, to help avoid it. However, as other linguists have pointed out, not only is there limited evidence that women say “just” more than men, but even if there were a difference why would the assumption be that women were overusing “just”? Couldn’t it be that men aren’t using it enough?
Double negatives: Also called negative concord, this “error” happens when multiple negatives are used in a sentence, as in, “There isn’t nothing wrong with my language.” This particular construction is perfectly natural and correct in a lot of dialects of American English, including African American English and Southern English, not to mention the standard in some other languages, including French.

In each of these cases, the “error” in question is one that’s produced more by certain groups of people. And those groups of people–less educated individuals, women, African Americans–face disadvantages in other aspects of their life too. This isn’t a mistake or coincidence. When we talk about certain ways of talking, we’re talking about certain types of people. And almost always we’re talking about people who already have the deck stacked against them.

Think about this: why don’t American English speakers point out whenever the Queen of England says things differently? For instance, she often fails to produce the “r” sound in words like “father”, which is definitely not standardized American English. But we don’t talk about how the Queen is “talking lazy” or “dropping letters” like we do about, for instance, “th” being produced as “d” in African American English. They’re both perfectly regular, logical language varieties that differ from standardized American English…but only one group gets flak for it.

Now I’m not arguing that language errors don’t exist, since they clearly do. If you’ve ever accidentally said a spoonerism or suffered from a tip of the tongue moment then you know what it feels like when your language system breaks down for a second. But here’s a fundamental truth of linguistics: barring a condition like aphasia, a native speaker of a language uses their language correctly. And I think it’s important for us all to examine exactly why it is that we’ve been led to believe otherwise…and who it is that we’re being told is wrong.
