Which accents does automatic speech recognition work best for?

July 11, 2016 ~ Rachael Tatman ~ 5 Comments

If your primary dialect is something other than Standardized American English (that sort of from-the-US-but-not-anywhere-in-particular type of English you hear a lot of on the news) you may have noticed that speech recognition software doesn’t generally work very well for you. You can see the sort of thing I’m talking about in this clip:

This clip is a little old, though (2010). Surely voice recognition technology has improved since then, right? I mean, we’ve got more data and more computing power than ever. Surely somebody’s gotten around to making sure that the current generation of voice-recognition software deals equally well with different dialects of English. Especially given that those self-driving cars that everyone’s so excited about are probably going to use voice-based interfaces.

To check, I spent some time on Youtube looking at the accuracy of automatic captions for videos of the accent tag challenge, which was developed by Bert Vaux. I picked Youtube automatic captions because they’re done with Google’s Automatic Speech Recognition technology–which is one of the most accurate commercial systems out there right now.

Data: I picked videos with accents from Maine (U.S.), Georgia (U.S.), California (U.S.), Scotland and New Zealand. I picked these locations because they’re pretty far from each other and also have pretty distinct regional accents. All speakers from the U.S. were (by my best guess) white and all looked to be young-ish. I’m not great at judging age, but I’m pretty confident no one was above fifty or so.

What I did: For each location, I checked the accuracy of the automatic captions on the word-list part of the challenge for five male and five female speakers. So I have data for a total of 50 people across 5 dialect regions. For each word in the word list, I marked it as “correct” if the entire word was correctly captioned on the first try. Anything else was marked wrong. To be fair, the words in the accent tag challenge were specifically chosen because they have a lot of possible variation. On the other hand, they’re single words spoken in isolation, which is pretty much the best case scenario for automatic speech recognition, so I think it balances out.
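The scoring rule above is simple enough to sketch in a few lines. This is only an illustration of the bookkeeping, not the script behind the post: the rows, speaker IDs and the `accuracy_by_dialect` helper are all made up for the example, and the real annotations live in the dataset linked further down.

```python
from collections import defaultdict

# Hypothetical annotations: (dialect, speaker_id, word, captioned_correctly).
# These rows are invented for illustration and are NOT the post's data.
annotations = [
    ("California", "ca_f1", "aunt", True),
    ("California", "ca_f1", "roof", True),
    ("California", "ca_m1", "aunt", True),
    ("California", "ca_m1", "roof", False),
    ("Scotland", "sc_f1", "aunt", False),
    ("Scotland", "sc_f1", "roof", True),
    ("Scotland", "sc_m1", "aunt", False),
    ("Scotland", "sc_m1", "roof", False),
]


def accuracy_by_dialect(rows):
    """Return the proportion of words marked correct for each dialect."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for dialect, _speaker, _word, is_correct in rows:
        total[dialect] += 1
        if is_correct:  # "correct" means the whole word was right on the first try
            correct[dialect] += 1
    return {dialect: correct[dialect] / total[dialect] for dialect in total}


result = accuracy_by_dialect(annotations)
for dialect, proportion in sorted(result.items()):
    print(f"{dialect}: {proportion:.2f}")
```

Pooling every word within a dialect like this gives one proportion per region; keeping the speaker ID around would also let you compute a per-speaker score first if you wanted to see how much individuals vary.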

Ok, now the part you’ve all been waiting for: the results. Which dialects fared better and which worse? Does dialect even matter? First the good news: based on my (admittedly pretty small) sample, the effect of dialect is so weak that you’d have to be really generous to call it reliable. A linear model that estimated number of correct classifications based on total number of words, speaker’s gender and speaker’s dialect area fared only slightly better (p = 0.08) than one that didn’t include dialect area. Which is great! No effect means dialect doesn’t matter, right?

Weellll, not really. Based on a power analysis, I really should have sampled forty people from each dialect, not ten. Unfortunately, while I love y’all and also the search for knowledge, I’m not going to hand-annotate two hundred Youtube videos for a side project. (If you’d like to add data, though, feel free to branch the dataset on Github here. Just make sure to check the URL for the video you’re looking at so we don’t double dip.)

So while I can’t confidently state there is an effect, based on the fact that I’m sort of starting to get one with only a quarter of the amount of data I should be using, I’m actually pretty sure there is one. No one’s enjoying stellar performance (there’s a reason that they tend to be called AutoCraptions in the Deaf community) but some dialect areas are doing better than others. Look at this chart of accuracy by dialect region:

accuracyByDialect — Proportion of correctly recognized words by dialect area, color coded by country.

There’s variation, sure, but in general the recognizer seems to be working best on people from California (which just happens to be where Google is headquartered) and worst on Scottish English. The big surprise for me is how well the recognizer works on New Zealand English, especially compared to Scottish English. It’s not a function of country population (NZ = 4.4 million, Scotland = 5.2 million). My guess is that it might be due to sample bias in the training sets, especially if, say, there were some 90’s TV shows in there; there’s a lot of captioned New Zealand English in Hercules, Xena and related spin-offs. There’s also a Google outreach team in New Zealand, but not Scotland, so that might be a factor as well.

So, unfortunately, it looks like the lift skit may still be current. ASR still works better for some dialects than others. And, keep in mind, these are all native English speakers! I didn’t look at non-native English speakers, but I’m willing to bet the system is also letting them down. Which is a shame. It’s a pity that how well voice recognition works for you is still dependent on where you’re from. Maybe in another six years I’ll be able to write a blog post that says it isn’t.

What types of emoji do people want more of?

June 21, 2016 ~ Rachael Tatman ~ Leave a comment

So if you’re a weird internet nerd like me, you might already know that Unicode 9.0 was released today. The deets are here, but they’re fairly boring unless you really care about typography. What’s more interesting to me, as someone who studies visual, spoken and written language, is that there’s a whole batch of new emoji. And it’s led to lots of interesting speculation about, for example, what the most popular new emoji is going to be (tldr: probably the ROFL face. People have a strong preference for using positive face emojis.) This led me to wonder: what obvious lexical gaps are there?

[I]n some cases it is useful to refer to the words that are not part of the vocabulary: the nonexisting words. Instead of referring to nonexisting words, it is common to speak about lexical gaps, since the nonexisting words are indications of “holes” in the lexicon of the language that could be filled.

Janssen, M. 2012. “Lexical Gaps”. The Encyclopedia of Applied Linguistics.

This question is pretty easy to answer about emoji– we can just find out what words people are most likely to use when they’re complaining about not being able to use emoji. There’s even a Twitter bot that collects these kinds of tweets. I decided to do something similar, but with a twist. I wanted to know what kinds of emoji people complain about wanting the most.

Boring technical details 💤

Yesterday, I grabbed 4817 recent tweets that contained both the words “no” and “emoji”. (You can find the R script I used for this on my Github.)
For each tweet, I took the two words occurring directly in front of the word “emoji” and created a corpus from them using the tm (text mining) package.
I tidied up the corpus–removing super-common words like “the”, making everything lower-case, and so on. (The technical term is “cleaning”, but I like the sound of tidying better. It sounds like you’re getting comfy with your data, not delousing it.)
I ranked these words by frequency, or how often they showed up. There were 1888 distinct words, but the vast majority (1280) showed up only once. This is completely normal for word frequency data and is modelled by Zipf’s law.
I then took all words that occurred more than three times and did a content analysis.
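For anyone who’d rather see the steps above as code, here’s a rough sketch. To be clear, the actual analysis was done in R with the tm package; this is a Python stand-in using only the standard library, and the example tweets, the tiny stop-word list and the helper names (`words_before_emoji`, `frequency_table`) are all assumptions made for illustration.

```python
import re
from collections import Counter

# A tiny, made-up stop-word list; the real cleaning step would use a fuller one.
STOPWORDS = {"the", "a", "an", "no", "is", "there", "for", "of", "and", "why"}


def words_before_emoji(tweet, n=2):
    """Return the n lower-cased words directly in front of each 'emoji'."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    found = []
    for i, token in enumerate(tokens):
        if token == "emoji":
            found.extend(tokens[max(0, i - n):i])
    return found


def frequency_table(tweets, stopwords=STOPWORDS):
    """Count the non-stop-words that appear just before 'emoji'."""
    counts = Counter()
    for tweet in tweets:
        counts.update(w for w in words_before_emoji(tweet) if w not in stopwords)
    return counts


# Invented example tweets, not the collected data.
tweets = [
    "Why is there no giraffe emoji",
    "There is no shark emoji?!",
    "still no giraffe emoji, smh",
    "No bacon emoji. No giraffe emoji either.",
    "no giraffe emoji and no shark emoji",
]

counts = frequency_table(tweets)
# Keep only words that occurred more than three times, as in the last step.
frequent = {word: n for word, n in counts.items() if n > 3}
# Words seen exactly once: the long tail that Zipf's law predicts.
singletons = [word for word, n in counts.items() if n == 1]

print(counts.most_common())
print(frequent)
```

The content analysis itself (deciding that “giraffe” is an animal and “bacon” is a food) was done by hand, so there’s no code for that part.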

Exciting results! 😄

At the end of my content analysis, I arrived at nine distinct categories. I’ve listed them below, with the most popular four terms from each. One thing I noticed right off is how many of these are emoji that either already exist or are in the Unicode update. To highlight this, I’ve italicized terms in the list below that don’t have an emoji.

animal: shark, giraffe, butterfly, duck
color: orange, red, white, green
face: crying, angry, love, hate
(facial) feature: mustache, redhead, beard, glasses
flag: flag, England, Welsh, pride
food: bacon, avocado, salt, carrot
gesture: peace, finger, middle, crossed
object: rifle, gun, drum, spoon
person: mermaid, pirate, clown, chef

(One note: the rifle is in unicode 9.0, but isn’t an emoji. This has been the topic of some discussion, and is probably why it’s so frequent.)

Based on these categories, where are the lexical gaps? The three categories that have the most different items in them are, in order 1) food, 2) animals and 3) objects. These are also the three categories with the most mentions across all items.

So, given that so many people are talking about emojis for animals, food and objects, why aren’t the bulk of emojis in these categories? We can see why this might be by comparing how many different items get mentioned in each category to how many times each item is mentioned.

Rplot02 — Yeah, people talk about food a lot… but they also talk about a lot of different types of food. On the other hand you have categories like colors, which aren’t talked about as much but where the same colors come up over and over again.

As you can see from the figure above, the most popular categories have a lot of different things in them, but each thing is mentioned relatively rarely. So while there is an impassioned zebra emoji fanbase, it only comes up three times in this dataset. On the other hand, “red” is fairly common but shows up because of discussion of, among other things, flowers, shoes and hair color. Some categories, like flags, fall in a happy medium–lots of discussion and fairly few suggestions for additions.
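The comparison behind that figure boils down to two numbers per category: how many different items came up, and how often each one was mentioned on average. Here’s a minimal sketch of that calculation. The counts below are invented to mirror the pattern described (lots of food items mentioned a few times each, a couple of colors mentioned a lot), and `summarize` is a hypothetical helper, not part of the original R script.

```python
from collections import defaultdict

# Hypothetical (term, category, mentions) rows; NOT the real counts.
coded_terms = [
    ("bacon", "food", 5),
    ("avocado", "food", 4),
    ("salt", "food", 4),
    ("carrot", "food", 4),
    ("taco", "food", 4),
    ("orange", "color", 8),
    ("red", "color", 7),
]


def summarize(rows):
    """Per category: distinct items, total mentions, and mentions per item."""
    items = defaultdict(int)
    mentions = defaultdict(int)
    for _term, category, count in rows:
        items[category] += 1
        mentions[category] += count
    return {
        category: {
            "distinct_items": items[category],
            "total_mentions": mentions[category],
            "mentions_per_item": mentions[category] / items[category],
        }
        for category in items
    }


summary = summarize(coded_terms)
for category, stats in summary.items():
    print(category, stats)
```

In this toy version food wins on total mentions but loses on mentions per item, which is the same shape as the real result.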

Based on this teeny data set, I’d say that if the Unicode consortium continues to be in charge of emoji standardization it’ll have its hands full for quite some time to come. There’s a lot of room for growth, and most of it is in food, animals and objects, which all have a lot of possible items, rather than gestures or facial expressions, which have far fewer.

Why do Canadians say ‘eh’?

May 31, 2016 ~ Rachael Tatman ~ 1 Comment

Perhaps it’s because Seattle is so close to Canada, but for some reason when I ask classes of undergraduate students what they want to know about language and language use, one question I tend to get a lot is:

Why do Canadians say ‘eh’?

Fortunately for my curious students, this is actually an active area of inquiry. (It’s actually one of those research questions where there was a flurry of work–in this case in the 1970’s–and then a couple quiet decades followed by a resurgence in interest. The ‘eh’ renaissance started in the mid-2000’s and continues today. For some reason, at least in linguistics, this sort of thing tends to happen a lot. I’ll leave discussing why this particular pattern is so common to the sociologists of science.) So what do we know about ‘eh’?

Is ‘eh’ actually Canadian?

‘Eh’ has quite the pedigree–it’s first attested in Middle English and even shows up in Chaucer. Canadian English, however, boasts a more frequent use of ‘eh’, which can fill the same role as ‘right?’, ‘you know?’ or ‘innit?’ for speakers of other varieties of English.

What does ‘eh’ mean?

The real thing that makes an ‘eh’ Canadian, though, is how it’s used. Despite some claims to the contrary, “eh” is far from meaningless. It has a limited number of uses (Elaine Gold identified an even dozen in her 2004 paper) some of which aren’t found outside of Canada. Walter Avis described two of these uniquely Canadian uses in his 1972 paper, “So eh? is Canadian, eh” (it’s not available anywhere online as far as I can tell):

Narrative use: Used to punctuate a story, in the same way that an American English speaker (south of the border, that is) might use “right?” or “you know?”
1. Example: I was walking home from school, eh? I was right by that construction site where there’s a big hole in the ground, eh? And I see someone toss a piece of trash right in it.
Miscellaneous/exclamation use: Tacked on to the end of a statement. (Although more recent work, presented by Martina Wiltschko and Alex D’Arcy at last year’s NWAV suggests that there’s really a limited number of ways to use this type of ‘eh’ and that they can be told apart by the way the speaker uses pitch.)
1. Example: What a litterbug, eh?

And these uses seem to be running strong. Gold found that use of ‘eh’ in a variety of contexts has either increased or remained stable since 1980.

That’s not to say there’s no change going on, though. D’Arcy and Wiltschko found that younger speakers of Canadian English are more likely than older speakers to use ‘right?’ instead of ‘eh?’. Does this mean that ‘eh’ may be going the way of the dodo or ‘sliver’ to mean ‘splinter’ in British English?

Probably not–but it may show up in fewer places than it used to. In particular, in their 2006 study Elaine Gold and Mireille Tremblay found that almost half of their participants felt negatively about the narrative use of ‘eh’ and only 16% actually used it themselves. This suggests this type of uniquely-Canadian usage may be on its way out.

Should you go to grad school for linguistics?

April 28, 2016 ~ Rachael Tatman ~ 5 Comments

So I’ve had this talk, in different forms, with lots of different people over the last couple of years. Mainly undergrads thinking about applying to PhD programs in linguistics but, occasionally, people in industry thinking about going back to school as well. Every single one of these people was smart, cool, dedicated, hard-working, a great linguist and would have been an asset to the field. And when they asked me, a current linguistics graduate student, whether it was a good idea to go to grad school in linguistics, I gave them all the same answer:

“But Rachael,” you say, “you’re going to grad school in linguistics and having all sorts of fun. Why are you trying to keep me from doing the same thing?” Two big reasons.

The Job Market for Linguistics PhDs

What do you want to do when you get out of grad school? If you’re like most people, you’ll probably say you want to teach linguistics at the college or university level. What you should know is that this is an increasingly unsustainable career path.

In 1975, 30 percent of college faculty were part-time. By 2011, 51 percent of college faculty were part-time, and another 19 percent were non–tenure track, full-time employees. In other words, 70 percent were contingent faculty, a broad classification that includes all non–tenure track faculty (NTTF), whether they work full-time or part-time.

More Than Half of College Faculty Are Adjuncts: Should You Care? by Dan Edmonds.

And most of these part-time faculty, or adjuncts, are very poorly paid. This survey from 2015 found that 62% of adjuncts made less than $20,000 a year. This is even more upsetting when you consider that you need a PhD and scholarly publications to even be considered for one of these posts.

(“But what about being paid for your research publications?” you ask. “Surely you can make a few bucks by publishing in those insanely expensive academic journals.” While I understand where you’re coming from–in almost any other professional publishing context it’s completely normal to be paid for your writing–authors of academic papers are not paid. Nor are the reviewers. Furthermore, authors are often charged fees by the publishers. One journal I was recently looking at charges $2,900 per article, which is about three times the funding my department gives us for research over our entire degree. Not a scam journal, either–an actual reputable venue for scholarly publication.)

Yes, there are still tenure-track positions available in linguistics, but they are by far the minority. What’s more, even including adjunct positions, there are still fewer academic posts than graduating linguists with PhDs. It’s been that way for a while, too, so even for a not-so-great adjunct position you’ll be facing stiff competition. Is it impossible to find a good academic post in linguistics? No. Are the odds in your (or my, or any other current grad student’s) favor? Also no. But don’t take it from me. In Surviving Linguistics: A Guide for Graduate Students (which I would highly recommend) Monica Macaulay says:

[It] is common knowledge that we are graduating more PhDs than there are faculty positions available, resulting in certain disappointment for many… graduates. The solution is to think creatively about job opportunities and keep your options open.

As Dr. Macaulay goes on to outline, there are jobs for linguists outside academia. Check out the LSA’s Linguistics Beyond Academia special interest group or the Linguists Outside Academia mailing list. There are lots of things you can do with a linguistics degree, from data science to forensic linguistics.

That said, there are degrees that will better prepare you for a career than a PhD in theoretical linguistics. A master’s degree in Speech Language Pathology (SLP) or Computational Linguistics or Teaching English to Speakers of Other Languages (TESOL) will prepare you for those careers far better than a general PhD.

Even if you’re 100% dead set on teaching post-secondary students, you should look around and see what linguists are doing outside of universities. Sure, you might win the job-lottery, but at least some of your students probably won’t, and you’ll want to make sure they can find well-paying, fulfilling work.

Grad School is Grueling

Yes, grad school can absolutely be fun. On a good day, I enjoy it tremendously. But it’s also work. (And don’t give me any nonsense about it not being real work because you do it sitting down. I’ve had jobs that required hard physical and/or emotional labor, and grad school is exhausting.) I feel like I probably have a slightly better than average work/life balance–partly thanks to my fellowship, which means I have limited teaching duties and don’t need a second job any more–and I’m still actively trying to get better about stopping work when I’m tired. I fail, and end up all tearful and exhausted, about once a week.

It’s also emotionally draining. Depression runs absolutely rampant among grad students. This 2015 report from Berkeley, for example, found that over two thirds of PhD students in the arts and sciences were depressed. The main reason? Point number one above–the stark realities of the job market. It can be absolutely gutting to see a colleague do everything right, from research to teaching, and end up not having any opportunity to do the job they’ve been preparing for. Especially since you know the same lies in wait for you.

And “doing everything right” is pretty Herculean in and of itself. You have to have very strong personal motivation to finish a PhD. Sure, your committee is there to provide oversight and you have drop-dead due dates. But those deadlines are often very far away and, depending on your committee, you may have a lot of independence. That means motivating yourself to work steadily while managing several ongoing projects in parallel (you’re publishing papers in addition to writing your dissertation, right?) and not working yourself to exhaustion in the process. Basically you’re going to need a big old double helping of executive functioning.

And oh by the way, to be competitive in the job market you’ll also need to demonstrate you can teach and perform service for your school/discipline. Add in time to sleep, eat, get at least a little exercise and take breaks (none of which are optional!) and you’ve got a very full plate indeed. Some absolutely iron-willed people even manage all of this while having/raising kids and I have nothing but respect for them.

Main take-away

Whether inside or outside of academia, it’s true that a PhD does tend to correlate with higher salary–although the boost isn’t as much as you’d get from a related professional degree. BUT in order to get that higher salary you’ll need to give up some of your most productive years. My spouse (who also has a bachelor’s in linguistics) got a master’s degree, found a good job, got promoted and has cultivated a professional social network in the time it’s taken me just to get to the point of starting my dissertation. The opportunity cost of spending five more years (at a minimum–I’ve heard of people who took more than a decade to finish) in school, probably in your twenties, is very, very high. And my spouse can leave work at work, come home on weekends and just chill. This month I’ve got four full weekends of either conferences or outreach. Even worse, no matter how hard I try to stamp it out, I’ve got a tiny little voice in my head that’s very quietly screaming “you should be working” literally all the time.

I’m being absolutely real right now: going to grad school for linguistics is a bad investment of your time and labor. I knew that going in–heck, I knew that before I even applied–and I still went in. Why? Because I decided that, for me, it was a worthwhile trade-off. I really like doing research. I really like being part of the scientific community. Grad school is hard, yes, but overall I’m enjoying myself. And even if I don’t end up being able to find a job in academia (although I’m still hopeful and still plugging away at it) I really, truly believe that the research I’m doing now is valuable and interesting and, in some small way, helping the world. What can I say? I’m a nerdy idealist.

But this is 100% a personal decision. It’s up to you as an individual to decide whether the costs are worth it to you. Maybe you’ll decide, as I have, that they are. But maybe you won’t. And to make that decision you really do need to know what those costs are. I hope I’ve helped to begin making them clear.

One final thought: Not going to grad school doesn’t mean you’re not smart. In fact, considering everything I’ve discussed above, it probably means you are.

What is linguistic discrimination?

April 18, 2016 (updated April 19, 2016) ~ Rachael Tatman ~ 1 Comment

Recently, UC Berkeley student Khairuldeen Makhzoomi was removed from his flight. The reason: he was speaking Arabic. And this isn’t the first time this has happened. Nor the second. These are all, in addition to being deeply disturbing and illegal, examples of linguistic discrimination.

What is linguistic discrimination?

Linguistic discrimination is discrimination based on someone’s language use. And it’s not restricted to the instances I discussed above:

African American English is often discriminated against. For example, Rachel Jeantel’s testimony in the Trayvon Martin case was largely dismissed by the white jury–not because of the content of the testimony, but because of her use of African American English, and there is a long history of landlords linguistically profiling and discriminating against African American and Latina/o prospective tenants.
Sometimes it’s the language itself under attack, as in this letter about American Sign Language, claiming that it’s unnecessary for deaf children to learn to sign. (Here’s a rebuttal from the Gallaudet Linguistics department, which is a major center of sign language research at the world’s only university “devoted to deaf and hard of hearing students.”)
In multilingual countries, it’s unfortunately common for speakers of marginalized languages to find themselves denied services in their language.

As I’ve talked about before, linguistic discrimination can be a way to discriminate against a specific group of people without saying so in so many words. Linguistic discrimination, in addition to being morally repugnant, is illegal in the U.S. under Titles VI and VII of the Civil Rights Act of 1964.

These are important legal protections and the number of people affected by them is huge: There are over 350 different languages spoken in the United States. In Seattle, where I live, over a fifth of people over age five speak a language other than English at home. That’s a lot of people! Further, most of these individuals are bilingual or multilingual; 90% of second-generation immigrants speak English. And since multilingualism has both neurological benefits for individuals and larger positive impacts on society, I see this as no bad thing. And I’m hardly the only one: how many people that you know are learning or want to learn another language?

Unfortunately, linguistic discrimination threatens this rich diversity, and every person who speaks anything other than the standardized variety of the dominant language.

What can you do?

Don’t participate in linguistic discrimination. It can be hard to retrain yourself to reduce the impact of negative stereotypes but, especially if you’re in a position of privilege (as I am), it’s literally the least you can do. Don’t make assumptions about people based on their language use.
Stand up for people who may be facing linguistic discrimination. If you see someone being discriminated against in the workplace (like being given lower performance evaluations for having a non-native accent) point out that this is illegal, and back up people who are being discriminated against.
Be patient with non-native speakers. Appreciate that they’ve gone through a lot of effort to learn your language. If possible, try and arrange for an interpreter (for face-to-face communication) or translator (for written communications). Sometimes non-native speakers are more comfortable with reading and writing than speaking; offer to communicate through e-mails or other written correspondence.

What’s the difference between frosting and icing?

March 29, 2016 ~ Rachael Tatman ~ 2 Comments

Fair warning: this post is full of pictures of baked goods. I can’t claim responsibility for any impulsive cake-baking that may result from reading further.

This is the second post in this series. The first half, here, focused on responses to whether “frosting” and “icing” were different things, or different words for the same thing. This post gets a little more in-depth. In the first part, I was just asking people what they thought they said. In the second part, I was asking them to pick words for specific pictures. It’s not a perfect design–by asking people what they think they saw first I primed them pretty heavily–but it does reveal some interesting patterns of usage.

The main thing I was interested in was this–did people who said frosting and icing were interchangeable for them actually use them as if they were the same? Why is this a good question to ask? Because it turns out that a lot of the time people aren’t the best judges of how they use language. Especially if there’s some sort of “rule” about how you’re “supposed” to do it. For example, there’s something of a running joke among linguists about how often people will use the passive voice while they’re telling people not to! I don’t think anyone would intentionally lie about their usage, but it’s possible that respondents aren’t always doing exactly what they think they are.

I split my dataset into people who said they thought the words “frosting” and “icing” meant the same thing and those who thought they were different. In the charts below these groups are labelled “same” and “different” respectively. For this stage of analysis, I left out people who weren’t sure; there weren’t a whole lot of them anyway.
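If you want to picture the split, here’s a small sketch of it for a single image. The real (slightly sanitized) data and R code are linked at the end of this post; everything below is a Python stand-in with invented counts. The post later says the groups responded significantly differently but doesn’t name the test, so the Pearson chi-square used here (no continuity correction) is my assumption, and `crosstab` and `chi_square_2x2` are hypothetical helper names.

```python
import math
from collections import Counter

# Hypothetical survey rows: (self-reported view, word chosen for one picture).
# The counts are invented for illustration and are NOT the survey results.
responses = (
    [("same", "frosting")] * 20
    + [("same", "icing")] * 10
    + [("different", "frosting")] * 10
    + [("different", "icing")] * 30
    + [("not sure", "icing")] * 3  # left out of the analysis, as in the post
)


def crosstab(rows, groups=("same", "different"), words=("frosting", "icing")):
    """Count responses per group and word, ignoring any other group."""
    counts = Counter(rows)
    return [[counts[(group, word)] for word in words] for group in groups]


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table (df = 1)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom the upper tail can be written with erfc.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


table = crosstab(responses)
chi2, p_value = chi_square_2x2(table)
print(table)
print(round(chi2, 2), p_value)
```

Write-in answers like “glaze” or “fondant” would need their own columns, at which point a general contingency-table test is a better fit than this 2x2 shortcut.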

Cupcake

So this picture was a pretty canonical example of what people brought up a lot–it’s on a cake, and it’s been both whipped and piped. For a lot of people, then, this should be “frosting”. So what did people say?

cupcakeChart

The results here were pretty much what I expected. (Whew!) People who thought the words meant different things pretty much all thought this was “frosting”. And there was a pretty strong difference between the groups. But this still doesn’t answer some of my questions. Is it the texture that makes it “frosting” or, as the AP Styleguide suggests, the fact that it’s on a cake? After all, you can definitely put buttercream on a cookie, as evinced by Lofthouse.

Doughnuts

Next I had some doughnuts. A lot of people, when I first started asking around, brought up doughnuts as something that they thought were iced rather than frosted. So what did people say?

donughts

That does seem to hold true. There was no strong difference between the groups, but there were also a lot of write-in answers. (“Glaze” was especially popular, which, for the record, is probably what I’d say.) So there seems to be more variety in what people call doughnut toppings but there is a tendency towards “icing”.

Cake with fondant

Ok, so this image was a bit of a trick. The cake here is covered in fondant. Which, to me, isn’t really frosting or icing. But if it’s really “being on a cake” that makes something “frosting”, we should see a strong “frosting” bias from people with a distinction.

fondant

And that’s just not the case. There’s also a pretty big difference between the groups here. Interestingly, people who thought “frosting” and “icing” are different things were more likely to write in “fondant”. (Remember that level of baking knowledge had no effect on whether people said there was a difference or not, so it’s probably not just specialized knowledge.)

Bundt Cake

I included this image for a couple of reasons. Again, I’m poking at this “on a cake” idea. But I also had a lot of people tell me that, for them, the distinction between the words was texture-based. So responses here could have gone two ways: If anything on a cake is frosting, then we’d expect frosting to win. But, if frosting has to be fluffy/whipped, then we’d expect icing to win.

bundt

And icing wins! This is no surprise, given the written results summarized in my previous post and the responses for the cake pictures above, but for me it really puts the nail in the coffin of the “on cakes” argument. (Take note, AP Styleguide!) Even on this one, though, people with no distinction are much more likely to be able to use “frosting”.

Sweet Roll

So this is an interesting one. I included it because, for me, cinnamon rolls are synonymous with cream cheese frosting/icing. Since several people I talked to said specifically that cream cheese had to be frosting and not icing, I was expecting a large “frosting” response on this one.

cinnamonRoll

That was definitely not what I saw, though. (Although people with no distinction were much more likely to be able to say “frosting”, so I guess I came by it natural.) Most people, and especially people with a distinction, thought it was “icing”.

Overview

So there are two main takeaways here:

There’s a strong difference in usage between people who say that “frosting” and “icing” are different things and those who say they aren’t. (For most of the pictures, these groups responded significantly differently.)
If there is a difference, it’s got everything to do with texture and nothing to do with cake.

That’s not to say that these things will always hold true; no one knows better than linguists that language is in a constant state of flux. But for now, these generalizations seem to hold for most of the people surveyed. So if you’re going to make a usage distinction between these words, please make one that’s based on the actual usage and not some completely made-up rule!

A final note: if you’re interested in seeing the (slightly sanitized) data and the R code I used for analysis, both are available here.

Is there a difference between frosting and icing?

March 1, 2016 ~ Rachael Tatman ~ Leave a comment

So recently, the Associated Press Stylebook posted this on Twitter:

AP Style tip: Use “icing” to describe sugar decorations applied to cookies; “frosting” for cupcakes and cakes.

— AP Stylebook (@APStylebook) February 22, 2016

This struck me as 1) kind of a petty usage distinction and 2) completely at odds with my personal usage and what I knew about the dialectal research. The Dictionary of American Regional English, for example, notes that “Frosting” is “widespread, but chiefly North, North Midland, West”. “Icing”, on the other hand, is found all over, “but less freq North, Pacific”. As someone from Virginia but currently living in Seattle, I have no problem using either frosting or icing for a nice buttercream. I’m hardly the only one, either. This baking blog post even says “I use lots of different icings to frost cupcakes”.

Chai white chocolate cupcakes (2) — Frosting or icing, I’ll take a dozen.

BUT when I posted about this on Twitter, some people replied that they did have a very strong distinction between the two words. And the same thing happened when I brought it up with different groups of friends. A lot of people brought up texture, or that they’d say that some things are frosted and others are iced. This was really fascinating to me, both as a baker and a linguist, so I did what any social scientist would and set out to collect some data to get a better idea of what’s going on.

I set up a survey on Google forms and got 109 responses. First I collected info on where speakers were from, how old they were and how knowledgeable they were about baking. Then I asked them for both their general impression of use and then used pictures to ask what they’d call the sweet topping on a variety of baked goods. To avoid making this blog post absolutely huge, I’m going to split up data discussion. The first half (this one) will look at whether people make a distinction between frosting and icing and whether that’s related to any of their social characteristics. The second half (I’ll link it here when it’s done) will focus on responses to specific images.

Are “frosting” and “icing” different, or are they different words for the same thing?

The first question I asked people was whether frosting and icing were different, or just different words for the same thing. Most people (over 60%) thought that they were different things, while just over a quarter (27%) thought they were different words for the same thing, and the rest weren’t sure. So it does look like there’s some difference in how people use these words. But in and of itself, that’s not very interesting. What I want to know is this: how do people with different social characteristics use these words? (You may remember that I wrote a while ago that this is the central question in sociolinguistics.)

Region

The first thing I wanted to look at was region. I was expecting to see a pretty big difference here, and I wasn’t disappointed. Once I broke down the data by the states people were from, I found a definite pattern: people from the South were far more likely to say that frosting and icing were different words for the same thing. (Virginia isn’t really patterning with the rest of the South, here, but that may be due to a bit of sampling bias–I recruited participants through my social network, and a lot of my friends are from Northern Virginia, which tends not to pattern with the South.)

mapUseThisOne — Most people in the South thought frosting and icing were the same thing, while outside of the South more people thought they were different things. (The darker the blue, the more likely someone from that state was to say that they were different things–black states I didn’t get any respondents from.)

Why is there a distinction? Honestly, I’m not really sure. My intuition, though, is that people from the South probably have pretty wide exposure to both terms. (Since books, TV and movies tend to come from outside of the South, there are plenty of chances to come across other dialectal variants.) People from outside the South, however, historically had less exposure to one of the terms, icing, so when they started to come across it they decided that it must refer to something different. As a result, the meanings of both words changed to become more narrow. (This is actually a pretty common process in languages.) I don’t have strong evidence for this theory right now, though, so take it with a couple shakes of salt!

Age

Another thing I wanted to look at was whether the age of respondents played a role in how they used these words. If younger respondents seem to use the word differently than older respondents, it might be because there’s a change happening in the language. Given time, everyone might end up doing the same thing as the younger people.

age — While it looks like there’s a slight tendency for younger participants to say there’s a difference between frosting and icing, the effect isn’t strong enough to be reliable.

I didn’t find a strong pattern, though. Again, this might be due to sampling problems, since most of my respondents were roughly the same age (21-30). But it could also be that there’s simply not anything to find–that this is neither an ongoing change, nor one where younger people and older people do things differently.

Baking Knowledge

Ok, so it looks like people are varying by region, but not by age… but what about by level of baking knowledge? Maybe you don’t care about the difference if you almost never make or eat baked goods. It could be that people who know a lot about baking make a distinction, and it’s only people who don’t know a beater from a dough hook that are lumping things together.

bakingExp — Baking knowledge also isn’t closely tied to how people use these words. So it’s not just that people who don’t know a lot about baking say they’re the same.

But that’s not what I found. People at all levels of baking knowledge tended to have a pretty even balance between the two uses of the words.

Comments

I also collected comments from people, to get more information on what people thought in their own words. Two big themes emerged. One was that the most consistent thing people pointed to as the difference was texture. The other was that people tended to say that one of them was for the cake and the other wasn’t… but which one was which was pretty much random.

Just under half of the comments mentioned texture. I’ve compiled some of the differences below, but the general consensus seems to be that frosting is thick, fluffy and soft, while icing is thin and hard. Take note, AP Stylebook!

Frosting	Icing
creamy or buttery	syrupy, like a glaze
	plasticy looking
spread	squeezed or piped
thick and creamy	thin, hardens as it dries
thicker
thicker	clear crust, dried
fluffy	thin
	thin layer, smooth, glossy
more solid, less flowing	watery, gooey
stays soft	hardens once it sets
thicker, softer	thinner, harder
thick, textured	thin, flat

Six people did specifically mention how the words could be used for cake toppings in their comments. Two people said cakes could be either frosted or iced, two said that cakes could only be iced, and two said that cakes could only be frosted. Here’s an example of an “icing is for cakes” comment:

icing is for cakes! frosting is for all the other deliciousness. usually.

And someone who suggests frosting is for cakes:

I usually apply the word frosting solely to cakelike goods (cupcakes, regular cake) and then icing to everything else.

So… if you are going to claim there’s a difference between frosting and icing, pulling the “it goes on cakes” card is pretty likely to start a fight. You’re much safer talking about texture. Unless you’re in the South, of course; then you can pretty much say what you like.

Is there a difference between frosting and icing? It looks like the answer mainly depends on where you are. But there were also some pretty interesting differences between different baked goods, so stay tuned for that part of the analysis.

P.S. If you’re interested in seeing the (slightly sanitized) data and the R code I used for analysis, both are available here.

Does white noise really help you study?

February 15, 2016 ~ Rachael Tatman ~ Leave a comment

So midterms have started here at the University of Washington (already, I know!) and I’m starting to notice more stressed-out study sessions. Around this time of year I always think about all the crazy study hints and tips I’ve heard over the years. (My personal favorite tip is to drink sage tea while I’m reading over notes–it’s been shown to help improve memory.) But one tip that people often share is that listening to white noise can help you concentrate while studying. Being the sort of person I am (read: huge nerd) I decided to set out and see what the research has to say about it.

First things first: some noises can definitely be bad for learning. For example, one study compared schools near major airports (which are a big source of noise pollution) with schools that weren’t, and found that children in the noisier environment had reduced reading comprehension. An earlier, similar study showed that students in classrooms near a very noisy train track did worse academically than those that were not.

And noisy environments are bad for concentration, too. One survey of office workers found that 99% of participants were bothered by noises like ringing telephones and conversations, and that the negative effects of these noises didn’t fade over time. And we know that some types of speech noise–especially half of a telephone conversation–are incredibly distracting.

Ok, so we know that some noise can hurt both learning and concentration… so why fight fire with fire? Wouldn’t listening to white noise just be more of the same? Or even worse?

Well, not necessarily. The really distracting thing about noise is that it’s not predictable. It’s pretty easy to “tune out” a clock ticking because your brain can figure out when it’s going to tick again. When a new noise suddenly starts, however, or keeps happening in an unpredictable way, like a faucet dripping juuuust out of rhythm, your attention snaps to it. There’s actually a special set of “novelty detector neurons” that are looking for any new types of sounds that might show up. There are two ways to avoid this happening. One is to make sure that all your environmental sounds are ones you can easily ignore; the other is to cover them up. And white noise is very effective at covering up other noises.

White noise is random noise that covers a wide frequency spectrum, usually 20 to 20,000 Hz. That means that other sounds that are the same volume or quieter than the white noise can’t “get through”. As a result, you don’t hear anything surprising, your novelty detector neurons stay quiet, and you can focus on what you’re doing. And don’t take my word for it: this study shows that students who listened to a recording of office noises masked with white noise performed much better on tasks than those who listened to the office noises unmasked.
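If you’d like to see what “covers a wide frequency spectrum” means in practice, here’s a minimal Python sketch. It isn’t from any of the studies mentioned here; the 44,100 Hz sampling rate, the function names and the two frequency bands are all assumptions picked for illustration. It generates random samples and compares the average power in a low band with the average power in a high band, which for white noise should come out roughly equal.

```python
import cmath
import math
import random

SAMPLE_RATE = 44_100  # assumed sampling rate in Hz (illustrative, not from the post)


def white_noise(n_samples, seed=0):
    """Return n_samples of Gaussian white noise: independent random samples."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def band_power(samples, low_hz, high_hz, sample_rate=SAMPLE_RATE):
    """Average power of the DFT bins whose frequency falls in [low_hz, high_hz).

    Uses a plain (slow) discrete Fourier transform so that only the standard
    library is needed. Assumes the band contains at least one frequency bin.
    """
    n = len(samples)
    powers = []
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        if low_hz <= freq < high_hz:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(samples))
            powers.append(abs(coeff) ** 2 / n)
    return sum(powers) / len(powers)


if __name__ == "__main__":
    noise = white_noise(1024)
    print("low band (20-5,000 Hz):      ", band_power(noise, 20, 5_000))
    print("high band (15,000-20,000 Hz):", band_power(noise, 15_000, 20_000))
```

The two numbers won’t match exactly, since the noise is random, but neither band should dominate the other. That flat spread of energy is why white noise can cover up quieter sounds at pretty much any pitch.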

Now, keep in mind, just because a noise is “white” doesn’t mean it’s good for you. Volume, for one thing, is very important. Exposing rats to 100-dB white noise for 45 minutes was enough for them to undergo measurable stress-induced neurological changes. To be fair, that’s about as loud as a power mower, but it does take you out of the “relaxed concentration” range. So grab your headphones and favorite white noise source (if you’ve no other options, a radio set to static will work just fine) but remember to keep the volume down!

How to Read a Linguistics Article in 8 Easy Steps

January 29, 2016 (updated February 1, 2016) ~ Rachael Tatman ~ Leave a comment

Disclaimer: this mostly applies to experimental or quantitative articles, since those are what are common in my field. Your mileage, especially in more formal fields like syntax or semantics, may vary dramatically.

Ok, so you’re not a professional linguist or anything, but you’ve come across an article in a linguistics journal and it sounds interesting. Or maybe you’ve just taken your first linguistics class and you heard about something really cool you want to learn more about. But when you start reading you’re quickly swamped by terms you don’t understand, IPA symbols you’ve never seen before and all sorts of statistics. You’re tempted to just throw in the towel.

Don’t panic! I’m here to help you out with Rachael’s patented* guide to reading linguistics articles.

The first thing to do is take a deep breath and accept that you may not understand everything right away. That’s ok! If you could easily read scientific literature in a field it would mean you were already an expert. Academic writing is designed to be read by other academics, and so it’s full of terms that have very specific meanings in the field. It’s a sort of time-saving code and it takes time to learn. Don’t beat yourself up for being at the beginning of your journey!

With that in mind, here are the steps I like to follow when I’m starting a new article, especially if it’s in a field I’m less familiar with.

1. Read the abstract. This will give you a broad outline of what the paper will be about and help you know if the whole article would be interesting or relevant for you.
2. I like to call this the “sandwich step”. I read the introduction and then the conclusion. Why? Again, this gives me an idea about what will be in the article. Sure, there may be spoilers, but knowing the answer will make it easier to understand how questions were asked.
   1. Notice any new terms that are both in the introduction and the abstract but don’t get explained? This might be a good time to look them up, since the author might be assuming you already know about them.
   2. Some places to look up terms:
      1. The SIL linguistics glossary can be a good place to start.
      2. Linguistics topics on Wikipedia are also a good choice. Linguists even get together at professional events to edit and add to linguistics-related pages.
      3. For a bit more in-depth introduction, Language and Linguistics Compass publishes short articles written by experts that are designed to be introductions to whatever topic they’re on.
3. Flip through and look for any charts or figures and read their captions. These will be where the author(s) highlight their results. Now that you have a general idea about what’s going on you’ll have a better chance of interpreting these.
4. Next, read the background section. This is where the author will talk about things that other people have done and how their work fits in to the big picture of the field. This is the second place you’re likely to find new terms you’re unfamiliar with. If they’re only used once or twice, don’t worry about looking them up. Your aim is to understand the general thrust of the article, not every little detail! (Now, if you’re a grad student, on the other hand… 😉 )
5. Now read the methods section. You can probably skim this; unless you’re interested in replicating the study or reviewing its merit you’re not going to have to have a full grasp of all the nitty-gritty nuances of item design and participant recruitment.
6. Finally read the results. Unless you have some stats background, you’re probably safe in skipping over the statistical analyses. Again, you just want to understand the general point.
7. Extra credit: Go back and read the abstract again. This is a very condensed version of what was in the article and is a good way to review/check your understanding.
8. Sit back and enjoy having read a linguistics article!

Grats on making it through! Now that you’ve caught the bug, what are some ways to find more stuff to read?

Go find one of the articles referenced in the one you just read. Since you’re already familiar with similar work, you’ll probably have an easier time understanding the new article.
Or read something more recent that cites the article you’ve read. You can look up articles that cite the one you’ve read on Google Scholar, as this video explains.
Look up other issues of the journal your paper was in. Most journals publish in a pretty narrow range of topics so you’ll have a leg up on understanding the new articles.
Ask a linguist! We’re a friendly bunch and pretty responsive to e-mail. You might even see if you can find the contact info of the author(s) of the article you read to ask them for suggestions for other stuff to read.

I hope this has been helpful and piqued your interest about diving into linguistics research. Now get out there and get reading!

*Not actually patented.

Why can you mumble “good morning” and still be understood?

January 17, 2016 ~ Rachael Tatman ~ Leave a comment

I got an interesting question on Facebook a while ago and thought it might be a good topic for a blog post:

I say “good morning” to nearly everyone I see while I’m out running. But I don’t actually say “good”, do I? It’s more like “g’ morning” or “uh morning”. Never just morning by itself, and never a fully articulated good. Is there a name for this grunt that replaces a word? Is this behavior common among English speakers, only southeastern speakers, or only pre-coffee speakers?

This sort of thing is actually very common in speech, especially in conversation. (Or “in the wild” as us laboratory types like to call it.) The fancy-pants name for it is “hypoarticulation”. That’s less (hypo) speech-producing movements of the mouth and throat (articulation). On the other end of the spectrum you have “hyperarticulation” where you very. carefully. produce. each. individual. sound.

Ok, so you can change how much effort you put into producing speech sounds, fair enough. But why? Why don’t we just sort of find a happy medium and hang out there? Two reasons:

Humans are fundamentally lazy. To clarify: articulation costs energy, and energy is a limited resource. More careful articulation also takes more time, which, again, is a limited resource. So the most efficient speech will be very fast and made with very small articulator movements. Reducing the word “good” to just “g” or “uh” is a great example of this type of reduction.
On the other hand, we do want to communicate clearly. As my advisor’s fond of saying, we need exactly enough pointers to get people to the same word we have in mind. So if you point behind someone and say “er!” and it could be either a tiger or a bear, that’s not very helpful. And we’re very aware of this in production: there’s evidence that we’re more likely to hyperarticulate words that are harder to understand.

So we want to communicate clearly and unambiguously, but with as little effort as possible. But how does that tie in with this example? “G” could be “great” or “grass” or “génial”, and “uh” could be any number of things. For this we need to look outside the linguistic system.

The thing is, language is a social activity and when we’re using language we’re almost always doing so with other people. And whenever we interact with other people, we’re always trying to guess what they know. If we’re pretty sure someone can get to the word we mean with less information, for example if we’ve already said it once in the conversation, then we will expend less effort in producing the word. These contexts where things are really easily guessable are called “low entropy”. And in a social context like jogging past someone in the morning, phrases like “good morning” have very low entropy. Much lower than, for example “Could you hand me that pickle?”–if you jogged past someone and said that you’d be very likely to hyperarticulate to make sure they understood.
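To put a rough number on “low entropy”, here’s a small Python sketch using Shannon’s entropy formula, which measures how uncertain a listener is about what’s coming next. Both sets of probabilities are made up purely for illustration; they aren’t measurements of what joggers actually say.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical guesses about what a passing jogger might say in the morning.
# One phrase is overwhelmingly likely, so the listener has little to work out.
jogging_past = {
    "good morning": 0.90,
    "hello": 0.05,
    "nice day": 0.04,
    "could you hand me that pickle?": 0.01,
}

# A hypothetical context where four phrases are all equally likely,
# so the listener needs more of the signal to tell them apart.
no_expectations = {
    "good morning": 0.25,
    "hello": 0.25,
    "nice day": 0.25,
    "could you hand me that pickle?": 0.25,
}

if __name__ == "__main__":
    print("jogging past someone:", entropy_bits(jogging_past.values()), "bits")
    print("no expectations:     ", entropy_bits(no_expectations.values()), "bits")
```

The jogging context comes out with much lower entropy than the one where every phrase is equally likely. That’s the situation where a mumbled “g’ morning” still gets the job done.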
