Can what you think you know about someone affect how you hear them?

I’ll get back to the “a male/a female” question in my next blog post (promise!), but for now I want to discuss some of the findings from my dissertation research. I’ve talked about my dissertation research a couple of times before, but since I’m going to be presenting some of it in Spain (you can read the full paper here), I thought it would be a good time to share some of my findings.

In my dissertation, I’m looking at how what you think you know about a speaker affects what you hear them say. In particular, I’m looking at American English speakers who have just learned to correctly identify the vowels of New Zealand English. Due to an ongoing vowel shift, the New Zealand English vowels are really confusing for an American English speaker, especially the vowels in the words “head”, “heed” and “had”.

tokensVowelPlot

This plot shows individual vowel tokens by the frequency of their first and second formants (high-intensity frequency bands in the vowel). Note that the New Zealand “had” is very close to the US “head”, and the New Zealand “head” is really close to the US “hid”.

These overlaps can be pretty confusing when American English speakers are talking to New Zealand English speakers, as this Flight of the Conchords clip shows!

The good news is that, as language users, we’re really good at learning new varieties of languages we already know, so it only takes a couple minutes for an American English speaker to learn to correctly identify New Zealand English vowels. My question was this: once an American English speaker has learned to understand the vowels of New Zealand English, how do they know when to use this new understanding?

In order to test this, I taught twenty-one American English speakers who hadn’t had much, if any, previous exposure to New Zealand English to correctly identify the vowels in the words “head”, “heed” and “had”. While I didn’t play them any examples of a New Zealand “hid”–the vowel in “hid” is said more quickly in addition to having different formants, so there’s more than one way it varies–I did let them say that they’d heard “hid”, which meant I could tell if they were making the kind of mistakes you’d expect given the overlap between a New Zealand “head” and American “hid”.

So far, so good: everyone quickly learned the New Zealand English vowels. To make sure that it wasn’t that they were learning to understand the one talker they’d been listening to, I tested half of my listeners on both American English and New Zealand English vowels spoken by a second, different talker. These folks I told where the talker they were listening to was from. And, sure enough, they transferred what they’d learned about New Zealand English to the new New Zealand speaker, while still correctly identifying vowels in American English.

The really interesting results here, though, are the ones that came from the second half of the listeners. This group I lied to. I know, I know, it wasn’t the nicest thing to do, but it was in the name of science and I did have the approval of my institutional review board (the group of people responsible for making sure we scientists aren’t doing anything unethical).

In an earlier experiment, I’d played only New Zealand English at this point, and when I told listeners the person they were listening to was from America, they’d completely changed the way they listened to those vowels: they labelled New Zealand English vowels as if they were from American English, even though they’d just learned the New Zealand English vowels. And that’s what I found this time, too. Listeners learned the New Zealand English vowels, but “undid” that learning if they thought the speaker was from the same dialect as them.

But what about when I played someone vowels from their own dialect, but told them the speaker was from somewhere else? In this situation, listeners ignored my lies. They didn’t apply the learning they’d just done. Instead, they correctly treated the vowels of their own dialect as if they were, in fact, from their dialect.

At first glance, this seems like something of a contradiction: I just said that listeners rely on social information about the person who’s talking, but at the same time they ignore that same social information.

So what’s going on?

I think there are two things underlying this difference. The first is the fact that vowels move. And the second is the fact that you’ve heard a heck of a lot more of your own dialect than one you’ve been listening to for fifteen minutes in a really weird training experiment.

So what do I mean when I say vowels move? Well, remember when I talked about formants above? These are areas of high acoustic energy that occur at certain frequency ranges within a vowel and they’re super important to human speech perception. But what doesn’t show up in the plot up there is that these aren’t just static across the course of the vowel–they move. You might have heard of “diphthongs” before: those are vowels where there’s a lot of formant movement over the course of the vowel.

And the way that vowels move is different between different dialects. You can see the differences in the way New Zealand and American English vowels move in the figure below. Sure, the formants are in different places—but even if you slid them around so that they overlapped, the shape of the movement would still be different.

formantDynamics

Comparison of how the New Zealand and American English vowels move. You can see that the shape of the movement for each vowel is really different between these two dialects.  

Ok, so the vowels are moving in different ways. But why are listeners doing different things between the two dialects?

Well, remember how I said earlier that you’ve heard a lot more of your own dialect than one you’ve been trained on for maybe five minutes? My hypothesis is that, for the vowels in your own dialect, you’re highly attuned to these movements. And when a scientist (me) comes along and tells you something that goes against your huge amount of experience with these shapes, even if you do believe them, you’re so used to automatically understanding these vowels that you can’t help but correctly identify them. BUT if you’ve only heard a little bit of a new dialect, you don’t have a strong idea of what these vowels should sound like, so you’re going to rely more on the other types of information available to you–like where you’re told the speaker is from–even if that information is incorrect.

So, to answer the question I posed in the title, can what you think you know about someone affect how you hear them? Yes… but only if you’re a little uncertain about what you heard in the first place, perhaps because it’s a dialect you’re unfamiliar with.


What does the National Endowment for the Humanities even do?

From the title, you might think this is a US-centric post. To a certain extent, it is. But I’m also going to be talking about topics that are more broadly of interest: what are some specific benefits of humanities research? And who should fund basic research? A lot has been written about these topics generally, so I’m going to be talking about linguistics and computational linguistics specifically.

This blog post came out of a really interesting conversation I had on Twitter the other day, sparked by this article on the potential complete elimination of both the National Endowment for the Humanities and the National Endowment for the Arts. During the course of the conversation, I realized that the person I was talking to (who was not a researcher, as far as I know) had some misconceptions about the role and reach of the NEH. So I thought it might be useful to talk about the role the NEH plays in my field, and has played in my own development as a researcher.

Curriculo

Oh this? Well, we don’t have funding to buy books anymore, so I put a picture of them in my office to remind myself they exist.

What does the NEH do?

I think the easiest way to answer this is to give you specific examples of projects that have been funded by the National Endowment for the Humanities, and talk about their individual impacts. Keep in mind that this is just the tip of the iceberg; I’m only going to talk about projects that have benefitted my work in particular, and not even all of those.

  • Builds language teaching resources. One of my earliest research experiences was as a research assistant for Jack Martin, working with the Koasati tribe in Louisiana on a project funded by the NEH. The bulk of the work I did that summer was on a talking dictionary of the Koasati language, which the community especially wanted both as a record of the language and to support Koasati language courses. I worked with speakers to record the words for the dictionary, then edited and transcribed the sound files to be put into the talking dictionary. In addition to creating an important resource for the community, I learned important research skills that led me towards my current work on language variation. And the dictionary? It’s available on-line.
  • Helps fight linguistic discrimination. One of my main research topics is linguistic bias in automatic speech recognition (you can see some of that work here and here). But linguistic bias doesn’t only happen with computers. It’s a particularly pernicious form of discrimination that’s a big problem in education as well. As someone who’s both from the South and an educator, for example, I have purposefully cultivated my ability to speak mainstream American English because I know that, fair or not, I’ll be taken less seriously the more southern I sound. The NEH is at the forefront of efforts to help fight linguistic discrimination.
  • Documents linguistic variation. This is a big one for my work, in particular: I draw on NEH-funded resources documenting linguistic variation in the United States in almost every research paper I write.

How does funding get allocated?

  • Which projects are funded is not decided by politicians. I didn’t realize this wasn’t common knowledge, but which projects get funded by federal funding agencies, including the NEH, NSF (which I’m currently being funded through) and NEA (National Endowment for the Arts), is not decided by politicians. This is a good thing–even the most accomplished politician can’t be expected to be an expert on everything from linguistics to history to architecture. You can see the breakdown of the process of allocating funding here.
  • Who looks at funding applications? Applications are peer reviewed, just like journal articles and other scholarly publications. The people looking at applications are top scholars in their field. This means that they have a really good idea of which projects are going to have the biggest long-term impact, and that they can ensure no one’s going to be reinventing the wheel.
  • How many projects are funded? All federal research funding is extremely competitive, with many more applications submitted than accepted. At the NEH, this means as few as 6% of applications to a specific grant program will be accepted. This isn’t just free money–you have to make a very compelling case to a panel of fellow scholars that your project is truly exceptional.
  • What criteria are used to evaluate projects? This varies from grant to grant, but for the documenting endangered languages grant (which is what my work with the Koasati tribe was funded through), the evaluation criteria include the following:
    • What is the potential for the proposed activity to
      1. Advance knowledge and understanding within its own field or across different fields (Intellectual Merit); and
      2. Benefit society or advance desired societal outcomes (Broader Impacts)?
    • To what extent do the proposed activities suggest and explore creative, original, or potentially transformative concepts?
    • Is the plan for carrying out the proposed activities well-reasoned, well-organized, and based on a sound rationale? Does the plan incorporate a mechanism to assess success?
    • How well qualified is the individual, team, or organization to conduct the proposed activities?
    • Are there adequate resources available to the PI (either at the home organization or through collaborations) to carry out the proposed activities?

Couldn’t this research be funded by businesses?

Sure, it could be. Nothing’s stopping companies from funding basic research in the humanities… but in my experience it’s not a priority, and they don’t. And that’s a real pity, because basic humanities research has a tendency of suddenly being vitally needed in other fields. Some examples from Natural Language Processing that have come up in just the last year:

  • Ethics: I’m currently taking what will probably be my last class in graduate school. It’s a seminar course, filled with a mix of NLP researchers, electrical engineers and computer scientists, and we’re all reading… ethics texts. There’s been a growing awareness in the NLP and machine learning communities that algorithmic design and data selection are leading to serious negative social impacts (see this paper for some details). Ethics is suddenly taking center stage, and without the work of scholars working in the humanities, we’d be working up from first principles.
  • Pragmatics: Pragmatics, or the study of how situational factors affect meaning, is one of the more esoteric sub-disciplines in linguistics–many linguistics departments don’t even teach it as a core course. But one of the keynotes at the 2016 Empirical Methods in Natural Language Processing conference was about it (in NLP, conferences are the premier publication venue, so that’s a pretty big deal). Why? Because dialog systems, also known as chatbots, are a major research area right now. And modelling things like what you believe the person you’re talking to already knows is going to be critical to making interacting with them more natural.
  • Discourse analysis: Speaking of chatbots, discourse analysis–or the analysis of the structure of conversations–is another area of humanities research that’s been applied to a lot of computational systems. There are currently over 6000 ACL publications that draw on the discourse analysis literature. And given the strong interest in chatbots right now, I can only see that number going up.

These are all areas of research we’d traditionally consider humanities that have directly benefited the NLP community, and in turn many of the products and services we use day to day. But it’s hard to imagine companies supporting the work of someone working in the humanities whose work might one day benefit their products. Research programs that may not have an immediate impact but end up being incredibly important down the line are exactly the type of long-term investment in knowledge that the NEH supports, and that really wouldn’t happen otherwise.

Why does it matter?

“Now Rachael,” you may be saying, “your work definitely counts as STEM (science, technology, engineering and math). Why do you care so much about some humanities funding going away?”

I hope the reasons that I’ve outlined above help to make the point that humanities research has long-ranging impacts and is a good investment. NEH funding was pivotal in my development as a researcher. I would not be where I am today without early research experience on projects funded by the NEH. And as a scholar working in multiple disciplines, I see how humanities research constantly enriches work in other fields, like engineering, which tend to be considered more desirable.

One final point: the National Endowment for the Humanities is, compared to other federal funding programs, very small indeed. In 2015 the federal government spent $146 million on the NEH, which was only 2% of the $7.1 billion Department of Defense research budget. In other words, if everyone in the US contributed equally to the federal budget, the NEH would cost us each less than fifty cents a year. I think that’s a fair price for all of the different on-going projects the NEH funds, don’t you?
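If you want to check my back-of-the-envelope math, here’s a minimal sketch in Python. The two budget figures are the ones quoted above; the population figure is my own rough assumption for 2015 (about 321 million people) and isn’t from the budget data, so treat the per-person number as approximate.

```python
# Budget figures quoted in the post (2015, US dollars).
neh_budget = 146e6
dod_research_budget = 7.1e9

# ASSUMPTION: rough 2015 US population, used only for illustration.
us_population = 321e6

# NEH spending as a fraction of the DoD research budget.
share_of_dod = neh_budget / dod_research_budget

# NEH spending split evenly across everyone in the US.
cost_per_person = neh_budget / us_population

print(f"NEH as a share of DoD research budget: {share_of_dod:.1%}")
print(f"NEH cost per person per year: ${cost_per_person:.2f}")
```

With those inputs the share comes out at roughly 2% and the per-person cost at under fifty cents, which is where the numbers above come from.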

agencies3b

The entire National Endowment for the Humanities & National Endowment for the Arts, as well as the National Park Service research budget, all fit in that tiny “other” slice at the very top.

 

Six Linguists of Color (who you can follow on Twitter!)

In the light of some recent white supremacist propaganda showing up on my campus, I’ve decided to spotlight a tiny bit of the amazing work being done around the country by linguists of color. Each of the scholars below is doing interesting, important linguistics research and has a Twitter account that I personally enjoy following. If you’re on this blog, you probably will as well! I’ll give you a quick intro to their research and, if it piques your interest, you can follow them on Twitter for all the latest updates.

(BTW, if you’re wondering why I haven’t included any grad students on this list, it’s because we generally don’t have as well-developed a research trajectory and I want this to be a useful resource for at least a few years.)

Anne Charity Hudley

Dr. Charity Hudley is a professor at the College of William and Mary (Go Tribe!). Her research focuses on language variation, especially the use of varieties such as African American English, in the classroom. If you know any teachers, they might find her two books on language variation in the classroom a useful resource. She and Christine Mallinson have even released an app to go with them!

Michel DeGraff

Dr. Michel DeGraff is a professor at MIT. His research is on Haitian Creole, and he’s been very active in advocating for the official recognition of Haitian Creole as a distinct language. If you’re not sure what Haitian Creole looks like, go check out his Twitter; many of his tweets are in the language! He’s also done some really cool work on using technology to teach low-resource languages.

Nelson Flores

Dr. Nelson Flores is a professor at the University of Pennsylvania. His work focuses on how we create the ideas of race and language, as well as bilingualism/multilingualism and bilingual education. I really enjoy his thought-provoking discussions of recent events on his Twitter account. He also runs a blog, which is a good resource for more in-depth discussion.

Nicole Holliday

Dr. Nicole Holliday is (at the moment) Chau Mellon Postdoctoral Scholar at Pomona College. Her research focuses on language use by biracial speakers. I saw her talk on how speakers use pitch differently depending on who they’re talking to at last year’s LSA meeting and it was fantastic: I’m really looking forward to seeing her future work! She’s also a contributor to Word., an online journal about African American English.

Rupal Patel

Dr. Rupal Patel is a professor at Northeastern University, and also the founder and CEO of VocaliD. Her research focuses on the speech of speakers with developmental disabilities, and how technology can ease communication for them. One really cool project she’s working on that you can get involved with is The Human Voicebank. This is a collection of voices from all over the world that is used to make custom synthetic voices for those who need them for day-to-day communication. If you’ve got a microphone and a quiet room you can help out by recording and donating your voice.

John R. Rickford

Last, but definitely not least, is Dr. John Rickford, a professor at Stanford. If you’ve taken any linguistics courses, you’re probably already familiar with his work. He’s one of the leading scholars working on African American English and was crucial in bringing research-based evidence to bear on the Ebonics controversy. If you’re interested, he’s also written a non-academic book on African American English that I would really highly recommend; it even won the American Book Award!

What’s a “bumpus”?

So I recently had a pretty disconcerting experience. It turns out that almost no one else has heard of a word that I thought was pretty common. And when I say “no one” I’m including dialectologists; it’s unattested in the Oxford English Dictionary and the Dictionary of American Regional English. Out of the twenty-two people who responded to my Twitter poll (who were probably mostly other linguists, given my social networks) only one other person said they’d even heard the word and, as I later confirmed, it turned out to be one of my college friends.

So what is this mysterious word that has so far evaded academic inquiry? Ladies, gentlemen and all others, please allow me to introduce you to…

bumpis

Pronounced ˈbʌm.pɪs or ˈbʌm.pəs. You can hear me say the word and use it in context by listening to this low quality recording.

The word means something like “fool” or “incompetent person”. To prove that this is actually a real word that people other than me use, I’ve (very, very laboriously) found some examples from the internet. It shows up in the comments section of this news article:

THAT is why people are voting for Mr Trump, even if he does act sometimes like a Bumpus.

I also found it in a smattering of public tweets like this one:

If you ever meet my dad, please ask him what a “bumpus” is

And this one:

Having seen horror of war, one would think, John McCain would run from war. No, he runs to war, to get us involved. What a bumpus.

And, my personal favorite, this one:

because the SUN(in that pic) is wearing GLASSES god karen ur such a bumpus

There’s also an Urban Dictionary entry which suggests the definition:

A raucous, boisterous person or thing (usually african-american.)

I’m a little sceptical about the last one, though. Partly because it doesn’t line up with my own intuitions (I feel like a bumpus is more likely to be silent than rowdy) and partly because less popular Urban Dictionary entries, especially for words that are also names, are super unreliable.

I also wrote to my parents (Hi mom! Hi dad!) and asked them if they’d used the word growing up, in what contexts, and who they’d learned it from. My dad confirmed that he’d heard it growing up (mom hadn’t) and had a suggestion for where it might have come from:

I am pretty sure my dad used it – invariably in one of the two phrases [“don’t be a bumpus” or “don’t stand there like a bumpus”]…. Bumpass, Virginia is in Louisa County …. Growing up in Norfolk, it could have held connotations of really rural Virginia, maybe, for Dad.

While this is definitely a possibility, I don’t know that it’s definitely the origin of the word. Bumpass, Virginia, like Bumpass Hell (see this review, which also includes the phrase “Don’t be a bumpass”), was named for an early settler. Interestingly, the college friend mentioned earlier is also from the Tidewater region of Virginia, which leads me to think that the word may have originated there.

My mom offered some other possible origins: that the term might be related to “country bumpkin” or “bump on a log”. I think the latter is especially interesting, given that “bump on a log” and “bumpus” show up in exactly the same phrase: standing/sitting there like a _______.

She also suggested it might be related to “bumpkis” or “bupkis”. This is a possibility, especially since that word is definitely from Yiddish and Norfolk, VA does have a history of Jewish settlement and Yiddish speakers.

The most common usage of “Bumpus” seems to be in phrases like “Bumpus dog” or “Bumpus hound”. I think that this is probably actually a different use, though, and a direct reference to a scene from the movie A Christmas Story.

One final note is that there was a baseball pitcher in the late 1890s who went by the nickname “Bumpus”: Bumpus Jones. While I can’t find any information about where the nickname came from, this post suggests that his family was from Virginia and that he had Powhatan ancestry.

I’m really interested in learning more about this word and its distribution. My intuition is that it’s mainly used by older, white speakers in the South, possibly centered around the Tidewater region of Virginia.

If you’ve heard of or used this word, please leave a comment or drop me a line letting me know 1) roughly how old you are, 2) where you grew up and 3) (if you can remember) where you learned it. Feel free to add any other information you feel might be relevant, too!

 

Google’s speech recognition has a gender bias

In my last post, I looked at how Google’s automatic speech recognition worked with different dialects. To get this data, I hand-checked annotations for more than 1500 words from fifty different accent tag videos.

Now, because I’m a sociolinguist and I know that it’s important to stratify your samples, I made sure I had an equal number of male and female speakers for each dialect. And when I compared performance on male and female talkers, I found something deeply disturbing: YouTube’s auto captions consistently performed better on male voices than female voices (t(47) = -2.7, p < 0.01). (You can see my data and analysis here.)

accuarcyByGender

On average, for each female speaker less than half (47%) of her words were captioned correctly. The average male speaker, on the other hand, was captioned correctly 60% of the time.

It’s not that there’s a consistent but small effect size, either: a difference of 13 percentage points is a pretty big effect. The Cohen’s d was 0.7, which means, in non-math-speak, that if you pick a random man and a random woman from my sample, there’s an almost 70% chance the transcriptions will be more accurate for the man. That’s pretty striking.
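For the curious, here’s a minimal sketch of how you can get from a Cohen’s d to that “chance a random man beats a random woman” number. This is the standard conversion sometimes called the probability of superiority, and it assumes both groups are roughly normally distributed with equal variance, which is an assumption on my part rather than something I’ve checked here. The function names are just ones I made up for illustration.

```python
from math import erf, sqrt


def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def probability_of_superiority(d: float) -> float:
    """Chance that a random draw from the higher-scoring group beats a
    random draw from the other group, given Cohen's d.

    ASSUMPTION: both groups are normal with equal variance.
    """
    return normal_cdf(d / sqrt(2.0))


cohens_d = 0.7  # effect size reported in the post
p = probability_of_superiority(cohens_d)
print(f"Probability of superiority: {p:.2f}")
```

With d = 0.7 this works out to about 0.69, which is the “almost 70%” above.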

What it is not, unfortunately, is shocking. There’s a long history of speech recognition technology performing better for men than women.

This is a real problem with real impacts on people’s lives. Sure, a few incorrect YouTube captions aren’t a matter of life and death. But some of these applications have a lot higher stakes. Take the medical dictation software study. The fact that men enjoy better performance than women with these technologies means that it’s harder for women to do their jobs. Even if it only takes a second to correct an error, those seconds add up over the days and weeks to a major time sink, time your male colleagues aren’t wasting messing with technology. And that’s not even touching on the safety implications of voice recognition in cars.

 

So where is this imbalance coming from? First, let me make one thing clear: the problem is not with how women talk. The suggestion that, for example, “women could be taught to speak louder, and direct their voices towards the microphone” is ridiculous. In fact, women use speech strategies that should make it easier for voice recognition technology to work on women’s voices. Women tend to be more intelligible (for people without high-frequency hearing loss), and to talk slightly more slowly. In general, women also favor more standard forms and make less use of stigmatized variants. Women’s vowels, in particular, lend themselves to classification: women produce longer vowels which are more distinct from each other than men’s are. (Edit 7/28/2016: I have since found two papers by Sharon Goldwater, Dan Jurafsky and Christopher D. Manning where they found better performance for women than men–due to the above factors and different rates of filler words like “um” and “uh”.) One thing that may be making a difference is that women also tend not to be as loud, partly as a function of just being smaller, and cepstral coefficients (the fancy math thing that’s under the hood of most automatic voice recognition) are sensitive to differences in intensity. None of this means that women’s voices are more difficult; I’ve trained classifiers on speech data from women and they worked just fine, thank you very much. What it does mean is that women’s voices are different from men’s voices, though, so a system designed around men’s voices just won’t work as well for women’s.

Which leads right into where I think this bias is coming from: unbalanced training sets. Like car crash dummies, voice recognition systems were designed for (and largely by) men. Over two thirds of the authors in the Association for Computational Linguistics Anthology Network are male, for example. Which is not to say that there aren’t truly excellent female researchers working in speech technology (Mari Ostendorf and Gina-Anne Levow here at the UW and Karen Livescu at TTI-Chicago spring immediately to mind) but they’re outnumbered. And that imbalance seems to extend to the training sets, the annotated speech that’s used to teach automatic speech recognition systems what things should sound like. Voxforge, for example, is a popular open source speech dataset that “suffers from major gender and per speaker duration imbalances.” I had to get that info from another paper, since Voxforge doesn’t have speaker demographics available on their website. And it’s not the only popular corpus that doesn’t include speaker demographics: neither does the AMI meeting corpus, nor the Numbers corpus. And when I could find the numbers, they weren’t balanced for gender. TIMIT, which is the single most popular speech corpus in the Linguistic Data Consortium, is just over 69% male. I don’t know what speech database the Google speech recognizer is trained on, but based on the speech recognition rates by gender I’m willing to bet that it’s not balanced for gender either.

Why does this matter? It matters because there are systematic differences between men’s and women’s speech. (I’m not going to touch on the speech of other genders here, since that’s a very young research area. If you’re interested, the Journal of Language and Sexuality is a good jumping-off point.) And machine learning works by making computers really good at dealing with things they’ve already seen a lot of. If they get a lot of speech from men, they’ll be really good at identifying speech from men. If they don’t get a lot of speech from women, they won’t be that good at identifying speech from women. And it looks like that’s the case. Based on my data from fifty different speakers, Google’s speech recognition (which, if you remember, is probably the best-performing proprietary automatic speech recognition system on the market) just doesn’t work as well for women as it does for men.

The problem with the grammar police

I’ll admit it: I used to be a die-hard grammar corrector. I practically stalked around conversations with a red pen, ready to jump out and shout “gotcha!” if someone ended a sentence with a preposition or split an infinitive or said “irregardless”. But I’ve done a lot of learning and growing since then and, looking back, I’m kind of ashamed. The truth is, when I used to correct people’s grammar, I wasn’t trying to help them. I was trying to make myself look like a language authority, but in doing so I was actually hurting people. Ironically, I only realized this after years of specialized training to become an actual authority on language.

Chicago police officer on segway

I’ll let you go with a warning this time, but if I catch you using “less” for “fewer” again, I’ll have to give you a ticket.

But what do I mean when I say I was hurting people? Well, like some other types of policing, the grammar police don’t target everyone equally. For example, there has been a lot of criticism of Rihanna’s language use in her new single “Work” being thrown around recently. But the fact is that her language is perfectly fine. She’s just using Jamaican Patois, which most American English speakers aren’t familiar with. People claiming that the language use in “Work” is wrong is sort of similar to American English speakers complaining that Nederhop group ChildsPlay’s language use is wrong. It’s not wrong at all, it’s just different.

And there’s the problem. The fact is that grammar policing isn’t targeting speech errors, it’s targeting differences that are, for many people, perfectly fine. And, overwhelmingly, the people who make “errors” are marginalized in other ways. Here are some examples to show you what I mean:

  • Misusing “ironic”: A lot of the lists of “common grammar errors” you see will include a lot of words where the “correct” use is actually less common than other ways the word is used. Take “ironic”. In general use it can mean surprising or remarkable. If you’re a literary theorist, however, irony has a specific technical meaning–and if you’re not a literary theorist you’re going to need to take a course on it to really get what irony’s about. The only people, then, who are going to use this word “correctly” will be those who are highly educated. And, let’s be real, you know what someone means when they say ironic and isn’t that the point?
  • Overusing words like “just”: This error is apparently so egregious that there’s an e-mail plug-in, targeted mainly at women, to help avoid it. However, as other linguists have pointed out, not only is there limited evidence that women say “just” more than men, but even if there were a difference why would the assumption be that women were overusing “just”? Couldn’t it be that men aren’t using it enough?
  • Double negatives: Also called negative concord, this “error” happens when multiple negatives are used in a sentence, as in, “There isn’t nothing wrong with my language.” This particular construction is perfectly natural and correct in a lot of dialects of American English, including African American English and Southern English, not to mention the standard in some other languages, including French.

In each of these cases, the “error” in question is one that’s produced more by certain groups of people. And those groups of people–less educated individuals, women, African Americans–face disadvantages in other aspects of their life too. This isn’t a mistake or coincidence. When we talk about certain ways of talking, we’re talking about certain types of people. And almost always we’re talking about people who already have the deck stacked against them.

Think about this: why don’t American English speakers point out whenever the Queen of England says things differently? For instance, she often fails to produce the “r” sound in words like “father”, which is definitely not standardized American English. But we don’t talk about how the Queen is “talking lazy” or “dropping letters” like we do about, for instance, “th” being produced as “d” in African American English. They’re both perfectly regular, logical language varieties that differ from standardized American English…but only one group gets flak for it.

Now I’m not arguing that language errors don’t exist, since they clearly do. If you’ve ever accidentally said a spoonerism or suffered from a tip of the tongue moment then you know what it feels like when your language system breaks down for a second. But here’s a fundamental truth of linguistics: barring a condition like aphasia, a native speaker of a language uses their language correctly. And I think it’s important for us all to examine exactly why it is that we’ve been led to believe otherwise…and who it is that we’re being told is wrong.

 

Does reading a story affect the way you talk afterwards? (Or: do linguistic tasks have carryover effects?)

So tomorrow is my generals exam (the title’s a bit misleading: I’m actually going to be presenting research I’ve done so my committee can decide if I’m ready to start work on my dissertation–fingers crossed!). I thought it might be interesting to discuss some of the research I’m going to be presenting in a less formal setting first, though. It’s not at the same level of general interest as the Twitter research I discussed a couple weeks ago, but it’s still kind of a cool project. (If I do say so myself.)


Shhhh. I’m listening to linguistic data. “Plush bunny with headphones”. Licensed under Public Domain via Wikimedia Commons.

Basically, I wanted to know whether there are carryover effects for some of the most commonly used linguistics tasks. A carryover effect is when you do something and whatever it was you were doing continues to affect you after you’re done. This comes up a lot when you want to test multiple things on the same person.

An example might help here. So let’s say you’re testing two new malaria treatments to see which one works best. You find some malaria patients, they agree to be in your study, and you give them treatment A and record their results. Afterwards, you give them treatment B and again record their results. But if it turns out that treatment A cures malaria (yay!) it’s going to look like treatment B isn’t doing anything, even if it is helpful, because everyone’s been cured of malaria. So their behavior in the second condition (treatment B) is affected by their participation in the first condition (treatment A): the effects of treatment A have carried over.

There are a couple of ways around this. The easiest one is to split your group of participants in half and give half of them A first and half of them B first. However, a lot of times when people are using multiple linguistic tasks in the same experiment, they won’t do that. Why? Because one of the things that linguists–especially sociolinguists–want to control for is speech style. And there’s a popular idea in sociolinguistics that you can make someone talk more formally, but it’s really hard to make them talk less formally. So you tend to end up with a fixed task order going from informal tasks to more formal tasks.

So, we have two separate ideas here:

  • The idea that one task can affect the next, and so we need to change task order to control for that
  • The idea that you can only go from less formal speech to more formal speech, so you need to not change task order to control for that
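
The first of these ideas, balancing task order, is easy to sketch in code. This is just a minimal illustration in Python, not the procedure from any particular study: the participant IDs, the function name `counterbalance` and the fixed random seed are all assumptions made up for the example.

```python
import random

def counterbalance(participants, seed=0):
    """Randomly assign half of the participants to each task order."""
    rng = random.Random(seed)      # fixed seed so the assignment can be reproduced
    shuffled = list(participants)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    orders = {}
    for person in shuffled[:half]:
        orders[person] = ("A", "B")  # task A first, then task B
    for person in shuffled[half:]:
        orders[person] = ("B", "A")  # task B first, then task A
    return orders

# Hypothetical participant IDs, for illustration only
participants = [f"P{i:02d}" for i in range(1, 9)]
orders = counterbalance(participants)
for person in participants:
    print(person, orders[person])
```

With an even number of participants, each order is used equally often, so any carryover from A to B is balanced by carryover from B to A.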

So what’s a poor linguist to do? Balance task order to prevent carryover effects but risk not getting the informal speech they’re interested in? Or keep task order fixed to get informal and formal speech but at the risk of carryover effects? Part of the problem is that, even though they’re really well-studied in other fields like psychology, sociology or medicine, carryover effects haven’t really been studied in linguistics before. As a result, we don’t know how bad they are–or aren’t!

Which is where my research comes in. I wanted to see if there were carryover effects and what they might look like. To do this, I had people come into the lab and do a memory game that involved saying the names of weird-looking things called Fribbles aloud. No, not the milkshakes, one of the little purple guys below (although I could definitely go for a milkshake right now). Then I had them do one linguistic elicitation task (reading a passage, doing an interview, reading a list of words or, to control for the effects of just sitting there for a bit, an arithmetic task). Then I had them repeat the Fribble game. Finally, I compared a bunch of measures from speech I recorded during the two Fribble games to see if there were any differences.

Greeble designed by Scott Yu and hosted by the Tarr Lab wiki (click for link).


What did I find? Well, first, I found the same thing a lot of other people have found: people tend to talk differently while doing different things. (If I hadn’t found that, then it would be pretty good evidence that I’d done something wrong when designing my experiment.) But the really exciting thing is that I found, for some specific measures, there weren’t any carryover effects. I didn’t find any carryover effects for speech speed, loudness or any changes in pitch. So if you’re looking at those things you can safely reorder your experiments to help avoid other effects, like fatigue.

But I did find that something a little more interesting was happening with the way people were saying their vowels. I’m not 100% sure what’s going on with that yet. The Fribble names were funny made-up words (like “Kack” and “Dut”) and I’m a little worried that what I’m seeing may be a result of that weirdness… I need to do some more experiments to be sure.

Still, it’s pretty exciting to find that there are some things it looks like you don’t need to worry about carryover effects for. That means that, for those things, you can have a static order to maintain the style continuum and it doesn’t matter. Or, if you’re worried that people might change what they’re doing as they get bored or tired, you can switch the order around to avoid having that affect your data.

Tweeting with an accent

I’m writing this blog post from a cute little tea shop in Victoria, BC. I’m up here to present at the Northwest Linguistics Conference, which is a yearly conference for both Canadian and American linguists (yes, I know Canadians are Americans too, but United Statesian sounds weird), and I thought that my research project might be interesting to non-linguists as well. Basically, I investigated whether it’s possible for Twitter users to “type with an accent”. Can linguists use variant spellings in Twitter data to look at the same sort of sound patterns we see in different speech communities?


Picture of a bird saying “Let’s Tawk”. Taken from the website of the Center for the Psychology of Women in Seattle. Click for link.

So if you’ve been following the Great Ideas in Linguistics series, you’ll remember that I wrote about sociolinguistic variables a while ago. If you didn’t, sociolinguistic variables are sounds, words or grammatical structures that are used by specific social groups. So, for example, in Southern American English (representing!) the vowel in “I” is produced as a single sound, so it’s more like “ah”.

Now, in speech these sociolinguistic variables are very well studied. In fact, the Dictionary of American Regional English was just finished in 2013 after over fifty years of work. But in computer mediated communication–which is the fancy term for internet language–they haven’t been really well studied. In fact, some scholars suggested that it might not be possible to study speech sounds using written data. And on the surface of it, that does make sense. Why would you expect to be able to get information about speech sounds from a written medium? I mean, look at my attempt to explain an accent feature in the last paragraph. It would be far easier to get my point across using a sound file. That said, I’d noticed in my own internet usage that people were using variant spellings, like “tawk” for “talk”, and I had a hunch that they were using variant spellings in the same way they use different dialect sounds in speech.

While hunches have their place in science, they do need to be verified empirically before they can be taken seriously. And so before I submitted my abstract, let alone gave my talk, I needed to see if I was right. Were Twitter users using variant spellings in the same way that speakers use different sound patterns? And if they were, does that mean that we can investigate sound patterns using Twitter data?

Since I’m going to present my findings at a conference and am writing this blog post, you can probably deduce that I was right, and that this is indeed the case. How did I show this? Well, first I picked a really well-studied sociolinguistic variable called the low back merger. If you don’t have the merger (most African American speakers and speakers in the South don’t) then you’ll hear a strong difference between the words “cot” and “caught” or “god” and “gaud”. Or, to use the example above, you might have a difference between the words “talk” and “tock”. “Talk” is a little more backed and rounded, so it sounds a little more like “tawk”, which is why it’s sometimes spelled that way. I used the Twitter public API and found a bunch of tweets that used the “aw” spelling of common words and then looked to see if there were other variant spellings in those tweets. And there were. Furthermore, the other variant spellings used in tweets also showed features of Southern American English or African American English. Just to make sure, I then looked to see if people were doing the same thing with variant spellings of sociolinguistic variables associated with Scottish English, and they were. (If you’re interested in the nitty-gritty details, my slides are here.)
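
If you’re curious what “looking for other variant spellings” might look like in practice, here’s a rough sketch in Python. To be clear, this is not the code or the word lists from the study: the tweets are invented, the two sets of spellings (`AW_SPELLINGS` and `OTHER_VARIANTS`) are made-up stand-ins, and nothing here talks to the Twitter API. It just compares how often other variant spellings turn up in tweets with an “aw” spelling versus tweets without one.

```python
import re

# Hypothetical word lists, invented for this sketch (not the lists used in the study)
AW_SPELLINGS = {"tawk", "dawg", "thawt"}
OTHER_VARIANTS = {"dat", "dis", "yall"}

def tokenize(tweet):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", tweet.lower())

def variant_rate(tweets):
    """Share of tweets that contain at least one word from OTHER_VARIANTS."""
    if not tweets:
        return 0.0
    hits = sum(1 for t in tweets if OTHER_VARIANTS & set(tokenize(t)))
    return hits / len(tweets)

# Invented example tweets, for illustration only
tweets = [
    "we need to tawk about dat game",
    "my dawg is the best",
    "yall ever just tawk to yourself",
    "going to the store later",
    "nice weather today",
    "dis traffic is terrible",
]

# Split the tweets by whether they contain an "aw" spelling
with_aw = [t for t in tweets if AW_SPELLINGS & set(tokenize(t))]
without_aw = [t for t in tweets if not (AW_SPELLINGS & set(tokenize(t)))]

print("with 'aw' spelling:   ", variant_rate(with_aw))
print("without 'aw' spelling:", variant_rate(without_aw))
```

If variant spellings really do pattern together the way dialect features do in speech, you’d expect the first rate to be higher than the second on real data.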

Ok, so people will sometimes spell things differently on Twitter based on their spoken language dialect. What’s the big deal? Well, for linguists this is pretty exciting. There’s a lot of language data available on Twitter and my research suggests that we can use it to look at variation in sound patterns. If you’re a researcher looking at sound patterns, that’s pretty sweet: you can stay home in your jammies and use Twitter data to verify findings from your field work. But what if you’re not a language researcher? Well, if we can identify someone’s dialect features from their tweets then we can also use those features to make a pretty good guess about their demographic information, which isn’t always available (another problem for sociolinguists working with internet data). And if, say, you’re trying to sell someone hunting rifles, then it’s pretty helpful to know that they live in a place where they aren’t illegal. It’s early days yet, and I’m nowhere near that stage, but it’s pretty exciting to think that it could happen at some point down the line.

So the big takeaway is that, yes, people can tweet with an accent, and yes, linguists can use Twitter data to investigate speech sounds. Not all of them–a lot of people aren’t aware of many of their dialect features and thus won’t spell them any differently–but it’s certainly an interesting area for further research.

“Men” vs. “Females” and sexist writing

So, I have a confession to make. I actually set out to write a completely different blog post. In searching Wikimedia Commons for a picture, though, I came across something that struck me as odd. I was looking for pictures of people writing, and I noticed that there were two gendered sub-categories, one for men and one for women. Leaving aside the question of having only two genders, what really stuck out to me were the names. The category with pictures of men was called “Men Writing” and the category with pictures of women was called “Females Writing”.

Family 3

According to this sign, the third most common gender is “child”.

So why did that bother me? It is true that male humans are men and that women are female humans. Sure, a writing professor might nag about how the two terms lack parallelism, but does it really matter?

The thing is, it wouldn’t matter if this was just a one-off thing. But it’s not. Let’s look at the Category: Males and Category: Females*. At the top of the category page for men, it states “This category is about males in general. For human males, see Category:Male humans”. And the male humans category is, conveniently, the first subcategory. Which is fine, no problem there. BUT. There is no equivalent disclaimer at the top of Category: Females, and the first subcategory is not female humans but female animals. So even though “Females” is used to refer specifically to female humans when talking about writing, when talking about females in general it looks as if at least one editor has decided that it’s more relevant for referring to female animals. And that also gels with my own intuitions. I’m more likely to ask “How many females?” when looking at a bunch of baby chickens than I am when looking at a bunch of baby humans. Assuming the editors responsible for these distinctions are also native English speakers, their intuitions are probably very similar.

So what? Well, it makes me uncomfortable to be referred to with a term that is primarily used for non-human animals while men are referred to with a term that I associate with humans. (Or, perhaps, women are being referred to as “female men”, but that’s equally odd and exclusionary.)

It took me a while to come to that conclusion. I felt that there was something off about the terminology, but I had to turn and talk it over with my officemate for a couple minutes before finally getting at the kernel of the problem. And I don’t think it’s a conscious choice on the part of the editors–it’s probably something they don’t even realize they’re doing. But I definitely do think that it’s related to the gender imbalance of the editors of Wikimedia. According to recent statistics, over ninety percent (!) of Wikipedia editors are male. And this type of sexist language use probably perpetuates that imbalance. If I feel, even if it’s for reasons that I have a hard time articulating, that I’m not welcome in a community then I’m less likely to join it. And that’s not just me. Students who are presented with job descriptions in language that doesn’t match their gender are less likely to be interested in those jobs. Women are less likely to respond to job postings if “he” is used to refer to both men and women. I could go on citing other studies, but we could end up being here all day.

My point is this: sexist language affects the behaviour and choices of those who hear it. And in this case, it makes me less likely to participate in this on-line community because I don’t feel as if I would be welcomed and respected there. It’s not only Wikipedia/Wikimedia, either. This particular usage pattern is also something I associate with Reddit (a good discussion here). The gender breakdown of Reddit? About 70% male.

For some reason, the idea that we should avoid sexist language usage seems to really bother people. I was once a TA for a large lecture class where, in the middle of discussions of the effects of sexist language, a male student interrupted the professor to say that he didn’t think it was a problem. I’ve since thought about it quite a bit (it was pretty jarring) and I’ve come to the conclusion that the reason the student felt that way is that, for him, it really wasn’t a problem. Since sexist language is almost always exclusionary to women, and he was not a woman, he had not felt that moment of discomfort before.

Further, I think he may have felt that, because this type of language tends to benefit men, we were blaming him. I want to be clear here: I’m not blaming anyone for their unconscious biases. And I’m not saying that only men use sexist language. The Wikimedia editors who made this choice may very well have been women. What I am saying is that we need to be aware of these biases and strive to correct them. It’s hard, and it takes constant vigilance, but it’s an important and relatively simple step that we can all take in order to help eliminate sexism.

*As they were on Wednesday, April 8, 2015. If they’ve been changed, I’d recommend the Wayback Machine.

Great ideas in linguistics: Sociolinguistics

I’ll be the first to admit: for a long time, even after I’d begun my linguistics training, I didn’t really understand what sociolinguistics was. I had the idea that it mainly had to do with discourse analysis, which is certainly a fascinating area of study, but I wasn’t sure it was enough to serve as the basis for a major discipline of linguistics. Fortunately, I’ve learned a great deal about sociolinguistics since that time.

Sociolinguistics is the sub-field of linguistics that studies language in its social context and derives explanatory principles from it. By knowing about the language, we can learn something about a social reality and vice versa.

Now, at first glance this may seem so intuitive that it’s odd someone would go to the trouble of stating it directly. As social beings, we know that the behaviour of people around us is informed by their identities and affiliations. At the extreme of things it can be things like having a cultural rule that literally forbids speaking to your mother-in-law, or requires replacing the letters “ck” with “cc” in all written communication. But there are more subtle rules in place as well, rules which are just as categorical and predictable and important. And if you don’t look at what’s happening with the social situation surrounding those linguistic rules, you’re going to miss out on a lot.

Case in point: Occasionally you’ll hear phonologists talk about sound changes being in free variation, or rules that are randomly applied. BUT if you look at the social facts of the community, you’ll often find that there is no randomness at all. Instead, there are underlying social factors that control which option a person chooses as they’re speaking. For example, if you were looking at whether people in Montreal were making r-sounds with the front or back of the tongue and you just sampled a bunch of them you might find that some people made it one way most of the time and others made it the other way most of the time. Which is interesting, sure, but doesn’t have a lot of explanatory power.

However, if you also looked at the social factors associated with it, and the characteristics of the individuals who used each r-sound, you might notice something interesting, as Clermont and Cedergren did (see the illustration). They found that younger speakers preferred the back-of-the-mouth r-sound, while older people tended to use the tip of the tongue instead. And that has a lot more explanatory power. Now we can start asking questions to get at the forces underlying that pattern: Is this the way the younger people have always talked, i.e. some sort of established youthful style, or is there a language change going on and the newer form is going to slowly take over? What causes younger speakers to use the form they do? Is there also an effect of gender, or who you hang out with?

changes

Figure one from Sankoff and Blondeau. 2007. (Click picture to look at the whole study.) As you can see, younger speakers are using [R] more than older speakers, and the younger a speaker is the more likely they are to use [R].
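
To make the “add a social factor and the pattern pops out” idea a bit more concrete, here’s a tiny Python sketch. The tokens below are completely invented for illustration (they are not data from Sankoff and Blondeau or from Clermont and Cedergren), and the age cut-offs and function names are arbitrary assumptions. It tallies how often the back-of-the-mouth r-sound is used, first overall and then by age group.

```python
from collections import defaultdict

# Invented tokens: (speaker's age, which r-sound was used). Not real data.
tokens = [
    (19, "back"), (22, "back"), (24, "front"),
    (31, "back"), (38, "front"), (45, "front"),
    (52, "back"), (58, "front"), (63, "front"), (70, "front"),
]

def age_group(age):
    """Bin an age into one of three arbitrary groups."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50 and over"

def back_r_share(tokens):
    """Proportion of back r-sounds within each age group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [back count, total count]
    for age, variant in tokens:
        group = age_group(age)
        counts[group][1] += 1
        if variant == "back":
            counts[group][0] += 1
    return {group: back / total for group, (back, total) in counts.items()}

# Without the social factor: one overall proportion
overall = sum(1 for _, variant in tokens if variant == "back") / len(tokens)
print("overall:", overall)

# With the social factor: one proportion per age group
for group, share in back_r_share(tokens).items():
    print(group, round(share, 2))
```

The overall number on its own doesn’t tell you much, but once you split by age group you can see whether there’s a pattern worth asking questions about.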

And that’s why sociolinguistics is all kinds of awesome. It lets us peel away and reveal some of the complexity surrounding language. By adding sociological data to our studies, we can help to reduce statistical noise and reveal new and interesting things about how language works, what it means to be a language-user, and why we do what we do.