Does reading a story affect the way you talk afterwards? (Or: do linguistic tasks have carryover effects?)

So tomorrow is my generals exam (the title’s a bit misleading: I’m actually going to be presenting research I’ve done so my committee can decide if I’m ready to start work on my dissertation–fingers crossed!). I thought it might be interesting to discuss some of the research I’m going to be presenting in a less formal setting first, though. It’s not at the same level of general interest as the Twitter research I discussed a couple weeks ago, but it’s still kind of a cool project. (If I do say so myself.)

Shhhh. I’m listening to linguistic data. “Plush bunny with headphones”. Licensed under Public Domain via Wikimedia Commons.

Basically, I wanted to know whether there are carryover effects for some of the most commonly used linguistics tasks. A carryover effect is when you do something and whatever it was you were doing continues to affect you after you’re done. This comes up a lot when you want to test multiple things on the same person.

An example might help here. So let’s say you’re testing two new malaria treatments to see which one works best. You find some malaria patients, they agree to be in your study, and you give them treatment A and record their results. Afterwards, you give them treatment B and again record their results. But if it turns out that treatment A cures malaria (yay!) it’s going to look like treatment B isn’t doing anything, even if it is helpful, because everyone’s been cured of malaria. So their behavior in the second condition (treatment B) is affected by their participation in the first condition (treatment A): the effects of treatment A have carried over.

There are a couple of ways around this. The easiest one is to split your group of participants in half and give half of them A first and half of them B first. However, a lot of times when people are using multiple linguistic tasks in the same experiment, they won’t do that. Why? Because one of the things that linguists–especially sociolinguists–want to control for is speech style. And there’s a popular idea in sociolinguistics that you can make someone talk more formally, but it’s really hard to make them talk less formally. So you tend to end up with a fixed task order going from informal tasks to more formal tasks.
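If you’re curious what “split your participants in half” looks like in practice, here’s a minimal sketch in Python. The participant IDs, the `counterbalance` function and the fixed random seed are all made up for illustration; this isn’t the procedure from any particular study, just one simple way to do it.

```python
import random


def counterbalance(participants, orders, seed=0):
    """Assign participants to task orders as evenly as possible."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Deal the shuffled participants round-robin across the possible orders.
    return {p: orders[i % len(orders)] for i, p in enumerate(shuffled)}


# Hypothetical participant IDs and the two possible treatment orders.
participants = [f"P{n:02d}" for n in range(1, 9)]
orders = [("A", "B"), ("B", "A")]
assignment = counterbalance(participants, orders)

for participant in sorted(assignment):
    print(participant, assignment[participant])
```

With eight participants and two orders, four people end up getting A first and four get B first. Shuffling before dealing keeps things like sign-up order from lining up with task order.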

So, we have two separate ideas here:

  • The idea that one task can affect the next, and so we need to change task order to control for that
  • The idea that you can only go from less formal speech to more formal speech, so you need to not change task order to control for that

So what’s a poor linguist to do? Balance task order to prevent carryover effects but risk not getting the informal speech they’re interested in? Or keep task order fixed to get informal and formal speech but at the risk of carryover effects? Part of the problem is that, even though they’re really well-studied in other fields like psychology, sociology or medicine, carryover effects haven’t really been studied in linguistics before. As a result, we don’t know how bad they are–or aren’t!

Which is where my research comes in. I wanted to see if there were carryover effects and what they might look like. To do this, I had people come into the lab and do a memory game that involved saying the names of weird-looking things called Fribbles aloud. No, not the milkshakes, one of the little purple guys below (although I could definitely go for a milkshake right now). Then I had them do one linguistic elicitation task (reading a passage, doing an interview, reading a list of words or, to control for the effects of just sitting there for a bit, an arithmetic task). Then I had them repeat the Fribble game. Finally, I compared a bunch of measures from speech I recorded during the two Fribble games to see if there were any differences.

Greeble designed by Scott Yu and hosted by the Tarr Lab wiki (click for link).

What did I find? Well, first, I found the same thing a lot of other people have found: people tend to talk differently while doing different things. (If I hadn’t found that, then it would be pretty good evidence that I’d done something wrong when designing my experiment.) But the really exciting thing is that I found, for some specific measures, there weren’t any carryover effects. I didn’t find any carryover effects for speech speed, loudness or any changes in pitch. So if you’re looking at those things you can safely reorder your experiments to help avoid other effects, like fatigue.

But I did find that something a little more interesting was happening with the way people were saying their vowels. I’m not 100% sure what’s going on with that yet. The Fribble names were funny made-up words (like “Kack” and “Dut”) and I’m a little worried that what I’m seeing may be a result of that weirdness… I need to do some more experiments to be sure.

Still, it’s pretty exciting to find that there are some things it looks like you don’t need to worry about carryover effects for. That means that, for those things, you can have a static order to maintain the style continuum and it doesn’t matter. Or, if you’re worried that people might change what they’re doing as they get bored or tired, you can switch the order around to avoid having that affect your data.

Tweeting with an accent

I’m writing this blog post from a cute little tea shop in Victoria, BC. I’m up here to present at the Northwest Linguistics Conference, which is a yearly conference for both Canadian and American linguists (yes, I know Canadians are Americans too, but United Statesian sounds weird), and I thought that my research project might be interesting to non-linguists as well. Basically, I investigated whether it’s possible for Twitter users to “type with an accent”. Can linguists use variant spellings in Twitter data to look at the same sort of sound patterns we see in different speech communities?

Picture of a bird saying “Let’s Tawk”. Taken from the website of the Center for the Psychology of Women in Seattle. Click for link.

So if you’ve been following the Great Ideas in Linguistics series, you’ll remember that I wrote about sociolinguistic variables a while ago. If you didn’t, sociolinguistic variables are sounds, words or grammatical structures that are used by specific social groups. So, for example, in Southern American English (representing!) the sound in “I” is produced with only one sound, so it’s more like “ah”.

Now, in speech these sociolinguistic variables are very well studied. In fact, the Dictionary of American Regional English was just finished in 2013 after over fifty years of work. But in computer mediated communication–which is the fancy term for internet language–they haven’t been really well studied. In fact, some scholars suggested that it might not be possible to study speech sounds using written data. And on the surface of it, that does make sense. Why would you expect to be able to get information about speech sounds from a written medium? I mean, look at my attempt to explain an accent feature in the last paragraph. It would be far easier to get my point across using a sound file. That said, I’d noticed in my own internet usage that people were using variant spellings, like “tawk” for “talk”, and I had a hunch that they were using variant spellings in the same way they use different dialect sounds in speech.

While hunches have their place in science, they do need to be verified empirically before they can be taken seriously. And so before I submitted my abstract, let alone gave my talk, I needed to see if I was right. Were Twitter users using variant spellings in the same way that speakers use different sound patterns? And if they were, does that mean that we can investigate sound patterns using Twitter data?

Since I’m going to present my findings at a conference and am writing this blog post, you can probably deduce that I was right, and that this is indeed the case. How did I show this? Well, first I picked a really well-studied sociolinguistic variable called the low back merger. If you don’t have the merger (most African American speakers and speakers in the South don’t) then you’ll hear a strong difference between the words “cot” and “caught” or “god” and “gaud”. Or, to use the example above, you might have a difference between the words “talk” and “tock”. “Talk” is a little more backed and rounded, so it sounds a little more like “tawk”, which is why it’s sometimes spelled that way. I used the Twitter public API and found a bunch of tweets that used the “aw” spelling of common words and then looked to see if there were other variant spellings in those tweets. And there were. Furthermore, the other variant spellings used in tweets also showed features of Southern American English or African American English. Just to make sure, I then looked to see if people were doing the same thing with variant spellings of sociolinguistic variables associated with Scottish English, and they were. (If you’re interested in the nitty-gritty details, my slides are here.)
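For the programming-inclined, here’s a rough sketch of the “find the aw tweets, then look for other variant spellings” step. To be very clear about what’s made up: the word lists, the example tweets and the function names are all invented for illustration, and the sketch works on a plain list of strings rather than calling the Twitter API. It is not the code or the word lists from the actual study.

```python
import re
from collections import Counter

# Hypothetical word lists, invented for this example only.
AW_SPELLINGS = {"tawk", "wawk", "thawt"}
OTHER_VARIANTS = {"neva", "ova", "dis", "dat"}


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cooccurring_variants(tweets):
    """Count other variant spellings in tweets that contain an 'aw' spelling."""
    counts = Counter()
    for tweet in tweets:
        tokens = set(tokenize(tweet))
        if tokens & AW_SPELLINGS:  # only look at tweets with an "aw" spelling
            counts.update(tokens & OTHER_VARIANTS)
    return counts


# Made-up tweets standing in for real search results.
tweets = [
    "we need to tawk, dis is neva gonna work",
    "long wawk home today",
    "dat movie was great",
]
counts = cooccurring_variants(tweets)
print(counts)
```

Only the first made-up tweet has both an “aw” spelling and other variants, so those are the only ones that get counted. The third tweet has a variant spelling but no “aw” word, so it’s skipped. A real analysis would also need a comparison set of tweets without “aw” spellings to show that the co-occurrence isn’t just chance.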

Ok, so people will sometimes spell things differently on Twitter based on their spoken language dialect. What’s the big deal? Well, for linguists this is pretty exciting. There’s a lot of language data available on Twitter and my research suggests that we can use it to look at variation in sound patterns. If you’re a researcher looking at sound patterns, that’s pretty sweet: you can stay home in your jammies and use Twitter data to verify findings from your field work. But what if you’re not a language researcher? Well, if we can identify someone’s dialect features from their Tweets then we can also use those features to make a pretty good guess about their demographic information, which isn’t always available (another problem for sociolinguists working with internet data). And if, say, you’re trying to sell someone hunting rifles, then it’s pretty helpful to know that they live in a place where they aren’t illegal. It’s early days yet, and I’m nowhere near that stage, but it’s pretty exciting to think that it could happen at some point down the line.

So the big takeaway is that, yes, people can tweet with an accent, and yes, linguists can use Twitter data to investigate speech sounds. Not all of them–a lot of people aren’t aware of many of their dialect features and thus won’t spell them any differently–but it’s certainly an interesting area for further research.

“Men” vs. “Females” and sexist writing

So, I have a confession to make. I actually set out to write a completely different blog post. In searching Wikimedia Commons for a picture, though, I came across something that struck me as odd. I was looking for pictures of people writing, and I noticed that there were two gendered sub-categories, one for men and one for women. Leaving aside the question of having only two genders, what really stuck out to me were the names. The category with pictures of men was called “Men Writing” and the category with pictures of women was called “Females Writing”.

According to this sign, the third most common gender is “child”.

So why did that bother me? It is true that male humans are men and that women are female humans. Sure, a writing professor might nag about how the two terms lack parallelism, but does it really matter?

The thing is, it wouldn’t matter if this was just a one-off thing. But it’s not. Let’s look at the Category: Males and Category: Females*. At the top of the category page for men, it states “This category is about males in general. For human males, see Category:Male humans”. And the male humans category is, conveniently, the first subcategory. Which is fine, no problem there. BUT. There is no equivalent disclaimer at the top of Category: Females, and the first subcategory is not female humans but female animals. So even though “Females” is used to refer specifically to female humans when talking about writing, when talking about females in general it looks as if at least one editor has decided that it’s more relevant for referring to female animals. And that also gels with my own intuitions. I’m more likely to ask “How many females?” when looking at a bunch of baby chickens than I am when looking at a bunch of baby humans. Assuming the editors responsible for these distinctions are also native English speakers, their intuitions are probably very similar.

So what? Well, it makes me uncomfortable to be referred to with a term that is primarily used for non-human animals while men are referred to with a term that I associate with humans. (Or, perhaps, women are being referred to as “female men”, but that’s equally odd and exclusionary.)

It took me a while to come to that conclusion. I felt that there was something off about the terminology, but I had to turn and talk it over with my officemate for a couple minutes before finally getting at the kernel of the problem. And I don’t think it’s a conscious choice on the part of the editors–it’s probably something they don’t even realize they’re doing. But I definitely do think that it’s related to the gender imbalance of the editors of Wikimedia. According to recent statistics, over ninety percent (!) of Wikipedia editors are male. And this type of sexist language use probably perpetuates that imbalance. If I feel, even if it’s for reasons that I have a hard time articulating, that I’m not welcome in a community then I’m less likely to join it. And that’s not just me. Students who are presented with job descriptions in language that doesn’t match their gender are less likely to be interested in those jobs. Women are less likely to respond to job postings if “he” is used to refer to both men and women. I could go on citing other studies, but we could end up being here all day.

My point is this: sexist language affects the behaviour and choices of those who hear it. And in this case, it makes me less likely to participate in this on-line community because I don’t feel as if I would be welcomed and respected there. It’s not only Wikipedia/Wikimedia, either. This particular usage pattern is also something I associate with Reddit (a good discussion here). The gender breakdown of Reddit? About 70% male.

For some reason, the idea that we should avoid sexist language usage seems to really bother people. I was once a TA for a large lecture class where, in the middle of discussions of the effects of sexist language, a male student interrupted the professor to say that he didn’t think it was a problem. I’ve since thought about it quite a bit (it was pretty jarring) and I’ve come to the conclusion that the reason the student felt that way is that, for him, it really wasn’t a problem. Since sexist language is almost always exclusionary to women, and he was not a woman, he had not felt that moment of discomfort before.

Further, I think he may have felt that, because this type of language tends to benefit men, we were blaming him. I want to be clear here: I’m not blaming anyone for their unconscious biases. And I’m not saying that only men use sexist language. The Wikimedia editors who made this choice may very well have been women. What I am saying is that we need to be aware of these biases and strive to correct them. It’s hard, and it takes constant vigilance, but it’s an important and relatively simple step that we can all take in order to help eliminate sexism.

*As they were on Wednesday, April 8, 2015. If they’ve been changed, I’d recommend the Wayback Machine.

Great ideas in linguistics: Sociolinguistics

I’ll be the first to admit: for a long time, even after I’d begun my linguistics training, I didn’t really understand what sociolinguistics was. I had the idea that it mainly had to do with discourse analysis, which is certainly a fascinating area of study, but I wasn’t sure it was enough to serve as the basis for a major discipline of linguistics. Fortunately, I’ve learned a great deal about sociolinguistics since that time.

Sociolinguistics is the sub-field of linguistics that studies language in its social context and derives explanatory principles from it. By knowing about the language, we can learn something about a social reality and vice versa.

Now, at first glance this may seem so intuitive that it’s odd someone would go to the trouble of stating it directly. As social beings, we know that the behaviour of people around us is informed by their identities and affiliations. At the extreme end it can be things like having a cultural rule that literally forbids speaking to your mother-in-law, or requires replacing the letters “ck” with “cc” in all written communication. But there are more subtle rules in place as well, rules which are just as categorical and predictable and important. And if you don’t look at what’s happening with the social situation surrounding those linguistic rules, you’re going to miss out on a lot.

Case in point: Occasionally you’ll hear phonologists talk about sound changes being in free variation, or rules that are randomly applied. BUT if you look at the social facts of the community, you’ll often find that there is no randomness at all. Instead, there are underlying social factors that control which option a person chooses as they’re speaking. For example, if you were looking at whether people in Montreal were making r-sounds with the front or back of the tongue and you just sampled a bunch of them you might find that some people made it one way most of the time and others made it the other way most of the time. Which is interesting, sure, but doesn’t have a lot of explanatory power.

However, if you also looked at the social factors associated with it, and the characteristics of the individuals who used each r-sound, you might notice something interesting, as Clermont and Cedergren did (see the illustration). They found that younger speakers preferred the back-of-the-mouth r-sound, while older people tended to use the tip of the tongue instead. And that has a lot more explanatory power. Now we can start asking questions to get at the forces underlying that pattern: Is this the way the younger people have always talked, i.e. some sort of established youthful style, or is there a language change going on and the newer form is going to slowly take over? What causes younger speakers to use the form they do? Is there also an effect of gender, or who you hang out with?

Figure one from Sankoff and Blondeau. 2007. (Click picture to look at the whole study.) As you can see, younger speakers are using [R] more than older speakers, and the younger a speaker is the more likely they are to use [R].

And that’s why sociolinguistics is all kinds of awesome. It lets us peel away and reveal some of the complexity surrounding language. By adding sociological data to our studies, we can help to reduce statistical noise and reveal new and interesting things about how language works, what it means to be a language-user, and why we do what we do.

New series: 50 Great Ideas in Linguistics

As I’ve been teaching this summer (And failing to blog on a semi-regular basis like a loser. Mea culpa.) I’ll occasionally find that my students aren’t familiar with something I’d assumed they’d covered at some point already. I’ve also found that there are relatively few resources for looking up linguistic ideas that don’t require a good deal of specialized knowledge going in. SIL’s glossary of linguistic terms is good but pretty jargon-y, and the various handbooks tend not to have on-line versions. And even with a concerted effort by linguists to make Wikipedia a good resource, I’m still not 100% comfortable with recommending that my students use it.

Therefore! I’ve decided to make my own list of Things That Linguistic-Type People Should Know and then slowly work on expounding on them. I have something to point my students to and it’s a nice bite-sized way to talk about things; perfect for a blog.

Here, in no particular order, are 50ish Great Ideas of Linguistics sorted by sub-discipline. (You may notice a slight sub-disciplinary bias.) I might change my mind on some of these–and feel free to jump in with suggestions–but it’s a start. Look out for more posts on them.

  • Sociolinguistics
    • Sociolinguistic variables
    • Social class and language
    • Social networks
    • Accommodation
    • Style
    • Language change
    • Linguistic security
    • Linguistic awareness
    • Covert and overt prestige
  • Phonetics
    • Places of articulation
    • Manners of articulation
    • Voicing
    • Vowels and consonants
    • Categorical perception
    • “Ease”
    • Modality
  • Phonology
    • Rules
    • Assimilation and dissimilation
    • Splits and mergers
    • Phonological change
  • Morphology
  • Syntax
  • Semantics
    • Pragmatics
    • Truth values
    • Scope
    • Lexical semantics
    • Compositional semantics
  • Computational linguistics
    • Classifiers
    • Natural Language Processing
    • Speech recognition
    • Speech synthesis
    • Automata
  • Documentation/Revitalization
    • Language death
    • Self-determination
  • Psycholinguistics

Meme Grammar

So the goal of linguistics is to find and describe the systematic ways in which humans use language. And boy howdy do we humans love using language systematically. A great example of this is internet memes.

What are internet memes? Well, let’s start with the idea of a “meme”. “Memes” were posited by Richard Dawkins in his book The Selfish Gene. He used the term to describe cultural ideas that are transmitted from individual to individual much like a virus or bacteria. The science mystique I’ve written about is a great example of a meme of this type. If you have fifteen minutes, I suggest Dan Dennett’s TED talk on the subject of memes as a much more thorough introduction.

So what about the internet part? Well, internet memes tend to be a bit narrower in their scope. Viral videos, for example, seem to be a separate category from internet memes even though they clearly fit into Dawkins’s idea of what a meme is. Generally, “internet meme” refers to a specific image and text that is associated with that image. These are generally called image macros. (For a thorough analysis of emerging and successful internet memes, as well as an excellent object lesson in why you shouldn’t scroll down to read the comments, I suggest Know Your Meme.) It’s the text that I’m particularly interested in here.

Memes which involve language require that it be used in a very specific way, and failure to obey these rules results in social consequences. In order to keep this post a manageable size, I’m just going to look at the use of language in the two most popular image memes, as ranked by memegenerator.net, though there is a lot more to study here. (I think a study of the differing uses of the initialisms MRW [my reaction when]  and MFW [my face when] on imgur and 4chan would show some very interesting patterns in the construction of identity in the two communities. Particularly since the 4chan community is made up of anonymous individuals and the imgur community is made up of named individuals who are attempting to gain status through points. But that’s a discussion for another day…)

The God tier (i.e. most popular) characters on the website Meme Generator as of February 23rd, 2013. Click for link to site. If you don’t recognize all of these characters, congratulations on not spending all your free time on the internet.

Without further ado, let’s get to the grammar. (I know y’all are excited.)

Y U No

This meme is particularly interesting because its page on Meme Generator already has a grammatical description.

The Y U No meme actually began as Y U No Guy but eventually evolved into simply Y U No, the phrase being generally followed by some often ridiculous suggestion. Originally, the face of Y U No guy was taken from Japanese cartoon Gantz’ Chapter 55: Naked King, edited, and placed on a pink wallpaper. The text for the item reads “I TXT U … Y U NO TXTBAK?!” It appeared as a Tumblr file, garnering over 10,000 likes and reblogs.

It went totally viral, and has morphed into hundreds of different forms with a similar theme. When it was uploaded to MemeGenerator in a format that was editable, it really took off. The formula used was : “(X, subject noun), [WH]Y [YO]U NO (Y, verb)?”[Bold mine.]

A pretty good try, but it can definitely be improved upon. There are always two distinct groupings of text in this meme, always in Impact font, white with a black border and in all caps. This is pretty consistent across all image macros. In order to indicate the break between the two text chunks, I will use — throughout this post. The chunk of text that appears above the image is a noun phrase that directly addresses someone or something, often a famous individual or corporation. The bottom text starts with “Y U NO” and finishes with a verb phrase. The verb phrase is an activity or action that the addressee from the first block of text could or should have done, and that the meme creator considers positive. It is also inflected as if “Y U NO” were structurally equivalent to “Why didn’t you”. So, since you would ask Steve Jobs “Why didn’t you donate more money to charity?”, a grammatical meme to that effect would be “STEVE JOBS — Y U NO DONATE MORE MONEY TO CHARITY”. In effect, this meme asks someone or something that had the agency to do something positive why they chose not to do that thing. While this certainly has the potential to be a vehicle for social commentary, like most memes it’s mostly used for comedic effect. Finally, there is some variation in the punctuation of this meme. While no punctuation is the most common, an exclamation point, a question mark or both are all used. I would hypothesize that the use of punctuation varies between internet communities… but I don’t really have the time or space to get into that here.

A meme (created by me using Meme Generator) following the guidelines outlined above.
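If you like your grammars machine-readable, the surface form of Y U NO can be sketched as a regular expression. This is just my description above turned into Python, with some assumptions baked in: the two text chunks are written on one line separated by the dash I’m using in this post, and the names `Y_U_NO` and `parse_y_u_no` are made up for the example. A pattern like this can only check the shape of the text. It has no idea whether the bottom text is really a verb phrase, or whether the action is something the meme creator considers positive.

```python
import re

SEP = "\u2014"  # the dash this post uses to mark the break between text chunks

# All caps throughout; top chunk, separator, "Y U NO", bottom chunk,
# then optionally "?", "!", "?!" or "!?".
Y_U_NO = re.compile(
    rf"^(?P<addressee>[^a-z{SEP}]+) {SEP} Y U NO "
    rf"(?P<verb_phrase>[^a-z{SEP}?!]+)(?P<punct>\?!?|!\??)?$"
)


def parse_y_u_no(text):
    """Return the parts of a well-formed Y U NO meme, or None if malformed."""
    match = Y_U_NO.match(text)
    return match.groupdict() if match else None


good = f"STEVE JOBS {SEP} Y U NO DONATE MORE MONEY TO CHARITY"
bad = f"Steve Jobs {SEP} y u no donate more money to charity"

print(parse_y_u_no(good))
print(parse_y_u_no(bad))
```

The all-caps version parses into an addressee and a verb phrase, while the lowercase version is rejected outright.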

Futurama Fry

This meme also has a brief grammatical analysis:

The text surrounding the meme picture, as with other memes, follows a set formula. This phrasal template goes as follows: “Not sure if (insert thing)”, with the bottom line then reading “or just (other thing)”. It was first utilized in another meme entitled “I see what you did there”, where Fry is shown in two panels, with the first one with him in a wide-eyed expression of surprise, and the second one with the familiar half-lidded expression.

As an example of the phrasal template, Futurama Fry can be seen saying: “Not sure if just smart …. Or British”. Another example would be “Not sure if highbeams … or just bright headlights”. The main form of the meme seems to be with the text “Not sure if trolling or just stupid”.

This meme is particularly interesting because there seems to be an extremely rigid syntactic structure. The phrase follows the form “NOT SURE IF _____ — OR _____”. The first blank can either be filled by a complete sentence or a subject complement while the second blank must be filled by a subject complement. Subject complements, also called predicates (But only by linguists; if you learned about predicates in school it’s probably something different. A subject complement is more like a predicate adjective or predicate noun.), are everything that can come after a form of the verb “to be” in a sentence. So, in a sentence like “It is raining”, “raining” is the subject complement. So, for the Futurama Fry meme, if you wanted to indicate that you were uncertain whether it was raining or sleeting, both of these forms would be correct:

  • NOT SURE IF IT’S RAINING — OR SLEETING
  • NOT SURE IF RAINING — OR SLEETING

Note that, if a complete sentence is used and abbreviation is possible, it must be abbreviated. Thus the following sentence is not a good Futurama Fry sentence:

  • *NOT SURE IF IT IS RAINING — OR SLEETING

This is particularly interesting because the “phrasal template” description does not include this distinction, but it is quite robust. This is a great example of how humans notice and perpetuate linguistic patterns that they aren’t necessarily aware of.
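Here’s a small Python sketch of what a checker for the Futurama Fry pattern could look like. The usual caveats apply: `is_grammatical_fry` is a name I made up, the list of uncontracted forms is a short illustrative sample rather than a complete one, and the check only looks at surface text. It can’t tell whether a blank is really filled by a subject complement.

```python
import re

SEP = "\u2014"  # the dash this post uses to mark the break between text chunks

# All caps: "NOT SURE IF <first blank> <dash> OR <second blank>".
FRY = re.compile(
    rf"^NOT SURE IF (?P<first>[^a-z{SEP}]+) {SEP} OR (?P<second>[^a-z{SEP}]+)$"
)

# Illustrative sample of pronoun + "to be" sequences that could be contracted.
UNCONTRACTED = re.compile(
    r"\b(?:IT IS|HE IS|SHE IS|THAT IS|I AM|YOU ARE|WE ARE|THEY ARE)\b"
)


def is_grammatical_fry(text):
    """Check the template, then reject sentences that skip a contraction."""
    match = FRY.match(text)
    if match is None:
        return False
    return UNCONTRACTED.search(match.group("first")) is None


examples = [
    f"NOT SURE IF IT'S RAINING {SEP} OR SLEETING",
    f"NOT SURE IF RAINING {SEP} OR SLEETING",
    f"NOT SURE IF IT IS RAINING {SEP} OR SLEETING",
]
for example in examples:
    print(is_grammatical_fry(example), example)
```

The first two examples pass and the third fails, which matches the judgments above: the uncontracted “IT IS” is exactly what the starred example gets wrong.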

A meme (created by me using Meme Generator) following the guidelines outlined above. If you’re not sure whether it’s phonetics or phonology, may I recommend this post as a quick refresher?

So this is obviously very interesting to a linguist, since we’re really interested in extracting and distilling those patterns. But why is this useful/interesting to those of you who aren’t linguists? A couple of reasons.

  1. I hope you find it at least a little interesting and that it helps to enrich your knowledge of your experience as a human. Our capacity for patterning is so robust that it affects almost every aspect of our existence and yet it’s easy to forget that, to let our awareness of that slip out of our conscious minds. Some patterns deserve to be examined and criticized, though, and linguistics provides an excellent low-risk training ground for that kind of analysis.
  2. If you are involved in internet communities I hope you can use this new knowledge to avoid the social consequences of violating meme grammars. These consequences can range from a gentle reprimand to mockery and scorn. The gatekeepers of internet culture are many, vigilant and vicious.
  3. As with much linguistic inquiry, accurately noting and describing these patterns is the first step towards being able to use them in a useful way. I can think of many uses, for example, of a program that did large-scale sentiment analyses of image macros but was able to determine which were grammatical (and therefore more likely to be accepted and propagated by internet communities) and which were not.

Soda vs. Pop vs. Coke … Which is right?

Short answer: they’re all correct (at least in the United States) but some are more common in certain dialectal areas. Here’s a handy-dandy map, in case you were wondering:

Maps! Language! Still one of my favorite combinations. This particular map, and the data collection it’s based on is courtesy of popvssoda.com. Click picture for link and all the lovely statistics. (You do like statistics, right?)

Long answer: I’m going to sort this into reactions I tend to get after answering questions like this one.

What do you mean they’re all correct? Coke/Soda/Pop is clearly wrong. Ok, I’ll admit, there are certain situations when you might need to choose to use one over the other. Say, if you’re writing for a newspaper with a very strict style guide. But otherwise, I’m sticking by my guns here: they’re all correct. How do I know? Because each of them is in current usage, and there is a dialectal group where it is the preferred term. Linguistics (at least the type of linguistics that studies dialectal variation) is all about describing what people actually say and people actually say all three.

But why doesn’t everyone just say the same thing? Wouldn’t that be easier? Easier to understand? Probably, yes. But people use different words for the same thing for the same reasons that they speak different languages. In a very, very simplified way, it kinda works like this:

  • You tend to speak like the people that you spend time with. That makes it easier for you to understand each other and lets other people in your social group know that you’re all members of the same group. Like team jerseys.
  • Over time, your group will introduce or adopt new linguistic markers that aren’t necessarily used by the whole population. Maybe a person you know refers to sodas as “phosphates” because his grandfather was a soda jerk and that form really catches on among your friends.
  • As your group keeps using and adopting new words (or sounds, or grammatical markers or any other facet of language) that are different from those of other groups, its language slowly begins to drift away from the language those groups use.
  • Eventually, in extreme cases, you end up with separate languages. (Like what happened with Latin: different speech communities ended up speaking French, Italian, Spanish, Portuguese, and the other Romance languages rather than the Latin they’d shared under Roman rule.)

This is the process by which languages or dialectal communities tend to diverge. Divergence isn’t the only pressure on speakers, however. Particularly since we can now talk to and listen to people from basically anywhere (Yay internet! Yay TV! Yay radio!) your speech community could look like mine does: split between people from the Pacific Northwest and the South. My personal language use is slowly drifting from mostly Southern to a mix of Southern and Pacific Northwestern. This is called dialect leveling, and it’s part of the reason why American dialectal regions tend to span hundreds or thousands of miles instead of two or three.

Dialect leveling: Where two or more groups of people start out talking differently and end up talking alike. Schools tend to be a huge factor in this.

So, on the one hand, there is pressure for everyone to start talking alike. On the other hand, however, I still want to sound like I belong with my Southern friends and have them understand me easily (and not be made fun of for sounding strange, let’s be honest) so when I’m talking to them I don’t retain very many markers of the Pacific Northwest. That’s the pressure that keeps the dialect areas separate and the reason why I still say “soda”, even though I live in a “pop” region.

Huh. That’s pretty cool. Yep. Yep, it sure is.

Mapping language, language maps

So for some reason, I’ve come across three studies in quick succession based on mapping language. Now, if you know me, you know that nattering on about linguistic methodology is pretty much the Persian cat to my Blofeld, but I really do think that looking at the way that linguists do linguistics is incredibly important. (Warning: the next paragraph will be kinda preachy, feel free to skip it.)

It’s something the field, to paint with an incredibly broad brush, tends to skimp on. After all, we’re asking all these really interesting questions that have the potential to change people’s lives. How is hearing speech different from hearing other things? What causes language pathologies and how can we help correct them? Can we use the voice signal to reliably detect Parkinson’s over the phone? That’s what linguistics is. Who has time to look at whether asking people to list the date on a survey form affects their responses? If linguists don’t use good, controlled methods to attempt to look at these questions, though, we’ll either find the wrong answers or miss the right ones completely because of some confounding variable we didn’t think about. Believe me, I know firsthand how heart-wrenching it is to design an experiment, run subjects, do your stats and end up with a big pile of useless goo because your methodology wasn’t well thought out. It sucks. And it happens way more than it needs to, mainly because a lot of linguistics programs don’t stress rigorous scientific training.

OK, sermon over. Maps! I think using maps to look at language data is a great methodology! Why?

FraMauroMap
Hmm… needs more data about language. Also the rest of the continents, but who am I to judge? 
  1. You get an end product that’s tangible and easy to read and use. People know what maps are and how to use them. Presenting linguistic data as a map rather than, say, a terabyte of detailed surveys or a thousand hours of recordings is a great way to make that same data accessible. Accessible data gets used. And isn’t that kind of the whole point?
  2. Maps are so accurate right now. This means that maps of data aren’t just rough approximations, they’re the best, most accurate way to display this information. Seriously, the stuff you can do with GIS is just mind blowing. (Check out this dialect map of the US. If you click on the region you’re most interested in, you get additional data like field recordings, along with the precise place they were made. Super useful.)
  3. Maps are fun. Oh, come on, who doesn’t like looking at maps? Particularly if you’re looking at a region you’re familiar with. See, here’s my high school, and the hay field we rented three years ago. Oh, and there’s my friend’s house! I didn’t realize they were so close to the highway. Add a second layer of information and BOOM, instant learning.

The studies

Two of the studies I came across were actually based on Twitter data. Twitter’s an amazing resource for studying linguistics because you have this enormous data set you can just use without having to get consent forms from every single person. So nice. Plus, because all tweets are archived, in the Library of Congress if nowhere else, other researchers can go back and verify things really easily.

This study looks at how novel slang expressions spread across the US. It hasn’t actually been published yet, so I don’t have the map itself, but they do talk about some interesting tidbits. For example: the places most likely to spawn new successful slang are urban centers with a high African American population.

The second Twitter study is based in London and looked at the different languages Londoners tweet in and did have a map:

Click for link to author’s blog post.

Interesting, huh? You can really get a good idea of the linguistic landscape of London. Although there were some potential methodological problems with this study, I still think it’s a great way to present this data.

The third study I came across is one that’s actually here at the University of Washington. This one is interesting because it kind of goes the other way. Basically, the researchers had respondents indicate areas on a map of Washington where they thought language communities existed and then had them describe them. So what you end up with is sort of a representation of the social ideas of what language is like in various parts of Washington state. Like so:

Click for link to study site.

There are lots more interesting maps on the study site, each of which shows some different perception of language use in Washington State. (My favorite is the one that suggests that people think other people who live right next to the Canadian border sound Canadian.)

So these are just a couple of the ways in which people are using maps to look at language data. I hope it’s a trend that continues.

Rap Reduplication

I love rap and hiphop. In addition to being a great example of how different cultural traditions can combine to create a uniquely American art form and fun to listen to (I don’t get much chance to stretch my English-major chops these days) it’s often a carrier of linguistic change. I mentioned one example of this earlier, when I was discussing language games and in-group/out-group language. But I’ve recently noticed another interesting linguistic phenomenon in rap that you don’t really see in English very often: reduplication.

Whose work displays metrical complexity, rich cultural/literary/historical allusions and a healthy lashing of dirty jokes? Trick question! It’s both. Man, I hope nobody in the future tries to claim that Jay-Z was actually Dick Cheney in disguise…
Reduplication is one of my favorite linguistic phenomena and a great example of an autological word. Basically, reduplication is a linguistic phenomenon where you say the same thing twice. It’s also one of those rare phonological phenomena that are semantically meaningful. There are lots of ways to interpret what saying something twice means, but there are a couple of pretty popular choices:

  • Probably the best English example is “Like like”, as in “I like him, but I don’t like like him.” It seems to serve as some sort of deintensifier (Yeah I just made that word up. Deal with it.) or to disambiguate between two possible meanings of the same word. It seems to serve to narrow the scope of the base word. So, “like-like” is a type of “like” and “holiday holiday” is a type of “holiday”. Apparently there’s a similar relationship in Italian and French (see comments).
  • In Koasati (and Cree as well, apparently) it’s used to indicate a repeated action. So it would be like if I said “cut-cut” in English to mean that I chopped something finely instead of cutting a piece off of something.
  • In Mandarin it’s an almost juvenile marking, used to indicate “cuteness” or “smallness”. (You can see this in Hebrew as well.) You’ll sometimes see this in English, too, particularly from children. If you hang out with young kids, keep your ears peeled for things like “bunbun” for “bunny”.
  • On the other hand, Mandarin also uses reduplication to indicate plurality. Khmer is another language that does this, and I think Japanese does as well. So that’s things like “bird” for one bird and “birdbird” or “bir-bird” for a flock of birds.
  • Finally, and this is what I think’s going on in rap, you’ll see reduplication to intensify things. Like I’d say a “red red” is a really intense red, or that someone who’s “short short” is really tiny.

I’ve been noticing this particularly with “truetrue”. You can hear it in Chamillionaire’s “I’m true”, both as “I’m true, I’m true” and “true true” in verse two. And Lil Wayne’s “My Homies Still” is absolutely rife with reduplication. You’ve got “click click” in the first line, and in verse four (which is Big Sean’s) you’ve got these lines:

Whoa, okay, boi this here’s what I do do
Got your sister dancing, not the kind that’s in a tutu
Got me in control, no strings attached, that’s that voodoo
She said can’t nobody do it better, I tell her, true true yep ***** true true
True true, my my bro bro say…

Of course, a grouping this concentrated speaks more towards an artistic choice than pervasive linguistic change… but it is something I’ve been noticing more and more. The earliest example I could find is GZA’s “True Fresh MC” from 1991, but I’m hesitant to call it reduplication, since there’s a definite pause between the first and second “true”.
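
If you want to go hunting for more examples yourself, here’s a minimal Python sketch (mine, not a standard tool) that flags any word immediately followed by itself in a line of lyrics. The function name `find_reduplications` is made up for illustration, and the approach is deliberately crude: it only sees text, so it can’t tell true reduplication from plain repetition with a pause in between, which is exactly the problem with the GZA example.

```python
import re


def find_reduplications(line: str) -> list[str]:
    """Return every pair of identical adjacent words in a line of text.

    Hypothetical helper for illustration only: it compares lowercased
    word tokens and ignores punctuation, so it cannot hear pauses.
    """
    # Pull out lowercase word tokens, keeping straight apostrophes inside words.
    words = re.findall(r"[a-z']+", line.lower())
    # Compare each word with the one right after it.
    return [f"{a} {b}" for a, b in zip(words, words[1:]) if a == b]


print(find_reduplications("True true, my my bro bro say"))
# → ['true true', 'my my', 'bro bro']
```

Note that a run of three identical words would be reported as two overlapping pairs, and that any hit still needs a human ear to decide what the repetition is actually doing.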

Feel free to weigh in in the comments. Is this a legitimate trend or have I fallen prey to a recency illusion? Are there other examples that I’m missing? Is this something you say in everyday speech?

Language Games*

A lot of the games that we play as kids help us learn important life skills. “I spy”? Color recognition. “Peekaboo”? Object permanence. But what about language games? In English, you’ve got games like pig Latin, which has several versions. Most involve moving syllables or consonants from the front of a word to the end, and then adding “-ay”. It’s such a prevalent phenomenon that there’s even a Google search in pig Latin.
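
To make that rule concrete, here’s a minimal Python sketch of one common version: move everything before the first vowel to the end of the word, then add “ay”. The names `pig_latin` and `pig_latin_phrase` are my own, and I’m assuming that only a, e, i, o and u count as vowels and that vowel-initial words just get “ay” tacked on (other versions add “way” or “yay” instead).

```python
VOWELS = set("aeiou")  # assumption: "y" is always treated as a consonant


def pig_latin(word: str) -> str:
    """Move the leading consonant cluster to the end of the word and add "ay"."""
    lower = word.lower()
    for i, ch in enumerate(lower):
        if ch in VOWELS:
            # Everything before the first vowel goes to the back.
            return lower[i:] + lower[:i] + "ay"
    # No vowel found at all: just add the suffix.
    return lower + "ay"


def pig_latin_phrase(phrase: str) -> str:
    """Apply pig_latin to each whitespace-separated word."""
    return " ".join(pig_latin(w) for w in phrase.split())


print(pig_latin_phrase("Check mate"))
# → eckchay atemay
```

This sketch lowercases everything and doesn’t handle punctuation, so it’s a toy rather than a full translator.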

And English isn’t alone in having language games like this. In fact, every language I’ve studied, including Nepali and Esperanto, has had some form of similar language game.

Codex Manesse 262v Herr Goeli
“Ekchay Atemay!”
“Roland, please stop being so infantile. This is backgammon and I know perfectly well you’re fluent in Liturgical Latin.”
The weird thing, though, is that it kinda looks like the only people that language games are really useful for are linguists.

Let’s look at syllables. If you’re a normal person, you only think about them when you’re forced to write a haiku for some reason. (Pro tip: In Japanese, it’s not the syllables that you count but the moras.) If you’re a linguist, though, you think about them all the time, and spend time arguing about whether or not they actually exist. One of the best arguments for syllables existing is that, when language games require it, people can move them around relatively intuitively without even having a university degree in linguistics. (I know, shocking, isn’t it?)

And you can use the existence of language games to argue that there’s a viable speaker community of any given language, a sort of measure of language health, like mayflies in streams; that’s a valuable indicator, since language death is a serious problem. Or you can even use them to argue that a language is alive in the first place.

The main use of language games for language users, however, seems to be the creation of smaller speech communities within larger communities. But then, as a linguist, you probably already knew that. Keep an ear out for them in everyday life, however, and you might be surprised how often they tend to crop up–like the use of -izz in early hip hop parlance.

*If you thought I was going to bring up Wittgenstein in a blog post meant for people with little to no background in linguistics you are a very silly person. Oh, alright, here. I hope you’re proud of yourself.