Tweeting with an accent

I’m writing this blog post from a cute little tea shop in Victoria, BC. I’m up here to present at the Northwest Linguistics Conference, which is a yearly conference for both Canadian and American linguists (yes, I know Canadians are Americans too, but United Statsian sounds weird), and I thought that my research project may be interesting to non-linguists as well. Basically, I investigated whether it’s possible for Twitter users to “type with an accent”. Can linguists use variant spellings in Twitter data to look at the same sort of sound patterns we see in different speech communities?

Picture of a bird saying “Let’s Tawk”. Taken from the website of the Center for the Psychology of Women in Seattle.

So if you’ve been following the Great Ideas in Linguistics series, you’ll remember that I wrote about sociolinguistic variables a while ago. If you haven’t, sociolinguistic variables are sounds, words or grammatical structures that are used by specific social groups. So, for example, in Southern American English (representing!) the vowel in “I”, which is usually a two-part diphthong, is produced with just one part, so it sounds more like “ah”.

Now, in speech these sociolinguistic variables are very well studied. In fact, the Dictionary of American Regional English was just finished in 2013 after over fifty years of work. But in computer-mediated communication–which is the fancy term for internet language–they haven’t been studied nearly as well. In fact, some scholars have suggested that it might not be possible to study speech sounds using written data. And on the surface of it, that does make sense. Why would you expect to be able to get information about speech sounds from a written medium? I mean, look at my attempt to explain an accent feature in the last paragraph. It would be far easier to get my point across using a sound file. That said, I’d noticed in my own internet usage that people were using variant spellings, like “tawk” for “talk”, and I had a hunch that they were using variant spellings in the same way they use different dialect sounds in speech.

While hunches have their place in science, they do need to be verified empirically before they can be taken seriously. And so before I submitted my abstract, let alone gave my talk, I needed to see if I was right. Were Twitter users using variant spellings in the same way that speakers use different sound patterns? And if they were, would that mean that we can investigate sound patterns using Twitter data?

Since I’m going to present my findings at a conference and am writing this blog post, you can probably deduce that I was right, and that this is indeed the case. How did I show this? Well, first I picked a really well-studied sociolinguistic variable called the low back merger. If you don’t have the merger (most African American speakers and speakers in the South don’t) then you’ll hear a strong difference between the words “cot” and “caught” or “god” and “gaud”. Or, to use the example above, you might have a difference between the words “talk” and “tock”. “Talk” is a little more backed and rounded, so it sounds a little more like “tawk”, which is why it’s sometimes spelled that way. I used the Twitter public API and found a bunch of tweets that used the “aw” spelling of common words and then looked to see if there were other variant spellings in those tweets. And there were. Furthermore, the other variant spellings used in tweets also showed features of Southern American English or African American English. Just to make sure, I then looked to see if people were doing the same thing with variant spellings of sociolinguistic variables associated with Scottish English, and they were. (If you’re interested in the nitty-gritty details, my slides are here.)
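
If you like code, here’s a minimal sketch (in Python) of the kind of co-occurrence check I’m describing. To be clear, this isn’t my actual analysis code, the word lists are made-up stand-ins rather than my real variables, and it assumes you’ve already pulled the tweet text down from the API somehow:

```python
import re

# Illustrative (not my actual) spelling pairs: standard form -> "aw" variant
# associated with keeping "cot" and "caught" distinct.
AW_VARIANTS = {"talk": "tawk", "walk": "wawk", "thought": "thawt", "caught": "cawt"}

# Other made-up variant spellings that could mark Southern or African American
# English features, like monophthongal "I" spelled "ah".
OTHER_VARIANTS = {"ah", "mah", "dat", "dey"}

def tokens(tweet):
    return re.findall(r"[a-z']+", tweet.lower())

def find_cooccurrences(tweets):
    """For tweets with an 'aw' spelling, collect any other variant spellings."""
    hits = []
    for tweet in tweets:
        words = set(tokens(tweet))
        if words & set(AW_VARIANTS.values()):
            extras = words & OTHER_VARIANTS
            if extras:
                hits.append((tweet, sorted(extras)))
    return hits

sample = ["let's tawk about it later", "ah can't believe dey said dat, we gotta tawk"]
for tweet, extras in find_cooccurrences(sample):
    print(tweet, "->", extras)
```

The real question is whether those co-occurrences show up more often than chance would predict, which is where the actual statistics (and the slides) come in.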

Ok, so people will sometimes spell things differently on Twitter based on their spoken language dialect. What’s the big deal? Well, for linguists this is pretty exciting. There’s a lot of language data available on Twitter and my research suggests that we can use it to look at variation in sound patterns. If you’re a researcher looking at sound patterns, that’s pretty sweet: you can stay home in your jammies and use Twitter data to verify findings from your field work. But what if you’re not a language researcher? Well, if we can identify someone’s dialect features from their Tweets then we can also use those features to make a pretty good guess about their demographic information, which isn’t always available (another problem for sociolinguists working with internet data). And if, say, you’re trying to sell someone hunting rifles, then it’s pretty helpful to know that they live in a place where they aren’t illegal. It’s early days yet, and I’m nowhere near that stage, but it’s pretty exciting to think that it could happen at some point down the line.

So the big take away is that, yes, people can tweet with an accent, and yes, linguists can use Twitter data to investigate speech sounds. Not all of them–a lot of people aren’t aware of many of their dialect features and thus won’t spell them any differently–but it’s certainly an interesting area for further research.

Great Ideas in Linguistics: Consonants and Vowels

Consonants and vowels are among the handful of linguistics terms that have managed to escape the cage of academic discourse to make their nest in the popular consciousness. Everyone knows what the difference between a vowel and a consonant is, right? Let’s check super quick. Pick the option below that best describes a vowel:

  • Easy! It’s A, E, I, O, U and sometimes Y.
  • A speech sound produced without constriction of the vocal tract above the glottis.

Everyone got the second one, right? No? Huh, maybe we’re not  on the same page after all.

There are two problems with the “andsometimesY” definition of vowels. The first is that it’s based on the alphabet and, as I’ve discussed before, English has a serious problem when it comes to mapping sounds onto letters in a predictable way. (It gives you the very false impression that English has six-ish vowels when it really has twice that many.) The second is that it isn’t really a good way of modelling what a vowel actually is. If we got a new letter in the alphabet tomorrow, zborp, we’d have no principled way of determining whether it was a vowel or not.

Ah, a new letter is it? Time to get out the old vowelizing dice and re-roll.  “Letter dice d6”. Licensed under CC BY-SA 3.0 via Wikimedia Commons.

But the linguistic definition captures some other useful qualities of vowels as well. Since vowels don’t have a sharp constriction, you get acoustic energy pretty much throughout the entire spectrum. Not all frequencies are created equal, however. In vowels, the shape of the vocal tract creates pockets of more concentrated acoustic energy. We call these “formants” and they’re so stable between repetitions of vowels that they can be used to identify which vowel it is. In fact, that’s what you’re using to distinguish “beat” from “bet” from “bit” when you hear them aloud. They’re also easy to measure, which means that speech technologies rely really heavily on them.
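
To make “formants identify vowels” a little more concrete, here’s a toy sketch in Python. The F1/F2 numbers are rough textbook-style averages I’m quoting from memory, real speakers vary a lot, and real systems use much more than two numbers, so treat this purely as an illustration of the idea:

```python
import math

# Rough, illustrative average values (in Hz) for the first two formants of
# three American English vowels. Ballpark figures, not measurements.
VOWEL_FORMANTS = {
    "i (beat)": (270, 2290),
    "ɪ (bit)": (390, 1990),
    "ɛ (bet)": (530, 1840),
}

def classify_vowel(f1, f2):
    """Pick the vowel whose average (F1, F2) is closest to the measured pair."""
    return min(VOWEL_FORMANTS, key=lambda v: math.dist((f1, f2), VOWEL_FORMANTS[v]))

print(classify_vowel(300, 2200))  # -> i (beat)
print(classify_vowel(550, 1800))  # -> ɛ (bet)
```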

Another quality of vowels is that, since the whole vocal tract has to unkink itself (more or less), they tend to take a while to produce. And that same openness means that not much of the energy produced at the vocal folds is absorbed. In simple terms, this means that vowels tend to be longer and louder than other sounds, i.e. consonants. This creates a neat little one-two where vowels are both easier to produce and easier to hear. As a result, languages tend to prefer to have quite a lot of vowels, and to tack consonants on to them. This tendency shakes out to create a robust pattern across languages where you’ll get one or two consonants, then a vowel, then a couple consonants, then a vowel, etc. You’ve probably run across the term linguists use for those little vowel-nuggets: we call them syllables.
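
Here’s a quick way to see that alternating pattern, sketched in Python. Fair warning: it uses letters as a stand-in for sounds, which (as I complained about above) is a pretty shaky move for English, so it’s only meant to show the general consonant-and-vowel rhythm:

```python
# A toy CV-skeleton extractor: label each letter as consonant (C) or vowel (V)
# and read off the pattern. Letters are a crude stand-in for actual sounds.
VOWEL_LETTERS = set("aeiou")

def cv_skeleton(word):
    return "".join("V" if ch in VOWEL_LETTERS else "C" for ch in word.lower())

for word in ["banana", "strengths", "linguistics"]:
    print(word, "->", cv_skeleton(word))
# banana      -> CVCVCV       (the tidy pattern most languages lean towards)
# strengths   -> CCCVCCCCC    (English tolerates unusually big consonant piles)
# linguistics -> CVCCVVCCVCC
```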

If you stick with the “andsometimesY” definition, though, you lose out on including those useful qualities. It may be easier to teach to five-year-olds, but it doesn’t really capture the essential vowelyness of vowels. Fortunately, the linguistic definition does.

New series: 50 Great Ideas in Linguistics

As I’ve been teaching this summer (And failing to blog on a semi-regular basis like a loser. Mea culpa.) I’ll occasionally find that my students aren’t familiar with something I’d assumed they’d covered at some point already. I’ve also found that there are relatively few resources for looking up linguistic ideas that don’t require a good deal of specialized knowledge going in. SIL’s glossary of linguistic terms is good but pretty jargon-y, and the various handbooks tend not to have on-line versions. And even with a concerted effort by linguists to make Wikipedia a good resource, I’m still not 100% comfortable with recommending that my students use it.

Therefore! I’ve decided to make my own list of Things That Linguistic-Type People Should Know and then slowly work on expounding on them. That way I’ll have something to point my students to, and it’s a nice, bite-sized way to talk about things; perfect for a blog.

Here, in no particular order, are 50ish Great Ideas of Linguistics sorted by sub-discipline. (You may notice a slight sub-disciplinary bias.) I might change my mind on some of these–and feel free to jump in with suggestions–but it’s a start. Look out for more posts on them.

  • Sociolinguistics
    • Sociolinguistic variables
    • Social class and language
    • Social networks
    • Accommodation
    • Style
    • Language change
    • Linguistic security
    • Linguistic awareness
    • Covert and overt prestige
  • Phonetics
    • Places of articulation
    • Manners of articulation
    • Voicing
    • Vowels and consonants
    • Categorical perception
    • “Ease”
    • Modality
  • Phonology
    • Rules
    • Assimilation and dissimilation
    • Splits and mergers
    • Phonological change
  • Morphology
  • Syntax
  • Semantics
    • Pragmatics
    • Truth values
    • Scope
    • Lexical semantics
    • Compositional semantics
  • Computational linguistics
    • Classifiers
    • Natural Language Processing
    • Speech recognition
    • Speech synthesis
    • Automata
  • Documentation/Revitalization
    • Language death
    • Self-determination
  • Psycholinguistics

The Acoustic Theory of Speech Perception

So, quick review: understanding speech is hard to model and the first model we discussed, motor theory, while it does address some problems, leaves something to be desired. The big one is that it doesn’t treat the acoustic speech signal itself as the main fodder for perception. And that strikes me as odd. I mean, we’re really used to thinking about hearing speech as an audio-only thing. Telephones and radios work perfectly well, after all, and the information you’re getting there is completely audio. That’s not to say that we don’t use visual, or, heck, even tactile data in speech perception. The McGurk effect, where a voice saying “ba” dubbed over someone saying “ga” will be perceived as “da” or “tha”, is strong evidence that we can and do use our eyes during speech perception. And there’s even evidence that a puff of air on the skin will change our perception of speech sounds. But we seem to be able to get along perfectly well without these extra sensory inputs, relying on acoustic data alone.

This theory sounds good to me. Sorry, I’ll stop.
Ok, so… how do we extract information from acoustic data? Well, like I’ve said a couple of times before, it’s actually a pretty complex problem. There’s no such thing as “invariance” in the speech signal and that makes speech recognition monumentally hard. We tend not to think about it because humans are really, really good at figuring out what people are saying, but it’s really very, very complex.

You can think about it like this: imagine that you’re looking for information online about platypuses. Except, for some reason, there is no standard spelling of platypus. People spell it “platipus”, “pladdypuss”, “plaidypus”, “plaeddypus” or any of thirty or forty other variations. Even worse, one person will use many different spellings and may never spell it precisely the same way twice. Now, a search engine that worked like our speech recognition works would not only find every instance of the word platypus–regardless of how it was spelled–but would also recognize that every spelling referred to the same animal. Pretty impressive, huh? Now imagine that every word has a highly variable spelling, oh, and there are no spaces between words–everythingisjustruntogetherlikethisinonelongspeechstream. Still not difficult enough for you? Well, there is also the fact that there are ambiguities. The search algorithm would need to treat “pladypuss” (in the sense of a plaid-patterned cat) and “palattypus” (in the sense of the venomous monotreme) as separate things. Ok, ok, you’re right, it still seems pretty solvable. So let’s add the stipulation that the program needs to be self-training and have an accuracy rate that’s incredibly close to 100%. If you can build a program to these specifications, congratulations: you’ve just revolutionized speech recognition technology. But we already have a working example of a system that looks a heck of a lot like this: the human brain.

So how does the brain deal with the “different spellings” when we say words? Well, it turns out that there are certain parts of a word that are pretty static, even if a lot of other things move around. It’s like a superhero reboot: Spiderman is still going to be Peter Parker and get bitten by a spider at some point and then get all moody and whine for a while. A lot of other things might change, but if you’re only looking for those criteria to figure out whether or not you’re reading a Spiderman comic you have a pretty good chance of getting it right. Those parts that are relatively stable and easy to look for we call “cues”. Since they’re cues in the acoustic signal, we can be even more specific and call them “acoustic cues”.

If you think of words (or maybe sounds, it’s a point of some contention) as being made up of certain cues, then it’s basically like a list of things a house-buyer is looking for in a house. If a house has all, or at least most, of the things they’re looking for, then it’s probably the right house and they’ll select that one. In the same way, having a lot of cues pointing towards a specific word makes it really likely that that word is going to be selected. When I say “selected”, I mean that the brain will connect the acoustic signal it just heard to the knowledge you have about a specific thing or concept in your head. We can think of a “word” as both this knowledge and the acoustic representation. So in the “platypus” example above, all the spellings started with “p” and had an “l” no more than one letter away. That looks like a pretty robust cue. And all of the words had a second “p” in them and ended with one or two tokens of “s”. So that also looks like a pretty robust cue. Add to that the fact that all the spellings had at least one of either a “d” or “t” in between the first and second “p” and you have a pretty strong template that would help you to correctly identify all those spellings as being the same word.
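
Just for fun, here’s that cue template written out as a little Python sketch. The regular expression is my own (very rough) way of formalizing the cues listed above; nobody is claiming your brain literally runs regexes:

```python
import re

# The cues from the paragraph as a rough template: starts with "p", an "l" at
# most one letter later, a "d" or "t" somewhere before a second "p", and one
# or two tokens of "s" at the end.
PLATYPUS_CUES = re.compile(r"^p.?l.*[dt].*p.*s{1,2}$")

spellings = ["platypus", "platipus", "pladdypuss", "plaidypus", "plaeddypus",
             "palattypus", "pladypuss", "plate", "octopus"]

for s in spellings:
    print(s, "matches" if PLATYPUS_CUES.match(s) else "doesn't match")
```

Notice that “pladypuss” (the plaid cat) matches the template too, which is exactly the kind of ambiguity the last paragraph was worrying about.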

Which all seems to be well and good and fits pretty well with our intuitions (or mine at any rate). But that leaves us with a bit of a problem: those pesky parts of Motor Theory that are really strongly experimentally supported. And this model works just as well for motor theory too; just swap the acoustic cues out for specific gestures. There seems to be more to the story than either the acoustic model or the motor theory model can offer us, though both have led to useful insights.

The Motor Theory of Speech Perception

Ok, so like I talked about in my previous two posts, modelling speech perception is an ongoing problem with a lot of hurdles left to jump. But there are potential candidate theories out there, all of which offer good insight into the problem. The first one I’m going to talk about is motor theory.

So your tongue is like the motor body and the other person’s ears are like the load cell…
So motor theory has one basic premise and three major claims. The basic premise is a keen observation: we don’t just perceive speech sounds, we also make them. Whoa, stop the presses. Ok, so maybe it seems really obvious, but motor theory was really the first major attempt to model speech perception that took this into account. Up until it was first posited in the 1960s, people had pretty much been ignoring that and treating speech perception like the only information listeners had access to was what was in the acoustic speech signal. We’ll discuss that in greater detail later, but it’s still pretty much the way a lot of people approach the problem. I don’t know of a piece of voice recognition software, for example, that includes an anatomical model.

So what does the fact that listeners are also speakers get you? Well, remember how there aren’t really invariant units in the speech signal? Well, if you decide that what people are actually perceiving isn’t a collection of acoustic markers that point to one particular language sound but instead the gestures needed to make up that sound, then suddenly that’s much less of a problem. To put it another way, we’re used to thinking of speech being made up of a bunch of sounds, and that when we’re listening to speech we’re deciding what the right sounds are and from there picking the right words. But from a motor theory standpoint, what you’re actually doing when you’re listening to speech is deciding what the speaker’s doing with their mouth and using that information to figure out what words they’re saying. So in the dictionary in your head, you don’t store words as strings of sounds but rather as strings of gestures.

If you’re like me when I first encountered this theory, it’s about this time that you’re starting to get pretty skeptical. I mean, I basically just said that what you’re hearing is the actual movement of someone else’s tongue and figuring out what they’re saying by reverse engineering it based on what you know your tongue is doing when you say the same word. (Just FYI, when I say tongue here, I’m referring to the entire vocal tract in its multifaceted glory, but that’s a bit of a mouthful. Pun intended. 😉 ) I mean, yeah, if we accept this it gives us a big advantage when we’re talking about language acquisition–since if you’re listening to gestures, you can learn them just by listening–but still. It’s weird. I’m going to need some convincing.

Well, let’s get back to those three principles I mentioned earlier, which are taken from Galantucci, Fowler and Turvey’s excellent review of motor theory.

  1. Speech is a weird thing to perceive and pretty much does its own thing. I’ve talked about this at length, so let’s just take that as a given for now.
  2. When we’re listening to speech, we’re actually listening to gestures. We talked about that above. 
  3. We use our motor system to help us perceive speech.

Ok, so point three should jump out at you a bit. Why? Of these three points, it’s the easiest one to test empirically. And since I’m a huge fan of empirically testing things (Science! Data! Statistics!) we can look into the literature and see if there’s anything that supports this. Like, for example, a study that shows that when listening to speech, our motor cortex gets all involved. Well, it turns out that there are lots of studies that show this. You know that term “active listening”? There’s pretty strong evidence that it’s more than just a metaphor; listening to speech involves our motor system in ways that not all acoustic inputs do.

So point three is pretty well supported. What does that mean for point two? It really depends on who you’re talking to. (Science is all about arguing about things, after all.) Personally, I think motor theory is really interesting and addresses a lot of the problems we face in trying to model speech perception. But I’m not ready to swallow it hook, line and sinker. I think Robert Remez put it best in the proceedings of Modularity and The Motor Theory of Speech Perception:

I think it is clear that Motor Theory is false. For the other, I think the evidence indicates no less that Motor Theory is essentially, fundamentally, primarily and basically true. (p. 179)

On the one hand, it’s clear that our motor system is involved in speech perception. On the other, I really do think that we use parts of the acoustic signal in and of themselves. But we’ll get into that in more depth next week.

Why do I really, really love West African languages?

So I found a wonderful free app that lets you learn Yoruba, or at least Yoruba words,  and posted about it on Google plus. Someone asked a very good question: why am I interested in Yoruba? Well, I’m not interested just in Yoruba. In fact, I would love to learn pretty much any western African language or, to be a little more precise, any Niger-Congo language.

This map’s color choices make it look like a chocolate-covered ice cream cone.
Why? Well, not to put too fine a point on it, I’ve got a huge language crush on them. Whoa there, you might be thinking, you’re a linguist. You’re not supposed to make value judgments on languages. Isn’t there like a linguist code of ethics or something? Well, not really, but you are right. Linguists don’t usually make value judgments on languages. That doesn’t mean we can’t play favorites!  And West African languages are my favorites. Why? Because they’re really phonologically and phonetically interesting. I find the sounds and sound systems of these languages rich and full of fascinating effects and processes. Since that’s what I study within linguistics, it makes sense that that’s a quality I really admire in a language.

What are a few examples of Niger-Congo sound systems that are just mind blowing? I’m glad you asked.

  • Yoruba: Yoruba has twelve vowels. Seven of them are pretty common (we have all but one in American English) but if you say four of them nasally, they’re different vowels. And if you say a nasal vowel when you’re not supposed to, it’ll change the entire meaning of a word. Plus? They don’t have a ‘p’ or an ‘n’ sound. That is crazy sauce! Those are some of the most widely-used sounds in human language. And Yoruba has a complex tone system as well. You probably have some idea of the level of complexity that can add to a sound system if you’ve ever studied Mandarin, or another East Asian language. Seriously, their sound system makes English look childishly simplistic.
  • Akan: There are several different dialects of Akan, so I’ll just stick to talking about Asante, which is the one used in universities and for official business. It’s got a crazy consonant system. Remember how Yoruba didn’t have an “n” sound? Yeah, in Akan they have nine. To an English speaker they all pretty much sound the same, but if you grew up speaking Akan you’d be able to tell the difference easily. Plus, most sounds other than “p”, “b”, “f” or “m” can be made while rounding the lips (linguists call these “labialized” sounds, and they’re completely different sounds). They’ve also got a vowel harmony system, which means vowels later in a word have to come from the same class as the vowels earlier in the word (there’s a toy harmony checker sketched just after this list). Oh, yeah, and tones and a vowel nasalization distinction and some really cool tone terracing. I know, right? It’s like being a kid in a candy store.
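
Here’s that toy vowel harmony checker, in Python. The two vowel classes are a rough sketch of Akan’s advanced-tongue-root (ATR) sets, not a complete or exact inventory, and the example “words” are made-up strings rather than real Akan:

```python
# A toy vowel-harmony checker. The two sets below are a rough approximation of
# Akan's ATR vowel classes; real Akan is more complicated than this.
ATR_PLUS = set("ieou")      # "advanced" vowels
ATR_MINUS = set("ɪɛɔʊa")    # "unadvanced" vowels

def is_harmonic(word):
    """A word is harmonic if all of its vowels come from the same class."""
    vowels = [ch for ch in word if ch in ATR_PLUS | ATR_MINUS]
    return all(v in ATR_PLUS for v in vowels) or all(v in ATR_MINUS for v in vowels)

print(is_harmonic("ebuo"))  # True: every vowel is from the "advanced" set
print(is_harmonic("ɛbɔa"))  # True: every vowel is from the "unadvanced" set
print(is_harmonic("ebɔ"))   # False: the vowels mix the two classes
```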

But how did these languages get so cool? Well, there’s some evidence that these languages have really robust and complex sound systems because the people speaking them never underwent large-scale migration to another continent. (Obviously, I can’t ignore the effects of colonialism or the slave trade, but the tendency still seems pretty robust.) Which is not to say that, say, Native American languages don’t have awesome sound systems; they just tend to be slightly smaller on average.

Now that you know how kick-ass these languages are, I’m sure you’re chomping at the bit to hear some of them. Your wish is my command; here’s a song in Twi (a dialect of Akan) from one of my all-time-favorite musicians: Sarkodie. (He’s making fun of Ghanaian emigrants who forget their roots. Does it get any better than biting social commentary set to a sick beat?)

Ask vs. Aks: Let me axe you a question

Do you know which one of these forms is the correct one? You sure about that?

Four things are inevitable: death, taxes, the eventual heat-death of the universe, and language change. All (living) languages are constantly in a state of flux, at all levels of the linguistic system. Meanings change, new structures come into being and old ones die out, words are born and die and pronunciations change. And no one, it seems, is happy about it. New linguistic forms tend to be the source of endless vitriol and argument, and language users love constructing rules that have more to do with social norms than linguistic reality. Rules that linguists create, which attempt to model the way language is used, are called “descriptive”, while rules that non-linguists create, which attempt to suggest how they believe language should be used, are called “prescriptive”. I’m not going to talk that much more about it here; if you’re interested, Language Log and Language Hippie both discuss the issue at length. The reason that I bring this up is that prescriptive rules tend to favor older forms. (And occasionally forms from other languages. That whole “don’t split an infinitive” thing? Based on Latin. English speakers have been happily splitting infinitives since the 13th century, and I imagine we’ll continue to boldly split them for centuries to come.) There is, however, one glaring exception: the whole [ask] vs. [aks] debate.

In a way, it’s kinda like Theseus’ paradox or Abe Lincoln’s axe. If you replace all the sounds in a word one by one, is it the same word at the end of the process as it was in the beginning?
Historically, it’s [aks], the homophone of the chopping tool pictured above, that has precedence. Let’s take a look at the Oxford English Dictionary’s take on the history of the word, shall we?

The original long á gave regularly the Middle English (Kentish) ōxi; but elsewhere was shortened before the two consonants, giving Middle English a, and, in some dialects, e. The result of these vowel changes, and of the Old English metathesis asc-, acs-, was that Middle English had the types ōx, ax, ex, ask, esk, ash, esh, ass, ess. The true representative of the orig. áscian was the s.w. and w.midl. ash, esh, also written esse (compare æsce ash n.1, wæsc(e)an wash n.), now quite lost. Acsian, axian, survived in ax, down to nearly 1600 the regular literary form, and still used everywhere in midl. and southern dialects, though supplanted in standard English by ask, originally the northern form. Already in 15th cent. the latter was reduced dialectally to asse, past tense ast, still current dialectally.*

So, [aks] was the regular literary form (i.e. the one you would have been taught to say in school if you were lucky enough to have gone to school) until 1600 or so? Ok, so, if older forms are better, then that should be the “right” one. Right? Well, let’s see what Urban Dictionary has to say on the matter, since that tends to be a pretty good litmus test of language attitudes.

“What retards say when they don’t know how to pronounce the word ask.” — User marcotte on Urban Dictionary, top definition

Oh. Sorry, Chaucer, but I’m going to have to inform you that you were a retard who didn’t know how to pronounce the word ask. Let’s unpack what’s going on here a little bit, shall we? There’s clearly a disconnect between the linguistic facts and language attitudes.

  • Facts: these two forms have both existed for centuries, and [aks] was considered the “correct” form for much of that time.
  • Language attitude: [aks] is not only “wrong”, it reflects negatively on those people who use it, making them sound less intelligent and less educated.

This is probably (at least in America) tangled in with the fact that [aks] is a marker of African American English. Even within the African American community, the form is stigmatized. Oprah, for example, who often uses markers of African American English (especially when speaking with other African Americans) almost never uses [aks] for [ask]. So the idea that [aks] is the wrong form and that [ask] is correct is based on a social construction of how an intelligent, educated individual should speak. It has nothing to do with the linguistic qualities of the word itself. (For a really interesting discussion of how knowledge of linguistic forms is acquired by children and the relationship between that and animated films, see Lippi-Green’s chapter “Teaching children to discriminate” from English with an Accent: Language  ideology and discrimination in the United States here.)

Now, the interesting thing about these forms is that they both have phonological pressures pushing English speakers towards using them. That’s because [s] has a special place in English phonotactics. In general, you want the sounds that are the most sonorant nearer the center of a syllable. And [s] is more sonorant than [k], so it seems like [ask] should be the favored form. But, like I said, [s] is special. In “special”, for example, it comes at the very beginning of the word, before the less-sonorant [p]. And all the really long syllables in English, like “strengths”, have [s] on the end. So the special status of [s] seems to favor [aks]. The fact that each form can be modeled perfectly well based on our knowledge of the way English words are formed helps to explain why both forms continue to be actively used, even centuries after they emerged. And, who knows? We might decide that [aks] is the “correct” form again in another hundred years or so. Try and keep that in mind the next time you talk about the right and wrong ways to say something.
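
If it helps, here’s the sonority argument as a tiny Python sketch. The numeric scale is my own toy version of the usual textbook ordering (vowels above fricatives above stops), so the exact numbers are arbitrary; only the comparison matters:

```python
# A toy sonority scale: higher numbers are more sonorant. Only the ordering
# matters here (vowel > fricative > stop), not the exact values.
SONORITY = {"a": 5, "s": 2, "k": 1}

def coda_sonority(form):
    """Sonority of everything after the vowel (assumes the vowel comes first)."""
    return [SONORITY[ch] for ch in form[1:]]

for form in ["ask", "aks"]:
    profile = coda_sonority(form)
    falling = all(earlier > later for earlier, later in zip(profile, profile[1:]))
    verdict = "falls toward the edge" if falling else "rises at the edge, leaning on special [s]"
    print(form, profile, verdict)
```

Both come out as perfectly buildable English syllables, which is the point: the phonology isn’t what decides which one gets labeled “correct”.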

* “ask, v.”. OED Online. December 2012. Oxford University Press. 12 February 2013 <http://www.oed.com.offcampus.lib.washington.edu/view/Entry/11507>.

Why is studying linguistics useful? *Is* studying linguistics useful?

So I recently gave a talk at the University of Washington Scholar’s Studio. In it, I covered a couple things that I’ve already talked about here on my blog: the fact that, acoustically speaking, there’s no such thing as a “word” and that our ears can trick us. My general point was that our intuitions about speech, a lot of the things we think seem completely obvious, actually aren’t true at all from an acoustic perspective.

What really got to me, though, was that after I’d finished my talk (and it was super fast, too, only five minutes) someone asked why it mattered. Why should we care that our intuitions don’t match reality? We can still communicate perfectly well. How is linguistics useful, they asked. Why should they care?

I’m sorry, what was it you plan to spend your life studying again? I know you told me last week, but for some reason all I remember you saying is “Blah, blah, giant waste of time.”

It was a good question, and I’m really bummed I didn’t have time to answer it. I sometimes forget, as I’m wading through the hip-deep piles of readings that I need to get to, that it’s not immediately obvious to other people why what I do is important. And it is! If I didn’t believe that, I wouldn’t be in grad school. (It’s certainly not the glamorous easy living and fat salary that keep me here.) It’s important in two main ways. Way one is the way in which it enhances our knowledge and way two is the way that it helps people.

 Increasing our knowledge. Ok, so, a lot of our intuitions are wrong. So what? So a lot of things! If we’re perceiving things that aren’t really there, or not perceiving things that are really there, something weird and interesting is going on. We’re really used to thinking of ourselves as pretty unbiased in our observations. Sure, we can’t hear all the sounds that are made, but we’ve built sensors for that, right? But it’s even more pervasive than that. We only perceive the things that our bodies and sensory organs and brains can perceive, and we really don’t know how all these biological filters work. Well, okay, we do know some things (lots and lots of things about ears, in particular) but there’s a whole lot that we still have left to learn. The list of unanswered questions in linguistics is a little daunting, even just in the sub-sub-field of perceptual phonetics.

Every single one of us uses language every single day. And we know embarrassingly little about how it works. And what we do know is often hard to share with people who have little background in linguistics. Even here, in my blog, without time constraints and an audience that’s already pretty interested (You guys are awesome!) I often have to gloss over interesting things. Not because I don’t think you’ll understand them, but because I’d metaphorically have to grow a tree, chop it down and spend hours carving it just to make a little step stool so you can get the high-level concept off the shelf and, seriously, who has time for that? Sometimes I really envy scientists in the major disciplines because everyone already knows the basics of what they study. Imagine that you’re a geneticist, but before you can tell people you look at DNA, you have to convince them that sexual reproduction exists. I dream of the day when every graduating high school senior will know IPA. (That’s the international phonetic alphabet, not the beer.)

Okay, off the soapbox.

Helping people. Linguistics has lots and lots and lots of applications. (I’m just going to talk about my little sub-field here, so know that there’s a lot of stuff being left unsaid.) The biggest problem is that so few people know that linguistics is a thing. We can and want to help!

  • Foreign language teaching. (AKA applied linguistics) This one is a particular pet peeve of mine. How many of you have taken a foreign language class and had the instructor tell you something about a sound in the language, like: “It’s between a “k” and a “g” but more like the “k” except different.” That crap is not helpful. Particularly if the instructor is a native speaker of the language, they’ll often just keep telling you that you’re doing it wrong without offering a concrete way to make it correctly. Fun fact: There is an entire field dedicated to accurately describing the sounds of the world’s languages. One good class on phonetics and suddenly you have a concrete description of what you’re supposed to be doing with your mouth and the tools to tell when you’re doing it wrong. On the plus side, a lot of language teachers are starting to incorporate linguistics into their curriculum with good results.
  • Speech recognition and speech synthesis. So this is an area that’s a little more difficult. Most people working on these sorts of projects right now are computational people and not linguists. There is a growing community of people who do both (UW offers a masters degree in computational linguistics that feeds lots of smart people into Seattle companies like Microsoft and Amazon, for example) but there’s definite room for improvement. The main tension is the fact that using linguistic models instead of statistical ones (though some linguistic models are statistical) hugely increases the need for processing power. The benefit is that accuracy  tends to increase. I hope that, as processing power continues to be easier and cheaper to access, more linguistics research will be incorporated into these applications. Fun fact: In computer speech recognition, an 80% comprehension accuracy rate in conversational speech is considered acceptable. In humans, that’s grounds to test for hearing or brain damage.
  • Speech pathology. This is a great field and has made and continues to make extensive use of linguistic research. Speech pathologists help people with speech disorders overcome them, and the majority of speech pathologists have an undergraduate degree in linguistics and a masters in speech pathology. Plus, it’s a fast-growing career field with a good outlook.  Seriously, speech pathology is awesome. Fun fact: Almost half of all speech pathologists work in school environments, helping kids with speech disorders. That’s like the antithesis of a mad scientist, right there.

And that’s why you should care. Linguistics helps us learn about ourselves and help people, and what else could you ask for in a scientific discipline? (Okay, maybe explosions and mutant sharks, but do those things really help humanity?)

Letters “r” lies, or why English spelling is horrible

If you’re like me and have vivid memories of learning to read English, you probably remember being deeply frustrated. As far as four-year-old Rachael was concerned, math was nice and simple: two and two always, always equals four. Not sometimes. Not only when it felt like it. All the time. Nice and simple.

Reading, and particularly phonics, on the other hand, was a minefield of dirty tricks. Oh, sure, they told us that each letter represented a single sound, but even a kid knows that’s hooey. Cough? Bough? Come on, that was like throwing sand in a fight; completely unfair. And what about those vowels? What and cut rhyme with each other, not cut and put. Even as phonics training was increasing my phonemic awareness, pushing me to pay more attention to the speech sounds I made, English orthography (that’s our spelling system) was dragging me behind the ball-shed and pulling out my hair in clumps. Metaphorically.

“Oh man, they’re trying to tell us that A makes the ‘Aaahhh’ sound. What do they take us for, complete idiots? Or is that ‘whaahhht’ to they take us for?”
“I know, right? One-to-one correspondence? Complete rubbish!”
Of course, I did eventually pass third grade and gain mastery of the written English language. But it was an uphill battle all the way. Why? Because English orthography is retarded. Wait. I’m sorry. That’s completely unfair to individuals suffering from retardation. English orthography is spiteful, contradictory and completely unsuited to representing the second most widely-spoken second language. This poem really highlights the problem:


Brush up Your English

I take it you already know
Of tough and bough and cough and dough?
Others may stumble but not you
On hiccough, thorough, slough and through.
Well done! And now you wish perhaps,
To learn of less familiar traps?
Beware of heard, a dreadful word
That looks like beard and sounds like bird.
And dead, it’s said like bed, not bead-
for goodness’ sake don’t call it ‘deed’!
Watch out for meat and great and threat
(they rhyme with suite and straight and debt).

A moth is not a moth in mother,
Nor both in bother, broth, or brother,
And here is not a match for there,
Nor dear and fear for bear and pear,
And then there’s doze and rose and lose-
Just look them up- and goose and choose,
And cork and work and card and ward
And font and front and word and sword,
And do and go and thwart and cart-
Come, I’ve hardly made a start!
A dreadful language? Man alive!
I’d learned to speak it when I was five!
And yet to write it, the more I sigh,
I’ll not learn how ’til the day I die.


— T.S. Watt (1954)

So why don’t we get our acts together and fix this mess? Well… trying to fix it is kind of the reason we’re in this mess in the first place. Basically, in Renaissance England we started out with a roughly phonetic spelling system. You actually sounded out words and wrote them as they sounded. “Aks” instead of “ask”, for example. (For what it’s worth, “aks” is the original pronunciation.) And you would be writing by hand. On very expensive parchment with very expensive quills and ink for very rich people.

Enter the printing press. Suddenly we can not only produce massive amounts of literature, but everyone can access them. Spelling goes from being something that only really rich people and scribes care about to a popular phenomenon. And printing press owners were quick to capitalize on that phenomenon by printing spelling lists that showed the “correct” way to write words. Except there wasn’t a whole lot of agreement between the different printing houses and they were already so heavily invested in their own systems that they weren’t really willing to all switch over to a centralized system. By the time Samuel Johnson comes around to pin down every word of English like an entomologist in a field of butterflies, we have standardized spellings for most words… that all come from different systems developed by different people. And it’s just gotten more complex from there. One of the main reasons is that we keep shoving new words into the language without regard for how they’re spelled.

“The problem with defending the purity of the English language is that the English language is as pure as a crib-house whore. It not only borrows words from other languages; it has on occasion chased other languages down dark alley-ways, clubbed them unconscious and rifled their pockets for new vocabulary.”

― James Nicoll

There’s actually a sound in English, the zh sort of sound in “leisure”, that only exists in words we’ve “borrowed” from other languages and, of course, there’s no letter for it. Of course not; that would be too simple. And English detests simple. If you’re really interested in more of the gory details, there’s a great lecture you can listen to/watch here by Edwin Duncan which goes into way more detail on the historical background. Or you can just scroll through the Oxford English Dictionary and wince constantly.

 

Rap Reduplication

I love rap and hiphop. In addition to being fun to listen to and a great example of how different cultural traditions can combine to create a uniquely American art form (I don’t get much chance to stretch my English-major chops these days), it’s often a carrier of linguistic change. I mentioned one example of this earlier, when I was discussing language games and in-group/out-group language. But I’ve recently noticed another interesting linguistic phenomenon in rap that you don’t really see in English very often: reduplication.

Whose work displays metrical complexity, rich cultural/literary/historical allusions and a healthy lashing of dirty jokes? Trick question! It’s both. Man, I hope nobody in the future tries to claim that Jay-Z was actually Dick Cheney in disguise…
Reduplication is one of my favorite linguistic phenomena and a great example of an autological word. Basically, reduplication is a linguistic phenomenon where you say the same thing twice. It’s also one of those rare phonological phenomena that are semantically meaningful. There are lots of ways to interpret what saying something twice means, but there are a couple of pretty popular choices:

  • Probably the best English example is “Like like”, as in “I like him, but I don’t like like him.” It seems to serve as some sort of deintensifier (Yeah, I just made that word up. Deal with it.) or to disambiguate between two possible meanings of the same word. In effect, it narrows the scope of the base word. So, “like-like” is a type of “like” and “holiday holiday” is a type of “holiday”. Apparently there’s a similar relationship in Italian and French (see comments).
  • In Koasati, (and Cree as well apparently) it’s used to indicate a repeated action. So it would be like if I said “cut-cut” in English to mean that I chopped something finely instead of cutting a piece off of  something.
  • In Mandarin it’s an almost juvenile marking, used to indicate “cuteness” or “smallness”. (You can see this in Hebrew as well.) You’ll sometimes see this in English, too, particularly from children.  If you hang out with young kids, keep your ears peeled for things like “bunbun” for “bunny”.
  • On the other hand, Mandarin also uses reduplication to indicate plurality. Khmer is another language that does this, and I think Japanese does as well. So that’s things like “bird” for one bird and “birdbird” or “bir-bird” for a flock of birds.
  • Finally, and this is what I think’s going on in rap, you’ll see reduplication to intensify things. Like I’d say a “red red” is a really intense red, or that someone who’s “short short” is really tiny.

I’ve been noticing this particularly with “truetrue”. You can hear it in Chamillionaire’s “I’m true”, both as “I’m true, I’m true” and “true true” in verse two. And Lil Wayne’s “My Homies Still” is absolutely rife with reduplication. You’ve got “click click” in the first line, and in verse four (which is Big Sean’s) you’ve got these lines:

Whoa, okay, boi this here’s what I do do
Got your sister dancing, not the kind that’s in a tutu
Got me in control, no strings attached, that’s that voodoo
She said can’t nobody do it better, I tell her, true true yep ***** true true
True true, my my bro bro say…

Of course, a grouping this concentrated speaks more towards an artistic choice than pervasive linguistic change… but it is something I’ve been noticing more and more. The earliest example I could find is GZA’s “True Fresh MC” from 1991, but I’m hesitant to call it reduplication, since there’s a definite pause between the first and second “true”.
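
If you want to go example-hunting yourself, here’s a rough first pass in Python. It just flags adjacent repeated words in whatever lyrics you feed it, so it will happily catch accidental repeats and chanted hooks along with genuine reduplication; the lyric lines below are just placeholders:

```python
import re

# Flag adjacent repeated words ("true true", "do do", "bro bro") as candidate
# reduplications. This is only a first pass: it can't tell deliberate
# reduplication from a stutter, a chant, or a transcription hiccup.
REDUP = re.compile(r"\b(\w+)[ -]\1\b", re.IGNORECASE)

lyrics = [
    "boi this here's what I do do",
    "I tell her, true true",
    "my my bro bro say",
    "got your sister dancing",
]

for line in lyrics:
    hits = [m.group(0) for m in REDUP.finditer(line)]
    if hits:
        print(line, "->", hits)
```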

Feel free to weigh in in the comments. Is this a legitimate trend or have I fallen prey to a recency illusion? Are there other examples that I’m missing? Is this something you say in everyday speech?