Hard science vs. soft science and the science mystique

So, recently I’ve been doing a lot of thinking and reading about what it means to do science, what science entails  and what is (and is not) science. Partly, this was sparked by the  fact that, at a recent middle school science education event, I was asked more than once why linguistics counted as a science. This intrigued me, as no one at the Lego robots display next to us had their discipline’s qualifications questioned, despite the fact that engineering is not scientific. Rigorous, yes. Scientific, no.

Hmm, I dunno. Looks science-y, but I don’t see any lab coats. Or goggles. There should definitely be more goggles.
This subject is particularly near and dear to me because my own research looks into, among other things, how the ways in which linguists gather data affect the data they gather and the potential for systematic bias that introduces. In order to look at how we do things, I also need to know why. And that’s where this discussion of science comes in. This can be a hard discussion to have, however, since conversations about what science is, or should be, tend to get muddied by the popular conception of science. I’m not saying people don’t know what science is, ’cause I think most people do, just that we (and I include myself in that) also have a whole bucketful of other socially-motivated ideas that we tend to lump in with science.

I’m going to call the social stuff that we’ve learned to associate with science The Science Mystique. I’m not the first person to call it that, but I think it’s fitting. (Note that if you’re looking for the science of Mystique, you’ll need to look elsewhere.) To begin our exploration of the Science Mystique, let’s start with a quote from another popular science writer, Phil Plait.

They [the scientists who made the discoveries discussed earlier in the speech] used physics. They used math. They used chemistry, biology, astronomy, engineering.

They used science.

These are all the things you discovered doing your projects. All the things that brought you here today.

Computers? Cell phones? Rockets to Saturn, probes to the ocean floor, PSP, gamecubes, gameboys, X-boxes? All by scientists.

Those places I talked about before? You can get to know them too. You can experience the wonder of seeing them for the first time, the thrill of discovery, the incredible, visceral feeling of doing something no one has ever done before, seen things no one has seen before, know something no one else has ever known.

No crystal balls, no tarot cards, no horoscopes. Just you, your brain, and your ability to think.

Welcome to science. You’re gonna like it here.

Inspirational! Science-y! Misleading! Wait, what?

So there are a couple things here that I find really troubling, and I’m just going to break them down and go through them one by one. These are things that are part of the science mystique, that permeate our cultural conception of what science is, and I’ve encountered them over and over and over again. I’m just picking on this particular speech because it’s been slathered all over the internet lately and I’ve encountered a lot of people who really resonated with its message.

  1. Science and engineering and math are treated as basically the same thing.  This. This is one of my biggest pet peeves when it comes to talking about science. Yes, I know that STEM fields (that’s Science, Technology, Engineering and Mathematics) are often lumped together. Yes, I know that there’s a lot of cross-pollination. But one, and only one, of these fields has as its goal the creation of testable models. And that’s science. The goal of engineering is to make stuff. And I know just enough math to know that there’s no way I know what the goal of mathematics is. The takeaway here is that, no matter how “science-y” they may seem, how enfolded they are into the science mystique, neither math nor engineering is a science. 
  2. There’s an insinuation that “science” =  thinking and “non-science” = NOT thinking.  This is really closely tied in with the idea that you have to be smart to be a scientist. False. Absolutely false. In fact, raw intelligence isn’t even on my list of the top five qualities you need to be a scientist:
    1. Passion. You need to love what you do, because otherwise being in grad school for five to ten years while living under the poverty line and working sixty hour weeks just isn’t worth it.
    2. Dedication. See above.
    3. Creativity. Good scientists ask good questions, and coming up with a good but answerable question that no one has asked before and  that will help shed new light on whatever it is you’re studying takes lateral thinking.
    4. Excellent time management skills. Particularly if you’re working in a university setting. You need to be able to balance research, teaching and service, all while still maintaining a healthy life. It’s hard.
    5.  Intelligibility. A huge part of science is taking very complex concepts and explaining them clearly. To your students. To other scientists. To people on the bus. To people on the internet (Hi guys!). You can have everything else on this list in spades, but if you can’t express your ideas you’re going to sink like a lead duck.
  3. Science is progress! Right? Right? Yes. Absolutely. There is no way in which science has harmed the human race and no way in which things other than science have aided it. It sounds really silly when you just come out and say it, doesn’t it? I mean, we have the knowledge to eradicate polio, but because of social and political factors it hasn’t happened yet. And you can’t solve social problems by just throwing science at them. And then there’s the fact that, while the models themselves may be morally neutral, the uses to which they are put are not always so. See Einstein and the bomb. See chemical and biological warfare. And, frankly, I think the greatest advances of the 20th century weren’t in science or engineering or technology. They were deep-seated changes in how we, particularly Americans, treated people. My great-grandmother couldn’t go to high school because she was a woman. My mother couldn’t take college-level courses because she was a woman, though she’s currently working on her degree. Now, I’m a graduate student and my gender is almost completely irrelevant. Segregation is over. Same sex relationships are legally acknowledged by nine states and DC. That’s the progress I would miss most if a weeping angel got me.
  4. Go quantitative or go home. I’ve noticed a strong bias towards quantitative data, to the point that a lot of people argue that it’s better than qualitative data. I take umbrage at this. Quantitative data is easier, not necessarily better. Easier? Absolutely. It’s easier to get ten people to agree that a banana is ten inches than it is to get them to agree that it’s tasty. And yet, from a practical standpoint, banana growers want to grow tastier bananas, ones that will ship well and sell well, not longer bananas. But it can be hard to plug “banana tastiness” into your mathematical models and measuring “tastiness” leaves you open to criticism that your data collection is biased. (That’s not to say that qualitative data can’t be biased.) This idea that quantitative data is better leads to an overemphasis on the type of questions that can best be answered quantitatively and that’s a problem. This also leads some people to dismiss the “squishy” sciences that use mainly qualitative data and that’s also a problem. All branches of science help us to shed new light on the world and universe around us and to ignore work because it doesn’t fit the science mystique is a grave mistake.

So what can we do to help lessen the effects of these biases? To disentangle the science mystique from the actual science? Well, the best thing we can do is be aware of it. Critically examine the ways that people talk about science. Closely examine your own biases. I, for example, find it far too easy to slip into the “quantitative is better” trap. Notice systematic similarities and question them. Science is, after all, about asking questions.

Soda vs. Pop vs. Coke … Which is right?

Short answer: they’re all correct (at least in the United States) but some are more common in certain dialectal areas. Here’s a handy-dandy map, in case you were wondering:

Maps! Language! Still one of my favorite combinations. This particular map, and the data collection it’s based on, is courtesy of popvssoda.com. Click picture for link and all the lovely statistics. (You do like statistics, right?)

Long answer: I’m going to sort this into reactions I tend to get after answering questions like this one.

What do you mean they’re all correct? Coke/Soda/Pop is clearly wrong. Ok, I’ll admit, there are certain situations when you might need to choose to use one over the other. Say, if you’re writing for a newspaper with a very strict style guide. But otherwise, I’m sticking by my guns here: they’re all correct. How do I know? Because each of them is in current usage, and there is a dialectal group where it is the preferred term. Linguistics (at least the type of linguistics that studies dialectal variation) is all about describing what people actually say and people actually say all three.

But why doesn’t everyone just say the same thing? Wouldn’t that be easier? Easier to understand? Probably, yes. But people use different words for the same thing for the same reasons that they speak different languages. In a very, very simplified way, it kinda works like this:

  • You tend to speak like the people that you spend time with. That makes it easier for you to understand each other and lets other people in your social group know that you’re all members of the same group. Like team jerseys.
  • Over time, your group will introduce or adopt new linguistic markers that aren’t necessarily used by the whole population. Maybe a person you know refers to sodas as “phosphates” because his grandfather was a soda jerk and that form really catches on among your friends.
  • As your group keeps using and adopting new words (or sounds, or grammatical markers or any other facet of language) that are different from other groups, their language slowly begins to drift away from the language used by other groups.
  • Eventually, in extreme cases, you end up with separate languages. (Like what happened with Latin: different speech communities ended up speaking French, Italian, Spanish, Portuguese, and the other Romance languages rather than the Latin they’d shared under Roman rule.)

This is the process by which languages or dialectal communities tend to diverge. Divergence isn’t the only pressure on speakers, however. Particularly since we can now talk to and listen to people from basically anywhere (Yay internet! Yay TV! Yay radio!) your speech community could look like mine does: split between people from the Pacific Northwest and the South. My personal language use is slowly drifting from mostly Southern to a mix of Southern and Pacific Northwestern. This is called dialect leveling and it’s part of the reason why American dialectal regions tend to include hundreds or thousands of miles instead of two or three.

Dialect leveling: Where two or more groups of people start out talking differently and end up talking alike. Schools tend to be a huge factor in this.

So, on the one hand, there is pressure to start all talking alike. On the other hand, however, I still want to sound like I belong with my Southern friends and have them understand me easily (and not be made fun of for sounding strange, let’s be honest) so when I’m talking to them I don’t retain very many markers of the Pacific Northwest. That’s pressure that’s keeping the dialect areas separate and the reason why I still say “soda”, even though I live in a “pop” region.

Huh. That’s pretty cool. Yep. Yep, it sure is.

Why is it so hard for computers to recognize speech?

This is a problem that’s plagued me for quite a while. I’m not a computational linguist myself, but one of the reasons that theoretical linguistics is important is that it allows us to create robust conceptual models of language… which is basically what voice recognition (or synthesis) programs are. But, you may say to yourself, if it’s your job to create and test robust models, you’re clearly not doing very well. I mean, just listen to this guy. Or this guy. Or this person, whose patience in detailing errors borders on obsession. Or, heck, this person, who isn’t so sure that voice recognition is even a thing we need.

Electronic eye
You mean you wouldn’t want to be able to have pleasant little chats with your computer? I mean, how could that possibly go wrong?
Now, to be fair to linguists, we’ve kinda been out of the loop for a while. Fred Jelinek, a very famous researcher in speech recognition, once said “Every time we fire a phonetician/linguist, the performance of our system goes up”. Oof, right in the career prospects. There was, however, a very good reason for that, and it had to do with the pressures on computer scientists and linguists respectively. (Also a bunch of historical stuff that we’re not going to get into.)

Basically, in the past (and currently to a certain extent) there was this divide in linguistics. Linguists wanted to model speakers’ competence, not their performance. Basically, there’s this idea that there is some sort of place in your brain where you know all the rules of language and have them all perfectly mapped out and described. Not in a conscious way, but there nonetheless. But somewhere between the magical garden of language and your mouth and/or ears you trip up and mistakes happen. You say a word wrong or mishear it or switch bits around… all sorts of things can go wrong. Plus, of course, even if we don’t make a recognizable mistake, there’s an incredible amount of variation that we can decipher without a problem. That got pushed over to the performance side, though, and wasn’t looked at as much. Linguistics was all about what was happening in the language mind-garden (the competence) and not the messy sorts of things you say in everyday life (the performance). You can also think of it like what celebrities actually say in an interview vs. what gets into the newspaper; all the “um”s and “uh”s are taken out, little stutters or repetitions are erased and if the sentence structure came out a little wonky the reporter pats it back into shape. It was pretty clear what they meant to say, after all.

So you’ve got linguists with their competence models explaining them to the computer folks and computer folks being all clever and mathy and coming up with algorithms that seem to accurately model our knowledge of human linguistic competency… and getting terrible results. Everyone’s working hard and doing their best and it’s just not working.

I think you can probably figure out why: if you’re a computer and just sitting there with very little knowledge of language (consider that this was before any of the big corpora were published, so there wasn’t a whole lot of raw data) and someone hands you a model that’s supposed to handle only perfect data and also actual speech data, which even under ideal conditions is far from perfect, you’re going to spit out spaghetti and call it a day. It’s a bit like telling someone to make you a peanut butter and jelly sandwich and just expecting them to do it. Which is fine if they already know what peanut butter and jelly are, and where you keep the bread, and how to open jars, and that food is something humans eat, so you shouldn’t rub it on anything too covered with bacteria or they’ll get sick and die. Probably not the best way to go about it.

So the linguists got the boot and they and the computational people pretty much did their own things for a bit. The model that most speech recognition programs use today is mostly statistical, based on things like how often a word shows up in whichever corpus they’re using currently. Which works pretty well. In a quiet room. When you speak clearly. And slowly. And don’t use any super-exotic words. And aren’t having a conversation. And have trained the system on your voice. And have enough processing power in whatever device you’re using. And don’t get all wild and crazy with your intonation. See the problem?
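To make “how often a word shows up in a corpus” a little more concrete, here’s a toy sketch in Python. It is not how any real recognizer is built: the tiny corpus, the lists of confusable words and the function names are all made up for illustration, and a real system would combine counts like these with an acoustic model and a lot more data.

```python
from collections import Counter

# Hypothetical toy corpus; a real system would use millions of words.
TOY_CORPUS = (
    "i want to recognize speech . "
    "it is hard to recognize speech . "
    "we went to the beach . "
    "speech is messy ."
).split()

# How many times each word appears in the toy corpus.
WORD_COUNTS = Counter(TOY_CORPUS)


def pick_most_frequent(candidates):
    """Return the candidate word seen most often in the toy corpus.

    Ties go to whichever candidate is listed first.
    """
    return max(candidates, key=lambda word: WORD_COUNTS[word])


def transcribe(candidate_slots):
    """Choose one word per slot, using corpus frequency alone."""
    return [pick_most_frequent(slot) for slot in candidate_slots]


# Each inner list holds words an (imaginary) acoustic front end confused.
slots = [["recognize", "wreck"], ["speech", "beach"]]
print(transcribe(slots))
```

Notice that a chooser like this will pick “speech” over “beach” every single time, even if you really were talking about the beach. That’s the kind of thing that goes wrong once you leave the quiet room.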

Language is incredibly complex and speech recognition technology, particularly when it’s based on a purely statistical model, is not terrific at dealing with all that complexity. Which is not to say that I’m knocking statistical models! Statistical phonology is mind-blowing and I think we in linguistics will get a lot of mileage from it. But there’s a difference. We’re not looking to conserve processing power: we’re looking to model what humans are actually doing. There’s been a shift away from the competency/performance divide (though it does still exist) and more interest in modelling the messy stuff that we actually see: conversational speech, connected speech, variation within speakers. And the models that we come up with are complex. Really complex. People working in Exemplar Theory, for example, have found quite a bit of evidence that you remember everything you’ve ever heard and use all of it to help parse incoming signals. Yeah, it’s crazy. And it’s not something that our current computers can do. Which is fine; it gives linguists time to further refine our models. When computers are ready, we will be too, and in the meantime computer people and linguistic people are showing more and more overlap again, and using each other’s work more and more. And, you know, singing Kumbayah and roasting marshmallows together. It’s pretty friendly.

So what’s the take-away? Well, at least for the moment, in order to get speech recognition to a better place than it is now, we need to build models that work for a system that is less complex than the human brain. Linguistics research, particularly into statistical models, is helping with this. For the future? We need to build systems that are as complex as the human brain. (Bonus: we’ll finally be able to test models of child language acquisition without doing deeply unethical things! Not that we would do deeply unethical things.) Overall, I’m very optimistic that computers will eventually be able to recognize speech as well as humans can.

TL;DR version:

  • Speech recognition has been light on linguists because they weren’t modeling what was useful for computational tasks.
  • Now linguists are building and testing useful models. Yay!
  • Language is super complex and treating it like it’s not will get you hit in the face with an error-ridden fish.
  • Linguists know language is complex and are working diligently at accurately describing how and why. Yay!
  • In order to get perfect speech recognition down, we’re going to need to have computers that are similar to our brains.
  • I’m pretty optimistic that this will happen.

 

 

How to pronounce the “th” sound in English

Or, as I like to call it, hunting the wild Eth and Thorn (which are old letters that can be difficult to typeset), because back in the day, English had the two distinct “th” sounds represented differently in its writing system. There was one where you vibrated your vocal folds (that’s called ‘voiced’) which was written as “ð” and one where you didn’t (unvoiced) which was written as “þ”. It’s a bit like the difference between “s” and “z” in English today. Try it: you can say both “s” and “z” without moving your tongue a millimeter. Unfortunately, while the voiced and voiceless “th” sounds remain distinct, they’re now represented by the same “th” sequence. The difference between “thy” and “thigh”, for example, is the first sound, but the spelling doesn’t reflect that. (Yet another example of why English orthography is horrible.)

Used with permission from the How To Be British Collection copyright LGP, click picture for website.

The fact that they’re written with the same letters even though they’re different sounds is only part of why they’re so hard to master. (That goes for native English speakers as well as those who are learning it as their second language: it’s one of the last sounds children learn.) The other part is that they’re relatively rare across languages. Standard Arabic, Greek, some varieties of Spanish, Welsh and a smattering of other languages have them. If you happen to have a native language that doesn’t have them, though, they’re tough to hear and harder to say. Don’t worry, though, linguistics can help!

I’m afraid the cartoon above may accurately express the difficulty of producing the “th” for non-native speakers of English, but the technique is somewhat questionable. So, the fancy technical term for the “th” sounds is “interdental fricatives”. Why? Because there are two parts to making them. The first is the place of articulation, which means where you put your tongue. In this case, as you can probably guess (“inter-” between and “-dental” teeth), it goes in between your teeth. Gently!

The important thing about your tongue placement is that your tongue tip needs to be pressed lightly against the bottom of your top teeth. You need to create a small space to push air through, small enough that it makes a hissing sound as it escapes. That’s the “fricative” part. Fricatives are sounds where you force air through a small space and the air molecules start jostling each other and make a high-frequency hissing noise. Now, it won’t be as loud when you’re forcing air between your upper teeth and tongue as it is, for example, when you’re making an “s”, but it should still be noticeable.

So, to review, put the tip of  your tongue against the bottom of your top teeth. Blow air through the thin space between your tongue and your teeth so that it creates a (not very loud) hissing  sound. Now try voicing the sound (vibrating  your vocal folds) as you do so. That’s it! You’ve got both of the English “th” sounds down.

If you’d like some more help, I really like this video, and it has some super-cool slow-motion videos. The lady who made it has a website focusing on English pronunciation which has some great  resources.  Good luck!

How do you pronounce Gangnam?

So if you’ve been completely oblivious lately, you might not be aware that Korean musician Psy has recently become an international sensation due to the song below. If you haven’t already seen it, you should. I’ll wait.

Ok, good. Now, I wrote a post recently where I suggested that a trained phonetician can help you learn to pronounce things and I thought I’d put my money where my mouth is and run you through how to pronounce “Gangnam”, phonetics style. (Note: I’m assuming you’re a native English speaker here.)

First, let’s see how a non-phonetician does it. Here’s a brief guide to the correct pronunciation offered on Reddit by ThatWonAsianGuy, who I can only assume is a native Korean speaker.

The first G apparently sounds like a K consonant to non-Korean speakers, but it’s somewhere between a G and a K, but more towards the G. (There are three letters similar, ㅋ, ㄱ, and ㄲ. The first is a normal “k,” the second the one used in Gangnam, and the third being a clicky, harsh g/k noise.)

The “ang” part is a very wide “ahh” (like when a doctor tells you to open your mouth) followed by an “ng” (like the end of “ending”). The “ahh” part, however, is not a long vowel, so it’s pronounced quickly.

“Nam” also has the “ahh” for the a. The other letters are normal.

So it sounds like (G/K)ahng-nahm.

Let’s see how he did. Judges?

Full marks for accuracy, Rachael. Nothing he said is incorrect. On the other hand, I give it a usability score of just 2 out of 10.  While the descriptions of the vowels and nasal sounds are intelligible and usable to most English speakers, even I was stumped by  his description of a sound between a  “g” and a “k”. A strong effort, though; with some training this kid could make it to the big leagues of phonetics.

Thank you Rachael, and good luck to ThatWonAsianGuy in his future phonetics career. Ok, so what is going on here in terms of the k/g/apparently clicky harsh sound? Funny you should ask, because I’m about to tell you in gruesome detail.

First things first: you need to know what voicing is. Put your hand over your throat and  say “k”. Now say “g”. Can you feel how, when you say “g”, there’s sort of a buzzing feeling? That’s what linguists call voicing. What’s actually happening is that you’re pulling your vocal folds together and then forcing air through them. This makes them vibrate, which in turn makes a sound. Like so:

(If you’re wondering what that little cat-tongue looking thing is, that’s the epiglottis. It keeps you from choking to death by trying to breathe food and is way up there on my list of favorite body parts.)

But wait! That’s not all! What we think of as “regular voicing” (ok, maybe you don’t think of it all that often, but I’m just going to assume that you do) is just one of the things you can do with your voicing. What other types of voicing are there? It’s the type of thing that’s really best described vocally, so here goes:

Ok, so, that’s what’s going on in your larynx. Why is this important? Well it turns out that only one of the three sounds is actually voiced, and it’s voiced using a different type of voicing. Any guesses as to which one?

Yep, it’s the harsh, clicky one and it’s got glottal voicing (that really low, creaky sort of voice)*. The difference between the “regular k” and the “k/g sound” has nothing to do with voicing type. Which is crazy talk, because almost every “learn Korean” textbook or online course I’ve come across has described them as “k” and “g” respectively and, as we already established, the difference between “k” and “g” is that the “g” is voiced and the “k” isn’t.

Ok, I simplified things a bit. When you say “k” and “g” at the beginning of a word in English (and only at the beginning of a word), there’s actually one additional difference between them. Try this. Put your hand in front of your mouth and say “cab”. Then say “gab”. Do you notice a difference?

You should have felt a puff of air when you said the “k” but not when you said the “g”. Want proof that it only happens at the beginning of words? Try saying “back” and “bag” in the same way, with your hand in front of your mouth. At the end of words they feel about the same. What’s going on?

Well, in English we always say an unvoiced “k” with a little puff of air at the beginning of the word. In fact, we tend to listen for that puff more than we listen for voicing. So if you say “kat” without voicing the sound, but also without the little puff of air, it sounds more like “gat”. (Which is why language teachers tell you to say it “g” instead of “k”. It’s not, strictly speaking, right, but it is a little easier to hear. The same thing happens in Mandarin, BTW.) And that’s the sound that’s at the beginning of Gangnam.
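If it helps to see that laid out, here’s a little Python sketch of the idea. It’s a deliberately oversimplified toy, not a real model of speech perception: the `Stop` class, the `english_listener_hears` function and the rule that the puff of air alone decides are all my own simplifications for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stop:
    """A 'k'-like sound described by the two cues discussed above."""
    voiced: bool     # are the vocal folds vibrating?
    aspirated: bool  # is there a little puff of air?


def english_listener_hears(stop: Stop) -> str:
    """Toy rule: at the start of a word, the puff of air decides."""
    return "k" if stop.aspirated else "g"


english_k = Stop(voiced=False, aspirated=True)   # as in "cab"
english_g = Stop(voiced=True, aspirated=False)   # as in "gab"
gangnam_k = Stop(voiced=False, aspirated=False)  # first sound of "Gangnam"

for word, stop in [("cab", english_k), ("gab", english_g), ("Gangnam", gangnam_k)]:
    print(word, "->", english_listener_hears(stop))
```

The first sound of “Gangnam” shares its voicing with the “k” in “cab”, but because it’s missing the puff of air, the toy listener files it under “g”.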

You’ll probably need to practice a bit before you get it right, but if you can make a sound at the beginning of a word where your vocal folds aren’t vibrating and without that little puff of air, you’re doing it right. You can already make the sound, it’s just the moving it to the beginning of the word that’s throwing a monkey wrench in the works.

So it’s the unvoiced “k” without the little puff of air. Then an “aahhh” sound, just as described above. Then the “ng” sound, which you tend to see at the end of words in English. It can happen in the middle of words as well, though, like in “finger”. And then “nam”, pronounced in the same way as the last syllable of “Vietnam”.

In the special super-secret International Phonetic (Cabal’s) Alphabet, that’s [kaŋnam]. Now go out there and impress a Korean speaker by not butchering the phonetics of their language!

*Ok, ok, that’s a bit of an oversimplification. You can find the whole story here.

Why is studying linguistics useful? *Is* studying linguistics useful?

So I recently gave a talk at the University of Washington Scholar’s Studio. In it, I covered a couple things that I’ve already talked about here on my blog: the fact that, acoustically speaking, there’s no such thing as a “word” and that our ears can trick us. My general point was that our intuitions about speech, a lot of the things we think seem completely obvious, actually aren’t true at all from an acoustic perspective.

What really got to me, though, was that after I’d finished my talk (and it was super fast, too, only five minutes) someone asked why it mattered. Why should we care that our intuitions don’t match reality? We can still communicate perfectly well. How is linguistics useful, they asked. Why should they care?

I’m sorry, what was it you plan to spend your life studying again? I know you told me last week, but for some reason all I remember you saying is “Blah, blah, giant waste of time.”

It was a good question, and I’m really bummed I didn’t have time to answer it. I sometimes forget, as I’m wading through hip-deep piles of readings that I need to get to, that it’s not immediately obvious to other people why what I do is important. And it is! If I didn’t believe that, I wouldn’t be in grad school. (It’s certainly not the glamorous easy living and fat salary that keep me here.) It’s important in two main ways. Way one is the way in which it enhances our knowledge and way two is the way that it helps people.

 Increasing our knowledge. Ok, so, a lot of our intuitions are wrong. So what? So a lot of things! If we’re perceiving things that aren’t really there, or not perceiving things that are really there, something weird and interesting is going on. We’re really used to thinking of ourselves as pretty unbiased in our observations. Sure, we can’t hear all the sounds that are made, but we’ve built sensors for that, right? But it’s even more pervasive than that. We only perceive the things that our bodies and sensory organs and brains can perceive, and we really don’t know how all these biological filters work. Well, okay, we do know some things (lots and lots of things about ears, in particular) but there’s a whole lot that we still have left to learn. The list of unanswered questions in linguistics is a little daunting, even just in the sub-sub-field of perceptual phonetics.

Every single one of us uses language every single day. And we know embarrassingly little about how it works. And, what we do know, it’s often hard to share with people who have little background in linguistics. Even here, in my blog, without time constraints and with an audience that’s already pretty interested (You guys are awesome!) I often have to gloss over interesting things. Not because I don’t think you’ll understand them, but because I’d metaphorically have to grow a tree, chop it down and spend hours carving it just to make a little step stool so you can get the high-level concept off the shelf and, seriously, who has time for that? Sometimes I really envy scientists in the major disciplines because everyone already knows the basics of what they study. Imagine that you’re a geneticist, but before you can tell people you look at DNA, you have to convince them that sexual reproduction exists. I dream of the day when every graduating high school senior will know IPA. (That’s the international phonetic alphabet, not the beer.)

Okay, off the soapbox.

Helping people. Linguistics has lots and lots and lots of applications. (I’m just going to talk about my little sub-field here, so know that there’s a lot of stuff being left unsaid.) The biggest problem is that so few people know that linguistics is a thing. We can and want to help!

  • Foreign language teaching. (AKA applied linguistics) This one is a particular pet peeve of mine. How many of you have taken a foreign language class and had the instructor tell you something about a sound in the language, like: “It’s between a “k” and a “g” but more like the “k” except different.” That crap is not helpful. Particularly if the instructor is a native speaker of the language, they’ll often just keep telling you that you’re doing it wrong without offering a concrete way to make it correctly. Fun fact: There is an entire field dedicated to accurately describing the sounds of the world’s languages. One good class on phonetics and suddenly you have a concrete description of what you’re supposed to be doing with your mouth and the tools to tell when you’re doing it wrong. On the plus side, a lot of language teachers are starting to incorporate linguistics into their curriculum with good results.
  • Speech recognition and speech synthesis. So this is an area that’s a little more difficult. Most people working on these sorts of projects right now are computational people and not linguists. There is a growing community of people who do both (UW offers a master’s degree in computational linguistics that feeds lots of smart people into Seattle companies like Microsoft and Amazon, for example) but there’s definite room for improvement. The main tension is the fact that using linguistic models instead of statistical ones (though some linguistic models are statistical) hugely increases the need for processing power. The benefit is that accuracy tends to increase. I hope that, as processing power continues to be easier and cheaper to access, more linguistics research will be incorporated into these applications. Fun fact: In computer speech recognition, an 80% comprehension accuracy rate in conversational speech is considered acceptable. In humans, that’s grounds to test for hearing or brain damage.
  • Speech pathology. This is a great field and has made and continues to make extensive use of linguistic research. Speech pathologists help people with speech disorders overcome them, and the majority of speech pathologists have an undergraduate degree in linguistics and a master’s in speech pathology. Plus, it’s a fast-growing career field with a good outlook. Seriously, speech pathology is awesome. Fun fact: Almost half of all speech pathologists work in school environments, helping kids with speech disorders. That’s like the antithesis of a mad scientist, right there.

And that’s why you should care. Linguistics helps us learn about ourselves and helps people, and what else could you ask for in a scientific discipline? (Okay, maybe explosions and mutant sharks, but do those things really help humanity?)

Mapping language, language maps

So for some reason, I’ve come across three studies in quick succession based on mapping language. Now, if you know me, you know that nattering on about linguistic methodology is pretty much the Persian cat to my Blofeld, but I really do think that looking at the way that linguists do linguistics is incredibly important. (Warning: the next paragraph will be kinda preachy, feel free to skip it.)

It’s something the field, to paint with an incredibly broad brush, tends to skimp on. After all, we’re asking all these really interesting questions that have the potential to change people’s lives. How is hearing speech different from hearing other things? What causes language pathologies and how can we help correct them? Can we use the voice signal to reliably detect Parkinson’s over the phone? That’s what linguistics is. Who has time to look at whether asking people to list the date on a survey form affects their responses? If linguists don’t use good, controlled methods to attempt to look at these questions, though, we’ll either find the wrong answers or miss the right ones completely because of some confounding variable we didn’t think about. Believe me, I know firsthand how heart-wrenching it is to design an experiment, run subjects, do your stats and end up with a big pile of useless goo because your methodology wasn’t well thought out. It sucks. And it happens way more than it needs to, mainly because a lot of linguistics programs don’t stress rigorous scientific training.

OK, sermon over. Maps! I think using maps to look at language data is a great methodology! Why?

FraMauroMap
Hmm… needs more data about language. Also the rest of the continents, but who am I to judge? 
  1. You get an end product that’s tangible and easy to read and use. People know what maps are and how to use them. Presenting linguistic data as a map rather than, say, a terabyte of detailed surveys or a thousand hours of recordings is a great way to make that same data accessible. Accessible data gets used. And isn’t that kind of the whole point?
  2. Maps are so accurate right now. This means that maps of data aren’t just rough approximations, they’re the best, most accurate way to display this information. Seriously, the stuff you can do with GIS is just mind blowing. (Check out this dialect map of the US. If you click on the region you’re most interested in, you get additional data like field recordings, along with the precise place they were made. Super useful.)
  3. Maps are fun. Oh, come on, who doesn’t like looking at maps? Particularly if you’re looking at a region you’re familiar with. See, here’s my high school, and the hay field we rented three years ago. Oh, and there’s my friend’s house! I didn’t realize they were so close to the highway. Add a second layer of information and BOOM, instant learning.

The studies

Two of the studies I came across were actually based on Twitter data. Twitter’s an amazing resource for studying linguistics because you have this enormous data set you can just use without having to get consent forms from every single person. So nice. Plus, because all tweets are archived, in the Library of Congress if nowhere else, other researchers can go back and verify things really easily.

This study looks at how novel slang expressions spread across the US. It hasn’t actually been published yet, so I don’t have the map itself, but they do talk about some interesting tidbits. For example: the places most likely to spawn new successful slang are urban centers with a high African American population.

The second Twitter study, based in London, looked at the different languages Londoners tweet in, and it did have a map:

Click for link to author’s blog post.

Interesting, huh? You can really get a good idea of the linguistic landscape of London. Although there were some potential methodological problems with this study, I still think it’s a great way to present this data.

The third study I came across is one that’s actually here at the University of Washington. This one is interesting because it kind of goes the other way. Basically, the researchers had respondents indicate areas on a map of Washington where they thought language communities existed and then had them describe them. So what you end up with is sort of a representation of the social ideas of what language is like in various parts of Washington state. Like so:

Click for link to study site.

There are lots more interesting maps on the study site, each of which shows some different perception of language use in Washington State. (My favorite is the one that suggests that people think other people who live right next to the Canadian border sound Canadian.)

So these are just a couple of the ways in which people are using maps to look at language data. I hope it’s a trend that continues.

Limitations on use of “[quality] as shit”

[Trigger warning: I’m going to write “shit” about a billion more times in this blog post because it is necessary to describe this linguistic observation. YOU HAVE BEEN WARNED.]

So every once in a while I notice something semantic about English that just blows my mind. I was making tea this morning and thinking about whether or not you could say that “That dress is bespoke as shit”. Why? Because I’m a linguist, but also because someone brought this cartoon to my attention again recently:

So, in the field of semantics, sitting around thinking about your intuitions about words is actually pretty solid methodology, so I’m going to do that. (I know, right? Not a single ultrasound or tracheal puncture? What do they do on Saturday nights?) Let’s compare the following sentences:

  1. That dress is bespoke as shit.
  2. His wardrobe is bespoke as shit.
  3. That dress is pink as shit.
  4. His wardrobe is pink as shit.

My intuition is that two and three are fine, four is… okay but a little weird and that one is downright wrong. And I also feel very strongly that the goodness of a given sentence where some quality of an object is modified by “as shit” is closely tied to whether or not that quality is on a continuous scale. (And, no, I’m not going to say “adjective” here. Mainly because you can also say “Her wardrobe is completely made out of sharks as shit.” And, in my universe, at least, “completely made out of sharks” doesn’t really count as an adjective.) Things that are on a continuous scale are like darkness. It can be a little dark or really dark or completely dark; there’s not really any point where you switch from being dark to light, right? And something that’s dark for me, like a starry night, might be light for a bat. “Pink”, and all colors, are continuous scales. (FUN FACT: how many color terms various languages have and why is a really big debate.) But things like “free” (as in costing zero dollars) are more discrete. Something’s either free or it’s not and there’s not really any middle ground.

The other thing you need to take into account is whether or not the thing being described is plural and whether it’s a mass or count noun. Mass nouns are things like “water”, “sand” or “bubblegum”. You can have less or more or some of these things, but you can’t count them. “I’ll have three water” just sounds really odd. Count nouns are things like “buckets of water”, “grains of sand” or “pieces of bubblegum”. These are things that have discrete, countable units instead of just a lump of mass. It’s a really useful distinction.

Ok, so how does this gel with my intuitions? And, more importantly, can I describe qualities in such a way that my description has predictive power? (Remember, linguistics is all about building testable models of language use!) I think I can. Let’s roll up our sleeves and get to the nitty-gritty. I’ve got two separate parts of the sentence that go into whether or not I can use “as shit”: the thing(s) being described, and the quality it has. The thing being described can be either singular or plural, and either mass or count. The quality it has can either be continuous or discrete. Let’s put this in outline form to make the possible different conditions a bit easier to see:

  • Thing being described
    • Is it singular? If yes, is it:
      • A mass noun? If so, assign condition 1.
      • A count noun? If so, assign condition 2.
    • Is it plural? If yes, is it:
      • A mass noun? If so, assign condition TRICK QUESTION, because that’s not possible. 😛
      • A count noun? If so, assign condition 3.
  • Qualities: continuous or discrete
    • Is it continuous? If so, assign condition A
    • Is it discrete? If so, assign condition B.

[What’s that, pseudocode? I thought you didn’t do “computer-y code-y math-y things”, Rachael.] Ok, so now we’ve got six possible conditions for a given sentence (1A, 2A, 3A, 1B, 2B and 3B). Which conditions can take “as shit” and why? (Keep in mind, this is just my intuition.)

  • 1A: “Water is big as shit.” = acceptable
  • 2A: “The dog is big as shit.” = acceptable
  • 3A: “The dogs are big as shit.” = acceptable
  • 1B: “Water is still as shit.” = unacceptable
  • 2B: “The dog is still as shit.” = unacceptable
  • 3B: “The dogs are still as shit.” = acceptable

Okay, so a little of my reasoning. I feel very strongly that “as shit” serves to intensify the adjective and you can’t intensify something that’s binary. The light switch is either on or off; it can’t be extremely on or extremely off. So all of the B conditions are bad… except for 3B. Why is 3B acceptable? Well, I get the sense that you’re not intensifying the qualities of each individual but talking about the group as a whole. And if you add up a bunch of binaries (three still dogs and one moving dog) you can get a value somewhere in the middle.
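
For anyone who does like computer-y code-y things, the outline and the six judgments above can be sketched in Python. This is only a minimal sketch of the informal model: the function names `condition` and `takes_as_shit` are made up for illustration, and the rule simply encodes the intuitions listed above, so treat it as a toy rather than a real account of English.

```python
# Toy version of the informal "as shit" model.
# All names here are hypothetical, invented for this illustration.

def condition(number, noun_type, scale):
    """Return a condition label such as '2A' for one sentence.

    number:    'singular' or 'plural'      (the thing being described)
    noun_type: 'mass' or 'count'           (the thing being described)
    scale:     'continuous' or 'discrete'  (the quality it has)
    """
    if number not in ("singular", "plural"):
        raise ValueError(f"unknown number: {number!r}")
    if noun_type not in ("mass", "count"):
        raise ValueError(f"unknown noun type: {noun_type!r}")
    if scale not in ("continuous", "discrete"):
        raise ValueError(f"unknown scale: {scale!r}")

    # The trick question from the outline: plural mass nouns are ruled out.
    if number == "plural" and noun_type == "mass":
        raise ValueError("plural mass nouns are not a possible condition")

    if number == "singular":
        thing = "1" if noun_type == "mass" else "2"
    else:
        thing = "3"
    quality = "A" if scale == "continuous" else "B"
    return thing + quality


def takes_as_shit(number, noun_type, scale):
    """Predict whether the sentence can take 'as shit' under this model."""
    label = condition(number, noun_type, scale)
    # Continuous qualities (A) can always be intensified. Discrete
    # qualities (B) only work for plural count nouns (3B), where the
    # group as a whole can land somewhere in the middle.
    return label.endswith("A") or label == "3B"


if __name__ == "__main__":
    # "The dogs are still as shit." is condition 3B.
    print(condition("plural", "count", "discrete"),
          takes_as_shit("plural", "count", "discrete"))
```

Calling `takes_as_shit("singular", "count", "discrete")` (condition 2B, “The dog is still as shit.”) comes back `False`, which matches the judgment in the list. The point of writing it down this way is that the model now makes predictions you could check against other speakers’ intuitions.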

But that’s just a really informal little model based on my intuitions and I feel like they’re getting screwed up because I’ve spent way too much time thinking about this. And now the tea that I was making is getting cold as shit, so I might as well go drink it.

What counts as a word?

A lot of us, as literate English speakers, have probably experienced that queasy moment of dread when you’re writing something on the computer and suddenly get a squiggly red line under a word you use all the time. You look at the suggested spellings… and none of them are the word you wanted. If you’re like me, at this point you hop online really quickly to make sure the word means what you thought it did and that you’re not butchering the spelling too horribly. Or maybe you turn to the dictionary you keep on your desk. Or maybe you turn to someone sitting next to you and ask “Is this a real word?”.

Latin dictionary
Oh, this? It’s just the pocket edition. The full one is three hundred volumes and comes with an elephant named George to carry it around your house. And it’s covered in gold. This edition is only bound in unicorn skin but it’s fine for a quick desk reference.
The underlying assumption behind the search to see if someone else uses the word is that, if they don’t, you can’t either. It’s not a “real word”. Which raises the question: what makes a word real? Is there a moment of Pinocchio-like transformation where the hollow wooden word someone created suddenly takes on life and joins the ranks of the English language to much back-slapping and cigar-handing from the other vetted words? Is there a little graduation party where the word gets a diploma from the OED and suddenly it’s okay to use it whenever you want? Or does it get hired by the spelling board and get to work right away?

OK, so that was getting a bit silly, but my point is that most people have the vague notion that there’s a distinction between “real” words and “fake” words that’s pretty hard and fast. Like most slang words and brand names are fake words. I like to call this the Scrabble distinction. If you can play it in Scrabble, it counts and you can put it in a paper or e-mail and no one will call you on it. If you can’t, it’s a fake word and you use it at your own risk. Dictionaries play a large part in determining which is which, right? The official Scrabble dictionary is pretty conservative: it doesn’t have “d’oh” in it, for example. But it’s also not without controversy. The first official Scrabble dictionary, for example, didn’t have “granola” in it, which the Oxford English Dictionary (the great grand-daddy of English dictionaries and probably the most complete record ever compiled of the lexicon of any language ever) notes was first used in 1886 and I think most of us would agree is a “real” word.

The line is even blurrier than that, though. English is a language with a long and rich written tradition. In some ways, that’s great. We’ve got a lot more information on how words used to be pronounced than we would have otherwise and a lot of diachronic information. (That’s information about how the language has changed over time. 😛 ) But if you’ve been exposed mainly to the English tradition, as I have, you tend to forget that writing isn’t inseparable from spoken language. They’re two different things and there are a lot of traditions that aren’t writing-based. Consider, for example, the Odù Ifá, an entirely oral divination text from Nigeria that sometimes gets compared to the Bible or the Qur’an. In the cultures I was raised in, the thought of a sacred text that you can’t read is strange, but that’s just part of the cultural lens that I see the world through; I shouldn’t project that bias onto other cultures.

So non-literary cultures still need to add words to their lexicons, right? But how do they know which words are “real” without dictionaries? It depends. Sometimes it just sort of happens organically. We see this in English too. Think about words associated with texting or IMing like “lol” or “brb” (that’s “laughing out loud” and “be right back” for those of you who are still living under rocks). I’ve noticed people saying these in oral conversations more and more and I wouldn’t be surprised if in fifty years “burb” started showing up in dictionaries. But even cultures which have only had writing systems for very short amounts of time have gatekeepers. Navajo, which has only been written since around 1940, is a great example. Peter Ladefoged shares the following story in Phonetic Data Analysis:

One of our former UCLA linguistics students who is a Navajo tells how she was once giving a talk in a Navajo community. She was showing how words could be put together to create new words (such as sweet + heart creates a word with an entirely new meaning). When she was explaining this an elder called out: ‘Stop this blasphemy! Only the gods can create words.’ The Navajo language is holy in a way that is very foreign to most of us (p. 13).

So in Navajo you have elders and religious leaders who are the guardians of the language and serve as the final authorities. (FUN FACT: “authority” comes from the same root as “author”. See how writing-dependent English is?) There are always gray areas though. Language is, after all, incredibly complex. I’ll leave you one case to think about.

“Rammaflagit.” That’s ɹæm.ə.flæʒ.ɪt in the International Phonetic Alphabet. (I remember how thrilled my dad was when I told him I was studying IPA in college.) I hear it all the time and it means something like “gosh darn it”, sort of a bowdlerized curse word. Real word or not? The dictionaries say “no”, but the people who I’ve heard using it would clearly say “yes”. What do you think?

Letters “r” lies, or why English spelling is horrible

If you’re like me and have vivid memories of learning to read English, you probably remember being deeply frustrated. As far as four-year-old-Rachael was concerned, math was nice and simple: two and two always, always equals four. Not sometimes. Not only when it felt like it. All the time. Nice and simple.

Reading, and particularly phonics, on the other hand, was a minefield of dirty tricks. Oh, sure, they told us that each letter represented a single sound, but even a kid knows that’s hooey. Cough? Bough? Come on, that was like throwing sand in a fight; completely unfair. And what about those vowels? What and cut rhyme with each other, not cut and put. Even as phonics training was increasing my phonemic awareness, pushing me to pay more attention to the speech sounds I made, English orthography (that’s our spelling system) was dragging me behind the ball-shed and pulling out my hair in clumps. Metaphorically.

Books that make literacy fun
“Oh man, they’re trying to tell us that A makes the ‘Aaahhh’ sound. What do they take us for, complete idiots? Or is that ‘whaahhht’ do they take us for?”
“I know, right? One-to-one correspondence? Complete rubbish!”
Of course, I did eventually pass third grade and gain mastery of the written English language. But it was an uphill battle all the way. Why? Because English orthography is retarded. Wait. I’m sorry. That’s completely unfair to individuals suffering from retardation. English orthography is spiteful, contradictory and completely unsuited to representing the second most widely-spoken second language. This poem really highlights the problem:

Recovering Sounds from Orthography

Brush up Your English

I take it you already know
Of tough and bough and cough and dough?
Others may stumble but not you
On hiccough, thorough, slough and through.
Well done! And now you wish perhaps,
To learn of less familiar traps?

Beware of heard, a dreadful word
That looks like beard and sounds like bird.
And dead, it’s said like bed, not bead-
for goodness’ sake don’t call it ‘deed’!
Watch out for meat and great and threat
(they rhyme with suite and straight and debt).

A moth is not a moth in mother,
Nor both in bother, broth, or brother,
And here is not a match for there,
Nor dear and fear for bear and pear,
And then there’s doze and rose and lose-
Just look them up- and goose and choose,
And cork and work and card and ward
And font and front and word and sword,
And do and go and thwart and cart-
Come, I’ve hardly made a start!
A dreadful language? Man alive!
I’d learned to speak it when I was five!
And yet to write it, the more I sigh,
I’ll not learn how ’til the day I die.

A dreadful language? Man alive! I mastered it when I was five.

— T.S. Watt (1954)

So why don’t we get our acts together and fix this mess? Well… trying to fix it is kind of the reason we’re in this mess in the first place. In Renaissance England we started out with a basically phonetic spelling system. You actually sounded out words and wrote them as they sounded. “Aks” instead of “ask”, for example. (For what it’s worth, “aks” is the original pronunciation.) And you would be writing by hand. On very expensive parchment with very expensive quills and ink for very rich people.

Enter the printing press. Suddenly we can not only produce massive amounts of literature, but everyone can access it. Spelling goes from being something that only really rich people and scribes care about to a popular phenomenon. And printing press owners were quick to capitalize on that phenomenon by printing spelling lists that showed the “correct” way to write words. Except there wasn’t a whole lot of agreement between the different printing houses and they were already so heavily invested in their own systems that they weren’t really willing to all switch over to a centralized system. By the time Samuel Johnson comes around to pin down every word of English like an entomologist in a field of butterflies, we have standardized spellings for most words… that all come from different systems developed by different people. And it’s just gotten more complex from there. One of the main reasons is that we keep shoving new words into the language without regard for how they’re spelled.

“The problem with defending the purity of the English language is that the English language is as pure as a crib-house whore. It not only borrows words from other languages; it has on occasion chased other languages down dark alley-ways, clubbed them unconscious and rifled their pockets for new vocabulary.”

― James Nicoll

There’s actually a sound in English, the zh sort of sound in “leisure”, that only exists in words we’ve “borrowed” from other languages and, of course, there’s no letter for it. Of course not; that would be too simple. And English detests simple. If you’re really interested in more of the gory details, there’s a great lecture you can listen to/watch here by Edwin Duncan which goes into way more detail on the historical background. Or you can just scroll through the Oxford English Dictionary and wince constantly.