Fun with ambiguity!

Ambiguity is fun. For example, yesterday my friends and I were talking about my uncle, who repairs robots. The conversation went something like this:

Me: Yeah, he’s a robot repair man. It’s a pretty good job.

Friend 1: How does one become a robot repair man?

Friend 2: Yeah, how did he become a robot?

Android
Oh, clearly you meant a robotic man who repairs things, not a man who repairs robots.
Now, because I’m not a normal person, I jotted down a note of this interesting ambiguity. You’ve probably noticed lots of instances like this, where a word can be interpreted in more than one way. But did you ever wonder why language has ambiguity at all? (A little note here: there is ambiguity on the word level and ambiguity on the sentence level. I’m talking about ambiguous words here, though I might come back and do phrases later on.)

Think about it this way: language’s primary purpose is to assist in communication. You would think that anything that got in the way of that purpose would be weeded out. I mean, yeah, languages evolve, but they evolve with conscious input from humans, so you’d think that we’d try to cut down on things that make communication harder. I mean, if you were designing a human, would you include the appendix? Ok, maybe you would. But my point is, ambiguity isn’t really helpful in communication. So why do we continue to use it?

Funnily enough, I’m not the first person to ask this question; it’s one that’s troubled linguists for a while. And there was a theory proposed in a recent article that I find particularly interesting. The authors argued that words that have more than one meaning (like how chips can be delicious and ruin your computer, or taste terrible and make your computer run) are generally words that are really easy to say.

You can think of words as having shapes that you have to trace in order to say them. A word that’s really easy to say, like mom, would be a circle. A word that’s harder to say, like Cryptonomicon, is going to be more like a five-pointed star. (A word that’s impossible to say, like lpdkn, would be like trying to draw a scale model of Mount Fuji in two dimensions: you can kind of get the general idea across, but you can’t produce it fully because it violates the rules of physics. Metaphorically.) When you’re just talking to friends, you want to use as many circles as possible. Because of that pressure, you’re going to use circles to represent tires and oranges and the sun, and trust that your friends can use context clues to figure out that you didn’t have tire juice for breakfast.

I tend to like this argument, because I’m of the opinion that laziness is one of the driving factors in language. (I’m less sure about another argument they make: that the primary purpose of language is not communication but organizing our thoughts. More on that later.) The main point is that ambiguity is an essential part of language and will remain so for the foreseeable future.

The Brothers Grimm and Their Phonology Habit

You’ve probably heard of the Brothers Grimm in conjunction with fairy tales. They were four-handedly responsible for popularizing most of the ones we know and love today. Well, popularizing them for those of us who live outside of the German countryside. If you’ve ever read or watched Sleeping Beauty, Snow White, Cinderella, Rapunzel or Rumpelstiltskin, you’ve got them to thank.

Walter Crane
Oh, you're a prince? Sorry, I'm holding out for a linguist.
But this is a linguistics blog, not a folklore blog, so why am I going on and on about these guys? Because they were also pretty awesome linguists. They were like the Galileo of linguistics, way ahead of their time and brilliant. They were so brilliant, they discovered something called Grimm’s law. Well, really it was Jacob who discovered it (hence the apostrophe placement) and it wasn’t called Grimm’s law at the time. It was just something that no one had ever thought to look for.

What was it?

Grimm’s law is the very first time we see a set of rules governing linguistic change. And that may sound kind of boring, but it was just as monumental as the discovery of calculus. (Was calculus more of a discovery or a development? Mhh, whatever.) It fundamentally changed the way that linguistics was done.

Basically, Jacob determined that, historically, certain sounds in Germanic languages (including German and English) had changed. And they hadn’t changed randomly. A had changed to B, which had changed to C, consistently across a whole set of languages and throughout each language. It would be like if three or four different countries, without talking about it, decided that purple was a better color for stop signs than red or bright green, and changed out all their stop signs. And then, when they were done, they decided that they really liked pink better and all changed to that.
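If you want to see that A-to-B-to-C idea in action, here’s a minimal Python sketch of a simplified Grimm’s law. It’s an illustration, not a reconstruction: the input spellings are toy stand-ins loosely modeled on Proto-Indo-European roots for “foot”, “heart” and “brother”, “ʰ” marks the breathy-voiced stops, “θ” is the “th” in “thin”, and I’m writing the shifted “k” as plain “h”. It also ignores real exceptions, like stops after “s” and Verner’s law.

```python
# Toy sketch of Grimm's law as one simultaneous mapping. Simplified on
# purpose: it ignores exceptions such as stops after "s" and Verner's law.
GRIMM = {
    # voiceless stops became fricatives
    "p": "f", "t": "θ", "k": "h",
    # voiced stops became voiceless
    "b": "p", "d": "t", "g": "k",
    # breathy-voiced stops became plain voiced stops
    "bʰ": "b", "dʰ": "d", "gʰ": "g",
}


def segments(word: str) -> list[str]:
    """Split a toy transcription into segments, keeping 'bʰ'-style pairs together."""
    out: list[str] = []
    for ch in word:
        if ch == "ʰ" and out:
            out[-1] += ch  # attach the aspiration mark to the preceding stop
        else:
            out.append(ch)
    return out


def apply_grimm(word: str) -> str:
    """Apply every change in a single pass, so no segment shifts twice."""
    return "".join(GRIMM.get(seg, seg) for seg in segments(word))


for form in ["ped", "kerd", "bʰrater"]:
    print(form, "->", apply_grimm(form))
# prints, one per line: ped -> fet, kerd -> hert, bʰrater -> braθer
```

Looking each segment up only once is the design choice that matters here: “bʰ” stops at “b” instead of sliding on to “p” and then “f”. If you applied the three sub-rules one after another, starting with bʰ to b, everything would over-shift, a bit like the stop-sign countries repainting the same signs twice in one afternoon.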

Why was this exciting? Well, unlike theories like “This word is fun to say because I think it is“, Grimm’s law is testable. You can go out and take a picture of some non-pink stop signs and use that evidence to argue against a law that ends with all stop signs now being pink. We have a theory (and a phonological theory!) that we can use empirical data to prove or disprove. It obviously took some time to be accepted as the standard practice, and for a long time, all anybody wanted to talk about was historical sound change and written texts. But, hey, once phonology was born, it was only a matter of time before it started saving the world.

Is linguistics a science?

Short answer: yes. Long answer: the rest of this post. Linguistics is a science, but there are some parts of linguistics that don’t really act the way people expect sciences to act, and that tends to confuse people.

Lab coats
Not necessary equipment for linguistics.
Before I go over why linguistics is a science, I think it’s worth saying that I’m not arguing (and I do mean arguing; there are linguists, some I know personally and some by reputation, who argue passionately that linguistics is not a science) that linguistics is a science because sciences are “better”. I’m arguing it because there is an inherent difference between how you do science and how you study the humanities. Your aims are different, and what you need to do to accomplish those aims is different. I’m arguing that the ultimate aims of linguistics are science-type and not humanities-type or plant-type, and therefore our methodology should match those aims.


That’s so meta meta meta

Today, I’m going to introduce you to two of my very good friends in linguistics: “metalinguistic” and “recursive“. They’re not that closely related, but they tend to get asked if they’re sisters a lot. Why?

Well, metalinguistic knowledge is knowing about language, and the fact that you can read this shows that you must have some metalinguistic knowledge. But this blog (and the field of linguistics as a whole) is concerned with knowing about what you know about language, i.e. meta-metalinguistic knowledge. And just by talking about that, I’m adding another level. My discussion of what we know about linguistics gets us all the way to meta-meta-metalinguistic knowledge. And by talking about that… You get the picture.

The picture looks like this.

The picture is also recursive. One of my favorite examples of recursivity is PHP. Originally, the acronym stood for “Personal Home Page”, but it now stands for “PHP: Hypertext Preprocessor”. What does the PHP in that stand for? Why, for “PHP: Hypertext Preprocessor”, of course. (Repeat ad nauseam, or at least ad getting-punched-in-the-arm.) Or, wait, maybe it’s cats looking at cats looking at cats looking at cats looking at cats…

So you can see how they’re related, right? They’re both all about making you feel dizzy and then fall down, or maybe puke if you get motion sickness.

But what you may not know about recursivity is that it’s a very important process in linguistics as well. How so, you might ask? Well, remember in the days of yore (yesterday was totally a day of yore) when I told you all about generativity? Recursivity is a great example of one of those generative processes. You can have a recursive sentence that just goes on forever. How about when you’re describing where you learned something?

I heard it from Jen.

Well, what if Jen heard it from someone else?

I heard it from Jen who heard it from Ian.

And then you find out that Ian wasn’t the originator either.

I heard it from Jen, who heard it from Ian, who heard it from Zach, who heard it from Nick, who heard it from Clarice…

And so on and so forth. You can pretty much keep going on infinitely. You can do it with other types of phrases too.

Get the butter from the fridge by the stove behind the water buffalo next to the peat coal kiln…
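Here’s a small Python sketch of that who-heard-it-from-whom chain. The function names are made up for illustration; the point is that one rule (“a name, optionally followed by ‘who heard it from’ and another chain”) calls itself, and that’s enough to build a sentence of any length.

```python
def chain(sources: list[str]) -> str:
    """Recursively nest 'who heard it from' clauses.

    Each call handles one speaker and hands the rest of the list back to
    itself. Expects at least one name.
    """
    head, *rest = sources
    if not rest:  # base case: the last person named
        return head
    return f"{head}, who heard it from {chain(rest)}"


def heard_it(sources: list[str]) -> str:
    return f"I heard it from {chain(sources)}."


print(heard_it(["Jen"]))
print(heard_it(["Jen", "Ian", "Zach", "Nick", "Clarice"]))
```

Adding one more name to the list adds one more embedded clause; nothing in the rule itself ever tells it to stop, only the length of the list does.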

Chomsky argued that recursion is the fundamental characteristic of human language, and this has been the cause of some debate. (Pirahã, which Daniel Everett has argued lacks recursion, may be the most argued-about non-Indo-European language ever.) So recursion has two main uses in linguistics. The first is as a generative process that allows speakers to form infinitely long sentences, and the second is to use language about using language about using language about using language about using language about using language about using language…

What can you do with a degree in linguistics?

This is obviously a question that I, as someone who’s going to shortly hold such a degree, get asked a lot. Fortunately, there are a lot of possible answers! I’m going to start with the obvious ones and then start surprising you.

From Linguist Llama.

Obvious answer #1: Get another degree in linguistics!

If you’re really in love with the subject, getting a doctorate and competing for the tiny number of teaching positions in the field is certainly an option. Imma be straight with you, though: it’s very, very hard work; very, very competitive and very, very low paying for the amount of specialized training you need. (A PhD usually takes between four and six years…if you manage to finish at all.) Oh, and did I mention that you’ll be expected to do original, groundbreaking research and consistently get it published in addition to your teaching load? Yeah… unless you’re 100% sure that’s what you want to do, you should probably keep reading.

Obvious answer #2: Teach computers how to language!

Do you like computers? Do you like linguistics? Do you like the thought of eventually having a job and making money? Holy balls of yarn, do I have a career for you. Super-high employment rates, cutting edge research, making all the best and newest toys… yeah. Plus, if you have a good background in both computer science and linguistics (a surprisingly large number of people only have a computer background) you’ll be a very competitive candidate.

Obvious answer #3: Help children and adults overcome speech problems!

If you’ve always wanted a career where you help people, you should look into Speech-Language Pathology. Sometimes, someone doesn’t acquire language correctly, or they develop a problem with language. Speech pathologists work with patients to help them acquire language or to relearn language. You’ll need at least a master’s, but most people find it to be a very rewarding career.

Obvious answer #4: Work as a translator!

So I wrote earlier about the difference between a linguist and a translator, but being a linguist can really help you with translation as well, particularly if you’re interested in working on bilingual dictionaries. Of course, demand for translators varies from language to language, and you do have to be fluent in at least two languages.

Obvious answer #5: Teach languages!

If you’re interested in teaching anyone to acquire a second language, whether it’s English or something else, having a linguistics background can be very, very helpful. Think back to any foreign language classes you might have taken. Wouldn’t it have been better if your teacher had been able to tell you exactly what you were supposed to be doing with your mouth, instead of vaguely telling you which letter the sound was like and then saying “You’re doing it wrong”? With a background in linguistics, you can really explain how things work in the second language, and that will really help your students.

So those are the biggies. You’ll need other skills for most of them, but linguistics will help you a lot. And, hey, linguistics classes are fun! But what other careers can linguistics help you with? Well…

Be a lawyer! A background in linguistics is actually a really strong choice for someone heading to law school. Why? Well, law is all about using language really, really carefully and communicating effectively. An academic background in linguistics will help you do that.

Make up languages! Now, this is a bit of a niche, but there is more than one person who has been paid for designing “alien” languages for films. You’ve heard of Na’vi and Klingon, I presume? They’re actually legit artificial languages with grammars and everything.

Write standardized tests! If you’re American, you’ve probably taken or will take the SAT at some point. Fun fact: most of those language-based questions were written by linguists, who know how to ask questions designed to get at very specific pieces of linguistic knowledge.

Do anything you like! Really, linguistics training gives you a great set of skills. You can analyze large sets of data, deduce the rules that would generate them and then write about them in a clear way. That’s a really useful thing to be able to do.

Talkin’ ’bout my generativity

Quick, who’s this guy:

I dunno... could be the front half of an old centaur?
Used under the Creative Commons Attribution 2.0 Generic license.

If you answered “Einstein’s less famous brother, Einbert?” you wouldn’t actually be too far from the truth. It’s Noam Chomsky. He’s so famous his name comes pre-installed in Microsoft Word’s spell checker. (Did you mean “chomp sky?”)

If you’ve got a good history or government background, you may be thinking, “Oh yeah, the anarchy guy.” He may be, but his greatest intellectual achievement has nothing to do with anarchy and everything to do with linguistics. That achievement would be generativity.

Gen-er-a-tiv-i-ty. Write it down, it will be on the test.

Generativity was a game-changer for linguistics. Before that point, linguistics was basically philology, which I’ve mentioned before. Philology is to modern linguistics what natural history is to modern biology. Philologists collected knowledge about languages haphazardly, without a whole lot of underlying theoretical structure. I mean, there was some (I’ll talk about what the Brothers Grimm did on their weekends off later), but it was pretty confined. And a lot of it, let’s be honest, was about proving that Europe was best. The monumental Oxford English Dictionary is a good example of that mindset. They wanted to collect every single word in the English language and pin it neatly to the page with a little series of notes about it and a list of sightings in the wild. It was, and remains, a grand undertaking and a staggering achievement… but modern linguists aren’t collectors anymore.

That’s because the end goal of modern linguistics is to solve language. The field is working to put together a series of rules that will actually describe and predict all human language. Not in the mind reader, fortune teller sense of predict. I mean that, with the right rules, we should be able to generate all possible sentences. In a generative way. By using generativity.
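To make “rules that generate sentences” a bit more concrete, here’s a toy grammar in Python. The grammar is invented for this example (loosely riffing on the butter-in-the-fridge sentence from the recursion post) and is nowhere near a real model of English; it just shows a finite rule set producing more and more sentences as you let the rules nest deeper.

```python
# Toy grammar, invented for illustration. Keys are categories; any symbol
# that isn't a key is treated as a literal word.
GRAMMAR = {
    "S": [["V", "NP"]],
    "NP": [["the", "N"], ["the", "N", "PP"]],  # second rule recurses via PP
    "PP": [["P", "NP"]],
    "V": [["get"]],
    "N": [["butter"], ["fridge"]],
    "P": [["by"]],
}


def expand(symbol, depth):
    """Yield every word list derivable from `symbol` within `depth` levels of rule nesting."""
    if symbol not in GRAMMAR:  # a literal word
        yield [symbol]
        return
    if depth == 0:  # out of budget: this branch generates nothing
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand_all(rhs, depth - 1)


def expand_all(symbols, depth):
    """Yield every concatenation of expansions for a sequence of symbols."""
    if not symbols:
        yield []
        return
    first, *rest = symbols
    for head in expand(first, depth):
        for tail in expand_all(rest, depth):
            yield head + tail


def sentences(depth):
    return [" ".join(words) for words in expand("S", depth)]


print(sentences(3))       # ['get the butter', 'get the fridge']
print(len(sentences(5)))  # 6
```

The depth limit is only there so the program terminates; the rules themselves never run out. With this grammar, depths 3, 5 and 7 give 2, 6 and 14 sentences, and the count keeps growing from there.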

So why is this important?

Lots of reasons! Here, let me list them, because lists are fun to read.

  • This turned linguistics from an interesting hobby for rich people into a science. If you have rules, you can make predictions about what those rules will produce and then test those predictions. Testing predictions is also known as science. It’s also something that linguistics as a whole has been a little… hesitant to adopt, but that’s another story.
  • Suddenly computers! Computer programming is, at its most basic level, a series of rules. Linguistics is now dedicated to producing a series of rules. Bada-bing, bada-boom, universal translator. (It doesn’t work that way, but, in theory, it eventually can.)
  • Now we have a framework that we can use to figure out how to ask questions. We have a goal. Things are organized.

Now for the promised test.

What term is used to describe the current goal of linguistics; i.e. to generate a set of rules that can accurately describe and predict language usage? (Seriously, I’m not going to give you the answer. Just scroll up.)

Indiscreet words, Part II: Son of Sounds

Ok, so in my last post about how the speech stream is far from discrete, I talked about how difficult it is to pick apart words. But I didn’t really talk that much about phonemes, and since I promised you phonetics and phonology and phun, I thought I should cover that. Besides, it’s super interesting.

It’s not just that language is continuous; it’s that language that’s discrete is actually impossible to understand. I ran across this YouTube video a while back that’s a great example of this phenomenon.

What the balls of yarn is he saying? It’s actually the preamble to the Constitution, but it took me well over half the video to pick up on it, and I spend a dumb amount of time listening to phonemes in isolation.

You probably find this troubling on some level. After all, you’re a literate person, and as a literate person you’re really, really used to thinking about words as being easy to break down into “letter sounds”. If you’ve ever tried to fiddle around with learning Mandarin or Cantonese, you know just how table-flippingly frustrating it is to memorize a writing system where the graphemes (smallest unit of writing, just as morpheme is the smallest unit of meaning, phoneme is the smallest unit of sound and dormeme is the smallest amount of space you can legally house a person in) have little reliable relation to the series of sounds they represent.

Fun fact: It’s actually pretty easy to learn to speak Mandarin or Cantonese once you get past the tones. They’re syntactically a lot like English, don’t have a lot of fussy agreement markers or grammatical gender and have a pretty small core vocabulary. It’s the characters that will make you tear your hair out.

Hm. Well, it kinda looks like me sitting on a chair hunched over my laptop while wearing a little hat and ARGH WHAT AM I DOING THAT LOOKS NOTHING LIKE A BIRD.

But. Um. Sorry, got a little off track there. Point was, you’re really used to thinking about words as being further segmented. Like oranges. Each orange is an individual, and then there are neat little segments inside the orange so you don’t get your hands sticky. And, because you’re already familiar with the spelling system of your language (which is, let’s face it, probably English), you probably have a fond idea that it’s pretty easy to divide words that way. But it’s not. If it were, things like instantaneous computational voice-to-voice translation would be common.

It’s hard because the edges of our sounds blur together like your aunt’s watercolor painting that you accidentally spilled lemonade on. So let’s say you’re saying “round”. Well, for the “n” you’re going to open up your nasal passages and put your tongue against the little ridge right behind your teeth. But wait! That’s where your tongue needs to be to make the “d” sound! To make it super clear, you should close off your nasal passages before you flick your tongue down and release that little packet of air that you were storing behind it. You’re totally not going to, though. I mean, your tongue’s already where you need it to be; why would you take the extra time to make sure your nasal passages are fully closed before releasing the “d”? That’s just a waste of time. And if you did it, you’d sound weird. So the “d” gets some of that nasally goodness and neither you nor your listener gives a flying Fluco.

But, if you’re a computer who’s been told, “If it’s got this nasal sound, it’s an ‘n’”, then you’re going to be super confused. Maybe you’ll be all like, “Um, ok. It kinda sounds like an ‘n’, but then it’s got that little pop of air coming out that I’ve been told to look for with the ‘p’, ‘b’, ‘t’, ‘d’, ‘k’, ‘g’ set… so… let’s go with ‘rounp’. That’s a word, right?” Obviously, this is a vast over-simplification, but you get my point; computers are easily confused by the smearing around of sounds in words. They’re getting better, but humans are still the best.
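Here’s a deliberately silly Python sketch of that kind of confusion. The “cues” are hypothetical labels made up for this example; real recognizers work from continuous acoustic measurements, not tidy feature sets. This decoder only accepts exact matches, so instead of guessing “rounp” it just gives up on the smeared “d”.

```python
# Hypothetical cue sets for each sound in "round" (illustration only).
TEMPLATES = {
    frozenset({"approximant", "rhotic"}): "r",
    frozenset({"vowel", "diphthong"}): "ou",
    frozenset({"nasal", "alveolar", "voiced"}): "n",
    frozenset({"stop", "alveolar", "voiced", "burst"}): "d",
}


def decode(segments):
    """Match each segment's cues exactly against the templates; '?' if none fits."""
    return "".join(TEMPLATES.get(frozenset(cues), "?") for cues in segments)


r = {"approximant", "rhotic"}
ou = {"vowel", "diphthong"}
n = {"nasal", "alveolar", "voiced"}
careful_d = {"stop", "alveolar", "voiced", "burst"}
# In running speech, the nasality of the "n" bleeds into the "d".
smeared_d = careful_d | {"nasal"}

print(decode([r, ou, n, careful_d]))  # round
print(decode([r, ou, n, smeared_d]))  # roun?
```

One extra cue is enough to break a rule that was written for carefully separated sounds, which is the whole problem in miniature.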

So just remember: when you’re around the robot overlords, be sure to run your phonemes together as much as possible. It might confuse them enough for you to have time to run away.

Indiscreet Words

All right, first I’d like to apologize for the title. The opposite of discrete is not indiscreet, but continuous, and continuous is what language, especially speech, is. By continuous, I mean that it doesn’t come out in separable chunks; it’s more like a stream of water than a stream of ice cubes. In fact, English itself discriminates between things that are discrete and continuous; discrete things are called count nouns because (gasp!) you can count them, and continuous things are called mass nouns. You can count ice cubes and words, but you can’t count water or language unless you assign them units.

“But wait,” I can hear you protest. “Language is discrete. I’m speaking in sentences that are made up of words that are made up of letters.” And you’re right. For you, your language is made up of units that are psychologically real to you. Somewhere between the speaker vocalizing the words and you parsing them, you segment them using the rules that you’ve mastered. It’s a deeply complex process and one that we still don’t completely understand. If we did, we’d be able to write speech recognition programs that wouldn’t give us errors like “the wells were gathered and planning” for “the walls were dark and clammy”. (True life. I got that very error not that long ago.)

Here, let’s look at some data. Here’s the waveform that shows the wave intensity, or loudness, of a native speaker of English saying “I am an elephant.”

Can you pick out the part of the speech signal for each of the words? Here, let me help you.

So… if speech really is discrete, wouldn’t you expect four separate bumps in loudness for the words, with silence in between? (Maybe with a couple extra bumps on the end for the laughter.)

Instead, what we get is pretty much a constant rush of noise that you rely on the vast amount of knowledge you have about your language to decode accurately. Take out that knowledge and you get something completely incomprehensible. And there’s a really easy way to show this: just listen to someone speaking a language you aren’t familiar with.

That’s Finnish and if you speak it well enough to understand everything he just said, I’d like to extend some mad props unto you; Finno-Ugric languages are as hard as ice-cream from a deep freezer. But to get back to the point, what observations can you make about what you just heard?

  • The speaker was speaking super-quickly.
  • There didn’t seem to be any pauses between words.
  • Basically, it was like standing in front of a language fire hose.

For people who don’t speak your native language, you sound very similar. They’re not speaking any more quickly in Hindi or Mandarin or Swahili or German than you are in English; you just don’t have a metalinguistic framework to help you cut the sound-stream into words, slap it up on a syntactic framework and yank meaning out of it.
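One way to see the point about the “I am an elephant” waveform: imagine a naive segmenter that assumes words are separated by silence and just counts the loud stretches. Here’s a Python sketch with made-up loudness values (not measurements from that recording). Fed idealized “discrete” speech, it finds four words; fed something shaped like real speech, it finds one long blob.

```python
def count_bumps(envelope, threshold):
    """Count runs of consecutive values above `threshold`, i.e. chunks separated by quiet."""
    bumps = 0
    inside = False
    for level in envelope:
        loud = level > threshold
        if loud and not inside:
            bumps += 1  # a new loud stretch starts after a quiet one
        inside = loud
    return bumps


# Invented loudness values for illustration only.
discrete = [0, 6, 7, 0, 0, 5, 0, 8, 6, 0, 7, 7, 5, 0]    # silence between words
continuous = [4, 6, 7, 5, 3, 5, 6, 8, 6, 4, 7, 7, 5, 3]  # no real gaps

print(count_bumps(discrete, threshold=2))    # 4
print(count_bumps(continuous, threshold=2))  # 1
```

Real speech looks much more like the second list, which is why you need knowledge of the language, not just a silence detector, to find the word boundaries.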

What happens when you lose your voice?

To figure out why you lose your voice, let’s start by covering what happens when your voice is acting normally.

All sound is vibration. Like a bunch of people standing in a loosely spaced crowd, air molecules are pretty much doing their own thing. Then the source of the sound, like a big bully, or maybe a bunch of bulls, pushes some people in the back of the crowd, and they push the people in front of them, etc. etc., until the last person jumps in your ear and bounces off your eardrum. Kinda like this:

So if all sound is vibration, something has to start it off. In a violin, it’s the friction between the bow and strings that causes the strings to vibrate. In a tuba or trumpet, it’s the vibration of the musician’s lips against the mouthpiece; if you just blow into a tuba without a proper embouchure (funny music-playing face), you’re not going to get any sound out of it. In you, it’s the vibration of your vocal cords that produces sound.

Aw, ain't it cute? The little vocal cords are so relaxed.
Uploaded by Samir at en.wikipedia and used here under the GFDL.

That’s them. But it’s actually a two-step process.

  • Step one: Tighten the vocal folds. This is like tuning a guitar; you can change the pitch of your voice based on how taut your vocal cords are. If you put your hand on your throat and sing a low note and a high one in quick succession, you can actually feel your muscle rotating as it adjusts the length of your vocal cords.
  • Step two: Vibrate those vocal folds. Now, you might think, based on step one, that you use your muscles to wiggle them back and forth really fast. Nope. You vibrate your vocal cords by blowing air through them. The more air, the louder the sound, the sooner you have to take a new breath.

So based on this, there are two possible ways to lose your voice. You can run out of air–which, unless you’ve had the breath knocked out of you, is a pretty straightforward problem to fix–or your muscles can crap out. And that’s generally why you lose your voice. The muscles in your larynx are just like any other muscles. If you use them hard enough, long enough, they’ll strain and, bam, you’ll lose your voice. Of course, this is just for run-of-the-mill I’ve-been-screaming-at-a-football-match type voice loss. Anything that messes with those muscles will cause you to lose your voice, and that can include things like aging, smoking (seriously, don’t smoke), damage to the larynx during surgery or even a tumor.

But unless you’re at risk for one of those things, your voice will come back once the strained muscles have had time to heal. In the meantime, I recommend carrying around a small whiteboard and whiteboard marker (It’s got good visibility, you can write easily and quickly, and you can write large enough that people not directly next to you can read it.) and learning how to finger spell.

How much do you talk?

You talk a lot. No, seriously. Even if you’re not a chatty person. You, as a human being who has not taken a vow of silence, transmit a lot of information.

Let me break it down for you. I recently did a large chunk of transcription, looking at speech data from four different people. I took a random two-minute sample from each of those transcriptions, and they spoke 282, 257, 386 and 357 words in that time, for an average of around 160 words per minute. None of the people were talking faster than what I consider a normal rate, and I live in the South, where speaking rates are lower than they are in, say, California. But let’s pretend that this is your normal speaking rate.
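If you want to check the arithmetic, here’s the average worked out in Python, using only the four counts above.

```python
# Word counts from the four two-minute samples described above.
word_counts = [282, 257, 386, 357]
sample_minutes = 2

total_words = sum(word_counts)                     # 1282
total_minutes = sample_minutes * len(word_counts)  # 8
words_per_minute = total_words / total_minutes
print(f"{words_per_minute:.2f} words per minute")  # 160.25 words per minute
```

Per speaker, that works out to 141, 128.5, 193 and 178.5 words per minute, so the 160 figure is a pooled average over people who vary quite a bit.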

Let’s put this in perspective.

Say you’re one of those brave souls who does NaNoWriMo, and you try to write a 50,000-word novel in a month. If you were writing your novel as fast as you speak, you’d finish in a little over five hours. That’s right. Every five hours you speak, you produce enough words to fill a book. Of course, you don’t spend five hours a day talking at full tilt, but even so, most people speak around 16,000 words a day. (The link is a Scientific American summary of the paper in question.)
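Same idea for the NaNoWriMo comparison, sketched in Python with the 160 words per minute estimate and the 16,000 words per day figure. Dividing the novel by the daily figure is an extra step I’m adding, not something from the study.

```python
NOVEL_WORDS = 50_000
WORDS_PER_MINUTE = 160   # the average estimated from the four samples
WORDS_PER_DAY = 16_000   # the daily figure from the study mentioned above

minutes = NOVEL_WORDS / WORDS_PER_MINUTE  # 312.5
hours = minutes / 60                      # about 5.21
days = NOVEL_WORDS / WORDS_PER_DAY        # 3.125
print(f"{hours:.2f} hours of nonstop talking, or {days:.3f} ordinary days")
```

So even at a normal, non-marathon pace, you talk a novel’s worth of words in a little over three days.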

If you’re a hacker, you might be a little confused at the “words per minute” figure. (In other languages and for other purposes linguists tend to use morphemes, syllables, or phonemes, and measure them by the minute, second, or even hour.) The unit milliLampson sometimes pops up:

milliLampson /mil’*-lamp`sn/ /n./ A unit of talking speed, abbreviated mL. Most people run about 200 milliLampsons. The eponymous Butler Lampson (a CS theorist and systems implementor highly regarded among hackers) goes at 1000. A few people speak faster. This unit is sometimes used to compare the (sometimes widely disparate) rates at which people can generate ideas and actually emit them in speech. For example, noted computer architect C. Gordon Bell (designer of the PDP-11) is said, with some awe, to think at about 1200 mL but only talk at about 300; he is frequently reduced to fragments of sentences as his mouth tries to keep up with his speeding brain.

Yeah… it’s cute, but you’re not really going to see it cropping up in linguistics literature. My guess would be, based on the speaking rate, that a milliLampson is loosely based on words per minute, probably based on Californian speakers (maybe even from, gasp! UC Berkeley), and then inflated to folkloric proportions. But it’s a great example of the type of misinformation that’s out there. Take this for example:

I love infographics. I do not love misinformation.
Image taken from infographic by Medical Billing and Coding (which can be found at http://www.medicalbillingandcoding.org/life-summed-up/) for educational purposes. Don't you feel educated? Always hunt down the citations for these random numbers that people vomit at you.

The fine folks at Medical Billing and Coding may have listed their sources, but I’m afraid one of their sources was wrong. Let’s take a look at this 73 million figure. I will even do all the arithmetic for you. I know, I know, I’m a peach.

Ok, so, let’s assume that their 18,140 figure is right, and that our 160 words/minute figure is right. In that case, we’ve got 18,140 hours per life × 60 minutes per hour × 160 words per minute; do the multiplication and cancel out all the units super nicely, and we come up with 174,144,000 words per life. That’s nearly 2.4 times as many as they predicted. Or, hey, since a little more math can’t hurt, let’s assume 75 speaking years per life × 365 days per year × 16,000 words per day and we come up with 438,000,000 words per life. And since I’m far more likely to trust the data from the article published in Science than my own little two-bit estimation, it looks like this infographic is wrong by a factor of 6.
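And here are both lifetime estimates in Python, so you can poke at the assumptions. The 18,140 hours and 73 million words come from the infographic, and the 75 speaking years is the same rough guess as above; change any of them and see how far the factor moves.

```python
INFOGRAPHIC_WORDS = 73_000_000  # the infographic's lifetime figure
HOURS_TALKING = 18_140          # the infographic's hours-per-life figure, taken at face value
WORDS_PER_MINUTE = 160          # estimate from the transcription samples
SPEAKING_YEARS = 75             # rough assumption; ignores leap days
WORDS_PER_DAY = 16_000          # figure from the study in Science

by_rate = HOURS_TALKING * 60 * WORDS_PER_MINUTE  # 174,144,000
by_day = SPEAKING_YEARS * 365 * WORDS_PER_DAY    # 438,000,000

print(f"{by_rate:,} words, {by_rate / INFOGRAPHIC_WORDS:.1f}x the infographic")
print(f"{by_day:,} words, {by_day / INFOGRAPHIC_WORDS:.1f}x the infographic")
```

The first line prints 174,144,000 words at 2.4x and the second 438,000,000 words at 6.0x, matching the hand calculation.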

What’s even more amazing, though, is that if you wrote down every single one of those words, it would be as long as 402.5 editions of Proust’s In Search of Lost Time, the longest novel ever written. Like I said, you talk a lot.