Talkin’ ’bout my generativity

Quick, who’s this guy:

I dunno... could be the front half of an old centaur?
Used under the Creative Commons Attribution 2.0 Generic license.

If you answered “Einstein’s less famous brother, Einbert?” you wouldn’t actually be too far from the truth. It’s Noam Chomsky. He’s so famous his name comes pre-installed in Microsoft Word’s spell checker. (Did you mean “chomp sky?”)

If you’ve got a good history or government background, you may be thinking, “Oh yeah, the anarchy guy.” He may be, but his greatest intellectual achievement has nothing to do with anarchy and everything to do with linguistics. That achievement would be generativity.

Gen-er-a-tiv-i-ty. Write it down, it will be on the test.

Generativity was a game-changer for linguistics. Before that point, linguistics was basically philology, which I’ve mentioned before. Philology is to modern linguistics what naturalism is to modern biology. Philologists collected knowledge about languages haphazardly, without a whole lot of underlying theoretical structure. I mean, there was some (I’ll talk about what the Brothers Grimm did on their weekends off later), but it was pretty confined. And a lot of it, let’s be honest, was about proving that Europe was best. The monumental Oxford English Dictionary is a good example of the collecting mindset. Its editors wanted to collect every single word in the English language and pin it neatly to the page with a little series of notes about it and a list of sightings in the wild. It was, and remains, a grand undertaking and a staggering achievement… but modern linguists aren’t collectors anymore.

That’s because the end goal of modern linguistics is to solve language. The field is working to put together a series of rules that will actually describe and predict all human language. Not in the mind-reader, fortune-teller sense of “predict.” I mean that, with the right rules, we should be able to generate all the possible sentences of a language, and only those sentences. In a generative way. By using generativity.
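To make “generate” a little more concrete, here’s a toy sketch in Python. The grammar, its symbols (S, NP, VP and friends) and its handful of words are all made up for illustration. Real grammars are vastly bigger and include recursion (a sentence can contain another sentence), which is exactly why a finite set of rules can produce an infinite number of sentences.

```python
from itertools import product

# A made-up toy grammar. Each key is a symbol that can be rewritten as any one
# of the listed sequences; anything that isn't a key is an actual word.
GRAMMAR = {
    "S":   [["NP", "VP"]],           # a sentence is a noun phrase plus a verb phrase
    "NP":  [["Det", "N"]],           # e.g. "the bird"
    "VP":  [["Vt", "NP"], ["Vi"]],   # "sees the bird" or just "sleeps"
    "Det": [["the"], ["a"]],
    "N":   [["linguist"], ["bird"]],
    "Vt":  [["sees"]],               # transitive verb: needs an object
    "Vi":  [["sleeps"]],             # intransitive verb: no object
}


def generate(symbol="S"):
    """Yield every word sequence the grammar can build from `symbol`."""
    if symbol not in GRAMMAR:        # a word: nothing left to rewrite
        yield [symbol]
        return
    for expansion in GRAMMAR[symbol]:
        # Expand each child symbol, then try every combination of the results.
        options = [list(generate(child)) for child in expansion]
        for parts in product(*options):
            yield [word for part in parts for word in part]


sentences = [" ".join(words) for words in generate()]
print(len(sentences))   # 20
print(sentences[0])     # the linguist sees the linguist
```

Seven rules, twenty sentences, and no word salad like “bird the sleeps,” because the rules decide what comes out. Nobody had to write the sentences down one by one.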

So why is this important?

Lots of reasons! Here, let me list them, because lists are fun to read.

  • This turned linguistics from an interesting hobby for rich people into a science. If you have rules, you can make predictions about what those rules will produce and then test those predictions. Testing predictions is also known as science. It’s also something that linguistics as a whole has been a little… hesitant to adopt, but that’s another story.
  • Suddenly computers! Computer programming is, at its most basic level, a series of rules. Linguistics is now dedicated to producing a series of rules. Bada-bing, bada-boom, universal translator. (It doesn’t work that way, but, in theory, it eventually can.)
  • Now we have a framework that we can use to figure out how to ask questions. We have a goal. Things are organized.

Now for the promised test.

What term is used to describe the current goal of linguistics; i.e. to generate a set of rules that can accurately describe and predict language usage? (Seriously, I’m not going to give you the answer. Just scroll up.)

Indiscreet words, Part II: Son of Sounds

Ok, so in my last post about how the speech stream is far from discrete, I talked about how difficult it is to pick apart words. But I didn’t really talk that much about phonemes, and since I promised you phonetics and phonology and phun, I thought I should cover that. Besides, it’s super interesting.

It’s not just that language is continuous; it’s that language that’s discrete is actually impossible to understand. I ran across this YouTube video a while back that’s a great example of this phenomenon.

What the balls of yarn is he saying? It’s actually the Preamble to the Constitution, but it took me well over half the video to pick up on it, and I spend a dumb amount of time listening to phonemes in isolation.

You probably find this troubling on some level. After all, you’re a literate person, and as a literate person you’re really, really used to thinking about words as being easy to break down into “letter sounds”. If you’ve ever tried to fiddle around with learning Mandarin or Cantonese, you know just how table-flippingly frustrating it is to memorize a writing system where the graphemes (the smallest unit of writing, just as the morpheme is the smallest unit of meaning, the phoneme is the smallest unit of sound and the dormeme is the smallest amount of space you can legally house a person in) have little obvious relation to the series of sounds they represent.

Fun fact: It’s actually pretty easy to learn to speak Mandarin or Cantonese once you get past the tones. They’re syntactically a lot like English, don’t have a lot of fussy agreement markers or grammatical gender and have a pretty small core vocabulary. It’s the characters that will make you tear your hair out.

Hm. Well, it kinda looks like me sitting on a chair hunched over my laptop while wearing a little hat and ARGH WHAT AM I DOING THAT LOOKS NOTHING LIKE A BIRD.

But. Um. Sorry, got a little off track there. Point was, you’re really used to thinking about words as being further segmented. Like oranges. Each orange is an individual, and then there are neat little segments inside the orange so you don’t get your hands sticky. And because you’re already familiar with the spelling system of your language (which is, let’s face it, probably English), you probably have a fond idea that it’s pretty easy to divide words that way. But it’s not. If it were, things like instantaneous computational voice-to-voice translation would be common.

It’s hard because the edges of our sounds blur together like your aunt’s watercolor painting that you accidentally spilled lemonade on. So let’s say you’re saying “round”. Well, for the “n” you’re going to open up your nasal passages (by lowering your soft palate) and put your tongue against the little ridge right behind your teeth. But wait! That’s where your tongue needs to be to make the “d” sound! To make it super clear, you should close off your nasal passages before you flick your tongue down and release that little packet of air that you were storing behind it. You’re totally not going to, though. I mean, your tongue’s already where you need it to be; why would you take the extra time to make sure your nasal passages are fully closed before releasing the “d”? That’s just a waste of time. And if you did it, you’d sound weird. So the “d” gets some of that nasally goodness and neither you nor your listener gives a flying Fluco.

But if you’re a computer who’s been told, “If it’s got this nasal sound, it’s an ‘n’,” then you’re going to be super confused. Maybe you’ll be all like, “Um, ok. It kinda sounds like an ‘n’, but then it’s got that little pop of air coming out that I’ve been told to look for with the ‘p’, ‘b’, ‘t’, ‘d’, ‘k’, ‘g’ set… so… let’s go with ‘rounp’. That’s a word, right?” Obviously, this is a vast oversimplification, but you get my point: computers are easily confused by the smearing around of sounds in words. They’re getting better, but humans are still the best.
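Here’s that confused computer as a deliberately silly Python sketch. The two yes/no features (“nasal” and “burst”) and the one-feature rules are hypothetical simplifications invented to mirror the story above. Real speech recognizers work on continuous acoustic measurements and statistical models, not tidy rules like these.

```python
# Hypothetical, drastically simplified "features" for one chunk of sound.
# Real recognizers measure continuous acoustics instead of yes/no flags.

def candidate_labels(segment):
    """Apply naive one-feature rules and return every label that fires."""
    labels = []
    if segment["nasal"]:
        labels.append("n")                              # rule: nasal sound means 'n'
    if segment["burst"]:
        labels.extend(["p", "b", "t", "d", "k", "g"])   # rule: a pop of air means a stop
    return labels


clean_d = {"nasal": False, "burst": True}
# The "d" at the end of "round": the pop of air, plus nasality smeared over from the "n".
smeared_d = {"nasal": True, "burst": True}

print(candidate_labels(clean_d))    # ['p', 'b', 't', 'd', 'k', 'g']
print(candidate_labels(smeared_d))  # ['n', 'p', 'b', 't', 'd', 'k', 'g']
```

Even the clean “d” only narrows things down to six stops; the smeared one matches both rules at once, and nothing in the rules says which one to trust. That’s the “rounp” problem in miniature.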

So just remember: when you’re around the robot overlords, be sure to run your phonemes together as much as possible. It might confuse them enough for you to have time to run away.

Indiscreet Words

All right, first I’d like to apologize for the title. The opposite of discrete is not indiscreet, but continuous, and continuous is what language, especially speech, is. By continuous, I mean that it doesn’t come out in separable chunks; it’s more like a stream of water than a stream of ice cubes. In fact, English itself discriminates between things that are discrete and continuous; discrete things are called count nouns because (gasp!) you can count them, and continuous things are called mass nouns. You can count ice cubes and words, but you can’t count water or language unless you assign them units.

“But wait,” I can hear you protest. “Language is discrete. I’m speaking in sentences that are made up of words that are made up of letters.” And you’re right. For you, your language is made up of units that are psychologically real to you. Somewhere between the speaker vocalizing the words and you parsing them, you segment them using the rules that you’ve mastered. It’s a deeply complex process and one that we still don’t completely understand. If we did, we’d be able to write speech recognition programs that wouldn’t give us errors like “the wells were gathered and planning” for “the walls were dark and clammy”. (True life. I got that very error not that long ago.)

Here, let’s look at some data. Here’s the waveform that shows the wave intensity, or loudness, of a native speaker of English saying “I am an elephant.”

Can you pick out the part of the speech signal for each of the words? Here, let me help you.

So… if speech really is discrete, wouldn’t you expect four separate bumps in loudness for the words, with silence in between? (Maybe with a couple extra bumps on the end for the laughter.)

Instead, what we get is pretty much a constant rush of noise, which you decode accurately by relying on the vast amount of knowledge you have about your language. Take out that knowledge and you get something completely incomprehensible. And there’s a really easy way to show this: just listen to someone speaking a language you aren’t familiar with.

That’s Finnish and if you speak it well enough to understand everything he just said, I’d like to extend some mad props unto you; Finno-Ugric languages are as hard as ice-cream from a deep freezer. But to get back to the point, what observations can you make about what you just heard?

  • The speaker was speaking super-quickly.
  • There didn’t seem to be any pauses between words.
  • Basically, it was like standing in front of a language fire hose.

For people who don’t speak your native language, you sound very similar. They’re not speaking any more quickly in Hindi or Mandarin or Swahili or German than you are in English, you just don’t have a metalinguistic framework to help you cut the sound-stream into words, slap it up on a syntactic framework and yank meaning out of it.
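If you want to see why “just look for the gaps” doesn’t work, here’s a minimal Python sketch of a naive word-finder that chops a loudness curve wherever it drops below a threshold. Both loudness curves are invented numbers, not measurements: one is the idealized “four bumps with silence in between” we might expect for “I am an elephant,” and the other is closer to the constant rush of noise you actually get. The 0.1 threshold is an arbitrary assumption.

```python
def segment_by_silence(envelope, threshold=0.1):
    """Split a loudness curve into (start, end) runs that stay above `threshold`.

    A naive word segmenter: it assumes words are separated by quiet gaps.
    """
    segments = []
    start = None
    for i, amp in enumerate(envelope):
        if amp > threshold and start is None:
            start = i                              # a loud stretch begins
        elif amp <= threshold and start is not None:
            segments.append((start, i))            # a quiet sample ends it
            start = None
    if start is not None:
        segments.append((start, len(envelope)))    # still loud when the signal ends
    return segments


# Invented loudness values, one per time step. Not real speech data.
idealized = [0.0, 0.8, 0.9, 0.0, 0.7, 0.0, 0.6, 0.8, 0.0, 0.9, 0.5, 0.0]
continuous = [0.0, 0.8, 0.9, 0.4, 0.7, 0.3, 0.6, 0.8, 0.2, 0.9, 0.5, 0.0]

print(segment_by_silence(idealized))   # [(1, 3), (4, 5), (6, 8), (9, 11)]
print(segment_by_silence(continuous))  # [(1, 11)]
```

On the idealized curve it finds four “words”; on the realistic one it finds a single long blob, which is roughly how that Finnish clip sounds if you don’t speak Finnish.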

What happens when you lose your voice?

To figure out why you lose your voice, let’s start by covering what happens when your voice is acting normally.

All sound is vibration. Like a bunch of people standing in a loosely spaced crowd, air molecules are pretty much doing their own thing. Then the source of the sound, like a big bully, or maybe a bunch of bulls, pushes some people in the back of the crowd, and they push the people in front of them, etc. etc., until the last person jumps in your ear and bounces off your eardrum. Kinda like this:

So if all sound is vibration, something has to start it off. In a violin, it’s the friction between the bow and strings that causes the strings to vibrate. In a tuba or trumpet, it’s the vibration of the musician’s lips against the mouthpiece; if you just blow into a tuba without a proper embouchure (funny music-playing face), you’re not going to get any sound out of it. In you, it’s the vibration of your vocal cords that produces sound.

Aw, ain't it cute? The little vocal cords are so relaxed.
Uploaded by Samir at en.wikipedia and used here under the GFDL.

That’s them. But it’s actually a two-step process.

  • Step one: Tighten the vocal folds. This is like tuning a guitar; you can change the pitch of your voice based on how taut your vocal cords are. If you put your hand on your throat and sing a low note and a high one in quick succession, you can actually feel your muscle rotating as it adjusts the length of your vocal cords.
  • Step two: Vibrate those vocal folds. Now, you might think, based on step one, that you use your muscles to wiggle them back and forth really fast. Nope. You vibrate your vocal cords by blowing air through them. The more air, the louder the sound, the sooner you have to take a new breath.

So based on this, there are two possible ways to lose your voice. You can run out of air–which, unless you’ve had the breath knocked out of you, is a pretty straightforward problem to fix–or your muscles can crap out. And that’s generally why you lose your voice. The muscles in your larynx are just like any other muscles. If you use them hard enough, long enough, they’ll strain and, bam, you’ll lose your voice. Of course, this is just for run-of-the-mill I’ve-been-screaming-at-a-football-match type voice loss. Anything that messes with those muscles will cause you to lose your voice, and that can include things like aging, smoking (seriously, don’t smoke), damage to the larynx during surgery or even a tumor.

But unless you’re at risk for one of those things, your voice will come back once the strained muscles have had time to heal. In the meantime, I recommend carrying around a small whiteboard and whiteboard marker and learning how to fingerspell. A whiteboard has good visibility, you can write on it easily and quickly, and you can write large enough that people not directly next to you can read it.

How much do you talk?

You talk a lot. No, seriously. Even if you’re not a chatty person. You, as a human being who has not taken a vow of silence, transmit a lot of information.

Let me break it down for you. I recently did a large chunk of transcription, looking at speech data from four different people. I took a random two-minute sample from each of those transcriptions, and they spoke 282, 257, 386 and 357 words in that time, for an average of around 160 words per minute. None of the people were talking faster than what I consider a normal rate, and I live in the South, where speaking rates are lower than they are in, say, California. But let’s pretend that this is your normal speaking rate.
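If you want to check the math, here’s the average as a few lines of Python, using the word counts above. It’s the total words over the total minutes, which is the same as averaging the four per-minute rates, since every sample is the same length.

```python
# Words spoken in each two-minute sample (the counts from the transcriptions above).
word_counts = [282, 257, 386, 357]
minutes_per_sample = 2

total_words = sum(word_counts)                          # 1282 words
total_minutes = minutes_per_sample * len(word_counts)   # 8 minutes
words_per_minute = total_words / total_minutes

print(words_per_minute)  # 160.25
```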

Let’s put this in perspective.

Say you’re one of those brave souls who does NaNoWriMo, and you try to write a 50,000-word novel in a month. If you were writing your novel as fast as you speak, you’d finish in a little over five hours. That’s right. Every five hours you speak, you produce enough words to fill a book. Of course, you don’t spend five hours a day talking at full tilt, but even so, most people speak around 16,000 words a day. (The link is to a Scientific American summary of the paper in question.)
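Here’s the NaNoWriMo math in Python, assuming you could keep up the 160 words per minute from above without stopping. The 16,000-words-a-day figure gives a slower but still impressive pace, so that’s worked out here too.

```python
NOVEL_WORDS = 50_000        # NaNoWriMo target
WORDS_PER_MINUTE = 160      # the average speaking rate from the samples above
WORDS_PER_DAY = 16_000      # typical daily total

minutes_nonstop = NOVEL_WORDS / WORDS_PER_MINUTE   # 312.5 minutes
hours_nonstop = minutes_nonstop / 60               # a little over five hours
days_at_normal_pace = NOVEL_WORDS / WORDS_PER_DAY  # about three days of ordinary talking

print(f"{hours_nonstop:.1f} hours nonstop, or {days_at_normal_pace:.1f} days of normal chatter")
# 5.2 hours nonstop, or 3.1 days of normal chatter
```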

If you’re a hacker, you might be a little confused at the “words per minute” figure. (In other languages and for other purposes linguists tend to use morphemes, syllables, or phonemes, and measure them by the minute, second, or even hour.) The unit milliLampson sometimes pops up:

milliLampson /mil’*-lamp`sn/ /n./ A unit of talking speed, abbreviated mL. Most people run about 200 milliLampsons. The eponymous Butler Lampson (a CS theorist and systems implementor highly regarded among hackers) goes at 1000. A few people speak faster. This unit is sometimes used to compare the (sometimes widely disparate) rates at which people can generate ideas and actually emit them in speech. For example, noted computer architect C. Gordon Bell (designer of the PDP-11) is said, with some awe, to think at about 1200 mL but only talk at about 300; he is frequently reduced to fragments of sentences as his mouth tries to keep up with his speeding brain.

Yeah… it’s cute, but you’re not really going to see it cropping up in linguistics literature. My guess, from the speaking rate, would be that a milliLampson is loosely based on words per minute, probably calibrated on Californian speakers (maybe even, gasp!, from UC Berkeley), and then inflated to folkloric proportions. But it’s a great example of the type of misinformation that’s out there. Take this, for example:

I love infographics. I do not love misinformation.
Image taken from an infographic by Medical Billing and Coding, used for educational purposes. Don't you feel educated? Always hunt down the citations for these random numbers that people vomit at you.

The fine folks at Medical Billing and Coding may have listed their sources, but I’m afraid one of their sources was wrong. Let’s take a look at this 73 million figure. I will even do all the arithmetic for you. I know, I know, I’m a peach.

Ok, so, let’s assume that their 18,140 figure is right, and that our 160 words/minute figure is right. In that case, we’ve got 18,140 hours per life x 60 minutes per hour x 160 words per minute; do the multiplication, cancel out all the units super nicely, and we come up with 174,144,000 words per life. That’s about 2.4 times as many as they predicted. Or, hey, since a little more math can’t hurt, let’s assume 75 speaking years per life x 365 days per year x 16,000 words per day, and we come up with 438,000,000 words per life. And since I’m far more likely to trust the data from the article published in Science than my own little two-bit estimation, it looks like this infographic is wrong by a factor of 6.
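And here’s that arithmetic in Python, so you don’t have to take my word for it. The 18,140 hours and 73 million words are the infographic’s figures; 160 words a minute and 16,000 words a day are the ones from earlier in this post; the 75 speaking years is the same rough assumption as in the paragraph above.

```python
INFOGRAPHIC_WORDS = 73_000_000   # the infographic's lifetime total

# Estimate 1: the infographic's own 18,140 hours of talking, at 160 words per minute.
from_hours = 18_140 * 60 * 160          # hours x minutes/hour x words/minute

# Estimate 2: 75 speaking years at 16,000 words per day.
from_days = 75 * 365 * 16_000           # years x days/year x words/day

print(f"{from_hours:,} words, {from_hours / INFOGRAPHIC_WORDS:.1f} times the infographic")
print(f"{from_days:,} words, {from_days / INFOGRAPHIC_WORDS:.1f} times the infographic")
# 174,144,000 words, 2.4 times the infographic
# 438,000,000 words, 6.0 times the infographic
```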

What’s even more amazing, though, is that if you wrote down every single one of those words, it would be as long as 402.5 editions of Proust’s In Search of Lost Time, the longest novel ever written. Like I said, you talk a lot.

Phrenology != Phonology

This is not linguistics. It is, however, pretty cool. Photo taken by Flickr user Uncle Catherine, and used under the Creative Commons Attribution 2.5 Generic license.

Linguistics is a huge field. It includes everything from the algorithms behind Siri to preserving endangered languages like flies in amber to reconstructing dead languages. (Unlike biologists, we don’t have to worry about an undead T-Rex wandering around if things go terribly, terribly wrong.) Since it’s just one intrepid girl linguist here at Making Noise and Hearing Things, I’m going to have to restrict myself to just a single set of sub-disciplines. These are:

      • Psycholinguistics: Like a zombie valiantly trying to overcome his crippling aphasia, psycholinguistics is all about language and brains. Since I’m all about sound, you’ll probably be getting a lot of stuff about brains and sound.
      • Phonology: Often confused with phrenology (no, seriously, this happens to me all the time), it’s the study of the systems of rules languages apply to their sounds. Here’s a quick example: say “dogs” and “cats”. Is the “s” on the end of both of those words the same? Try saying them again with your hand right above your Adam’s apple, or where your Adam’s apple would be. When you say the “s” on “dogs” you should feel a slight buzzing, like you’ve swallowed a bee. The “s” on “cats”, though, doesn’t have it. Whether or not the final “s” has buzzing in it (linguists call it “voicing”) is determined by a simple rule in English: you get vibration on the final “s” if the sound before it had it. The “g” sound in “dog” has vibration; the “t” sound in “cat” doesn’t.
      • Phonetics: This is the study of sounds themselves. Phonology is all like, “Oh, yeah, that was voicing.” Phonetics is all like, “Sure, but how much voicing? How long did it last? How much air came out?” Phonetics wants to know all the dirty details. Phonetics takes videos like this one, where you can see the vocal folds vibrating in slow motion. [[WARNING: If you are prone to nightmares of terrors from beyond space, you might want to skip this one. Just saying.]]
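The “dogs”/“cats” rule from the phonology bullet is simple enough to write down as code. Here’s a minimal Python sketch that takes the last sound of a word and picks a buzzing “z” or a plain “s” to match. The sound symbols are made-up ASCII stand-ins, and the sketch only covers that simple rule: words ending in hissy sounds like “bus” or “judge” add an extra vowel (“buses,” “judges”), so the code refuses to guess on those.

```python
# Made-up ASCII stand-ins for English sounds; "th" here is the voiceless one in "myth".
VOICELESS = {"p", "t", "k", "f", "th"}
# Hissy sounds take an extra vowel ("buses", "judges"), which the simple rule doesn't cover.
SIBILANTS = {"s", "z", "sh", "zh", "ch", "j"}


def plural_ending(final_sound):
    """Copy the voicing of the word's last sound onto the plural ending."""
    if final_sound in SIBILANTS:
        raise ValueError(f"'{final_sound}'-final words need the extra-vowel rule")
    # Voiceless sound before it: plain "s". Anything voiced (vowels, "g", "b", ...): buzzing "z".
    return "s" if final_sound in VOICELESS else "z"


print(plural_ending("g"))   # z, as in "dogs"
print(plural_ending("t"))   # s, as in "cats"
```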

But, yeah, those are the biggies. I can’t promise I won’t be branching out from these sub-disciplines, but I can promise an extremely low frequency of syntax posts. (Low frequency! Get it? Because… sound… um. Never mind.)

You Are a Linguist

Unless you have a degree in linguistics or are working as a translator (not the same thing, but I’ll get to that later), you probably read the title of this post and immediately thought “No, I’m not.” Trust me, you are. How do I know? Well, a linguist is someone who does two things:

  1. Makes claims about language
  2. Attempts to either verify or disprove these claims (whether they made them or someone else did).

That’s it. There’s no secret cabal of linguists you have to join, you don’t have to speak thirty languages, and you certainly don’t have to have a PhD. I think if you start paying attention, you’ll notice that you do this all the time. Have you ever had a conversation like this?

Lulu: She talks slow.

Max: Really? I’ve never noticed it.

Or maybe one like this:

Lulu: We go to the zoo.

Max: Don’t you mean we’re going to the zoo?

Lulu: No, we go to the zoo all the time.

Max: But we’re going today, so you could have said that “we’re going”.

Lulu: Yeah, but that’s not what I meant.

Bam. You’re a linguist; go you! “But wait a minute,” you say, “I know for a fact that translators are called linguists. Are you saying that just speaking another language doesn’t make you a linguist? Because that’s what I’ve always heard.”

Man, you’ve got the linguistics bug bad. Look at you, bringing up fine semantic distinctions! (Semantics is the study of how words map onto meaning, BTW.) And you’re absolutely right, a linguist can also be someone who speaks more than one language. The Oxford English Dictionary, the most complete record of the English language, defines a linguist as, first:

“One who is skilled in the use of languages; one who is master of other tongues besides his own. (Often with adj. indicating the degree or extent of the person’s skill.)”

And only later as:

“A student of language; a philologist.”

Philology is what the very beginnings of the modern study of language were called. These days, most people prefer the term “linguistics”, and use “philology” only for a particular field of study within linguistics. For the purposes of this blog and most academic settings, a linguist is not someone who knows languages, but someone who knows about languages. And since knowing a language also automatically means you know about a language (if you’re a native English speaker, you can easily identify where people are from based on their accent, for example), you, sir or madam or other, are a linguist.