What sounds can you feel but not hear?

I got a cool question from Veronica the other day: 

Which wavelength would someone use not to hear a sound, but to feel it on the body as a vibration?

So this would depend on two things. The first is your hearing ability. If you’ve got no or limited hearing, most of your interaction with sound will be tactile. This is one of the reasons why many Deaf individuals enjoy going to concerts; if the sound is loud enough you’ll be able to feel it even if you can’t hear it. I’ve even heard stories about folks who will take balloons to concerts to feel the vibrations better. In this case, it doesn’t really depend on the pitch of the sound (how high or low it is), just the volume.

But let’s assume that you have typical hearing. In that case, the relationship between pitch, volume and whether you can hear or feel a sound is a little more complex. This is due to something called “frequency response”. Basically, the human ear is better tuned to hearing some pitches than others. We’re really sensitive to sounds in the upper ranges of human speech (roughly 2,000 to 4,000 Hz). (The lowest pitch in the vocal signal can actually be much lower [down to around 80 Hz for a really low male voice], but it’s less important to be able to hear it because that frequency is also reflected in harmonics up through the entire pitch range of the vocal signal. Most telephones only transmit signals between 300 Hz and 3400 Hz, for example, and it’s only really the cut-off at the upper end of the range that causes problems, like making it hard to tell the difference between “sh” and “s”.)
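If you’re curious what that telephone band-limiting looks like in practice, here’s a minimal sketch in Python (assuming NumPy and SciPy are available; the toy “voice” signal and the filter order are my own choices for illustration, not anything standardized):

```python
# A minimal sketch of telephone-style band-limiting: keep roughly 300-3400 Hz
# of a toy signal and throw the rest away. All signal choices are illustrative.
import numpy as np
from scipy.signal import butter, sosfilt

fs = 16000                      # sampling rate in Hz
t = np.arange(0, 1.0, 1 / fs)   # one second of audio

# A toy "voice": a 100 Hz fundamental plus a handful of its harmonics.
signal = sum(np.sin(2 * np.pi * f * t) for f in (100, 200, 300, 1000, 3000, 5000))

# 4th-order Butterworth band-pass over the telephone band.
sos = butter(4, [300, 3400], btype="bandpass", fs=fs, output="sos")
telephone_signal = sosfilt(sos, signal)

# The 100 Hz fundamental and the 5 kHz energy (where "s"-like hiss lives) are
# heavily attenuated, but the harmonics inside the band survive, which is why
# pitch is still recoverable over the phone even though the fundamental isn't.
```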

[Image: Weighting curves]
An approximation of human frequency response curves. The line basically shows where we hear things as “equally loud”. So a 100 Hz sound (like a bass guitar) needs to be ten times as loud as a 1000 Hz sound (like a flute or piccolo) to sound equally loud.

The takeaway from all this is that we’re not super good at hearing very low sounds. That means they can be very, very loud before we pick up on them. If the sound is low enough and loud enough, then the only way we’ll be able to sense it is by feeling it.

How low is low enough? Most people can’t really hear anything much below 20 Hz (like the lowest note on a really big organ). The older you are and the more you’ve been exposed to really loud noises in that range, like bass-heavy concerts or explosions, the less you’ll be able to pick up on those really low sounds.

What about volume? My guess for what would be “sufficiently loud”, in this case, is 120+ dB. 120 dB is as loud as a rock concert, and it’s possible, although difficult and expensive, to get out of a home speaker set-up. If you have a neighbor listening to really bass-y music or watching action movies with a lot of low, booming sound effects on really expensive speakers, it’s perfectly possible that you’d feel those vibrations rather than hearing them. Especially if there are walls between the speakers and you: while mid and high frequency sounds are pretty easy to muffle, low-frequency sounds are much more difficult to soundproof against.
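For a sense of scale, here’s a quick back-of-the-envelope sketch in Python of what 120 dB actually means in terms of air pressure, using the standard 20 µPa reference for dB SPL (the example levels are just for comparison):

```python
# Convert a sound pressure level in dB SPL to pressure in pascals,
# using the standard definition L = 20 * log10(p / p0) with p0 = 20 micropascals.
p0 = 20e-6   # reference pressure in pascals (roughly the threshold of hearing at 1 kHz)

def spl_to_pressure(level_db):
    """Return the sound pressure in pascals for a given level in dB SPL."""
    return p0 * 10 ** (level_db / 20)

print(spl_to_pressure(120))   # 20.0 Pa -- a million times the reference pressure
print(spl_to_pressure(60))    # 0.02 Pa -- roughly conversational speech
```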

Are there any health risks? The effects of exposure to these types of low-frequency noise are actually something of an active research question. (You may have heard about the “brown note”, for example.) You can find a review of some of that research here. One comforting note: if you are exposed to a very loud sound below the frequencies you can easily hear, even if it’s loud enough to cause permanent damage at much higher frequencies, it’s unlikely that you will suffer any permanent hearing loss. That doesn’t mean you shouldn’t ask your neighbor to turn down the volume, though; for their ears if not for yours!


Great Ideas in Linguistics: Consonants and Vowels

Consonants and vowels are among the handful of linguistics terms that have managed to escape the cage of academic discourse to make their nest in the popular consciousness. Everyone knows what the difference between a vowel and a consonant is, right? Let’s check super quick. Pick the option below that best describes a vowel:

  • Easy! It’s A, E, I, O, U and sometimes Y.
  • A speech sound produced without constriction of the vocal tract above the glottis.

Everyone got the second one, right? No? Huh, maybe we’re not on the same page after all.

There are two problems with the “andsometimesY” definition of vowels. The first is that it’s based on the alphabet and, as I’ve discussed before, English has a serious problem when it comes to mapping sounds onto letters in a predictable way. (It gives you the very false impression that English has six-ish vowels when it really has twice that many.) The second is that it isn’t really a good way of modelling what a vowel actually is. If we got a new letter in the alphabet tomorrow, zborp, we’d have no principled way of determining whether it was a vowel or not.

[Image: Letter dice d6]
Ah, a new letter is it? Time to get out the old vowelizing dice and re-roll. “Letter dice d6”. Licensed under CC BY-SA 3.0 via Wikimedia Commons.

But the linguistic definition captures some other useful qualities of vowels as well. Since vowels don’t have a sharp constriction, you get acoustic energy pretty much throughout the entire spectrum. Not all frequencies are created equal, however. In vowels, the shape of the vocal tract creates pockets of more concentrated acoustic energy. We call these “formants” and they’re so stable between repetitions of vowels that they can be used to identify which vowel it is. In fact, that’s what you’re using to distinguish “beat” from “bet” from “bit” when you hear them aloud. They’re also easy to measure, which means that speech technologies rely really heavily on them.
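To make the “formants identify the vowel” idea concrete, here’s a minimal sketch in Python. The reference F1/F2 values are rough textbook-style averages for an adult male American English speaker, and the nearest-neighbour matching is a deliberate simplification of what real speech technology does:

```python
# A minimal sketch of vowel identification from formants: compare a measured
# (F1, F2) pair against rough reference values and pick the closest vowel.
# The reference numbers are approximate averages and purely illustrative.
import math

REFERENCE_FORMANTS = {
    "i (beat)": (270, 2290),
    "ɪ (bit)":  (390, 1990),
    "ɛ (bet)":  (530, 1840),
    "æ (bat)":  (660, 1720),
    "ɑ (pot)":  (730, 1090),
    "u (boot)": (300,  870),
}

def classify_vowel(f1, f2):
    """Return the reference vowel whose (F1, F2) pair is closest to the input."""
    return min(
        REFERENCE_FORMANTS,
        key=lambda v: math.dist((f1, f2), REFERENCE_FORMANTS[v]),
    )

print(classify_vowel(280, 2250))   # -> "i (beat)"
print(classify_vowel(520, 1800))   # -> "ɛ (bet)"
```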

Another quality of vowels is that, since the whole vocal tract has to unkink itself (more or less), they tend to take a while to produce. And that same openness means that not much of the energy produced at the vocal folds is absorbed. In simple terms, this means that vowels tend to be longer and louder than other sounds, i.e. consonants. This creates a neat little one-two where vowels are both easier to produce and easier to hear. As a result, languages tend to prefer to have quite a lot of vowels, and to tack consonants on to them. This tendency shakes out into a robust pattern across languages where you’ll get one or two consonants, then a vowel, then a couple consonants, then a vowel, and so on. You’ve probably run across the term linguists use for those little vowel-nuggets: we call them syllables.
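Just to make that consonant-vowel rhythm visible, here’s a toy sketch in Python; the one-symbol-per-sound “transcriptions” and the tiny vowel inventory are made up for illustration, and real syllabification is considerably messier than counting vowel nuclei:

```python
# Reduce a toy, one-letter-per-sound transcription to its consonant/vowel
# skeleton and count syllables by counting vowel nuclei. Purely illustrative.
VOWELS = set("aeiouɑɛɪɔʊʌ")

def cv_skeleton(phones):
    """Map each phone to 'C' or 'V'."""
    return "".join("V" if p in VOWELS else "C" for p in phones)

def count_syllables(phones):
    """Count syllables as the number of vowel nuclei."""
    return cv_skeleton(phones).count("V")

print(cv_skeleton("banana"), count_syllables("banana"))   # CVCVCV 3
print(cv_skeleton("strɪkt"), count_syllables("strɪkt"))   # CCCVCC 1
```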

If you stick with the “andsometimesY” definition, though, you lose out on including those useful qualities. It may be easier to teach to five-year-olds, but it doesn’t really capture the essential vowelyness of vowels. Fortunately, the linguistic definition does.

The Motor Theory of Speech Perception

Ok, so like I talked about in my previous two posts, modelling speech perception is an ongoing problem with a lot of hurdles left to jump. But there are potential candidate theories out there, all of which offer good insight into the problem. The first one I’m going to talk about is motor theory.

[Image: Clamp-Type 2C1.5-4 Motor]
So your tongue is like the motor body and the other person’s ear is like the load cell…
So motor theory has one basic premise and three major claims. The basic premise is a keen observation: we don’t just perceive speech sounds, we also make them. Whoa, stop the presses. Ok, so maybe it seems really obvious, but motor theory was really the first major attempt to model speech perception that took this into account. Up until it was first posited in the 1960s, people had pretty much been ignoring that and treating speech perception as if the only information listeners had access to was what was in the acoustic speech signal. We’ll discuss that in greater detail later, but it’s still pretty much the way a lot of people approach the problem. I don’t know of a piece of voice recognition software, for example, that includes an anatomical model.

So what does the fact that listeners are also speakers get you? Well, remember how there aren’t really invariant units in the speech signal? If you decide that what people are actually perceiving isn’t a collection of acoustic markers that point to one particular language sound but instead the gestures needed to make that sound, then suddenly that’s much less of a problem. To put it another way, we’re used to thinking of speech as being made up of a bunch of sounds, and that when we’re listening to speech we’re deciding what the right sounds are and from there picking the right words. But from a motor theory standpoint, what you’re actually doing when you’re listening to speech is deciding what the speaker’s doing with their mouth and using that information to figure out what words they’re saying. So in the dictionary in your head, you don’t store words as strings of sounds but rather as strings of gestures.

If you’re like me when I first encountered this theory, it’s about this time that you’re starting to get pretty skeptical. I mean, I basically just said that what you’re hearing is the actual movement of someone else’s tongue and figuring out what they’re saying by reverse engineering it based on what you know your tongue is doing when you say the same word. (Just FYI, when I say tongue here, I’m referring to the entire vocal tract in its multifaceted glory, but that’s a bit of a mouthful. Pun intended. 😉 ) I mean, yeah, if we accept this it gives us a big advantage when we’re talking about language acquisition–since if you’re listening to gestures, you can learn them just by listening–but still. It’s weird. I’m going to need some convincing.

Well, let’s get back to those three principles I mentioned earlier, which are taken from Galantucci, Fowler and Turvey’s excellent review of motor theory.

  1. Speech is a weird thing to perceive and pretty much does its own thing. I’ve talked about this at length, so let’s just take that as a given for now.
  2. When we’re listening to speech, we’re actually listening to gestures. We talked about that above. 
  3. We use our motor system to help us perceive speech.

Ok, so point three should jump out at you a bit. Why? Of these three points, it’s the easiest one to test empirically. And since I’m a huge fan of empirically testing things (Science! Data! Statistics!) we can look into the literature and see if there’s anything that supports this. Like, for example, a study that shows that when listening to speech, our motor cortex gets all involved. Well, it turns out that there are lots of studies that show this. You know that term “active listening”? There’s pretty strong evidence that it’s more than just a metaphor; listening to speech involves our motor system in ways that not all acoustic inputs do.

So point three is pretty well supported. What does that mean for point two? It really depends on who you’re talking to. (Science is all about arguing about things, after all.) Personally, I think motor theory is really interesting and addresses a lot of the problems we face in trying to model speech perception. But I’m not ready to swallow it hook, line and sinker. I think Robert Remez put it best in the proceedings of Modularity and The Motor Theory of Speech Perception:

For one, I think it is clear that Motor Theory is false. For the other, I think the evidence indicates no less that Motor Theory is essentially, fundamentally, primarily and basically true. (p. 179)

On the one hand, it’s clear that our motor system is involved in speech perception. On the other, I really do think that we use parts of the acoustic signal in and of themselves. But we’ll get into that in more depth next week.

Why is it hard to model speech perception?

So this is a kick-off post for a series of posts about various speech perception models. Speech perception models, you ask? Like, attractive people who are good at listening?

[Image: Romantic fashion model]
Not only can she discriminate velar, uvular and pharyngeal fricatives with 100% accuracy, but she can also do it in heels.
No, not really. (I wish that was a job…) I’m talking about a scientific model of how humans perceive speech sounds. If you’ve ever taken an introductory science class, you already have some experience with scientific models. All of Newton’s equations are just a way of generalizing principles across many observed cases. A good model has both explanatory and predictive power. So if I say, for example, that force equals mass times acceleration, then that should fit with any data I’ve already observed as well as accurately describe new observations. Yeah, yeah, you’re saying to yourself, I learned all this in elementary school. Why are you still going on about it? Because I really want you to appreciate how complex this problem is.
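As a toy illustration of explanatory and predictive power, here’s a minimal sketch in Python (with NumPy; the “measurements” are invented) that fits F = ma to some observed data and then makes a prediction about a new case:

```python
# Fit F = m * a to some invented observations of one object, then use the
# fitted mass to predict the force needed for a new acceleration.
import numpy as np

# Invented measurements: accelerations (m/s^2) and the forces (N) we observed.
accelerations = np.array([1.0, 2.0, 3.0, 4.0])
forces = np.array([2.1, 3.9, 6.2, 7.8])

# Least-squares estimate of the mass, the model's single free parameter.
mass = np.sum(forces * accelerations) / np.sum(accelerations ** 2)
print(f"estimated mass: {mass:.2f} kg")            # about 2 kg (explains the data)

# Prediction for a new observation: force needed for an acceleration of 5 m/s^2.
print(f"predicted force at 5 m/s^2: {mass * 5:.1f} N")
```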

Let’s take an example from an easier field, say, classical mechanics. (No offense, physicists, but y’all know it’s true.) Imagine we want to model something relatively simple. Perhaps we want to know whether a squirrel who’s jumping from one tree to another is going to make it. What do we need to know? And none of that “assume the squirrel is a sphere and there’s no air resistance” stuff, let’s get down to the nitty-gritty. We need to know the force and direction of the jump, the locations of the trees, how close the squirrel needs to get to be able to hold on, what the wind’s doing, air resistance and how that will interact with the shape of the squirrel, the effects of gravity… am I missing anything? I feel like I might be, but that’s most of it.

So, do you notice something that all of these quantities have in common? Yeah, that’s right: they’re easy to measure directly. Need to know what the wind’s doing? Grab your anemometer. Gravity? To the accelerometer closet! How far apart the trees are? It’s yardstick time. We need a value, we measure a value, we develop a model with good predictive and explanatory power (you’ll need to wait for your simulations to run on your department’s cluster, but here’s one I made earlier so you can see what it looks like, mmmm, delicious!) and we clean up playing the numbers on the professional squirrel-jumping circuit.
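And once everything is measured, the model itself is the easy part. Here’s a minimal sketch in Python of the squirrel calculation under exactly the kind of simplifications I just told you not to make (no air resistance, point-mass squirrel); all the numbers are invented:

```python
# A deliberately simplified model of the squirrel jump: projectile motion with
# no air resistance and no squirrel shape. Every number below is made up.
import math

g = 9.81                  # gravitational acceleration, m/s^2
speed = 5.0               # launch speed, m/s (measured from slow-motion video, say)
angle = math.radians(30)  # launch angle above horizontal
gap = 3.0                 # horizontal distance to the target branch, m
drop = 0.5                # how far below the launch point the target branch sits, m
reach = 0.15              # how close (m) the squirrel needs to get to grab on

vx = speed * math.cos(angle)
vy = speed * math.sin(angle)

# Time to cross the horizontal gap, and the height change at that moment.
t = gap / vx
height_change = vy * t - 0.5 * g * t ** 2

# Does the squirrel arrive within grabbing distance of the branch (at -drop)?
makes_it = abs(height_change + drop) <= reach
print(f"height change at the far tree: {height_change:.2f} m, makes it: {makes_it}")
```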

Let’s take a similarly simple problem from the field of linguistics. You take a person, sit them down in a nice anechoic chamber*, plop some high quality earphones on them and play a word that could be “bite” and could be “bike” and ask them to tell you what they heard. What do you need to know to decide which way they’ll go? Well, assuming that your stimulus is actually 100% ambiguous (which is a little unlikely), there are a ton of factors you’ll need to take into account. Like, how recently and how often has the subject heard each of the words before? (Priming and frequency effects.) Are there any social factors which might affect their choice? (Maybe one of the participant’s friends has a severe overbite, so they just avoid the word “bite” altogether.) Are they hungry? (If so, they’ll probably go for “bite” over “bike”.) And all of that assumes that they’re a native English speaker with no hearing loss or speech pathologies and that the voice they’re hearing matches their own dialect, because all of that’ll bias the listener as well.
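Here’s a deliberately silly sketch in Python of what a model of that decision might look like; every factor, weight and number in it is invented, which is rather the point: none of these quantities comes with a handy measuring instrument.

```python
# A toy sketch of combining perception-biasing factors into a probability of
# reporting "bite" rather than "bike". Factors and weights are invented.
import math

def prob_bite(recent_bite_primes, log_freq_ratio, social_avoidance, hunger):
    """Logistic combination of (made-up) biasing factors.

    recent_bite_primes: how many times "bite" was heard recently
    log_freq_ratio:     log(frequency of "bite" / frequency of "bike") for this listener
    social_avoidance:   1 if the listener avoids the word "bite", else 0
    hunger:             0 (just ate) to 1 (starving)
    """
    score = (
        0.6 * recent_bite_primes
        + 0.8 * log_freq_ratio
        - 1.5 * social_avoidance
        + 0.9 * hunger
    )
    return 1 / (1 + math.exp(-score))

print(prob_bite(recent_bite_primes=2, log_freq_ratio=0.1, social_avoidance=0, hunger=0.8))
print(prob_bite(recent_bite_primes=0, log_freq_ratio=0.1, social_avoidance=1, hunger=0.0))
```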

The best part? All of this is incredibly hard to measure. In a lot of ways, human language processing is a black box. We can’t mess with the system too much, and taking it apart to see how it works, in addition to being deeply unethical, breaks the system. The best we can do is tap a hammer lightly against the side and use the sounds of the echoes to guess what’s inside. And, no, brain imaging is not a magic bullet for this. It’s certainly a valuable tool that has led to a lot of insights, but in addition to being incredibly expensive (MRI is easily more than a grand per participant, and no one has ever accused linguistics of being a field that rolls around in money like a dog in fresh-cut grass), we really need to resist the urge to rely too heavily on brain imaging studies, as a certain dead salmon taught us.

But! Even though it is deeply difficult to model, there has been a lot of really good work done towards a theory of speech perception. I’m going to introduce you to some of the main players, including:

  • Motor theory
  • Acoustic/auditory theory
  • Double-weak theory
  • Episodic theories (including Exemplar theory!)

Don’t worry if those all look like menu options in an Ethiopian restaurant (and you with your Amharic phrasebook at home, drat it all); we’ll work through them together. Get ready for some mind-bending, cutting-edge stuff in the coming weeks. It’s going to be [fʌn] and [fʌnetɪk]. 😀

*Anechoic chambers are the real chambers of secrets.

Why do I really, really love West African languages?

So I found a wonderful free app that lets you learn Yoruba, or at least Yoruba words, and posted about it on Google+. Someone asked a very good question: why am I interested in Yoruba? Well, I’m not interested just in Yoruba. In fact, I would love to learn pretty much any West African language or, to be a little more precise, any Niger-Congo language.

[Image: Map of the Niger-Congo languages]
This map’s color choices make it look like a chocolate-covered ice cream cone.
Why? Well, not to put too fine a point on it, I’ve got a huge language crush on them. Whoa there, you might be thinking, you’re a linguist. You’re not supposed to make value judgments on languages. Isn’t there like a linguist code of ethics or something? Well, not really, but you are right. Linguists don’t usually make value judgments on languages. That doesn’t mean we can’t play favorites! And West African languages are my favorites. Why? Because they’re really phonologically and phonetically interesting. I find the sounds and sound systems of these languages rich and full of fascinating effects and processes. Since that’s what I study within linguistics, it makes sense that that’s a quality I really admire in a language.

What are a few examples of Niger-Congo sound systems that are just mind blowing? I’m glad you asked.

  • Yoruba: Yoruba has twelve vowels. Seven of them are pretty common (we have all but one in American English), but if you say five of them nasally, they’re different vowels. And if you say a nasal vowel when you’re not supposed to, it’ll change the entire meaning of a word. Plus? They don’t have a ‘p’ or an ‘n’ sound. That is crazy sauce! Those are some of the most widely-used sounds in human language. And Yoruba has a complex tone system as well. You probably have some idea of the level of complexity that can add to a sound system if you’ve ever studied Mandarin or another East Asian language. Seriously, their sound system makes English look childishly simplistic.
  • Akan: There are several different dialects of Akan, so I’ll just stick to talking about Asante, which is the one used in universities and for official business. It’s got a crazy consonant system. Remember how Yoruba didn’t have an “n” sound? Yeah, in Akan they have nine. To an English speaker they all pretty much sound the same, but if you grew up speaking Akan you’d be able to tell the difference easily. Plus, most sounds other than “p”, “b”, “f” or “m” can be made while rounding the lips (linguists call this “labialization”), and the rounded versions are completely different sounds. They’ve also got a vowel harmony system, which means you can’t have vowels later in a word that are completely different from vowels earlier in the word (there’s a little sketch of how that works right after this list). Oh, yeah, and tones and a vowel nasalization distinction and some really cool tone terracing. I know, right? It’s like being a kid in a candy store.
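Here’s that little sketch in Python of the vowel-harmony idea. The two vowel sets follow the usual textbook description of Akan’s ATR (advanced tongue root) harmony, simplified to ignore the special behaviour of “a”, and the example strings are toy transcriptions rather than real Akan words:

```python
# A toy sketch of ATR vowel harmony: all the vowels in a well-behaved word
# should come from the same set. Simplified for illustration only.
ATR_PLUS  = set("ieou")    # advanced tongue root vowels (simplified)
ATR_MINUS = set("ɪɛɔʊa")   # unadvanced tongue root vowels (simplified)

def obeys_harmony(word):
    """True if every vowel in the word is drawn from a single harmony set."""
    vowels = {ch for ch in word if ch in ATR_PLUS | ATR_MINUS}
    return vowels <= ATR_PLUS or vowels <= ATR_MINUS

print(obeys_harmony("kofi"))    # True: o and i are both +ATR
print(obeys_harmony("kɔfɪ"))    # True: ɔ and ɪ are both -ATR
print(obeys_harmony("kofɪ"))    # False: the sets are mixed
```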

But how did these languages get so cool? Well, there’s some evidence that these languages have really robust and complex sound systems because the people speaking them never underwent large-scale migration to another continent. (Obviously, I can’t ignore the effects of colonialism or the slave trade, but the pattern is still pretty robust.) Which is not to say that, say, Native American languages don’t have awesome sound systems; they just tend to be slightly smaller on average.

Now that you know how kick-ass these languages are, I’m sure you’re chomping at the bit to hear some of them. Your wish is my command; here’s a song in Twi (a dialect of Akan) from one of my all-time favorite musicians: Sarkodie. (He’s making fun of Ghanaian emigrants who forget their roots. Does it get any better than biting social commentary set to a sick beat?)

How to pronounce the “th” sound in English

Or, as I like to call it, hunting the wild Eth and Thorn (which are old letters that can be difficult to typeset), because back in the day, English had the two distinct “th” sounds represented differently in its writing system. There was one where you vibrated your vocal folds (that’s called “voiced”), which was written as “ð”, and one where you didn’t (unvoiced), which was written as “þ”. It’s a bit like the difference between “s” and “z” in English today. Try it: you can say both “s” and “z” without moving your tongue a millimeter. Unfortunately, while the voiced and voiceless “th” sounds remain distinct, they’re now represented by the same “th” sequence. The difference between “thy” and “thigh”, for example, is the first sound, but the spelling doesn’t reflect that. (Yet another example of why English orthography is horrible.)

[Image: from the How To Be British Collection, used with permission, copyright LGP.]

The fact that they’re written with the same letters even though they’re different sounds is only part of why they’re so hard to master. (That goes for native English speakers as well as those who are learning it as a second language: they’re among the last sounds children learn.) The other part is that they’re relatively rare across languages. Standard Arabic, Greek, some varieties of Spanish, Welsh and a smattering of other languages have them. If you happen to have a native language that doesn’t have them, though, they’re tough to hear and harder to say. Don’t worry, though, linguistics can help!

I’m afraid the cartoon above may accurately express the difficulty of producing the “th” for non-native speakers of English, but the technique is somewhat questionable. So, the fancy technical term for the “th” sounds is “interdental fricatives”. Why? Because there are two parts to making them. The first is the place of articulation, which means where you put your tongue. In this case, as you can probably guess (“inter-” between and “-dental” teeth), it goes in between your teeth. Gently!

The important thing about your tongue placement is that your tongue tip needs to be pressed lightly against the bottom of your top teeth. You need to create a small space to push air through, small enough that it makes a hissing sound as it escapes. That’s the “fricative” part. Fricatives are sounds where you force air through a small space and the air molecules start jostling each other and make a high-frequency hissing noise. Now, it won’t be as loud when you’re forcing air between your upper teeth and tongue as it is, for example, when you’re making an “s”, but it should still be noticeable.

So, to review, put the tip of your tongue against the bottom of your top teeth. Blow air through the thin space between your tongue and your teeth so that it creates a (not very loud) hissing sound. Now try voicing the sound (vibrating your vocal folds) as you do so. That’s it! You’ve got both of the English “th” sounds down.

If you’d like some more help, I really like this video, and it has some super-cool slow-motion clips. The lady who made it has a website focusing on English pronunciation which has some great resources. Good luck!