Feeling Sound

We’re all familiar with the sensation of sound so loud we can actually feel it: the roar of a jet engine, the palpable vibrations of a loud concert, a thunderclap so close it shakes the windows. It may surprise you to learn, however, that that’s not the only way in which we “feel” sounds. In fact, recent research suggests that tactile information might be just as important as sound in some cases!

[Image: “Touch Gently”]
What was that? I couldn’t hear you, you were touching too gently.
I’ve already talked before about how we can see sounds and about the role that sound plays in speech perception. But just how much overlap is there between our sense of touch and our hearing? There is actually pretty strong evidence that what we feel can override what we’re hearing. Yau et al. (2009), for example, found that tactile information about frequency could override auditory cues. In other words, you might hear two identical tones as different if you’re holding something that is vibrating faster or slower. If our visual system had a similar interplay with touch, we might think that a person was heavier if we looked at them while holding a bowling ball, and lighter if we looked at them while holding a volleyball.

And your sense of touch can override your ears (not that they were that reliable to begin with…) when it comes to speech as well. Gick and Derrick (2013) have found that tactile information can override auditory input for speech sounds. You can be tricked into thinking that you heard “peach” rather than “beach”, for example, if you’re played the word “beach” while a puff of air is blown over your skin just as you hear the “b” sound. This is because when an English speaker says “peach”, they aspirate the “p”, producing it with a little puff of air. That puff isn’t there when they say the “b” in “beach”, so when you feel one anyway, your brain treats the sound as a “p” and you hear the wrong word.

Which is all very cool, but why might this be useful to us as language-users? Well, it suggests that we use a variety of cues when we’re listening to speech. Cues act as little road-signs that point us towards the right interpretation. By having access to lots of different cues, we ensure that our perception is more robust. Even when we lose some of them–say, a bear is roaring in the distance and masking some of the auditory information–we can use the others to figure out that our friend is telling us there’s a bear. In other words, even if some of the road-signs are removed, you can still get where you’re going. Language is about communication, after all, and it really shouldn’t be surprising that we use every means at our disposal to make sure that communication happens.


Why speech is different from other types of sounds

Ok, so, a couple of weeks ago I talked about why speech perception is hard to model. Really, though, what I talked about was why building linguistic models is a hard task. There are a couple of other thorny problems that plague people who work on speech perception, and they have to do with the weirdness of the speech signal itself. It’s important to talk about them, because it’s in dealing with these weirdnesses that some theories of speech perception start to look pretty strange themselves. (Motor theory, in particular, tends to sound pretty messed-up the first time you encounter it.)

The speech signal and the way we deal with it is really strange in two main ways.

  1. The speech signal doesn’t contain invariant units.
  2. We both perceive and produce speech in ways that are surprisingly non-linear.

So what are “invariant units” and why should we expect to have them? Well, pretty much everyone agrees that we store words as larger chunks made up of smaller chunks. Like, you know that the word “beet” is going to be made with the lips together at the beginning for the “b” and your tongue behind your teeth at the end for the “t”. And you also know that it will have certain acoustic properties: a short break in the signal followed by a small burst of white noise in a certain frequency range (that’s the “b” again), then a long steady state for the vowel, then another sudden break in the signal for the “t”. So people make those gestures and you listen for those sounds and everything’s pretty straightforward, right? Weeellllll… not really.

It turns out that you can’t really just grab onto certain types of acoustic cues, because they’re not always reliably there. There are a bunch of different ways to produce “t”, for example, that run the gamut from the way you’d say it by itself to something that sounds more like a “w” crossed with an “r”. When you’re speaking quickly in an informal setting, there’s no telling where on that continuum you’re going to fall. Even with this huge array of possible ways to produce the sound, however, you still somehow hear it as a “t”.

And even those cues that are almost always reliably there vary drastically from person to person. Just think about it: about half the population has a fundamental frequency, or pitch, that’s pretty radically different from the other half’s. That’s the old interplay between biological sex and voice quality. But you can easily, effortlessly even, correct for the speaker’s gender and understand speech produced by men and women equally well. And if a man and a woman both say “beet”, you have no trouble telling that they’re saying the same word, even though the two signals are quite different. And that’s not a trivial task. Voice recognition technology, for example, which is overwhelmingly trained on male voices, often has a hard time understanding women’s voices. (Not to mention different accents. What that says about regional and sex-based discrimination is a topic for another time.)

And yet. And yet humans are very, very good at recognizing speech. How? Well, linguists have made some striking progress in answering that question, though we haven’t yet arrived at an answer that makes everyone happy. And the variance in the signal isn’t the only hurdle we face as we recognize speech: there’s also the fact that being human itself limits what we can hear.

[Figure: equal-loudness contours, showing how sensitive human hearing is across frequencies]
Ooo, pretty rainbow. Thorny problem, though: this shows how we hear various frequencies better or worse. The sweet spot is right around 2–5 kHz or so. Which, coincidentally, just so happens to be where a lot of the important information in the speech signal sits. But we do still produce information at other frequencies, and we do use it in speech perception: particularly for sounds like “s” and “f”.
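If you want a rough, computable stand-in for how much “better or worse” we hear a given frequency, the standard A-weighting curve is one common engineering approximation (this is my own illustrative detour, not something from the figure, and it’s only loosely related to the equal-loudness contours above). A minimal Python sketch:

```python
import math

def a_weighting_db(f_hz: float) -> float:
    """Approximate A-weighting gain in dB at a given frequency.

    A-weighting is a standard, crude approximation of how human hearing
    de-emphasizes very low and very high frequencies relative to the
    1-5 kHz region.
    """
    f2 = f_hz ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.0  # +2 dB pins the curve to ~0 dB at 1 kHz

for f in (50, 100, 300, 1000, 3000, 10000):
    print(f"{f:>6} Hz: {a_weighting_db(f):+6.1f} dB")
```

Running it shows the familiar shape: big penalties below a few hundred hertz, a small boost around 2–4 kHz, and a roll-off again at the very top of the range.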

We can think of the information available in the world as a sheet of cookie dough. This includes things like UV light and sounds below 0 dB in intensity. Now imagine a cookie-cutter. Heck, make it a gingerbread man. The cookie-cutter represents the ways in which the human body limits our access to this information. There are just certain things that even a normal, healthy human isn’t capable of perceiving. We can only hear the information that falls inside the cookie cutter. And the older we get, the smaller the cookie-cutter becomes, as we slowly lose sensitivity in our auditory and visual systems. This makes it even more difficult to perceive speech. Even though it seems likely that we’ve evolved our vocal system to take advantage of the way our perceptual system works, it still makes the task of modelling speech perception even more complex.

What’s the loudest / softest / highest / lowest sound humans can hear?

Humans can’t hear all sounds. Actually, not even most, in the grand scheme of things. Just as we can only see a narrow band of all possible wavelengths–hence “visible” light–we can also only hear some of the possible wavelengths. And wave heights. You might remember this from physics, but there are two measurements that are really important on a diagram of a wave: wavelength and amplitude. Like so:

[Image: a labelled diagram of a sound wave, with wavelength and amplitude marked]
This should be bringing back flashbacks of asking if you were going to be able to use a formula sheet and a calculator.
So you’ve got the wavelength, which is the distance between two peaks or two troughs, and the amplitude, which is the distance between the mid-point of the wave and the tip of a peak. Which is all very well, but it doesn’t tell you much in the grand scheme of things, since most waves aren’t kind enough to present themselves to you as labelled diagrams. You actually have a pretty good intuitive grasp of the wavelength and amplitude of sound waves, though. The first corresponds to pitch (the shorter the wavelength, the higher the frequency, and the higher the pitch), and the second to what I like to refer to as “loudness”. (Technically, “loudness” is a perceptual measurement, not a… you know, this is starting to be boring.)
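If it helps to have something concrete to poke at, here’s a small Python sketch (my own illustration, not from the original post) where frequency stands in for the “pitch knob” and amplitude for the “loudness knob”, with wavelength computed from the speed of sound in air:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in dry air at roughly 20 degrees C

def wavelength_m(frequency_hz: float) -> float:
    """Wavelength = speed of sound / frequency."""
    return SPEED_OF_SOUND / frequency_hz

def pure_tone(frequency_hz: float, amplitude: float,
              duration_s: float = 1.0, sample_rate: int = 44100) -> np.ndarray:
    """A sine wave: frequency maps (roughly) to perceived pitch,
    amplitude maps (roughly) to perceived loudness."""
    t = np.arange(int(duration_s * sample_rate)) / sample_rate
    return amplitude * np.sin(2 * np.pi * frequency_hz * t)

# Concert A (440 Hz) has a wavelength of about 0.78 m in air.
print(f"440 Hz -> {wavelength_m(440):.2f} m")
quiet_tone = pure_tone(440, amplitude=0.1)
loud_tone = pure_tone(440, amplitude=0.8)  # same pitch, more "loudness"
```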

So there’s a limit on how loud and how soft a sound we can hear, and a limit on how high and how low it can be. I’ll deal with loudness first, because it’s less fun.

Loudness

So we measure loudness using the decibel scale, which is based on human perception. Zero decibels is, by definition, a reference pressure of 20 micropascals, which is roughly the quietest sound a human can hear. Of course, that’s healthy young humans. The older you get, the more your hearing range shrinks, which is why your grandmother asks you to repeat yourself a lot. The loudest is just under 160 decibels, since exposure to a sound at 160 decibels can literally rupture your eardrum. That’s things like being right under a cannon when it fires, standing next to a rocket when it launches or standing right next to a jet engine during take-off, all of which tend to have other problems associated with them. So… avoid that.
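For the curious, the arithmetic behind those numbers is just the standard sound pressure level formula, 20 · log10(p / 20 µPa). A quick Python sketch (mine, with the example pressures chosen purely for illustration):

```python
import math

P_REF = 20e-6  # 20 micropascals: the 0 dB SPL reference pressure

def pascals_to_db_spl(pressure_pa: float) -> float:
    """Sound pressure level in dB relative to 20 micropascals."""
    return 20 * math.log10(pressure_pa / P_REF)

print(pascals_to_db_spl(20e-6))   # 0 dB: roughly the threshold of hearing
print(pascals_to_db_spl(0.02))    # 60 dB: around ordinary conversation
print(pascals_to_db_spl(2000.0))  # 160 dB: eardrum-rupturing territory
```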

Pitch

Pitch is a bit more interesting. Normal human hearing generally runs between 20 hertz and 20 kilohertz–compare that to dolphins and bats, which can hear sounds up to somewhere around 150 to 200 kilohertz! (Both rely on echolocation, a kind of biological sonar, for hunting.) Just like the hearing range for loudness, though, this gets narrower as you get older, particularly at the higher end of the range. Here’s a video that runs the gamut of the human hearing range (warning: you might want to turn your speakers down).
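If you’d rather generate your own test signal than trust a random video, here’s a small Python sketch (my own; the file name is just a placeholder) that writes a 30-second logarithmic sweep from 20 Hz to 20 kHz to a WAV file. The same warning applies: turn your speakers down first.

```python
import numpy as np
from scipy.io import wavfile

SAMPLE_RATE = 44100  # Hz; can only represent frequencies up to 22.05 kHz

def log_sweep(f_start: float = 20.0, f_end: float = 20000.0,
              duration_s: float = 30.0, amplitude: float = 0.2) -> np.ndarray:
    """Sine sweep whose frequency rises exponentially from f_start to f_end."""
    t = np.arange(int(duration_s * SAMPLE_RATE)) / SAMPLE_RATE
    k = np.log(f_end / f_start) / duration_s
    # Phase is the integral of the instantaneous frequency f_start * exp(k * t).
    phase = 2 * np.pi * f_start * (np.exp(k * t) - 1) / k
    return amplitude * np.sin(phase)

sweep = log_sweep()
wavfile.write("hearing_range_sweep.wav", SAMPLE_RATE,
              (sweep * 32767).astype(np.int16))
```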

If you’re older than 25 (which is when hearing loss in the upper ranges usually starts), you probably couldn’t hear the whole thing. If you could, congratulations! You’ve got the hearing of a normal, young human.