Hard science vs. soft science and the science mystique

So, recently I’ve been doing a lot of thinking and reading about what it means to do science, what science entails  and what is (and is not) science. Partly, this was sparked by the  fact that, at a recent middle school science education event, I was asked more than once why linguistics counted as a science. This intrigued me, as no one at the Lego robots display next to us had their discipline’s qualifications questioned, despite the fact that engineering is not scientific. Rigorous, yes. Scientific, no.

Hmm, I dunno. Looks science-y, but I don’t see any lab coats. Or goggles. There should definitely be more goggles.
This subject is particularly near and dear to me because my own research looks into, among other things, how the ways in which linguists gather data affect the data they gather and the potential for systematic bias that introduces. In order to look at how we do things, I also need to know why. And that’s where this discussion of science comes in. This can be a hard discussion to have, however, since conversations about what science is, or should be, tends to get muddied by the popular conception of science. I’m not saying people don’t know what science is, ’cause I think most people do, just that we (and I include myself in that) also have a whole bucketful of other socially-motivated ideas that we tend to lump in with science.

I’m going to call the social stuff that we’ve learned to associate with science The Science Mystique. I’m not the first person to call it that, but I think it’s fitting. (Note that if you’re looking for the science of Mystique, you’ll need to look elsewhere.) To start in our exploration of the Science Mystique, let’s start with a quote from another popular science writer, Phil Plait.

They [the scientists who made the discoveries discussed earlier in the speech] used physics. They used math. They used chemistry, biology, astronomy, engineering.

They used science.

These are all the things you discovered doing your projects. All the things that brought you here today.

Computers? Cell phones? Rockets to Saturn, probes to the ocean floor, PSP, gamecubes, gameboys, X-boxes? All by scientists.

Those places I talked about before? You can get to know them too. You can experience the wonder of seeing them for the first time, the thrill of discovery, the incredible, visceral feeling of doing something no one has ever done before, seen things no one has seen before, know something no one else has ever known.

No crystal balls, no tarot cards, no horoscopes. Just you, your brain, and your ability to think.

Welcome to science. You’re gonna like it here.

Inspirational! Science-y! Misleading! Wait, what?

So there are a couple things here that I find really troubling, and I’m just going to break them down and go though them one by one. These are things that are part of the science mystique, that permeate our cultural conception of what science is, and I’ve encountered them over and over and over again. I’m just picking on this particular speech because it’s been slathered all over the internet lately and I’ve encountered a lot of people who really resonated with its message.

  1. Science and engineering and math are treated as basically the same thing.  This. This is one of my biggest pet peeves when it comes to talking about science. Yes, I know that STEM fields (that’s Science, Technology, Engineering and Mathematics) are often lumped together. Yes, I know that there’s a lot of cross-pollination. But one, and only one, of these fields has as its goal the creation of testable models. And that’s science. The goal of engineering is to make stuff. And I know just enough math to know that there’s no way I know what the goal of mathematics is. The takeaway here is that, no matter how “science-y” they may seem, how enfolded they are into the science mystique, neither math nor engineering is a science. 
  2. There’s an insinuation that “science” =  thinking and “non-science” = NOT thinking.  This is really closely tied in with the idea that you have to be smart to be a scientist. False. Absolutely false. In fact, raw intelligence isn’t even on my list of the top five qualities you need to be a scientist:
    1. Passion. You need to love what you do, because otherwise being in grad school for five to ten years while living under the poverty line and working sixty hour weeks just isn’t worth it.
    2. Dedication. See above.
    3. Creativity. Good scientists ask good questions, and coming up with a good but answerable question that no one has asked before and  that will help shed new light on whatever it is you’re studying takes lateral thinking.
    4. Excellent time management skills. Particularly if you’re working in a university setting. You need to be able to balance research, teaching and service, all while still maintaining a healthy life. It’s hard.
    5.  Intelligibility. A huge part of science is taking very complex concepts and explaining them clearly. To your students. To other scientists. To people on the bus. To people on the internet (Hi guys!). You can have everything else on this list in spades, but if you can’t express your ideas you’re going to sink like a lead duck.
  3. Science is progress! Right? Right? Yes. Absolutely. There is no way in which science has harmed the human race and no way in which things other than science have aided it. It sounds really silly when you just come out and say it, doesn’t it? I mean, we have the knowledge to eradicate polio, but because of social and political factors it hadn’t happened yet. And you can’t solve social problems by just throwing science at them. And then there’s the fact that, while the models themselves maybe morally neutral, the uses to which they are put are not always so. See Einstein and the bomb. See chemical and biological warfare. And, frankly, I think the greatest advances of the 20th century weren’t in science or engineering or technology. They were deep-seated changes in how we, particularly Americans, treated people. My great-grandmother couldn’t go to high school because she was a woman. My mother couldn’t take college-level courses because she was a woman, though she’s currently working on her degree.  Now, I’m a graduate student and my gender is almost completely irrelevant. Segregation is over. Same sex relationships are legally acknowledged by nine states and DC. That’s the progress I would miss most if a weeping angel got me.
  4. Go quantitative or go home.  I’ve noticed a strong bias towards quantitative data, to the point that a lot of people argue that it’s better than qualitative data. I take umbridge at this. Quantitative data is easier, not necessarily better. Easier? Absolutely. It’s easier to get ten people to agree that a banana is ten inches than it does to agree that it’s tasty. And yet, from a practical standpoint, banana growers want to grow tastier bananas, ones that will ship well and sell well, not longer bananas. But it can be hard to plug “banana tastiness” into your mathematical models and measuring “tastiness” leaves you open to criticism that your data collection is biased. (That’s not to say that qualitative data can’t be biased.) This idea that quantitative data is better leads to an overemphasis on the type of questions that can best be answered quantitatively and that’s a problem. This also leads some people to dismiss the “squishy” sciences that use mainly qualitative data and that’s also a problem. All branches of science help us to shed new light on the world and universe around us and to ignore work because it doesn’t fit the science mystique is a grave mistake.

So what can we do to help lessen the effects of these biases? To disentangle the science mystique from the actual science? Well, the best thing we can do is be aware of it. Critically examine the ways the people talk about science. Closely examine your own biases. I, for example, find it far too easy to slip into the “quantitative is better” trap. Notice systematic similarities and question them. Science is, after all, about asking questions.

Why is it so hard for computers to recognize speech?

This is a problem that’s plagued me for quite a while. I’m not a computational linguist  myself, but one of the reasons that theoretical linguistics is important is that it allows us to create robust concpetional models of language… which is basically what voice recognition (or synthesis) programs are. But, you may say to yourself, if it’s your job to create and test robust models, you’re clearly not doing very well. I mean, just listen to this guy. Or this guy. Or this person, whose patience in detailing errors borders on obsession. Or, heck, this person, who isn’t so sure that voice recognition is even a thing we need.

Electronic eye
You mean you wouldn’t want to be able to have pleasant little chats with your computer? I mean, how could that possibly go wrong?
Now, to be fair to linguists, we’ve kinda been out of the loop for a while. Fred Jelinek, a very famous researcher in speech recognition, once said “Every time we fire a phonetician/linguist, the performance of our system goes up”. Oof, right in the career prospects. There was, however, a very good reason for that, and it had to do with the pressures on computer scientists and linguists respectively. (Also a bunch of historical stuff that we’re not going to get into.)

Basically, in the past (and currently to a certain extent) there was this divide in linguistics. Linguists wanted to model speaker’s competence, not their performance. Basically, there’s this idea that there is some sort of place in your brain where you knew all the rules of language and  have them all perfectly mapped out and described. Not in a consious way, but there nonetheless. But somewhere between the magical garden of language and your mouth and/or ears you trip up and mistakes happen. You say a word wrong or mishear it or switch bits around… all sorts of things can go wrong. Plus, of course, even if we don’t make a recognizable mistake, there’s a incredible amount of variation that we can decipher without a problem. That got pushed over to the performance side, though, and wasn’t looked at as much. Linguistics was all about what was happening in the language mind-garden (the competence) and not the messy sorts of things you say in everyday life (the performance). You can also think of it like what celebrities actually say in an interview vs. what gets into the newspaper; all the “um”s and “uh”s are taken out, little stutters or repetitions are erased and if the sentence structure came out a little wonky the reporter pats it back into shape. It was pretty clear what they meant to say, after all.

So you’ve got linguists with their competence models explaining them to the computer folks and computer folks being all clever and mathy and coming up with algorithms that seem to accurately model our knowledge of human linguistic competency… and getting terrible results. Everyone’s working hard and doing their best and it’s just not working.

I think you can probably figure out why: if you’re a computer and just sitting there with very little knowledge of language (consider that this was before any of the big corpora were published, so there wasn’t a whole lot of raw data) and someone hands you a model that’s supposed to handle only perfect data and also actual speech data, which even under ideal conditions is far from perfect, you’re going to spit out spaghetti and call it a day. It’s a bit like telling someone to make you a peanut butter and jelly sandwich and just expecting them to do it. Which is fine if they already know what peanut butter and jelly are, and where you keep the bread, and how to open jars, and that food is something humans eat, so you shouldn’t rub it on anything too covered with bacteria or they’ll get sick and die. Probably not the best way to go about it.

So the linguists got the boot and they and the computational people pretty much did their own things for a bit. The model that most speech recognition programs use today is mostly statistical, based on things like how often a word shows up in whichever corpus they’re using currently. Which works pretty well. In a quiet room. When you speak clearly. And slowly. And don’t use any super-exotic words. And aren’t having a conversation. And have trained the system on your voice. And have enough processing power in whatever device you’re using. And don’t get all wild and crazy with your intonation. See the problem?

Language is incredibly complex and speech recognition technology, particularly when it’s based on a purely statistical model, is not terrific at dealing with all that complexity. Which is not to say that I’m knocking statistical models! Statistical phonology is mind-blowing and I think we in linguistics will get a lot of mileage from it. But there’s a difference. We’re not looking to conserve processing power: we’re looking to model what humans are actually doing. There’s been a shift away from the competency/performance divide (though it does still exist) and more interest in modelling the messy stuff that we actually see: conversational speech, connected speech, variation within speakers. And the models that we come up with are complex. Really complex. People working in Exemplar Theory, for example, have found quite a bit of evidence that you remember everything you’ve ever heard and use all of it to help parse incoming signals. Yeah, it’s crazy. And it’s not something that our current computers can do. Which is fine; it give linguists time to further refine our models. When computers are ready, we will be too, and in the meantime computer people and linguistic people are showing more and more overlap again, and using each other’s work more and more. And, you know, singing Kumbayah and roasting marshmallows together. It’s pretty friendly.

So what’s the take-away? Well, at least for the moment, in order to get speech recognition to a better place than it is now, we need  to build models that work for a system that is less complex than the human brain. Linguistics research, particularly into statistical models, is helping with this. For the future? We need to build systems that are as complex at the human brain. (Bonus: we’ll finally be able to test models of child language acquisition without doing deeply unethical things! Not that we would do deeply unethical things.) Overall, I’m very optimistic that computers will eventually be able to recognize speech as well as humans can.

TL;DR version:

  • Speech recognition has been light on linguists because they weren’t modeling what was useful for computational tasks.
  • Now linguists are building and testing useful models. Yay!
  • Language is super complex and treating it like it’s not will get you hit in the face with an error-ridden fish.
  • Linguists know language is complex and are working diligently at accurately describing how and why. Yay!
  • In order to get perfect speech recognition down, we’re going to need to have computers that are similar to our brains.
  • I’m pretty optimistic that this will happen.



Why is studying linguistics useful? *Is* studying linguistics useful?

So I recently gave a talk at the University of Washington Scholar’s Studio. In it, I covered a couple things that I’ve already talked about here on my blog: the fact that, acoustically speaking, there’s no such thing as a “word” and that our ears can trick us. My general point was that our intuitions about speech, a lot of the things we think seem completely obvious, actually aren’t true at all from an acoustic perspective.

What really got to me, though, was that after I’d finished my talk (and it was super fast, too, only five minutes) someone asked why it mattered. Why should we care that our intuitions don’t match reality? We can still communicate perfectly well. How is linguistics useful, they asked. Why should they care?

I’m sorry, what was it you plan to spend your life studying again? I know you told me last week, but for some reason all I remember you saying is “Blah, blah, giant waste of time.”

It was a good question, and I’m really bummed I didn’t have time to answer it. I sometimes forget, as I’m wading through a hip-deep piles of readings that I need to get to, that it’s not immediately obvious to other people why what I do is important. And it is! If I didn’t believe that, I wouldn’t be in grad school. (It’s certainly not the glamorous easy living and fat salary that keep me here.) It’s important in two main ways. Way one is the way in which it enhances our knowledge and way two is the way that it helps people.

 Increasing our knowledge. Ok, so, a lot of our intuitions are wrong. So what? So a lot of things! If we’re perceiving things that aren’t really there, or not perceiving things that are really there, something weird and interesting is going on. We’re really used to thinking of ourselves as pretty unbiased in our observations. Sure, we can’t hear all the sounds that are made, but we’ve built sensors for that, right? But it’s even more pervasive than that. We only perceive the things that our bodies and sensory organs and brains can perceive, and we really don’t know how all these biological filters work. Well, okay, we do know some things (lots and lots of things about ears, in particular) but there’s a whole lot that we still have left to learn. The list of unanswered questions in linguistics is a little daunting, even just in the sub-sub-field of perceptual phonetics.

Every single one of us uses language every single day. And we know embarrassingly little about how it works. And, what we do know, it’s often hard to share with people who have little background in linguistics. Even here, in my blog, without time restraints and an audience that’s already pretty interested (You guys are awesome!) I often have to gloss over interesting things. Not because I don’t think you’ll understand them, but because I’d metaphorically have to grow a tree, chop it down and spends hours carving it just to make a little step stool so you can get the high-level concept off the shelf and, seriously, who has time for that? Sometimes I really envy scientists in the major disciplines  because everyone already knows the basics of what they study. Imagine that you’re a geneticist, but before you can tell people you look at DNA, you have to convince them that sexual reproduction exists. I dream of the day when every graduating high school senior will know IPA. (That’s the international phonetic alphabet, not the beer.)

Okay, off the soapbox.

Helping people. Linguistics has lots and lots and lots of applications. (I’m just going to talk about my little sub-field here, so know that there’s a lot of stuff being left unsaid.) The biggest problem is that so few people know that linguistics is a thing. We can and want to help!

  • Foreign language teaching. (AKA applied linguistics) This one is a particular pet peeve of mine. How many of you have taken a foreign language class and had the instructor tell you something about a sound in the language, like: “It’s between a “k” and a “g” but more like the “k” except different.” That crap is not helpful. Particularly if the instructor is a native speaker of the language, they’ll often just keep telling you that you’re doing it wrong without offering a concrete way to make it correctly. Fun fact: There is an entire field dedicated to accurately describing the sounds of the world’s languages. One good class on phonetics and suddenly you have a concrete description of what you’re supposed to be doing with your mouth and the tools to tell when you’re doing it wrong. On the plus side, a lot language teachers are starting to incorporate linguistics into their curriculum with good results.
  • Speech recognition and speech synthesis. So this is an area that’s a little more difficult. Most people working on these sorts of projects right now are computational people and not linguists. There is a growing community of people who do both (UW offers a masters degree in computational linguistics that feeds lots of smart people into Seattle companies like Microsoft and Amazon, for example) but there’s definite room for improvement. The main tension is the fact that using linguistic models instead of statistical ones (though some linguistic models are statistical) hugely increases the need for processing power. The benefit is that accuracy  tends to increase. I hope that, as processing power continues to be easier and cheaper to access, more linguistics research will be incorporated into these applications. Fun fact: In computer speech recognition, an 80% comprehension accuracy rate in conversational speech is considered acceptable. In humans, that’s grounds to test for hearing or brain damage.
  • Speech pathology. This is a great field and has made and continues to make extensive use of linguistic research. Speech pathologists help people with speech disorders overcome them, and the majority of speech pathologists have an undergraduate degree in linguistics and a masters in speech pathology. Plus, it’s a fast-growing career field with a good outlook.  Seriously, speech pathology is awesome. Fun fact: Almost half of all speech pathologists work in school environments, helping kids with speech disorders. That’s like the antithesis of a mad scientist, right there.

And that’s why you should care. Linguistics helps us learn about ourselves and help people, and what else could you ask for in a scientific discipline? (Okay, maybe explosions and mutant sharks, but do those things really help humanity?)

Mapping language, language maps

So for some reason, I’ve come across three studies in quick succession based in mapping language. Now, if you know me, you know that nattering on about linguistic methodology is pretty much the Persian cat to my Blofeld, but I really do think that looking at the way that linguists do linguistics is incredibly important. (Warning: the next paragraph will be kinda preachy, feel free to skip it.)

It’s something the field, to paint with an incredibly broad brush, tends to skimp on. After all, we’re asking all these really interesting questions that have the potential to change people’s lives. How is hearing speech different from hearing other things? What causes language pathologies and how can we help correct them? Can we use the voice signal to reliably detect Parkinson’s over the phone? That’s what linguistics is. Who has time to look at whether asking  people to list the date on a survey form affects their responses? If linguists don’t use good, controlled methods to attempt to look at these questions, though, we’ll either find the wrong answers or miss it completely because of some confounding variable we didn’t think about. Believe me, I know firsthand how heart wrenching it is to design an experiment,  run subjects, do your stats and end up with a big pile of useless goo because your methodology wasn’t well thought out. It sucks. And it happens way more than it needs to, mainly because a lot of linguistics programs don’t stress rigorous scientific training.

OK, sermon over. Maps! I think using maps to look at language data is a great methodology! Why?

Hmm… needs more data about language. Also the rest of the continents, but who am I to judge? 
  1.  You get an end product that’s tangible and easy to read and use. People know what maps are and how to use them. Presenting linguistic data as a map rather than, say, a terabyte of detailed surveys or a thousand hours of recordings is a great way to make that same data accessible. Accessible data gets used. And isn’t that kind of the whole point?
  2. Maps are so. accurateright now. This means that maps of data aren’t  just rough approximations, they’re the best, most accurate way to display this information. Seriously, the stuff you can do with GIS is just mind blowing. (Check out this dialect map of the US. If you click on the region you’re most interested, you get additional data like field recordings, along with the precise place they were made. Super useful.)
  3. Maps are fun. Oh, come on, who doesn’t like looking at  maps? Particularly if you’re looking at a region you’re familiar with. See, here’s my high school, and the hay field we rented three years ago. Oh, and there’s my friend’s house! I didn’t realize they were so close to the highway. Add a second layer of information and BOOM, instant learning.

The studies

Two of the studies I came across were actually based on Twitter data. Twitter’s an amazing resource for studying linguistics because you have this enormous data set you can just use without having to get consent forms from every single person. So nice. Plus, because all tweets are archived, in the Library of Congress if nowhere else, other researchers can go back and verify things really easily.

This study looks at how novel slang expressions spread across the US. It hasn’t actually been published yet, so I don’t have the map itself, but they do talk about some interesting tidbits. For example: the places most likely to spawn new successful slang are urban centers with a high African American population.

The second Twitter study is based in London and looked at the different languages Londoners tweet in and did have a map:

Click for link to author’s blog post.

Interesting, huh? You can really get a good idea of the linguistic landscape of London. Although there were some potential methodological problems with this study, I still think it’s a great way to present this data.

The third study I came across is one that’s actually here at the University of Washington. This one is interesting because it kind of goes the other way. Basically, the researchers has respondents indicate areas on a map of Washington where they thought  language communities existed and then had them describe them.  So what you end up with is sort of a representation of the social ideas of what language is like in various parts of Washington state. Like so:

Click for link to study site.

There are lots more interesting maps on the study site, each of which shows some different perception of language use in Washington State. (My favorite is the one that suggests that people think other people who live right next to the Canadian border sound Canadian.)

So these are just a couple of the ways in which people are using maps to look at language data. I hope it’s a trend that continues.

Is linguistics a science?

Short answer: yes. Long answer: the rest of this post. Linguistics is a science; but there are some parts of linguistics that don’t really act like people expect sciences to act, and that tends to confuse people.

Lab coats
Not necessary equipment for linguistics.
Before I go over why linguistics is a science, I think it’s worth saying that I’m not arguing (and I am  arguing; there are linguists who I know personally and by reputation who argue passionately linguistics is not a science) that linguistics is a science because sciences are “better”. I’m arguing because there is an inherent difference between how you do science and how you study the humanities. Your aims are different and what you need to do to accomplish those aims are different. I’m arguing that the ultimate aims of linguistics are science-type and not humanities-type or plant-typeand therefore our methodology should match those aims.

Continue reading “Is linguistics a science?”