Book Review: Punctuation..?

So the good folks over at Userdesign asked me to review their newest volume, Punctuation..? and I was happy to oblige. Linguists rarely study punctuation (it falls under the sub-field orthography, or the study of writing systems) but what we do study is the way that language attitudes and punctuation come together. I’ve written before about language attitudes when it comes to grammar instruction and the strong prescriptive attitudes of most grammar instruction books. What makes this book so interesting is that it is partly prescriptive and partly descriptive. Since a descriptive bent in a grammar instruction manual is rare, I thought I’d delve into that a bit.

Image copyright Userdesign, used with permission. (Click for link to site.)

So, first of all, how about a quick review of the difference between a descriptive and prescriptive approach to language?

  • Descriptive: This is what linguists do. We don’t make value or moral judgments about languages or language use, we just say what’s going on as best we can. You can think of it like an anthropological ethnography: we just describe what’s going on. 
  • Prescriptive: This is what people who write letters to the Times do. They have a very clear idea of what’s “right” and “wrong” with regard to language use and are all too happy to tell you about it. You can think of this like a manners book: it tells you what the author thinks you should be doing.

As a linguist, my relationship with language is mainly scientific, so I have a clear preference for a descriptive stance. A marine biologist doesn’t tell octopi, “No, no, no, you’re doing it all wrong!” after all. At the same time, I live in a culture which has very rigid expectations for how an educated individual should write and sound, and if I want to be seen as an educated individual (and be considered for the types of jobs only open to educated individuals) you better believe I’m going to adhere to those societal standards. The problem comes when people have a purely prescriptive idea of what grammar is and what it should be. That can lead to nasty things like linguistic discrimination, i.e., the idea that language B (and thus all those individuals who speak language B) is clearly inferior to language A because they don’t do things properly. Since I think we can all agree that unfounded discrimination of this type is bad, you can see why linguists try their hardest to avoid value judgments of languages.

As I mentioned before, this book is a fascinating mix of prescriptive and descriptive snippets. For example, the author says this about exclamation points: “In everyday writing, the exclamation mark is often overused in the belief that it adds drama and excitement. It is, perhaps, the punctuation mark that should be used with the most restraint” (p. 19). Did you notice that “should”? Classic marker of a prescriptivist claiming their territory. But then you have this about guillemets: “Guillemets are used in several languages to indicate passages of speech in the same way that single and double quotation marks (‘ ’ “ ”) are used in the English language” (p. 22). (Guillemets look like this, since I know you were wondering: « and ».) See, that’s a classic description of what a language does, along with parallels drawn to another language. It may not seem like much, but try to find a comparably descriptive stance in pretty much any widely-distributed grammar manual. And if you do, let me know so that I can go buy a copy of it. It’s change, and it’s positive change, and I’m a fan of it. Is this an indication of a sea-change in grammar manuals? I don’t know, but I certainly hope so.

Overall, I found this book fascinating (though not, perhaps, for the reasons the author intended!), particularly because it seems to stand in contrast to the division that I just spent this whole post building up. It’s always interesting to see the ways that stances towards language can bleed and melt together, for all that linguists (and I include myself here) try to show that there’s a nice, neat dividing line between the evil, scheming prescriptivists and the descriptivists in their shining armor here to bring a veneer of scientific detachment to our relationship with language. Those attitudes can and do co-exist. Data is messy. Language is complex. Simple stories (no matter how pretty we might think them) are suspicious. But these distinctions can be useful, and I’m willing to stand by the descriptivist/prescriptivist distinction, even if it’s harder than you might think to put people in one camp or the other.

But beyond being an interesting study in language attitudes, it was a fun read. I learned lots of neat little factoids, which is always a source of pure joy for me. (Did you know that this symbol: ¶ is called a pilcrow? I know, right? I had no idea either; I always just called it the paragraph mark.)

Why is it hard to model speech perception?

So this is a kick-off post for a series of posts about various speech perception models. Speech perception models, you ask? Like, attractive people who are good at listening?

Romantic fashion model

Not only can she discriminate velar, uvular and pharyngeal fricatives with 100% accuracy, but she can also do it in heels.

No, not really. (I wish that was a job…) I’m talking about a scientific model of how humans perceive speech sounds. If you’ve ever taken an introductory science class, you already have some experience with scientific models. All of Newton’s equations are just a way of generalizing general principles generally across many observed cases. A good model has both explanatory and predictive power. So if I say, for example, that force equals mass times acceleration, then that should fit with any data I’ve already observed as well as accurately describe new observations. Yeah, yeah, you’re saying to yourself, I learned all this in elementary school. Why are you still going on about it? Because I really want you to appreciate how complex this problem is.

Let’s take an example from an easier field, say, classical mechanics. (No offense physicists, but y’all know it’s true.) Imagine we want to model something relatively simple. Perhaps we want to know whether a squirrel who’s jumping from one tree to another is going to make it. What do we need to know? And none of that “assume the squirrel is a sphere and there’s no air resistance” stuff, let’s get down to the nitty-gritty. We need to know the force and direction of the jump, the locations of the trees, how close the squirrel needs to get to be able to hold on, what the wind’s doing, air resistance and how that will interplay with the shape of the squirrel, the effects of gravity… am I missing anything? I feel like I might be, but that’s most of it.

So, do you notice something that all of these things we need to know the values of have in common? Yeah, that’s right, they’re easy to measure directly. Need to know what the wind’s doing? Grab your anemometer. Gravity? To the accelerometer closet! How far apart the trees are? It’s yardstick time. We need a value, we measure a value, we develop a model with good predictive and explanatory power (you’ll need to wait for your simulations to run on your department’s cluster, but here’s one I made earlier so you can see what it looks like; mmmm, delicious!) and you clean up playing the numbers on the professional squirrel-jumping circuit.

Let’s take a similarly simple problem from the field of linguistics. You take a person, sit them down in a nice anechoic chamber*, plop some high quality earphones on them and play a word that could be “bite” and could be “bike” and ask them to tell you what they heard. What do you need to know to decide which way they’ll go? Well, assuming that your stimulus is actually 100% ambiguous (which is a little unlikely) there are a ton of factors you’ll need to take into account. Like, how recently and often has the subject heard each of the words before? (Priming and frequency effects.) Are there any social factors which might affect their choice? (Maybe one of the participant’s friends has a severe overbite, so they just avoid the word “bite” altogether.) Are they hungry? (If so, they’ll probably go for “bite” over “bike”.) And all of that assumes that they’re a native English speaker with no hearing loss or speech pathologies and that the speaker’s voice is the same as theirs in terms of dialect, because all of that’ll bias the listener as well.

The best part? All of this is incredibly hard to measure. In a lot of ways, human language processing is a black box. We can’t mess with the system too much, and taking it apart to see how it works, in addition to being deeply unethical, breaks the system. The best we can do is tap a hammer lightly against the side and use the sounds of the echoes to guess what’s inside. And, no, brain imaging is not a magic bullet for this. It’s certainly a valuable tool that has led to a lot of insights, but in addition to being incredibly expensive (MRI is easily more than a grand per participant and no one has ever accused linguistics of being a field that rolls around in money like a dog in fresh-cut grass) we really need to resist the urge to rely too heavily on brain imaging studies, as a certain dead salmon taught us.

But! Even though it is deeply difficult to model, there has been a lot of really good work done towards a theory of speech perception. I’m going to introduce you to some of the main players, including:

  • Motor theory
  • Acoustic/auditory theory
  • Double-weak theory
  • Episodic theories (including Exemplar theory!)

Don’t worry if those all look like menu options in an Ethiopian restaurant (and you with your Amharic phrasebook at home, drat it all); we’ll work through them together.  Get ready for some mind-bending, cutting-edge stuff in the coming weeks. It’s going to be [fʌn] and [fʌnetɪk]. :D

*Anechoic chambers are the real chambers of secrets.

Why do I really, really love West African languages?

So I found a wonderful free app that lets you learn Yoruba, or at least Yoruba words,  and posted about it on Google plus. Someone asked a very good question: why am I interested in Yoruba? Well, I’m not interested just in Yoruba. In fact, I would love to learn pretty much any western African language or, to be a little more precise, any Niger-Congo language.

This map’s color choices make it look like a chocolate-covered ice cream cone.

Why? Well, not to put too fine a point on it, I’ve got a huge language crush on them. Whoa there, you might be thinking, you’re a linguist. You’re not supposed to make value judgments on languages. Isn’t there like a linguist code of ethics or something? Well, not really, but you are right. Linguists don’t usually make value judgments on languages. That doesn’t mean we can’t play favorites!  And West African languages are my favorites. Why? Because they’re really phonologically and phonetically interesting. I find the sounds and sound systems of these languages rich and full of fascinating effects and processes. Since that’s what I study within linguistics, it makes sense that that’s a quality I really admire in a language.

What are a few examples of Niger-Congo sound systems that are just mind blowing? I’m glad you asked.

  • Yoruba: Yoruba has twelve vowels. Seven of them are pretty common (we have all but one in American English) but if you say five of them nasally, they’re different vowels. And if you say a nasal vowel when you’re not supposed to, it’ll change the entire meaning of a word. Plus? They don’t have a ‘p’ or an ‘n’ sound. That is crazy sauce! Those are some of the most widely-used sounds in human language. And Yoruba has a complex tone system as well. You probably have some idea of the level of complexity that can add to a sound system if you’ve ever studied Mandarin, or another East Asian language. Seriously, their sound system makes English look childishly simplistic.
  • Akan: There are several different dialects of Akan, so I’ll just stick to talking about Asante, which is the one used in universities and for official business. It’s got a crazy consonant system. Remember how Yoruba didn’t have an “n” sound? Yeah, in Akan they have nine. To an English speaker they all pretty much sound the same, but if you grew up speaking Akan you’d be able to tell the difference easily. Plus, most sounds other than “p”, “b”, “f” or “m” can be made while rounding the lips (linguists call this “labialization”), and the labialized versions are completely different sounds. They’ve also got a vowel harmony system, which means you can’t have vowels later in a word that are completely different from vowels earlier in the word. Oh, yeah, and tones and a vowel nasalization distinction and some really cool tone terracing. I know, right? It’s like being a kid in a candy store.

But how did these languages get so cool? Well, there’s some evidence that these languages have really robust and complex sound systems because the people speaking them never underwent large-scale migration to another continent. (Obviously, I can’t ignore the effects of colonialism or the slave trade, but it’s still pretty robust.) Which is not to say that, say, Native American languages don’t have awesome sound systems; they just tend to be slightly smaller on average.

Now that you know how kick-ass these languages are, I’m sure you’re chomping at the bit to hear some of them. Your wish is my command; here’s a song in Twi (a dialect of Akan) from one of my all-time-favorite musicians: Sarkodie. (He’s making fun of Ghanaian emigrants who forget their roots. Does it get any better than biting social commentary set to a sick beat?)

Meme Grammar

So the goal of linguistics is to find and describe the systematic ways in which humans use language. And boy howdy do we humans love using language systematically. A great example of this is internet memes.

What are internet memes? Well, let’s start with the idea of a “meme”. “Memes” were posited by Richard Dawkins in his book The Selfish Gene. He used the term to describe cultural ideas that are transmitted from individual to individual much like a virus or bacteria. The science mystique I’ve written about is a great example of a meme of this type. If you have fifteen minutes, I suggest Dan Dennett’s TED talk on the subject of memes as a much more thorough introduction.

So what about the internet part? Well, internet memes tend to be a bit narrower in their scope. Viral videos, for example, seem to be a separate category from internet memes even though they clearly fit into Dawkins’s idea of what a meme is. Generally, “internet meme” refers to a specific image and text that is associated with that image. These are generally called image macros. (For a thorough analysis of emerging and successful internet memes, as well as an excellent object lesson in why you shouldn’t scroll down to read the comments, I suggest Know Your Meme.) It’s the text that I’m particularly interested in here.

Memes which involve language require that it be used in a very specific way, and failure to obey these rules results in social consequences. In order to keep this post a manageable size, I’m just going to look at the use of language in the two most popular image memes, as ranked by memegenerator.net, though there is a lot more to study here. (I think a study of the differing uses of the initialisms MRW [my reaction when]  and MFW [my face when] on imgur and 4chan would show some very interesting patterns in the construction of identity in the two communities. Particularly since the 4chan community is made up of anonymous individuals and the imgur community is made up of named individuals who are attempting to gain status through points. But that’s a discussion for another day…)

The God tier (i.e. most popular) characters on the website Meme Generator as of February 23rd, 2013. Click for link to site. If you don’t recognize all of these characters, congratulations on not spending all your free time on the internet.

Without further ado, let’s get to the grammar. (I know y’all are excited.)

Y U No

This meme is particularly interesting because its page on Meme Generator already has a grammatical description.

The Y U No meme actually began as Y U No Guy but eventually evolved into simply Y U No, the phrase being generally followed by some often ridiculous suggestion. Originally, the face of Y U No guy was taken from Japanese cartoon Gantz’ Chapter 55: Naked King, edited, and placed on a pink wallpaper. The text for the item reads “I TXT U … Y U NO TXTBAK?!” It appeared as a Tumblr file, garnering over 10,000 likes and reblogs.

It went totally viral, and has morphed into hundreds of different forms with a similar theme. When it was uploaded to MemeGenerator in a format that was editable, it really took off. The formula used was: “(X, subject noun), [WH]Y [YO]U NO (Y, verb)?” [Bold mine.]

A pretty good try, but it can definitely be improved upon. There are always two distinct groupings of text in this meme, always in Impact font, white with a black border and in all caps. This is pretty consistent across all image macros. In order to indicate the break between the two text chunks, I will use — throughout this post. The chunk of text that appears above the image is a noun phrase that directly addresses someone or something, often a famous individual or corporation. The bottom text starts with “Y U NO” and finishes with a verb phrase. The verb phrase is an activity or action that the addressee from the first block of text could or should have done, and that the meme creator considers positive. It is also inflected as if “Y U NO” were structurally equivalent to “Why didn’t you”. So, since you would ask Steve Jobs “Why didn’t you donate more money to charity?”, a grammatical meme to that effect would be “STEVE JOBS — Y U NO DONATE MORE MONEY TO CHARITY”. In effect, this meme asks someone or something that had the agency to do something positive why they chose not to do it. While this certainly has the potential to be a vehicle for social commentary, like most memes it’s mostly used for comedic effect. Finally, there is some variation in the punctuation of this meme. While no punctuation is the most common, an exclamation point, a question mark or both are all used. I would hypothesize that the use of punctuation varies between internet communities… but I don’t really have the time or space to get into that here.

A meme (created by me using Meme Generator) following the guidelines outlined above.
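If you like your grammars executable, here is a minimal Python sketch of the surface pattern just described. It is only an approximation under loud assumptions: the function name `is_y_u_no` is made up for illustration, the dash separator is the one used in the examples in this post, and a regular expression can check the all-caps shape and the optional trailing punctuation but has no way of knowing whether the top text is really a noun phrase or the bottom text a verb phrase.

```python
import re

# Hypothetical checker for the surface shape of a Y U NO caption:
#   TOP TEXT — Y U NO BOTTOM TEXT, with optional ? and/or ! at the end.
# Assumption: "no lowercase letters" stands in for "all caps".
# It does NOT verify noun-phrase / verb-phrase status; that needs a parser.
Y_U_NO = re.compile(
    r"^(?P<addressee>[^a-z]+?) — Y U NO (?P<action>[^a-z?!]+)[?!]*$"
)

def is_y_u_no(caption: str) -> bool:
    """Return True if the caption has the Y U NO surface shape."""
    return Y_U_NO.match(caption) is not None

print(is_y_u_no("STEVE JOBS — Y U NO DONATE MORE MONEY TO CHARITY"))
print(is_y_u_no("STEVE JOBS — WHY DIDN'T YOU DONATE MORE MONEY TO CHARITY"))
```

The first call should come back as a well-formed caption and the second should not, since it spells out the question instead of using the fixed “Y U NO” frame.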

Futurama Fry

This meme also has a brief grammatical analysis:

The text surrounding the meme picture, as with other memes, follows a set formula. This phrasal template goes as follows: “Not sure if (insert thing)”, with the bottom line then reading “or just (other thing)”. It was first utilized in another meme entitled “I see what you did there”, where Fry is shown in two panels, with the first one with him in a wide-eyed expression of surprise, and the second one with the familiar half-lidded expression.

As an example of the phrasal template, Futurama Fry can be seen saying: “Not sure if just smart …. Or British”. Another example would be “Not sure if highbeams … or just bright headlights”. The main form of the meme seems to be with the text “Not sure if trolling or just stupid”.

This meme is particularly interesting because there seems to be an extremely rigid syntactic structure. The phrase follows the form “NOT SURE IF _____ — OR _____”. The first blank can either be filled by a complete sentence or a subject complement while the second blank must be filled by a subject complement. Subject complements, also called predicates (but only by linguists; if you learned about predicates in school it’s probably something different, and a subject complement is more like a predicate adjective or predicate noun), are everything that can come after a form of the verb “to be” in a sentence. So, in a sentence like “It is raining”, “raining” is the subject complement. So, for the Futurama Fry meme, if you wanted to indicate that you were uncertain whether it was raining or sleeting, both of these forms would be correct:

  • NOT SURE IF IT’S RAINING — OR SLEETING
  • NOT SURE IF RAINING — OR SLEETING

Note that, if a complete sentence is used and abbreviation is possible, it must be abbreviated. Thus the following sentence is not a good Futurama Fry sentence:

  • *NOT SURE IF IT IS RAINING — OR SLEETING

This is particularly interesting  because the “phrasal template” description does not include this distinction, but it is quite robust. This is a great example of how humans notice and perpetuate linguistic patterns that they aren’t necessarily aware of.

A meme (created by me using Meme Generator) following the guidelines outlined above. If you’re not sure whether it’s phonetics or phonology, may I recommend this post as a quick refresher?
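For the curious, the Futurama Fry pattern, including the must-contract rule, can be roughed out in a few lines of Python. Treat this as a sketch and not a real grammar: `is_fry` and the list of uncontracted sequences are my own illustrative assumptions (the list is deliberately short and incomplete), and the code checks surface strings only, so it cannot tell whether a blank is actually filled by a subject complement.

```python
import re

# Hypothetical surface check for: NOT SURE IF _____ — OR _____
# Assumption: "no lowercase letters" stands in for "all caps".
FRY = re.compile(r"^NOT SURE IF (?P<first>[^a-z]+?) — OR (?P<second>[^a-z]+)$")

# Illustrative, incomplete list of pronoun + "to be" sequences that have an
# obvious contraction (IT IS -> IT'S, and so on).
UNCONTRACTED = re.compile(
    r"\b(IT IS|HE IS|SHE IS|THAT IS|I AM|YOU ARE|WE ARE|THEY ARE)\b"
)

def is_fry(caption: str) -> bool:
    """Return True if the caption fits the template and contracts where it can."""
    match = FRY.match(caption)
    if match is None:
        return False
    # A complete sentence in the first blank must use the contracted form.
    return UNCONTRACTED.search(match.group("first")) is None

for caption in (
    "NOT SURE IF IT'S RAINING — OR SLEETING",
    "NOT SURE IF RAINING — OR SLEETING",
    "NOT SURE IF IT IS RAINING — OR SLEETING",
):
    print(caption, "->", is_fry(caption))
```

The first two captions pass and the third is rejected, mirroring the starred example above.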

So this is obviously very interesting to a linguist, since we’re really interested in extracting and distilling those patterns. But why is this useful/interesting to those of you who aren’t linguists? A couple of reasons.

  1. I hope you find it at least a little interesting and that it helps to enrich your knowledge of your experience as a human. Our capacity for patterning is so robust that it affects almost every aspect of our existence and yet it’s easy to forget that, to let our awareness of that slip out of our conscious minds. Some patterns deserve to be examined and criticized, though, and linguistics provides an excellent low-risk training ground for that kind of analysis.
  2. If you are involved in internet communities I hope you can use this new knowledge to avoid the social consequences of violating meme grammars. These consequences can range from a gentle reprimand to mockery and scorn. The gatekeepers of internet culture are many, vigilant and vicious.
  3. As with much linguistic inquiry, accurately noting and describing these patterns is the first step towards being able to use them in a useful way. I can think of many uses, for example, of a program that did large-scale sentiment analyses of image macros and was also able to determine which were grammatical (and therefore more likely to be accepted and propagated by internet communities) and which were not.

What’s the best way to teach grammar?

The night before last I had the good fortune to see Geoff Pullum, noted linguist and linguistics blogger, give a talk entitled: The scandal of English grammar teaching: Ignorance of grammar, damage to writing skills, and what we can do about it. It was an engaging talk and clearly showed that many of the “grammar rules” that are taught in English language and composition courses have little to no bearing on how the English language is actually used. Some of the bogeyman rules (his term) that he lambasted included the interdiction against ending a sentence in a preposition, the notion that “since” can only refer to the passage of time and not causality and the claim that only “that” can begin a restrictive clause. Counterexamples for all of these “grammar rules” are easy to find, both in written and spoken language. (If you’re interested in learning more, check out Geoff Pullum on Language Log.)

They will evaluate the different strategies for teaching reading in subsidized Chilean schools

“And then the python ate little Johnny because he had the gall to cheekily split his infinitives.”

So there’s a clear problem here. Rules that have no bearing on linguistic reality are being used as the backbone of grammar instruction, just as they have been for over two hundred years. Meanwhile, the investigation of human language has advanced considerably. We know much more about the structure of language now than we did when E. B. White was writing his grammar guide. It’s linguistic inquiry that has led to better speech therapy, speech recognition and synthesis programs and better foreign language teaching. Grammar, on the other hand, has led to little more than frustration and an unsettling elitism. (We all know at least one person who uses their “knowledge” of “correct” usage as a weapon.) So what can be done about it? Well, I propose that instead of traditional “grammar”, we teach “grammar” as linguists understand it. What’s the difference?

Traditional grammar: A variety of usage and style rules that are based on social norms and a series of historic accidents.

Linguistic grammar: The set of rules which can accurately describe a native speaker’s knowledge of their language.

I’m not the first person to suggest a linguistics education as a valuable addition to the pre-higher educational experience. You can read proposals and arguments from others here, here, and here, and an argument for more linguistics in higher education here.

So, why would you want to teach linguistic grammar? After all, by the time you’re five or six, you already have a pretty good grasp of your language. (Not a perfect one, as it turns out; things like the role of stress in determining the relationship between words in a phrase tend to come in pretty late in life.) Well, there are lots of reasons.

  • Linguistic grammar is the result of scientific inquiry and is empirically verifiable. This means that lessons on linguistic grammar can take the form of experiments and labs rather than memorizing random rules.
  • Linguistic grammar is systematic. This can appeal to students who are gifted at math and science but find studying language more difficult.
  • Linguistic grammar is a good way to gently introduce higher level mathematics. Semantics, for example, is a good way to introduce set theory or lambda calculus.
  • Linguistic grammar is immediately applicable for students. While it’s difficult to find applications for oceanology for students who live in Kansas, everyone uses language every day, giving students a multitude of opportunities to apply and observe what they’ve learned.
  • Linguistic grammar shows that variation between different languages and dialects is systematic, logical and natural. This can help reduce the linguistic prejudice that speakers of certain languages or dialects face.
  • Linguistic grammar helps students in learning foreign languages.  For example, by increasing students’ phonetic awareness (that’s their awareness of language sounds) and teaching them how to accurately describe and produce sounds, we can avoid the frustration of not knowing what sound they’re attempting to produce and its relation to sounds they already know.
  • Knowledge of linguistic grammar, unlike traditional grammar, is relatively simple to evaluate. Since much of introductory linguistics consists of looking at data sets and constructing rules that would generate that data set, and these rules are either correct or not, it is easier to determine whether or not the student has mastered the concepts.

I could go on, but I think I’ll leave it here for now. The main point is this: teaching linguistics is a viable and valuable way to replace traditional grammar education. What needs to happen for linguistic grammar to supplant traditional grammar? That’s a little thornier. At the very least, teachers need to receive linguistic training and course materials appropriate  for various ages need to be developed. A bigger problem, though, is a general lack of public knowledge about linguistics. That’s part of why I write this blog; to let you know about what’s going on in a small but very productive field. Linguistics has a lot to offer, and I hope that in the future more and more people will take us up on it.

 

Ask vs. Aks: Let me axe you a question

Do you know which one of these forms is the correct one? You sure about that?

Four things are inevitable: death, taxes, the eventual heat-death of the universe, and language change. All (living) languages are constantly in a state of flux, at all levels of the linguistic system. Meanings change, new structures come into being and old ones die out, words are born and die and pronunciations change. And no one, it seems, is happy about it. New linguistic forms tend to be the source of endless vitriol and argument, and language users love constructing rules that have more to do with social norms than linguistic reality. Rules that linguists create, which attempt to model the way language is used, are called “descriptive”, while rules that non-linguists create, which attempt to suggest how they believe language should be used, are called “prescriptive”. I’m not going to talk that much more about it here; if you’re interested, Language Log and Language Hippie both discuss the issue at length. The reason that I bring this up is that prescriptive rules tend to favor older forms. (And occasionally forms from other languages. That whole “don’t split an infinitive” thing? Based on Latin. English speakers have been happily splitting infinitives since the 13th century, and I imagine we’ll continue to boldly split them for centuries to come.) There is, however, one glaring exception: the whole [ask] vs. [aks] debate.

Axe for splitting

In a way, it’s kinda like Theseus’ paradox or Abe Lincoln’s axe. If you replace all the sounds in a word one by one, is it the same word at the end of the process as it was in the beginning?

Historically, it’s [aks], the homophone of the chopping tool pictured above, that has precedence. Let’s take a look at the Oxford English Dictionary’s take on the history of the word, shall we?

The original long á gave regularly the Middle English (Kentish) ōxi ; but elsewhere was shortened before the two consonants, giving Middle English a , and, in some dialects, e . The result of these vowel changes, and of the Old English metathesis asc- , acs- , was that Middle English had the types ōx , ax , ex , ask , esk , ash , esh , ass , ess . The true representative of the orig. áscian was the s.w. and w.midl. ash , esh , also written esse (compare æsce ash n.1, wæsc(e)an wash n.), now quite lost. Acsian, axian, survived in ax, down to nearly 1600 the regular literary form, and still used everywhere in midl. and southern dialects, though supplanted in standard English by ask, originally the northern form. Already in 15th cent. the latter was reduced dialectally to asse, past tense ast, still current dialectally.*

So, [aks] was the regular literary form (i.e. the one you would have been taught to say in school if you were lucky enough to have gone to school) until 1600 or so? Ok, so, if older forms are better, then that should be the “right” one. Right? Well, let’s see what Urban Dictionary has to say on the matter, since that tends to be a pretty good litmus test of language attitudes.

“What retards say when they don’t know how to pronounce the word ask.” — User marcotte on Urban Dictionary, top definition

Oh. Sorry, Chaucer, but I’m going to have to inform you that you were a retard who didn’t know how to pronounce the word ask. Let’s unpack what’s going on here a little bit, shall we? There’s clearly a disconnect between the linguistic facts and language attitudes.

  • Facts: these two forms have both existed for centuries, and [aks] was considered the “correct” form for much of that time.
  • Language attitude: [aks] is not only “wrong”, it reflects negatively on those people who use it, making them sound less intelligent and less educated.

This is probably (at least in America) tangled in with the fact that [aks] is a marker of African American English. Even within the African American community, the form is stigmatized. Oprah, for example, who often uses markers of African American English (especially when speaking with other African Americans) almost never uses [aks] for [ask]. So the idea that [aks] is the wrong form and that [ask] is correct is based on a social construction of how an intelligent, educated individual should speak. It has nothing to do with the linguistic qualities of the word itself. (For a really interesting discussion of how knowledge of linguistic forms is acquired by children and the relationship between that and animated films, see Lippi-Green’s chapter “Teaching children to discriminate” from English with an Accent: Language  ideology and discrimination in the United States here.)

Now, the interesting thing about these forms is that they both have phonological pressures pushing English speakers towards using them. That’s because [s] has a special place in English phonotactics. In general, you want the sounds that are the most sonorant nearer the center of a syllable. And [s] is more sonorant than [k], so it seems like [ask] should be the favored form. But, like I said, [s] is special. In “special”, for example, it comes at the very beginning of the word, before the less-sonorant [p]. And all the really long syllables in English, like “strengths”, have [s] on the end. So the special status of [s] seems to favor [aks]. The fact that each form can be modeled perfectly well based on our knowledge of the way English words are formed helps to explain why both forms continue to be actively used, even centuries after they emerged. And, who knows? We might decide that [aks] is the “correct” form again in another hundred years or so. Try and keep that in mind the next time you talk about the right and wrong ways to say something.
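If you'd like to see that sonority pattern spelled out, here is a minimal sketch in Python. The numbers on the scale and the function name are made up for illustration (only the relative ordering matters), and real phonotactics is a good deal messier than this. The sketch just checks whether sonority rises to a single peak and then falls.

```python
# Rough sonority scale (higher = more sonorant). The numeric values are an
# illustrative assumption; only their relative order matters here.
SONORITY = {
    "a": 5,                    # vowels
    "l": 4, "r": 4,            # liquids
    "m": 3, "n": 3,            # nasals
    "s": 2, "f": 2,            # fricatives
    "p": 1, "t": 1, "k": 1,    # stops
}


def obeys_sonority_sequencing(syllable):
    """Return True if sonority rises to one peak and falls afterwards.

    `syllable` is a string of one-character segments, e.g. "ask".
    """
    values = [SONORITY[segment] for segment in syllable]
    peak = values.index(max(values))
    # Every step before the peak must go up in sonority...
    rising = all(a < b for a, b in zip(values[:peak], values[1:peak + 1]))
    # ...and every step after the peak must go down.
    falling = all(a > b for a, b in zip(values[peak:], values[peak + 1:]))
    return rising and falling


for form in ["ask", "aks", "spa"]:
    print(form, obeys_sonority_sequencing(form))
```

By this simple measure [ask] is well behaved and [aks] is not, but "spa" (the start of "special") fails too, even though it is perfectly good English. That is the sense in which [s] gets special treatment: a checker like this one would need an explicit exception for [s] at syllable edges, and once it has that exception [aks] passes as well.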

* “ask, v.”. OED Online. December 2012. Oxford University Press. 12 February 2013 <http://www.oed.com.offcampus.lib.washington.edu/view/Entry/11507>.

Of cups, mugs, glasses and semantic drift

One of the more interesting little sub-fields in linguistics is diachronic semantics. That’s the study of how word meanings change over time. Some of these changes are relatively easy to track. A “mouse” to a farmer in 1900 was a small rodent with unfortunate grain-pilfering proclivities. To a farmer today, it’s also one of the tools she uses to interact with her computer. The word has gained a new semantic sense without losing its original meaning. Sometimes, however, you have a weird little dance where a couple of words are negotiating over the same semantic space–that’s another way of saying a related group of concepts that a language groups together–and that’s where things get interesting. “Cup”, “mug” and “glass” are engaged in that little dance-off right now (at least in American English). Let’s see how they’re doing, shall we?

Glasses 800 edit

Cup? Glass? Jug? Mug? Why don’t we just call them all “drinking vessels” and be done with it?

Cup: Ok, quick question for you: does a cup have to have a handle? The Oxford dictionaries say “yes”, but I really think that’s out of date at this point. Dr. Reed pointed out that this was part of her criteria for whether something could be called a “cup” or not, but that a lot of younger speakers no longer make that distinction. In fact, recently I noticed that someone of my acquaintance uses “cup” to refer only to disposable cups. Cup also has the distinct advantage of being part of a lot of phrases: World Cup, Stanley Cup, cup of coffee, teacup, cuppa, cup of sugar, in your cups, and others that I can’t think of right now.

So “cup” is doing really well, and gaining semantic ground.

Glass: Glass, on the other hand, isn’t doing as well. I haven’t yet talked to someone who can use “glass” to refer to drinking vessels that aren’t actually made of glass including, perhaps a little oddly, clear disposable cups. On the other hand, there are some types of drinking vessels that I can only refer to as glasses. Mainly those for specific types of alcohol: wine glass, shot glass, martini glass, highball glass (though I’ve heard people referring to the glass itself just as a highball, so this might be on the way out). There are alcohol-specific pieces of glassware that don’t count as glasses though–e.g. champagne flute, brandy snifter–so it’s not a categorical distinction by any means.

“Glass” seems to be pretty stable, but if “cup” continues to become broader and broader it might find itself on the outs.

Mug: I don’t have as much observational data on this one, but there seems to be another shift going on here. “Mug” originally referred only to drinking vessels that were larger than cups (see below), and still had handles.

Mugs2000ppx

Note that the smaller ones on top are “cups” and the larger ones on the bottom are labelled as “mugs”.

Most people call those insulated drinking vessels with the attached lids “travel mugs” rather than “travel cups” (640,000 Google hits vs. 22,400) but I find myself calling them “cups” instead. I think it’s because 1) I pattern it with disposable coffee cups and 2) I find handledness is a necessary quality for mugs. I can call all of the drinking vessels in the picture above “mugs” and prefer “mug” to “cup”.

So, at least for me, “mug” is beginning to take over the semantic space allotted to “cup” by older speakers.

Of course, this is a very cursory, impressionistic snapshot of the current state of the semantic space. Without more robust data I’m hesitant to make concrete predictions about the ways in which these terms are negotiating their semantic space, but there’s definitely some sort of drift going on.

Hard science vs. soft science and the science mystique

So, recently I’ve been doing a lot of thinking and reading about what it means to do science, what science entails  and what is (and is not) science. Partly, this was sparked by the  fact that, at a recent middle school science education event, I was asked more than once why linguistics counted as a science. This intrigued me, as no one at the Lego robots display next to us had their discipline’s qualifications questioned, despite the fact that engineering is not scientific. Rigorous, yes. Scientific, no.

Science01science

Hmm, I dunno. Looks science-y, but I don’t see any lab coats. Or goggles. There should definitely be more goggles.

This subject is particularly near and dear to me because my own research looks into, among other things, how the ways in which linguists gather data affect the data they gather and the potential for systematic bias that introduces. In order to look at how we do things, I also need to know why. And that’s where this discussion of science comes in. This can be a hard discussion to have, however, since conversations about what science is, or should be, tend to get muddied by the popular conception of science. I’m not saying people don’t know what science is, ’cause I think most people do, just that we (and I include myself in that) also have a whole bucketful of other socially-motivated ideas that we tend to lump in with science.

I’m going to call the social stuff that we’ve learned to associate with science The Science Mystique. I’m not the first person to call it that, but I think it’s fitting. (Note that if you’re looking for the science of Mystique, you’ll need to look elsewhere.) To begin our exploration of the Science Mystique, let’s start with a quote from another popular science writer, Phil Plait.

They [the scientists who made the discoveries discussed earlier in the speech] used physics. They used math. They used chemistry, biology, astronomy, engineering.

They used science.

These are all the things you discovered doing your projects. All the things that brought you here today.

Computers? Cell phones? Rockets to Saturn, probes to the ocean floor, PSP, gamecubes, gameboys, X-boxes? All by scientists.

Those places I talked about before? You can get to know them too. You can experience the wonder of seeing them for the first time, the thrill of discovery, the incredible, visceral feeling of doing something no one has ever done before, seen things no one has seen before, know something no one else has ever known.

No crystal balls, no tarot cards, no horoscopes. Just you, your brain, and your ability to think.

Welcome to science. You’re gonna like it here.

Inspirational! Science-y! Misleading! Wait, what?

So there are a couple things here that I find really troubling, and I’m just going to break them down and go through them one by one. These are things that are part of the science mystique, that permeate our cultural conception of what science is, and I’ve encountered them over and over and over again. I’m just picking on this particular speech because it’s been slathered all over the internet lately and I’ve encountered a lot of people who really resonated with its message.

  1. Science and engineering and math are treated as basically the same thing.  This. This is one of my biggest pet peeves when it comes to talking about science. Yes, I know that STEM fields (that’s Science, Technology, Engineering and Mathematics) are often lumped together. Yes, I know that there’s a lot of cross-pollination. But one, and only one, of these fields has as its goal the creation of testable models. And that’s science. The goal of engineering is to make stuff. And I know just enough math to know that there’s no way I know what the goal of mathematics is. The takeaway here is that, no matter how “science-y” they may seem, how enfolded they are into the science mystique, neither math nor engineering is a science. 
  2. There’s an insinuation that “science” =  thinking and “non-science” = NOT thinking.  This is really closely tied in with the idea that you have to be smart to be a scientist. False. Absolutely false. In fact, raw intelligence isn’t even on my list of the top five qualities you need to be a scientist:
    1. Passion. You need to love what you do, because otherwise being in grad school for five to ten years while living under the poverty line and working sixty hour weeks just isn’t worth it.
    2. Dedication. See above.
    3. Creativity. Good scientists ask good questions, and coming up with a good but answerable question that no one has asked before and  that will help shed new light on whatever it is you’re studying takes lateral thinking.
    4. Excellent time management skills. Particularly if you’re working in a university setting. You need to be able to balance research, teaching and service, all while still maintaining a healthy life. It’s hard.
    5.  Intelligibility. A huge part of science is taking very complex concepts and explaining them clearly. To your students. To other scientists. To people on the bus. To people on the internet (Hi guys!). You can have everything else on this list in spades, but if you can’t express your ideas you’re going to sink like a lead duck.
  3. Science is progress! Right? Right? Yes. Absolutely. There is no way in which science has harmed the human race and no way in which things other than science have aided it. It sounds really silly when you just come out and say it, doesn’t it? I mean, we have the knowledge to eradicate polio, but because of social and political factors it hasn’t happened yet. And you can’t solve social problems by just throwing science at them. And then there’s the fact that, while the models themselves may be morally neutral, the uses to which they are put are not always so. See Einstein and the bomb. See chemical and biological warfare. And, frankly, I think the greatest advances of the 20th century weren’t in science or engineering or technology. They were deep-seated changes in how we, particularly Americans, treated people. My great-grandmother couldn’t go to high school because she was a woman. My mother couldn’t take college-level courses because she was a woman, though she’s currently working on her degree. Now, I’m a graduate student and my gender is almost completely irrelevant. Segregation is over. Same sex relationships are legally acknowledged by nine states and DC. That’s the progress I would miss most if a weeping angel got me.
  4. Go quantitative or go home.  I’ve noticed a strong bias towards quantitative data, to the point that a lot of people argue that it’s better than qualitative data. I take umbrage at this. Quantitative data is easier, not necessarily better. Easier? Absolutely. It’s easier to get ten people to agree that a banana is ten inches long than it is to get them to agree that it’s tasty. And yet, from a practical standpoint, banana growers want to grow tastier bananas, ones that will ship well and sell well, not longer bananas. But it can be hard to plug “banana tastiness” into your mathematical models and measuring “tastiness” leaves you open to criticism that your data collection is biased. (That’s not to say that qualitative data can’t be biased.) This idea that quantitative data is better leads to an overemphasis on the type of questions that can best be answered quantitatively and that’s a problem. This also leads some people to dismiss the “squishy” sciences that use mainly qualitative data and that’s also a problem. All branches of science help us to shed new light on the world and universe around us and to ignore work because it doesn’t fit the science mystique is a grave mistake.

So what can we do to help lessen the effects of these biases? To disentangle the science mystique from the actual science? Well, the best thing we can do is be aware of it. Critically examine the ways people talk about science. Closely examine your own biases. I, for example, find it far too easy to slip into the “quantitative is better” trap. Notice systematic similarities and question them. Science is, after all, about asking questions.

Soda vs. Pop vs. Coke … Which is right?

Short answer: they’re all correct (at least in the United States) but some are more common in certain dialectal areas. Here’s a handy-dandy map, in case you were wondering:

Maps! Language! Still one of my favorite combinations. This particular map, and the data collection it’s based on, is courtesy of popvssoda.com. Click picture for link and all the lovely statistics. (You do like statistics, right?)

Long answer: I’m going to sort this into reactions I tend to get after answering questions like this one.

What do you mean they’re all correct? Coke/Soda/Pop is clearly wrong. Ok, I’ll admit, there are certain situations when you might need to choose to use one over the other. Say, if you’re writing for a newspaper with a very strict style guide. But otherwise, I’m sticking by my guns here: they’re all correct. How do I know? Because each of them is in current usage, and there is a dialectal group where it is the preferred term. Linguistics (at least the type of linguistics that studies dialectal variation) is all about describing what people actually say and people actually say all three.

But why doesn’t everyone just say the same thing? Wouldn’t that be easier? Easier to understand? Probably, yes. But people use different words for the same thing for the same reasons that they speak different languages. In a very, very simplified way, it kinda works like this:

  • You tend to speak like the people that you spend time with. That makes it easier for you to understand each other and lets other people in your social group know that you’re all members of the same group. Like team jerseys.
  • Over time, your group will introduce or adopt new linguistic markers that aren’t necessarily used by the whole population. Maybe a person you know refers to sodas as “phosphates” because his grandfather was a soda jerk and that form really catches on among your friends.
  • As your group keeps using and adopting new words (or sounds, or grammatical markers or any other facet of language) that are different from those of other groups, its language slowly begins to drift away from the language used by those other groups.
  • Eventually, in extreme cases, you end up with separate languages. (Like what happened with Latin: different speech communities ended up speaking French, Italian, Spanish, Portuguese, and the other Romance languages rather than the Latin they’d shared under Roman rule.)

This is the process by which languages or dialectal communities tend to diverge. Divergence isn’t the only pressure on speakers, however. Particularly since we can now talk to and listen to people from basically anywhere (Yay internet! Yay TV! Yay radio!) your speech community could look like mine does: split between people from the Pacific Northwest and the South. My personal language use is slowly drifting from mostly Southern to a mix of Southern and Pacific Northwestern. This is called dialect leveling and it’s part of the reason why American dialectal regions tend to include hundreds or thousands of miles instead of two or three.

Dialect leveling: Where two or more groups of people start out talking differently and end up talking alike. Schools tend to be a huge factor in this.

So, on the one hand, there is pressure to start all talking alike. On the other hand, however, I still want to sound like I belong with my Southern friends and have them understand me easily (and not be made fun of for sounding strange, let’s be honest) so when I’m talking to them I don’t retain very many markers of the Pacific Northwest. That’s pressure that’s keeping the dialect areas separate and the reason why I still say “soda”, even though I live in a “pop” region.

Huh. That’s pretty cool. Yep. Yep, it sure is.

Why is it so hard for computers to recognize speech?

This is a problem that’s plagued me for quite a while. I’m not a computational linguist myself, but one of the reasons that theoretical linguistics is important is that it allows us to create robust conceptual models of language… which is basically what voice recognition (or synthesis) programs are. But, you may say to yourself, if it’s your job to create and test robust models, you’re clearly not doing very well. I mean, just listen to this guy. Or this guy. Or this person, whose patience in detailing errors borders on obsession. Or, heck, this person, who isn’t so sure that voice recognition is even a thing we need.

Electronic eye

You mean you wouldn’t want to be able to have pleasant little chats with your computer? I mean, how could that possibly go wrong?

Now, to be fair to linguists, we’ve kinda been out of the loop for a while. Fred Jelinek, a very famous researcher in speech recognition, once said “Every time we fire a phonetician/linguist, the performance of our system goes up”. Oof, right in the career prospects. There was, however, a very good reason for that, and it had to do with the pressures on computer scientists and linguists respectively. (Also a bunch of historical stuff that we’re not going to get into.)

Basically, in the past (and currently to a certain extent) there was this divide in linguistics. Linguists wanted to model speakers’ competence, not their performance. The idea is that there is some sort of place in your brain where you know all the rules of language and have them all perfectly mapped out and described. Not in a conscious way, but there nonetheless. But somewhere between the magical garden of language and your mouth and/or ears you trip up and mistakes happen. You say a word wrong or mishear it or switch bits around… all sorts of things can go wrong. Plus, of course, even if we don’t make a recognizable mistake, there’s an incredible amount of variation that we can decipher without a problem. That got pushed over to the performance side, though, and wasn’t looked at as much. Linguistics was all about what was happening in the language mind-garden (the competence) and not the messy sorts of things you say in everyday life (the performance). You can also think of it like what celebrities actually say in an interview vs. what gets into the newspaper; all the “um”s and “uh”s are taken out, little stutters or repetitions are erased and if the sentence structure came out a little wonky the reporter pats it back into shape. It was pretty clear what they meant to say, after all.

So you’ve got linguists with their competence models explaining them to the computer folks and computer folks being all clever and mathy and coming up with algorithms that seem to accurately model our knowledge of human linguistic competency… and getting terrible results. Everyone’s working hard and doing their best and it’s just not working.

I think you can probably figure out why: if you’re a computer and just sitting there with very little knowledge of language (consider that this was before any of the big corpora were published, so there wasn’t a whole lot of raw data) and someone hands you a model that’s supposed to handle only perfect data and also actual speech data, which even under ideal conditions is far from perfect, you’re going to spit out spaghetti and call it a day. It’s a bit like telling someone to make you a peanut butter and jelly sandwich and just expecting them to do it. Which is fine if they already know what peanut butter and jelly are, and where you keep the bread, and how to open jars, and that food is something humans eat, so you shouldn’t rub it on anything too covered with bacteria or they’ll get sick and die. Probably not the best way to go about it.

So the linguists got the boot and they and the computational people pretty much did their own things for a bit. The model that most speech recognition programs use today is mostly statistical, based on things like how often a word shows up in whichever corpus they’re using currently. Which works pretty well. In a quiet room. When you speak clearly. And slowly. And don’t use any super-exotic words. And aren’t having a conversation. And have trained the system on your voice. And have enough processing power in whatever device you’re using. And don’t get all wild and crazy with your intonation. See the problem?
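To make "based on how often a word shows up in a corpus" a bit more concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the twelve-word corpus, the function names, and the choice of an add-one smoothed unigram model. Real recognizers combine acoustic scores with far richer language models, so treat this as a cartoon of the idea rather than how any particular system works.

```python
from collections import Counter

# Toy corpus standing in for the large text collections real systems use.
corpus = "i want to recognize speech and i want to recognize it well".split()
counts = Counter(corpus)
total = sum(counts.values())


def unigram_probability(words):
    """Probability of a word sequence under an add-one smoothed unigram model."""
    vocab = len(counts) + 1  # the extra 1 leaves room for unseen words
    probability = 1.0
    for word in words:
        # Counter returns 0 for words that never appeared in the corpus.
        probability *= (counts[word] + 1) / (total + vocab)
    return probability


def pick_transcription(candidates):
    """Choose the candidate word sequence the model finds most probable."""
    return max(candidates, key=unigram_probability)


# Two transcriptions that sound nearly identical when spoken aloud.
candidates = [
    ["recognize", "speech"],
    ["wreck", "a", "nice", "beach"],
]
best = pick_transcription(candidates)
print(" ".join(best))
```

Here the model prefers "recognize speech" simply because it has seen those words before. Hand it a sentence full of words that never appeared in its corpus and it has nothing to go on, which is one reason the "don't use any super-exotic words" caveat above matters.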

Language is incredibly complex and speech recognition technology, particularly when it’s based on a purely statistical model, is not terrific at dealing with all that complexity. Which is not to say that I’m knocking statistical models! Statistical phonology is mind-blowing and I think we in linguistics will get a lot of mileage from it. But there’s a difference. We’re not looking to conserve processing power: we’re looking to model what humans are actually doing. There’s been a shift away from the competency/performance divide (though it does still exist) and more interest in modelling the messy stuff that we actually see: conversational speech, connected speech, variation within speakers. And the models that we come up with are complex. Really complex. People working in Exemplar Theory, for example, have found quite a bit of evidence that you remember everything you’ve ever heard and use all of it to help parse incoming signals. Yeah, it’s crazy. And it’s not something that our current computers can do. Which is fine; it gives linguists time to further refine our models. When computers are ready, we will be too, and in the meantime computer people and linguistic people are showing more and more overlap again, and using each other’s work more and more. And, you know, singing Kumbaya and roasting marshmallows together. It’s pretty friendly.

So what’s the take-away? Well, at least for the moment, in order to get speech recognition to a better place than it is now, we need to build models that work for a system that is less complex than the human brain. Linguistics research, particularly into statistical models, is helping with this. For the future? We need to build systems that are as complex as the human brain. (Bonus: we’ll finally be able to test models of child language acquisition without doing deeply unethical things! Not that we would do deeply unethical things.) Overall, I’m very optimistic that computers will eventually be able to recognize speech as well as humans can.

TL;DR version:

  • Speech recognition has been light on linguists because they weren’t modeling what was useful for computational tasks.
  • Now linguists are building and testing useful models. Yay!
  • Language is super complex and treating it like it’s not will get you hit in the face with an error-ridden fish.
  • Linguists know language is complex and are working diligently at accurately describing how and why. Yay!
  • In order to get perfect speech recognition down, we’re going to need to have computers that are similar to our brains.
  • I’m pretty optimistic that this will happen.