An emoji dance notation system for TikTok dance tutorials 👀💃

This blog post is more of a quick record of my thoughts than a full in-depth analysis, because when I saw this I immediately wanted to start writing about it. Basically, TikTok is a social media app for short form video (RIP Vine, forever in our hearts) and one of the most popular genres of content is short dances; you may already be familiar with the concept.

HOWEVER, what’s particularly intriguing to me is this sort of video here, where someone creates a tutorial for a specific dance and includes an emoji-based dance notation:

Example of a dance with an emoji notation system by

Back in grad school, when I was studying signed languages, I probably spent more time than I should have reading about writing systems for signed languages and also dance notations. To roughly sum up an entire field of study: representing movements of the human body in time and space using a writing system, or even a more specialized notation, is extremely difficult. There are a LOT of other notations out there, and you probably haven’t run into them for a reason: they’re complex, hard to learn, necessarily miss nuances and are a bit redundant given that the vast majority of dance is learned through watching & copying movement. Probably the most well-known type of dance notation is for ballroom dance where the footwork patterns are represented on the floor using images of footsteps, like so:

Langsamer Walzer Grundschritt

I think part of the reason that this notation in particular tends to work well is that it’s completely iconic: the image of a shoe print is where your shoe print should go. It also captures a large part of the relevant information; the upper body position can be inferred from the position of the feet (and in many cases will more or less remain the same throughout).

I think that’s true to some degree of these emoji notations as well. The fact that they work at all may be due in part to the constraints of the TikTok dance genre. In most TikTok dances, the dancer faces in a single direction for the dance, there is minimal movement around the space and the feet move minimally if at all. The performance format itself helps as well: the videos are short and easy to repeat, and you can still see the movements being performed in full with the notation being used as a shorthand.

And it’s clear that this style of notation isn’t idiosyncratic; this compilation has a variety of tutorials from different creators that use variations on the same style of notation.

A selection of tiktok dance tutorials, some of which include emoji notation

Some of the types of ways emoji are used here are similar to the ways that things like Stokoe notation are, to indicate handshape and movement (although not location). A few other types of ways that emoji are used that stick out:

  • Articulator (hands with handshape, peach emoji for the hips)
  • Manner of articulation/movement (“explosive”, a specific number of repetitions, direction of movement using arrows)
  • Iconic representation of a movement using an object (helicopter = hands make the motion of helicopter blades, mermaid = bodywaves, as if a mermaid swimming)
  • Iconic representation of a shape to be traced (a house emoji = tracing a house shape with the hands, heart = trace a heart shape)
  • (Not emoji) Written shorthand for a (presumably) already known dance, for example “WOAH” for the woah

To sum up: I think this is a cool idea, it’s an interesting new type of dance notation that is clearly useful to a specific artistic community. It’s also another really good piece of evidence in the bucket of “emoji are gestures”: these are clearly not a linguistic system and are used in a variety of ways by different users that don’t seem entirely systematic.

Buuuut there’s also the way that the emojis are grouped into phrases for a specific set of related motions, which smells like some sort of shallow parsing even if it’s not a full constituency structure, and I’d say that’s definitely linguistic-ish. I think I’d need to spend more time on analysis to have any more firmly held opinion than that.


Where 👏 do 👏 the 👏 claps 👏 go 👏 when 👏 you 👏 write 👏 like 👏 this 👏?

You may already be familiar with the phenomenon I’m going to be talking about today: when someone punctuates some text with the clap emoji. It’s a pretty transparent gestural scoring and (for me) immediately brings to mind the way my mom would clap with every word when she was particularly exasperated with my sibling and me (it was usually along with speech like “let’s go, let’s go, let’s go” or “get up now”). It looks like so:

This innovation, which started on Black Twitter, is really interesting to me because it ties in with my earlier work on emoji ordering. I want to know where emojis go, particularly in relation to other words, especially since people have since extended this usage to other emoji, like the US flag:

Logically, there are several different ways you can intersperse clap emojis with text:

  • Claps 👏 are 👏 used 👏 between 👏 every 👏 word.
  • 👏 Claps 👏 are 👏 used 👏 around 👏 every 👏 word. 👏
  • 👏 Claps 👏 are 👏 used 👏 before 👏 every 👏 word.
  • Claps 👏 are 👏 used 👏 after 👏 every 👏 word. 👏
  • Claps 👏 are used 👏 between phrases 👏 not words

I want to know which of these best describes what people actually do. I’m not aiming to write an internet style guide, but I am hoping to characterize this phenomenon in a general way: this is how most people who do this do it, and if you want to use this style in a natural way, you should probably do it the same way.

Data

I used Fireant to grab 10,000 tweets from the Twitter streaming API which had the clap emoji in them at least once. (Twitter doesn’t let you search for a certain number of matches of the same string. If you search for “blob” and “blob blob” you’ll get the same set of results.)

Analysis

From that set of 10,000 tweets, I took only the tweets that had a clap emoji followed by a word followed by another clap emoji and threw out any repeats. That left me with 260 tweets. (This may seem pretty small compared to my starting dataset, but there were a lot of retweets in there, and I didn’t want to count anything twice.) Then I removed @usernames, since those show up at the beginning of any tweet that’s a reply to someone, and URLs, which I don’t really think of as “words”. Finally, I looked at each word in a tweet and marked whether it was a clap or not. You can see the results of that here:
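Here’s a minimal sketch of what those filtering steps could look like in Python. It isn’t the code used for the analysis: the function names are made up for illustration, “repeats” is assumed to mean exact duplicate strings, and anything separated by whitespace is treated as a “word”.

```python
import re

CLAP = "\U0001F44F"  # the clap emoji
SKIN_TONES = re.compile("[\U0001F3FB-\U0001F3FF]")  # skin-tone modifiers

def tokenize(tweet):
    """Split on whitespace, treating every clap as its own token."""
    tweet = SKIN_TONES.sub("", tweet)  # so a toned clap still counts as a clap
    return tweet.replace(CLAP, f" {CLAP} ").split()

def has_clap_word_clap(tokens):
    """True if some non-clap token sits directly between two claps."""
    return any(
        a == CLAP and b != CLAP and c == CLAP
        for a, b, c in zip(tokens, tokens[1:], tokens[2:])
    )

def is_word(token):
    """Assumption: @usernames and URLs don't count as words."""
    return not token.startswith("@") and not re.match(r"https?://", token)

def clap_masks(tweets):
    """For each unique clap-word-clap tweet, mark every token as clap or not."""
    masks = []
    for tweet in dict.fromkeys(tweets):  # drops exact repeats, keeps order
        tokens = tokenize(tweet)
        if has_clap_word_clap(tokens):
            masks.append([tok == CLAP for tok in tokens if is_word(tok)])
    return masks
```

Each mask is a list of True/False values, one per word position, which is the shape of data you’d need to plot the proportion of claps at the first, second, third (and so on) position in a tweet.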

timecourse

The “word” axis represents which word in the tweet we’re looking at: the first, second, third, etc. The red portion of each bar is the words that are the clap emoji. The yellow portion is the words that aren’t. (BTW, big shoutout to Hadley Wickham’s emo(ji) package for letting me include emoji in plots!)

From this we can see a clear pattern: almost no one starts a tweet with an emoji, but most people follow the first word with an emoji. The up-down-up-down pattern means that people are alternating the clap emoji with one word. So if we look back at our hypotheses about how emoji are used, we can see right off the bat that three of them are wrong:

  • Claps 👏 are 👏 used 👏 between 👏 every 👏 word.
  • (ruled out) 👏 Claps 👏 are 👏 used 👏 around 👏 every 👏 word. 👏
  • (ruled out) 👏 Claps 👏 are 👏 used 👏 before 👏 every 👏 word.
  • Claps 👏 are 👏 used 👏 after 👏 every 👏 word. 👏
  • (ruled out) Claps 👏 are used 👏 between phrases 👏 not words

We can pick between the two remaining hypotheses by looking at whether people are ending their tweets with a clap emoji. As it turns out, the answer is “yes”, more often than not.

endWithClap

If they’re using this clapping-between-words pattern (sometimes called the “ratchet clap”) people are statistically more likely to end their tweet with a clap emoji than with a different word or non-clap emoji. This means the most common pattern is to use 👏 a 👏 clap 👏 after 👏 every 👏 word, 👏 including 👏 the 👏 last. 👏

This makes intuitive sense to me. This pattern is mimicking someone clapping on every word. Since we can’t put emoji on top of words to indicate that they’re happening at the same time, putting them after makes good intuitive sense. In some sense, each emoji is “attached” to the word that comes before it in a similar way to how “quickly” is “attached” to “run” in the phrase “run quickly”. It makes less sense to put emoji between words, because then you end up with fewer claps than words, which doesn’t line up well with the way this is done in speech.

The “clap after every word” pattern is also what this website that automatically puts claps in your tweets does, so I’m pretty positive this is a good characterization of community norms.
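To make the four word-level hypotheses concrete, here’s a small Python sketch that names the placement pattern for a tokenized tweet, plus a toy version of a “clap after every word” generator. Both function names are hypothetical, and this is not how the website mentioned above is implemented; it just assumes tokens are separated by spaces.

```python
CLAP = "\U0001F44F"  # the clap emoji

def clap_pattern(tokens):
    """Name the clap placement, or 'other' if claps and words don't strictly alternate."""
    if len(tokens) < 2:
        return "other"
    is_clap = [tok == CLAP for tok in tokens]
    # Two claps (or two words) in a row breaks the one-clap-per-word rhythm.
    if any(a == b for a, b in zip(is_clap, is_clap[1:])):
        return "other"
    names = {
        (False, False): "between",  # word ... word
        (True, True): "around",     # clap ... clap
        (True, False): "before",    # clap ... word
        (False, True): "after",     # word ... clap
    }
    return names[(is_clap[0], is_clap[-1])]

def clapify(text):
    """Put a clap after every whitespace-separated word, including the last."""
    return " ".join(f"{word} {CLAP}" for word in text.split())
```

With strict alternation, the only thing that distinguishes the four patterns is what the tweet starts and ends with, which is why checking the first position and the last position is enough to pick between them. The “between phrases” hypothesis falls under “other” here.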

 

So there you have it! If you’re going to put clap emoji in your tweets, you should probably do 👏 it 👏 like 👏 this. 👏 It’s not wrong if you don’t, but it does look kind of weird.

What’s up with calling a woman “a female”? A look at the parts of speech of “male” and “female” on Twitter.

This is something I’ve written about before, but I’ve recently had several discussions with people who say they don’t find it odd to refer to a woman as a female. Personally, I don’t like being called “a female” because it’s a term I associate strongly with talking about animals. (Plus, it makes you sound like a Ferengi.) I would also protest men being called males, for the same reason, but my intuition is that that doesn’t happen as often. I’m willing to admit that my intuition may be wrong in this case, though, so I’ve decided to take a more data-driven approach. I had two main questions:

  • Do “male” and “female” get used as nouns at different rates?
  • Does one of these terms get used more often?

Data collection

I used the Twitter public API to collect two thousand English tweets, one thousand each containing the exact string “a male” and “a female”. I looked for these strings to help get as many tweets as possible with “male” or “female” used as a noun. “A” is what linguists call a determiner, and a determiner has to have a noun after it. It doesn’t have to be the very next word, though; you can get an adjective first, like so:

  • A female mathematician proved the theorem.
  • A female proved the theorem.

So this will let me directly compare these words in a situation where we should only be able to see a limited number of possible parts of speech & see if they differ from each other. Rather than tagging two thousand tweets by hand, I used a Twitter-specific part-of-speech tagger to tag each set of tweets.

A part of speech tagger is a tool that guesses the part of speech of every word in a text. So if you tag a sentence like “Apples are tasty”, you should get back that “apples” is a plural noun, “are” is a verb and “tasty” is an adjective. You can try one out for yourself on-line here.

Parts of Speech

In line with my predictions, every instance of “male” or “female” was tagged as either a noun, an adjective or a hashtag. (I went through and looked at the hashtags and they were all porn bots. #gross #hazardsOfTwitterData)

However, not every noun was tagged as the same type of noun. I saw three types of tags in my data: NN (regular old noun), NNS (plural noun) and, unexpectedly, NNP (proper noun, singular). (If you’re confused by the weird upper case abbreviations, they’re the tags used in the Penn Treebank, and you can see the full list here.) In case it’s been a while since you studied parts of speech, proper nouns are things like personal or place names: the stuff that tends to get capitalized in English. The examples from the Penn Treebank documentation include “Motown”, “Venneboerger”, and “Czestochwa”. I wouldn’t consider either “female” or “male” a name, so it’s super weird that they’re getting tagged as proper nouns. What’s even weirder? It’s pretty much only “male” that’s getting tagged as a proper noun, as you can see below:

maleVsFemalePOS
Number of times each word was tagged as each part of speech by the GATE Twitter part-of-speech tagger. NNS is a plural noun, NNP a proper noun, NN a noun and JJ an adjective.

The difference in tagged POS between “male” and “female” was super robust (χ²(6, N = 2033) = 1019.2, p < .01). So what’s happening here? My first thought was that it might be that, for some reason, “male” is getting capitalized more often and that was confusing the tagger. But when I looked into it, there wasn’t a strong difference between the capitalization of “male” and “female”: both were capitalized about 3% of the time.

My second thought was that it was a weirdness showing up because I used a tagger designed for Twitter data. Twitter is notoriously “messy” (in the sense that it can be hard for computers to deal with) so it wouldn’t be surprising if tagging “male” as a proper noun is the result of the tagger being trained on Twitter data. So, to check that, I re-tagged the same data using the Stanford POS tagger. And, sure enough, the weird thing where “male” is overwhelmingly tagged as a proper noun disappeared.

stanfordTaggerPOS
Number of times each word was tagged as each part of speech by the Stanford POS tagger. NNS is a plural noun, NNP a proper noun, NN a noun, JJ an adjective and FW a “foreign word”.

So it looks like “male” being tagged as a proper noun is an artifact of the tagger being trained on Twitter data, and once we use a tagger trained on a different set of texts (in this case the Wall Street Journal) there wasn’t a strong difference in what POS “male” and “female” were tagged as.

Rate of Use

That said, there was a strong difference between “a female” and “a male”: how often they get used. In order to get one thousand tweets with the exact string “a female”, Twitter had to go back an hour and thirty-four minutes. In order to get a thousand tweets with “a male”, however, Twitter had to go back two hours and fifty-eight minutes. Based on this sample, “a female” gets said almost twice as often as “a male”.
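The “almost twice as often” figure is just arithmetic on those two time windows. As a quick check in Python (the function name is made up for illustration):

```python
def tweets_per_minute(n_tweets, hours, minutes):
    """Average rate over the window Twitter had to search back through."""
    return n_tweets / (hours * 60 + minutes)

female_rate = tweets_per_minute(1000, 1, 34)  # 1000 tweets in 94 minutes
male_rate = tweets_per_minute(1000, 2, 58)    # 1000 tweets in 178 minutes

ratio = female_rate / male_rate
print(f"a female: {female_rate:.1f}/min, a male: {male_rate:.1f}/min, ratio: {ratio:.2f}")
```

That works out to roughly 10.6 tweets a minute for “a female” against 5.6 for “a male”, a ratio of about 1.89. Keep in mind this is a single snapshot, so the rates could look different at another time of day.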

So what’s the deal?

  • Do “male” and “female” get used as nouns at different rates? It depends on what tagger you use! In all seriousness, though, I’m not prepared to claim this based on the dataset I’ve collected.
  • Does one of these terms get used more often? Yes! Based on my sample, Twitter users use “a female” about twice as often as “a male”.

I think the greater rate of use of “a female” points to the possibility of an interesting underlying difference in how “male” and “female” are used, one that calls for a closer qualitative analysis. Does one term get used to describe animals more often than the other? What sort of topics are people talking about when they say “a male” and “a female”? These questions, however, will have to wait for the next blog post!

In the meantime, I’m interested in getting more opinions on this. How do you feel about using “a male” and “a female” as nouns to talk about humans? Do they sound OK or strike you as odd?

My code is available on my GitHub.

Do emojis have their own syntax?

So a while ago I got into a discussion with someone on Twitter about whether emojis have syntax. Their original question was this:

As someone who’s studied sign language, my immediate thought was “Of course there’s a directionality to emoji: they encode the spatial relationships of the scene.” This is just fancy linguist talk for: “if there’s a dog eating a hot-dog, and the dog is on the right, you’re going to use 🌭🐕, not 🐕🌭.” But the more I thought about it, the more I began to think that maybe it would be better not to rely on my intuitions in this case. First, because I know American Sign Language and that might be influencing me and, second, because I am pretty gosh-darn dyslexic and I can’t promise that my really excellent ability to flip adjacent characters doesn’t extend to emoji.

So, like any good behavioral scientist, I ran a little experiment. I wanted to know two things.

  1. Does an emoji description of a scene show the way that things are positioned in that scene?
  2. Does the order of emojis tend to be the same as the ordering of those same concepts in an equivalent sentence?

As it turned out, the answers to these questions are actually fairly intertwined, and related to a third thing I hadn’t actually considered while I was putting together my stimuli (but probably should have): whether there was an agent-patient relationship in the photo.

Agent: The entity in a sentence that’s effecting a change, the “doer” of the action.

  • The dog ate the hot-dog. (agent: the dog)
  • The raccoons pushed over all the trash-bins. (agent: the raccoons)

Patient: The entity that’s being changed, the “receiver” of the action.

  • The dog ate the hot-dog. (patient: the hot-dog)
  • The raccoons pushed over all the trash-bins. (patient: the trash-bins)

Data

To get data, I showed people three pictures and asked them to “pick the emoji sequence that best describes the scene” and then gave them two options that used different orders of the same emoji. Then, once they were done with the emoji part, I asked them to “please type a short sentence to describe each scene”. For all the language data, I just went through and quickly coded the order in which the concepts encoded in the emoji showed up.

Examples:

  • “The dog ate a hot-dog” -> dog hot-dog
  • “The hot-dog was eaten by the dog” -> hot-dog dog
  • “A dog eating” -> dog
  • “The hot-dog was completely devoured” -> hot-dog

So this gave me two parallel data sets: one with emojis and one with language data.
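The sentences were coded by hand, but the same idea can be sketched in a few lines of Python, assuming you supply the list of words that count as each concept. The concept lists below are hypothetical and only cover the dog and hot-dog examples above.

```python
import re

# Hypothetical word lists: which words count as a mention of each concept.
CONCEPTS = {
    "dog": {"dog", "puppy"},
    "hot-dog": {"hot-dog", "hotdog"},
}

def concept_order(sentence, concepts=CONCEPTS):
    """Return concept labels in the order they are first mentioned."""
    # Keep hyphenated words whole so "hot-dog" isn't counted as "dog".
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())
    order = []
    for token in tokens:
        for label, words in concepts.items():
            if token in words and label not in order:
                order.append(label)
    return order

print(concept_order("The hot-dog was eaten by the dog"))
```

A keyword match like this will miss anything the word lists don’t anticipate (“pup”, “frankfurter”, a response in Spanish), which is a good argument for coding a dataset this small by hand.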

All together, 133 people filled out the emoji half and 127 people did the whole thing, mostly in English (I had one person respond in Spanish and I went ahead and included it). I have absolutely no demographics on my participants, and that’s by design; since I didn’t go through the Institutional Review Board it would actually be unethical for me to collect data about people themselves rather than just general information on language use. (If you want to get into the nitty-gritty this is a really good discussion of different types of on-line research.)

Picture one – A man counting money

Watch, movie schedule, poster, telephone, cashier machine, cash register Fortepan 6680

I picked this photo as sort of a sanity-check: there’s no obvious right-to-left ordering of the man and the money, and there’s one pretty clear way of describing what’s going on in this scene. There’s an agent (the man) and a patient (the money), and since we tend to describe things as agent first, patient second I expected people to pretty much all do the same thing with this picture. (Side note: I know I’ve read a paper about the cross-linguistic tendency for syntactic structures where the agent comes first, but I can’t find it and I don’t remember who it’s by. Please let me know if you’ve got an idea what it could be in the comments–it’s driving me nuts!)

manmoney

And they did! Pretty much everyone described this picture by putting the man before the money, both with emoji and words. This tells us that, when there’s no information about orientation you need to encode (e.g. what’s on the right or left), people do tend to use emoji in the same order as they would the equivalent words.

Picture two – A man walking by a castle

Château de Canisy (5)

But now things get a little more complex. What if there isn’t a strong agent-patient relationship and there is a strong orientation in the photo? Here, a man in a red shirt is walking by a castle, but he shows up on the right side of the photo. Will people be more likely to describe this scene with emoji in a way that encodes the relationship of the objects in the photo?

mancastle

I found that they were–almost four out of five participants described this scene by using the emoji sequence “castle man”, rather than “man castle”. This is particularly striking because, in the sentence writing part of the experiment, most people (over 56%) wrote a sentence where “man/dude/person etc.” showed up before “castle/mansion/chateau etc.”.

So while people can use emoji to encode syntax, they’re also using them to encode spatial information about the scene.

Picture three – A man photographing a model

Photographing a model

Ok, so let’s add a third layer of complexity: what about when spatial information and the syntactic agent/patient relationships are pointing in opposite directions? For the scene above, if you’re encoding the spatial information then you should use an emoji ordering like “woman camera man”, but if you’re encoding an agent-patient relationship then, as we saw in the picture of the man counting money, you’ll probably want to put the agent first: “man camera woman”.

(I leave it open for discussion whether the camera emoji here is representing a physical camera or a verb like “photograph”.)

mangirlcamera
For this chart I removed some data to make it readable. I kicked out anyone who picked another ordering of the emoji, and any word order that fewer than ten people (i.e. less than 10% of participants) used.

So people were a little more divided here. It wasn’t quite a 50-50 split, but it really does look like you can go either way with this one. The thing that jumped out at me, though, was how the word order and emoji order pattern together: if your sentence is something like “A man photographs a model”, then you are far more likely to use the “man camera woman” emoji ordering. On the other hand, if your sentence is something like “A woman being photographed by the sea” or “Photoshoot by the water”, then it’s more likely that your emoji ordering described the physical relation of the scene.

So what?

So what’s the big takeaway here? Well, one thing is that emoji don’t really have a fixed syntax in the same way language does. If they did, I’d expect that there would be a lot more agreement between people about the right way to represent a scene with emoji. There was a lot of variation.

On the other hand, emoji ordering isn’t just random either. It is encoding information, either about the syntactic/semantic relationship of the concepts or their physical location in space. The problem is that you really don’t have a way of knowing which one is which.

Edit 12/16/2016: The dataset and the R script I used to analyze it are now available on Github.

What’s the difference between & and +?

So if you’re like me, you sometimes take notes on the computer and end up using some shortcuts so you can keep up with the speed of whoever’s talking. One of the shortcuts I use a lot is replacing the word “and” with punctuation. When I’m handwriting things I only ever use “+” (because I can’t reliably write an ampersand), but in typing I use both “+” and “&”. And I realized recently, after going back to change which one I used, that I had the intuition that they should be used for different things.

Ampersand-handwriting-3.png
I don’t use ampersands when I’m handwriting things because they’re hard to write.

Like sometimes happens with linguistic intuitions, though, I didn’t really have a solid idea of how they were different, just that they were. Fortunately, I had a ready-made way to figure it out. Since I use both symbols on Twitter quite a bit, all I had to do was grab tweets of mine that used either + or & and figure out what the difference was.

I got 450 tweets from between October 7th and November 11th of this year from my own account (@rctatman). I used either & or + in 83 of them, or roughly 18%. This number is a little bit inflated because I was livetweeting a lot of conference talks in that time period, and if a talk has two authors I start every livetweet from that talk with “AuthorName1 & AuthorName2:”. 43 tweets use & in this way. If we get rid of those, only around 8% of my tweets contain either + or &. They’re still a lot more common in my tweets than in writing in other genres, though, so it’s still a good amount of data.

So what do I use + for? See for yourself! Below are all the things I conjoined with + in my Twitter dataset. (Spelling errors intact. I’m dyslexic, so if I don’t carefully edit text, and even sometimes when I do, to my eternal chagrin, I tend to have a lot of spelling errors. Also, a lot of these tweets are from EMNLP so there’s quite a bit of jargon.)

  • time + space
  • confusable Iberian language + English
  • Data + code
  • easy + nice
  • entity linking + entity clustering
  • group + individual
  • handy-dandy worksheet + tips
  • Jim + Brenda, Finn + Jake
  • Language + action
  • linguistic rules + statio-temporal clustering
  • poster + long paper
  • Ratings + text
  • static + default methods
  • syntax thing + cattle
  • the cooperative principle + Gricean maxims
  • Title + first author
  • to simplify manipulation + preserve struture

If you’ve had some syntactic training, it might jump out to you that most of these things have the same syntactic structure: they’re noun phrases! There are just a couple of exceptions. The first is “static + default methods”, where the things that are being conjoined are actually adjectives modifying a single noun. The other is “to simplify manipulation + preserve struture”. I’m going to remain agnostic about where in the verb phrase that coordination is taking place, though, so I don’t get into any syntax arguments ;). That said, this is a fairly robust pattern! Remember that I haven’t been taught any rules about what I “should” do, so this is just an emergent pattern.

Ok, so what about &? Like I said, my number one use is for conjunction of names. This probably comes from my academic writing training. Most of the papers I read that use author names for in-line citations use an & between them. But I do also use it in the main body of tweets. My use of & is a little bit harder to characterize, so I’m going to go through and tell you about each type of thing.

First, I use it to conjoin user names with the @ tag. This makes sense, since I have a strong tendency to use & with names:

  • @uwengineering & @uwnlp
  • @amazon @baidu @Grammarly & @google

In some cases, I do use it in the same way as I do +, for conjoining noun phrases:

  • Q&A
  • the entities & relations
  • these features & our corpus
  • LSTM & attention models
  • apples & concrete
  • context & content

But I also use it for comparatives:

  • Better suited for weak (bag-level) labels & interpretable and flexible
  • easier & faster

And, perhaps more interestingly, for really high-level conjunction, like at the level of the sentence or entire verb phrase (again, I’m not going to make ANY claims about what happens in and around verbs; you’ll need to talk to a syntactician for that!).

  • Classified as + or – & then compared to polls
  • in 30% of games the group performance was below average & in 17% group was worse than worst individual
  • math word problems are boring & kids learn better if they’re interested in the theme of the problem
  • our system is the first temporal tagger designed for social media data & it doesn’t require hand tagging
  • use a small labeled corpus w/ small lexicon & choose words with high prob. of 1 label

And, finally, it gets used in sort of miscellaneous places, like hashtags and between URLs.

So & gets used in a lot more places than + does. I think that this is probably because, on some subconscious level, I consider & to be the default (or, in linguistics terms, “unmarked”). This might be related to how I’m processing these symbols when I read them. I’m one of those people who hears an internal voice when reading/writing, so I tend to have canonical vocalizations of most typed symbols. I read @ as “at”, for example, and emoticons as a prosodic beat with some sort of emotive sound. Like I read the snorting emoji as the sound of someone snorting. For & and +, I read & as “and” and + as “plus”. I also use “plus” as a conjunction fairly often in speech, as do many of my friends, so it’s possible that it may pattern with my use in speech (I don’t have any data for that, though!). But I don’t say “plus” nearly as often as I say “and”. “And” is definitely the default and I guess that, by extension, & is as well.

Another thing that might possibly be at play here is ease of entering these symbols. While they’re pretty much equally easy to type on my phone, on a full keyboard + is slightly easier, since I don’t have to reach as far from the shift key. But if that were the only factor my default would be +, so I’m fairly comfortable claiming that the fact that I use & for more types of conjunction is based on the influence of speech.

A BIG caveat before I wrap up: this is a bespoke analysis. It may hold for me, but I don’t claim that it’s the norm of any of my language communities. I’d need a lot more data for that! That said, I think it’s really neat that I’ve unconsciously fallen into a really regular pattern of use for two punctuation symbols that are basically interchangeable. It’s a great little example of the human tendency to unconsciously tidy up language.

There, their and they’re: linguistics style!

The most frustrating homophone triplet in English is there, their and they’re, which are all said [ðɛr]. They’re a pain, and one that I’ve found that even really smart adults struggle with. And, frankly, I think a lot of that has to do with the fact that they’re not usually taught in a very linguistically sophisticated way. Luckily for y’all, “linguistic sophistication” is my middle name*. And, like all good linguists I’ve got some tests to help you figure out which [ðɛr] you need.

googleChart
If tests aren’t your style and you just want to play the odds, though, guess “their”, “there” and “they’re” in that order. According to Google’s n-gram viewer (click the chart to go play around with it) “their” is the most common [ðɛr] in writing, followed by “there” and then “they’re”.

  • There. So the confusing thing here is that there are really *two* there’s in English and they play really different roles.
    • Pleonastic there. So in English we really need subjects, even when we don’t. Some sentences like “It’s raining” and “There’s no more ice-cream” don’t actually need a subject to convey what we’re getting at. There’s no thing, “it”, up in the sky that is doing the raining like there’s a person throwing a ball in “They threw the ball”. We just stick it up in there to fill out our sentence.
      • Test: Can you replace [ðɛr] with “it”? If so, it’s probably “there”.
      • Test: If the sentence has “[ðɛr] was/were/is/are/will” it will almost always be “there”.
    • Locative there. So “locative” is just a fancy word for “relating to a place”. Are you talking about a place? If so, then you probably need “there”.
      • Test: Is [ðɛr] referring to a place? If so, it’s probably “there”.
  • Their. So people tend to use a semantic definition for this one; does it belong to someone? It’s way easier to figure it out with part of speech, though. “Their” is part of a pretty small class of words called “determiners”; you may also have heard them called “articles”. One good way to test if a word belongs to the same part of speech as another is to replace it in the sentence. You know “snake” and “pudding” are both nouns because you can say either “My snake fell off the shelf” or “My pudding fell off the shelf”. So all you have to do is swap it out with one of the other English determiners and see if it works.
    • Test: Can you replace [ðɛr] with words like “my”, “our”, “the” or “some”? If so, it’s “their”.
  • They’re. This is probably the easiest one. They’re is a contraction of “they” and “are”. If you can uncontract them and the sentence still works, you’re golden.
    • Test: Can you replace [ðɛr] with “they are”? If so, it’s probably “they’re”.

Try out these tests next time you’re not sure which [ðɛr] is the right one and you should figure it out pretty quickly. Of course, there are some marginal cases (like when you’re talking about the words themselves) that may throw you off, but these guidelines should pull you through 99% of the time.

* Not actually my middle name.

Great Ideas in Linguistics: Grammaticality Judgements

Today’s Great Idea in Linguistics comes to us from syntax. One interesting difference between syntax and other fields of linguistics is what is considered compelling evidence for a theory in syntax. The aim of transformational syntax is to produce a set of rules (originally phrase structure rules) that will let you produce all the grammatical sentences in a language and none of the ungrammatical ones. So, if you’re proposing a new rule you need to show that the sentences it outputs are grammatical… but how do you do that?

I sentence you to ten hours of community service for ungrammatical utterances!

One way to test whether something is grammatical is to see whether someone’s said it before. Back in the day, before you had things like large searchable corpora–or, heck, even the internet–this was difficult, to say the least. Especially since the really interesting syntactic phenomena tend to be pretty rare. Lots of sentences have a subject and an object, but a lot fewer have things like wh-islands.

Another way is to see if someone will say it. This is a methodology that is often used in sociolinguistics research. The linguist interviews someone using questions that are specifically designed to elicit certain linguistic forms, like certain words or sounds. However, this methodology is chancy at best. Oftentimes the person won’t produce whatever it is you’re looking for. Also it can be very hard to make questions or prompts to access very rare forms.

Another way to see whether something is grammatical is to see whether someone would say it. This is the type of evidence that has, historically, been used most often in syntax research. The concept is straightforward. You present a speaker of a language with a possible sentence and they use their intuition as a native speaker to determine whether it’s good (“grammatical”) or not (“ungrammatical”). These sentences are often outputs of a proposed structure and used to argue either for or against it.

However, in practice grammaticality judgements can occasionally be a bit more difficult. Think about the following sentences:

  • I ate the carrot yesterday.
    • This sounds pretty good to me. I’d say it’s “grammatical”.
  • *I did ate the carrot yesterday.
    • I put a star (*) in front of this sentence because it sounds bad to me, and I don’t think anyone would say it. I’d say it’s “ungrammatical”.
  • ? I done ate the carrot yesterday.
    • This one is a little more borderline. It’s actually something I might say, but only in a very informal context and I realize that not everyone would say it.

So if you were a syntactician working on these sentences, you’d have to decide whether your model should account for the last sentence or not. One way to get around this is by building probability into the syntactic structure. So I’m more likely to use a structure that produces the first example but there’s a small probability I might use the structure in the third example. To know what those probabilities are, however, you need to figure out how likely people are to use each of the competing structures (and whether there are other factors at play, like dialect) and for that you need lots and lots of grammaticality judgements. It’s a new use of a traditional tool that’s helping to expand our understanding of language.
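To make that a little more concrete, here is a minimal sketch of one crude way to turn a pile of judgements into numbers: just compute the share of speakers who accepted each structure. The function name, the structure labels and the toy judgements are all made up for illustration (this is not real survey data), and an actual probabilistic grammar would need far more than an acceptance rate.

```python
from collections import defaultdict

def acceptance_rates(judgements):
    """Map each structure label to the share of judgements that accepted it.

    `judgements` is an iterable of (label, accepted) pairs, where `accepted`
    is True if the speaker judged the sentence grammatical.
    """
    counts = defaultdict(lambda: [0, 0])  # label -> [accepted, total]
    for label, accepted in judgements:
        counts[label][1] += 1
        if accepted:
            counts[label][0] += 1
    return {label: acc / total for label, (acc, total) in counts.items()}

# Hypothetical judgements for the three carrot sentences above.
toy_judgements = [
    ("ate", True), ("ate", True),
    ("did ate", False), ("did ate", False),
    ("done ate", True), ("done ate", False),
]

if __name__ == "__main__":
    for label, rate in acceptance_rates(toy_judgements).items():
        print(f"{label}: {rate:.2f}")
```

You would also want to record things like each speaker’s dialect alongside the judgement, since that is exactly the kind of factor that could explain a borderline rate.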

New series: 50 Great Ideas in Linguistics

As I’ve been teaching this summer (And failing to blog on a semi-regular basis like a loser. Mea culpa.) I’ll occasionally find that my students aren’t familiar with something I’d assumed they’d covered at some point already. I’ve also found that there are relatively few resources for looking up linguistic ideas that don’t require a good deal of specialized knowledge going in. SIL’s glossary of linguistic terms is good but pretty jargon-y, and the various handbooks tend not to have on-line versions. And even with a concerted effort by linguists to make Wikipedia a good resource, I’m still not 100% comfortable with recommending that my students use it.

Therefore! I’ve decided to make my own list of Things That Linguistic-Type People Should Know and then slowly work on expounding on them. I have something to point my students to and it’s a nice bite-sized way to talk about things; perfect for a blog.

Here, in no particular order, are 50ish Great Ideas of Linguistics sorted by sub-discipline. (You may notice a slight sub-disciplinary bias.) I might change my mind on some of these–and feel free to jump in with suggestions–but it’s a start. Look out for more posts on them.

  • Sociolinguistics
    • Sociolinguistic variables
    • Social class and language
    • Social networks
    • Accommodation
    • Style
    • Language change
    • Linguistic security
    • Linguistic awareness
    • Covert and overt prestige
  • Phonetics
    • Places of articulation
    • Manners of articulation
    • Voicing
    • Vowels and consonants
    • Categorical perception
    • “Ease”
    • Modality
  • Phonology
    • Rules
    • Assimilation and dissimilation
    • Splits and mergers
    • Phonological change
  • Morphology
  • Syntax
  • Semantics
    • Pragmatics
    • Truth values
    • Scope
    • Lexical semantics
    • Compositional semantics
  • Computational linguistics
    • Classifiers
    • Natural Language Processing
    • Speech recognition
    • Speech synthesis
    • Automata
  • Documentation/Revitalization
    • Language death
    • Self-determination
  • Psycholinguistics

Meme Grammar

So the goal of linguistics is to find and describe the systematic ways in which humans use language. And boy howdy do we humans love using language systematically. A great example of this is internet memes.

What are internet memes? Well, let’s start with the idea of a “meme”. “Memes” were posited by Richard Dawkins in his book The Selfish Gene. He used the term to describe cultural ideas that are transmitted from individual to individual much like a virus or bacteria. The science mystique I’ve written about is a great example of a meme of this type. If you have fifteen minutes, I suggest Dan Dennett’s TED talk on the subject of memes as a much more thorough introduction.

So what about the internet part? Well, internet memes tend to be a bit narrower in their scope. Viral videos, for example, seem to be a separate category from internet memes even though they clearly fit into Dawkins’s idea of what a meme is. Generally, “internet meme” refers to a specific image and text that is associated with that image. These are generally called image macros. (For a thorough analysis of emerging and successful internet memes, as well as an excellent object lesson in why you shouldn’t scroll down to read the comments, I suggest Know Your Meme.) It’s the text that I’m particularly interested in here.

Memes which involve language require that it be used in a very specific way, and failure to obey these rules results in social consequences. In order to keep this post a manageable size, I’m just going to look at the use of language in the two most popular image memes, as ranked by memegenerator.net, though there is a lot more to study here. (I think a study of the differing uses of the initialisms MRW [my reaction when] and MFW [my face when] on imgur and 4chan would show some very interesting patterns in the construction of identity in the two communities. Particularly since the 4chan community is made up of anonymous individuals and the imgur community is made up of named individuals who are attempting to gain status through points. But that’s a discussion for another day…)

The God tier (i.e. most popular) characters on the website Meme Generator as of February 23rd, 2013. Click for link to site. If you don’t recognize all of these characters, congratulations on not spending all your free time on the internet.

Without further ado, let’s get to the grammar. (I know y’all are excited.)

Y U No

This meme is particularly interesting because its page on Meme Generator already has a grammatical description.

The Y U No meme actually began as Y U No Guy but eventually evolved into simply Y U No, the phrase being generally followed by some often ridiculous suggestion. Originally, the face of Y U No guy was taken from Japanese cartoon Gantz’ Chapter 55: Naked King, edited, and placed on a pink wallpaper. The text for the item reads “I TXT U … Y U NO TXTBAK?!” It appeared as a Tumblr file, garnering over 10,000 likes and reblogs.

It went totally viral, and has morphed into hundreds of different forms with a similar theme. When it was uploaded to MemeGenerator in a format that was editable, it really took off. The formula used was : “(X, subject noun), [WH]Y [YO]U NO (Y, verb)?”. [Bold mine.]

A pretty good try, but it can definitely be improved upon. There are always two distinct groupings of text in this meme, always in Impact font, white with a black border and in all caps. This is pretty consistent across all image macros. In order to indicate the break between the two text chunks, I will use — throughout this post. The chunk of text that appears above the image is a noun phrase that directly addresses someone or something, often a famous individual or corporation. The bottom text starts with “Y U NO” and finishes with a verb phrase. The verb phrase is an activity or action that the addressee from the first block of text could or should have done, and that the meme creator considers positive. It is also inflected as if “Y U NO” were structurally equivalent to “Why didn’t you”. So, since you would ask Steve Jobs “Why didn’t you donate more money to charity?”, a grammatical meme to that effect would be “STEVE JOBS — Y U NO DONATE MORE MONEY TO CHARITY”. In effect, this meme asks someone or something that had the agency to do something positive why they chose not to do that thing. While this certainly has the potential to be a vehicle for social commentary, like most memes it’s mostly used for comedic effect. Finally, there is some variation in the punctuation of this meme. While no punctuation is the most common, an exclamation point, a question mark or both are all used. I would hypothesize that the use of punctuation varies between internet communities… but I don’t really have the time or space to get into that here.

A meme (created by me using Meme Generator) following the guidelines outlined above.
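For the programmers in the audience, the surface pattern described above can be sketched as a regular expression. This is only a rough approximation under my own assumptions: the names `Y_U_NO`, `make_y_u_no` and `parse_y_u_no` are made up for this post, and the code only checks the shape of the text (all caps, an addressee chunk, “Y U NO”, something after it, optional punctuation). It cannot tell whether the bottom chunk is really a verb phrase, or whether the action is one the meme creator considers positive.

```python
import re

# Top chunk, the chunk separator used in this post, "Y U NO", the bottom
# chunk, then no punctuation, "?", "!" or both.
Y_U_NO = re.compile(
    r"^(?P<addressee>[^—]+?) — Y U NO (?P<verb_phrase>[^?!]+?)"
    r"(?P<punct>\?!|!\?|[?!]|)$"
)

def make_y_u_no(addressee, verb_phrase, punctuation=""):
    """Assemble the two text chunks in all caps."""
    return f"{addressee} — Y U NO {verb_phrase}{punctuation}".upper()

def parse_y_u_no(text):
    """Return the parts of a well-shaped Y U NO meme, or None if it doesn't fit."""
    if text != text.upper():  # image macro text is always in all caps
        return None
    match = Y_U_NO.match(text)
    return match.groupdict() if match else None

if __name__ == "__main__":
    meme = make_y_u_no("Steve Jobs", "donate more money to charity")
    print(meme)
    print(parse_y_u_no(meme))
```

Anything that spells out “WHY YOU NO”, or that isn’t in all caps, comes back as `None`.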

Futurama Fry

This meme also has a brief grammatical analysis:

The text surrounding the meme picture, as with other memes, follows a set formula. This phrasal template goes as follows: “Not sure if (insert thing)”, with the bottom line then reading “or just (other thing)”. It was first utilized in another meme entitled “I see what you did there”, where Fry is shown in two panels, with the first one with him in a wide-eyed expression of surprise, and the second one with the familiar half-lidded expression.

As an example of the phrasal template, Futurama Fry can be seen saying: “Not sure if just smart …. Or British”. Another example would be “Not sure if highbeams … or just bright headlights”. The main form of the meme seems to be with the text “Not sure if trolling or just stupid”.

This meme is particularly interesting because there seems to be an extremely rigid syntactic structure. The phrase follows the form “NOT SURE IF _____ — OR _____”. The first blank can either be filled by a complete sentence or a subject complement while the second blank must be filled by a subject complement. Subject complements, also called predicates (But only by linguists; if you learned about predicates in school it’s probably something different. A subject complement is more like a predicate adjective or predicate noun.), are everything that can come after a form of the verb “to be” in a sentence. So, in a sentence like “It is raining”, “raining” is the subject complement. So, for the Futurama Fry meme, if you wanted to indicate that you were uncertain whether it was raining or sleeting, both of these forms would be correct:

  • NOT SURE IF IT’S RAINING — OR SLEETING
  • NOT SURE IF RAINING — OR SLEETING

Note that, if a complete sentence is used and abbreviation is possible, it must be abbreviated. Thus the following sentence is not a good Futurama Fry sentence:

  • *NOT SURE IF IT IS RAINING — OR SLEETING

This is particularly interesting because the “phrasal template” description does not include this distinction, but it is quite robust. This is a great example of how humans notice and perpetuate linguistic patterns that they aren’t necessarily aware of.

A meme (created by me using Meme Generator) following the guidelines outlined above. If you’re not sure whether it’s phonetics or phonology, may I recommend this post as a quick refresher?
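The Futurama Fry rules lend themselves to the same kind of rough sketch. Again, everything named here (`FRY`, `UNCONTRACTED`, `is_good_fry`) is my own invention, and the list of uncontracted forms is just a small sample I picked, not a complete inventory. The code checks the “NOT SURE IF _____ — OR _____” frame and rejects a first blank containing a spelled-out form that could have been contracted. It makes no attempt to check that the second blank is actually a subject complement.

```python
import re

# The two-chunk frame, using the chunk separator from this post.
FRY = re.compile(r"^NOT SURE IF (?P<first>[^—]+?) — OR (?P<second>.+)$")

# A small, hand-picked sample of pronoun + "to be" sequences that could
# have been contracted (IT IS -> IT’S, THEY ARE -> THEY’RE, and so on).
UNCONTRACTED = re.compile(
    r"\b(?:IT|HE|SHE|THAT) IS\b|\b(?:THEY|WE|YOU) ARE\b|\bI AM\b"
)

def is_good_fry(text):
    """Check the frame and the must-abbreviate rule; nothing deeper."""
    if text != text.upper():  # image macro text is always in all caps
        return False
    match = FRY.match(text)
    if match is None:
        return False
    return UNCONTRACTED.search(match.group("first")) is None

if __name__ == "__main__":
    for candidate in (
        "NOT SURE IF IT’S RAINING — OR SLEETING",
        "NOT SURE IF RAINING — OR SLEETING",
        "NOT SURE IF IT IS RAINING — OR SLEETING",
    ):
        print(candidate, "->", is_good_fry(candidate))
```

The first two candidates pass and the starred sentence from above is rejected, which matches the judgements given in this section.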

So this is obviously very interesting to a linguist, since we’re really interested in extracting and distilling those patterns. But why is this useful/interesting to those of you who aren’t linguists? A couple of reasons.

  1. I hope you find it at least a little interesting and that it helps to enrich your knowledge of your experience as a human. Our capacity for patterning is so robust that it affects almost every aspect of our existence and yet it’s easy to forget that, to let our awareness of that slip out of our conscious minds. Some patterns deserve to be examined and criticized, though, and linguistics provides an excellent low-risk training ground for that kind of analysis.
  2. If you are involved in internet communities I hope you can use this new knowledge to avoid the social consequences of violating meme grammars. These consequences can range from a gentle reprimand to mockery and scorn. The gatekeepers of internet culture are many, vigilant and vicious.
  3. As with much linguistic inquiry, accurately noting and describing these patterns is the first step towards being able to use them in a useful way. I can think of many uses, for example, of a program that did large-scale sentiment analyses of image macros but was able to determine which were grammatical (and therefore more likely to be accepted and propagated by internet communities) and which were not.

The Many Moods of “Alarming”

So you’ll all be doubtless relieved to know that I have cheerfully settled in Seattle and immediately returned to my old tricks. Observe this gem brought to you by Seattle City Light:

Something’s alarming right enough… but I think it’s actually my linguistics sense.

Now, as both a linguist and native speaker of American English, I find this command troubling. Not because I have a problem with civic-minded individuals alerting the power company to potentially dangerous problems, but because it’s ambiguous. I’ve written about ambiguity in language before, but it’s something that I revisit often and it’s a complex enough subject that you can easily spend an entire lifetime studying it, let alone more than one blog post.

Let’s examine why this sign is ambiguous a little more closely.

First, there’s (what I would consider) a non-standard usage of the word “alarming”. I tend to imagine something that is “alarming” to be capable of putting me in a state of alarm, rather than currently expressing alarm. Or, as the OED puts it:

“Disturbing or exciting with the apprehension of danger.”

Yeah, that’s right, “alarming” is one of the few words that the OED only has one definition for. Let’s put that aside for the moment, though, and assume that there’s a linguistically-creative sign maker working for Seattle City Light who has coined a neologism based on parallels with words like “understanding” or “revolving”. The real crux of the matter is that the command is not a sentence, and has just too many gaps where the reader has to fill in information.

These are just a couple of the possible interpretations I came up with for the sign:

  • If [the alarm is] alarming (in the sense of performing the action which alarms traditionally do, such as whooping and revolving) [then] call.
  • If [you are] alarming [other people, then] call.
  • If [the alarm is] alarming [you, regardless of whether or not it’s currently flashing or making noise then] call.

Now, English syntax is a pretty resilient beast and can put up with a certain number of words left out. The fancy linguistics term for this is “ellipsis”, just like the punctuation mark. (This one: …) Words have to be left out of certain places in certain ways, though. Like you don’t have to say “you” every time you tell someone to do something. “Don’t sit there!” is perfectly acceptable as a sentence, and if someone told you that you’d have no problem figuring out that they were telling you not to sit on their cat. Like everything else in language, though, there are rules and by breaking them you run the risk of failing to communicate what you’re trying to… just like this sign.