In the Seinfeld episode "The Opposite", George says that his life is the opposite of everything he wanted it to be, and that every instinct he has is wrong. He decides to go against his instincts and do the opposite of everything. When the waitress asks whether to bring him his usual order, "tuna on toast, coleslaw, and a cup of coffee", he decides to have the opposite: "Chicken salad, on rye, untoasted. With a side of potato salad. And a cup of tea!". Jerry argues that the opposite of tuna is actually salmon. So which one of them is right? If you ask me, neither salmon nor chicken salad is the opposite of tuna. There is no opposite of tuna. But this funny video demonstrates one of the biggest problems in the task of automatically detecting antonyms: even we humans are terrible at it!
It's a Bird, It's a Plane, It's Superman (not antonyms)
Many people would categorize a pair of words as opposites if they represent two mutually exclusive options/entities in the world, like male and female, black and white, and tuna and salmon. The intuition is clear when these two words x and y represent the only two options in the world. In set theory terms, y is the negation/complement of x: everything in the world which is not x must be y (figure 1).
Figure 1: x and y are the only options in the world U
In this sense, tuna and salmon are not antonyms - they are more accurately defined as co-hyponyms: two words that share a common hypernym (fish). They are indeed mutually exclusive, as one cannot be both a tuna and a salmon. However, if you are not a tuna, you are not necessarily a salmon: you can be another type of fish (mackerel, cod...) or something that is not a fish at all (e.g. a person). See figure 2 for a set theory illustration.
Figure 2: tuna and salmon are mutually exclusive within the set of fish, but not complementary
Similarly, George probably had in mind that tuna and chicken salad are mutually exclusive options for sandwich fillings. He was probably right; a tuna-chicken salad sandwich sounds awful. But since there are other options for sandwich fillings (peanut butter, jelly, peanut butter and jelly...), these two can hardly be considered antonyms, even if we define antonyms as complements within a restricted set of entities in the world (e.g. fish, sandwich fillings). I suggest the "it's a bird, it's a plane, it's Superman" binary test for antonymy: if you have more than two options, it's not antonymy!
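To make the test concrete, here's a minimal Python sketch of it. The domain inventories below are toy examples I made up for illustration; a real implementation would need some lexical resource (e.g. WordNet-style co-hyponym sets) to list the options in each restricted domain.

```python
# Toy inventories of "all the options" in a few restricted domains.
# These are illustrative, not from a real lexical resource.
DOMAINS = {
    "fish": {"tuna", "salmon", "mackerel", "cod"},
    "sandwich fillings": {"tuna", "chicken salad", "peanut butter", "jelly"},
    "truth values": {"true", "false"},
}

def binary_test(x, y, domain):
    """x and y pass the test only if they are the *only* two options in the domain."""
    members = DOMAINS[domain]
    return {x, y} <= members and len(members) == 2

print(binary_test("tuna", "salmon", "fish"))                      # False: mackerel, cod...
print(binary_test("tuna", "chicken salad", "sandwich fillings"))  # False: other fillings exist
print(binary_test("true", "false", "truth values"))               # True: complementary pair
```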
Wanted Dead or Alive (complementary antonyms)
What about black and white? These are two colors out of a wide range of colors in the world, failing the bird-plane-Superman test. However, if we narrow our world down to people's skin colors, these two may be considered antonyms.
Other examples of complementary antonyms are day and night, Republicans and Democrats, dead and alive, true and false, and stay and go. As you may have noticed, they can be of different parts of speech (noun, adjective, verb), but the two words within each pair share the same part of speech (comment if you can think of a negative example!).
Figure 3: Should I stay or should I go now?
So are we cool with complementary antonyms? Well, not quite. If you say that female and male are complementary antonyms, people might tell you that gender is not binary, but a spectrum. Some of these antonym pairs actually have other, uncommon or hidden options: being in a coma for the dead and alive pair, Libertarians in addition to Republicans and Democrats, etc. Still, these pairs are commonly considered antonyms, since there are two main options.
So what have we learned about complementary antonyms? That they are borderline, they depend on the context in which they occur, and they might be offensive to minorities. Use them with caution.
The Good, the Bad [and the Ugly?] (graded antonyms)
Even the strictest definition of antonymy includes pairs of gradable adjectives representing the two ends of a scale. Some examples are hot and cold, fat and skinny, young and old, tall and short, and happy and sad. Set theory and my binary test aren't suitable for these types of antonyms.
Set theory isn't adequate because a gradable adjective can't be represented as a set, e.g. "the set of all tall people in the world". The definition of a gradable adjective changes depending on the context and is very subjective. For example, I'm relatively short, so everyone looks tall to me, while my husband is much taller than me, so he is more likely to say someone is short. The set of tall people in the world changes according to the person who defines it.
In addition, the binary test fails by definition. A cup of coffee can be more than just hot or cold: it can be boiling, very hot, hot, warm, cool, cold, or freezing. And we can add more and more discrete options to the scale of coffee temperature.
So what makes specific pairs of gradable adjectives antonyms? While the definition requires that they be at the two ends of the scale, intuitively I would say that they only need to be symmetric on the scale, e.g. hot and cold, boiling and freezing, warm and cool, but not hot and freezing.
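Here's a rough sketch of this symmetry intuition, assuming we already have the scale as an ordered list. The ordering is hand-made (I dropped "very hot" to keep the toy scale symmetric), so treat it as an illustration rather than a real model of gradable adjectives.

```python
# A hand-ordered temperature scale, from one extreme to the other.
SCALE = ["freezing", "cold", "cool", "warm", "hot", "boiling"]

def symmetric_on_scale(x, y, scale=SCALE):
    """Antonym candidates should mirror each other around the scale's center."""
    i, j = scale.index(x), scale.index(y)
    return i != j and i + j == len(scale) - 1

print(symmetric_on_scale("hot", "cold"))         # True: mirrored positions
print(symmetric_on_scale("boiling", "freezing")) # True: the two ends
print(symmetric_on_scale("hot", "freezing"))     # False: not symmetric
```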
Antonymy in NLP
While there is a vast linguistics literature about antonyms, I'm less familiar with it, so I'm going to focus on some observations and interesting points about antonymy that appear in NLP papers I've read.
The natural logic formulation ([1]) makes a distinction between "alternation" - words that are mutually exclusive - and "negation" - words that are both mutually exclusive and cover all the options in the world. While I basically claimed in this post that the former is not antonymy, we've seen that in some cases, when the two words represent the two main options, they may be considered antonyms.
However, people tend to disagree on these borderline word pairs, so sometimes it's easier to conflate them under a looser definition. For example, [2] ran an annotation task in which they asked crowdsourcing workers to choose the semantic relation that holds for a pair of terms. They followed the natural logic relations, but decided to merge "alternation" and "negation" into a weaker notion of "antonyms".
More interesting observations about antonyms, and references to linguistic papers, can be found in [3], [4], and [5].
Now that we've established that humans find it difficult to decide whether two words are antonyms, you must be wondering whether automatic methods can do reasonably well on this task. There has been a lot of work on antonymy identification (see the papers in the references, and their related work sections). I will focus on my own little experience with antonyms. We've just published a new paper ([6]) in which we analyze the roles of the two main information sources used for automatic identification of semantic relations. The task is defined as follows: given a pair of words (x, y), determine which semantic relation holds between them, if any (e.g. synonymy, hypernymy, antonymy, etc.). As in this post, we used information from x and y's joint occurrences in a large text corpus, as well as information about the separate occurrences of each of x and y. We found that among all the semantic relations we tested, antonymy was almost the hardest to identify (only synonymy was worse).
The use of information about the separate occurrences of x and y is based on the distributional hypothesis, which I've mentioned several times in this blog. Basically, the distribution of a word x's neighboring words may tell us something about the meaning of x. If we'd like to know what the relation between x and y is, we can compute something on top of the neighbor distributions of the two words. For example, we can expect the distributions of x and y to be similar if x and y are antonyms, since one of the properties of antonyms is that they are interchangeable (a word can be replaced with its antonym and the sentence will remain grammatical and meaningful). Think about replacing tall with short, or day with night. The problem is that the same holds for synonyms - you can expect high and tall to also appear with similar neighboring words. So basing the classification on distributional information may lead to confusing antonyms with synonyms.
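Here's a toy illustration of why that happens. The context counts below are made up; in practice the neighbor distributions would be collected from a large corpus (or replaced with word embeddings such as word2vec vectors).

```python
# Toy neighbor-count vectors for three words (counts are invented).
from collections import Counter
import math

contexts = {
    "tall":  Counter({"building": 8, "person": 6, "very": 5, "tree": 3}),
    "short": Counter({"person": 7, "very": 5, "building": 4, "story": 2}),
    "high":  Counter({"building": 9, "mountain": 5, "very": 4}),
}

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    keys = set(u) | set(v)
    dot = sum(u[k] * v[k] for k in keys)  # Counter returns 0 for missing keys
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v)

# Both the antonym and the synonym are distributionally close to "tall",
# which is exactly why this signal confuses the two relations.
print(cosine(contexts["tall"], contexts["short"]))  # high: antonyms share contexts
print(cosine(contexts["tall"], contexts["high"]))   # also high: so do synonyms
```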
The joint occurrences may help identify the relation that holds between the words in a pair, as some patterns indicate a certain semantic relation - for instance, "x is a type of y" may indicate that y is a hypernym of x. The problem is that patterns that are indicative of antonymy, such as "either x or y" (either cold or hot) and "x and y" (day and night), may also be indicative of co-hyponymy (either tuna or chicken salad). In any case, this confusion seems far less harmful than confusing antonyms with synonyms; in some applications it may suffice to know that x and y are mutually exclusive, regardless of whether they are antonyms or co-hyponyms. For instance, when you query a search engine, you'd like it to retrieve results that include synonyms of your search query (e.g. returning a New York City subway map when you search for NYC subway map), but you wouldn't want it to include mutually exclusive words (e.g. a Tokyo subway map).
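A bare-bones sketch of this pattern-based evidence, using only the two patterns mentioned above. Real systems use many more patterns (often learned automatically from the corpus) and count matches over far more text than this toy example.

```python
import re

# Antonymy-indicative lexico-syntactic patterns from the post.
PATTERNS = [
    "either {x} or {y}",
    "{x} and {y}",
]

def pattern_matches(x, y, corpus):
    """Count joint occurrences of (x, y) in antonymy-indicative patterns."""
    return sum(
        len(re.findall(p.format(x=re.escape(x), y=re.escape(y)),
                       corpus, flags=re.IGNORECASE))
        for p in PATTERNS
    )

corpus = ("I'll take it either cold or hot. We worked day and night. "
          "Either tuna or chicken salad is fine.")
print(pattern_matches("cold", "hot", corpus))            # 1: antonyms match
print(pattern_matches("day", "night", corpus))           # 1: antonyms match
print(pattern_matches("tuna", "chicken salad", corpus))  # 1: but so do co-hyponyms
```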
One last thing to remember is that these automatic methods are trained and tested on data collected from humans. If we can't agree on what counts as antonymy, we can't expect automatic methods to succeed at this task any better than we do.
References
[1] Natural Logic for Textual Inference. Bill MacCartney and Christopher D. Manning. RTE 2007.
[2] Adding Semantics to Data-Driven Paraphrasing. Ellie Pavlick, Johan Bos, Malvina Nissim, Charley Beller, Benjamin Van Durme, and Chris Callison-Burch. ACL 2015.
[3] Computing Word-Pair Antonymy. Saif Mohammad, Bonnie Dorr and Graeme Hirst. EMNLP 2008.
[4] Computing Lexical Contrast. Saif Mohammad, Bonnie Dorr, Graeme Hirst, and Peter Turney. CL 2013.
[5] Taking Antonymy Mask off in Vector Space. Enrico Santus, Qin Lu, Alessandro Lenci, Chu-Ren Huang. PACLIC 2014.
[6] Path-based vs. Distributional Information in Recognizing Lexical Semantic Relations. Vered Shwartz and Ido Dagan. CogALex 2016.
> the two words within each pair both share the same
> part of speech (comment if you can think of
> a negative example!)
Well, there is some discussion on this in the linguistic literature (e.g., Fellbaum, 1995). But typical examples involve words that are derived from (or are at least related to) a word from the other word category (e.g. loving/hate, loved/hatred).
In general, there are a lot of ideas/definitions/studies in the linguistic literature that this article only touches the surface of (e.g., the difference between opposites/antonyms, and the question whether antonymy is a relation between words or between concepts). If you are interested in more background material, I would recommend (parts of) Steven Jones' book. That and work by M. Lynne Murphy were basically the starting points for our work on paradigmatic relations at ACL 2014 (apologies for the shameless plug :)).
Fellbaum, C. 1995. Co-Occurrence and Antonymy. International Journal of Lexicography, 8 (1995), pp. 281--303.
Jones, S. 2002. Antonymy: a corpus-based perspective. London, Routledge.
Roth, M. and Schulte im Walde, S. 2014. Combining word patterns and discourse markers for paradigmatic relation classification. Proceedings of ACL (Vol. 2), pp. 524--530.
Hi Michael, thanks for the comment and the references! I actually read your ACL paper a few months ago, but I read it as related work for hypernymy detection so it slipped my mind now :) I will give it another reading and add the other references to my reading list. Thanks!
By the way, most of the posts in my blog only touch the surface of the topics they discuss, both because I write about topics other than my main research, and because I try to simplify it. So comments with complementary information and references are always welcome!
Regarding the same POS tag, I thought about related contrasting words (e.g. thirsty/drink), which I don't consider as antonyms, but I didn't think of derivations. That makes sense.
Ah yes, we considered both antonymy and hypernymy in the ACL paper. To be honest, I have always felt that antonyms are more interesting than hypernyms linguistically, and that detecting them is probably more useful for downstream applications (to reuse your search example to illustrate this point: should the results include "public transport" maps in addition to "subway" maps? Maybe, maybe not?). So I was glad when I saw your paper(s) at the CogALex workshop!
Your post suggests to me that you are interested in continuing to work in this direction. Is that correct? If so, I think that would be great. :)
I think it depends on the application, and in the case of search it's probably a trade-off between precision (not including antonyms of the query) and recall (also including hyponyms/hypernyms of the query). It would be best to have a model that accurately predicts all these relations, but as you know, we're not there yet :)
Sure! I currently work on predicate alignment, but in the future I definitely plan to go back to work on the relations that are harder to recognize (synonyms and antonyms).