Monday, July 13, 2015

Lexical Inference

After I dedicated the previous post to the awesome field of natural language processing, in this post I will drill down and tell you about the specific task that I'm working on: recognizing lexical inference. Most of the work that I will describe was done by other talented people. You can see references to their papers at the bottom of the post, in case you would like to read more about a certain work.

I'll start by defining what lexical inference is. We are given two terms, x and y (a term is a word, such as cat, or a multi-word expression, such as United States of America). We would like to know whether we can infer the meaning of y from x (denoted x → y throughout this post).

For example, we can infer animal from cat, because when we talk about a cat we refer to an animal. In general, y can be inferred from x if they hold a certain lexical or semantic relation; for example, if x is a y (cat → animal, Lady Gaga → singer), if x causes y (flu → fever), if x is a part of y (London → England), etc.


Now would be a good time to ask - why is this task important? We know that a cat is an animal; how would it help us if a computer could infer that automatically? I'll give a usage example. Let's say you use a search engine and type the query "actor Scientology" (or "actors engaged in Scientology", if you don't search by keywords). You expect the search engine to retrieve the following documents:

Figure 1: search results for the query "actor Scientology" that don't directly involve the word "actor"

since they are talking about a certain actor (Tom Cruise or John Travolta) and Scientology. However, what if these documents don't contain the word actor? The search engine needs to know that Tom Cruise → actor to retrieve the first document, and that John Travolta → actor to retrieve the second.
There are many other applications, and in general, knowing that one term infers another helps deal with language variability (there is more than one way of saying the same thing).

People have been working on this task for many years. Like many other NLP tasks, it is a difficult one. There are two main approaches to recognizing lexical inference:
  • Resource-based: in this approach, the inference is based on knowledge from hand-crafted resources that specify the semantic or lexical relations between words or entities in the world. In particular, the resource usually used for this task is WordNet, a lexical database of the English language. WordNet contains words connected to each other via different relations, such as (tail, part of, cat) and (cat, subclass of, feline).1  See figure 2 for an illustration of WordNet.

    This approach is usually very precise (it is correct most of the times it says that x → y), because it relies on knowledge which is quite accurate. However, its coverage (the percentage of times in which it recognizes that x → y, out of all the times that x → y is actually true) is limited, because some of the knowledge needed for the inference may be absent from the resource.
    Figure 2: an excerpt of WordNet - a lexical database of the English language

  • Corpus-based: this approach uses a huge body of text called a "corpus" (e.g. all the English articles in Wikipedia), which is supposed to be representative of the language. The inference is based on statistics of the occurrences of x and y in the corpus. There are several ways to use a corpus to recognize lexical inference:

    • pattern-based approach - there are patterns such as "x and other y" or "y such as x" that indicate that x → y; if you find this difficult to parse, think of "animals such as cats" and "cats and other animals" and ignore the plural/singular. If x and y frequently occur in the corpus in such patterns, this approach will recognize that x → y. It is not enough to observe one or two occurrences; think about the sentence "my brother and other students". It may occur in the corpus, but this is not a general phenomenon: student is not a common attribute of brother. Positive examples such as cat and animal will probably occur much more frequently in these patterns in the corpus.

      The first method defined these patterns manually [1]. A later work found such patterns automatically [2]. This work has been widely cited and used. It is quite precise and also has good coverage. However, it requires that x and y occur together in the corpus, and some words tend not to occur together even though they are highly related; for instance, synonyms (e.g. elevator and lift).
    • distributional approach - the second approach solves exactly this problem. It is based on a linguistic hypothesis [3] that says that words which occur with similar neighboring words tend to have similar meanings (e.g. elevator and lift will both appear next to down, up, building, floor, and stairs). There has been plenty of work in this vein: earlier methods defined some similarity measure between words based on their neighbors (the more common neighbors two words share, the more similar they are) [4],[5]. In recent years, some automatic methods (that don't require defining a similarity measure) were developed (I might elaborate on these in another post, but it requires knowledge of topics I haven't covered yet).
    Corpus-based methods, and in particular distributional ones, have much higher coverage than resource-based methods, because they utilize huge texts. The amount of text available on the web is incredible, as opposed to structured knowledge. However, they are much less precise. The distributional hypothesis says something about the similarity of x and y, and it is a vague notion. Just because x and y are similar (what does that even mean?) doesn't mean that we can infer one from the other; for instance, the words football and basketball are similar, and will probably share some common neighbors such as ball, player, team, match, and win. However, you can't infer one from the other. Moreover, distributional methods may say that hot and cold are similar, because both occur with weather, temperature, drink, water, etc. Now this is too much: not only do we have hot ↛ cold and cold ↛ hot, the two mean exactly the opposite!
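To make the two corpus-based approaches concrete, here is a minimal sketch in Python. The tiny corpus, the two patterns, and the similarity function are toy illustrations made up for this post, not a real system:

```python
import re
from math import sqrt

# Toy corpus; a real system would use e.g. all of English Wikipedia.
corpus = [
    "animals such as cats are popular pets",
    "cats and other animals roam the streets",
    "dogs and other animals need care",
    "my brother and other students went home",
]

# --- Pattern-based: count occurrences of indicative patterns. ---
# These two regexes stand in for the manually defined / automatically
# learned patterns; named groups capture x and y.
patterns = [
    re.compile(r"(?P<y>\w+) such as (?P<x>\w+)"),
    re.compile(r"(?P<x>\w+) and other (?P<y>\w+)"),
]

def pattern_counts(corpus, patterns):
    """Count how often each (x, y) pair appears in an indicative pattern."""
    counts = {}
    for sentence in corpus:
        for pattern in patterns:
            for match in pattern.finditer(sentence):
                pair = (match.group("x"), match.group("y"))
                counts[pair] = counts.get(pair, 0) + 1
    return counts

# (cats, animals) appears twice, (brother, students) only once,
# so a frequency threshold can filter out the noisy pair.

# --- Distributional: compare words by their neighboring words. ---
def neighbor_vector(word, corpus):
    """Count the words occurring directly next to `word` in the corpus."""
    vec = {}
    for sentence in corpus:
        words = sentence.split()
        for i, w in enumerate(words):
            if w == word:
                for j in (i - 1, i + 1):
                    if 0 <= j < len(words):
                        vec[words[j]] = vec.get(words[j], 0) + 1
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0
```

With a realistic corpus, words like elevator and lift would get high cosine similarity even though they rarely occur together in a pattern.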

So what have we been doing?
We developed a new resource-based method for recognizing lexical inference [6]. We weren't willing to compromise on precision, but we still wanted to improve upon the coverage of prior methods. In particular, we found that prior methods are incapable of recognizing inferences involving recent terminology (e.g. social networks) and named entities (also called proper names, e.g. Lady Gaga). This happens simply because prior methods are based on WordNet, and these terms are absent from WordNet; WordNet is an "ontology of the English language", so by definition it isn't supposed to contain world knowledge about named entities. Also, it hasn't been updated in years, so it doesn't cover recent terminology.

We used other structured knowledge resources that contain exactly this kind of information, are much larger than WordNet, and are frequently updated. These resources contain information such as (Lady Gaga, occupation, singer) and (singer, subclass, person), which can indicate that Lady Gaga → singer and Lady Gaga → person. However, they may also contain information such as (Lady Gaga, producer, Giorgio Moroder), which does not indicate that Lady Gaga → Giorgio Moroder. As in WordNet, we needed to define which relations in the resource are relevant for lexical inference. For instance, the occupation relation is relevant, because a person infers their occupation (Lady Gaga → singer, Barack Obama → president).

As opposed to WordNet-based methods, which only need to select relevant relations out of the few that WordNet defines, doing the same manually for the resources we used would be excruciating: they contain thousands of relations. So we developed a method that automatically recognizes which resource relations are indicative of lexical inference. Then, if it finds that x and y are connected to each other via a path containing only relevant relations, it predicts that x → y. So in our previous example, since occupation and subclass were found indicative of lexical inference, Lady Gaga → person.
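The path idea can be sketched in a few lines of Python; the toy triples and the set of indicative relations below are illustrative assumptions for this post, not the contents of any actual resource:

```python
from collections import deque

# Toy knowledge graph: (head, relation, tail) triples.
triples = [
    ("Lady Gaga", "occupation", "singer"),
    ("singer", "subclass", "person"),
    ("Lady Gaga", "producer", "Giorgio Moroder"),
]

# Relations assumed to have been found indicative of lexical inference.
indicative = {"occupation", "subclass"}

def infers(x, y, triples, indicative):
    """Predict x -> y if x reaches y via indicative relations only."""
    edges = {}
    for head, relation, tail in triples:
        if relation in indicative:
            edges.setdefault(head, []).append(tail)
    # Breadth-first search over the filtered graph.
    queue, seen = deque([x]), {x}
    while queue:
        node = queue.popleft()
        if node == y:
            return True
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False
```

Here infers("Lady Gaga", "person", triples, indicative) follows the occupation and subclass edges, while the producer edge is ignored, so Giorgio Moroder is not wrongly inferred.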

As in this example, we made many successful inferences, and in particular inferences involving proper names that were not captured by previous methods. We also maintained very high precision. This is basically the simplified version of our paper.

So, is it perfect now?
Well... not exactly. First of all, our coverage is still lower than that of corpus-based methods (though with higher precision, usually). Second, there are still some open issues. I'll give one of them as an example, as this post is already very long (and I challenge you to tl;dr it).

Answer the following question:
apple __ fruit?
(a) →
(b) ↛

Well, I know this seems like a trivial question, but the answer is - it depends!
Are we talking about apple the fruit, or about Apple the company?
The problem in determining whether apple → fruit is that the word apple has two senses (meanings). In one of its senses, apple → fruit, and in the other, apple ↛ fruit. In order to decide correctly, we need to know which of the senses of apple we are being asked about.

Figure 3: I've just seen this on my Facebook feed after publishing the post and I had to add it :)

As I mentioned before, recognizing lexical inference is usually a component in some NLP application. In such an application, x, y, or both are part of a text, and the application asks "does x infer y?", "what can we infer from x?", or "what infers y?". If x = apple, and we would like to know whether it infers y = fruit, the solution (for humans) would be to look at the text.

Say we have the sentence "I ate a green apple for breakfast". We can easily understand that the correct sense of apple in this sentence is the fruit. How did we know that? We noticed words like ate, breakfast and green that are related to apple in the fruit sense (and unrelated to Apple the company). There are already automatic methods that do this (with some success, of course). So one of the next challenges is to incorporate them and apply context-sensitive lexical inference - in this case, inferring that I ate a fruit and not that I ate a company. I promise to update in case I have any progress with that.
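A very crude sketch of this kind of context-sensitive disambiguation; the sense inventory and its related-word lists are invented for illustration, and real word sense disambiguation methods are far more sophisticated:

```python
# Toy sense inventory for "apple": each sense with words related to it.
senses = {
    "fruit":   {"ate", "eat", "green", "breakfast", "tree", "juice"},
    "company": {"iphone", "mac", "stock", "ceo", "software"},
}

def disambiguate(context_words, senses):
    """Pick the sense whose related words overlap most with the context."""
    overlap = {
        sense: len(related & set(context_words))
        for sense, related in senses.items()
    }
    return max(overlap, key=overlap.get)

context = "i ate a green apple for breakfast".split()
best_sense = disambiguate(context, senses)  # ate, green, breakfast all point to the fruit
```

Once the sense is picked, the inference question becomes "does apple (the fruit) infer fruit?", which has a single answer.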

[1] Hearst, Marti A. "Automatic acquisition of hyponyms from large text corpora." Proceedings of the 14th conference on Computational linguistics-Volume 2. Association for Computational Linguistics, 1992.  
[2] Snow, Rion, Daniel Jurafsky, and Andrew Y. Ng. "Learning syntactic patterns for automatic hypernym discovery." Advances in Neural Information Processing Systems 17. 2004.
[3] Harris, Zellig S. "Distributional structure." Word. 1954. 
[4] Weeds, Julie, and David Weir. "A general framework for distributional similarity." Proceedings of the 2003 conference on Empirical methods in natural language processing. Association for Computational Linguistics, 2003. 
[5] Kotlerman, Lili, et al. "Directional distributional similarity for lexical inference." Natural Language Engineering 16.4 (2010): 359-389. 
[6] Shwartz, Vered, Omer Levy, Ido Dagan, and Jacob Goldberger. "Learning to Exploit Structured Resources for Lexical Inference." Proceedings of the Nineteenth Conference on Computational Natural Language Learning. Association for Computational Linguistics, 2015. 

1 These relations actually have less friendly names: holonym/meronym and hyponym/hypernym.

Saturday, July 4, 2015

Natural Language Processing

I'm afraid I'm pretty lousy at explaining to people what I do. I think my parents have learned to memorize the key words "Natural Language Processing" so that they can tell their friends about my occupation. Another relative of mine is under the illusion that my current research is about to replace Google search, just as soon as I'm done (I swear I never told her anything like that!). When I try to simplify it, I sometimes tell people that it is a subfield of Artificial Intelligence. Then again, I think that makes some people imagine me talking with a robot as my everyday routine.

In this post I would like to tell you a little bit about what Natural Language Processing is and why I find it such an interesting field of research. In the following post, I will elaborate on what I actually (try to) do in this field.

Natural Language Processing (NLP, not to be confused with the other NLP) is mainly about filling the gap between how humans communicate (in natural languages such as English) and what computers understand (machine language). When this task is fully solved, you will be able to communicate with your computer (or your tablet, cell phone, smart refrigerator and car) just as you do with another human being.

"Computers are incredibly fast, accurate and stupid; humans are incredibly slow, inaccurate and brilliant; together they are powerful beyond imagination." (Albert Einstein)1

Computers are indeed completely stupid, just as Albert Einstein pointed out. When you are engaged in a conversation with a person, each of you understands the meaning of what the other is saying. Computers understand only machine language, and are programmed to follow very specific instructions on top of it. Human language is much more complex than that: you can say one thing in multiple ways, for example "where is the nearest sushi restaurant?" and "can you please give me addresses of sushi places nearby?" - this is called language variability. Sometimes you say something that can have several meanings, like "time flies like an arrow" - this is called language ambiguity. A human being usually understands the correct meaning from the context of the conversation. A computer... doesn't really.

However, human knowledge is limited, while today, in the big data era, computers have access to almost unlimited knowledge. So what if we taught computers to understand us? We could have the answers to all the questions in the universe!

Of course, some of these applications already exist. If you have an Android phone, you can say "Ok Google" and then ask a question that Google (with some success) will answer. The same goes for Siri on Apple devices. However, this is ongoing research, and none of these applications is perfect yet.

In addition to human language understanding, this field is also occupied with teaching computers to generate human language, so that they can fool you into thinking you are talking with an actual human being. I'm sure you have encountered virtual assistants:

I just want to talk. Is that a problem?
Such applications require both the understanding and generation of human language. I'm sure that with this example I've now convinced you that there's still plenty of work in this field. It is quite fun to challenge these virtual agents with complex language and topics they weren't trained to answer. I recommend it as a game :)

So how can NLP help us in our everyday life? In many ways. Here is a small subset of NLP tasks you may have encountered in applications:

  • Speech to text / text to speech - translation of spoken words into text and vice versa. The first and last step of applications in which you speak with a device. The internal processing is done over written text. NLP is actually a very small part of this task, which is related to electrical engineering, machine learning and other fields.
  • Machine translation - in two words: Google Translate.
  • Language model - determines how likely a certain sentence is in the language. For instance, the sentence "I'm reading this post now" is more likely to be said than "This post now reading I'm", even though both sentences contain only correct English words; and the sentence "I called my mother on the" is more likely to end with "phone" than with "banana". It is used in many applications, for example, the auto-suggest in your phone. Though it sometimes has funny suggestions, it can be very helpful.

    Mmm... what was that offer again?
  • Automatic summarization - you know you don't have the patience to read long news or entertainment articles on the web, or restaurant reviews on TripAdvisor, not to mention the texts you need to read for work or school. This application takes long texts and provides you with a concise version of them.
  • Information Retrieval - supports search engines and improves search results by understanding what the user really means in their query. For instance, you may have noticed the special search results you get on Google when searching for things such as time, weather, and flight details:

    If you ever wondered how Google is so smart to understand you, I think you may have some of the answer now.
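The language-model bullet above can be sketched with a tiny bigram model. The corpus and the smoothing constant here are made up for illustration; real models are trained on vastly more text:

```python
from collections import defaultdict

# Toy training corpus; a real language model sees billions of words.
corpus = [
    "i called my mother on the phone",
    "i answered the phone",
    "the banana was ripe",
]

# Count bigrams (pairs of adjacent words) and unigram contexts.
bigrams = defaultdict(int)
unigrams = defaultdict(int)
for sentence in corpus:
    words = ["<s>"] + sentence.split()
    for first, second in zip(words, words[1:]):
        bigrams[(first, second)] += 1
        unigrams[first] += 1

def bigram_prob(first, second):
    """P(second | first), with a tiny smoothing constant for unseen pairs."""
    return (bigrams[(first, second)] + 0.01) / (unigrams[first] + 0.01)

# "the phone" was seen twice and "the banana" only once, so the model
# prefers ending "I called my mother on the ..." with "phone".
```

This is exactly the kind of scoring that lets your phone's auto-suggest rank likely next words, only on an enormously larger scale.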

And here is a cool glimpse into the future (some of it is already implemented, though definitely not common): when computers can generate human language, your refrigerator will be able to tell you "hey, you're running out of milk - I added it to your grocery list". That would also require some help from other fields such as computer vision (to scan the bar codes of the milk and other products inside the fridge). I think it's a cool example, though.

So now you see that you've actually encountered applications of NLP many times before - you just couldn't name them. I hope I managed to excite you about NLP, and hopefully I will also succeed with other topics in the next posts.

Small survey question: when you search something in Google (or any other search engine of your preference), is your query:
(1) a full question, such as "What is the height of Mount Everest?"
(2) composed of key words, such as "height Everest"

The results will be published when there are enough readers to draw a meaningful statistical conclusion (probably never).

1 05/07/2015: Thanks to Yuval who doubted the authenticity of this quote, it turns out that it probably wasn't Einstein who said it, though it is not clear who did.

Thursday, July 2, 2015

Take #1

I've recently had my first scientific paper published. Some of my family members and friends were truly interested in what I do, or at least polite enough to show interest in the result of a year of hard work, and they actually tried to read the paper. Most of them lost it somewhere during the abstract. Some of them probably gave up at the title.

This incident was my inspiration for this blog; I actually think that every scientific concept can be explained in a way that non-professionals can understand. I'm still relatively new to my field, and though I'm capable of reading and understanding papers (at least those related to my field of interest), I understand much better when things are explained with examples and simple words.

I hope that, after this long introduction, I will find the time to write in this blog. I plan to share my thoughts about some work-related topics and about everyday life in the context of computer science. And on a more personal note, I want to give people a glimpse into what I do, and deprive them of their excuses not to talk about my work ;)