Saturday, November 12, 2016

Question Answering

In my introductory post about NLP I introduced the following survey question: when you search something in Google (or any other search engine of your preference), is your query:
(1) a full question, such as "What is the height of Mount Everest?"
(2) composed of keywords, such as "height Everest"

I never published the results since, as I suspected, there were too few answers to the survey, and they were probably not representative of the entire population. However, my intuition back then was that only older people are likely to search with a grammatical question, while people with some knowledge in technology would use keywords. Since then, my intuition was somewhat supported by (a) this lovely grandma that added "please" and "thank you" to her search queries, and (b) this paper from Yahoo Research that showed that search queries with question intent do not form fully syntactic sentences, but are made of segments (e.g. [height] [Mount Everest]). 

Having said that, searching the web to get an answer to a question is not quite the same as actually asking the question and getting a precise answer:

Here's the weird thing about search engines. It was like striking oil in a world that hadn't invented internal combustion. Too much raw material. Nobody knew what to do with it. 
Ex Machina

It's not enough to formulate your question in a way that the search engine will have any chance of retrieving relevant results. Now you need to process the returned documents and search for the answer. 

Getting an answer to a question by querying a search engine is not trivial; I guess this is the reason so many people ask questions in social networks, and some other people insult them with "Let me Google that for you".

The good news is that there are question answering systems, designed to do exactly that: automatically answer a question given as input; the bad news is that like most semantic applications in NLP, it is an extremely difficult task, with limited success. 

Question answering systems have been around since the 1960s. Originally, they were developed to support natural language queries to databases, before web search was available. Later, question answering systems were able to find and extract answers from free text.

A successful example of a question answering system is IBM Watson. Today Watson is described by IBM as "a cognitive technology that can think like a human", and is used in many of IBM's projects, not just for question answering. Originally, it was trained to answer natural logic questions -- or more precisely, to form the correct question to a given answer, as in the television game show Jeopardy. In February 2011, Watson competed in Jeopardy against former winners of the show, and won! It had access to millions of web pages, including Wikipedia, which were processed and saved before the game. During the game, it wasn't connected to the internet (so it couldn't use a search engine, for example). The Jeopardy video is pretty cool, but if you have no patience to watch it all (I understand you...), here's a highlight:

HOST: This trusted friend was the first non-dairy powdered creamer. Watson?
WATSON: What is milk?
HOST: No! That wasn’t wrong, that was really wrong, Watson.

Another example is personal assistants: Apple's Siri, Amazon's Alexa, Microsoft's Cortana, and Google Assistant. They are capable of answering an impressively wide range of questions, but it seems they are often manually designed to answer specific questions.

So how does question answering work? I assume that each question answering system employs a somewhat different architecture, and some of the successful ones are proprietary. I'd like to present two approaches. The first is a general architecture for question answering from the web, and the second is question answering from knowledge bases.

Question answering from the web

I'm following a project report I submitted to a course 3 years ago, in which I exemplified this process on the question "When was Mozart born?". This example was originally taken from some other paper, which is hard to trace now. Apparently, it is a popular example in this field.

The system performs the following steps:

A possible architecture for a question answering system. 
  • Question analysis - parse the natural language question, and extract some properties:

    • Question type - mostly, QA systems support factoid questions (a question whose answer is a fact, as in the given example). Other types of questions, e.g. opinion questions, will be discarded at this point.

    • Answer type - what is the type of the expected answer, e.g. person, location, date (as in the given example), etc. This can be inferred with simple heuristics using the WH-question word, for example who => person, where => location, when => date. 

    • Question subject and object - can be extracted easily by using a dependency parser. These can be used in the next step of building the query. In this example, the subject is Mozart.

  • Search - prepare the search query, and retrieve documents from the search engine. The query can be an expected answer template (which is obtained by applying some transformation to the question), e.g. "Mozart was born in *". Alternatively, or in case the answer template retrieves no results, the query can consist of keywords (e.g. Mozart, born).

    Upon retrieving documents (web pages) that answer the query, the system focuses on certain passages that are more likely to contain the answer ("candidate passages"). These are usually ranked according to the number of query words they contain, their word similarity to the query/question, etc.

  • Answer extraction - try to extract candidate answers from the candidate passages. This can be done by using named entity recognition (NER) that identifies in the text mentions of people, locations, organizations, dates, etc. Every mention whose entity type corresponds to the expected answer type is a candidate answer. In the given example, any entity recognized as DATE in each candidate passage will be marked as a candidate answer, including "27 January 1756" (the correct answer) and "5 December 1791" (Mozart's death date).

    The system may also keep some lists that can be used to answer closed-domain questions, such as "which city [...]" or "which color [...]" that can be answered using a list of cities and a list of colors, respectively. If the system identified that the answer type is color, for example, it will search the candidate passage for items contained in the list of colors. In addition, for "how much" and "how many" questions, regular expressions identifying numbers and measures can be used.

  • Ranking - assign some score for each candidate answer, rank the candidate answers in descending order according to their scores, and return a list of ranked answers. This phase differs between systems. The simple approach would be to represent an answer by some characteristics (e.g. surrounding words) and learn a supervised classifier to rank the answers.

    An alternative approach is to try to "prove" the answer logically. In the first phase, the system creates an expected answer template. In our example it would be "Mozart was born in *". By assigning the candidate answer "27 January 1756" to the expected answer template, we get the hypothesis "Mozart was born in 27 January 1756", which we would like to prove from the candidate passage. Suppose that the candidate passage was "[...] Wolfgang Amadeus Mozart was born in Salzburg, Austria, in January 27, 1756. [...]". A person would know that, given the candidate passage, the hypothesis is true, and therefore this candidate answer should be ranked high.

    To do this automatically, Harabagiu and Hickl ([1]) used a textual entailment system: the system receives two texts and determines whether the truth of the first one (the text) implies that the second one (the hypothesis) is also true. Some of these systems return a number, indicating to what extent the entailment holds. This number can be used for ranking answers.

    While this is a pretty cool idea, the unfortunate truth is that textual entailment systems do not perform better than question answering systems, nor do they perform very well in general. So reducing the question answering problem to that of recognizing textual entailment doesn't really solve question answering.
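
To make the steps above a bit more concrete, here is a minimal Python sketch of the pipeline on the Mozart example. It is a toy under stated assumptions, not how any real system is built: the passages are passed in by hand instead of being retrieved from a search engine, a regular expression for dates stands in for a real NER system, and the final ranking simply follows the passage ranking instead of using a supervised classifier or textual entailment. All function names, the stop-word list, and the example passages are made up for illustration.

```python
import re

# Heuristic from the text: the WH-word determines the expected answer type.
ANSWER_TYPES = {"who": "PERSON", "where": "LOCATION", "when": "DATE"}

# Tiny stop-word list (an assumption) used to reduce the question to keywords.
STOP_WORDS = {"who", "where", "when", "was", "is", "the", "a", "did"}

# Toy stand-in for NER: only recognizes dates such as "27 January 1756".
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE_RE = re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def expected_answer_type(question):
    """Question analysis: infer the answer type from the WH-word."""
    tokens = tokenize(question)
    return ANSWER_TYPES.get(tokens[0]) if tokens else None


def query_keywords(question):
    """Search: build a keyword query from the question."""
    return [w for w in tokenize(question) if w not in STOP_WORDS]


def rank_passages(passages, keywords):
    """Rank passages by the number of query keywords they contain."""
    def overlap(passage):
        words = set(tokenize(passage))
        return sum(1 for k in keywords if k in words)
    return sorted(passages, key=overlap, reverse=True)


def extract_candidates(passage, answer_type):
    """Answer extraction: keep mentions whose type matches the answer type."""
    if answer_type == "DATE":
        return DATE_RE.findall(passage)
    return []  # other entity types would need a real NER system


def answer(question, passages):
    """Return candidate answers, ordered by the rank of their passage."""
    answer_type = expected_answer_type(question)
    keywords = query_keywords(question)
    candidates = []
    for passage in rank_passages(passages, keywords):
        for candidate in extract_candidates(passage, answer_type):
            if candidate not in candidates:
                candidates.append(candidate)
    return candidates


if __name__ == "__main__":
    passages = [
        "Mozart died on 5 December 1791.",
        "Wolfgang Amadeus Mozart was born on 27 January 1756 in Salzburg.",
        "Salzburg is a city in Austria.",
    ]
    print(answer("When was Mozart born?", passages))
```

Note that both dates come out as candidates, exactly as described above; the birth date is ranked first only because its passage contains both query keywords ("mozart" and "born"), which is a much weaker signal than a real ranking step would use.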

Question answering from knowledge bases

A knowledge base, such as Freebase/Wikidata and DBpedia, is a large-scale set of facts about the world in a machine-readable format. Entities are related to each other via relations, creating triplets like (Donald Trump, spouse, Melania Trump) and (idiocracy, instance of, film) (no association between the two facts whatsoever ;)). Entities can be people, books and movies, countries, etc. Example relations are birth place, spouse, occupation, instance of, etc. While these facts are saved in a format which is easy for a machine to read, I never heard of a human who searches for information in knowledge bases. Which is too bad, since they contain an abundance of information.

So some researchers (e.g. [2], following [3]) came up with the great idea of letting people ask a question in natural language (e.g. "When was Mozart born?"), parsing the question automatically to relate it to a fact in the knowledge base, and answering accordingly. This reduces the question answering task to understanding the natural language question, whereas querying for the answer from a knowledge base requires no text processing. The task is called executable semantic parsing. The natural language question is mapped into some logical representation, e.g. lambda calculus. For example, the example question would be parsed to something like λx.DateOfBirth(Mozart, x). The logical form is then executed against a knowledge base; for instance, it would search for a fact such as (Mozart, DateOfBirth, x) and return x.
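
As a rough illustration of the "execute" half of this idea, here is a toy Python sketch. The knowledge base is a hand-written set of triples, and the "parser" is just a couple of regular expressions, which is precisely the part that real semantic parsers have to learn and that makes the task hard. The relation names (DateOfBirth, PlaceOfBirth), the patterns, and the function names are assumptions made up for this example; they are not taken from Freebase or from [2].

```python
import re

# Toy knowledge base: a set of (subject, relation, object) triples.
KB = {
    ("Mozart", "DateOfBirth", "27 January 1756"),
    ("Mozart", "PlaceOfBirth", "Salzburg"),
    ("Idiocracy", "InstanceOf", "film"),
}

# Hand-written question patterns (an assumption); each maps to one relation.
PATTERNS = [
    (re.compile(r"When was (.+) born\?"), "DateOfBirth"),
    (re.compile(r"Where was (.+) born\?"), "PlaceOfBirth"),
]


def parse(question):
    """Map a question to a logical form (entity, relation), or None.

    (Mozart, DateOfBirth) plays the role of λx.DateOfBirth(Mozart, x).
    """
    for pattern, relation in PATTERNS:
        match = pattern.fullmatch(question.strip())
        if match:
            return (match.group(1), relation)
    return None


def execute(logical_form, kb):
    """Return every x such that (entity, relation, x) is a fact in the KB."""
    entity, relation = logical_form
    return sorted(obj for subj, rel, obj in kb
                  if subj == entity and rel == relation)


if __name__ == "__main__":
    logical_form = parse("When was Mozart born?")
    print(logical_form)
    print(execute(logical_form, KB))
```

Executing the logical form is a trivial lookup here; everything interesting happens in parse, which in this sketch fails on any question that doesn't match one of the two patterns word for word.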

Despite having the answer appear in a structured format rather than in free text, this task is still considered hard, because parsing a natural language utterance into a logical form is difficult.* 

By the way, simply asking Google "When was Mozart born?" seems to take away my argument that "searching the web to get an answer to a question is not quite the same as actually asking the question and getting a precise answer":

Google understands the question and answers precisely.

Only that it doesn't. In 2012, Google added a feature to its search engine that presents information boxes above the regular search results, for some queries and questions. They parse the natural language query and try to retrieve results from their huge knowledge base, known as the Google Knowledge Graph. Well, I don't know exactly how they do it, but I guess that similarly to the previous paragraph, their main effort is in parsing and understanding the query, which can then be matched against facts in the graph.

[1] Methods for Using Textual Entailment in Open-Domain Question Answering. Sanda Harabagiu and Andrew Hickl. In ACL and COLING 2006.
[2] Semantic Parsing on Freebase from Question-Answer Pairs. Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. In EMNLP 2013.
[3] Learning to parse database queries using inductive logic programming. John M. Zelle and Raymond J. Mooney. In AAAI 1996.

* If you're interested in more details, I recommend going over the materials from the very interesting ESSLLI 2016 course on executable semantic parsing, which was given by Jonathan Berant.

