Employing External Rich Knowledge for Machine Comprehension


1 Employing External Rich Knowledge for Machine Comprehension
Bingning Wang, Shangmin Guo, Kang Liu, Shizhu He, Jun Zhao
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences (CASIA)

2 What is Machine Comprehension?
Document: One night I was at my friend's house where he threw a party. We were enjoying our dinner at night when all of a sudden we heard a knock on the door. I opened the door and saw this guy who had a scar on his face. (......) As soon as I saw him I ran inside the house and called the cops. The cops came and the guy ran away as soon as he heard the cop car coming. We never found out what happened to that guy after that day.
Candidate answers:
Question 1: What was the strange guy doing with the friend? A) enjoying a meal B) talking about his job C) talking to him *D) trying to beat him
Question 2: Why did the strange guy run away? *A) because he heard the cop car B) because he saw his friend C) because he didn't like the dinner D) because it was getting late
Machine comprehension is a natural language understanding task in which, given a question, we must select the best answer from the four answer candidates based on the context document. This figure shows an example of machine comprehension.

3 Dataset
MCTest: Document: 150-300 words; Question: about 10 words. (Richardson M, Burges C J C, Renshaw E. MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. EMNLP 2013.)
Other resources: Facebook: bAbI [1]; Google DeepMind: CNN and Daily Mail articles [2]; Facebook: CBTest [3]; Stanford: ProcessBank [4].
MCTest is a typical machine comprehension dataset in which each context story contains hundreds of words and each question spans about 10 words. The stories are relatively easy, restricted to what a 7-year-old can understand. Many other datasets also target machine comprehension. For example, Facebook's bAbI dataset contains 20 types of questions and is claimed to be 'AI-complete'; however, this dataset is synthetic and its vocabulary size is only about 40 to 50 words, so even a rule-based system can solve it perfectly. CNN/Daily Mail and CBTest are two much larger datasets in which the task is to fill a slot in a statement based on the context; however, in these cloze-style questions the target answers are almost always nouns, which limits the inference ability required to answer the questions. Stanford's ProcessBank is another machine comprehension dataset that also comes with semantic-role-labeled documents; however, it is confined to biological processes.
[1] Weston J, Bordes A, Chopra S, et al. Towards AI-complete question answering: A set of prerequisite toy tasks. arXiv preprint, 2015.
[2] Hermann K M, Kocisky T, Grefenstette E, et al. Teaching machines to read and comprehend. NIPS 2015.
[3] Hill F, Bordes A, Chopra S, et al. The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations. ICLR 2016.
[4] Berant J, Srikumar V, Chen P C, et al. Modeling Biological Processes for Reading Comprehension. EMNLP 2014.

4 From which can we make improvements?
Neural architectures have shown great advantages in natural language inference: attention-based NNs, Memory Networks, the Neural Turing Machine, and so on. But these methods are data-hungry, requiring a lot of annotated data, whereas in MC the data are limited:
Dataset   Training documents   Training questions
MC160     120                  480
MC500     400                  1200
Traditional machine comprehension models sometimes resort to feature-engineering systems, which exhaustively exploit off-the-shelf NLP tools such as POS tags, dependency parses, or coreference resolution to generate features. In recent years, however, with the development of deep learning, many neural networks have been proposed and have achieved great success in NLP tasks such as question answering. Instead of gathering more and more labeled MC data, can we employ existing resources to help MC?

5 Employing External Rich Knowledge for Machine Comprehension
Machine Comprehension = Answer Selection + Answer Generation
In this work, we treat machine comprehension as a traditional question answering problem divided into two sub-tasks: answer selection and answer generation.

6 Answer Selection (AS)
Document: Tom had to fix some things around the house. He had to fix the door. He had to fix the window. But before he did anything he had to fix the toilet. Tom called over his best friend Jim to help him. Jim brought with him his friends Molly and Holly. [...] They all pushed on the window really hard until finally it opened. Once the window was fixed the four of them made a delicious dinner and talked about all of the good work that they had done. Tom was glad that he had such good friends to help him with his work.
Answer Selection. Definition: given a question, find the best answer sentence from a candidate sentence pool.
Q: What did Tom need to fix first? A) Door B) House C) Window *D) Toilet
In the first stage, given a question, we select the best sentence in the document that can answer the question. This is a typical answer selection problem in QA, and there exist many external answer selection resources.
DATASET: WikiQA, TrecQA, InsuranceQA …

7 Answer Generation: Recognizing Textual Entailment (RTE)
Document: Tom had to fix some things around the house. He had to fix the door. He had to fix the window. But before he did anything he had to fix the toilet. Tom called over his best friend Jim to help him. Jim brought with him his friends Molly and Holly. [...] They all pushed on the window really hard until finally it opened. Once the window was fixed the four of them made a delicious dinner and talked about all of the good work that they had done. Tom was glad that he had such good friends to help him with his work.
Supporting sentence: But before he did anything he had to fix the toilet.
Recognizing Textual Entailment (RTE). Definition: given a pair of sentences, judge whether an ENTAILMENT, NEUTRAL, or CONTRADICTION relationship holds between them.
Q: What did Tom need to fix first? A) Door B) House C) Window *D) Toilet
Hypothesis: Tom needed to fix the toilet first.
Then, given a supporting sentence from the answer selection stage, we transform the candidate answer and the question into a statement and judge whether the supporting sentence entails this hypothesis statement. This is a typical recognizing textual entailment problem, and there are many external RTE resources such as SICK and SNLI.
DATASET: SICK, SNLI …

8 Answer Selection
This is our answer selection architecture, a typical attention-based recurrent neural network model. We represent the question with an RNN; then, when representing the target sentence, we use attention information from the question to weight each time step's hidden representation, and average these weighted hidden states to obtain a question-attended sentence representation.
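The attention step described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the question vector and per-time-step sentence states are assumed to come from RNN encoders, and a simple dot product is used as the (hypothetical) scoring function.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def question_attended_representation(question_hidden, sentence_hidden):
    """Question-attended sentence representation.

    question_hidden: (d,) question encoding from an RNN (assumed).
    sentence_hidden: (T, d) per-time-step hidden states of the sentence RNN.
    Each time step is scored against the question, the scores are
    softmax-normalized into attention weights, and the hidden states are
    averaged under those weights.
    """
    scores = sentence_hidden @ question_hidden   # (T,) attention scores
    weights = softmax(scores)                    # attention distribution
    return weights @ sentence_hidden             # (d,) weighted average

# Toy example with random states
rng = np.random.default_rng(0)
q = rng.normal(size=4)
s = rng.normal(size=(6, 4))
rep = question_attended_representation(q, s)
```

Because the weights form a probability distribution over time steps, the result is a convex combination of the sentence's hidden states.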

9 Answer Selection
[Figure: Documents → Answer Selection p(S|q, D) → RTE p(a|q, S) → a*, with an external AS model trained on WikiQA providing additional supervision to the answer selection stage.]
However, in this architecture the only supervision for our answer selection model is the limited final gold-standard A/B/C/D labels, and error propagation from the next stage may leave our answer selection model poorly fitted; we report this in the experiments. In this work, we train a complicated bi-directional LSTM attention-based model on the external WikiQA answer selection dataset, and use this model to give additional supervision to our inner answer selection model.

10 Answer Selection
Add external AS knowledge as a supplementary supervision to our AS model. We minimize the cross-entropy between the output of our inner answer selection model and the output of the external answer selection model, and use this criterion as an additional training objective. A hyperparameter η adjusts how much we learn from the external answer selection knowledge: when η is set to zero, our answer selection model is supervised only by the gold-standard labels; as η goes to infinity, our AS model is fitted entirely to the external answer selection model.
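The combined objective can be sketched numerically as below. This is an illustrative simplification under stated assumptions: the function names and the exact form (gold-label cross-entropy plus an η-weighted cross-entropy against the external model's distribution) are my reading of the slide, not code from the paper.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i)."""
    return -np.sum(p * np.log(q + eps))

def combined_loss(inner_probs, gold_onehot, external_probs, eta):
    """Gold-label loss plus eta-weighted supervision from the external AS model.

    eta = 0      -> supervised only by the gold A/B/C/D labels.
    eta -> inf   -> the inner model fits only the external model's outputs.
    """
    gold_loss = cross_entropy(gold_onehot, inner_probs)
    external_loss = cross_entropy(external_probs, inner_probs)
    return gold_loss + eta * external_loss

# Toy four-way answer distribution
inner = np.array([0.7, 0.1, 0.1, 0.1])      # inner AS model output
gold = np.array([1.0, 0.0, 0.0, 0.0])       # gold A/B/C/D label
external = np.array([0.6, 0.2, 0.1, 0.1])   # external (WikiQA-trained) output
loss = combined_loss(inner, gold, external, eta=0.5)
```

Setting `eta=0` recovers the purely self-supervised objective, matching the slide's description of the two limiting cases.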

11 Question Transformation
We first transform the question and an answer into a statement, using 11 rules based on the dependency tree. An example of this transformation is shown on this slide.
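To make the idea concrete, here is a toy, hypothetical stand-in for one such rule. The real system operates on dependency trees with 11 rules; this sketch handles only a single surface pattern ("What did X need to VERB ...?") with a regular expression, purely for illustration.

```python
import re

def question_to_statement(question, answer):
    """Toy single-rule question-to-statement transformation (hypothetical).

    The actual system applies 11 rules over the dependency tree; this
    regex handles only questions of the form 'What did X need to VERB ...?'.
    Returns None when the pattern does not apply.
    """
    m = re.match(r"What did (\w+) need to (\w+)( .*)?\?", question)
    if m is None:
        return None
    subject, verb, rest = m.group(1), m.group(2), m.group(3) or ""
    return f"{subject} needed to {verb} the {answer.lower()}{rest}."

statement = question_to_statement("What did Tom need to fix first?", "Toilet")
# -> "Tom needed to fix the toilet first."
```

The resulting statement is the hypothesis fed to the RTE stage, with the supporting sentence as the premise.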

12 RTE
Similar to answer selection, we train an attention-based RNN model on the external SNLI dataset; we use a 3-d tensor on top of the sentence representations to obtain the final RTE label.
DATASET: Stanford SNLI
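One common way to realize "a 3-d tensor on top of the sentence representations" is a bilinear tensor layer with one slice per label; the sketch below assumes that reading (the exact parameterization in the paper may differ), with each slice scoring the premise-hypothesis pair for one of the three RTE labels.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def rte_tensor_score(premise_rep, hypothesis_rep, tensor):
    """Score the three RTE labels with a bilinear tensor layer.

    premise_rep, hypothesis_rep: (d,) sentence representations, assumed to
    come from the attention-based RNN encoders.
    tensor: (3, d, d), one bilinear slice per label
    (ENTAILMENT, NEUTRAL, CONTRADICTION).
    """
    # logits[k] = premise_rep^T . tensor[k] . hypothesis_rep
    logits = np.einsum("i,kij,j->k", premise_rep, tensor, hypothesis_rep)
    return softmax(logits)

# Toy example with random representations and parameters
rng = np.random.default_rng(0)
d = 5
p, h = rng.normal(size=d), rng.normal(size=d)
T = rng.normal(size=(3, d, d))
probs = rte_tensor_score(p, h, T)
```

The output is a distribution over the three RTE labels, from which the argmax gives the predicted relationship.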

13 RTE
Combine a simple, robust lexical matching method with the external RTE model p(s_q | s; θ), where s_q^- denotes the transformed statement with the answer replaced by the common word 'ANSWER'. Similarity: ROUGE-1, 2.
We found that in answer generation, the answer is sometimes obvious given the supporting sentence after the answer selection process. Instead of using a pre-defined hyperparameter β, we use the ROUGE-1,2 criterion to determine how much we rely on lexical matching. The lexical match is based on:
Constituency match: in the constituency tree, a subtree is denoted as a triplet: a parent node and its two child nodes. We add the number of triplets for which I) the POS tags of the three nodes match, and II) the head words of the parent nodes match.
Dependency match: in the dependency tree, a dependency is denoted as (u, v, arc(u, v)), where arc(u, v) denotes the dependency relation. We add two similarity terms: I) u1 = u2, v1 = v2, and arc(u1, v1) = arc(u2, v2); II) whether the roots of the two dependency trees match.
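The ROUGE-N criterion used above is standard n-gram recall; a minimal implementation, with an assumed whitespace tokenizer and a hypothetical toy statement/support pair, looks like this:

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n):
    """ROUGE-N recall: overlapping n-grams / n-grams in the reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    if not ref:
        return 0.0
    overlap = sum((cand & ref).values())  # clipped n-gram overlap
    return overlap / sum(ref.values())

# Toy pair: transformed statement vs. supporting sentence
statement = "tom needed to fix the ANSWER first"
support = "but before he did anything he had to fix the toilet"
score = rouge_n(statement, support, 1)
```

A high ROUGE-1/2 overlap suggests the statement is nearly a restatement of the supporting sentence, so the lexical matching score can be trusted more.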

14 Model Architecture
This figure shows the architecture of our model; the green and brown circles mark where we exploit external knowledge.

15 Result
This figure shows the influence of η on our MC model.

16 External Model Result
Yin W, Schütze H, Xiang B, et al. ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs. TACL 2016.
Rocktäschel T, Grefenstette E, Hermann K M, et al. Reasoning about Entailment with Neural Attention. ICLR 2016.

17 Result

18 Thank you