Linguistic Essentials

Author: Madeleine Walton

1 Linguistic Essentials NLP week 1 lecture: Linguistic Essentials (Ch 3)

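2 Statistical Language Processing Statistics? Suppose you are looking at symbols written in a language you do not know at all. Can language problems then be solved statistically? Prediction? Compression? Google currently achieves great results with statistical methods. Chinese-English machine translation: departs slightly from pure statistics. Statistical approaches that reflect linguistic phenomena: long-distance dependency, context-free???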

3 Competence and Performance Innate vs. learning; categorical vs. statistical; CFG (context-free grammar); performance

4 The Description of Language Grammar: a set of rules which describe what is allowable in a language. Classic grammars (Quirk et al.): meant for humans who know the language; definitions and rules are mainly supported by examples; no (or almost no) formal description tools; cannot be programmed. Explicit grammars (CFG, LFG, GPSG, HPSG, Dependency Grammars, Link Grammars, ...): formal description; can be programmed & tested on data (texts).

5 Levels of (Formal) Description 6 basic levels (more or less explicitly present in most theories), listed from top to bottom: ...and beyond (pragmatics/logic/...); meaning (semantics); (surface) syntax; morphology; phonology; phonetics/orthography. Each level has an input and an output representation; the output from one level is the input to the next (upper) level; sometimes levels might be skipped (merged) or split.

6 Phonetics/Orthography Input: acoustic signal (phonetics) / text (orthography). Output: phonetic alphabet (phonetics) / text (orthography). Deals with: Phonetics: consonant & vowel (& others) formation in the vocal tract; classification of consonants, vowels, ... in relation to frequencies, shape & position of the tongue and various muscles in the vocal tract; intonation. Orthography: normalization, punctuation, etc.

7 Phonology Input: sequence of phones/sounds (in a phonetic alphabet); or “normalized” text (sequence of (surface) letters in one language’s alphabet) [NB nota bene (note well): phones vs. phonemes] Output: sequence of phonemes (~ (lexical) letters; in an abstract alphabet) Deals with: relation between sounds and phonemes (units which might have some function on the upper level) e.g.: [u] ~ oo (as in book), [æ] ~ a (cat); i ~ y (flies)

8 Morphology Input: sequence of phonemes (~ (lexical) letters). Output: sequence of pairs (lemma, (morphological) tag). Deals with: composition of phonemes into word forms and their underlying lemmas (lexical units) + morphological categories (inflection, derivation, compounding), e.g. quotations ~ quote/V + -ation(der.V->N) + NNS.

9 (Surface) Syntax Input: sequence of pairs (lemma, (morphological) tag). Output: sentence structure (tree) with annotated nodes (all lemmas, (morphosyntactic) tags, functions), of various forms. Deals with: the relation between lemmas & morphological categories and the sentence structure; uses syntactic categories such as Subject, Verb, Object, ... e.g.: I/PP1 see/VB a/DT dog/NN ~ ((I/sg)SB ((see/pres)V (a/ind dog/sg)OBJ)VP)S

10 Meaning (semantics) Input: sentence structure (tree) with annotated nodes (lemmas, (morphosyntactic) tags, surface functions). Output: sentence structure (tree) with annotated nodes (autosemantic lemmas, i.e. lemmas that have meaning in isolation; (morphosyntactic) tags; deep functions). Deals with: the relation between categories such as “Subject”, “Object” and (deep) categories such as “Agent”, “Effect”; adds other categories, e.g. ((I)SB ((was seen)V (by Tom)OBJ)VP)S ~ (I/Sg/Pat/t (see/Perf/Pred/t) Tom/Sg/Ag/f)

11 ...and Beyond Input: sentence structure (tree): annotated nodes (autosemantic lemmas, (morphosyntactic) tags, deep functions). Output: logical form, which can be evaluated (true/false). Deals with: assignment of objects from the real world to the nodes of the sentence structure, e.g.: (I/Sg/Pat/t (see/Perf/Pred/t) Tom/Sg/Ag/f) ~ see(Mark-Twain[SSN:...],Tom-Sawyer[SSN:...])[Time:bef 99/9/27/14:15][Place:39°19’40”N 76°37’10”W]

12 Phonology (Surface <-> Lexical) Correspondence “Symbol-based” (no complex structures). En. (stem-final change): lexical: b a b y + s (+ denotes start of ending); surface: b a b i e s (phonetic-related: bébì0s). Arabic (interfixing, inside-stem doubling) (lit. ‘read’): lexical: kTb+uu+CVCCVC (CVCC...: vowel/consonant pattern); surface: kuttub

13 Phonology Examples German (umlaut) (Satz ~ sentence): lexical: s A t z + e (A denotes “umlautable” a); surface: s ä t z e (phonetic: zæcƏ, vs. zac). Turkish (vowel harmony): lexical: e v + l A r, b a š + l A r; surface: e v l e r (houses), b a š l a r (heads). Czech (e-insertion & palatalization): lexical: m a t E K + 0, m a t E K + ě; surface: m a t e k (mothers/gen.), m a t c e (mother/dat.)
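The Turkish case can be sketched as a lexical-to-surface rule. This is a minimal illustration only: the function name `realize_plural` and the vowel sets are assumptions introduced here, and the rule covers just the plural suffix -lAr from the slide.

```python
# Vowel classes used by the toy rule (an assumption of this sketch).
FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")


def realize_plural(stem: str) -> str:
    """Attach the lexical suffix -lAr and resolve A by vowel harmony."""
    # The last vowel of the stem decides the surface form of A.
    for ch in reversed(stem):
        if ch in FRONT_VOWELS:
            return stem + "ler"
        if ch in BACK_VOWELS:
            return stem + "lar"
    raise ValueError(f"no vowel found in stem {stem!r}")


print(realize_plural("ev"))   # evler
print(realize_plural("baš"))  # bašlar
```

The lexical symbol A never appears on the surface; it is always realized as e or a, which is what the lexical/surface distinction on the slide expresses.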

14 Parts of Speech and Morphology Parts of speech correspond to syntactic or grammatical categories such as noun, verb, adjective, adverb, pronoun, determiner, conjunction, and preposition. Word categories are systematically related by morphological processes such as the formation of the plural form from the singular form. The major types of morphological processes are inflection, derivation and compounding.

15 Parts of Speech Correspond to syntactic or grammatical categories such as noun, verb, adjective, preposition, ... Word categories are systematically related by morphological processes such as the formation of the plural form from the singular form, or the past tense from the present tense.

16 The Parts of Speech Noun – refers to entities like people, places, things or ideas. Pronoun – a word that takes the place of a noun. Proper noun – a name. Determiner – describes the particular reference of a noun. Adjective – describes the properties of nouns or pronouns. Verb – the action in a sentence. Adverb – describes a verb, an adjective or another adverb. And many more.

17 POS Labeling Children (NOUN) eat (VERB) sweet (ADJECTIVE) candy (NOUN). The (ARTICLE) children (NOUN) ate (VERB) the (ARTICLE) cake (NOUN). The (ARTICLE) news (NOUN) has (AUXILIARY) been (MAIN VERB) quite (ADVERB) sad (ADJECTIVE) in (PREPOSITION) fact (NOUN) . (PERIOD)

18 Morphology: Morphemes & Order Handles what is an isolated form in written text. Grouping of phonemes into morphemes: the sequence deliverables → deliver, able and s (3 units); these could as well be some “ID” numbers, e.g. deliver ~ 23987, s ~ 12, able ~ 3456. Morpheme combination: certain combinations/sequencings are possible, others not: deliver+able+s, but not able+derive+s; noun+s, but not noun+ing; typically fixed (in any given language).

19 Morphology: From Morphemes to Lemmas & Categories Lemma: lexical unit, “pointer” to the lexicon; might as well be a number, but typically is represented as the “base form”, or “dictionary headword”; possibly indexed when ambiguous/polysemous: state1 (verb), state2 (state-of-the-art), state3 (government); built from one or more morphemes (“root”, “stem”, “root+derivation”, ...) (derivation vs. inflection). Categories: non-lexical; small number of possible values (< 100, often < 5-10)

20 Morphology Level: The Mapping Formally: $A^{+} \to 2^{(L, C_1, C_2, \ldots, C_n)}$, where A is the alphabet of phonemes ($A^{+}$ denotes any non-empty sequence of phonemes); L is the set of possible lemmas, uniquely identified; $C_i$ are morphological categories, such as: grammatical number, gender, case; person, tense, negation, degree of comparison, voice, aspect, ...; tone, politeness, ...; part of speech (not quite a morphological category, but...). $2^{(L, C_1, C_2, \ldots, C_n)}$ denotes the power set of $(L, C_1, C_2, \ldots, C_n)$. A, L and $C_i$ are obviously language-dependent.
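The mapping sends one surface form to a set of analyses, which is why the target is a power set. A minimal sketch, assuming a hand-written toy lexicon (`TOY_LEXICON`, `analyze` and the tag values are hypothetical names for illustration, not part of the lecture):

```python
from typing import FrozenSet, Tuple

# One analysis = (lemma, part of speech, one further category value).
Analysis = Tuple[str, str, str]

# Hypothetical toy lexicon: surface form -> all of its analyses.
TOY_LEXICON = {
    "flies": frozenset({("fly", "N", "plural"), ("fly", "V", "3sg-present")}),
    "quotations": frozenset({("quotation", "N", "plural")}),
    "saw": frozenset({("saw", "N", "singular"), ("see", "V", "past")}),
}


def analyze(form: str) -> FrozenSet[Analysis]:
    """Return the (possibly empty) set of analyses for a surface form."""
    return TOY_LEXICON.get(form.lower(), frozenset())


for form in ("flies", "quotations", "blorp"):
    print(form, sorted(analyze(form)))
```

An ambiguous form yields a set with several elements, an unknown form yields the empty set; both are members of the power set.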

21 The Dictionary (or Lexicon) Repository of information about words. Morphological: description of morphological “behavior”: inflection patterns/classes. Syntactic: part of speech; relations to other words: subcategorization (or “surface valency frames”). Semantic: semantic features; valency frames. ...and any other! (e.g., translation)

22 The Categories: Part of Speech: Open and Closed Categories Part of Speech (POS): pretty much stable set across languages; not so much morphological (can be looked up in a dictionary), but morphological “behavior” is typically consistent within a POS category. Open categories (“open” to additions): verb, noun, pronoun, adjective, numeral, adverb; subject to inflection (in general); subject to cross-category derivations; newly coined words always belong to open POS categories; potentially unlimited number of words. Closed categories: preposition, conjunction, article, interjection, clitic, particle; not a base for derivation (possibly only by compounding); finite and (very) small number of words.

23 The Categories: Part of Speech, Open Categories: Verbs Verbs: infl. categories: person, number, tense, voice, aspect, [gender, neg.], ... Syntactic/semantic classification: ordinary: (to) speak, (to) write; auxiliaries: be, have, will, would, do, go (going); modals: can, could, may, should, must, want; phrasal: begin, end, start. Morphological classification: conjugation type: regular/irregular (Ge.: weak/strong/irregular); conjugation class (Cz.: 5 classes + ~100 combinations)

24 The Categories: Part of Speech, Open Categories: Nouns Nouns: infl. categories: number, [gender, case, negation, ...]. Semantic classification: human/animal/(non-living) things: driver/bird/stone; concrete/abstract: computer/thought; common/proper: table/Hopkins. Syntactic classification: countable/uncountable: book, water. Morphological classification: pluralia/singularia tantum: data (is), police (are); declension type (“pattern” or “class”) (Cz.: 14 basic patterns, plus deviations: ~300 patterns, + irregular inflection); “adverbial” nouns: afternoon, home, east (no inflection)

25 The Categories: Part of Speech, Open Categories: Pronouns Pronouns: infl. categories: number, gender, case, negation; person; much like nouns (syntactic usage also similar); (pro)noun ~ “stands for” a noun. Classification (mostly syntactic/semantic): personal: I, you, he, she, it, we, you, they; demonstrative: this, that; possessive: my, your, her, his, its, our, their; mine, yours, ours, ...; reflexive: myself, yourself, herself, ..., oneself; interrogative: what, which, who, whom, whose, that; indefinite (“nominal”): somebody, something, one. Morphological classification: mostly idiosyncratic pattern

26 The Categories: Part of Speech, Open Categories: Adjectives Adjectives: infl. categories: degree of comp., [number, gender, case, negation]. Classification: ordinary: new, interesting, [test (equipment)]; possessive: John’s, driver’s; proper: Appalachian (Mountains); often derived from verbs/nouns: teaching (assistant), trendy, stylish. Morphological classification: mostly regular declension (Cz.: 4 basic patterns, ~10 total); degrees of comparison (En.: big, bigger, biggest); but: large number of forms (agreement, cf. section on syntax)

27 The Categories: Part of Speech, Open Categories: Adverbs Adverbs: “infl.” categories: degree of comp., [negation]. Open cat.: regular derivation from adjectives is common: new → newly, interesting → interestingly. Non-derived adverbs: ordinary: so, well, just, too, then, often, there; wh-adverbs (interrogative): why, when, where, how; degree adverbs/qualifiers: very, too. Morphological classification (not much, really...): degree of comparison: well, better, best; soon, sooner (other lang.: all 3 degrees regular)

28 The Categories: Part of Speech, Open Categories: Numerals Numerals: infl. categories: number, gender, case, negation. Open cat.: compounding (Ge.: einundzwanzig, 21). Classification: cardinals: one, five, hundred (NB: million etc. often considered nouns); ordinals/fractionals: first, second, thirtieth; quantifiers: all, many, some, none; multiplicative: times, twice (Cz.: dvaadvacetkrát, 22-times); multilateral: single, triple, twofold. Morphological classification: as nouns/adjectives; many irreg.

29 The Categories: Part of Speech, Closed Categories Closed categories: preposition, conjunction, article, interjection, clitic, particle. Morphological behavior: indeclinable (no declension, no conjugation). Preposition: of, without, by, to. Conjunction: coordinating: and, but, or, however; subordinating: that, if, because, before, after, although, as. Article: a, the. Interjection: wow, eh, hello. Clitic: ‘s; may be attached to whole phrases (at the end). Particle: yes, no, not; to (+verb); many (otherwise) prepositions if part of phrasal verbs, e.g. (look) up

30 The Categories: Number and Gender Grammatical Number: Singular, Plural; nouns, pronouns, verbs, adjectives, numerals; computer / computers; (he) goes / (they) go. In some languages (Czech): Dual (nouns, pronouns, adjectives): (Pl.) nohami / (Dl.) nohama (Cz.; (by) legs (of sth) / (by) legs (of sb)). Grammatical Gender: Masculine, Feminine, Neuter; he/she/it; читал, читала, читало (Ru.; (he/she/it) was-reading); nouns: (mostly) do not change gender for a single lexical unit. Also: animate/inanimate (gram., some genders), etc.: Mädchen (Ge.; girl, neuter); děti (Cz.; children, masc. inanim.)

31 The Categories: Case
Case: English: only personal pronouns/possessives, 2 forms; other languages: 4 (German), 6 (Russian), 7 (Czech, Slovak, ...); applies to nouns, pronouns, adjectives, numerals
Most common cases (forms in singular/plural):
  case           English               Czech (class)
  nominative     I/we (work)           třída/třídy
  genitive       (picture of) me/us    třídy/tříd
  dative         (give to) me/us       třídě/třídám
  accusative     (see) me/us           třídu/třídy
  vocative       -/-                   třído/třídy
  locative       (about) me/us         třídě/třídách
  instrumental   (by) me/us            třídou/třídami

32 The Categories: Person, Tense
Person: verbs, personal pronouns; 1st, 2nd, 3rd: (I) go, (you) go, (he) goes; (we) go, (you) go, (they) go; jdu, jdeš, jde, jdeme, jdete, jdou (Cz.)
Tense:
  tense                            English         Czech (go)   Polish (go)
  past                             (you) went                   szliście
  present                          (you pl.) go    jdete        idziecie
  future (! if not “analytical”)                   půjdete      -
  concurrent (gerund)              going           jda          idąc
  preceding                                                     szedłszy

34 Note on Tense Grammars distinguish more (syntactic/semantic) tenses (tense vs. time), but morphology handles isolated words → some tenses can be defined & handled only at an upper level (surface syntax). Examples of (traditional) tense (synthetical and analytical): infinitive: (to) write (tenseless, personless, ..., except negation (Cz.)); simple present/past: (I) write/(she) writes; (I, she) wrote; progressive present/past: (I) am writing; (I) was writing; perfect present/past: (I) have written; (I) had written. All in passive voice (cf. later), too: (the book) is being/has been/had been written etc. All in conditional mood, too (mood: in Eng. not a morph. category!): (the book) would have been written

35 The Categories: Voice & Aspect Voice: active vs. passive: (I) drive / (I am being) driven; (Ich) setzte (mich) / (Ich bin) gesetzt (Ge.: to sit down). Aspect: imperfective vs. perfective: покупал / купил (Ru.: I used to buy, I was buying / I (have) bought); imperfective continuous vs. iterative (repeating): spal / spával (Cz.: I was sleeping / I used to sleep (every ...))

36 The Categories: Negation, Degree of Comparison Negation: even in English: impossible (~ not possible); Cz.: every verb, adjective, adverb, some nouns; prefix ne-. Degree of Comparison (non-analytical): adjectives, adverbs: positive (big), comparative (bigger), superlative (biggest); Pol.: (new) nowy, nowszy, najnowszy. Combination (by prefixing): order? Both possible (neg.: Cz./Pol.: ne-/nie-, sup.: nej-/naj-): Cz.: nejnemožnější (the most impossible); Pol.: nienajwierniejszy (the most unfaithful)

37 Typology of Languages By morphological features. Analytical: using (function) words to express categories: English, also French, Italian, ..., Japanese, Chinese; I would have been going ~ (Pol.) szłabym. Inflective: using prefix/suffix/infix, combines several categories: Slavic: Czech, Russian, Polish, ... (not Bulgarian); also French, German; Arabic; (Cz. new (acc.)) novou (Adj, Fem., Sg., Acc., Non-neg., Pos.). Agglutinative: one category per (non-lexical) morpheme: Finnish, Turkish, Hungarian; (Fin. plural): -i-

38 Categories & Tags Tagset: list of all possible combinations of category values for a given language: $T \subset C_1 \times C_2 \times \ldots \times C_n$; typically a string of letters & digits. Compact system: short idiosyncratic abbreviations: NNS (gen. noun, plural). Positional system: each position i corresponds to $C_i$: AAMP3----2A---- (gen. Adj., Masc., Pl., 3rd case (dative), comparative (2nd degree of comparison), Affirmative (no negation)); tense, person, variant, etc.: N/A (marked by “empty position”, or ‘-’). Famous tagsets: Brown, Penn, Multext[-East], ...
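A positional tag can be decoded by pairing each character with the category at its position. The sketch below is illustrative: the position names in `POSITIONS` are an assumption modelled on a 15-position Czech-style layout, chosen so that the slide's example decodes to the values it lists.

```python
# Assumed names for the 15 positions (hypothetical labels for this sketch).
POSITIONS = [
    "pos", "subpos", "gender", "number", "case",
    "possgender", "possnumber", "person", "tense",
    "degree", "negation", "voice", "reserve1", "reserve2", "variant",
]


def decode(tag: str) -> dict:
    """Map each non-empty position of a positional tag to its category name."""
    if len(tag) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} positions, got {len(tag)}")
    # '-' marks a category that does not apply to this word form.
    return {name: value for name, value in zip(POSITIONS, tag) if value != "-"}


print(decode("AAMP3----2A----"))
# {'pos': 'A', 'subpos': 'A', 'gender': 'M', 'number': 'P', 'case': '3', 'degree': '2', 'negation': 'A'}
```

Compared with a compact tag such as NNS, the positional form is longer but each category can be read off without a lookup table of abbreviations.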

39 Words’ Syntactic Functions Typically, nouns refer to entities in the world like people, animals and things. Determiners describe the particular reference of a noun and adjectives describe the properties of nouns. Verbs are used to describe actions, activities and states. Adverbs modify a verb in the same way as adjectives modify nouns. Prepositions are typically small words that express spatial or time relationships. Prepositions can also be used as particles to create phrasal verbs. Conjunctions and complementizers link two words, phrases or clauses.

40 Syntax or Phrase Structure: A simple context-free grammar
The Grammar:
  S --> NP VP
  NP --> AT NNS | AT NN | NP PP
  VP --> VP PP | VBD | VBD NP
  PP --> IN NP
The Lexicon:
  AT --> the
  NNS --> children | students | mountains
  VBD --> slept | ate | saw
  IN --> in | of
  NN --> cake
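To see what this grammar licenses, a chart-based (CKY-style) counter of parse trees can be sketched as follows. This is one possible illustration, not the method used in the lecture; the rule tables restate the slide's grammar with the prepositional-phrase rule written as PP --> IN NP, and `count_parses` is a name introduced here.

```python
from collections import defaultdict

# Rules with two symbols on the right-hand side: (parent, left, right).
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "AT", "NNS"), ("NP", "AT", "NN"), ("NP", "NP", "PP"),
    ("VP", "VP", "PP"), ("VP", "VBD", "NP"),
    ("PP", "IN", "NP"),
]
UNARY = [("VP", "VBD")]  # the only non-lexical rule with one symbol
LEXICON = {
    "the": ["AT"],
    "children": ["NNS"], "students": ["NNS"], "mountains": ["NNS"],
    "slept": ["VBD"], "ate": ["VBD"], "saw": ["VBD"],
    "in": ["IN"], "of": ["IN"],
    "cake": ["NN"],
}


def count_parses(sentence: str, start: str = "S") -> int:
    """Count the distinct parse trees of a sentence under the toy grammar."""
    words = sentence.lower().split()
    n = len(words)
    # chart[(i, j)][X] = number of trees rooted in X that cover words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[(i, i + 1)][tag] += 1
    for width in range(1, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = chart[(i, j)]
            for k in range(i + 1, j):
                left, right = chart[(i, k)], chart[(k, j)]
                for parent, b, c in BINARY:
                    if b in left and c in right:
                        cell[parent] += left[b] * right[c]
            # One pass is enough here: the grammar has no chains of unary rules.
            for parent, child in UNARY:
                if child in cell:
                    cell[parent] += cell[child]
    return chart[(0, n)][start]


print(count_parses("the children slept"))                         # 1
print(count_parses("the children ate the cake in the mountains")) # 2
```

The second sentence has two trees because the PP "in the mountains" can attach either to the NP "the cake" or to the VP "ate the cake".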

41 Syntax or Phrase Structure: A Parse Tree

42 A Simple Context-Free Grammar
The Grammar rules:
  S -> NP V
  NP -> N
The Lexicon:
  N -> John, Gaurav, Ram, ...
  V -> walks, talks, eats, went, ...

43 Tag Sets A tag indicates one of the conventional parts of speech. Different tag sets have been used, e.g., the Brown Tag Set and the Penn Treebank Tag Set. Tag examples: NP proper noun, NN singular noun, AT article, DET determiner.

44 Stochastic Grammars Grammars obtained by adding probabilities in a fairly transparent way to “algebraic” (i.e., non-probabilistic) grammars. Stochastic grammars supplement underlying algebraic grammars.
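One common way to add probabilities is to attach one to each rewrite rule and score a derivation by the product of the rules it uses. A minimal sketch under loud assumptions: every number in `RULE_PROB` is invented for illustration, and lexical rules are treated as having probability 1.

```python
from math import prod

# Hypothetical rule probabilities; for each left-hand side they sum to 1.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("AT", "NNS")): 0.5,
    ("NP", ("AT", "NN")): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("VP", ("VBD",)): 0.3,
    ("VP", ("VBD", "NP")): 0.5,
    ("VP", ("VP", "PP")): 0.2,
    ("PP", ("IN", "NP")): 1.0,
}


def derivation_probability(rules) -> float:
    """Multiply the probabilities of all rules used in one derivation."""
    return prod(RULE_PROB[rule] for rule in rules)


# Derivation of "the children slept": S -> NP VP, NP -> AT NNS, VP -> VBD
rules_used = [("S", ("NP", "VP")), ("NP", ("AT", "NNS")), ("VP", ("VBD",))]
print(derivation_probability(rules_used))
```

The underlying algebraic grammar is unchanged; the probabilities only rank the derivations it already allows.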

45 Dependencies Local Dependency: dependence between two words expressed within the same syntactic rule. (n-grams model this well) Non-local dependency: is an instance in which two words can be syntactically dependent even though they occur far apart in a sentence.

46 Ambiguities (1) “Children eat sweet candy” (2) “Too much boiling will candy the molasses” In sentence (1) candy is a noun, while in (2) it is a verb. Word category (POS) ambiguity needs to be resolved.
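How context can resolve such an ambiguity may be sketched with a single toy rule: a word that can be a verb is tagged VERB right after a modal, otherwise NOUN. The names `CANDIDATE_TAGS`, `MODALS` and `resolve` are hypothetical, and the rule is far too crude for real tagging.

```python
# Hypothetical candidate tags; only the ambiguous word from the slide is listed.
CANDIDATE_TAGS = {"candy": {"NOUN", "VERB"}}
MODALS = {"will", "would", "can", "could", "may", "must", "should"}


def resolve(tokens, index):
    """Pick a tag for tokens[index] using only the preceding word."""
    candidates = CANDIDATE_TAGS[tokens[index]]
    previous = tokens[index - 1] if index > 0 else None
    if "VERB" in candidates and previous in MODALS:
        return "VERB"
    return "NOUN"  # fallback of this toy rule


s1 = "children eat sweet candy".split()
s2 = "too much boiling will candy the molasses".split()
print(resolve(s1, s1.index("candy")))  # NOUN
print(resolve(s2, s2.index("candy")))  # VERB
```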

47 Ambiguities (Cont.) Semantic Roles: Determining thematic roles in a sentence. Agent, Patient, Experiencer, Instrument, Goal …. Raju(AGENT) hit us (PATIENT) with a ball (INSTRUMENT). Complicated by the notions of direct and indirect object, active and passive voice.

48 Ambiguities (Cont.) Attachment ambiguities occur with phrases that could have been generated by two different nodes in the parse tree. E.g.: saw the man in the house with a pole. Rare Usage and spurious usage: A hectare is a hundred ares.

49 Garden-Path Sentences Garden-path sentences are sentences that lead you along a path that suddenly turns out not to work. E.g.: The horse raced past the barn fell.

50 Local and Non-Local Dependencies A local dependency is a dependency between two words expressed within the same syntactic rule. A non-local dependency is an instance in which two words can be syntactically dependent even though they occur far apart in a sentence (e.g., subject-verb agreement; long-distance dependencies such as wh-extraction). Non-local phenomena are a challenge for certain statistical NLP approaches (e.g., n-grams) that model local dependencies.
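Why n-grams miss non-local dependencies can be seen from bigram counts over a tiny corpus. The three sentences below are invented for illustration, and `bigrams` is a helper defined here.

```python
from collections import Counter


def bigrams(tokens):
    """Return all pairs of adjacent tokens."""
    return list(zip(tokens, tokens[1:]))


# Hypothetical corpus; in the last sentence the verb agrees with "child".
corpus = [
    "the child sleeps",
    "the children sleep",
    "the child of the neighbors sleeps",
]
counts = Counter(bg for line in corpus for bg in bigrams(line.split()))

print(counts[("child", "sleeps")])      # 1
print(counts[("neighbors", "sleeps")])  # 1
print(counts[("neighbors", "sleep")])   # 0
```

In the third sentence the bigram model records the pair (neighbors, sleeps), although the agreement holds between child and sleeps, four words apart; the subject-verb dependency itself is never observed as a bigram.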

51 The Place of Syntax Between Morphology and Meaning Morphology provides/expects: lemmas (now it’s time to extract syntactic information from a dictionary); tags (part of speech and combination of morphological categories, such as number, case, tense, voice, ...); and of course, we also have word order now to look at/provide. Typically multiple input (non-disambiguated morphology) / output (multiple syntactic structures, non-disambiguated)

52 Words, Phrases, Clauses, Sentences Words: smallest units on the syntax level; function/autosemantic. Phrases consist of words and/or phrases; “constituents”. Clauses have predicative meaning (single predicate). Sentences consist of clauses (one or more).

53 Words Words: lexical units; auxiliary (function) words: have grammatical function; autosemantic words (“lexical” words); idioms: fixed phrases (non-compositional) -> “words”. Words relate to other words; the dictionary is a repository of information for each word about its (idiosyncratic) relations to other words.

54 Phrases Phrases: sequences of words and/or phrases (i.e. of constituents); may be discontinuous, sometimes. Types of phrases: simple/clausal (i.e. clauses, which consist of phrases, behave like phrases... recursively!). According to head type: Noun: a new book; Adjective: brand new; Adverbial: so much; Prepositional: in a class; Verb: catch a ball

55 Noun Phrases Head: noun. water; a book; new ideas; that small village; the greatest rise of interest rates since W.W.II within a single year; an operating system which, despite great efforts on the part of our administrators, fails all too often

56 Adjective Phrases Head: adjective. Simple APs very common, complex APs rare. old; very old; really very old; five times older than the oldest elephant in our zoo; (was) sure, as far as I know, to be there first

57 Adverbial and Numerical Phrases Adverbial phrases, head: adverb. three times as much; quickly; really; (... speaks) more loudly than anybody could imagine; yesterday. Numerical phrases: (... lasted) three hours; twenty-two

58 Prepositional Phrases Head: preposition. In fact, they often play the role of adverbial phrases. in the City; at five o’clock; to a brightest future; without a glitch; to the point where neither of them could get out of it; up to five points; instead of Charles

59 Verb Phrases Head: verb (It) rains... could ever see a large Unidentified Flying Object ..., why (we) have got so much rain Please! On Sunday, (he) was driven to the hospital (It) began to snow (...) prohibits smoking in this area

60 Coordination of Phrases “Head”: conjunction, punctuation (and, or, but). cats and dogs; new or even newer; quickly and precisely; he came to the conclusion that it makes no sense to hide himself anymore and therefore we could hear him today; (trains) from and to Baltimore; eat your lunch now or at the picnic table

61 Ellipsis Word or Phrase missing where one would normally expect one; often happens in dialogues Whom did you see there? Peter. ?? verb ?? Most common in coordination (written text) Pittsburgh leads 4-0 but Detroit only ??verb in 2nd part?? Systematic in many languages: pro-drop (leave out a pers. pronoun in the Subject position) [She] Passed the exam easily.

62 Clauses Predicative function: some activity of some subjects/objects, somewhere in time, under certain circumstances. Main clause: not part of a greater clause. Embedded clause: part of another clause, having some function (like a phrase). Function of a clause: same as for a phrase, plus some (direct speech/discourse etc.)

63 Gaps (Non-Continuous Constituents) A constituent moves from the expected position: happens in questions and relative clauses (the vacated position is shown in brackets). Who(m) do you work for [whom]? (strictly speaking, do you work should be you (do work)). I don’t know why we have got so much rain [why]. On Sundays, I usually work [on Sundays] but I stay home on Tuesdays. The story he never wrote [the story]. And finally the car she was supposed to use [the car] for her trip to New York broke. The last two could also be considered ellipsis (which) plus a gap.

64 Sentences Consist of a single or several main clauses. If several main clauses: coordination, much like coordinated phrases; more coordinating conjunctions: and, or, but, (and) therefore, ... In written text, a sentence starts with a capital letter and ends with a period/question mark/exclamation mark; not all periods end a sentence! Sometimes even a semicolon (;) might be a sentence break (...vague)

65 Syntax: Representation Tree structure (“tree” in the sense of graph theory); one tree per sentence. Two main ideas for the shape of the tree: (1) phrase structure (~ derivation tree, cf. parsing later): using bracketed grouping; brackets annotated by phrase type; heads (often) explicitly marked. (2) dependency structure (lexical relations “local”, functions): basic relation: head (governor) - dependent; links (edges) annotated by syntactic function (Sb, Obj, ...); phrase structure: implicitly present (but 1:n mapping Dep→PS)

66 Phrase Structure Tree Example: ((DaimlerChrysler’s shares)NP (rose (three eighths)NUMP (to 22)PP-NUM )VP )S

67 Dependency Tree Example: rose/Pred(shares/Sb(DaimlerChrysler’s/Atr), eighths/Adv(three/Atr), to/AuxP(22/Adv))
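The dependency example can be held in a small recursive data structure, with each node storing its word form, its syntactic function and its dependents. A minimal sketch; the class `Node` and the helpers `to_string` and `edges` are names introduced here, and the word/Function notation is just one way to write the annotation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    form: str       # the word
    function: str   # syntactic function of the link to its head
    dependents: List["Node"] = field(default_factory=list)


def to_string(node: Node) -> str:
    """Serialize a tree as head/Function(dependent, ...)."""
    label = f"{node.form}/{node.function}"
    if not node.dependents:
        return label
    return label + "(" + ",".join(to_string(d) for d in node.dependents) + ")"


def edges(node: Node) -> Iterator[Tuple[str, str, str]]:
    """Yield (head, dependent, function) for every link in the tree."""
    for dep in node.dependents:
        yield (node.form, dep.form, dep.function)
        yield from edges(dep)


tree = Node("rose", "Pred", [
    Node("shares", "Sb", [Node("DaimlerChrysler's", "Atr")]),
    Node("eighths", "Adv", [Node("three", "Atr")]),
    Node("to", "AuxP", [Node("22", "Adv")]),
])

print(to_string(tree))
for head, dependent, function in edges(tree):
    print(f"{head} -> {dependent} ({function})")
```

Unlike the bracketed phrase structure, no phrase labels (NP, VP) are stored; every edge links two words directly.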

68 Semantic Roles Most commonly, noun phrases are arguments of verbs. These arguments have semantic roles: the agent of an action, the patient and other roles such as the instrument or the goal. In English, these semantic roles correspond to the notions of subject and object. But things are complicated by the notions of direct and indirect object, active and passive voice.

69 Subcategorization Different verbs can relate different numbers of entities: transitive versus intransitive verbs. Tightly related verb arguments are called complements, while less tightly related ones are called adjuncts. Prototypical examples of adjuncts tell us the time, place, or manner of the action or state described by the verb. Verbs are classified according to the type of complements they permit. This is called subcategorization. Subcategorization frames make it possible to capture syntactic as well as semantic regularities.

70 Attachment Ambiguity and Garden-Path Sentences Attachment ambiguities occur with phrases that could have been generated by two different nodes in the parse tree: The child ate the cake with a spoon. Genuinely ambiguous: Fruit flies like a banana. Garden-path sentences are sentences that lead along a path that suddenly turns out not to work: The horse raced past the barn fell.

71 Semantics Semantics is the study of the meaning of words, constructions, and utterances. Semantics can be divided into two parts: lexical semantics and combination semantics (how the meanings of words combine). Lexical semantics: hypernymy, hyponymy, antonymy, meronymy, holonymy, synonymy, homonymy, polysemy, and homophony. Compositionality: the meaning of the whole often differs from the meaning of the parts. Idioms correspond to cases where the compound phrase means something completely different from its parts.

72 Pragmatics Pragmatics is the area of studies that goes beyond the study of the meaning of a sentence and tries to explain what the speaker really is expressing. Understand the scope of quantifiers, speech acts, discourse analysis, anaphoric relations. The resolution of anaphoric relations is crucial to the task of information extraction.