10.26 FLST (Uszkoreit)

Speech Translation pipeline

Ambiguity:
"At some point, you need to realise how bad it is. Then you need to lose that fear."
- Phonetic (homophones)
- Orthographic (homographs)
- Lexical (homonymy)
- Logical (tautology) ## Not really, this is a joke.

*** A pathological example (*see the German sentence example on the slide*) can exhibit lexical, syntactic, and anaphoric ambiguity all at once.
- Note that syntax doesn't know that 'islands on the weekends' is ambiguous.
## I think syntax doesn't know that it's *not* ambiguous, right? Or that 'islands on the weekends' is an unlikely interpretation.
## I think whether or not this sentence / phrase is ambiguous is not really due to its syntax. To determine that it's ambiguous (or unambiguous), one needs more than syntactic knowledge (## maybe that's why we're getting into the Ling. Competence topic...)

==> Competence vs. Performance:
Linguistic competence: it's an implicit, finite state of knowledge(?) (ex. first language acquisition).
Pirahã <-- a language reported to have no recursion (a claim much debated among linguists).

==> Computational Complexity of Human Language
Chomsky Hierarchy (http://en.wikipedia.org/wiki/Chomsky_hierarchy )
Type 0: unrestricted (recursively enumerable) languages: intractable, very difficult to analyse computationally. (Intractability: http://en.wikipedia.org/wiki/Computational_complexity_theory#Intractability )
Type 1: context-sensitive languages
Type 2: context-free languages (in the PS tree, going from right to left; an example can be Turkish, and another one Russian (*as far as I know*), but Russian is completely irregular, so there is no right-to-left / left-to-right, etc., which makes it more difficult to analyse)
Type 3: regular languages ## they are very regular :D
The grammar can be expressed as regular expressions.

Topicalisation - phrase movement to the front. In Chinese you get topicalisation with a particle, and not movement.
Ex: Kim likes bagels <==> Bagels, Kim likes.
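## Side note on Type 3 vs. Type 2 from the hierarchy above. This is only a minimal sketch, not something from the slides: the two toy languages, (ab)* and a^n b^n, and the function names `is_ab_star` and `is_anbn` are chosen here purely for illustration. A regular language can be written down directly as a regular expression, while a^n b^n needs unbounded counting, which a finite-state pattern cannot do.

```python
import re

# Type 3 (regular): the toy language (ab)* is directly a regular expression.
AB_STAR = re.compile(r"(?:ab)*")


def is_ab_star(s: str) -> bool:
    """Return True if s is in the regular language (ab)*."""
    return AB_STAR.fullmatch(s) is not None


def is_anbn(s: str) -> bool:
    """Return True if s is in the context-free language a^n b^n (n >= 0).

    The number of a's must equal the number of b's, so the recognizer
    has to count without bound; a finite-state machine cannot do that.
    """
    n, remainder = divmod(len(s), 2)
    return remainder == 0 and s == "a" * n + "b" * n


print(is_ab_star("ababab"))  # True
print(is_anbn("aaabbb"))     # True
print(is_anbn("aabbb"))      # False: two a's, three b's
```

The second recognizer simply compares against the expected string; a stack or a counter would do the same job, which is the extra power a context-free grammar has over a regular one.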
Pullum - showed that the earlier proofs of non-context-freeness were wrong; natural language is only mildly context-sensitive.

Cross-Serial Dependencies (*long distance dependencies*)
Ex: (in Dutch)
...dat Jan de kinderen zag zwemmen
...that Jan the children saw swim
'...that Jan saw the children swim'
## This is the same in Turkish syntactically. However, it's even closer to the Swiss-German examples (*see the next slide*) because specific verbs require specific noun cases.
# "Swiss German is impossible because they don't like immigration"

Bambara - Mali - to give emphasis, they repeat full stretches of language. (http://en.wikipedia.org/wiki/Bambara_language )
'ww' language = not context-free.

The Chomsky hierarchy has survived in computer science quite well.
The computational power needed to parse languages higher up the Chomsky hierarchy should go up exponentially with sentence length - but we know that this is not the case for humans.

Syntactic phenomena that make parsing harder:
* Right-extraposition: He invited those people to the party on the occasion of his 40th birthday [...] who had already come to his 35th birthday party.

Probabilistic parsing: Probability is very much used along with syntax to parse.

Hard-to-understand sentences:
In mud eels are, in clay are none.
## Apparently grammatical - not to me, at all, because 'are' doesn't have the meaning 'exist' for me (Richard).
## Haha, we know what color you are! :D
## Aye, but the colors will be lost when I transcribe this.
## The sentence is actually from a nursery rhyme, see http://www.mamalisa.com/?t=hes&p=1823 - supposed to sound like Latin when you read it fast.
## I thought maybe it was Shakeasperian or something (augh..never can spell that right).

Yes, we have rules, and we can do parsing - but our expectations, the probabilities we assign, get us the wrong readings.
Humans produce sentences incrementally.

=== Language Technology!
Some cool notes on why LT is the best.
- 20B€ worldwide in speech technology, the same in translation
- 50% is in Europe
- 500,000 translation professionals in Europe
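## Side note, going back to the cross-serial dependencies and the 'ww' language from earlier in these notes. This is a rough sketch only: the function names `is_copy` and `cross_serial_pairs` are made up here, and splitting the Dutch clause into a noun list and a verb list by hand is an assumption (no real parser is involved). It shows the "copy" shape that makes 'ww' non-context-free, and how the i-th noun pairs with the i-th verb in the Dutch example, so the dependencies cross instead of nesting.

```python
from typing import List, Sequence, Tuple


def is_copy(tokens: Sequence[str]) -> bool:
    """Return True if tokens has the form w w (first half repeated exactly)."""
    half, remainder = divmod(len(tokens), 2)
    return remainder == 0 and list(tokens[:half]) == list(tokens[half:])


def cross_serial_pairs(nouns: List[str], verbs: List[str]) -> List[Tuple[str, str]]:
    """Pair the i-th noun with the i-th verb (crossing dependencies).

    A nested (centre-embedded) pattern would pair nouns with the verbs
    in reverse order instead.
    """
    if len(nouns) != len(verbs):
        raise ValueError("each noun needs exactly one verb")
    return list(zip(nouns, verbs))


# Dutch example from the notes: ...dat Jan de kinderen zag zwemmen
print(cross_serial_pairs(["Jan", "de kinderen"], ["zag", "zwemmen"]))
# [('Jan', 'zag'), ('de kinderen', 'zwemmen')]

# Hypothetical token sequences, just to show the 'ww' check.
print(is_copy(["x", "y", "x", "y"]))  # True
print(is_copy(["x", "y", "y", "x"]))  # False: mirrored, not copied
```

The mirrored case (w followed by its reverse) is the one a context-free grammar handles easily with a stack; the exact copy is the one it cannot.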