10.26 FLST (Uszkoreit)

Speech Translation pipeline

Ambiguity: "At some point, you need to realise how bad it is. Then you need to lose that fear."
    - Phonetic (homophones)
    - Orthographic (homographs)
    - Lexical (homonymy)
    - Logical (tautology) ## Not really, this is a joke.
    
    ***A pathological example of ambiguity (*see the German sentence example on the slide*) can exhibit lexical, syntactic, and anaphoric ambiguity all at once.
    - Note that syntax doesn't know that 'islands on the weekends' is ambiguous.
    ## I think syntax doesn't know that it's *not* ambiguous, right? Or that 'islands on the weekends' is an unlikely interpretation.
        ## I think whether or not this sentence / phrase is ambiguous is not really down to its syntax. To determine that it's ambiguous (or unambiguous), one needs more than syntactic knowledge (## maybe that's why we're getting into the Ling. Competence topic...). See the parsing sketch below.
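
To make the structural-ambiguity point concrete, here is a minimal sketch (assuming the NLTK library is installed; the toy grammar and the English example sentence are mine, not from the lecture). One string gets two parse trees, and syntax alone has no way to prefer either:

    import nltk

    # Toy CFG for the classic PP-attachment ambiguity: 'with the telescope'
    # can attach to the verb (instrument) or to the noun (modifier).
    grammar = nltk.CFG.fromstring("""
        S   -> NP VP
        VP  -> V NP | VP PP
        NP  -> 'I' | Det N | NP PP
        PP  -> P NP
        Det -> 'the'
        N   -> 'man' | 'telescope'
        V   -> 'saw'
        P   -> 'with'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("I saw the man with the telescope".split()):
        tree.pretty_print()   # prints both readings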
        
==> Competence vs. Performance:

Linguistic competence: an implicit, finite body of knowledge(?) (e.g. first language acquisition).

Pirahã <-- a language claimed by some linguists to lack recursion.

==> Computational Complexity of Human Language

Chomsky Hierarchy (http://en.wikipedia.org/wiki/Chomsky_hierarchy )
Type 0: unrestricted (recursively enumerable) languages; intractable, very difficult to analyse computationally.
     (Intractable: http://en.wikipedia.org/wiki/Computational_complexity_theory#Intractability )
Type 1: context-sensitive languages
Type 2: context-free languages (in the phrase-structure tree, branching can run from right to left; Turkish is an example, and so is Russian (*as far as I know*), but Russian word order is so free that there is no consistent right-to-left / left-to-right direction, which makes it more difficult to analyse)
Type 3: regular languages ## they are very regular :D
The grammar can be expressed as regular expressions (see the sketch below).
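
A small illustration of the Type 3 / Type 2 boundary in plain Python (the example languages are mine): a regular expression recognises a regular language like (ab)*, but no (theoretical) regular expression can enforce the matching counts in a^n b^n, which already needs context-free power:

    import re

    # Type 3 (regular): strings of the form (ab)^n -- a regex suffices.
    regular = re.compile(r"(ab)*")
    print(regular.fullmatch("ababab") is not None)  # True
    print(regular.fullmatch("abba") is not None)    # False

    # Type 2 (context-free): a^n b^n -- needs counting, beyond regular power.
    def anbn(s):
        half, rem = divmod(len(s), 2)
        return rem == 0 and s == "a" * half + "b" * half

    print(anbn("aaabbb"))  # True
    print(anbn("aabbb"))   # False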

Topicalisation - movement of a phrase to the front. In Chinese you get topicalisation with a particle, not with movement.
     Ex: Kim likes bagels <==> Bagels, Kim likes.

    Pullum - showed that the early proofs that natural language is not context-free were flawed; the view that emerged (from the Swiss German data below) is that natural language is only mildly context-sensitive.

Cross-Serial Dependencies (*long distance dependencies*)
 Ex: (in Dutch) ...dat Jan de kinderen zag zwemmen
                        ... that Jan the children saw swim
                        '...that Jan saw the children swim' 
            ## This is the same in Turkish syntactically. However, it's even closer to the Swiss German examples (*see the next slide*) because specific verbs require specific noun cases; see the recogniser sketch after this block.
    # "Swiss German is impossible because they don't like immigration"

Bambara - Mali - to give emphasis, they repeat whole stretches of language. (http://en.wikipedia.org/wiki/Bambara_language ) 'ww' language = not context-free.
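
The 'ww' point, sketched (function name mine): the copy language {ww} - any string immediately repeated - is not context-free, because a pushdown stack can only check a string against its *reversal* (ww^R), not against an exact copy:

    def is_ww(s):
        # True iff s is some string immediately repeated (s = w + w).
        half, rem = divmod(len(s), 2)
        return rem == 0 and s[:half] == s[half:]

    print(is_ww("abcabc"))  # True
    print(is_ww("abccba"))  # False: that's w + reversed(w), which a CFG *can* handle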

The Chomsky hierarchy has survived in computer science quite well. The computational power needed for parsing grows steeply as you move up the hierarchy - for the higher types it is exponential in sentence length - but we know that this is not the case for humans.

Syntactic Phenomena that make parsing harder:
Probabilistic parsing: in practice, probabilities are very much used alongside syntax to parse.
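
A sketch of what "probability alongside syntax" looks like (assuming NLTK; the rule probabilities are invented for illustration, not estimated from a treebank): a PCFG scores every reading of an ambiguous string, and a Viterbi parser returns the most probable one.

    import nltk

    # Toy PCFG over the PP-attachment ambiguity; probabilities of rules
    # sharing a left-hand side must sum to 1.
    grammar = nltk.PCFG.fromstring("""
        S   -> NP VP         [1.0]
        VP  -> V NP          [0.6]
        VP  -> V NP PP       [0.4]
        NP  -> 'I'           [0.3]
        NP  -> Det N         [0.5]
        NP  -> Det N PP      [0.2]
        PP  -> P NP          [1.0]
        V   -> 'saw'         [1.0]
        Det -> 'the'         [1.0]
        N   -> 'man'         [0.5]
        N   -> 'telescope'   [0.5]
        P   -> 'with'        [1.0]
    """)

    parser = nltk.ViterbiParser(grammar)
    for tree in parser.parse("I saw the man with the telescope".split()):
        print(tree)   # the single most probable parse, with its probability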

Hard-to-understand Sentences:
    In mud eels are, in clay are none. ## Apparently grammatical - not to me, at all, because 'are' doesn't have the meaning 'exist' for me (Richard). ## Haha, we know what color you are! :D ## Aye, but the colors will be lost when I transcribe this.
    ## The sentence is actually from a nursery rhyme, see http://www.mamalisa.com/?t=hes&p=1823 - it's supposed to sound like Latin when you read it fast. ## I thought maybe it was Shakespearean or something (augh.. never can spell that right).
    
    Yes, we have rules, and we can do parsing - but our expectations, the probabilities we assign, get us the wrong readings.
    
Humans produce sentences incrementally.

===

Language Technology!
Some cool notes on why LT is the best. 

€20B worldwide in speech technology, and the same again in translation
50% of that is in Europe
500,000 translation professionals in Europe