Gregory Golberg (grisha@alum.mit.edu)

Lab 2b

Assignment: http://web.mit.edu/6.863/www/labs/laboratory2bnew.pdf.

  1. Sections 2.1 - 2.3

    This section combines answers for 2.1 - 2.3.

    Output: output-0.txt For ease of reading, this was produced by
    earley -blab2b-sentences.txt lab2b-0.grm | grep -v Kimmo > output-0.txt

    This way we only see the results, not Kimmo's verbosity (though it's useful for debugging).

    Further output-*.txt files have been produced similarly.

    Lexicon: english2b-0.lex  
    Grammar: lab2b-0.grm  

    Both the lexicon and grammar files above include detailed comments explaining the additions and modifications made. Same goes for answers to other problems in this lab -- I like to keep comments closer to code, because that way comments and code are less likely to go out of sync. Knuth agrees).

    As we can see from the output, all the requirements of the problem are satisfied: all sentences from lab2b-sentences.txt that should be accepted at this stage are accepted, all those that should be rejected are rejected. This applies to other answers below.

    In other words, here I am only concerned with correct parses up to and not including "the solution solves the case".)

    Note that "poirot sent the solutions to the police" sentence generates one parse only:

  2. Section 2.4

    Output: output-1.txt
    Lexicon: english2b-1.lex
    Grammar: lab2b-1.grm

    Grammar size questions (2a and 2b)

    The grammar size |G| from sections 2.1-2.3 is 91. After section 2.4 it is 136. The increase is about 50%. We are not counting anything in lexicon for this part.

  3. Section 2.5

    Output: output-feat.txt (We were not expected to parse the last phrase)
    Lexicon: english2b-feat.lex
    Grammar: lab2b-feat.grm

    The |G| for this section is 225. This is a significant increase from the previous section. I expected the complexity of the grammar to decrease with the use of features, because they provide an abstraction, if you will, that eliminates the need for redundant productions we were forced to use in section 2.4 to account for agreement.

    What happened here? We can see that with features, size of rules grows, but the number of rules decreses. Thus, the feature grammar would do better the more different kinds of agreement it has. But our grammar recognizes a limited subset of the language, and here, the advantages of features are not clearly seen.

    To clarify, for example, consider Russian language. It requires quite a lot of agreements; for example, between adjective and noun on gender (of which there are four: masculine, feminine, neuter and plural) and declension case (of which there are 6, but more can be recognized). Thus, oversimplifying: without features, we will have on the order of 4*6 = 24 rules, each of size 3 (e.g., NP -> Adj-fem-accus N-fem-accus, making |G| = 72. With features, we can basically have NP[Agr ?x ?y] >->Adj[gender ?x, case ?y] N[gender ?x, case ?y] (you get the idea - x's have to agree with each other and so do y's - but have I got the syntax right?). Here, using features will give us |G| = 14. The improvement will be greater if we consider other implicit cases, such as vocative case, if we want to recognize archaic texts.
  4. Section 2.6

    Output: output-gap.txt
    Input: input26.txt
    Lexicon: english2b-gap.lex
    Grammar: lab2b-gap.grm

  5. Discussion of slash feature machinery

    Slash feature machinery is cool! No, really. The feature mechanism by itself is powerful and flexible, the slash machinery makes it even more so, as it introduces a more poweful abstraction if you will - the fact that feature information can refer to a non-terminal, rather than just a constant value. This allows no upper bound on the size of the fragment between filler and gap. But counter to this advantage goes the fact that we need to prevent sentences with gaps without fillers, and thus need to propagate more information about displaced parts.

    This is the sort of thing up with which I will not put .


    RANDOM NOTE: Gotta watch trailing spaces in the lex file! kimmo.py gets confused!!!