Gregory Golberg (grisha@alum.mit.edu)

Lab 2a

Note: Questions 1-4 were done with NLTK 1.4.4, the rest with NLTK 1.2. In both cases Win XP, Python 2.4.3, and Numarray 1.5.1 were used.

This is because I had already started the lab with the latest (1.4.4) version - the only non-Lite version available directly on SourceForge. But then I looked up the suggested versions and, after some searching (and trying to get a copy of /mit/6.863/python/nltk-1.2 to work), found http://mirror.optusnet.com.au/sourceforge/n/nl/nltk/.

  1. Question 1

    The parser gets stuck at this point:

    This is because the parser decides to reduce via S => NP VP at step 14. Here is the state of the parser after step 13, just before the wrongful reduction:

    After correcting this by forcing the parser to shift at step 14, it fails to parse again, ending in the following state:

    This time the culprit is the choice to reduce again instead of shifting at step 23. Here is the state after 22 steps:

    Forcing step 23 to be a shift generates the correct parse:

     

    Digression

    Though a valid parse given the grammar, arguably, the above tree makes little sense -- what does it mean to see something with a statue? It is as if "with a statue" qualifies the "seeing". (Sort of like "with horses" as an adverb, which originated when we saw a sign saying "DANGER WITH HORSES" and construed it as meaning "extreme danger".)

    If we do the shift > reduce resolution at different places in the parse, namely at steps 13 and 20, the result will make more sense to a human:
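
    For reference, here is a minimal sketch of the same experiment in a modern NLTK (3.x); the lab itself used the NLTK 1.x graphical srparser. The grammar is my reconstruction from the productions mentioned in this write-up, and the sentence is likewise an assumed example, not necessarily the lab's:

      from nltk import CFG
      from nltk.parse import ShiftReduceParser

      # Reconstructed toy grammar -- an assumption, not the lab's file.
      grammar = CFG.fromstring("""
      S -> NP VP
      NP -> Det N | NP PP
      VP -> V NP | VP PP
      PP -> P NP
      Det -> 'my' | 'a' | 'the'
      N -> 'dog' | 'man' | 'park' | 'statue'
      V -> 'saw'
      P -> 'in' | 'with'
      """)

      # trace=2 prints every shift/reduce step, so the point where an eager
      # reduction goes wrong (cf. steps 14 and 23 above) is visible.
      parser = ShiftReduceParser(grammar, trace=2)
      for tree in parser.parse("my dog saw a man in the park with a statue".split()):
          print(tree)

    Like srparser, NLTK's ShiftReduceParser reduces eagerly and does not backtrack, so it may well get stuck on this sentence just as described above.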

  2. Question 2
    1. Both times we had to force the parser to shift where it would otherwise automatically reduce, a choice that eventually got it stuck. Thus, the parser uses a reduce > shift strategy.
    2. No. We have already seen that the parser's chosen strategy, reduce > shift, does not automatically work. However, neither does shift > reduce -- in that case we go down the wrong path after the very first step. In the picture below, the choice is between shifting "dog" and reducing via the Det => my production.

      If we shift here, we will never be able to create the NP "my dog"; "my" will be left "hanging" forever. In fact, since one can always shift (unless the end of the input has been reached), one will not reduce anything until it is too late (a toy demonstration of both strategies is sketched after this list):

    3. Reduce > shift will produce "scragglier" (deeper, narrower) parse trees. In our example, even though shift > reduce does not succeed in producing much of a tree, whatever it does produce is "bushier" -- a number of trees of depth 0 or 1.
    4. Reduce > shift clearly minimizes the stack depth, since every shift increases the stack's depth, while reductions mostly decrease it or, in the case of a unary production (a single symbol reduced to a non-terminal), keep it the same.
    5. One can argue that people perform a non-deterministic parse, using both strategies in various "sub-parsers". It is known that there is a limit to the number of things most humans can hold in short-term memory. As the previous question points out, limited stack space is one such constraint; but this also means that the number of branches we can simultaneously explore non-deterministically is limited. Thus we sometimes also backtrack, changing our strategy (much as we did in Question 1). We can think of this strategy change as adding another parser to explore non-deterministically, just at a different time. It can take one (well, me) some time, parsing and backtracking and parsing again, to get through something like "The mouse the cat the dog chased caught ate the cheese."
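
    To make the comparison above concrete, here is a self-contained toy sketch (no NLTK) of a deterministic shift-reduce loop with a switchable strategy; the grammar and sentence are illustrative assumptions, not the lab's exact grammar:

      GRAMMAR = [
          ("S",   ["NP", "VP"]),
          ("NP",  ["Det", "N"]),
          ("NP",  ["NP", "PP"]),
          ("VP",  ["V", "NP"]),
          ("VP",  ["VP", "PP"]),
          ("PP",  ["P", "NP"]),
          ("Det", ["my"]), ("Det", ["the"]),
          ("N",   ["dog"]), ("N", ["man"]), ("N", ["park"]),
          ("V",   ["saw"]), ("P", ["in"]),
      ]

      def find_reduction(stack):
          """First production whose RHS matches a suffix of the stack."""
          for lhs, rhs in GRAMMAR:
              if stack[-len(rhs):] == rhs:
                  return lhs, len(rhs)
          return None

      def sr_parse(words, prefer_reduce=True):
          stack, buf = [], list(words)
          while True:
              red = find_reduction(stack)
              if prefer_reduce and red:
                  lhs, n = red
                  stack[-n:] = [lhs]        # reduce eagerly
              elif buf:
                  stack.append(buf.pop(0))  # shift
              elif red:
                  lhs, n = red
                  stack[-n:] = [lhs]        # reduce once the input is gone
              else:
                  return stack              # done if ['S'], otherwise stuck

      words = "my dog saw the man in the park".split()
      print(sr_parse(words, prefer_reduce=True))   # reduce > shift: ['S', 'PP']
      print(sr_parse(words, prefer_reduce=False))  # shift > reduce: 7 words + ['N']

    Reduce > shift reduces to S prematurely and strands the PP; shift > reduce shifts everything and can then reduce only the topmost word. Both deterministic strategies fail -- only backtracking (or the hand-forced shifts of Question 1) recovers the parse.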
  3. Question 3

    1. Judging from the grammar, there can be an R/R conflict in a VP NP PP configuration -- do we reduce NP PP into NP, leaving VP NP, or reduce VP NP PP into VP?

      An example input is "my dog saw the man in the statue". The R/R conflict is shown in the picture below (the conflicting reductions are highlighted and underlined in red):

      This conflict occurs after step 20 (but on the way there we have to manually resolve an S/R conflict at step 13 in favor of shifting).

    2. We can resolve the R/R conflict in favor of the reduction that covers more elements. In the above example, that is the VP => VP NP PP production. Here this strategy results in a successful parse in two more steps (three more are required if the other production is chosen). The default strategy of srparser, by the way, seems to be the opposite one (or maybe it is random?), which is less effective in this example. A sketch of this "longest reduction" heuristic follows below.
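
      A small sketch of the heuristic, in the style of the Question 2 toy parser; the VP => VP NP PP production is assumed from the conflict described above:

        # All productions whose RHS matches a suffix of the stack; an R/R
        # conflict is simply more than one candidate.
        def candidate_reductions(stack, grammar):
            return [(lhs, rhs) for lhs, rhs in grammar
                    if stack[-len(rhs):] == rhs]

        # Resolve in favor of the reduction covering the most stack elements.
        def resolve_rr(stack, grammar):
            cands = candidate_reductions(stack, grammar)
            return max(cands, key=lambda c: len(c[1])) if cands else None

        rr_grammar = [("NP", ["NP", "PP"]), ("VP", ["VP", "NP", "PP"])]
        print(resolve_rr(["VP", "NP", "PP"], rr_grammar))
        # -> ('VP', ['VP', 'NP', 'PP'])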
  4. Question 4
    1. The number of elements on the stack is limited by the number of words in the sentence (reductions only shrink the stack). It is thus O(n). But what does "We assume that if a sentence cannot be parsed then its parsing time doesn't count in this calculation" mean? The question asks about the order of space (stack depth), not time...
    2. Shifting is O(1) time; reduction, however, is a function not just of the stack size (limited by the size of the input, as above) but also of the grammar size. Thus the grammar size (or some function of it) can be thought of as the k in O(kn), with k > 1 (see the cost sketch after this list). This would make srparser not real-time.

      As an example, consider parsing the following sentence: "a man with a statue with a statue with a statue [...] saw a dog". The reductions get slower and slower as we move through the input.

    3. Looks to me like 2^(n-1) - 1; I will come back to this...
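
    Regarding 4.2 above, a back-of-the-envelope cost model, assuming a naive reducer that scans every production against the top of the stack on each step (an assumption about srparser's internals, which I have not inspected):

      # n shifts, plus at most n unary (lexical) reductions and n - 1
      # non-unary ones, with each step preceded by a full grammar scan:
      # roughly (3n - 1) * |G| comparisons, i.e. the k in O(kn) above.
      def worst_case_comparisons(n_words, grammar_size):
          return (3 * n_words - 1) * grammar_size

      print(worst_case_comparisons(10, 13))   # e.g. 10 words, 13 productions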
  5. Question 5

    Top-down parsing creates 77 edges, and bottom-up parsing 54 edges (about 30% fewer).

    I have to vent here about an annoying bug: after changing Zoom, at least in the Windows version, the vertical scrollbar disappears. I had been playing with Zoom and various resizings in an effort to see edges that are only partially visible at the far right, since there is no horizontal scrollbar.

    One reason why the bottom-up strategy generates fewer edges (indeed, fewer partial edges) is that, ultimately, it does not generate edges for words that do not appear in the input. This should hold generally, unless, for example, there is only one reduction possible at any given time.
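
    As a cross-check, a modern NLTK (3.x) can report edge counts for all three strategies directly (the Earley count is relevant to Question 8 below). The grammar here is my reconstruction, so a rerun will not reproduce the 77 and 54 figures above (or the 52 of Question 8) exactly:

      from nltk import CFG
      from nltk.parse import (TopDownChartParser, BottomUpChartParser,
                              EarleyChartParser)

      grammar = CFG.fromstring("""
      S -> NP VP
      NP -> Det N | NP PP
      VP -> V NP | VP PP
      PP -> P NP
      Det -> 'my' | 'the'
      N -> 'dog' | 'man' | 'park'
      V -> 'saw'
      P -> 'in'
      """)

      tokens = "my dog saw the man in the park".split()
      for cls in (TopDownChartParser, BottomUpChartParser, EarleyChartParser):
          chart = cls(grammar).chart_parse(tokens)   # build the full chart
          print(cls.__name__, chart.num_edges())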

  6. Question 6

    As indicated above, the top-down parser has edges for words that do not appear in the input, such as N => * dog, which is not present in the bottom-up parse.

    Conversely, bottom-up generates, for example, S edges that do not start at the beginning of the sentence, such as an S => NP * VP edge. This is because whenever the bottom-up strategy creates a complete edge for a production, it creates partial edges for all productions that begin with the non-terminal just reduced. In other words, when an NP is recognized, partial edges are created for all productions that begin with NP -- in our case, S. This behavior is not present in a top-down parse, which only creates edges beginning with elements that appear on the RHS of already-created partial edges. Since S never appears on a RHS, top-down parsing will not create an S edge that does not stem from the beginning of the sentence. (A sketch of this prediction step follows below.)
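    A toy sketch of the bottom-up prediction step (illustrative; not NLTK's actual rule classes):

      # When a complete edge for category cat is added over span (i, j),
      # bottom-up prediction adds, at i, a zero-dot partial edge for every
      # production whose RHS begins with cat; the fundamental rule then
      # advances the dot across it, yielding e.g. S => NP * VP from i to j.
      def bottom_up_predict(complete_edge, grammar):
          i, j, cat = complete_edge
          return [(i, i, lhs, tuple(rhs), 0)   # (start, end, LHS, RHS, dot)
                  for lhs, rhs in grammar
                  if rhs and rhs[0] == cat]

      grammar = [("S", ["NP", "VP"]), ("NP", ["NP", "PP"])]
      # An NP over words 4..6 licenses predictions starting at 4 --
      # including an S edge that does not start at the beginning:
      print(bottom_up_predict((4, 6, "NP"), grammar))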

  7. Challenge Question 7

    TODO

  8. Question 8
    NOTE: Line 19 (USER = os.environ['USER']) of python-earley/editor.py does not play nicely with Python 2.4.3 on XP as I installed it -- presumably because Windows sets USERNAME rather than USER. A workaround is to just say USER = 'krap'; a more portable fix is sketched below.
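
    A more portable version of line 19, assuming the editor only needs some user identifier:

      import os

      # Try the Unix variable, then the Windows one, then a placeholder.
      USER = os.environ.get('USER') or os.environ.get('USERNAME', 'unknown')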

    The Earley parser is more efficient than either of the strategies discussed above: it generates 52 edges, fewer than either top-down (77) or bottom-up (54).

  9. Question 9

    The matrix for this Earley parse is as follows:

  10. Question 10

    The reason for the "paradox" is that the chart does not actually "pack" the superpolynomial number of distinct trees into itself. It represents which states are connected by the created edges. Therefore, more than one edge can connect the same pair of states, and so multiple edges are referred to from a single matrix element. For example, in the snapshot of the log shown below, two subtrees occupying the same chart span are shown, circled in red.

    This split stems from the edge shown in the snapshot below, circled in blue:
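
    To put numbers on this: the count of distinct binary trees over an n-word sentence grows like the Catalan numbers, while the chart has only O(n^2) spans, each holding at most one edge per dotted production. A quick illustration (plain Python, no NLTK):

      from math import comb

      def catalan(n):
          """Number of distinct binary bracketings of n + 1 leaves."""
          return comb(2 * n, n) // (n + 1)

      def chart_spans(n):
          """Number of (i, j) spans over an n-word sentence."""
          return n * (n + 1) // 2

      for n in (5, 10, 20):
          print(n, catalan(n), chart_spans(n))
      # catalan(n) is superpolynomial (16796 at n=10, ~6.5e9 at n=20),
      # while the span count stays quadratic (55 and 210).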