15-01-2013, 12:00 PM
Bottom-Up Parsing
As the name suggests, bottom-up parsing works in the opposite direction from top-down.
A top-down parser begins with the start symbol at the top of the parse tree and
works downward, applying productions in forward order until it gets to the terminal
leaves. A bottom-up parse starts with the string of terminals itself and builds from the
leaves upward, working backwards to the start symbol by applying the productions in
reverse. Along the way, a bottom-up parser searches for substrings of the working
string that match the right side of some production. When it finds such a substring, it
reduces it, i.e., substitutes the left side nonterminal for the matching right side. The goal
is to reduce all the way up to the start symbol and report a successful parse.
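The reduce-until-start idea can be sketched in a few lines of Python. The grammar the handout works with is not reproduced in this excerpt, so the sketch assumes a toy grammar E -> E+T | T, T -> id; a naive search that always reduces the leftmost matching substring happens to succeed here, though real parsers choose reductions much more carefully.

```python
# Assumed illustrative grammar (not from the handout's text):
#   E -> E+T | T,  T -> id
# Right-hand sides are tried in this order on each pass.
PRODUCTIONS = [("E", "E+T"), ("E", "T"), ("T", "id")]

def reduce_to_start(working, start="E"):
    """Return the sequence of working strings from the input up to the start symbol."""
    trace = [working]
    while working != start:
        for lhs, rhs in PRODUCTIONS:
            pos = working.find(rhs)      # leftmost substring matching a right side
            if pos != -1:
                working = working[:pos] + lhs + working[pos + len(rhs):]
                trace.append(working)    # record each reduction step
                break
        else:
            raise ValueError("no reduction applies: " + working)
    return trace

print(reduce_to_start("id+id"))  # ['id+id', 'T+id', 'E+id', 'E+T', 'E']
```

The trace reads bottom-up: the terminals are the leaves, and each step substitutes a left-side nonterminal for a matched right side until only the start symbol remains.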
In general, bottom-up parsing algorithms are more powerful than top-down methods,
but not surprisingly, the constructions required are also more complex. It is difficult to
write a bottom-up parser by hand for anything but trivial grammars, but fortunately,
there are excellent parser generator tools like yacc that build a parser from an input
specification, not unlike the way lex builds a scanner to your spec.
Shift:
If it is impossible to perform a reduction and there are tokens remaining in the
undigested input, then we transfer a token from the input onto the stack. This
is called a shift. For example, using the grammar above, suppose the stack
contained ( and the input contained id+id). It is impossible to perform a
reduction on ( since it does not match the entire right side of any of our
productions. So, we shift the first character of the input onto the stack, giving
us (id on the stack and +id) remaining in the input.
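The shift step in the example above is just a transfer between two lists. Here is that single move, with the stack holding ( and the input holding id+id) as in the text:

```python
# State from the shift example: "(" already on the stack,
# "id + id )" still unread in the input.
stack = ["("]
remaining = ["id", "+", "id", ")"]

# No stack suffix matches a right-hand side, so shift one token:
stack.append(remaining.pop(0))
print(stack, remaining)  # ['(', 'id'] ['+', 'id', ')']
```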
Error:
If neither of the two cases above applies, we have an error. If the sequence on
the stack does not match the right-hand side of any production, we cannot
reduce. And if shifting the next input token would create a sequence on the
stack that cannot eventually be reduced to the start symbol, a shift action
would be futile. Thus, we have hit a dead end where the next token
conclusively determines the input cannot form a valid sentence. This would
happen in the above grammar on the input id+). The first id would be shifted,
then reduced to T and again to E; next, + would be shifted. At this point, the stack
contains E+ and the next input token is ). The sequence on the stack cannot be
reduced, and shifting the ) would create a sequence that is not viable, so we
have an error.
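The shift, reduce, and error actions fit together in one loop. The following hand-written sketch assumes the grammar E -> E+T | (E) | T, T -> id (the handout's grammar is not shown in this excerpt) and uses a simple policy of reducing whenever a stack suffix matches a right-hand side, shifting otherwise. Note one difference from the text's account: this naive version has no viability check, so on id+) it only discovers the error after shifting the ), whereas a table-driven parser rejects the ) immediately.

```python
# Assumed grammar; longest right-hand sides listed first so that
# E+T is preferred over the bare T reduction.
GRAMMAR = [
    ("E", ["E", "+", "T"]),
    ("E", ["(", "E", ")"]),
    ("E", ["T"]),
    ("T", ["id"]),
]

def shift_reduce(tokens, start="E"):
    stack, remaining = [], list(tokens)
    while True:
        for lhs, rhs in GRAMMAR:                    # try to reduce a stack suffix
            if stack[len(stack) - len(rhs):] == rhs:
                stack[len(stack) - len(rhs):] = [lhs]
                break
        else:
            if remaining:                           # no reduction: shift a token
                stack.append(remaining.pop(0))
            elif stack == [start]:
                return True                         # reduced all the way up
            else:
                raise SyntaxError("stuck with stack " + repr(stack))

print(shift_reduce(["id", "+", "id"]))  # True
```

On id+) the loop shifts and reduces down to a stack of E +, shifts the ), finds no reduction and no remaining input, and reports the dead end.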
LR Parsing
LR parsers ("L" for left-to-right scan of the input, "R" for rightmost derivation, constructed in reverse) are efficient,
table-driven shift-reduce parsers. The class of grammars that can be parsed using LR
methods is a proper superset of the class of grammars that can be parsed with predictive
LL parsers. In fact, virtually all programming language constructs for which CFGs can
be written can be parsed with LR techniques. As an added advantage, there is no need
for the extensive grammar rearrangement that LL parsing requires to make a grammar
acceptable.
The primary disadvantage is the amount of work it takes to build the tables by hand,
which makes it infeasible to hand-code an LR parser for most grammars. Fortunately,
there are LR parser generators that create the parser from an unambiguous CFG
specification. The parser tool does all the tedious and complex work to build the
necessary tables and can report any ambiguities or language constructs that interfere
with the ability to parse it using LR techniques.
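A table-driven parser of this kind reduces to a short, grammar-independent loop over ACTION and GOTO tables. The tables below were worked out by hand for the assumed grammar 1: E -> E+T, 2: E -> T, 3: T -> id (this is the tedious part a generator like yacc automates); the loop itself is the standard shift-reduce engine.

```python
# Hand-built SLR tables for the assumed grammar:
#   1: E -> E + T    2: E -> T    3: T -> id
# ("s", n) = shift to state n; ("r", k) = reduce by rule k; ("acc",) = accept.
ACTION = {
    (0, "id"): ("s", 3),
    (1, "+"): ("s", 4), (1, "$"): ("acc",),
    (2, "+"): ("r", 2), (2, "$"): ("r", 2),
    (3, "+"): ("r", 3), (3, "$"): ("r", 3),
    (4, "id"): ("s", 3),
    (5, "+"): ("r", 1), (5, "$"): ("r", 1),
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (4, "T"): 5}
RULES = {1: ("E", 3), 2: ("E", 1), 3: ("T", 1)}  # rule -> (lhs, length of rhs)

def lr_parse(tokens):
    stack = [0]                          # stack of state numbers
    tokens = list(tokens) + ["$"]        # end-of-input marker
    while True:
        act = ACTION.get((stack[-1], tokens[0]))
        if act is None:
            return False                 # blank table entry: syntax error
        if act[0] == "acc":
            return True
        if act[0] == "s":                # shift: consume token, push new state
            tokens.pop(0)
            stack.append(act[1])
        else:                            # reduce: pop |rhs| states, take GOTO on lhs
            lhs, n = RULES[act[1]]
            del stack[-n:]
            stack.append(GOTO[(stack[-1], lhs)])

print(lr_parse(["id", "+", "id"]))  # True
```

On the earlier error input id+), the loop reaches state 4 with ) as the lookahead, finds a blank ACTION entry, and rejects without ever shifting the ), exactly the early detection described above.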
Subset construction and closure
You may have noticed a similarity between subset construction and the closure
operation. If you think back a few lectures, we explored the subset construction
algorithm for converting an NFA into a DFA. The basic idea was to create new states
that represent the non-determinism by grouping the possibilities that look the same at
that stage and diverging only when more information arrives. The same idea applies to
creating the configurating sets for the grammar and the successor function for the
transitions. We create an NFA whose states are the individual configurations. We put
all the initial configurations into one start state, and then draw the transitions from this
state to the other states, each of which holds just one configuration.
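The closure operation itself can be sketched directly on configurations. In this sketch a configuration is a tuple (lhs, rhs, dot), where the dot index marks how much of the right side has been seen; the grammar is again the assumed E -> E+T | T, T -> id, with an augmenting start production S' -> E:

```python
# Assumed grammar, keyed by nonterminal for quick lookup.
GRAMMAR = {"E": [("E", "+", "T"), ("T",)], "T": [("id",)]}

def closure(items):
    """For every nonterminal X appearing right after a dot, add X -> .rhs
    for each of X's productions, repeating until nothing new is added."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:  # dot sits before a nonterminal
                for prod in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], prod, 0)
                    if item not in items:
                        items.add(item)
                        changed = True
    return items

# Closure of the augmented start configuration S' -> .E pulls in
# all of E's and T's fresh configurations.
print(sorted(closure({("S'", ("E",), 0)})))
```

The resulting set is exactly the grouped start state described above: one configurating set standing in for several individual NFA configurations, just as a DFA state stands in for a set of NFA states.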