18-09-2012, 12:33 PM
Syntax analysis
Syntax analysis.ppt (Size: 612.5 KB / Downloads: 23)
Context free grammars and languages
Many languages are not regular. Thus we need to consider larger classes of languages;
Context Free Language (CFL) played a central role in natural languages since 1950’s (Chomsky) and in compilers since 1960’s (Backus);
Context Free Grammar (CFG) is the basis of BNF syntax;
CFG is increasingly important for XML and DTD (XML Schema).
Formal definition of CFG
A CFG is a 4-tuple G=(Σ,N,P,S) where
Σ is a finite set of terminals;
N is finite set of variables (non-terminals);
P is a finite set of productions of the form Aα, where A is a nonterminal and α consists of symbols from Σ and N;
A is called the head of the production;
α is called the body of the production;
S is a designated non-terminal called the start symbol.
Example:
Gpal=({0,1}, {P}, A,P), where A ={Pε, P0, P1, P0P0, P1P1}
Sometimes we group productions with the same head.
e.g., A={Pε|0|1|0P0|1P1}.
Why use regular expression
Every regular set is a context free language.
Since we can use CFG to describe regular language, why using regular expression to define lexical syntax of a language? Why not using CFG to describe everything?
Lexical rules are simpler;
Regular expressions are concise and easier to understand;
More efficient lexical analyzers can be constructed from regular expression;
It is good to modularize compiler into two separate components.
BNF and EBNF
BNF and EBNF are commonly accepted ways to express productions of a context-free grammar.
BNF
Introduced by John Backus, first used to describe Algol 60. John won Turing award in 1977.
BNF originally stood for "Backus Normal Form". However, it was pointed out that this is not literally a "normal" form, and the acronym has come to stand for "Backus Naur Form".
EBNF stands for Extended BNF.
BNF format
lhs ::= rhs
Quote terminals or non-terminals
<> to delimit non-terminals,
bold or quotes for terminals, or ‘as is’
vertical bar for alternatives
There are many different notations, here is an example
opt-stats ::= stats-list | EMPTY .
stats-list ::= statement | statement ‘;’ stats-list .
The language of a grammar
If G is a grammar, the language of the grammar, denoted as L(G), is the set of terminal strings that have derivations from the start symbol.
If a language L is the language of some context free grammar, then L is said to be a context free language.
Example: the set of palindromes is a context free language.
Remove ambiguity
Some theoretical results (bad news)
Is there an algorithm to remove the ambiguity in CFG?
the answer is no
Is there an algorithm to tell us whether a CFG is ambiguous?
The answer is also no.
There are CFLs that have nothing but ambiguous CFGs.
That kind of language is called ambiguous language;
If a language has one unambiguous grammar, then it is called unambiguous language.
In practice, there are well-known techniques to remove ambiguity
Two causes of the ambiguity in the expr grammar
the precedence of operator is not respected. “*” should be grouped before “+”;
a sequence of identical operator can be grouped either from left or from right. 3+4+5 can be grouped either as (3+4)+5 or 3+(4+5).