31-10-2012, 04:32 PM
Lexical Analyzer
Lexical Analyzer[.ppt (Size: 170.5 KB / Downloads: 58)
Lexical Analyzer reads the source program character by character to produce tokens.
Normally a lexical analyzer doesn’t return a list of tokens at one shot, it returns a token when the parser asks a token from it.
Token
Token represents a set of strings described by a pattern.
Identifier represents a set of strings which start with a letter continues with letters and digits
The actual string (newval) is called as lexeme.
Tokens: identifier, number, addop, delimeter, …
Since a token can represent more than one lexeme, additional information should be held for that specific lexeme. This additional information is called as the attribute of the token.
For simplicity, a token may have a single attribute which holds the required information for that token.
For identifiers, this attribute a pointer to the symbol table, and the symbol table holds the actual attributes for that token.
Terminology of Languages
Alphabet : a finite set of symbols (ASCII characters)
String :
Finite sequence of symbols on an alphabet
Sentence and word are also used in terms of string
is the empty string
|s| is the length of string s.
Language: sets of strings over some fixed alphabet
Regular Expressions
We use regular expressions to describe tokens of a programming language.
A regular expression is built up of simpler regular expressions (using defining rules)
Each regular expression denotes a language.
A language denoted by a regular expression is called as a regular set.
Finite Automata
A recognizer for a language is a program that takes a string x, and answers “yes” if x is a sentence of that language, and “no” otherwise.
We call the recognizer of the tokens as a finite automaton.
A finite automaton can be: deterministic(DFA) or non-deterministic (NFA)
This means that we may use a deterministic or non-deterministic automaton as a lexical analyzer.
Both deterministic and non-deterministic finite automaton recognize regular sets.
Which one?
deterministic – faster recognizer, but it may take more space
non-deterministic – slower, but it may take less space
Deterministic automatons are widely used lexical analyzers.
Lexical Analyzer[.ppt (Size: 170.5 KB / Downloads: 58)
Lexical Analyzer reads the source program character by character to produce tokens.
Normally a lexical analyzer doesn’t return a list of tokens at one shot, it returns a token when the parser asks a token from it.
Token
Token represents a set of strings described by a pattern.
Identifier represents a set of strings which start with a letter continues with letters and digits
The actual string (newval) is called as lexeme.
Tokens: identifier, number, addop, delimeter, …
Since a token can represent more than one lexeme, additional information should be held for that specific lexeme. This additional information is called as the attribute of the token.
For simplicity, a token may have a single attribute which holds the required information for that token.
For identifiers, this attribute a pointer to the symbol table, and the symbol table holds the actual attributes for that token.
Terminology of Languages
Alphabet : a finite set of symbols (ASCII characters)
String :
Finite sequence of symbols on an alphabet
Sentence and word are also used in terms of string
is the empty string
|s| is the length of string s.
Language: sets of strings over some fixed alphabet
Regular Expressions
We use regular expressions to describe tokens of a programming language.
A regular expression is built up of simpler regular expressions (using defining rules)
Each regular expression denotes a language.
A language denoted by a regular expression is called as a regular set.
Finite Automata
A recognizer for a language is a program that takes a string x, and answers “yes” if x is a sentence of that language, and “no” otherwise.
We call the recognizer of the tokens as a finite automaton.
A finite automaton can be: deterministic(DFA) or non-deterministic (NFA)
This means that we may use a deterministic or non-deterministic automaton as a lexical analyzer.
Both deterministic and non-deterministic finite automaton recognize regular sets.
Which one?
deterministic – faster recognizer, but it may take more space
non-deterministic – slower, but it may take less space
Deterministic automatons are widely used lexical analyzers.