03-03-2012, 04:42 PM
Lexical Analyzer
The lexical analyzer is responsible for:
• Reading in a stream of input characters
• Producing a sequence of tokens as output
• Upon a get-next-token request from the parser, reading in a
string of characters (a lexeme) and generating a token from it
For the remainder of the lecture, we will go over some examples
and hold an in-class discussion of how the above works and how
we can implement it.
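As a starting point for that discussion, here is a minimal sketch of
the get-next-token idea in Python. The names Token, Lexer, tokenize
and the token kinds ID, NUM, OP and EOF are assumptions made for
illustration; they do not come from the lecture.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str    # token class, e.g. "ID" or "NUM" (illustrative names)
    lexeme: str  # the characters that were read to form the token


class Lexer:
    """Hands out one token per request, the way a parser would ask for it."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0  # index of the next unread input character

    def get_next_token(self) -> Token:
        text = self.text
        # Skip whitespace between lexemes.
        while self.pos < len(text) and text[self.pos].isspace():
            self.pos += 1
        if self.pos >= len(text):
            return Token("EOF", "")
        start = self.pos
        ch = text[start]
        if ch.isalpha():
            # Identifier: a letter followed by letters or digits.
            while self.pos < len(text) and text[self.pos].isalnum():
                self.pos += 1
            return Token("ID", text[start:self.pos])
        if ch.isdigit():
            # Number: one or more digits.
            while self.pos < len(text) and text[self.pos].isdigit():
                self.pos += 1
            return Token("NUM", text[start:self.pos])
        # Anything else is treated as a one-character operator token.
        self.pos += 1
        return Token("OP", ch)


def tokenize(text: str) -> List[Token]:
    """Plays the role of the parser: keeps requesting tokens until EOF."""
    lexer = Lexer(text)
    tokens = []
    while True:
        tok = lexer.get_next_token()
        if tok.kind == "EOF":
            return tokens
        tokens.append(tok)
```

Calling tokenize("x1 = y + 42") yields the tokens ID "x1", OP "=",
ID "y", OP "+" and NUM "42". A real analyzer would report errors and
handle multi-character operators; this sketch does neither.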
Languages
We define a language as a set of strings over a fixed alphabet
(a set of symbols); a language may also include the empty
string. Let's look at the following examples to understand
operations on languages. Assume that L and D are languages whose
strings are single symbols: L is the set of letters and D is the
set of digits.
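The sketch below illustrates the usual operations on languages
(union, concatenation, powers and closure) in Python. The text above
does not list the operations, so this selection is an assumption, as
are the helper names concat, power and closure_up_to. To keep the
sets small, L and D here are tiny stand-ins for the full sets of
letters and digits.

```python
from itertools import product

L = set("ab")  # stand-in for the letters; the real L would hold A-Z and a-z
D = set("01")  # stand-in for the digits; the real D would hold 0-9


def concat(A, B):
    """Concatenation AB: every string of A followed by every string of B."""
    return {x + y for x, y in product(A, B)}


def power(A, n):
    """A^n: A concatenated with itself n times; A^0 is {empty string}."""
    result = {""}
    for _ in range(n):
        result = concat(result, A)
    return result


def closure_up_to(A, n):
    """Kleene closure A*, truncated to A^0 through A^n so it stays finite."""
    result = set()
    for i in range(n + 1):
        result |= power(A, i)
    return result


union = L | D              # letters or digits
LD = concat(L, D)          # a letter followed by a digit
L3 = power(L, 3)           # all three-letter strings
# A letter followed by up to two letters or digits (identifier-like strings).
ids = concat(L, closure_up_to(L | D, 2))
```

With these stand-ins, LD is {"a0", "a1", "b0", "b1"} and L3 holds
2 * 2 * 2 = 8 strings. The true closure is infinite, which is why the
sketch truncates it.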
String
• prefix of s : a string obtained by removing 0 or more symbols
from the end of s
• suffix of s : a string obtained by removing 0 or more symbols
from the beginning of s
• substring of s : a string obtained by removing 0 or more
symbols from the beginning and/or 0 or more symbols from the
end of s
• proper prefix, suffix or substring of s : a prefix, suffix or
substring of s that is not equal to s itself
• subsequence of s : a string obtained by removing 0 or more
(not necessarily contiguous) symbols from s
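The definitions above can be checked with a short Python sketch. The
function names are chosen here for illustration only.

```python
def prefixes(s):
    """All prefixes of s, from the empty string up to s itself."""
    return [s[:i] for i in range(len(s) + 1)]


def suffixes(s):
    """All suffixes of s, from s itself down to the empty string."""
    return [s[i:] for i in range(len(s) + 1)]


def substrings(s):
    """All substrings of s: drop symbols from the front and/or the back."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


def proper(parts, s):
    """Keep only the members that are not equal to s itself."""
    return [p for p in parts if p != s]


def is_subsequence(t, s):
    """True if t can be obtained from s by deleting 0 or more symbols."""
    remaining = iter(s)
    # Each 'ch in remaining' consumes s up to and including the match,
    # so the symbols of t must appear in s in the same order.
    return all(ch in remaining for ch in t)
```

For s = "banana", "ban" is a prefix, "nana" is a suffix and "nan" is
a substring. The string "baan" is a subsequence of "banana" but not a
substring, because the symbols removed are not all at the ends.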
Regular Expressions
What is a regular expression? And why do we care?
• A regular expression is a compact expression that denotes a
language (a set of strings that can be recognized).
• Its notation makes complex sets of strings easy to write
down. For example:
– a, b, c denote terminal symbols of the language
– r, s, t denote regular expressions
– a | b denotes choice; [abc] is shorthand for a choice of
one symbol from the list, i.e. a | b | c
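The choice notation can be tried out with Python's re module, whose
syntax covers the operators listed above (and many more that the
lecture does not use). The helper name denotes is an assumption made
for this sketch.

```python
import re


def denotes(pattern, s):
    """True if the whole string s is in the language denoted by pattern."""
    return re.fullmatch(pattern, s) is not None


# a | b denotes the language {"a", "b"}.
in_choice = denotes(r"a|b", "a")         # True
not_in_choice = denotes(r"a|b", "c")     # False

# [abc] denotes the language {"a", "b", "c"}: exactly one symbol from the list.
in_class = denotes(r"[abc]", "c")        # True
two_symbols = denotes(r"[abc]", "ab")    # False, "ab" has two symbols
```

re.fullmatch is used instead of re.search so that the test is about
membership in the language, not about finding a match somewhere
inside a longer string.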
To recognize identifiers, we use the function gettoken() to
determine whether the current lexeme is a reserved word. If it
is, we return the token corresponding to that reserved word. We
use the function install_id() to return a symbol table entry for
any other identifier; for reserved words it returns a default
value.
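A minimal Python sketch of these two functions follows. The function
names follow the text (with install_id spelled as a valid Python
identifier); everything else is an assumption made for illustration:
the reserved-word list, the token names, the use of 0 as the default
value, and a symbol table that maps each lexeme to an entry number
starting at 1.

```python
# Illustrative reserved words and their tokens (not taken from the lecture).
RESERVED = {"if": "IF", "then": "THEN", "else": "ELSE"}

DEFAULT_ENTRY = 0   # assumed default value returned for reserved words
symbol_table = {}   # lexeme -> entry number, numbered from 1


def gettoken(lexeme):
    """Return the reserved word's token, or the generic ID token otherwise."""
    return RESERVED.get(lexeme, "ID")


def install_id(lexeme):
    """Return the symbol table entry for an identifier, adding it if new."""
    if lexeme in RESERVED:
        return DEFAULT_ENTRY
    if lexeme not in symbol_table:
        symbol_table[lexeme] = len(symbol_table) + 1
    return symbol_table[lexeme]


def scan_word(lexeme):
    """What the analyzer would hand the parser for one scanned word."""
    return gettoken(lexeme), install_id(lexeme)
```

Here scan_word("if") returns ("IF", 0), while scan_word("count")
returns ("ID", 1) on its first call and the same entry on later
calls. Numbering entries from 1 keeps them distinct from the default
value.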