Sebesta 4

Chapter 4 Topics
 Lexical and Syntax Analysis
 The Parsing Problem
 Recursive-Descent Parsing
 Bottom-Up Parsing
Syntax analyzers almost always based
on a formal description of the syntax of
the source language (grammars)
 Almost all compilers separate analyzing
syntax into:
 Lexical Analysis – low-level
 Syntax Analysis – high-level
Reasons to Separate Syntax and
Lexical Analysis
Simplicity – lexical analysis is less complex, so
the process is simpler when separated
Efficiency – allows for selective optimization
 Portability – lexical analyzer is somewhat
platform dependent whereas the syntax analyzer
is more platform independent
Lexical Analysis
A pattern matcher for character strings
 Performs syntax analysis at the lowest
level of the program structure
 Extracts lexemes from a given input
string and produce the corresponding
Lexical Analysis (continued)
result = oldsum – value / 100;
Building a Lexical Analyzer
Write a formal description of the tokens
and use a software tool that constructs
lexical analyzers when given such a
 Design a state transition diagram that
describes the tokens and write a program
that implements the diagram
 Design a state transition diagram that
describes the tokens and hand-construct a
table-driven implementation of the state
State (Transition) Diagram Design
A directed graph with nodes labeled with
state names and arcs labeled with input
 Including states and transitions for each
and every token pattern would be too
large and complex
 Transitions can be combined to simplify
the state diagram
The Parsing Problem
Two goals of syntax analysis:
 Check the input program for any syntax
errors, produce a diagnostic message if an
error is found, and recover
 Produce the parse tree, or at least a trace of
the parse tree, for the program
Two Classes of parsers:
 Top-down
 Bottom-up
Top-Down Parsers
Traces or builds a parse tree in preorder
(leftmost derivation)
 The most common top-down parsing
 Recursive descent
 LL parsers
Bottom-Up Parsers
Produce the parse tree by beginning at
the leaves and progressing towards the
 Most common bottom-up parsers are in
the LR family
Complexity of Parsing
Parsing algorithms that work for any
unambiguous grammar are complex and
inefficient: O(n3)
 Compilers use parsers that only work for
a subset of all unambiguous grammars,
but do it in linear time: O(n)
Recursive-Descent Parsing
Top-Down Parser
 EBNF is ideal for the basis of a
recursive-descent parser
 Each terminal maps to a function
 For a non-terminal with more than one RHS,
look at the next token to determine which
side to choose
 No mapping = syntax error
Recursive-Descent Parsing
Grammar for an expression:
<expr> → <term> {+ <term>}
<term> → <factor> {* <factor>}
<factor> → id | int_constant | ( <expr> )
How do we parse?
Expression: 1 + 2
→ <term> + <term>
→ <factor> + <term>
→ 1 + <term>
Recursive-Descent Parsing
Grammar for an expression:
<expr> → <term> {+ <term>}
<term> → <factor> {* <factor>}
<factor> → id | int_constant | ( <expr> )
What does code look like?
void expr() {
while (nextToken == ADD_OP) {
Recursive-Descent Parsing
The LL (Left Recursion) Problem
<expr> → <expr> + <term>
<expr> → <expr> + <term> + <term>
<expr> → <expr> + <term> + <term> + <term>
How do we fix it?
 Modify grammar to remove left recursion
<expr> → <expr> + <term>
<expr> → <term> + <term>
<term> → id | int_constant | <expr>
Recursive-Descent Parsing
The Pairwise Disjointness Problem
 If the grammar is not pairwise disjoint, how do you know
which RHS to pick based on the next token?
<variable> → identifier | identifier[<expr>]
How do we fix it?
 Left Factoring
<variable> → identifier<new>
<new> → ø | [<expr>]
Bottom-Up Parsing
Parsing is based on reduction
 Reverse of a rightmost derivation
 At each step, find the correct RHS that
reduces to the previous step in the
Example Grammar
<S> → <A>b
<A> → a
<A> → b
Input: ab
Step 1: <A>b
Step 2: <S>
Bottom-Up Parsing
Most bottom-up parsers are shift-reduce
 Shift – move token onto the stack
 Reduce – replace RHS with LHS
Bottom-Up Parsing
 Def:  is the handle of the right sentential
form iff
 = w if and only if S =>*rm Aw =>rm w
 The handle of a right sentential form is its
leftmost simple phrase
 Bottom-Up Parsing is essentially looking for
handles and replacing them with their LHS
Bottom-Up Parsing
Advantages of Shift Reduction Parsers
 They can be built for all programming
 They can detect syntax errors as soon as it
is possible in a left-to-right scan
 They LR class of grammars is a proper
superset of the class parsable by LL parsers
(for example, many left recursive grammars
are LR, but none are LL)
Bottom-Up Parsing
Shift Reduction Algorithms
 Input Sequence – input to be parsed
 Parse Stack – input is shifted onto the
parse stack
 ACTION Table – what the parser does
 GOTO Table – holds state symbols to be
pushed onto the stack when a reduction is
Bottom-Up Parsing
ACTION Table (or Parse Table)
 Rows = State Symbols
 Columns = Terminal symbols
 Shift – push token on stack
 Reduce – replace handle with LHS
 Accept – stack only has start symbol and
input is empty
 Error – original input is invalid
Bottom-Up Parsing
GOTO Table (or Parse Table)
 Rows = State Symbols
 Columns = Nonterminal Symbols
Values indicate which state symbol
should be pushed onto the parse stack
after a reduction has been completed

similar documents