UNIT 2 BASIC PARSING TECHNIQUES
UNIT 2 BASIC PARSING TECHNIQUES
Parser
Parser is a compiler that is used to break the data into smaller
elements coming from lexical analysis phase.
A parser takes input in the form of sequence of tokens and
produces output in the form of parse tree.
Parsing is of two types: top down parsing and bottom up parsing.

Top down paring
- The
     top down parsing is known as recursive parsing or predictive parsing.
- Bottom
     up parsing is used to construct a parse tree for an input string.
- In
     the top down parsing, the parsing starts from the start symbol and
     transform it into the input symbol.
Parse Tree representation of input string "acdb" is as
follows:

Bottom up parsing
- Bottom
     up parsing is also known as shift-reduce parsing.
- Bottom
     up parsing is used to construct a parse tree for an input string.
- In
     the bottom up parsing, the parsing starts with the input symbol and
     construct the parse tree up to the start symbol by tracing out the
     rightmost derivations of string in reverse.
Example
Production
1.     
E → T  
2.     
T → T * F  
3.     
T → id  
4.     
T  → F
5.     
F → id  
Parse Tree representation of input string "id * id" is
as follows:


Bottom up parsing is classified in to various parsing. These are
as follows:
- Shift-Reduce
     Parsing
- Operator
     Precedence Parsing
- Table
     Driven LR Parsing
- LR(
     1 )
- SLR(
     1 )
- CLR
     ( 1 )
- LALR(
     1 )
| Shift reduce parsing 
 
 
 Example: Grammar: 1.     
  S → S+S     2.     
  S → S-S     3.     
  S → (S)   4.     
  S → a   Input string: 1.     
  a1-(a2+a3)   Parsing table: 
 There are two main
  categories of shift reduce parsing as follows: 
 | 
Operator precedence
parsing
Operator precedence grammar is kinds of shift reduce parsing
method. It is applied to a small class of operator grammars.
A grammar is said to be operator precedence grammar if it has
two properties:
- No
     R.H.S. of any production has a∈.
- No
     two non-terminals are adjacent.
Operator precedence can only established between the terminals
of the grammar. It ignores the non-terminal.
There are the three operator precedence
relations:
a ⋗ b means that terminal
"a" has the higher precedence than terminal "b".
a ⋖ b means that terminal
"a" has the lower precedence than terminal "b".
a ≐ b means that the
terminal "a" and "b" both have same precedence.
Precedence table:

Parsing Action
- Both
     end of the given input string, add the $ symbol.
- Now
     scan the input string from left right until the ⋗ is encountered.
- Scan
     towards left over all the equal precedence until the first left most ⋖ is encountered.
- Everything
     between left most ⋖ and right most ⋗
     is a handle.
- $
     on $ means parsing is successful.
Example
Grammar:
1.     
E → E+T/T  
2.     
T → T*F/F  
3.     
F → id  
Given string:
1.     
w = id + id * id  
Let us consider a parse tree for it as follows:

On the basis of above tree, we can design following operator
precedence table:

Now let us process the string with the help of the above
precedence table:

, Advance Java, .
Javatpoint.comHindi100.comLyricsia.comQuoteperson.comJobandplacement.com
LR parser :
LR parser is a bottom-up parser for context-free grammar that is very generally
used by computer programming language compiler and other associated tools. LR
parser reads their input from left to right and produces a right-most
derivation. It is called a Bottom-up parser because it attempts to reduce the
top-level grammar productions by building up from the leaves. LR parsers are
the most powerful parser of all deterministic parsers in practice.

Description of LR parser :
The term parser LR(k) parser, here the L refers to the left-to-right scanning,
R refers to the rightmost derivation in reverse and k refers to the number of
unconsumed “look ahead” input symbols that are used in making parser decisions.
Typically, k is 1 and is often omitted. A context-free grammar is called LR (k)
if the LR (k) parser exists for it. This first reduces the sequence of tokens
to the left. But when we read from above, the derivation order first extends to
non-terminal.
1.      The stack is empty, and we
are looking to reduce the rule by S’→S$.
2.      Using a “.” in the rule
represents how many of the rules are already on the stack.
3.      A dotted item, or simply,
the item is a production rule with a dot indicating how much RHS has so far
been recognized. Closing an item is used to see what production rules can be
used to expand the current structure. It is calculated as follows:
Rules for LR parser :
The rules of LR parser as follows.
1.      The first item from the
given grammar rules adds itself as the first closed set.
2.      If an object is present in
the closure of the form A→ α. β. γ, where the next symbol after the symbol is
non-terminal, add the symbol’s production rules where the dot precedes the
first item.
3.      Repeat steps (B) and (C)
for new items added under (B).
LR
parser algorithm:
LR Parsing algorithm is the same for all the parser, but the parsing table is
different for each parser. It consists following components as follows.
1.      Input Buffer – 
It contains the given string, and it ends with a $ symbol.
 
2.      Stack – 
The combination of state symbol and current input symbol is used to refer to
the parsing table in order to take the parsing decisions.
Parsing Table : 
Parsing table is divided into two parts- Action table and Go-To table.
The action table gives a grammar rule to implement the given
current state and current terminal in the input stream. There are four cases
used in action table as follows.
1.      Shift Action- In shift
action the present terminal is removed from the input stream and the
state n is pushed onto the stack, and it becomes the
new present state.
2.      Reduce Action- The
number m is written to the output stream.
3.      The symbol m mentioned
in the left-hand side of rule m says that state is
removed from the stack.
4.      The symbol m mentioned
in the left-hand side of rule m says that a new state is
looked up in the goto table and made the new current state by pushing it onto
the stack.
An
accept - the string is accepted
No
action - a syntax error is reported
Note –
The go-to table indicates which state should proceed.
LR parser diagram :

LR Parser
LR parsing is one type of bottom up parsing. It is used to parse
the large class of grammars.
In the LR parsing, "L" stands for left-to-right
scanning of the input.
"R" stands for constructing a right most derivation in
reverse.
"K" is the number of input symbols of the look ahead
used to make number of parsing decision.
LR parsing is divided into four parts:
LR (0) parsing, SLR parsing, CLR parsing and LALR parsing.

LR algorithm:
The LR algorithm requires stack, input,
output and parsing table. In all type of LR parsing, input, output and stack
are same but parsing table is different.

Fig: Block diagram of LR parser
Input buffer is used to indicate end of
input and it contains the string to be parsed followed by a $ Symbol.
A stack is used to contain a sequence of
grammar symbols with a $ at the bottom of the stack.
Parsing table is a two dimensional
array. It contains two parts: Action part and Go To part.
LR (1) Parsing
Various steps involved in the LR (1)
Parsing:
- For the given input string write a
     context free grammar.
- Check the ambiguity of the grammar.
- Add Augment production in the given
     grammar.
- Create Canonical collection of LR
     (0) items.
- Draw a data flow diagram (DFA).
- Construct a LR (1) parsing table.
Augment Grammar
Augmented grammar G` will be generated
if we add one more production in the given grammar G. It helps the parser to
identify when to stop the parsing and announce the acceptance of the input.
Example
Given grammar
1.     
S → AA  
2.     
A → aA | b  
The Augment grammar G` is represented by
1.     
S`→ S  
2.     
S → AA  
3.     
A → aA | b  
Canonical Collection
of LR(0) items
An LR (0) item is a production G with dot at some position on
the right side of the production.
LR(0) items is useful to indicate that how much of the input has
been scanned up to a given point in the process of parsing.
In the LR (0), we place the reduce node in the entire row.
Example
Given grammar:
1.     
S → AA  
2.     
A → aA | b  
Add Augment Production and insert '•' symbol at the first
position for every production in G
1.     
S` → •S  
2.     
S → •AA  
3.     
A → •aA   
4.     
A → •b  
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` → •S)
Add all productions starting with S in to I0 State because
"•" is followed by the non-terminal. So, the I0 State becomes
I0 = S` → •S
       S → •AA
Add all productions starting with "A" in modified I0
State because "•" is followed by the non-terminal. So, the I0 State
becomes.
I0= S` → •S
       S → •AA
       A → •aA
       A → •b
I1= Go to (I0, S) =
closure (S` → S•) = S` → S•
Here, the Production is reduced so close the State.
I1= S` → S•
I2= Go to (I0, A) =
closure (S → A•A)
Add all productions starting with A in to I2 State because
"•" is followed by the non-terminal. So, the I2 State becomes
I2 =S→A•A
       A → •aA
       A → •b
Go to (I2,a) = Closure (A → a•A) = (same as I3)
Go to (I2, b) = Closure (A → b•) = (same as I4)
I3= Go to (I0,a) =
Closure (A → a•A)
Add productions starting with A in I3.
A → a•A
A → •aA
A → •b
Go to (I3, a) = Closure (A → a•A) = (same as I3)
Go to (I3, b) = Closure (A → b•) = (same as I4)
I4= Go to (I0, b) =
closure (A → b•) = A → b•
I5= Go to (I2, A) = Closure (S → AA•) = SA → A•
I6= Go to (I3, A) = Closure (A → aA•) = A → aA•
Drawing DFA:
The DFA contains the 7 states I0 to I6.

LR(0) Table
- If
     a state is going to some other state on a terminal then it correspond to a
     shift move.
- If
     a state is going to some other state on a variable then it correspond to
     go to move.
- If
     a state contain the final item in the particular row then write the reduce
     node completely.

Explanation:
- I0
     on S is going to I1 so write it as 1.
- I0
     on A is going to I2 so write it as 2.
- I2
     on A is going to I5 so write it as 5.
- I3
     on A is going to I6 so write it as 6.
- I0,
     I2and I3on a are going to I3 so write it as S3 which means that shift 3.
- I0,
     I2 and I3 on b are going to I4 so write it as S4 which means that shift 4.
- I4,
     I5 and I6 all states contains the final item because they contain • in the
     right most end. So rate the production as production number.
Productions are numbered as follows:
1.     
S  →      AA    ... (1)                              
2.     
A   →     aA      ... (2)   
3.     
A    →    b     ... (3)  
- I1
     contains the final item which drives(S` → S•), so action {I1, $} = Accept.
- I4
     contains the final item which drives A → b• and that production
     corresponds to the production number 3 so write it as r3 in the entire
     row.
- I5
     contains the final item which drives S → AA• and that production
     corresponds to the production number 1 so write it as r1 in the entire
     row.
- I6
     contains the final item which drives A → aA• and that production
     corresponds to the production number 2 so write it as r2 in the entire
     row.
SLR (1) Parsing
SLR (1) refers to simple LR Parsing. It is same as LR(0)
parsing. The only difference is in the parsing table.To construct SLR (1)
parsing table, we use canonical collection of LR (0) item.
In the SLR (1) parsing, we place the reduce move only in the
follow of left hand side.
Various steps involved in the SLR (1) Parsing:
- For
     the given input string write a context free grammar
- Check
     the ambiguity of the grammar
- Add
     Augment production in the given grammar
- Create
     Canonical collection of LR (0) items
- Draw
     a data flow diagram (DFA)
- Construct
     a SLR (1) parsing table
SLR (1) Table Construction
The steps which use to construct SLR (1) Table is given below:
If a state (Ii) is going to some other state (Ij)
on a terminal then it corresponds to a shift move in the action part.

If a state (Ii) is going to some other state (Ij)
on a variable then it correspond to go to move in the Go to part.

If a state (Ii) contains the final item like A → ab•
which has no transitions to the next state then the production is known as
reduce production. For all terminals X in FOLLOW (A), write the reduce entry
along with their production numbers.
Example
1.     
S -> •Aa   
2.     
  A->αβ•   
1.     
Follow(S) = {$}  
2.     
Follow (A) = {a}  

SLR ( 1 ) Grammar
S → E
E → E + T | T
T → T * F | F
F → id
Add Augment Production and insert '•' symbol at the first
position for every production in G
S` → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •id
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` →
•E)
Add all productions starting with E in to I0 State because
"." is followed by the non-terminal. So, the I0 State becomes
I0 = S` → •E
        E → •E + T
        E → •T
Add all productions starting with T and F in modified I0 State
because "." is followed by the non-terminal. So, the I0 State
becomes.
I0= S` → •E
       E → •E + T
       E → •T
       T → •T * F
       T → •F
       F → •id
I1= Go to (I0, E) =
closure (S` → E•, E → E• + T)
I2= Go to (I0, T) = closure (E → T•T, T• → * F)
I3= Go to (I0, F) = Closure ( T → F• ) = T → F•
I4= Go to (I0, id) = closure ( F → id•) = F → id•
I5= Go to (I1, +) = Closure (E → E +•T)
Add all productions starting with T and F in I5 State because
"." is followed by the non-terminal. So, the I5 State becomes
I5 = E → E +•T
       T → •T * F
       T → •F
       F → •id
Go to (I5, F) = Closure (T → F•) = (same as I3)
Go to (I5, id) = Closure (F → id•) = (same as I4)
I6= Go to (I2, *) = Closure
(T → T * •F)
Add all productions starting with F in I6 State because
"." is followed by the non-terminal. So, the I6 State becomes
I6 = T → T * •F
         F → •id
Go to (I6, id) = Closure (F → id•) = (same as I4)
I7= Go to (I5, T) =
Closure (E → E + T•) = E → E + T•
I8= Go to (I6, F) = Closure (T → T * F•) = T → T * F•
Drawing DFA:

SLR (1) Table

Explanation:
First (E) = First (E + T) ∪ First (T)
First (T) = First (T * F) ∪ First (F)
First (F) = {id}
First (T) = {id}
First (E) = {id}
Follow (E) = First (+T) ∪ {$} = {+, $}
Follow (T) = First (*F) ∪ First (F)
              
= {*, +, $}
Follow (F) = {*, +, $}
- I1
     contains the final item which drives S → E• and follow (S) = {$}, so
     action {I1, $} = Accept
- I2
     contains the final item which drives E → T• and follow (E) = {+, $}, so
     action {I2, +} = R2, action {I2, $} = R2
- I3
     contains the final item which drives T → F• and follow (T) = {+, *, $}, so
     action {I3, +} = R4, action {I3, *} = R4, action {I3, $} = R4
- I4
     contains the final item which drives F → id• and follow (F) = {+, *, $},
     so action {I4, +} = R5, action {I4, *} = R5, action {I4, $} = R5
- I7
     contains the final item which drives E → E + T• and follow (E) = {+, $},
     so action {I7, +} = R1, action {I7, $} = R1
- I8
     contains the final item which drives T → T * F• and follow (T) = {+, *,
     $}, so action {I8, +} = R3, action {I8, *} = R3, action {I8, $} = R3.
CLR (1) Parsing
CLR refers to canonical lookahead. CLR parsing use the canonical
collection of LR (1) items to build the CLR (1) parsing table. CLR (1) parsing
table produces the more number of states as compare to the SLR (1) parsing.
In the CLR (1), we place the reduce node only in the lookahead
symbols.
Various steps involved in the CLR (1) Parsing:
- For
     the given input string write a context free grammar
- Check
     the ambiguity of the grammar
- Add
     Augment production in the given grammar
- Create
     Canonical collection of LR (0) items
- Draw
     a data flow diagram (DFA)
- Construct
     a CLR (1) parsing table
LR (1) item
LR (1) item is a collection of LR (0) items and a look ahead
symbol.
LR (1) item = LR (0) item + look ahead
The look ahead is used to determine that where we place the
final item.
The look ahead always add $ symbol for the argument production.
Example
CLR ( 1 ) Grammar
1.     
S → AA  
2.     
A → aA  
3.     
A → b  
Add Augment Production, insert '•' symbol at the first position
for every production in G and also add the lookahead.
1.     
S` → •S, $  
2.     
S  → •AA, $  
3.     
A  → •aA, a/b   
4.     
A → •b, a/b  
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` →
•S)
Add all productions starting with S in to I0 State because
"." is followed by the non-terminal. So, the I0 State becomes
I0 = S` → •S, $
        S → •AA, $
Add all productions starting with A in modified I0 State because
"." is followed by the non-terminal. So, the I0 State becomes.
I0=  S` → •S, $
        S → •AA, $
        A → •aA, a/b
        A → •b, a/b
I1= Go to (I0, S) =
closure (S` → S•, $) = S` → S•, $
I2= Go to (I0, A) = closure ( S → A•A, $ )
Add all productions starting with A in I2 State because
"." is followed by the non-terminal. So, the I2 State becomes
I2= S → A•A, $
       A → •aA, $
       A → •b, $
I3= Go to (I0, a) =
Closure ( A → a•A, a/b )
Add all productions starting with A in I3 State because
"." is followed by the non-terminal. So, the I3 State becomes
I3= A
→ a•A, a/b
       A → •aA, a/b
       A → •b, a/b
Go to (I3, a) = Closure (A → a•A, a/b) = (same as I3)
Go to (I3, b) = Closure (A → b•, a/b) = (same as I4)
I4= Go to (I0, b) =
closure ( A → b•, a/b) = A → b•, a/b
I5= Go to (I2, A) = Closure (S → AA•, $) =S → AA•, $
I6= Go to (I2, a) = Closure (A → a•A, $)
Add all productions starting with A in I6 State because
"." is followed by the non-terminal. So, the I6 State becomes
I6 = A → a•A, $
       A → •aA, $
       A → •b, $
Go to (I6, a) = Closure (A → a•A, $) = (same as I6)
Go to (I6, b) = Closure (A → b•, $) = (same as I7)
I7= Go to (I2, b) =
Closure (A → b•, $) = A → b•, $
I8= Go to (I3, A) = Closure (A → aA•, a/b) = A → aA•, a/b
I9= Go to (I6, A) = Closure (A → aA•, $) = A → aA•, $
Drawing DFA:

CLR (1) Parsing table:

Productions are numbered as follows:
1.     
S  →  AA      ... (1)                                  
2.     
  A  → aA       ....(2)     
3.     
  A  →  b     ... (3)  
The placement of shift node in CLR (1) parsing table is same as
the SLR (1) parsing table. Only difference in the placement of reduce node.
I4 contains the final item which drives ( A → b•, a/b), so
action {I4, a} = R3, action {I4, b} = R3.
I5 contains the final item which drives ( S → AA•, $), so action {I5, $} = R1.
I7 contains the final item which drives ( A → b•,$), so action {I7, $} = R3.
I8 contains the final item which drives ( A → aA•, a/b), so action {I8, a} =
R2, action {I8, b} = R2.
I9 contains the final item which drives ( A → aA•, $), so action {I9, $} = R2.
LALR (1) Parsing:
LALR refers to the lookahead LR. To construct the LALR (1)
parsing table, we use the canonical collection of LR (1) items.
In
the LALR (1) parsing, the LR (1) items which have same productions but
different look ahead are combined to form a single set of items
LALR (1) parsing is same as the CLR (1) parsing, only difference
in the parsing table.
Example
LALR ( 1 ) Grammar
1.     
S → AA  
2.     
A  → aA  
3.     
A → b  
Add Augment Production, insert '•' symbol at the first position
for every production in G and also add the look ahead.
1.     
S` → •S, $  
2.     
S  → •AA, $  
3.     
A  → •aA, a/b   
4.     
A  → •b, a/b  
I0 State:
Add Augment production to the I0 State and Compute the ClosureL
I0 = Closure (S` →
•S)
Add all productions starting with S in to I0 State because
"•" is followed by the non-terminal. So, the I0 State becomes
I0 = S`
→ •S, $
        S → •AA, $
Add all productions starting with A in modified I0 State because
"•" is followed by the non-terminal. So, the I0 State becomes.
I0= S` → •S, $
       S → •AA, $
       A → •aA, a/b
       A → •b, a/b
I1= Go to (I0, S) =
closure (S` → S•, $) = S` → S•, $
I2= Go to (I0, A) = closure ( S → A•A, $ )
Add all productions starting with A in I2 State because
"•" is followed by the non-terminal. So, the I2 State becomes
I2= S → A•A, $
       A → •aA, $
       A → •b, $
I3= Go to (I0, a) =
Closure ( A → a•A, a/b )
Add all productions starting with A in I3 State because
"•" is followed by the non-terminal. So, the I3 State becomes
I3= A → a•A, a/b
       A → •aA, a/b
       A → •b, a/b
Go to (I3, a) = Closure (A → a•A, a/b) = (same as I3)
Go to (I3, b) = Closure (A → b•, a/b) = (same as I4)
I4= Go to (I0, b) =
closure ( A → b•, a/b) = A → b•, a/b
I5= Go to (I2, A) = Closure (S → AA•, $) =S → AA•, $
I6= Go to (I2, a) = Closure (A → a•A, $)
Add all productions starting with A in I6 State because
"•" is followed by the non-terminal. So, the I6 State becomes
I6 = A → a•A, $
       A → •aA, $
       A → •b, $
Go to (I6, a) = Closure (A → a•A, $) = (same as I6)
Go to (I6, b) = Closure (A → b•, $) = (same as I7)
I7= Go to (I2, b) =
Closure (A → b•, $) = A → b•, $
I8= Go to (I3, A) = Closure (A → aA•, a/b) = A → aA•, a/b
I9= Go to (I6, A) = Closure (A → aA•, $) A → aA•, $
If we analyze then LR (0) items of I3 and I6 are same but they
differ only in their lookahead.
I3 = { A → a•A, a/b
      A → •aA, a/b
      A → •b, a/b
       }
I6= { A → a•A, $
      A → •aA, $
      A → •b, $
      }
Clearly I3 and I6 are same in their LR (0) items but differ in
their lookahead, so we can combine them and called as I36.
I36 = { A → a•A, a/b/$
       A → •aA, a/b/$
       A → •b, a/b/$
        }
The I4 and I7 are same but they differ only in their look ahead,
so we can combine them and called as I47.
I47 = {A → b•, a/b/$}
The I8 and I9 are same but they differ only in their look ahead,
so we can combine them and called as I89.
I89 = {A → aA•, a/b/$}
Drawing DFA:

LALR (1) Parsing table:

              Automatic Parser Generator
YACC is an automatic tool that generates
the parser program.
As we have discussed YACC in the first
unit of this tutorial so you can go through the concepts again to make things
more clear.
- ACC
     stands for Yet Another Compiler Compiler.
- YACC
     provides a tool to produce a parser for a given grammar.
- YACC
     is a program designed to compile a LALR (1) grammar.
- It
     is used to produce the source code of the syntactic analyzer of the
     language produced by LALR (1) grammar.
- The
     input of YACC is the rule or grammar and the output is a C program.
These are some points about YACC:
Input: A CFG- file.y
Output: A parser y.tab.c (yacc)
- The
     output file "file.output" contains the parsing tables.
- The
     file "file.tab.h" contains declarations.
- The
     parser called the yyparse ().
- Parser
     expects to use a function called yylex () to get tokens.
The basic operational sequence is as follows:

This file contains the desired grammar in YACC format.

It shows the YACC program.

It is the c source program created by YACC.

C Compiler

Executable file that will parse grammar given in gram.Y


 
 
Comments
Post a Comment