UNIT-3 Syntax directed translation

UNIT-3 Syntax directed translationIn syntax directed translation, along with the grammar we associate some informal notations and these notations are called as semantic rules.So we can say thatGrammar + semantic rule = SDT (syntax directed translation)o In syntax directed translation, every non-terminal can get one or more than one attribute or someti0 attribute depending on the type of the attribute. The value of these attributes is evaluated by semantic rules associated with the production rule.o In the semantic rule, attribute is VAL and an attribute may hold anything like a string, a numbermemory location and a complex recordExample	1.	esthe, at is
E.val is one of the attributes of E.num.lexval is the attribute returned by the lexical analyzer.Syntax directed translationIn syntax directed translation, along with the grammar we associate some informal notations and these notations are called as semantic rules.So we can say thatGrammar + semantic rule = SDT (syntax directed translation)o In syntax directed translation, every non-terminal can get one or more than one attribute or someti0 attribute depending on the type of the attribute. The value of these attributes is evaluated by semantic rules associated with the production rule.o In the semantic rule, attribute is VAL and an attribute may hold anything like a string, a numbememory location and a complex record	1.	es ther, a

o In Syntax directed translation, whenever a construct encounters in the programming language then itranslated according to the semantic rules define in that particular programming language.

Production	Semantic Rules
E → E + T	E.val := E.val + T.val
E → T	E.val := T.val
T → T * F	T.val := T.val + F.val
T → F	T.val := F.val
F → (F)	F.val := F.val
F → num	F.val := num.lexval

o In Syntax directed translation, whenever a construct encounters in the programming language then itranslated according to the semantic rules define in that particular programming language.Example	t is
E.val is one of the attributes of E.num.lexval is the attribute returned by the lexical analyzer.Implementation of Syntax directed translationSyntax direct translation is implemented by constructing a parse tree and performing the actions in a left right depth first order.SDT is implementing by parse the input and produce a parse tree as a result.Example	to

Production	Semantic Rules
E → E + T	E.val := E.val + T.val
E → T	E.val := T.val
T → T * F	T.val := T.val + F.val
T → F	T.val := F.val
F → (F)	F.val := F.val
F → num	F.val := num.lexval

Production	Semantic Rules
S → E $	{ printE.VAL }
E → E + E	{E.VAL := E.VAL + E.VAL }
E → E * E	{E.VAL := E.VAL * E.VAL }
E → (E)	{E.VAL := E.VAL }
E → I	{E.VAL := I.VAL }
I → I digit	{I.VAL := 10 * I.VAL + LEXVAL }
I → digit	{ I.VAL:= LEXVAL}

Parse tree for SDT:

Fig: Parse tree

Intermediate code

Intermediate code is used to translate the source code into the machine code. Intermediate code lies between the high-level language and the machine language.

Fig: Position of intermediate code generator

o If the compiler directly translates source code into the machine code without generating intermediate

code then a full native compiler is required for each new machine.

o The intermediate code keeps the analysis portion same for all the compilers that's why it doesn't need a

full compiler for every unique machine.

o Intermediate code generator receives input from its predecessor phase and semantic analyzer phase. It

takes input in the form of an annotated syntax tree.

o Using the intermediate code, the second phase of the compiler synthesis phase is changed according to

the target machine.

Intermediate representationIntermediate code can be represented in two ways:1. High Level intermediate code:High level intermediate code can be represented as source code. To enhance performance of source code, can easily apply code modification. But to optimize the target machine, it is less preferred.2. Low Level intermediate codeLow level intermediate code is close to the target machine, which makes it suitable for register and memallocation etc. it is used for machine-dependent optimizations.Postfix Notationo Postfix notation is the useful form of intermediate code if the given language is expressions.o Postfix notation is also called as 'suffix notation' and 'reverse polish'.o Postfix notation is a linear representation of a syntax tree.o In the postfix notation, any expression can be written unambiguously without parentheses.o The ordinary (infix) way of writing the sum of x and y is with operator in the middle: x * y. But in postfix notation, we place the operator at the right end as xy *.o In postfix notation, the operator follows the operand.ExampleProductionE → E1 op E2E → (E1)E → id	we ory the	1. 2. 3.
Parse tree and Syntax treeWhen you create a parse tree then it contains more details than actually needed. So, it is very difficult compiler to parse the parse tree. Take the following parse tree as an example:o In the parse tree, most of the leaf nodes are single child to their parent nodes.o In the syntax tree, we can eliminate this extra information.	to

Semantic Rule	Program fragment
E.code = E1.code \|\| E2.code \|\| op	print op
E.code = E1.code
E.code = id	print id

o Syntax tree is a variant of parse tree. In the syntax tree, interior nodes are operators and leaves are

operands.

o Syntax tree is usually used when represent a program in a tree structure.

A sentence id + id * id would have the following syntax tree:

Abstract syntax tree can be represented as:

Abstract syntax trees are important data structures in a compiler. It contains the least unnecessary information. Abstract syntax trees are more compact than a parse tree and can be easily used by a compiler.

Three address code

o Three-address code is an intermediate code. It is used by the optimizing compilers.

o In three-address code, the given expression is broken down into several separate instructions. These

instructions can easily translate into assembly language.

o Each Three address code instruction has at most three operands. It is a combination of assignment and

a binary operator.

Example

GivenExpression:

1. a := (-c * b) + (-c * d)

Three-address code is as follows: t1 := -c

t2 := b*t1

t3 := -c t4 := d * t3 t5 := t2 + t4 a := t5t is used as registers in the target program.The three address code can be represented in two forms: quadruples and triples.QuadruplesThe quadruples have four fields to implement the three address code. The field of quadruples contains name of the operator, the first source operand, the second source operand and the result respectively.Fig: Quadruples fieldExamplea := -b * c + dThree-address code is as follows: t1 := -b t2 := c + dt3 := t1 * t2a := t3These statements are represented by quadruples as follows:	the	1.
TriplesThe triples have three fields to implement the three address code. The field of triples contains the name of operator, the first source operand and the second source operand.	the

Operator	Source 1	Source 2	Destination
(0)	uminus	b	-	t1
(1)	+	c	d	t2
(2)	*	t1	t2	t3
(3)	:=	t3	-	a

In triples, the results of respective sub-expressions are denoted by the position of expression. Triple equivalent to DAG while representing expressions.Fig: Triples fieldExample:a := -b * c + dThree address code is as follows: t1 := -b t2 := c + dM t3 := t1 * t2 a := t3These statements are represented by triples as follows:	is	1.
Translation of Assignment StatementsIn the syntax directed translation, assignment statement is mainly deals with expressions. The expression can be of type real, integer, array and records.Consider the grammarS → id := EE → E1 + E2E → E1 * E2E → (E1)E → idThe translation scheme of above grammar is given below:	1. 2. 3. 4. 5.

Operator	Source 1	Source 2
(0)	uminus	b	-
(1)	+	c	d
(2)	*	(0)	(1)
(3)	:=	(2)	-

Production rule	Semantic actions
S → id :=E	{p = look_up(id.name); If p ≠ nil then Emit (p = E.place) Else Error; }
E → E1 + E2	{E.place = newtemp(); Emit (E.place = E1.place '+' E2.place) }
E → E1 * E2	{E.place = newtemp(); Emit (E.place = E1.place '*' E2.place) }
E → (E1)	{E.place = E1.place}
E → id	{p = look_up(id.name); If p ≠ nil then Emit (p = E.place) Else Error; }

o The p returns the entry for id.name in the symbol table.

o The Emit function is used for appending the three address code to the output file.

o Otherwise it will report an error.

o The newtemp() is a function used to generate new temporary variables.

o E.place holds the value of E.

Boolean expressions

Boolean expressions have two primary purposes. They are used for computing the logical values. They are also used as conditional expression using if-then-else or while-do.

Consider the grammar

Production rule	Semantic actions
E → E1 OR E2	{E.place = newtemp();
Emit (E.place ':=' E1.place 'OR' E2.place)
}
E → E1 + E2	{E.place = newtemp();
Emit (E.place ':=' E1.place 'AND' E2.place)
}
E → NOT E1	{E.place = newtemp();
Emit (E.place ':=' 'NOT' E1.place)
}
E → (E1)	{E.place = E1.place}
E → id relop id2	{E.place = newtemp();
Emit ('if' id1.place relop.op id2.place 'goto'
nextstar + 3);
EMIT (E.place ':=' '0')
EMIT ('goto' nextstat + 2)
EMIT (E.place ':=' '1')
}
E → TRUE	{E.place := newtemp();
Emit (E.place ':=' '1')
}
E → FALSE	{E.place := newtemp();
Emit (E.place ':=' '0')
}

1. E → E OR E

2. E → E AND E

3. E → NOT E

4. E → (E)

5. E → id relop id

6. E → TRUE

7. E → FALSE

The relop is denoted by <, >, <, >.

The AND and OR are left associated. NOT has the higher precedence then AND and lastly OR.

The EMIT function is used to generate the three address code and the newtemp( ) function is used to generate the temporary variables.

The E → id relop id2 contains the next_state and it gives the index of next three address statements in the output sequence.

Here is the example which generates the three address code using the above translation scheme:

1. p>q AND r<s OR u>r

2. 100: if p>q goto 103

3. 101: t1:=0

4. 102: goto 104

5. 103: t1:=1

6. 104: if r>s goto 107

7. 105: t2:=0

8. 106: goto 108

9. 107: t2:=1

10. 108: if u>v goto 111

11. 109: t3:=0

12. 110: goto 112

13. 111: t3:= 1

14. 112: t4:= t1 AND t2

15. 113: t5:= t4 OR t3

Statements that alter the flow of control

The goto statement alters the flow of control. If we implement goto statements then we need to define a LABEL for a statement. A production can be added for this purpose:

1. S → LABEL : S

2. LABEL → id

In this production system, semantic action is attached to record the LABEL and its value in the symbol table.

Following grammar used to incorporate structure flow-of-control constructs

1. S → if E then S

234567123456789

. S → if E then S else S. S → while E do S. S → begin L end. S→ A. L→ L ; S. L → SHere, S is a statement, L is a statement-list, A is an assignment statement and E is a Boolean-valexpression.Translation scheme for statement that alters flow of controlo We introduce the marker non-terminal M as in case of grammar for Boolean expression.o This M is put before statement in both if then else. In case of while-do, we need to put M before E we need to come back to it after executing S.o In case of if-then-else, if we evaluate E to be true, first S will be executed.o After this we should ensure that instead of second S, the code after the if-then else will be executThen we place another non-terminal marker N after first S.The grammar is as follows:. S → if E then M S. S → if E then M S else M S. S → while M E do M S. S → begin L end. S → A. L→ L ; M SThe translation scheme for this grammar is as follows:

ued ased.

. L → S

. M → ∈

. N → ∈

Production rule	Semantic actions
S → if E then M S1	BACKPATCH (E.TRUE, M.QUAD) S.NEXT = MERGE (E.FALSE, S1.NEXT)

S → if E then M1 S1 else M2 S2	BACKPATCH (E.TRUE, M1.QUAD) BACKPATCH (E.FALSE, M2.QUAD) S.NEXT = MERGE (S1.NEXT, N.NEXT, S2.NEXT)
S → while M1 E do M2 S1	BACKPATCH (S1,NEXT, M1.QUAD) BACKPATCH (E.TRUE, M2.QUAD) S.NEXT = E.FALSE GEN (goto M1.QUAD)
S → begin L end	S.NEXT = L.NEXT
S → A	S.NEXT = MAKELIST ()
L → L ; M S	BACKPATHCH (L1.NEXT, M.QUAD) L.NEXT = S.NEXT
L → S	L.NEXT = S.NEXT
M → ∈	M.QUAD = NEXTQUAD
N→ ∈	N.NEXT = MAKELIST (NEXTQUAD) GEN (goto_)

Postfix TranslationIn a production A → α, the translation rule of A.CODE consists of the concatenation of the COtranslations of the non-terminals in α in the same order as the non-terminals appear in α.Production can be factored to achieve postfix form.Postfix translation of while statementThe production. S → while M1 E do M2 S1Can be factored as:. W → while

1123

. S → C S1

. C → W E do

A suitable transition scheme would be
Postfix translation of for statementThe production. S for L = E1 step E2 to E3 do S1Can be factored a. F → for L. T → F = E1 by E2 to E3 do. S → T S1Array references in arithmetic expressionsElements of arrays can be accessed quickly if the elements are stored in a block of consecutive location. Arcan be one dimensional or two dimensional.For one dimensional array:. A: array[low..high] of the ith elements is at:. base + (i-low)width → iwidth + (base - lowwidth)Multi-dimensional arrays:Row major or column major formso Row major: a[1,1], a[1,2], a[1,3], a[2,1], a[2,2], a[2,3]o Column major: a[1,1], a[2,1], a[1, 2], a[2, 2],a[1, 3],a[2,3] o In raw major form, the address of a[i1, i2] is o Base+((i1-low1)(high2-low2+1)+i2-low2)*widthTranslation scheme for array elementsLimit(array, j) returns nj=highj-lowj+1 place: the temporaryor variables	112312	ray

Production Rule	Semantic Action
W → while	W.QUAD = NEXTQUAD
C → W E do	C W E do
S→ C S1	BACKPATCH (S1.NEXT, S.NEXT = GEN (goto C.QUAD)

offset: offset from the base, null if not an array referenceThe production:. S → L := E. E → E+E. E → (E). E → L. L → Elist ]. L → id. Elist → Elist, E

12345678

. Elist → id[E

A suitable transition scheme for array elements would be:

Production Rule	Semantic Action
S → L := E	{if L.offset = null then emit(L.place ':=' E.place) else EMIT (L.place'['L.offset ']' ':=' E.place); }
E → E+E	{E.place := newtemp; EMIT (E.place ':=' E1.place '+' E2.place); }
E → (E)	{E.place := E1.place;}
E → L	{if L.offset = null then E.place = L.place else {E.place = newtemp; EMIT (E.place ':=' L.place '[' L.offset ']'); } }
L → Elist ]	{L.place = newtemp; L.offset = newtemp; EMIT (L.place ':=' c(Elist.array)); EMIT (L.offset ':=' Elist.place '*' width(Elist.array); }
L → id	{L.place = lookup(id.name); L.offset = null;

}
Elist → Elist, E	{t := newtemp; m := Elist1.ndim + 1; EMIT (t ':=' Elist1.place '*' limit(Elist1.array, m)); EMIT (t, ':=' t '+' E.place); Elist.array = Elist1.array; Elist.place := t; Elist.ndim := m; }
Elist → id[E	{Elist.array := lookup(id.name); Elist.place := E.place Elist.ndim := 1; }

Where:ndim denotes the number of dimensions.limit(array, i) function returns the upper limit along with the dimension of array width(array) returns the number of byte for one element of array.Procedures callProcedure is an important and frequently used programming construct for a compiler. It is used to genergood code for procedure calls and returns.Calling sequence:The translation for a call includes a sequence of actions taken on entry and exit from each procedFollowing actions take place in a calling sequence:o When a procedure call occurs then space is allocated for activation record.o Evaluate the argument of the called procedure.o Establish the environment pointers to enable the called procedure to access data in enclosing blocks.o Save the state of the calling procedure so that it can resume execution after the call.o Also save the return address. It is the address of the location to which the called routine must transafter it is finished.o Finally generate a jump to the beginning of the code for the called procedure.Let us consider a grammar for a simple procedure call statement. S → call id(Elist). Elist → Elist, E. Elist → E

ate re. fer

123

A suitable transition scheme for procedure call would be:
Queue is used to store the list of parameters in the procedure call.DeclarationsWhen we encounter declarations, we need to lay out storage for the declared variables. For every local name in a procedure, we create a ST(Symbol Table) entry containing:1. The type of the name2. How much storage the name requiresThe production:. D → integer, id. D → real, id	123
ENTER is used to make the entry into symbol table and ATTR is used to trace the data type.Case StatementsSwitch and case statement is available in a variety of languages. The syntax of case statement is as follows:

Production Rule	Semantic Action
S → call id(Elist)	for each item p on QUEUE do GEN (param p) GEN (call id.PLACE)
Elist → Elist, E	append E.PLACE to the end of QUEUE
Elist → E	initialize QUEUE to contain only E.PLACE

. D → D1, id

A suitable transition scheme for declarations would be:

Production rule	Semantic action
D → integer, id	ENTER (id.PLACE, integer) D.ATTR = integer
D → real, id	ENTER (id.PLACE, real) D.ATTR = real
D → D1, id	ENTER (id.PLACE, D1.ATTR) D.ATTR = D1.ATTR

1. switch E

2. begin

3. case V1: S1

4. case V2: S2 .

5. case Vn-1: Sn-1

6. default: Sn

7.	endThe translation scheme for this shown below:

Code to evaluate E into T

1. goto TEST

2. L1: code for S1

3. goto NEXT

4. L2: code for S2

5. goto NEXT

6. .Ln-1: code for Sn-1

7. goto NEXT

8. Ln: code for Sn

9. goto NEXT

10. TEST: if T = V1 goto L1

11. if T = V2goto L2.

12. if T = Vn-1 goto Ln-1

13. goto

14. NEXT:

o When switch keyword is seen then a new temporary T and two new labels test and next are generated.

o When the case keyword occurs then for each case keyword, a new label Li is created and entered into

the symbol table. The value of Vi of each case constant and a pointer to this symbol-table entry are

placed on a stack.

Search This Blog

ENGINEERING NOTES

UNIT-3 Syntax directed translation

Comments

Post a Comment

Popular posts from this blog

Compiler Design UNIT-1

COA- Unit -5 Peripheral DeviceS

COA-UNIT-3 Control Unit