SDCC performs a host of standard optimizations in addition to some MCU specific optimizations.
The compiler does local and global common subexpression elimination,
e.g.:
i = x + y + 1;
j = x + y;
will be translated to
iTemp = x + y
i = iTemp + 1
j = iTemp
Some subexpressions are not as obvious as the above example, e.g.:
a->b[i].c = 10;
a->b[i].d = 11;
In this case the address arithmetic a->b[i] will be computed only
once; the equivalent code in C would be.
iTemp = a->b[i];
iTemp.c = 10;
iTemp.d = 11;
The compiler will try to keep these temporary variables in registers.
int global;
void f () {
int i;
i = 1; /* dead store */
global = 1; /* dead store */
global = 2;
return;
global = 3; /* unreachable */
}
will be changed to
int global; void f ()
{
global = 2;
return;
}
int f() {
int i, j;
i = 10;
j = i;
return j;
}
will be changed to
int f() {
int i,j;
i = 10;
j = 10;
return 10;
}
Note: the dead stores created by this copy propagation will be eliminated
by dead-code elimination.
Two types of loop optimizations are done by SDCC loop invariant lifting
and strength reduction of loop induction variables. In addition to
the strength reduction the optimizer marks the induction variables
and the register allocator tries to keep the induction variables in
registers for the duration of the loop. Because of this preference
of the register allocator, loop induction optimization causes an increase
in register pressure, which may cause unwanted spilling of other temporary
variables into the stack / data space. The compiler will generate
a warning message when it is forced to allocate extra space either
on the stack or data space. If this extra space allocation is undesirable
then induction optimization can be eliminated either for the entire
source file (with -noinduction option) or for a given function only
using #pragma NOINDUCTION.
Loop Invariant:
for (i = 0 ; i < 100 ; i ++)
f += k + l;
changed to
itemp = k + l;
for (i = 0; i < 100; i++)
f += itemp;
As mentioned previously some loop invariants are not as apparent,
all static address computations are also moved out of the loop.
Strength Reduction, this optimization substitutes an expression by
a cheaper expression:
for (i=0;i < 100; i++)
ar[i*5] = i*3;
changed to
itemp1 = 0;
itemp2 = 0;
for (i=0;i< 100;i++) {
ar[itemp1] = itemp2;
itemp1 += 5;
itemp2 += 3;
}
The more expensive multiplication is changed to a less expensive addition.
This optimization is done to reduce the overhead of checking loop boundaries for every iteration. Some simple loops can be reversed and implemented using a ``decrement and jump if not zero'' instruction. SDCC checks for the following criterion to determine if a loop is reversible (note: more sophisticated compilers use data-dependency analysis to make this determination, SDCC uses a more simple minded analysis).
SDCC does numerous algebraic simplifications, the following is a small
sub-set of these optimizations.
i = j + 0 ; /* changed to */ i = j;
i /= 2; /* changed to */ i >>= 1;
i = j - j ; /* changed to */ i = 0;
i = j / 1 ; /* changed to */ i = j;
Note the subexpressions given above are generally introduced by macro
expansions or as a result of copy/constant propagation.
SDCC changes switch statements to jump tables when the following conditions are true.
Bit shifting is one of the most frequently used operation in embedded
programming. SDCC tries to implement bit-shift operations in the most
efficient way possible, e.g.:
unsigned char i;
...
i>>= 4;
...
generates the following code:
mov a,_i
swap a
anl a,#0x0f
mov _i,a
In general SDCC will never setup a loop if the shift count is known.
Another example:
unsigned int i;
...
i >>= 9;
...
will generate:
mov a,(_i + 1)
mov (_i + 1),#0x00
clr c
rrc a
mov _i,a
Note that SDCC stores numbers in little-endian format (i.e. lowest
order first).
A special case of the bit-shift operation is bit rotation, SDCC recognizes
the following expression to be a left bit-rotation:
unsigned char i;
...
i = ((i << 1) | (i >>
7));
...
will generate the following code:
mov a,_i
rl a
mov _i,a
SDCC uses pattern matching on the parse tree to determine this operation.Variations
of this case will also be recognized as bit-rotation, i.e.:
i = ((i >> 7) | (i <<
1)); /* left-bit rotation */
It is frequently required to obtain the highest order bit of an integral
type (long, int, short or char types). SDCC recognizes the following
expression to yield the highest order bit and generates optimized
code for it, e.g.:
unsigned int gint;
foo () {
unsigned char hob;
...
hob = (gint >> 15) & 1;
..
}
will generate the following code:
61
; hob.c 7
000A E5*01 62
mov a,(_gint + 1)
000C 33 63
rlc a
000D E4 64
clr a
000E 13 65
rrc a
000F F5*02 66
mov _foo_hob_1_1,a
Variations of this case however will not be recognized. It
is a standard C expression, so I heartily recommend this be the only
way to get the highest order bit, (it is portable). Of course it will
be recognized even if it is embedded in other expressions, e.g.:
xyz = gint + ((gint >> 15) & 1);
will still be recognized.
The compiler uses a rule based, pattern matching and re-writing mechanism
for peep-hole optimization. It is inspired by copt a peep-hole
optimizer by Christopher W. Fraser (cwfraser@microsoft.com). A default
set of rules are compiled into the compiler, additional rules may
be added with the -peep-file <filename> option. The rule language
is best illustrated with examples.
replace {
mov %1,a
mov a,%1
} by {
mov %1,a
}
The above rule will change the following assembly sequence:
mov r1,a
mov a,r1
to
mov r1,a
Note: All occurrences of a %n (pattern variable) must denote
the same string. With the above rule, the assembly sequence:
mov r1,a
mov a,r2
will remain unmodified.
Other special case optimizations may be added by the user (via -peep-file
option). E.g. some variants of the 8051 MCU allow only ajmp
and acall. The following two rules will change all ljmp
and lcall to ajmp and acall
replace { lcall %1 } by { acall %1 }
replace { ljmp %1 } by { ajmp %1 }
The inline-assembler code is also passed through the peep hole
optimizer, thus the peephole optimizer can also be used as an assembly
level macro expander. The rules themselves are MCU dependent whereas
the rule language infra-structure is MCU independent. Peephole optimization
rules for other MCU can be easily programmed using the rule language.
The syntax for a rule is as follows:
rule := replace [ restart ] '{' <assembly sequence> '\n'
'}' by '{' '\n'
<assembly
sequence> '\n'
'}' [if <functionName>
] '\n'
<assembly sequence> := assembly instruction (each instruction including
labels must be on a separate line).
The optimizer will apply to the rules one by one from the top in the
sequence of their appearance, it will terminate when all rules are
exhausted. If the 'restart' option is specified, then the optimizer
will start matching the rules again from the top, this option for
a rule is expensive (performance), it is intended to be used in situations
where a transformation will trigger the same rule again. A good example
of this the following rule:
replace restart {
pop %1
push %1 } by {
; nop
}
Note that the replace pattern cannot be a blank, but can be a comment
line. Without the 'restart' option only the inner most 'pop' 'push'
pair would be eliminated, i.e.:
pop ar1
pop ar2
push ar2
push ar1
would result in:
pop ar1
; nop
push ar1
with the restart option the rule will be applied again to the
resulting code and then all the pop-push pairs will be eliminated
to yield:
; nop
; nop
A conditional function can be attached to a rule. Attaching rules
are somewhat more involved, let me illustrate this with an example.
replace {
ljmp %5
%2:
} by {
sjmp %5
%2:
} if labelInRange
The optimizer does a look-up of a function name table defined in function
callFuncByName in the source file SDCCpeeph.c, with the name
labelInRange. If it finds a corresponding entry the function
is called. Note there can be no parameters specified for these functions,
in this case the use of %5 is crucial, since the function
labelInRange expects to find the label in that particular variable
(the hash table containing the variable bindings is passed as a parameter).
If you want to code more such functions, take a close look at the
function labelInRange and the calling mechanism in source file SDCCpeeph.c.
I know this whole thing is a little kludgey, but maybe some day we
will have some better means. If you are looking at this file, you
will also see the default rules that are compiled into the compiler,
you can add your own rules in the default set there if you get tired
of specifying the -peep-file option.