I have a project in which I build firmware for a microcontroller, and I want finer control over the optimization flags used. Instead of using the -O<number> flag, I would like to specify the individual optimization flags myself. Unfortunately, the -O flag seems to do some optimization magic that I cannot reproduce with individual optimization flags, and I don't understand why.
Here is what I tried and what does not work:
I know I can compile the project with -O1. So I used the -Q and --help flags to list the flags that are active when -O1 is given. I used this information to specify the flags manually in my build process. Compiling works fine, but linking fails because the .bss section no longer fits into my RAM (I only have 384 kByte available).
When I increase the RAM size in my linker script, linking works, but the end of the .bss section is placed at 416 kByte and the binary image is 75 % bigger than when using -O1 directly.
When I compare the flags and parameters reported by gcc, there is no difference between the two builds but the one without -O1 is still much bigger.
According to the GCC documentation (GCC Manual), the -O flag only activates specific optimization flags, so it should be possible to do this manually as well (or not?).
Here are my gcc commands:
GCC call with single optimization flags
gcc -std=c99 -msoft-float -fno-inline -fdata-sections -ffunction-sections -Wall -Wextra \
-faggressive-loop-optimizations -fauto-inc-dec -fbranch-count-reg -fcombine-stack-adjustments \
-fcompare-elim -fcprop-registers -fdce -fdefer-pop -fdelayed-branch -fdelete-null-pointer-checks \
-fdse -fearly-inlining -ffast-math -fforward-propagate -ffp-contract=fast -ffp-int-builtin-inexact \
-ffunction-cse -fgcse-lm -fguess-branch-probability -fhandle-exceptions -fif-conversion -fif-conversion2 \
-finline-atomics -finline-functions-called-once -fipa-profile -fipa-pure-const -fipa-reference \
-fira-algorithm=CB -fira-hoist-pressure -fira-share-save-slots -fira-share-spill-slots -fivopts \
-fjump-tables -flifetime-dse -flifetime-dse=2 -fmath-errno -fmove-loop-invariants -fomit-frame-pointer \
-fpeephole -fplt -fprefetch-loop-arrays -fprintf-return-value -frename-registers -freorder-blocks \
-frtti -fsched-critical-path-heuristic -fsched-dep-count-heuristic -fsched-group-heuristic \
-fsched-interblock -fsched-last-insn-heuristic -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic \
-fsched-stalled-insns-dep -fschedule-fusion -fshort-enums -fshrink-wrap -fshrink-wrap-separate \
-fsigned-zeros -fsplit-ivs-in-unroller -fsplit-wide-types -fssa-backprop -fssa-phiopt -fstack-reuse=all \
-fstdarg-opt -fstrict-volatile-bitfields -fno-threadsafe-statics -ftrapping-math -ftree-bit-ccp \
-ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-coalesce-vars -ftree-copy-prop -ftree-cselim \
-ftree-dce -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im \
-ftree-loop-ivcanon -ftree-loop-optimize -ftree-parallelize-loops=1 -ftree-phiprop -ftree-pta \
-ftree-reassoc -ftree-scev-cprop -ftree-sink -ftree-slsr -ftree-sra -ftree-ter -fvar-tracking -fvar-tracking-assignments \
-fweb -fmerge-constants -fno-associative-math -fno-cx-limited-range -fno-exceptions -fno-finite-math-only \
-fno-reciprocal-math -fno-unsafe-math-optimizations -fexcess-precision=standard -qbsp=leon2 -DCPU_FREQ=CPU_FREQ_125MHz \
-fno-builtin-strtok -c -o timer.o timer.c
GCC with -O1
gcc -O1 -std=c99 -msoft-float -qbsp=leon2 -DCPU_FREQ=CPU_FREQ_125MHz -fno-builtin-strtok -c -o timer.o timer.c
If needed, I can also provide the gcc output showing which flags are active in either case. The only difference I found is that -fexcess-precision is set to "default" with -O1. I tried both possibilities (fast and standard), but this does not make any difference.
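For reference, a flag comparison like the one described above can be scripted roughly as follows. This is only a sketch: the helper names normalize_flags and flag_diff are made up for illustration, and it assumes the dumps were produced with gcc -Q --help=optimizers, whose exact layout may differ between GCC versions.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two `gcc -Q --help=optimizers` dumps.
# Assumed way of producing the dumps (not run here):
#   gcc -O1 -Q --help=optimizers > o1.txt
#   gcc <individual flags> -Q --help=optimizers > manual.txt
#   flag_diff o1.txt manual.txt

# Reduce a dump (read from stdin) to sorted "flag state" pairs.
# The header line and flags without a reported state are dropped.
normalize_flags() {
    awk '$1 ~ /^-/ && NF >= 2 { print $1, $2 }' | sort
}

# Print only the flags whose reported state differs between two dumps.
# Lines from the first dump are prefixed "<", lines from the second ">".
flag_diff() {
    a=$(mktemp) && b=$(mktemp) || return 1
    normalize_flags < "$1" > "$a"
    normalize_flags < "$2" > "$b"
    # grep exits non-zero when nothing differs; that is not an error here.
    diff "$a" "$b" | grep '^[<>]' || true
    rm -f "$a" "$b"
}
```

An empty result means the two dumps report the same state for every listed flag.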
Does anyone know what additional magic the -O option activates that I overlooked?
According to the GCC manual:
Most optimizations are only enabled if an -O level is set on the command line.
Otherwise they are disabled, even if individual optimization flags are specified.
so specifying optimization flags alone is not enough. For example, here you can see that a pass is enabled only if both -O and -fweb are given:
class pass_web : public rtl_opt_pass
{
  ...
  virtual bool gate (function *) { return (optimize > 0 && flag_web); }
Even specifying -O1 and selectively enabling optimizations from higher optimization levels will not work reliably, because some passes explicitly depend on the -O value. E.g. here you can see that parts of the CSE optimization are disabled at -O1:
else if (tem == 1 || optimize > 1)
  cse_cfg_altered |= cleanup_cfg (0);
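One way to check this on a given toolchain is to compare the assembly generated for two flag sets. The following is a minimal sketch, not a definitive test: same_asm is a hypothetical helper, CC is assumed to name the compiler (defaulting to gcc, so a cross compiler has to be set explicitly), and byte-identical assembly is used as a rough proxy for "the flag changed nothing".

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if two flag sets yield byte-identical assembly
# for the same source file, 1 if the assembly differs, 2 if compiling failed.
# Assumed usage (not run here):
#   same_asm timer.c "-O0" "-fweb"      && echo "-fweb alone changed nothing"
#   same_asm timer.c "-O1" "-O1 -fweb"  || echo "-fweb matters once -O1 is set"
same_asm() {
    src=$1; flags_a=$2; flags_b=$3
    out_a=$(mktemp) && out_b=$(mktemp) || return 2
    # The flag strings are deliberately unquoted so they split into words.
    if ${CC:-gcc} $flags_a -S -o "$out_a" "$src" &&
       ${CC:-gcc} $flags_b -S -o "$out_b" "$src"; then
        if cmp -s "$out_a" "$out_b"; then rc=0; else rc=1; fi
    else
        rc=2    # at least one compilation failed
    fi
    rm -f "$out_a" "$out_b"
    return "$rc"
}
```

Given the manual's wording, I would expect most individual flags to leave the assembly unchanged when no -O level is set, but the result depends on the GCC version and target, so treat any single comparison as a hint rather than proof.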