This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Optimize nonoptimizing compilation IV
> Jan Hubicka wrote:-
>
> > Hi,
> > this patch avoids optimize_mode_switching when no instructions needing
> > it are present in the insn stream. It saves 7% out of insn-attrtab
> > build time at -O0. Together with the tidy_fallthru_edges we are already
> > faster than gcc-3.0 (and with this patch only we are faster than
> > gcc-3.2)
>
> Cool. Is CPP beginning to be a significant part of the time at -O0?
> We should be able to get it to at least 30% I expect.
In fact I am comparing on a preprocessed file, so cpp times are not taken
into account. One moment... In my tree the figures look like this:
cfg construction : 0.26 ( 7%) usr 0.09 (33%) sys 0.35 ( 9%) wall
cfg cleanup : 0.19 ( 5%) usr 0.00 ( 0%) sys 0.19 ( 5%) wall
trivially dead code : 0.03 ( 1%) usr 0.00 ( 0%) sys 0.03 ( 1%) wall
life analysis : 0.20 ( 5%) usr 0.00 ( 0%) sys 0.20 ( 5%) wall
life info update : 0.08 ( 2%) usr 0.00 ( 0%) sys 0.08 ( 2%) wall
register scan : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
rebuild jump labels : 0.07 ( 2%) usr 0.00 ( 0%) sys 0.07 ( 2%) wall
preprocessing : 0.19 ( 5%) usr 0.02 ( 7%) sys 0.21 ( 5%) wall
lexical analysis : 0.17 ( 5%) usr 0.06 (22%) sys 0.23 ( 6%) wall
parser : 0.26 ( 7%) usr 0.02 ( 7%) sys 0.28 ( 7%) wall
expand : 0.23 ( 6%) usr 0.02 ( 7%) sys 0.25 ( 6%) wall
varconst : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
integration : 0.03 ( 1%) usr 0.00 ( 0%) sys 0.03 ( 1%) wall
jump : 0.03 ( 1%) usr 0.00 ( 0%) sys 0.03 ( 1%) wall
flow analysis : 0.04 ( 1%) usr 0.00 ( 0%) sys 0.04 ( 1%) wall
mode switching : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
local alloc : 0.54 (15%) usr 0.00 ( 0%) sys 0.54 (14%) wall
global alloc : 0.60 (16%) usr 0.02 ( 7%) sys 0.63 (16%) wall
flow 2 : 0.08 ( 2%) usr 0.02 ( 7%) sys 0.10 ( 3%) wall
shorten branches : 0.10 ( 3%) usr 0.00 ( 0%) sys 0.10 ( 3%) wall
reg stack : 0.05 ( 1%) usr 0.00 ( 0%) sys 0.05 ( 1%) wall
final : 0.22 ( 6%) usr 0.02 ( 7%) sys 0.24 ( 6%) wall
rest of compilation : 0.25 ( 7%) usr 0.00 ( 0%) sys 0.26 ( 7%) wall
TOTAL : 3.66 0.27 3.95
We are not quite there - you made CPP way too fast :) 2.95 does it in
2.1s, so there is still a way to go, but originally we needed 5:38 seconds.
From the dumps, cfg cleanup and construction take about the same time.
The register allocator takes only 0.48 sec on 2.95, as it uses the stupid
allocator without liveness. Shorten branches is not executed in 2.95, and
it is useless for 3.0 too, as it is needed only to emit the loop instruction
when optimizing for K6. Perhaps we should add a hook to disable it
otherwise.
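
To put the register allocator remark in numbers, here is a minimal sketch
that adds up the two allocator phases from the report above and compares
them with the 0.48 sec quoted for 2.95. The usr figures are copied from the
table; treating the 2.95 figure as comparable to the usr column is an
assumption, and the function name is made up for illustration.

```python
# usr seconds copied from the insn-attrtab report above (patched tree).
TREE_REGALLOC_USR = {"local alloc": 0.54, "global alloc": 0.60}
TREE_TOTAL_USR = 3.66

# Figure quoted in the text for 2.95. Assumption: it is comparable to the
# usr column of the report; the text does not say which column it is.
GCC295_REGALLOC = 0.48


def regalloc_summary(phases, total, baseline):
    """Return (combined seconds, share of total, ratio against baseline)."""
    combined = sum(phases.values())
    return combined, combined / total, combined / baseline


combined, share, ratio = regalloc_summary(
    TREE_REGALLOC_USR, TREE_TOTAL_USR, GCC295_REGALLOC
)
print(f"register allocation: {combined:.2f}s usr, {share:.0%} of total")
print(f"ratio against 2.95:  {ratio:.2f}x")
```

Under these assumptions the two allocator phases together account for
roughly a third of the usr time in the patched tree.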
3.2 timings are:
garbage collection : 0.29 ( 6%) usr 0.00 ( 0%) sys 0.30 ( 6%) wall
cfg construction : 0.17 ( 4%) usr 0.00 ( 0%) sys 0.16 ( 3%) wall
cfg cleanup : 0.74 (16%) usr 0.00 ( 0%) sys 0.75 (15%) wall
life analysis : 0.27 ( 6%) usr 0.00 ( 0%) sys 0.27 ( 5%) wall
life info update : 0.14 ( 3%) usr 0.00 ( 0%) sys 0.14 ( 3%) wall
preprocessing : 0.16 ( 3%) usr 0.03 (18%) sys 0.17 ( 4%) wall
lexical analysis : 0.19 ( 4%) usr 0.04 (24%) sys 0.27 ( 5%) wall
parser : 0.28 ( 6%) usr 0.07 (41%) sys 0.30 ( 6%) wall
expand : 0.20 ( 4%) usr 0.01 ( 6%) sys 0.22 ( 4%) wall
varconst : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
integration : 0.08 ( 2%) usr 0.00 ( 0%) sys 0.11 ( 2%) wall
jump : 0.10 ( 2%) usr 0.00 ( 0%) sys 0.09 ( 2%) wall
flow analysis : 0.07 ( 1%) usr 0.00 ( 0%) sys 0.06 ( 1%) wall
mode switching : 0.28 ( 6%) usr 0.00 ( 0%) sys 0.27 ( 6%) wall
local alloc : 0.30 ( 6%) usr 0.00 ( 0%) sys 0.30 ( 6%) wall
global alloc : 0.76 (16%) usr 0.00 ( 0%) sys 0.77 (16%) wall
flow 2 : 0.03 ( 1%) usr 0.00 ( 0%) sys 0.04 ( 1%) wall
shorten branches : 0.11 ( 2%) usr 0.00 ( 0%) sys 0.10 ( 2%) wall
reg stack : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
final : 0.17 ( 4%) usr 0.02 (12%) sys 0.17 ( 4%) wall
rest of compilation : 0.34 ( 7%) usr 0.00 ( 0%) sys 0.34 ( 7%) wall
TOTAL : 4.71 0.17 4.88
The rest is accounted as "parser" in 2.95 (about 0.61 sec).
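
The totals for this testcase can be compared with a few lines of
arithmetic. This is only a sketch: the wall totals come from the two reports
above, the 2.1s figure for 2.95 is assumed to be wall time as well, and
percent_change is a hypothetical helper name.

```python
# Totals for the insn-attrtab testcase, in seconds, taken from the text.
# Assumption: the 2.1s quoted for 2.95 is wall time like the other two.
WALL = {"patched tree": 3.95, "gcc 3.2": 4.88, "gcc 2.95": 2.1}


def percent_change(new, old):
    """Relative change of new against old; negative means new is faster."""
    return (new - old) / old * 100.0


vs_32 = percent_change(WALL["patched tree"], WALL["gcc 3.2"])
vs_295 = percent_change(WALL["patched tree"], WALL["gcc 2.95"])
print(f"patched tree vs gcc 3.2:  {vs_32:+.1f}%")
print(f"patched tree vs gcc 2.95: {vs_295:+.1f}%")
```

A negative value means the patched tree needs less time than the compiler it
is compared with; a positive value means it still needs more.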
For a saner testcase (combine.c):
cfg construction : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall
cfg cleanup : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall
trivially dead code : 0.02 ( 1%) usr 0.00 ( 0%) sys 0.02 ( 1%) wall
life analysis : 0.11 ( 7%) usr 0.00 ( 0%) sys 0.11 ( 6%) wall
life info update : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall
register scan : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall
rebuild jump labels : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall
preprocessing : 0.14 ( 8%) usr 0.04 (40%) sys 0.18 (10%) wall
lexical analysis : 0.09 ( 5%) usr 0.00 ( 0%) sys 0.09 ( 5%) wall
parser : 0.11 ( 7%) usr 0.02 (20%) sys 0.13 ( 7%) wall
expand : 0.12 ( 7%) usr 0.02 (20%) sys 0.14 ( 8%) wall
integration : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall
local alloc : 0.37 (22%) usr 0.00 ( 0%) sys 0.37 (21%) wall
global alloc : 0.30 (18%) usr 0.01 (10%) sys 0.32 (18%) wall
shorten branches : 0.08 ( 5%) usr 0.00 ( 0%) sys 0.08 ( 4%) wall
reg stack : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall
final : 0.11 ( 7%) usr 0.00 ( 0%) sys 0.11 ( 6%) wall
symout : 0.00 ( 0%) usr 0.01 (10%) sys 0.01 ( 1%) wall
rest of compilation : 0.15 ( 9%) usr 0.00 ( 0%) sys 0.15 ( 8%) wall
TOTAL : 1.67 0.10 1.78
And 2.95 requires 1.13 seconds.
We are already faster than 3.2:
garbage collection : 0.13 ( 7%) usr 0.00 ( 0%) sys 0.12 ( 7%) wall
cfg construction : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
life analysis : 0.07 ( 4%) usr 0.00 ( 0%) sys 0.06 ( 3%) wall
life info update : 0.04 ( 2%) usr 0.00 ( 0%) sys 0.04 ( 2%) wall
preprocessing : 0.14 ( 8%) usr 0.01 (12%) sys 0.16 ( 8%) wall
lexical analysis : 0.06 ( 3%) usr 0.01 (12%) sys 0.10 ( 5%) wall
parser : 0.17 ( 9%) usr 0.02 (25%) sys 0.19 (10%) wall
expand : 0.16 ( 9%) usr 0.00 ( 0%) sys 0.16 ( 9%) wall
integration : 0.02 ( 1%) usr 0.00 ( 0%) sys 0.02 ( 1%) wall
jump : 0.03 ( 2%) usr 0.00 ( 0%) sys 0.02 ( 1%) wall
flow analysis : 0.08 ( 4%) usr 0.00 ( 0%) sys 0.06 ( 3%) wall
mode switching : 0.03 ( 2%) usr 0.00 ( 0%) sys 0.03 ( 2%) wall
local alloc : 0.20 (11%) usr 0.00 ( 0%) sys 0.20 (11%) wall
global alloc : 0.28 (15%) usr 0.03 (38%) sys 0.30 (16%) wall
shorten branches : 0.05 ( 3%) usr 0.00 ( 0%) sys 0.06 ( 3%) wall
reg stack : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
final : 0.15 ( 8%) usr 0.01 (13%) sys 0.16 ( 8%) wall
rest of compilation : 0.20 (11%) usr 0.00 ( 0%) sys 0.20 (11%) wall
TOTAL : 1.84 0.08 1.92
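
For comparing two such reports phase by phase, a small parser may help. The
sketch below assumes the line layout shown in the reports above
("name : usr (p%) usr sys (p%) sys wall (p%) wall") and uses a handful of
lines copied from the two combine.c reports. A phase missing from one report
is counted as 0.00s, which is an assumption. The names parse_report and
wall_deltas are made up for illustration.

```python
import re

# One report line: phase name, then usr/sys/wall seconds with percentages.
LINE = re.compile(
    r"^\s*(?P<name>[^:]+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*wall\s*$"
)

# A few lines copied from the combine.c reports above.
TREE = """\
preprocessing : 0.14 ( 8%) usr 0.04 (40%) sys 0.18 (10%) wall
parser : 0.11 ( 7%) usr 0.02 (20%) sys 0.13 ( 7%) wall
local alloc : 0.37 (22%) usr 0.00 ( 0%) sys 0.37 (21%) wall
global alloc : 0.30 (18%) usr 0.01 (10%) sys 0.32 (18%) wall
TOTAL : 1.67 0.10 1.78
"""
GCC32 = """\
garbage collection : 0.13 ( 7%) usr 0.00 ( 0%) sys 0.12 ( 7%) wall
preprocessing : 0.14 ( 8%) usr 0.01 (12%) sys 0.16 ( 8%) wall
parser : 0.17 ( 9%) usr 0.02 (25%) sys 0.19 (10%) wall
mode switching : 0.03 ( 2%) usr 0.00 ( 0%) sys 0.03 ( 2%) wall
local alloc : 0.20 (11%) usr 0.00 ( 0%) sys 0.20 (11%) wall
global alloc : 0.28 (15%) usr 0.03 (38%) sys 0.30 (16%) wall
TOTAL : 1.84 0.08 1.92
"""


def parse_report(text):
    """Map phase name -> wall seconds; non-matching lines (TOTAL) are skipped."""
    phases = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            phases[match.group("name")] = float(match.group("wall"))
    return phases


def wall_deltas(new, old):
    """Wall-time difference per phase; a missing phase counts as 0.0 (assumption)."""
    names = sorted(set(new) | set(old))
    return {n: round(new.get(n, 0.0) - old.get(n, 0.0), 2) for n in names}


deltas = wall_deltas(parse_report(TREE), parse_report(GCC32))
for name, delta in sorted(deltas.items(), key=lambda item: item[1]):
    print(f"{name:20s} {delta:+.2f}s")
```

Negative values are phases where the patched tree is faster than 3.2;
positive values are phases where it is slower.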
>
> Neil.