This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Optimize nonoptimizing compilation IV


> Jan Hubicka wrote:-
> 
> > Hi,
> > this patch avoids optimize_mode_switching when no instructions needing
> > it are present in the insn stream.  It saves 7% of the insn-attrtab
> > build time at -O0.  Together with the tidy_fallthru_edges change we are
> > already faster than gcc-3.0 (and with this patch alone we are faster
> > than gcc-3.2)
> 
> Cool.  Is CPP beginning to be a significant part of the time at -O0?
> We should be able to get it to at least 30% I expect.
In fact I am comparing on a preprocessed file, so cpp times are not taken
into account.  One moment... In my tree the figures look like this.

 cfg construction      :   0.26 ( 7%) usr   0.09 (33%) sys   0.35 ( 9%) wall
 cfg cleanup           :   0.19 ( 5%) usr   0.00 ( 0%) sys   0.19 ( 5%) wall
 trivially dead code   :   0.03 ( 1%) usr   0.00 ( 0%) sys   0.03 ( 1%) wall
 life analysis         :   0.20 ( 5%) usr   0.00 ( 0%) sys   0.20 ( 5%) wall
 life info update      :   0.08 ( 2%) usr   0.00 ( 0%) sys   0.08 ( 2%) wall
 register scan         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 rebuild jump labels   :   0.07 ( 2%) usr   0.00 ( 0%) sys   0.07 ( 2%) wall
 preprocessing         :   0.19 ( 5%) usr   0.02 ( 7%) sys   0.21 ( 5%) wall
 lexical analysis      :   0.17 ( 5%) usr   0.06 (22%) sys   0.23 ( 6%) wall
 parser                :   0.26 ( 7%) usr   0.02 ( 7%) sys   0.28 ( 7%) wall
 expand                :   0.23 ( 6%) usr   0.02 ( 7%) sys   0.25 ( 6%) wall
 varconst              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 integration           :   0.03 ( 1%) usr   0.00 ( 0%) sys   0.03 ( 1%) wall
 jump                  :   0.03 ( 1%) usr   0.00 ( 0%) sys   0.03 ( 1%) wall
 flow analysis         :   0.04 ( 1%) usr   0.00 ( 0%) sys   0.04 ( 1%) wall
 mode switching        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 local alloc           :   0.54 (15%) usr   0.00 ( 0%) sys   0.54 (14%) wall
 global alloc          :   0.60 (16%) usr   0.02 ( 7%) sys   0.63 (16%) wall
 flow 2                :   0.08 ( 2%) usr   0.02 ( 7%) sys   0.10 ( 3%) wall
 shorten branches      :   0.10 ( 3%) usr   0.00 ( 0%) sys   0.10 ( 3%) wall
 reg stack             :   0.05 ( 1%) usr   0.00 ( 0%) sys   0.05 ( 1%) wall
 final                 :   0.22 ( 6%) usr   0.02 ( 7%) sys   0.24 ( 6%) wall
 rest of compilation   :   0.25 ( 7%) usr   0.00 ( 0%) sys   0.26 ( 7%) wall
 TOTAL                 :   3.66             0.27             3.95

We are not quite there - you made CPP way too fast :) 2.95 does it in
2.1s, so there is still a way to go, but originally we needed 5:38 seconds.
From the dumps, cfg cleanup and construction need about the same time.
The register allocator takes only 0.48 sec on 2.95, as it uses the stupid
allocator without liveness analysis.  Shorten branches is not executed in
2.95, and it is useless for 3.0 too, as it is needed only to emit the loop
instruction, and only when optimizing for K6.  Perhaps we should add a hook
to disable it otherwise.

3.2 timings are:

 garbage collection    :   0.29 ( 6%) usr   0.00 ( 0%) sys   0.30 ( 6%) wall
 cfg construction      :   0.17 ( 4%) usr   0.00 ( 0%) sys   0.16 ( 3%) wall
 cfg cleanup           :   0.74 (16%) usr   0.00 ( 0%) sys   0.75 (15%) wall
 life analysis         :   0.27 ( 6%) usr   0.00 ( 0%) sys   0.27 ( 5%) wall
 life info update      :   0.14 ( 3%) usr   0.00 ( 0%) sys   0.14 ( 3%) wall
 preprocessing         :   0.16 ( 3%) usr   0.03 (18%) sys   0.17 ( 4%) wall
 lexical analysis      :   0.19 ( 4%) usr   0.04 (24%) sys   0.27 ( 5%) wall
 parser                :   0.28 ( 6%) usr   0.07 (41%) sys   0.30 ( 6%) wall
 expand                :   0.20 ( 4%) usr   0.01 ( 6%) sys   0.22 ( 4%) wall
 varconst              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 integration           :   0.08 ( 2%) usr   0.00 ( 0%) sys   0.11 ( 2%) wall
 jump                  :   0.10 ( 2%) usr   0.00 ( 0%) sys   0.09 ( 2%) wall
 flow analysis         :   0.07 ( 1%) usr   0.00 ( 0%) sys   0.06 ( 1%) wall
 mode switching        :   0.28 ( 6%) usr   0.00 ( 0%) sys   0.27 ( 6%) wall
 local alloc           :   0.30 ( 6%) usr   0.00 ( 0%) sys   0.30 ( 6%) wall
 global alloc          :   0.76 (16%) usr   0.00 ( 0%) sys   0.77 (16%) wall
 flow 2                :   0.03 ( 1%) usr   0.00 ( 0%) sys   0.04 ( 1%) wall
 shorten branches      :   0.11 ( 2%) usr   0.00 ( 0%) sys   0.10 ( 2%) wall
 reg stack             :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 final                 :   0.17 ( 4%) usr   0.02 (12%) sys   0.17 ( 4%) wall
 rest of compilation   :   0.34 ( 7%) usr   0.00 ( 0%) sys   0.34 ( 7%) wall
 TOTAL                 :   4.71             0.17             4.88
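To see where the two reports above differ, the per-phase rows can be parsed and subtracted. The following is only a minimal sketch in Python: the names ROW, parse_report and wall_deltas are made up for illustration, the regular expression assumes the row layout shown in the tables above, and only three rows from each report are copied in as sample data.

```python
import re

# One per-phase row as printed in the tables above, e.g.
#  cfg cleanup           :   0.19 ( 5%) usr   0.00 ( 0%) sys   0.19 ( 5%) wall
# The TOTAL row has no percentages, so it deliberately does not match.
ROW = re.compile(
    r"^\s*(?P<phase>.+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*wall\s*$"
)


def parse_report(text):
    """Return {phase name: wall seconds} for every per-phase row in text."""
    phases = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            phases[m.group("phase")] = float(m.group("wall"))
    return phases


def wall_deltas(old, new):
    """Return (phase, new - old) pairs, largest saving first.

    A phase missing from one report is counted as 0 seconds there.
    """
    names = set(old) | set(new)
    deltas = [(n, round(new.get(n, 0.0) - old.get(n, 0.0), 2)) for n in names]
    return sorted(deltas, key=lambda d: (d[1], d[0]))


# Three rows copied from each report above (sample data only).
GCC_3_2_ROWS = """
 cfg cleanup           :   0.74 (16%) usr   0.00 ( 0%) sys   0.75 (15%) wall
 mode switching        :   0.28 ( 6%) usr   0.00 ( 0%) sys   0.27 ( 6%) wall
 local alloc           :   0.30 ( 6%) usr   0.00 ( 0%) sys   0.30 ( 6%) wall
"""
TREE_ROWS = """
 cfg cleanup           :   0.19 ( 5%) usr   0.00 ( 0%) sys   0.19 ( 5%) wall
 mode switching        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 local alloc           :   0.54 (15%) usr   0.00 ( 0%) sys   0.54 (14%) wall
"""

if __name__ == "__main__":
    for phase, delta in wall_deltas(parse_report(GCC_3_2_ROWS),
                                    parse_report(TREE_ROWS)):
        print(f"{phase:<20} {delta:+.2f} s wall")
```

On these three rows the savings in cfg cleanup and mode switching show up as negative deltas, while local alloc shows up as a positive one.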

In 2.95 the rest is accounted as "parser" (about 0.61 sec).
For a saner testcase (combine.c):

 cfg construction      :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall
 cfg cleanup           :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall
 trivially dead code   :   0.02 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall
 life analysis         :   0.11 ( 7%) usr   0.00 ( 0%) sys   0.11 ( 6%) wall
 life info update      :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall
 register scan         :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall
 rebuild jump labels   :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall
 preprocessing         :   0.14 ( 8%) usr   0.04 (40%) sys   0.18 (10%) wall
 lexical analysis      :   0.09 ( 5%) usr   0.00 ( 0%) sys   0.09 ( 5%) wall
 parser                :   0.11 ( 7%) usr   0.02 (20%) sys   0.13 ( 7%) wall
 expand                :   0.12 ( 7%) usr   0.02 (20%) sys   0.14 ( 8%) wall
 integration           :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall
 local alloc           :   0.37 (22%) usr   0.00 ( 0%) sys   0.37 (21%) wall
 global alloc          :   0.30 (18%) usr   0.01 (10%) sys   0.32 (18%) wall
 shorten branches      :   0.08 ( 5%) usr   0.00 ( 0%) sys   0.08 ( 4%) wall
 reg stack             :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall
 final                 :   0.11 ( 7%) usr   0.00 ( 0%) sys   0.11 ( 6%) wall
 symout                :   0.00 ( 0%) usr   0.01 (10%) sys   0.01 ( 1%) wall
 rest of compilation   :   0.15 ( 9%) usr   0.00 ( 0%) sys   0.15 ( 8%) wall
 TOTAL                 :   1.67             0.10             1.78

And 2.95 requires 1.13 seconds.
We are already faster than 3.2:

 garbage collection    :   0.13 ( 7%) usr   0.00 ( 0%) sys   0.12 ( 7%) wall
 cfg construction      :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 life analysis         :   0.07 ( 4%) usr   0.00 ( 0%) sys   0.06 ( 3%) wall
 life info update      :   0.04 ( 2%) usr   0.00 ( 0%) sys   0.04 ( 2%) wall
 preprocessing         :   0.14 ( 8%) usr   0.01 (12%) sys   0.16 ( 8%) wall
 lexical analysis      :   0.06 ( 3%) usr   0.01 (12%) sys   0.10 ( 5%) wall
 parser                :   0.17 ( 9%) usr   0.02 (25%) sys   0.19 (10%) wall
 expand                :   0.16 ( 9%) usr   0.00 ( 0%) sys   0.16 ( 9%) wall
 integration           :   0.02 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall
 jump                  :   0.03 ( 2%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall
 flow analysis         :   0.08 ( 4%) usr   0.00 ( 0%) sys   0.06 ( 3%) wall
 mode switching        :   0.03 ( 2%) usr   0.00 ( 0%) sys   0.03 ( 2%) wall
 local alloc           :   0.20 (11%) usr   0.00 ( 0%) sys   0.20 (11%) wall
 global alloc          :   0.28 (15%) usr   0.03 (38%) sys   0.30 (16%) wall
 shorten branches      :   0.05 ( 3%) usr   0.00 ( 0%) sys   0.06 ( 3%) wall
 reg stack             :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 final                 :   0.15 ( 8%) usr   0.01 (13%) sys   0.16 ( 8%) wall
 rest of compilation   :   0.20 (11%) usr   0.00 ( 0%) sys   0.20 (11%) wall
 TOTAL                 :   1.84             0.08             1.92
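The "faster than 3.2" claim can be checked from the wall-clock TOTAL rows alone. A small sketch follows; the helper name percent_faster is made up for illustration, and the numbers are copied from the TOTAL rows of the four reports above.

```python
def percent_faster(baseline_wall, new_wall):
    """Wall-clock reduction of new_wall relative to baseline_wall, in percent."""
    return 100.0 * (baseline_wall - new_wall) / baseline_wall


# (3.2 wall seconds, patched-tree wall seconds) from the TOTAL rows above.
TOTALS = {
    "first testcase": (4.88, 3.95),
    "combine.c": (1.92, 1.78),
}

if __name__ == "__main__":
    for name, (gcc32, tree) in TOTALS.items():
        print(f"{name}: {percent_faster(gcc32, tree):.1f}% less wall time than 3.2")
```

That works out to roughly 19% less wall time on the first testcase and roughly 7% less on combine.c.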
> 
> Neil.

