This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Time PPro tunning patch
- To: Richard Henderson <rth at cygnus dot com>
- Subject: Re: Time PPro tunning patch
- From: Jan Hubicka <hubicka at atrey dot karlin dot mff dot cuni dot cz>
- Date: Thu, 9 Mar 2000 23:55:41 +0100
- Cc: egcs-patches at egcs dot cygnus dot com
- References: <20000228150954.C18800@atrey.karlin.mff.cuni.cz> <20000308141935.A13922@cygnus.com>
> On Mon, Feb 28, 2000 at 03:09:54PM +0100, Jan Hubicka wrote:
> > i386.c uses a multiply cost of "1" on PPro. Multiply takes 4 cycles,
> > so I suggest using cost 4.
>
> Where did you get 4 cycles? Uli and I measured 1 cycle and
> also have documentation to that effect for imul.
I read:
Integer multiplication takes 4 clocks, floating point multiplication 5, and MMX
multiplication 3 clocks. Integer and MMX multiplication is pipelined so that it
can receive a new instruction every clock cycle. Floating point multiplication
is partially pipelined: The execution unit can receive a new FMUL instruction
two clocks after the preceding one, so that the maximum throughput is one FMUL
per two clock cycles. The holes between the FMUL's cannot be filled by integer
multiplications because they use the same circuitry.
The function unit overview also mentions that the multiply unit is attached to
the execution ports and is pipelined, with 4 stages I believe.
The throughput is 1 per cycle, but the latency is 4 cycles. I believe that
latency is what matters for the costs.
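
To make the latency/throughput distinction concrete, here is a minimal sketch
of an idealized pipeline model. The function names are hypothetical (they are
not from i386.c), and the model assumes nothing else competes for the multiply
unit; it only counts cycles until the last result is ready.

```c
#include <assert.h>

/* Each multiply needs the result of the previous one, so nothing
   overlaps and the full latency is paid n times. */
int dependent_chain_cycles(int n, int latency)
{
    assert(n >= 0 && latency > 0);
    return n * latency;
}

/* The n multiplies are independent: a new one can issue every
   `interval` cycles, and the last one completes `latency` cycles
   after it issues. */
int independent_cycles(int n, int latency, int interval)
{
    assert(n >= 0 && latency > 0 && interval > 0);
    if (n == 0)
        return 0;
    return (n - 1) * interval + latency;
}
```

With the figures quoted above (integer multiply: latency 4, one issue per
clock), dependent_chain_cycles(8, 4) gives 32 cycles, while
independent_cycles(8, 4, 1) gives 11. In this model a cost of 1 is only a fair
approximation when the multiplies are independent.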
Note that I am also getting better code with this replacement.
Honza
>
> > Mon Feb 28 15:07:05 MET 2000 Jan Hubicka <jh@suse.cz>
> > * i386.md (movhi_1): Promote movw imm, reg to movl imm, reg and
> > movw reg, reg to movzwl reg, reg on PARTIAL_REGISTER_STALL machines.
> > * i386.c (pentiumpro_cost): Set mul cost to 4.
> > (x86_use_movx): Set for PPro.
>
> The rest of the patch is fine.
>
>
> r~