This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH] Sparc tail calls (take 4)


Hi!

The attached patch finally bootstraps on sparc-redhat-linux, and the
testsuite passed as well.
As per Richard's note, this patch still removes unused labels even in the
minimal jump optimization, but mark_jump_label now checks CALL_PLACEHOLDERs.

A few questions though:

First:

In calls.c I had to add the following code:

+      if (args_size.constant > current_function_args_size)
+       {
+         /* If this function requires more stack slots than the current
+            function, we cannot change it into a sibling call.  */
+         sibcall_failure = 1;
+       }
+

because if the caller's parent allocated, say, 24 bytes on the stack for
arguments, then we really cannot store 32 bytes of arguments there.
An example of a routine which is otherwise miscompiled on SPARC:
int foo(int, int, int, int, int, int, int, int);
int bar(int x, int y) { return foo(x, 0, 0, 0, y, 0, 0, 0); }
The test works well on SPARC, but I wonder whether it is correct on all
platforms, or whether I'm comparing apples with oranges.
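
A minimal stand-alone sketch of the size comparison, under stated
assumptions: this is an illustrative model, not the code in calls.c. The
helper name sibcall_args_fit is hypothetical; its two parameters stand in
for args_size.constant and current_function_args_size.

```c
/* Illustrative model only; not the code in calls.c.  */

/* Return nonzero if a call needing CALLEE_ARGS_SIZE bytes of argument
   space may reuse an incoming argument area of CALLER_ARGS_SIZE bytes.
   A sibling call has no frame of its own to grow, so the callee's
   argument block must fit into what the caller's parent reserved.  */
int
sibcall_args_fit (long callee_args_size, long caller_args_size)
{
  return callee_args_size <= caller_args_size;
}
```

With the figures from the example, sibcall_args_fit (32, 24) is zero, which
corresponds to setting sibcall_failure for the call to foo from bar.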

Second question:

In mark_jump_label I scan only the normal call chain. Is it necessary to
scan the other chains as well? I mean, if the normal chain does not reference
some label outside of that chain, then neither should the sibcall chain.
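
To make the question concrete, here is a toy model of scanning only the
normal chain. Everything in it is an assumption for illustration: toy_insn
and toy_placeholder are hypothetical stand-ins and do not reflect GCC's rtx
representation, and label references are reduced to small integers.

```c
#include <stddef.h>

/* Toy stand-ins, not GCC's rtx.  An insn references at most one label;
   label_ref is -1 when it references none.  */
struct toy_insn
{
  int label_ref;
  struct toy_insn *next;
};

/* A placeholder holding two alternative insn chains.  */
struct toy_placeholder
{
  struct toy_insn *normal;
  struct toy_insn *sibcall;
};

/* Count label uses found in the normal chain only.  Labels referenced
   solely from the sibcall chain are not counted.  */
void
mark_labels_normal_chain (const struct toy_placeholder *ph,
                          int *label_uses, int n_labels)
{
  const struct toy_insn *insn;

  for (insn = ph->normal; insn != NULL; insn = insn->next)
    if (insn->label_ref >= 0 && insn->label_ref < n_labels)
      label_uses[insn->label_ref]++;
}
```

In this model a label that only the sibcall chain references keeps a use
count of zero, so scanning the normal chain alone is safe only if that
situation cannot arise.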

Third question:

The generic code in calls.c requires fndecl to be non-NULL. Is fndecl NULL
only for indirect calls? If so, then IMHO that test should go into the
arch-specific FUNCTION_OK_FOR_SIBCALL macro, because at least on SPARC
indirect sibcalls (not implemented in this patch) might be very useful,
provided we have a prototype with the called function's arguments, allow
them only if there are no call-saved regs, and make sure the pattern does
not allow the function to become a leaf (that would be too much hassle).
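
A small sketch of what moving the NULL test into the target macro could look
like, under stated assumptions: struct toy_decl, target_flat and try_sibcall
are hypothetical stand-ins for a declaration node, TARGET_FLAT and the
generic caller. Only the shape of the macro follows the sparc.h hunk in this
patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a function declaration node.  */
struct toy_decl
{
  const char *name;
};

/* Hypothetical stand-in for TARGET_FLAT.  */
static int target_flat = 0;

/* Target-side test: reject indirect calls (no decl) and -mflat.  */
#define FUNCTION_OK_FOR_SIBCALL(DECL) ((DECL) != NULL && ! target_flat)

/* Generic side: no separate fndecl test, the decision is left
   entirely to the target macro.  */
int
try_sibcall (const struct toy_decl *fndecl)
{
  return FUNCTION_OK_FOR_SIBCALL (fndecl);
}
```

A target that can handle indirect sibling calls would then simply drop the
NULL test from its own macro definition.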

2000-03-24  Jakub Jelinek  <jakub@redhat.com>

	* sibcall.c (skip_copy_to_return_value): Use OUTGOING_REGNO for
	comparison if regno's are equal.
	* jump.c (mark_jump_label): Dive into CALL_PLACEHOLDERs.
	* calls.c (initialize_argument_information): Add ecf_flags argument.
	Use FUNCTION_INCOMING_ARG if available and ECF_SIBCALL.
	(expand_call): Update caller.
	Avoid making a sibling call if argument size of the callee is larger
	than argument size of the caller.
	Call hard_function_value with outgoing set if in sibcall pass.
	Use FUNCTION_INCOMING_ARG if available and ECF_SIBCALL.

	* final.c (permitted_reg_in_leaf_functions, only_leaf_regs_used):
	Change LEAF_REGISTERS from an array initializer to actual array
	identifier. Move static global variable into the function.
	(leaf_function_p): Allow SIBLING_CALL_P calls even outside of
	sequences for leaf functions.
	* global.c (global_alloc): Likewise.
	* tm.texi (LEAF_REGISTERS): Update documentation.

	* config/sparc/sparc.h (CONDITIONAL_REGISTER_USAGE): Remove the ugly
	TARGET_FLAT leaf disabling hack.
	(LEAF_REGISTERS): Changed from an array initializer to actual array
	identifier to avoid duplication and remove the above hack.
	(FUNCTION_OK_FOR_SIBCALL): Define.
	* config/sparc/sparc.md (sibcall): New attr type. Use it almost
	always like call attribute.
	(eligible_for_sibcall_delay): New attribute.
	(sibcall): New delay type.
	(sibcall, sibcall_value, sibcall_epilogue): New expands.
	(sibcall_symbolic_sp32, sibcall_symbolic_sp64,
	sibcall_value_symbolic_sp32, sibcall_value_symbolic_sp64): New insns.
	* config/sparc/sparc.c (sparc_leaf_regs): New array.
	(eligible_for_sibcall_delay, output_restore_regs, output_sibcall):
	New functions.
	(output_function_epilogue): Move part of the code into
	output_restore_regs.
	(ultra_code_from_mask, ultrasparc_sched_reorder): Handle
	TYPE_SIBCALL.
	* sparc-protos.h (output_sibcall, eligible_for_sibcall_delay): New
	prototypes.

--- gcc/config/sparc/sparc.h.jj	Fri Mar 24 09:13:57 2000
+++ gcc/config/sparc/sparc.h	Fri Mar 24 09:17:10 2000
@@ -1066,8 +1066,8 @@ do								\
 	   %fp, but output it as %i7.  */			\
 	fixed_regs[31] = 1;					\
 	reg_names[FRAME_POINTER_REGNUM] = "%i7";		\
-	/* ??? This is a hack to disable leaf functions.  */	\
-	global_regs[7] = 1;					\
+	/* Disable leaf functions */				\
+	bzero (sparc_leaf_regs, FIRST_PSEUDO_REGISTER);		\
       }								\
     if (profile_block_flag)					\
       {								\
@@ -1373,26 +1373,8 @@ extern enum reg_class sparc_regno_reg_cl
   
 #define ORDER_REGS_FOR_LOCAL_ALLOC order_regs_for_local_alloc ()
 
-/* ??? %g7 is not a leaf register to effectively #undef LEAF_REGISTERS when
-   -mflat is used.  Function only_leaf_regs_used will return 0 if a global
-   register is used and is not permitted in a leaf function.  We make %g7
-   a global reg if -mflat and voila.  Since %g7 is a system register and is
-   fixed it won't be used by gcc anyway.  */
-
-#define LEAF_REGISTERS \
-{ 1, 1, 1, 1, 1, 1, 1, 0,	\
-  0, 0, 0, 0, 0, 0, 1, 0,	\
-  0, 0, 0, 0, 0, 0, 0, 0,	\
-  1, 1, 1, 1, 1, 1, 0, 1,	\
-  1, 1, 1, 1, 1, 1, 1, 1,	\
-  1, 1, 1, 1, 1, 1, 1, 1,	\
-  1, 1, 1, 1, 1, 1, 1, 1,	\
-  1, 1, 1, 1, 1, 1, 1, 1,	\
-  1, 1, 1, 1, 1, 1, 1, 1,	\
-  1, 1, 1, 1, 1, 1, 1, 1,	\
-  1, 1, 1, 1, 1, 1, 1, 1,	\
-  1, 1, 1, 1, 1, 1, 1, 1,	\
-  1, 1, 1, 1, 1}
+extern char sparc_leaf_regs[];
+#define LEAF_REGISTERS sparc_leaf_regs
 
 extern char leaf_reg_remap[];
 #define LEAF_REG_REMAP(REGNO) (leaf_reg_remap[REGNO])
@@ -2144,6 +2126,10 @@ LFLGRET"ID":\n\
    For the v9 we want NAMED to mean what it says it means.  */
 
 #define STRICT_ARGUMENT_NAMING TARGET_V9
+
+/* We do not allow sibling calls if -mflat, nor
+   we do not allow indirect calls to be optimized into sibling calls.  */
+#define FUNCTION_OK_FOR_SIBCALL(DECL) (DECL && ! TARGET_FLAT)
 
 /* Generate RTL to flush the register windows so as to make arbitrary frames
    available.  */
--- gcc/config/sparc/sparc.md.jj	Mon Mar 13 18:05:46 2000
+++ gcc/config/sparc/sparc.md	Fri Mar 24 09:17:10 2000
@@ -88,7 +88,7 @@
 ;; type "call_no_delay_slot" is a call followed by an unimp instruction.
 
 (define_attr "type"
-  "move,unary,binary,compare,load,sload,store,ialu,shift,uncond_branch,branch,call,call_no_delay_slot,return,address,imul,fpload,fpstore,fp,fpmove,fpcmove,fpcmp,fpmul,fpdivs,fpdivd,fpsqrts,fpsqrtd,cmove,multi,misc"
+  "move,unary,binary,compare,load,sload,store,ialu,shift,uncond_branch,branch,call,sibcall,call_no_delay_slot,return,address,imul,fpload,fpstore,fp,fpmove,fpcmove,fpcmp,fpmul,fpdivs,fpdivd,fpsqrts,fpsqrtd,cmove,multi,misc"
   (const_string "binary"))
 
 ;; Set true if insn uses call-clobbered intermediate register.
@@ -131,7 +131,7 @@
 ;; Attributes for instruction and branch scheduling
 
 (define_attr "in_call_delay" "false,true"
-  (cond [(eq_attr "type" "uncond_branch,branch,call,call_no_delay_slot,return,multi")
+  (cond [(eq_attr "type" "uncond_branch,branch,call,sibcall,call_no_delay_slot,return,multi")
 	 	(const_string "false")
 	 (eq_attr "type" "load,fpload,store,fpstore")
 	 	(if_then_else (eq_attr "length" "1")
@@ -148,6 +148,12 @@
 (define_delay (eq_attr "type" "call")
   [(eq_attr "in_call_delay" "true") (nil) (nil)])
 
+(define_attr "eligible_for_sibcall_delay" "false,true"
+  (symbol_ref "eligible_for_sibcall_delay(insn)"))
+
+(define_delay (eq_attr "type" "sibcall")
+  [(eq_attr "eligible_for_sibcall_delay" "true") (nil) (nil)])
+
 (define_attr "leaf_function" "false,true"
   (const (symbol_ref "current_function_uses_only_leaf_regs")))
 
@@ -179,19 +185,19 @@
 ;; because it prevents us from moving back the final store of inner loops.
 
 (define_attr "in_branch_delay" "false,true"
-  (if_then_else (and (eq_attr "type" "!uncond_branch,branch,call,call_no_delay_slot,multi")
+  (if_then_else (and (eq_attr "type" "!uncond_branch,branch,call,sibcall,call_no_delay_slot,multi")
 		     (eq_attr "length" "1"))
 		(const_string "true")
 		(const_string "false")))
 
 (define_attr "in_uncond_branch_delay" "false,true"
-  (if_then_else (and (eq_attr "type" "!uncond_branch,branch,call,call_no_delay_slot,multi")
+  (if_then_else (and (eq_attr "type" "!uncond_branch,branch,call,sibcall,call_no_delay_slot,multi")
 		     (eq_attr "length" "1"))
 		(const_string "true")
 		(const_string "false")))
 
 (define_attr "in_annul_branch_delay" "false,true"
-  (if_then_else (and (eq_attr "type" "!uncond_branch,branch,call,call_no_delay_slot,multi")
+  (if_then_else (and (eq_attr "type" "!uncond_branch,branch,call,sibcall,call_no_delay_slot,multi")
 		     (eq_attr "length" "1"))
 		(const_string "true")
 		(const_string "false")))
@@ -453,7 +459,7 @@
 
 (define_function_unit "ieuN" 2 0
   (and (eq_attr "cpu" "ultrasparc")
-    (eq_attr "type" "ialu,binary,move,unary,shift,compare,call,call_no_delay_slot,uncond_branch"))
+    (eq_attr "type" "ialu,binary,move,unary,shift,compare,call,sibcall,call_no_delay_slot,uncond_branch"))
   1 1)
 
 (define_function_unit "ieu0" 1 0
@@ -468,7 +474,7 @@
 
 (define_function_unit "ieu1" 1 0
   (and (eq_attr "cpu" "ultrasparc")
-    (eq_attr "type" "compare,call,call_no_delay_slot,uncond_branch"))
+    (eq_attr "type" "compare,call,sibcall,call_no_delay_slot,uncond_branch"))
   1 1)
 
 (define_function_unit "cti" 1 0
@@ -8569,6 +8575,59 @@
 
   DONE;
 }")
+
+;;- tail calls
+(define_expand "sibcall"
+  [(parallel [(call (match_operand 0 "call_operand" "") (const_int 0))
+	      (return)])]
+  ""
+  "")
+
+(define_insn "*sibcall_symbolic_sp32"
+  [(call (mem:SI (match_operand:SI 0 "symbolic_operand" "s"))
+	 (match_operand 1 "" ""))
+   (return)]
+  "! TARGET_PTR64"
+  "* return output_sibcall(insn, operands[0]);"
+  [(set_attr "type" "sibcall")])
+
+(define_insn "*sibcall_symbolic_sp64"
+  [(call (mem:SI (match_operand:DI 0 "symbolic_operand" "s"))
+	 (match_operand 1 "" ""))
+   (return)]
+  "TARGET_PTR64"
+  "* return output_sibcall(insn, operands[0]);"
+  [(set_attr "type" "sibcall")])
+
+(define_expand "sibcall_value"
+  [(parallel [(set (match_operand 0 "register_operand" "=rf")
+		(call (match_operand:SI 1 "" "") (const_int 0)))
+	      (return)])]
+  ""
+  "")
+
+(define_insn "*sibcall_value_symbolic_sp32"
+  [(set (match_operand 0 "" "=rf")
+	(call (mem:SI (match_operand:SI 1 "symbolic_operand" "s"))
+	      (match_operand 2 "" "")))
+   (return)]
+  "! TARGET_PTR64"
+  "* return output_sibcall(insn, operands[1]);"
+  [(set_attr "type" "sibcall")])
+
+(define_insn "*sibcall_value_symbolic_sp64"
+  [(set (match_operand 0 "" "")
+	(call (mem:SI (match_operand:DI 1 "symbolic_operand" "s"))
+	      (match_operand 2 "" "")))
+   (return)]
+  "TARGET_PTR64"
+  "* return output_sibcall(insn, operands[1]);"
+  [(set_attr "type" "sibcall")])
+
+(define_expand "sibcall_epilogue"
+  [(const_int 0)]
+  ""
+  "DONE;")
 
 ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
 ;; all of memory.  This blocks insns from being moved across this point.
--- gcc/config/sparc/sparc-protos.h.jj	Thu Feb 17 16:31:05 2000
+++ gcc/config/sparc/sparc-protos.h	Fri Mar 24 09:17:10 2000
@@ -96,6 +96,7 @@ extern int sparc_splitdi_legitimate PARA
 extern int sparc_absnegfloat_split_legitimate PARAMS ((rtx, rtx));
 extern char *output_cbranch PARAMS ((rtx, int, int, int, int, rtx));
 extern const char *output_return PARAMS ((rtx *));
+extern const char *output_sibcall PARAMS ((rtx, rtx));
 extern char *output_v9branch PARAMS ((rtx, int, int, int, int, int, rtx));
 extern void emit_v9_brxx_insn PARAMS ((enum rtx_code, rtx, rtx));
 extern void output_double_int PARAMS ((FILE *, rtx));
@@ -121,6 +122,7 @@ extern int cc_arithopn PARAMS ((rtx, enu
 extern int data_segment_operand PARAMS ((rtx, enum machine_mode));
 extern int eligible_for_epilogue_delay PARAMS ((rtx, int));
 extern int eligible_for_return_delay PARAMS ((rtx));
+extern int eligible_for_sibcall_delay PARAMS ((rtx));
 extern int emit_move_sequence PARAMS ((rtx, enum machine_mode));
 extern int extend_op PARAMS ((rtx, enum machine_mode));
 extern int fcc_reg_operand PARAMS ((rtx, enum machine_mode));
--- gcc/config/sparc/sparc.c.jj	Wed Mar 22 08:49:19 2000
+++ gcc/config/sparc/sparc.c	Fri Mar 24 09:17:11 2000
@@ -99,6 +99,24 @@ char leaf_reg_remap[] =
   88, 89, 90, 91, 92, 93, 94, 95,
   96, 97, 98, 99, 100};
 
+/* Vector, indexed by hard register number, which contains 1
+   for a register that is allowable in a candidate for leaf
+   function treatment.  */
+char sparc_leaf_regs[] =
+{ 1, 1, 1, 1, 1, 1, 1, 1,
+  0, 0, 0, 0, 0, 0, 1, 0,
+  0, 0, 0, 0, 0, 0, 0, 0,
+  1, 1, 1, 1, 1, 1, 0, 1,
+  1, 1, 1, 1, 1, 1, 1, 1,
+  1, 1, 1, 1, 1, 1, 1, 1,
+  1, 1, 1, 1, 1, 1, 1, 1,
+  1, 1, 1, 1, 1, 1, 1, 1,
+  1, 1, 1, 1, 1, 1, 1, 1,
+  1, 1, 1, 1, 1, 1, 1, 1,
+  1, 1, 1, 1, 1, 1, 1, 1,
+  1, 1, 1, 1, 1, 1, 1, 1,
+  1, 1, 1, 1, 1};
+
 #endif
 
 /* Name of where we pretend to think the frame pointer points.
@@ -2458,6 +2476,98 @@ eligible_for_epilogue_delay (trial, slot
   return 0;
 }
 
+/* Return nonzero if TRIAL can go into the sibling call
+   delay slot.  */
+
+int
+eligible_for_sibcall_delay (trial)
+     rtx trial;
+{
+  rtx pat, src;
+
+  if (GET_CODE (trial) != INSN || GET_CODE (PATTERN (trial)) != SET)
+    return 0;
+
+  if (get_attr_length (trial) != 1 || profile_block_flag == 2)
+    return 0;
+
+  pat = PATTERN (trial);
+
+  if (current_function_uses_only_leaf_regs)
+    {
+      /* If the tail call is done using the call instruction,
+	 we have to restore %o7 in the delay slot.  */
+      if (TARGET_ARCH64 && ! TARGET_CM_MEDLOW)
+	return 0;
+
+      /* %g1 is used to build the function address */
+      if (reg_mentioned_p (gen_rtx_REG (Pmode, 1), pat))
+	return 0;
+
+      return 1;
+    }
+
+  /* Otherwise, only operations which can be done in tandem with
+     a `restore' insn can go into the delay slot.  */
+  if (GET_CODE (SET_DEST (pat)) != REG
+      || REGNO (SET_DEST (pat)) < 24
+      || REGNO (SET_DEST (pat)) >= 32)
+    return 0;
+
+  /* If it mentions %o7, it can't go in, because sibcall will clobber it
+     in most cases.  */
+  if (reg_mentioned_p (gen_rtx_REG (Pmode, 15), pat))
+    return 0;
+
+  src = SET_SRC (pat);
+
+  if (arith_operand (src, GET_MODE (src)))
+    {
+      if (TARGET_ARCH64)
+        return GET_MODE_SIZE (GET_MODE (src)) <= GET_MODE_SIZE (DImode);
+      else
+        return GET_MODE_SIZE (GET_MODE (src)) <= GET_MODE_SIZE (SImode);
+    }
+
+  else if (arith_double_operand (src, GET_MODE (src)))
+    return GET_MODE_SIZE (GET_MODE (src)) <= GET_MODE_SIZE (DImode);
+
+  else if (! TARGET_FPU && restore_operand (SET_DEST (pat), SFmode)
+	   && register_operand (src, SFmode))
+    return 1;
+
+  else if (GET_CODE (src) == PLUS
+	   && arith_operand (XEXP (src, 0), SImode)
+	   && arith_operand (XEXP (src, 1), SImode)
+	   && (register_operand (XEXP (src, 0), SImode)
+	       || register_operand (XEXP (src, 1), SImode)))
+    return 1;
+
+  else if (GET_CODE (src) == PLUS
+	   && arith_double_operand (XEXP (src, 0), DImode)
+	   && arith_double_operand (XEXP (src, 1), DImode)
+	   && (register_operand (XEXP (src, 0), DImode)
+	       || register_operand (XEXP (src, 1), DImode)))
+    return 1;
+
+  else if (GET_CODE (src) == LO_SUM
+	   && ! TARGET_CM_MEDMID
+	   && ((register_operand (XEXP (src, 0), SImode)
+	        && immediate_operand (XEXP (src, 1), SImode))
+	       || (TARGET_ARCH64
+		   && register_operand (XEXP (src, 0), DImode)
+		   && immediate_operand (XEXP (src, 1), DImode))))
+    return 1;
+
+  else if (GET_CODE (src) == ASHIFT
+	   && (register_operand (XEXP (src, 0), SImode)
+	       || register_operand (XEXP (src, 0), DImode))
+	   && XEXP (src, 1) == const1_rtx)
+    return 1;
+
+  return 0;
+}
+
 static int
 check_return_regs (x)
      rtx x;
@@ -3423,6 +3533,40 @@ output_function_prologue (file, size, le
     }
 }
 
+/* Output code to restore any call saved registers.  */
+
+static void
+output_restore_regs (file, leaf_function)
+     FILE *file;
+     int leaf_function;
+{
+  int offset, n_regs;
+  const char *base;
+
+  offset = -apparent_fsize + frame_base_offset;
+  if (offset < -4096 || offset + num_gfregs * 4 > 4096 - 8 /*double*/)
+    {
+      build_big_number (file, offset, "%g1");
+      fprintf (file, "\tadd\t%s, %%g1, %%g1\n", frame_base_name);
+      base = "%g1";
+      offset = 0;
+    }
+  else
+    {
+      base = frame_base_name;
+    }
+
+  n_regs = 0;
+  if (TARGET_EPILOGUE && ! leaf_function)
+    /* ??? Originally saved regs 0-15 here.  */
+    n_regs = restore_regs (file, 0, 8, base, offset, 0);
+  else if (leaf_function)
+    /* ??? Originally saved regs 0-31 here.  */
+    n_regs = restore_regs (file, 0, 8, base, offset, 0);
+  if (TARGET_EPILOGUE)
+    restore_regs (file, 32, TARGET_V9 ? 96 : 64, base, offset, n_regs);
+}
+
 /* Output code for the function epilogue.  */
 
 void
@@ -3457,35 +3601,8 @@ output_function_epilogue (file, size, le
       goto output_vectors;                                                    
     }
 
-  /* Restore any call saved registers.  */
   if (num_gfregs)
-    {
-      int offset, n_regs;
-      const char *base;
-
-      offset = -apparent_fsize + frame_base_offset;
-      if (offset < -4096 || offset + num_gfregs * 4 > 4096 - 8 /*double*/)
-	{
-	  build_big_number (file, offset, "%g1");
-	  fprintf (file, "\tadd\t%s, %%g1, %%g1\n", frame_base_name);
-	  base = "%g1";
-	  offset = 0;
-	}
-      else
-	{
-	  base = frame_base_name;
-	}
-
-      n_regs = 0;
-      if (TARGET_EPILOGUE && ! leaf_function)
-	/* ??? Originally saved regs 0-15 here.  */
-	n_regs = restore_regs (file, 0, 8, base, offset, 0);
-      else if (leaf_function)
-	/* ??? Originally saved regs 0-31 here.  */
-	n_regs = restore_regs (file, 0, 8, base, offset, 0);
-      if (TARGET_EPILOGUE)
-	restore_regs (file, 32, TARGET_V9 ? 96 : 64, base, offset, n_regs);
-    }
+    output_restore_regs (file, leaf_function);
 
   /* Work out how to skip the caller's unimp instruction if required.  */
   if (leaf_function)
@@ -3575,6 +3692,139 @@ output_function_epilogue (file, size, le
  output_vectors:
   sparc_output_deferred_case_vectors ();
 }
+
+/* Output a sibling call.  */
+
+const char *
+output_sibcall (insn, call_operand)
+     rtx insn, call_operand;
+{
+  int leaf_regs = current_function_uses_only_leaf_regs;
+  rtx operands[3];
+  int delay_slot = dbr_sequence_length () > 0;
+
+  if (num_gfregs)
+    {
+      /* Call to restore global regs might clobber
+	 the delay slot. Instead of checking for this
+	 output the delay slot now.  */
+      if (delay_slot)
+	{
+	  rtx delay = NEXT_INSN (insn);
+
+	  if (! delay)
+	    abort ();
+
+	  final_scan_insn (delay, asm_out_file, 1, 0, 1);
+	  PATTERN (delay) = gen_blockage ();
+	  INSN_CODE (delay) = -1;
+	  delay_slot = 0;
+	}
+      output_restore_regs (asm_out_file, leaf_regs);
+    }
+
+  operands[0] = call_operand;
+
+  if (leaf_regs)
+    {
+      int spare_slot = (TARGET_ARCH32 || TARGET_CM_MEDLOW);
+      int size = 0;
+
+      if ((actual_fsize || ! spare_slot) && delay_slot)
+	{
+	  rtx delay = NEXT_INSN (insn);
+
+	  if (! delay)
+	    abort ();
+
+	  final_scan_insn (delay, asm_out_file, 1, 0, 1);
+	  PATTERN (delay) = gen_blockage ();
+	  INSN_CODE (delay) = -1;
+	  delay_slot = 0;
+	}
+      if (actual_fsize)
+	{
+	  if (actual_fsize <= 4096)
+	    size = actual_fsize;
+	  else if (actual_fsize <= 8192)
+	    {
+	      fputs ("\tsub\t%sp, -4096, %sp\n", asm_out_file);
+	      size = actual_fsize - 4096;
+	    }
+	  else if ((actual_fsize & 0x3ff) == 0)
+	    fprintf (asm_out_file,
+		     "\tsethi\t%%hi(%d), %%g1\n\tadd\t%%sp, %%g1, %%sp\n",
+		     actual_fsize);
+	  else
+	    {
+	      fprintf (asm_out_file,
+		       "\tsethi\t%%hi(%d), %%g1\n\tor\t%%g1, %%lo(%d), %%g1\n",
+		       actual_fsize, actual_fsize);
+	      fputs ("\tadd\t%%sp, %%g1, %%sp\n", asm_out_file);
+	    }
+	}
+      if (spare_slot)
+	{
+	  output_asm_insn ("sethi\t%%hi(%a0), %%g1", operands);
+	  output_asm_insn ("jmpl\t%%g1 + %%lo(%a0), %%g0", operands);
+	  if (size)
+	    fprintf (asm_out_file, "\t sub\t%%sp, -%d, %%sp\n", size);
+	  else if (! delay_slot)
+	    fputs ("\t nop\n", asm_out_file);
+	}
+      else
+	{
+	  if (size)
+	    fprintf (asm_out_file, "\tsub\t%%sp, -%d, %%sp\n", size);
+	  output_asm_insn ("mov\t%%o7, %%g1", operands);
+	  output_asm_insn ("call\t%a0, 0", operands);
+	  output_asm_insn (" mov\t%%g1, %%o7", operands);
+	}
+      return "";
+    }
+
+  output_asm_insn ("call\t%a0, 0", operands);
+  if (delay_slot)
+    {
+      rtx delay = NEXT_INSN (insn), pat;
+
+      if (! delay)
+	abort ();
+
+      pat = PATTERN (delay);
+      if (GET_CODE (pat) != SET)
+	abort ();
+
+      operands[0] = SET_DEST (pat);
+      pat = SET_SRC (pat);
+      switch (GET_CODE (pat))
+	{
+	case PLUS:
+	  operands[1] = XEXP (pat, 0);
+	  operands[2] = XEXP (pat, 1);
+	  output_asm_insn (" restore %r1, %2, %Y0", operands);
+	  break;
+	case LO_SUM:
+	  operands[1] = XEXP (pat, 0);
+	  operands[2] = XEXP (pat, 1);
+	  output_asm_insn (" restore %r1, %%lo(%a2), %Y0", operands);
+	  break;
+	case ASHIFT:
+	  operands[1] = XEXP (pat, 0);
+	  output_asm_insn (" restore %r1, %r1, %Y0", operands);
+	  break;
+	default:
+	  operands[1] = pat;
+	  output_asm_insn (" restore %%g0, %1, %Y0", operands);
+	  break;
+	}
+      PATTERN (delay) = gen_blockage ();
+      INSN_CODE (delay) = -1;
+    }
+  else
+    fputs ("\t restore\n", asm_out_file);
+  return "";
+}
 
 /* Functions for handling argument passing.
 
@@ -7014,6 +7264,7 @@ ultra_code_from_mask (type_mask)
     return IEU0;
   else if (type_mask & (TMASK (TYPE_COMPARE) |
 			TMASK (TYPE_CALL) |
+			TMASK (TYPE_SIBCALL) |
 			TMASK (TYPE_UNCOND_BRANCH)))
     return IEU1;
   else if (type_mask & (TMASK (TYPE_IALU) | TMASK (TYPE_BINARY) |
@@ -7486,6 +7737,7 @@ ultrasparc_sched_reorder (dump, sched_ve
 	/* If we are not in the process of emptying out the pipe, try to
 	   obtain an instruction which must be the first in it's group.  */
 	ip = ultra_find_type ((TMASK (TYPE_CALL) |
+			       TMASK (TYPE_SIBCALL) |
 			       TMASK (TYPE_CALL_NO_DELAY_SLOT) |
 			       TMASK (TYPE_UNCOND_BRANCH)),
 			      ready, this_insn);
--- gcc/tm.texi.jj	Fri Mar 24 09:13:52 2000
+++ gcc/tm.texi	Fri Mar 24 09:17:11 2000
@@ -1652,7 +1652,7 @@ accomplish this.
 @table @code
 @findex LEAF_REGISTERS
 @item LEAF_REGISTERS
-A C initializer for a vector, indexed by hard register number, which
+Name of a char vector, indexed by hard register number, which
 contains 1 for a register that is allowable in a candidate for leaf
 function treatment.
 
--- gcc/sibcall.c.jj	Sun Mar 19 06:26:47 2000
+++ gcc/sibcall.c	Fri Mar 24 09:17:11 2000
@@ -140,9 +140,13 @@ skip_copy_to_return_value (orig_insn, ha
      called function's return value was copied.  Otherwise we're returning
      some other value.  */
 
+#ifndef OUTGOING_REGNO
+#define OUTGOING_REGNO(N) (N)
+#endif
+
   if (SET_DEST (set) == current_function_return_rtx
       && REG_P (SET_DEST (set))
-      && REGNO (SET_DEST (set)) == REGNO (hardret)
+      && OUTGOING_REGNO (REGNO (SET_DEST (set))) == REGNO (hardret)
       && SET_SRC (set) == softret)
     return insn;
 
@@ -352,7 +356,6 @@ replace_call_placeholder (insn, use)
   NOTE_SOURCE_FILE (insn) = 0;
   NOTE_LINE_NUMBER (insn) = NOTE_INSN_DELETED;
 }
-
 
 /* Given a (possibly empty) set of potential sibling or tail recursion call
    sites, determine if optimization is possible.
--- gcc/final.c.jj	Sun Mar 19 20:31:03 2000
+++ gcc/final.c	Fri Mar 24 09:17:11 2000
@@ -4015,7 +4015,8 @@ leaf_function_p ()
 
   for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
     {
-      if (GET_CODE (insn) == CALL_INSN)
+      if (GET_CODE (insn) == CALL_INSN
+	  && ! SIBLING_CALL_P (insn))
 	return 0;
       if (GET_CODE (insn) == INSN
 	  && GET_CODE (PATTERN (insn)) == SEQUENCE
@@ -4025,7 +4026,8 @@ leaf_function_p ()
     }
   for (insn = current_function_epilogue_delay_list; insn; insn = XEXP (insn, 1))
     {
-      if (GET_CODE (XEXP (insn, 0)) == CALL_INSN)
+      if (GET_CODE (XEXP (insn, 0)) == CALL_INSN
+	  && ! SIBLING_CALL_P (insn))
 	return 0;
       if (GET_CODE (XEXP (insn, 0)) == INSN
 	  && GET_CODE (PATTERN (XEXP (insn, 0))) == SEQUENCE
@@ -4048,8 +4050,6 @@ leaf_function_p ()
 
 #ifdef LEAF_REGISTERS
 
-static char permitted_reg_in_leaf_functions[] = LEAF_REGISTERS;
-
 /* Return 1 if this function uses only the registers that can be
    safely renumbered.  */
 
@@ -4057,6 +4057,7 @@ int
 only_leaf_regs_used ()
 {
   int i;
+  char *permitted_reg_in_leaf_functions = LEAF_REGISTERS;
 
   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
     if ((regs_ever_live[i] || global_regs[i])
--- gcc/global.c.jj	Mon Mar  6 18:37:42 2000
+++ gcc/global.c	Fri Mar 24 09:17:11 2000
@@ -374,7 +374,7 @@ global_alloc (file)
      a leaf function.  */
   {
     char *cheap_regs;
-    static char leaf_regs[] = LEAF_REGISTERS;
+    char *leaf_regs = LEAF_REGISTERS;
 
     if (only_leaf_regs_used () && leaf_function_p ())
       cheap_regs = leaf_regs;
--- gcc/jump.c.jj	Fri Mar 24 09:13:52 2000
+++ gcc/jump.c	Fri Mar 24 09:17:11 2000
@@ -3879,6 +3879,13 @@ mark_jump_label (x, insn, cross_jump, in
                     cross_jump, in_mem);
 	}
       return;
+
+  /* Look at the Normal call sequence attached to the CALL_PLACEHOLDER.  */
+    case CALL_PLACEHOLDER:
+      for (insn = XEXP (x, 0); insn; insn = NEXT_INSN (insn))
+	if (GET_RTX_CLASS (GET_CODE (insn)) == 'i')
+	  mark_jump_label (PATTERN (insn), NULL_RTX, cross_jump, 0);
+      return;
       
     default:
       break;
--- gcc/calls.c.jj	Fri Mar 24 09:13:51 2000
+++ gcc/calls.c	Fri Mar 24 09:17:11 2000
@@ -165,7 +165,7 @@ static void initialize_argument_informat
 							 int, tree, tree,
 							 CUMULATIVE_ARGS *,
 							 int, rtx *, int *,
-							 int *, int *));
+							 int *, int *, int));
 static void compute_argument_addresses		PARAMS ((struct arg_data *,
 							 rtx, int));
 static rtx rtx_for_function_call		PARAMS ((tree, tree));
@@ -980,7 +980,8 @@ static void
 initialize_argument_information (num_actuals, args, args_size, n_named_args,
 				 actparms, fndecl, args_so_far,
 				 reg_parm_stack_space, old_stack_level,
-				 old_pending_adj, must_preallocate, is_const)
+				 old_pending_adj, must_preallocate, is_const,
+				 ecf_flags)
      int num_actuals ATTRIBUTE_UNUSED;
      struct arg_data *args;
      struct args_size *args_size;
@@ -993,6 +994,7 @@ initialize_argument_information (num_act
      int *old_pending_adj;
      int *must_preallocate;
      int *is_const;
+     int ecf_flags;
 {
   /* 1 if scanning parms front to back, -1 if scanning back to front.  */
   int inc;
@@ -1150,8 +1152,19 @@ initialize_argument_information (num_act
 
       args[i].unsignedp = unsignedp;
       args[i].mode = mode;
-      args[i].reg = FUNCTION_ARG (*args_so_far, mode, type,
-				  argpos < n_named_args);
+
+#ifdef FUNCTION_INCOMING_ARG
+      /* If this is a sibling call and the machine has register windows, the
+	 register window has to be unwinded before calling the routine, so
+	 arguments have to go into the incoming registers.  */
+      if (ecf_flags & ECF_SIBCALL)
+	args[i].reg = FUNCTION_INCOMING_ARG (*args_so_far, mode, type,
+					     argpos < n_named_args);
+      else
+#endif
+	args[i].reg = FUNCTION_ARG (*args_so_far, mode, type,
+				    argpos < n_named_args);
+
 #ifdef FUNCTION_ARG_PARTIAL_NREGS
       if (args[i].reg)
 	args[i].partial
@@ -2131,7 +2144,7 @@ expand_call (exp, target, ignore)
 	 call expansion.  */
       int save_pending_stack_adjust;
       rtx insns;
-      rtx before_call;
+      rtx before_call, next_arg_reg;
 
       if (pass == 0)
 	{
@@ -2284,7 +2297,8 @@ expand_call (exp, target, ignore)
 				       n_named_args, actparms, fndecl,
 				       &args_so_far, reg_parm_stack_space,
 				       &old_stack_level, &old_pending_adj,
-				       &must_preallocate, &is_const);
+				       &must_preallocate, &is_const,
+				       (pass == 0) ? ECF_SIBCALL : 0);
 
 #ifdef FINAL_REG_PARM_STACK_SPACE
       reg_parm_stack_space = FINAL_REG_PARM_STACK_SPACE (args_size.constant,
@@ -2305,6 +2319,13 @@ expand_call (exp, target, ignore)
 	  sibcall_failure = 1;
 	}
 
+      if (args_size.constant > current_function_args_size)
+	{
+	  /* If this function requires more stack slots than the current
+	     function, we cannot change it into a sibling call.  */
+	  sibcall_failure = 1;
+	}
+
       /* Compute the actual size of the argument block required.  The variable
 	 and constant sizes must be combined, the size may have to be rounded,
 	 and there may be a minimum required size.  When generating a sibcall
@@ -2569,9 +2590,9 @@ expand_call (exp, target, ignore)
 	{
 	  if (pcc_struct_value)
 	    valreg = hard_function_value (build_pointer_type (TREE_TYPE (exp)),
-					  fndecl, 0);
+					  fndecl, (pass == 0));
 	  else
-	    valreg = hard_function_value (TREE_TYPE (exp), fndecl, 0);
+	    valreg = hard_function_value (TREE_TYPE (exp), fndecl, (pass == 0));
 	}
 
       /* Precompute all register parameters.  It isn't safe to compute anything
@@ -2665,14 +2686,24 @@ expand_call (exp, target, ignore)
 	 later safely search backwards to find the CALL_INSN.  */
       before_call = get_last_insn ();
 
+      /* Set up next argument register.  For sibling calls on machines
+	 with register windows this should be the incoming register.  */
+#ifdef FUNCTION_INCOMING_ARG
+      if (pass == 0)
+	next_arg_reg = FUNCTION_INCOMING_ARG (args_so_far, VOIDmode,
+					      void_type_node, 1);
+      else
+#endif
+	next_arg_reg = FUNCTION_ARG (args_so_far, VOIDmode,
+				     void_type_node, 1);
+
       /* All arguments and registers used for the call must be set up by
 	 now!  */
 
       /* Generate the actual call instruction.  */
       emit_call_1 (funexp, fndecl, funtype, unadjusted_args_size,
 		   args_size.constant, struct_value_size,
-		   FUNCTION_ARG (args_so_far, VOIDmode, void_type_node, 1),
-		   valreg, old_inhibit_defer_pop, call_fusage,
+		   next_arg_reg, valreg, old_inhibit_defer_pop, call_fusage,
 		   ((is_const ? ECF_IS_CONST : 0)
 		    | (nothrow ? ECF_NOTHROW : 0)
 		    | (pass == 0 ? ECF_SIBCALL : 0)));

Cheers,
    Jakub
___________________________________________________________________
Jakub Jelinek | jakub@redhat.com | http://sunsite.mff.cuni.cz/~jj
Linux version 2.3.99-pre2 on a sparc64 machine (1343.49 BogoMips)
___________________________________________________________________
