GNU C Compiler Internals/Function calls 4 1
Global Control-Flow Analysis
The functions of a file are used to generate a callgraph in file cgraphunit.c. The two relevant functions are cgraph_finalize_compilation_unit(), which is called from pop_file_scope() after the file has been parsed, and cgraph_finalize_function(), which is called from finish_function().
The effect of each function depends on the compilation mode. The unit-at-a-time mode instructs the compiler to build the callgraph only after every function in the file has been parsed. When this option is not present, a function is converted as soon as it is parsed.
Without unit-at-a-time, cgraph_finalize_function() calls cgraph_analyze_function(), which converts the function to RTL. Otherwise, the function is queued in cgraph_nodes_queue, and cgraph_finalize_compilation_unit() takes care of the queue at the end. cgraph_nodes is the global variable representing the callgraph. Function dump_cgraph() prints out the callgraph.
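The difference between the two modes can be pictured with a toy model. The sketch below is not GCC code: struct toy_unit and the toy_* functions are hypothetical stand-ins that only mimic the "convert now" versus "queue and convert at end of file" behaviour described above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not GCC code: every name below is hypothetical. */
#define TOY_MAX_FUNCS 16

struct toy_unit {
    int unit_at_a_time;                /* nonzero: defer work to end of file */
    const char *queue[TOY_MAX_FUNCS];  /* stands in for cgraph_nodes_queue */
    size_t queued;                     /* functions waiting in the queue */
    size_t converted;                  /* functions converted so far */
};

/* Stand-in for converting one function body. */
void toy_convert(struct toy_unit *u, const char *name)
{
    (void) name;
    u->converted++;
}

/* Called once per parsed function, in the role of
   cgraph_finalize_function(). */
void toy_finalize_function(struct toy_unit *u, const char *name)
{
    if (u->unit_at_a_time) {
        assert(u->queued < TOY_MAX_FUNCS);
        u->queue[u->queued++] = name;   /* defer until the file is done */
    } else {
        toy_convert(u, name);           /* convert immediately */
    }
}

/* Called once at end of file, in the role of
   cgraph_finalize_compilation_unit(). */
void toy_finalize_compilation_unit(struct toy_unit *u)
{
    for (size_t i = 0; i < u->queued; i++)
        toy_convert(u, u->queue[i]);
    u->queued = 0;
}
```

In the deferred mode nothing is converted until toy_finalize_compilation_unit() runs, which is what gives the compiler a view of the whole file before it commits to code for any single function.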
In this chapter we will find out how functions call each other. Typically, a function passes a number of parameters when it makes a call, and a stack frame is created when the callee begins. It is possible, however, that the stack frame of the calling function gets reused. This type of function call is called a sibling call. When the function body is small, the run-time overhead of setting up a stack frame is high relative to the work the function does. In this case the callee may get inlined into the calling function.
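As a small illustration, the following C fragment contains one candidate for each optimization. The function names are made up for this example, and whether a given compiler actually inlines square() or turns the call to add() into a sibling call depends on the target and the optimization level.

```c
#include <assert.h>

/* Tiny body: setting up a frame would cost more than the multiply,
   so this is a likely inlining candidate. */
static inline int square(int x)
{
    return x * x;
}

/* Ordinary callee with the same signature as its caller below. */
int add(int a, int b)
{
    return a + b;
}

/* The call to add() is the last action and its result is returned
   unchanged, so the caller's frame is no longer needed at that point:
   a sibling-call candidate. */
int sum_of_squares(int a, int b)
{
    return add(square(a), square(b));
}

/* Usage check: the result must be the same whether or not either
   optimization is applied. */
void check_sum_of_squares(void)
{
    assert(sum_of_squares(3, 4) == 25);
}
```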
Function expand_call() takes a CALL_EXPR tree and generates the corresponding RTL expression. It has to decide how each argument is passed. struct arg_data contains the necessary information for each argument:
|tree tree_value||Tree node for this argument|
|enum machine_mode mode||Mode for value|
|rtx value||Current RTL value for argument, or 0 if it isn't precomputed|
|rtx initial_value||Initially computed RTL value for argument; only for const functions.|
|rtx reg||Register to pass this argument in, 0 if passed on stack|
|rtx tail_call_reg||Register to pass this argument in when generating tail call sequence|
|rtx parallel_value||If REG is a PARALLEL, this is a copy of VALUE pulled into the correct form for emit_group_move.|
|int unsignedp||If REG was promoted from the actual mode of the argument expression, indicates whether the promotion is sign- or zero-extended|
|int partial||Number of bytes to put in registers. 0 means either that the whole argument is put in registers or that it is not passed in registers at all.|
|int pass_on_stack||Nonzero if argument must be passed on stack. Note that some arguments may be passed on the stack even though pass_on_stack is zero, just because FUNCTION_ARG says so|
|struct locate_and_pad_arg_data locate||Some fields packaged up for locate_and_pad_parm|
|rtx stack||Location on the stack at which parameter should be stored|
|rtx stack_slot||Location on the stack of the start of this argument slot|
|rtx save_area||Place that this stack area has been saved, if needed|
|rtx *aligned_regs||If an argument's alignment does not permit direct copying into registers, copy in smaller-sized pieces into pseudos. These are stored in a block pointed to by this field.|
|int n_aligned_regs||Number of word-sized pseudos that were made for aligned_regs|
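The interplay of the reg, partial and pass_on_stack fields can be summarized in a few lines. This is a simplified sketch under stated assumptions, not the logic of calls.c: struct toy_arg keeps only three of the fields above, has_reg replaces the rtx reg by a flag, and the enum and function names are hypothetical.

```c
#include <assert.h>

/* Toy model, not GCC code: every name below is hypothetical. */
enum toy_place {
    TOY_STACK_ONLY,     /* no register assigned */
    TOY_REG_ONLY,       /* whole argument in registers */
    TOY_REG_AND_STACK,  /* register copy plus a stack copy */
    TOY_SPLIT           /* first bytes in registers, rest on the stack */
};

struct toy_arg {
    int has_reg;        /* stands in for "reg != 0" */
    int partial;        /* bytes passed in registers, 0 if not split */
    int pass_on_stack;  /* nonzero if a stack copy is required */
};

/* Classify one argument from the three fields. */
enum toy_place toy_classify(const struct toy_arg *a)
{
    assert(a->partial >= 0);
    if (!a->has_reg)
        return TOY_STACK_ONLY;
    if (a->partial > 0)
        return TOY_SPLIT;
    if (a->pass_on_stack)
        return TOY_REG_AND_STACK;
    return TOY_REG_ONLY;
}
```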
Generating a function call is impossible without certain machine-specific information, for example the number of hardware registers of different types. A number of macros defined in each architecture's .h file connect the middle-end with the back-end:
|INIT_CUMULATIVE_ARGS||Initialize CUMULATIVE_ARGS data structure for a call to a function whose data type is FNTYPE.|
|FUNCTION_ARG||Define where to put the arguments to a function. Value is zero to push the argument on the stack, or a hard register in which to store the argument.|
|FUNCTION_ARG_ADVANCE||Update the data in CUM to advance over an argument.|
Function init_cumulative_args() in file i386.c takes care of this for the x86 architecture. It takes into account the function attributes regparm and fastcall that a user might specify, in which case the number of available registers is set accordingly. However, if the function takes a variable number of arguments, all parameters are passed on the stack.
The location of each parameter is decided in function initialize_argument_information(). A machine-specific function_arg() returns the RTL for an argument if it goes into a register:
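The three macros cooperate like an iterator over the argument list. The sketch below models that protocol in plain C. It is a simplification and not the code of i386.c: struct toy_cum and the toy_* functions are hypothetical, arguments are measured in whole words, and -1 (rather than 0) signals "pass on the stack" because register number 0 is valid in this toy.

```c
#include <assert.h>

/* Toy model, not GCC code: every name below is hypothetical. */
struct toy_cum {
    int nregs;  /* argument registers still available */
    int regno;  /* next register to hand out */
};

/* In the role of INIT_CUMULATIVE_ARGS: regparm registers are available,
   unless the function is variadic, in which case none are. */
void toy_init_cumulative_args(struct toy_cum *cum, int regparm,
                              int is_variadic)
{
    assert(regparm >= 0);
    cum->nregs = is_variadic ? 0 : regparm;
    cum->regno = 0;
}

/* In the role of FUNCTION_ARG: register number for an argument of the
   given size in words, or -1 if it goes on the stack. */
int toy_function_arg(const struct toy_cum *cum, int words)
{
    if (words <= cum->nregs)
        return cum->regno;
    return -1;
}

/* In the role of FUNCTION_ARG_ADVANCE: step over one argument. */
void toy_function_arg_advance(struct toy_cum *cum, int words)
{
    cum->nregs -= words;
    cum->regno += words;
    if (cum->nregs <= 0) {
        cum->nregs = 0;
        cum->regno = 0;
    }
}
```

With regparm set to 3, four one-word arguments land in registers 0, 1 and 2, and the fourth goes on the stack. With the variadic flag set, every argument goes on the stack.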
ret = gen_rtx_REG (mode, regno);
It is possible that a parameter is passed both on the stack and in a register, for example if the parameter's type is addressable.
Depending on certain conditions, a sibling call instruction chain is generated in addition to the normal chain. Let us consider the case when only the normal chain is generated. Variable rtx argblock is the address of the space preallocated for stack parameters (on machines that lack push insns), or 0 if no space is preallocated.
A number of machine-specific variables shape the stack. ACCUMULATE_OUTGOING_ARGS instructs the compiler to preallocate, in the function prologue, enough bytes for the arguments of any function call made by the function. After that, function arguments are stored in that area without modifying the stack frame size. ACCUMULATE_OUTGOING_ARGS depends on variable target_flags, which in turn depends on the machine configuration and command-line options. The i386-specific variable
const int x86_accumulate_outgoing_args = m_ATHLON_K8 | m_PENT4 | m_NOCONA | m_PPRO;
and the command-line option -maccumulate-outgoing-args enable this feature. This means that it is enabled on a Pentium 4 and that push/pop instructions are not used to pass functions' parameters.
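The size of the preallocated area follows from the calls in the function body: it has to fit the most demanding one. A minimal sketch of that computation, assuming the per-call byte counts are already known, could look as follows. The function name is hypothetical and the code is not taken from GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not GCC code: the function name is hypothetical. */

/* call_arg_bytes[i] is the number of stack-argument bytes needed by the
   i-th call in a function body.  With a preallocated outgoing area the
   prologue reserves room for the largest of them once, so no call has
   to adjust the stack pointer on its own. */
size_t toy_outgoing_args_size(const size_t *call_arg_bytes, size_t ncalls)
{
    size_t max = 0;

    assert(call_arg_bytes != NULL || ncalls == 0);
    for (size_t i = 0; i < ncalls; i++)
        if (call_arg_bytes[i] > max)
            max = call_arg_bytes[i];
    return max;
}
```

For a body whose calls need 8, 24, 0 and 16 bytes of stack arguments, the prologue would reserve 24 bytes, and each call would store its arguments at fixed offsets inside that area.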
If we preallocated stack space, compute the address of each argument and store it into the ARGS array.
Function precompute_arguments() precomputes parameters as needed for a function call. This routine fills in the INITIAL_VALUE and VALUE fields for each precomputed argument.
Given a FNDECL and EXP, return an rtx suitable for use as a target address in a call instruction:
funexp = rtx_for_function_call (fndecl, addr);
Precompute all register parameters. It isn't safe to compute anything once we have started filling any specific hard regs.
precompute_register_parameters (num_actuals, args, &reg_parm_seen);
Now store (and compute if necessary) all non-register parms. These come before register parms, since they can require block-moves, which could clobber the registers used for register parms. Parms which have partial registers are not stored here, but we do preallocate space here if they want that.
store_one_arg (&args[i], argblock, flags, adjusted_args_size.var != 0, reg_parm_stack_space)
Do the register loads required for any wholly-register parms or any parms which are passed both on the stack and in a register. Their expressions were already evaluated.
load_register_parameters (args, num_actuals, &call_fusage, flags, pass == 0, &sibcall_failure);
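Why the stack stores have to come first can be seen in a toy machine where a block move needs a scratch register that is also an argument register. The model below is purely illustrative: struct toy_cpu and the toy_* functions are hypothetical and do not correspond to anything in GCC.

```c
#include <assert.h>

/* Toy CPU, not GCC code: every name below is hypothetical. */
struct toy_cpu {
    int regs[2];   /* regs[0] is both an argument register and scratch */
    int stack[4];  /* outgoing stack argument area */
};

/* A block move that needs a scratch register, here regs[0]. */
void toy_block_move(struct toy_cpu *cpu, const int *src, int n)
{
    assert(n >= 0 && n <= 4);
    for (int i = 0; i < n; i++) {
        cpu->regs[0] = src[i];         /* clobbers regs[0] */
        cpu->stack[i] = cpu->regs[0];
    }
}

/* Order described in the text: stack arguments first, registers last. */
void toy_stack_then_regs(struct toy_cpu *cpu, const int *blk, int n,
                         int reg_arg)
{
    toy_block_move(cpu, blk, n);
    cpu->regs[0] = reg_arg;
}

/* Reversed order, shown only to expose the clobber. */
void toy_regs_then_stack(struct toy_cpu *cpu, const int *blk, int n,
                         int reg_arg)
{
    cpu->regs[0] = reg_arg;
    toy_block_move(cpu, blk, n);
}
```

After toy_stack_then_regs() the register still holds the register argument. After toy_regs_then_stack() it holds the last word copied by the block move instead.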
Finally, emit_call_1() generates instructions to call function FUNEXP, and optionally pop the results. The CALL_INSN is the first insn generated.
Once the placement of the arguments is decided, variable struct args_size args_size holds the total size of the stack arguments. It records the size of a sequence of arguments as the sum of a tree expression and a constant. The tree part is necessary to handle arguments of variable size, for example array arguments whose size is not known at compile time. The C language itself does not allow variable-sized arguments.
One might wonder how the callee finds out where the arguments arrive. It uses the same machine-specific information as the caller: rerunning INIT_CUMULATIVE_ARGS, FUNCTION_ARG, and FUNCTION_ARG_ADVANCE in the callee decides whether an argument arrives in a register or on the stack, identically to expand_call().
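The "constant plus variable part" representation can be sketched in a few lines. This is an assumption-laden model rather than GCC's definition: struct toy_args_size and its helpers are hypothetical, and the variable part is modelled as a pointer to a run-time byte count where GCC would use a tree expression.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not GCC code: every name below is hypothetical. */
struct toy_args_size {
    long constant;    /* bytes known at compile time */
    const long *var;  /* run-time byte count, NULL if there is none;
                         stands in for the tree-expression part */
};

/* Account for one more argument of fixed size. */
void toy_add_constant(struct toy_args_size *size, long bytes)
{
    assert(bytes >= 0);
    size->constant += bytes;
}

/* Total size: only computable at run time when var is set. */
long toy_total_size(const struct toy_args_size *size)
{
    return size->constant + (size->var != NULL ? *size->var : 0);
}
```

As long as var stays NULL the total is a compile-time constant. Once a variable part is attached, the total has to be computed by generated code.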
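The agreement between caller and callee can be demonstrated with a toy machine in which both sides consult the same placement rule. Everything in the sketch is hypothetical (struct toy_machine, toy_locate() and the two-register convention). It only illustrates that running one deterministic rule on both sides is enough for the callee to find its arguments.

```c
#include <assert.h>

/* Toy model, not GCC code: every name below is hypothetical. */
#define TOY_NREGS 2
#define TOY_STACK_SLOTS 8

struct toy_machine {
    int regs[TOY_NREGS];
    int stack[TOY_STACK_SLOTS];
};

/* Shared placement rule: argument i lives in register i while registers
   last, otherwise in stack slot i - TOY_NREGS.  Returns 1 for a
   register and 0 for the stack; *where receives the index. */
int toy_locate(int i, int *where)
{
    assert(i >= 0 && i < TOY_NREGS + TOY_STACK_SLOTS);
    if (i < TOY_NREGS) {
        *where = i;
        return 1;
    }
    *where = i - TOY_NREGS;
    return 0;
}

/* Caller side: place each argument where the rule says. */
void toy_caller_store(struct toy_machine *m, const int *args, int n)
{
    for (int i = 0; i < n; i++) {
        int where;
        if (toy_locate(i, &where))
            m->regs[where] = args[i];
        else
            m->stack[where] = args[i];
    }
}

/* Callee side: rerun the same rule to find argument i. */
int toy_callee_load(const struct toy_machine *m, int i)
{
    int where;
    return toy_locate(i, &where) ? m->regs[where] : m->stack[where];
}
```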