If a program makes many subprogram calls, you can use the -Q option to turn on inlining, which reduces the overhead of such calls. Consider using the -p or -pg option with prof or gprof, respectively, to determine which subprograms are called most frequently, and then list their names on the command line with the -Q+ option.
To make inlining apply to calls where the calling and called subprograms are in different source files, include the -qipa option also.
# Let the compiler decide (relatively cautiously) what to inline
xlf95 -O3 -Q inline.f

# Encourage the compiler to inline particular subprograms
xlf95 -O3 -Q -Q+called_100_times:called_1000_times inline.f

# Extend the inlining to calls across files
xlf95 -O3 -Q -Q+called_100_times:called_1000_times -qipa inline.f
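Once a profile has identified the most frequently called subprograms, their names have to be joined with colons to form the -Q+ option. The following is a minimal sketch of that step, not part of the compiler: make_inline_opt is a hypothetical helper name, and the profiling commands appear only as comments because they need the compiler and a runnable program (a.out and gmon.out are assumed to be the default output names).

```shell
# Profiling steps, shown as comments only (assumed default file names):
#   xlf95 -O3 -pg inline.f
#   ./a.out
#   gprof a.out gmon.out
#
# Hypothetical helper: join subprogram names with ':' to form a -Q+ option.
# The function body runs in a subshell so the IFS change does not leak out.
make_inline_opt() (
    IFS=:
    printf '%s\n' "-Q+$*"
)

# Print (rather than run) the resulting compile command.
echo "xlf95 -O3 -Q $(make_inline_opt called_100_times called_1000_times) inline.f"
```

The command is echoed instead of executed so that it can be checked before it is used in a build.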
Related Information: See -Q Option and -qipa Option.
Getting the right amount of inlining for a particular program may require some work on your part. The compiler has a number of safeguards and limits to avoid doing an excessive amount of inlining. Otherwise, it might perform less overall optimization because of storage constraints during compilation, or the resulting program might be much larger, and run slower because of more frequent cache misses and page faults. However, these safeguards may prevent the compiler from inlining subprograms that you do want inlined. If this happens, you will need to do some analysis or rework or both to get the performance benefit.
As a general rule, consider identifying a few subprograms that are called most often, and inline only those subprograms.
Some common conditions that prevent -Q from inlining particular subprograms are:
Consider an example with three procedures: A is the caller, and B and C are at the upper size limit for automatic inlining. They are all in the same file, which is compiled like this:
xlf -Q -Q+c file.f
The -Q option means that calls to B or C can be inlined. -Q+c means that calls to C are more likely to be inlined. If B and C were twice as large, calls to B would not be inlined at all, while some calls to C could still be inlined.
Although these limits might prevent some calls from A to B or A to C from being inlined, the process starts over after the compiler finishes processing A.
To change the size limits that control inlining, you can specify -qipa=limit=n, where n is 0 through 9. Larger values allow more inlining.
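Finding a suitable limit may take a few trial builds. As a rough sketch, the hypothetical helper below (limit_cmds is an illustrative name, and file.f is the example file used above) prints one candidate compile command per limit value, so that each build can then be run and timed separately. It does not invoke the compiler itself.

```shell
# Hypothetical helper: print one candidate compile command per limit value.
# Nothing is compiled here; the commands are only echoed for inspection.
limit_cmds() {
    src=$1
    shift
    for n in "$@"; do
        echo "xlf -Q -qipa=limit=$n $src"
    done
}

# Example: compare a low, a middle, and the highest limit.
limit_cmds file.f 0 5 9
```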