controlling symbol ordering

As an experiment, I wanted to be able to control the order of the symbol addresses in the shared libraries we create in OpenOffice.org, e.g. libsw680li.so, the major component of writer (for i386 linux), to see if placing the methods used during a standard startup together made any difference to startup performance.

Firstly I need to know what those methods are:

gcc provides -finstrument-functions, which instruments the code to force a call to __cyg_profile_func_enter on entry to each function and to __cyg_profile_func_exit on exit, passing them the address of the function in question. In this case I made a little shared library for use with LD_PRELOAD which collects what was called, uses libbfd to get the names, and outputs them in order of execution. There are various more sophisticated optimal orderings mooted elsewhere, but let's keep it simple and output in call order.
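To make the idea concrete, here is a minimal sketch of what such a preload library might look like. It is not the actual tool: it only records each function address the first time it is entered and prints the raw addresses in first-call order. Resolving addresses to names (libbfd in my case) is left out, and MAX_SEEN and dump_call_order are names made up for the illustration.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_SEEN 65536 /* hypothetical cap on distinct functions recorded */

static void *seen[MAX_SEEN]; /* function addresses in first-call order */
static size_t n_seen;

/* The hooks themselves must not be instrumented, or they would recurse. */
__attribute__((no_instrument_function))
static int already_seen(void *fn)
{
    /* Linear scan: slow but simple, good enough for a sketch. */
    for (size_t i = 0; i < n_seen; i++)
        if (seen[i] == fn)
            return 1;
    return 0;
}

/* Called by code built with -finstrument-functions on every function entry. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)call_site;
    if (n_seen < MAX_SEEN && !already_seen(this_fn))
        seen[n_seen++] = this_fn;
}

/* Called on every function exit; nothing to do for a call-order trace. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
}

/* Hypothetical helper: print one address per line, return how many. */
__attribute__((no_instrument_function))
size_t dump_call_order(FILE *out)
{
    for (size_t i = 0; i < n_seen; i++)
        fprintf(out, "%p\n", seen[i]);
    return n_seen;
}
```

In a real preload library, dump_call_order would presumably be hooked up to run at exit, and the addresses would then be mapped back to symbol names before being fed to the next step.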

Secondly I need to be able to somehow control their ordering in the final shared object:

gcc provides -ffunction-sections to stick each function in a section of its own which, from some random googling, apparently enables one to create a custom linker script that specifies the order of the functions. Firefox has some simple tooling that takes the output of ld --verbose to get the default linker script and then munges the desired ordering of the sections into it, so as to generate a custom linker script which can be passed to ld with -T/--script.
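The munging step can be sketched roughly as follows. With -ffunction-sections, a function named foo ends up in an input section called .text.foo (for C++ the name is the mangled one), so an ordered list of symbols can be turned into an ordered list of input-section patterns. This is only an illustration of the idea, not the mozilla script: format_section_line and emit_section_order are hypothetical names, and it assumes the emitted lines get spliced into the .text output section of the default linker script ahead of its catch-all .text pattern.

```c
#include <stdio.h>
#include <string.h>

/* Format the linker-script input-section pattern for one function symbol. */
int format_section_line(char *buf, size_t len, const char *symbol)
{
    return snprintf(buf, len, "    *(.text.%s)", symbol);
}

/* Read one symbol per line from `in` (already in the desired order) and
 * write the matching patterns to `out`. Returns the number written. */
size_t emit_section_order(FILE *in, FILE *out)
{
    char symbol[1024];
    char line[1100];
    size_t count = 0;

    while (fgets(symbol, sizeof symbol, in)) {
        symbol[strcspn(symbol, "\r\n")] = '\0'; /* strip the newline */
        if (symbol[0] == '\0')
            continue;                           /* skip blank lines */
        format_section_line(line, sizeof line, symbol);
        fprintf(out, "%s\n", line);
        count++;
    }
    return count;
}
```

Calling emit_section_order(stdin, stdout) on the call-order list would give the block of lines to paste into the linker script; anything not listed still falls through to the default pattern and is placed as usual.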

So, taking the basic instrumenter and its output for a startup of writer built with -finstrument-functions, and munging that output through the mozilla mklinkscript to feed ld --script on a recompile with -ffunction-sections, does it achieve anything?

The results indicate a 0.12 second warm start improvement, which appears quite promising.

instrumenting and ldscript tools

3 Responses to “controlling symbol ordering”

  1. Scott Lamb says:

    Interesting project. There was a project (GNU Rope) to do something similar, but it’s apparently been abandoned.

    0.12 sec is what percent of the original startup time? How does this ordering compare to that produced by -freorder-functions when fed -fprofile-arcs output generated from app startup?

  2. caolan says:

    Yeah, indeed GNU Rope, but as you say it seems quite dead and there's no sign of its code anywhere for a casual look-see.

    0.12 seconds reduction from startup of 2.29 secs to 2.17 secs, so knocked 5.2% off the startup time

    The catch with -fprofile-generate is that it's really intended for a cycle of build with it, make tests, make clean, rebuild with -fprofile-use, which is unwieldy for something like OOo. Additionally, running OOo as part of the build to generate the data is tricky, so I haven’t done such a build yet. Though with my headless plugin (http://blogs.linux.ie/caolan/2007/05/04/headless-ooo) I’ll give it a whirl at some point.

  3. [...] Found some mentions of GNU Rope unfinishedware and a relatively recent blog post relevant to the subject. Posted by tglek [...]
