Harshdeep 2.0

May 14, 2007

Thou Shalt Rebase Thy DLL

Filed under: Dev — harshdeep @ 8:36 pm

A friend promised me a treat today if I could find a way to tell the base address at which a DLL has been loaded for a given process. (Dragon’s Den in Sector 15-A, Noida serves great Chinese 🙂 )

The short answer: use ListDlls from Sysinternals.

But thankfully, it took me some time before I could locate this nifty little tool. And while looking for the solution, I stumbled upon a lot of interesting information about DLL loading.

For starters, this CodeProject article concisely explains the preferred base address of a DLL, and how it affects the load time.

Every executable and DLL module has a preferred base address, which identifies the ideal memory address where the module should get mapped into a process’ address space. When you build an executable module, the linker sets the module’s preferred base address to 0x00400000. For a DLL module, the linker sets a preferred base address of 0x10000000.

The default preferred base addresses mentioned here are for Microsoft’s VC++ linker. They will most probably be different with Borland and other compilers. You can use the Dependency Walker to check the preferred base address of a DLL/exe.
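If you'd rather not fire up Dependency Walker, the preferred base address can also be read straight out of the file header. Here is a minimal Python sketch, assuming the standard PE layout: the `e_lfanew` field at offset 0x3C, then the PE signature, a 20-byte COFF header, and the optional header holding `ImageBase`. The function name `preferred_base` is my own invention, not part of any library.

```python
import struct

def preferred_base(data: bytes) -> int:
    """Return the ImageBase field of a PE file given its raw bytes."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew: file offset of the PE signature
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # Skip the 4-byte signature and the 20-byte COFF file header
    opt = pe_offset + 4 + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:      # PE32: 4-byte ImageBase at offset 28
        (base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:    # PE32+: 8-byte ImageBase at offset 24
        (base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError("unknown optional header magic")
    return base
```

Something like `hex(preferred_base(open("some.dll", "rb").read()))` would then print the value, where `some.dll` is a placeholder for whatever file you want to inspect.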

When the DLL is built, the addresses of its functions and global/static variables are hard-coded relative to the preferred base address. This works as long as the DLL can be loaded at that address.

But what if two DLLs used by your executable have the same base address?

If your application needs to load a DLL whose preferred load address conflicts with memory that’s already in use (such as by a previously-loaded DLL that had the same preferred load address), the operating system “rebases” the conflicting DLL by loading it at a different address that does not overlap and then by adjusting all addresses. The physical format of a .dll file includes relocation information that points to, for example, the target addresses of CALL and JMP instructions, and addresses that reference global/static variables (such as literal strings). All these addresses have to get revised if the operating system cannot load the DLL at its preferred load address.
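To make the "adjusting all addresses" part concrete, here is a toy Python sketch of what a fixup pass boils down to. It assumes 32-bit absolute addresses and a flat list of offsets that need patching; the real relocation table in a .dll is organized differently, and `apply_fixups` is a made-up name for illustration only.

```python
import struct

def apply_fixups(image: bytearray, fixup_offsets, preferred_base, actual_base):
    """Patch every 32-bit address listed in fixup_offsets by the load delta."""
    delta = actual_base - preferred_base
    for off in fixup_offsets:
        (addr,) = struct.unpack_from("<I", image, off)
        # Wrap to 32 bits, as a real 32-bit loader's arithmetic would
        struct.pack_into("<I", image, off, (addr + delta) & 0xFFFFFFFF)
    return image
```

Every address stored in the image moves by the same delta, which is why the loader only needs to know where the addresses are, not what they point to.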

These address fixups slow down the loading of the DLL, and they impose a penalty on pagefile usage as well. An old but highly relevant article by Ruediger Asche, Rebasing Win32 DLLs: The Whole Story, explains this very lucidly.

Whenever a page of the DLL is removed from an application’s working set, the operating system will reload that page from the DLL executable file the next time the page is accessed.

Of course, when a DLL is rebased, this scheme no longer works because the pages that contain relocated addresses differ from the corresponding pages in the DLL executable image. Thus, as soon as the operating system attempts to fix up an address when loading an executable file, the corresponding page is copied (because the section was opened with the COPY_ON_WRITE flag), all the changes are made to the copy, and the operating system makes a note that from now on the page is to be swapped from and to the system pagefile instead of the executable image.

There are two potential performance hits in this setup: First, each page that contains an address to be relocated takes up a page on the system pagefile (which will, in effect, reduce the amount of virtual memory available to all applications); and second, as the operating system performs the first fixup in a DLL’s page, a new page must be allocated from the pagefile, and the entire page is copied.
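A rough way to picture the pagefile cost: every page that contains at least one fixup becomes a private copy. The Python sketch below counts those pages, assuming 4 KB pages and 4-byte addresses (both are my assumptions, typical for 32-bit x86), and the function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages

def pages_touched(fixup_offsets, pointer_size=4, page_size=PAGE_SIZE):
    """Return the set of page numbers that contain any byte of a fixup."""
    pages = set()
    for off in fixup_offsets:
        pages.add(off // page_size)
        # An address that straddles a page boundary dirties both pages
        pages.add((off + pointer_size - 1) // page_size)
    return pages

def pagefile_cost(fixup_offsets, pointer_size=4, page_size=PAGE_SIZE):
    """Bytes of pagefile-backed memory needed if every touched page is copied."""
    return len(pages_touched(fixup_offsets, pointer_size, page_size)) * page_size
```

Note that the cost depends on how the fixups are spread out, not just on how many there are: a thousand fixups packed into one page cost one page, while a single fixup in each of a thousand pages costs a thousand.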

So base address conflicts are evil. How do you avoid them?

One way is to manually assign suitable preferred base addresses to all DLLs at build time using the /BASE linker option.

You can even take the strict approach and build the DLL with the /FIXED option. Now, if it can’t load at its preferred base address, it won’t load at all.

But that’s not all. You can change the base address of an already compiled DLL as well. The ReBaseImage function in Imagehlp.dll lets you do just that.

Thiadmer has used this to calculate the base address of a DLL by hashing its name. This technique has a good probability of assigning non-conflicting base addresses to the DLLs, but they are still being rebased in isolation and there is no guarantee that their base addresses won’t conflict.
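I haven't reproduced Thiadmer's actual hash here, but the idea can be sketched in a few lines of Python. Everything specific below is an assumption of mine: the CRC32 hash, the 0x60000000 to 0x70000000 range, and the name `base_from_name`. The only hard constraint is that the result is a multiple of 64 KB.

```python
import zlib

RANGE_START = 0x60000000  # hypothetical range reserved for our DLLs
RANGE_END = 0x70000000
ALIGNMENT = 0x10000       # image bases must be 64 KB aligned

def base_from_name(dll_name: str) -> int:
    """Derive a candidate base address from the DLL's file name."""
    slots = (RANGE_END - RANGE_START) // ALIGNMENT
    # Lower-case first so "Foo.dll" and "FOO.DLL" map to the same address
    digest = zlib.crc32(dll_name.lower().encode("utf-8"))
    return RANGE_START + (digest % slots) * ALIGNMENT
```

Since each name is hashed on its own, two names can still land in the same or overlapping slots, which is exactly the limitation mentioned above.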

This is where the EDITBIN utility comes in (it ships with the Visual C++ tools; the Platform SDK includes a similar standalone REBASE tool). You can use it with the /REBASE option to rebase a set of DLLs to non-conflicting base addresses. It also takes the size of each DLL into account when allotting base addresses, so the images are laid out without overlapping.
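To see why knowing the sizes helps, here is a small Python sketch that hands out base addresses one after another, rounding each DLL's size up to the next 64 KB boundary. This is not EDITBIN's actual algorithm, just the simplest scheme I can think of that guarantees no overlap; the start address and the name `assign_bases` are hypothetical.

```python
ALIGNMENT = 0x10000  # image bases must be 64 KB aligned

def round_up(size: int, alignment: int = ALIGNMENT) -> int:
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment

def assign_bases(dlls, start=0x60000000):
    """Map each (name, image_size) pair to a base address, packed upward from start."""
    bases = {}
    next_base = start
    for name, size in dlls:
        bases[name] = next_base
        next_base += round_up(size)
    return bases
```

Because the whole set is laid out in one pass, no two DLLs in the set can collide with each other, which is the guarantee the hashing approach cannot give.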

There are a couple more things that you can do to make sure that your application fires up in no time. In his investigations on the costs of DLL loading, Ruediger Asche came up with some interesting conclusions.

  • All other things being equal, the size of the DLL does not matter; that is, the costs for loading a small DLL and a large DLL are pretty much equal. Thus, if possible, you should avoid writing a lot of small DLLs and instead write fewer large DLLs if load time is an issue for you. Note that this observation holds true over a very wide range of DLL sizes—when I ran the test on the huge binary DLL I mentioned earlier (the one with 15,000 pages), the load time did not differ very much from the load time for the small DLL that contains six pages total.
  • Rebasing the DLL incurs an overhead of about 600 percent on Windows NT and around 400 percent on Windows 95. Note, however, that this implies a great number of fixups (34,000 in the sample suite). For a typical DLL, the number is much smaller on the average; for example, in the debug version of MFC30D.DLL, which ships with Visual C++ version 2.x, there are about 1700 fixups, which is about 5 percent of the 34,000 fixups in the sample suite.
  • The single biggest factor that slows down the loading of DLLs is the location of the DLL. The documentation for LoadLibrary describes the algorithm that the operating system uses for locating the DLL image; a DLL located at the first search position (the current directory) loads in typically 20 percent or less of the time as the same DLL located deep down in the path loads. It is fairly obvious that the exact load time difference depends a lot on the length of the path, the efficiency of the underlying file system, and the number of files and directories that need to be searched.

To wrap up, here is an interesting tidbit from The Old New Thing about how Windows 95 used to rebase its DLLs in the memory-starved conditions of the mid-90s.
