Regenerating the symbol table for a statically linked and stripped executable.

The Problem

Closed source executables that are to be widely distributed are typically statically linked to avoid problems with the end user not having a required library installed. Static linking increases the size of the executable, so it is usually also stripped (its symbol table removed) to reduce the size. While static linking and stripping are not program protection techniques per se, together they do make reverse engineering more difficult.

Reverse engineering a statically linked and stripped executable is harder than reverse engineering a dynamically linked one because, with no symbol names left, there is no easy way to distinguish application code from library code. Before the analyst can begin statically analysing the application itself, the library code must be positively identified. This is a large task, because a statically linked executable typically contains more library code than application code.

The Solution

The desired solution is to automate the identification of the statically linked library code. This is achieved by generating a signature representing the library code and then searching for that signature inside the executable. If the signature is found, then we know where that particular library code exists inside the executable.
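A minimal sketch of the idea, assuming a signature is simply an object file's code bytes with the relocated bytes masked out (the linker fills those in only at link time, so they differ from one executable to the next). This is an illustration in Python, not rsymtab's actual matching code; `make_signature` and `find_signature` are hypothetical names, and the naive scan is chosen for clarity rather than speed.

```python
from typing import List, Optional

# A signature is a list of byte values; None marks a wildcard position,
# e.g. a 4-byte relocation placeholder that the linker patches later.
Signature = List[Optional[int]]


def make_signature(code: bytes, reloc_offsets: List[int], reloc_size: int = 4) -> Signature:
    """Build a signature from an object's raw code bytes by masking every
    byte covered by a relocation (hypothetical helper)."""
    sig: Signature = list(code)
    for off in reloc_offsets:
        for i in range(off, min(off + reloc_size, len(sig))):
            sig[i] = None
    return sig


def find_signature(image: bytes, sig: Signature) -> List[int]:
    """Return every offset in `image` where `sig` matches, treating None
    entries as 'match any byte'. Naive O(n*m) scan."""
    matches = []
    for start in range(len(image) - len(sig) + 1):
        if all(s is None or image[start + i] == s for i, s in enumerate(sig)):
            matches.append(start)
    return matches


if __name__ == "__main__":
    # push $0xa; call <rel32>; ret -- the call displacement is a relocation at offset 3.
    obj_code = bytes([0x6A, 0x0A, 0xE8, 0, 0, 0, 0, 0xC3])
    sig = make_signature(obj_code, reloc_offsets=[3])
    image = b"\x90\x90" + bytes.fromhex("6a0ae809d10000c3") + b"\x90"
    print(find_signature(image, sig))  # [2]
```

Because the displacement bytes are wildcards, the same signature matches wherever this code was linked, whatever address the call ends up pointing to.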

The obvious downside to this is that copies of the library code must be available to do the analysis. This is not such a big problem, as most developers obtain their libraries prebuilt from a vendor; not many people compile their own version of libc. We (as analysts) just need to obtain a database of libraries from different vendors. (The Gentoo Linux distribution is an exception: its users download all packages as source and compile them with compiler options optimised for their own machines, so their library binaries are unlikely to match any vendor's prebuilt copy.)


rsymtab

The rsymtab suite is a collection of tools to assist in the identification of library code inside statically linked executables, and to automatically regenerate symbol table entries for that library code.

It uses the libbfd library, so it should be portable to all platforms that libbfd supports. However, it also inherits libbfd's limitations. For ELF executables, this means that they must have a valid section header table.
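The section header requirement can be checked directly from the ELF file header. The sketch below (Python, standard library only, not part of rsymtab; `has_section_header_table` is a hypothetical name) reads `e_shoff`, `e_shentsize` and `e_shnum` for 32- and 64-bit ELF files and reports whether the table they describe fits inside the file. It deliberately ignores the extended-numbering case (`e_shnum == 0` with the real count stored elsewhere), so a `False` result means "look closer", not proof of a broken file.

```python
import struct


def has_section_header_table(data: bytes) -> bool:
    """True if the ELF header in `data` points at a section header table
    lying inside the file. Does not validate the section headers themselves."""
    if len(data) < 16 or data[:4] != b"\x7fELF":
        return False
    ei_class, ei_data = data[4], data[5]
    if ei_data not in (1, 2):  # 1 = little-endian, 2 = big-endian
        return False
    endian = "<" if ei_data == 1 else ">"
    if ei_class == 1 and len(data) >= 52:
        # ELF32: e_shoff at offset 32, e_shentsize/e_shnum at 46/48
        (shoff,) = struct.unpack_from(endian + "I", data, 32)
        shentsize, shnum = struct.unpack_from(endian + "HH", data, 46)
    elif ei_class == 2 and len(data) >= 64:
        # ELF64: e_shoff at offset 40, e_shentsize/e_shnum at 58/60
        (shoff,) = struct.unpack_from(endian + "Q", data, 40)
        shentsize, shnum = struct.unpack_from(endian + "HH", data, 58)
    else:
        return False
    if shoff == 0 or shnum == 0 or shentsize == 0:
        return False  # no table recorded (extended numbering not handled)
    return shoff + shnum * shentsize <= len(data)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        print(has_section_header_table(f.read()))
```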

The suite consists of two tools so far, objgrep and gensymtab.

objgrep

The purpose of this tool is to determine whether an executable contains any of a specified collection of object files. These object files are the library files that may have been statically linked into the executable.
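Libraries such as libc.a are `ar` archives of individual object files, so each archive member has to be treated as a separate candidate object. A minimal Python reader for the common Unix archive format might look like this. It is an illustrative sketch, not rsymtab's code (`iter_ar_members` is a hypothetical name): it skips the GNU symbol index (`/`) and long-name table (`//`), returns GNU long names such as `/123` unresolved, and does not support BSD `#1/` names.

```python
from typing import Iterator, Tuple

AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60  # name(16) mtime(12) uid(6) gid(6) mode(8) size(10) fmag(2)


def iter_ar_members(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, contents) for each object file in an `ar` archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    pos = len(AR_MAGIC)
    while pos + HEADER_SIZE <= len(data):
        header = data[pos:pos + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError(f"bad member header at offset {pos}")
        name = header[0:16].decode("ascii").rstrip()
        size = int(header[48:58].decode("ascii").strip())
        body = data[pos + HEADER_SIZE:pos + HEADER_SIZE + size]
        # Skip the GNU symbol index and long-name table; strip GNU's '/' terminator.
        if name not in ("/", "//", "/SYM64/"):
            yield name.rstrip("/") or name, body
        # Member data is padded to an even offset.
        pos += HEADER_SIZE + size + (size & 1)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        for member, contents in iter_ar_members(f.read()):
            print(f"{member}: {len(contents)} bytes")
```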

gensymtab

Once objgrep has determined which object files were used to link the executable, gensymtab is used to copy the symbol tables of these object files into the executable. In this way the symbol table for the executable is partially rebuilt.
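Copying a symbol into the executable means translating its position. In a relocatable object file a symbol's value is an offset within its section, so once the bytes of that section have been located in the executable, the symbol's address is the match address plus that offset. A Python sketch of the arithmetic, assuming the match is reported as a file offset that must first be mapped to a virtual address through the executable's section table. The `Section` and `ObjSymbol` types, the helper names and the numbers in the usage are all hypothetical, not rsymtab's data structures.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Section:
    name: str
    vma: int          # virtual address the section is loaded at
    file_offset: int  # where its bytes start in the executable file
    size: int


@dataclass
class ObjSymbol:
    name: str
    offset: int  # symbol value: offset from the start of its section in the .o


def file_offset_to_vma(sections: List[Section], off: int) -> int:
    """Map a file offset inside some section to its virtual address."""
    for s in sections:
        if s.file_offset <= off < s.file_offset + s.size:
            return s.vma + (off - s.file_offset)
    raise ValueError(f"offset {off:#x} is not inside any section")


def rebase_symbols(match_offset: int, sections: List[Section],
                   symbols: List[ObjSymbol]) -> Dict[str, int]:
    """Executable addresses for the symbols of a matched object section."""
    base = file_offset_to_vma(sections, match_offset)
    return {sym.name: base + sym.offset for sym in symbols}


if __name__ == "__main__":
    text = Section(".text", vma=0x80480A0, file_offset=0xA0, size=0x20000)
    syms = [ObjSymbol("sleep", 0), ObjSymbol("hypothetical_label", 0x10)]
    for name, addr in rebase_symbols(0xD6CC, [text], syms).items():
        print(f"{addr:08x} {name}")
```

With `.text` loaded at 0x80480a0 from file offset 0xa0, a match at file offset 0xd6cc maps to 0x80556cc, so a symbol at offset 0 of the matched section lands there.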

From the above descriptions, we can see that the symbol table can be rebuilt only if the exact libraries used for linking can be obtained. This limitation can be offset by obtaining many different candidate libraries and testing each until a suitable match is found, as the example below demonstrates.

Download

Latest version 0.7 (2004-07-01).

rsymtab-0.7.tar.gz (source)

Example - from the README

This example is based on the executable the-binary from the Honeynet Project's Reverse Challenge.

After preliminary analysis, we have determined that the statically linked and stripped (Linux i386) executable was compiled with gcc 2.7.2.1.2 and linked against libc-5.3.12.
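The README does not say how the compiler and libc versions were established. One common starting point (an assumption here, not necessarily what was done for this binary) is the `.comment` section, where GCC records an identification string of the form `GCC: (GNU) <version>`; `strip` normally leaves that section in place. A crude strings-style scan in Python, with `gcc_versions` as a hypothetical helper name:

```python
import re
from typing import List

# GCC identification strings are NUL-terminated in the .comment section.
GCC_IDENT = re.compile(rb"GCC: \(GNU\) ([^\x00]+)")


def gcc_versions(data: bytes) -> List[str]:
    """Distinct version strings following 'GCC: (GNU) ' in `data`, in order
    of first appearance. Scans the whole file instead of parsing .comment,
    so it is a heuristic only."""
    seen: List[str] = []
    for m in GCC_IDENT.finditer(data):
        version = m.group(1).decode("ascii", errors="replace")
        if version not in seen:
            seen.append(version)
    return seen


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        print(gcc_versions(f.read()))
```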

Examining Linux distributions shows that Slackware 3.1 and Red Hat 4.x shipped with this version of libc5. Here we run objgrep against the libc.a from each distribution to find the most likely candidate:

$ objgrep --summary the-binary redhat-4.0/libc-5.3.12-8/libc.a 
found 155 matches.

$ objgrep --summary the-binary redhat-4.1/libc-5.3.12-17/libc.a 
found 111 matches.

$ objgrep --summary the-binary redhat-4.2/libc-5.3.12-18/libc.a 
found 111 matches.

$ objgrep --summary the-binary redhat-4.2/libc-5.3.12-18.2/libc.a
found 27 matches.

$ objgrep --summary the-binary redhat-4.2/libc-5.3.12-18.5/libc.a
found 27 matches.

$ objgrep --summary the-binary slackware-3.1/libc-5.3.12/libc.a 
found 173 matches.

From this we see that slackware-3.1 is the most likely candidate for the compile platform, as its libc.a gives the most matches (173).
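Choosing the candidate is just a matter of taking the largest match count, so with many candidate libraries it can be scripted. The Python sketch below shells out to `objgrep --summary` exactly as invoked above and extracts the count from its `found N matches.` line. It assumes objgrep is on the `PATH` and merges stderr into stdout, since which stream carries the summary is not stated; `parse_match_count`, `summary_count` and `best_candidate` are hypothetical helper names.

```python
import re
import subprocess
from typing import Dict, Optional

COUNT_RE = re.compile(r"found (\d+) matches\.")


def parse_match_count(output: str) -> Optional[int]:
    """Extract N from objgrep's 'found N matches.' line, or None if absent."""
    m = COUNT_RE.search(output)
    return int(m.group(1)) if m else None


def summary_count(binary: str, library: str) -> Optional[int]:
    """Run 'objgrep --summary BINARY LIBRARY' and return the match count."""
    proc = subprocess.run(
        ["objgrep", "--summary", binary, library],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # assumption: summary stream is undocumented
        universal_newlines=True,
    )
    return parse_match_count(proc.stdout)


def best_candidate(counts: Dict[str, int]) -> str:
    """Library path with the most matches (the first one wins on a tie)."""
    return max(counts, key=lambda lib: counts[lib])
```

Fed the counts from the runs above, `best_candidate` returns `slackware-3.1/libc-5.3.12/libc.a`.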

Now we do an exhaustive search for matches in all sections of the executable, using the Slackware 3.1 libgcc.a and libc files. The results of the search are written to the file out:

$ objgrep --verbose the-binary slackware-3.1/gcc-2.7.2/libgcc.a slackware-3.1/libc-5.3.12/* --output out
processing slackware-3.1/gcc-2.7.2/libgcc.a ... found 4 matches.
processing slackware-3.1/libc-5.3.12/crt1.o ... found 1 matches.
processing slackware-3.1/libc-5.3.12/crtbegin.o ... found 2 matches.
processing slackware-3.1/libc-5.3.12/crtbeginS.o ... found 0 matches.
processing slackware-3.1/libc-5.3.12/crtend.o ... found 2 matches.
processing slackware-3.1/libc-5.3.12/crtendS.o ... found 0 matches.
processing slackware-3.1/libc-5.3.12/crti.o ... found 0 matches.
processing slackware-3.1/libc-5.3.12/crtn.o ... found 2 matches.
processing slackware-3.1/libc-5.3.12/gcrt1.o ... found 0 matches.
processing slackware-3.1/libc-5.3.12/libbsd.a ... found 0 matches.
processing slackware-3.1/libc-5.3.12/libc.a ... found 234 matches.
processing slackware-3.1/libc-5.3.12/libcurses.a ... found 0 matches.
processing slackware-3.1/libc-5.3.12/libdb.a ... found 0 matches.
processing slackware-3.1/libc-5.3.12/libgmon.a ... found 0 matches.
processing slackware-3.1/libc-5.3.12/libieee.a ... found 0 matches.
processing slackware-3.1/libc-5.3.12/libm.a ... found 0 matches.
processing slackware-3.1/libc-5.3.12/libmcheck.a ... found 0 matches.
processing slackware-3.1/libc-5.3.12/libtermcap.a ... found 0 matches.

Now that the matches have been found, generate the symbol table:

$ gensymtab out

The symbol table should now have been partially regenerated. Let's check this by disassembling the executable. The results below show that it worked: library calls are now labelled with their symbol names.

$ objdump -d the-binary
[snip]
 80485bc:       6a 0a                   push   $0xa
 80485be:       e8 09 d1 00 00          call   80556cc <sleep>
 80485c3:       6a 09                   push   $0x9
 80485c5:       a1 70 e7 07 08          mov    0x807e770,%eax
 80485ca:       50                      push   %eax
 80485cb:       e8 e0 ec 00 00          call   80572b0 <__libc_kill>
 80485d0:       6a 00                   push   $0x0
 80485d2:       e8 e5 d9 00 00          call   8055fbc <exit>
[snip]
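The labels objdump prints come from two steps: it decodes each `call rel32` (opcode `e8`) by adding the little-endian signed 32-bit displacement to the address of the following instruction, then looks the resulting address up in the symbol table that gensymtab regenerated. Those displacement bytes are also what varies between executables, which is why signature matching has to ignore relocated bytes. A small Python sketch reproducing the three calls above; `call_target` and `describe` are hypothetical helper names, while the addresses and bytes are copied from the disassembly.

```python
import struct
from typing import Dict

# Addresses of the regenerated symbols, as shown in the objdump output.
RESTORED: Dict[int, str] = {
    0x80556CC: "sleep",
    0x80572B0: "__libc_kill",
    0x8055FBC: "exit",
}


def call_target(insn_addr: int, insn: bytes) -> int:
    """Destination of an i386 'call rel32': next-instruction address + disp32."""
    if len(insn) != 5 or insn[0] != 0xE8:
        raise ValueError("expected a 5-byte e8 call")
    (disp,) = struct.unpack("<i", insn[1:])
    return (insn_addr + len(insn) + disp) & 0xFFFFFFFF


def describe(insn_addr: int, insn: bytes, symtab: Dict[int, str]) -> str:
    """Render the call the way objdump does, with a name if one is known."""
    target = call_target(insn_addr, insn)
    name = symtab.get(target)
    return f"call {target:x} <{name}>" if name else f"call {target:x}"


if __name__ == "__main__":
    for addr, hexbytes in [(0x80485BE, "e809d10000"),
                           (0x80485CB, "e8e0ec0000"),
                           (0x80485D2, "e8e5d90000")]:
        print(describe(addr, bytes.fromhex(hexbytes), RESTORED))
    # call 80556cc <sleep>
    # call 80572b0 <__libc_kill>
    # call 8055fbc <exit>
```

Without the regenerated table (an empty `symtab`), the same calls would print only bare addresses, which is what the stripped executable showed before gensymtab was run.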