What You Need To Know About ELF for CS452

ELF stands for the Executable and Linkable Format. It was originally developed by Unix System Laboratories as part of the System V Application Binary Interface, but is now in widespread use in many other operating systems, such as Linux and FreeBSD.

Two views of the world

There are two views of an ELF file. The section view sees the file as a bunch of sections, which are to be linked or loaded in some manner. The program view sees the file as a bunch of ELF segments (not to be confused with Intel segments) which are to be loaded into memory in order to execute the program.

This split is designed so that someone writing a linker can easily get the information they need (using the section view), and someone writing a loader (that's you) can easily get the information they need (using the program view) without worrying about a lot of the complications of linking.

Because you are writing a loader, not a linker, you can completely ignore the section view. You only care about the program view. This throws away around 80% of the ELF spec. Doesn't that make you feel good?

Getting Started

The first thing you need to find is the ELF header. This header is pretty easy to find, since it will be at location zero of the ELF file you're attempting to load.

The ELF header is exposed in elf.h as the Elf32_Ehdr structure. The first 16 bytes of the structure are used to identify the ELF file. You should check that the first four bytes of these 16 bytes correspond to the values ELFMAG0 through ELFMAG3. These bytes are the ELF "magic number". This allows you to make sure that the ELF file is really a genuine ELF file.
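The magic-number check described above could look something like the following. This is only a sketch: the function name elf_has_magic is made up for illustration, and it assumes the whole file is already sitting in a buffer in memory. The EI_MAG0 through EI_MAG3 indices and the ELFMAG0 through ELFMAG3 values come from elf.h.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of elf.h): returns true if the buffer
 * `image` of `size` bytes is big enough to hold an ELF header and
 * starts with the four ELF magic bytes. */
bool elf_has_magic(const void *image, size_t size)
{
    /* e_ident is the first member of Elf32_Ehdr, so the identification
     * bytes start at offset zero of the file. */
    const unsigned char *ident = image;

    if (image == NULL || size < sizeof(Elf32_Ehdr))
        return false;

    return ident[EI_MAG0] == ELFMAG0 &&
           ident[EI_MAG1] == ELFMAG1 &&
           ident[EI_MAG2] == ELFMAG2 &&
           ident[EI_MAG3] == ELFMAG3;
}
```

If this returns false, the safest thing is probably to refuse to load the file at all rather than trust any other field in the header.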

You can more or less ignore all the other entries in the header (if you want, you can include other sanity checks, like making sure the machine type is correct) except for the e_phoff entry. This entry gives the location in the file (in bytes) of the program headers. It's the program headers that we're going to use to load the program into memory. The e_phnum entry is also important, as it tells you how many program headers are present.
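As a rough illustration of how e_phoff and e_phnum fit together, here is a sketch that fetches the i-th program header from a file image held in memory. The name elf_get_phdr is an invention for this example. The sketch also checks the e_phentsize field of the header, which is not discussed above; it assumes each entry is exactly sizeof(Elf32_Phdr) bytes and gives up otherwise.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies program header number `index` from the
 * file image into `out`. Returns 0 on success, or -1 if the index is
 * out of range or the header table does not fit inside the image. */
int elf_get_phdr(const unsigned char *image, size_t size,
                 unsigned index, Elf32_Phdr *out)
{
    Elf32_Ehdr eh;

    if (size < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);   /* the ELF header lives at offset 0 */

    /* Assumption: entries have the size of Elf32_Phdr. */
    if (eh.e_phentsize != sizeof(Elf32_Phdr))
        return -1;
    if (index >= eh.e_phnum)
        return -1;

    /* e_phoff is a byte offset from the start of the file. */
    if (eh.e_phoff > size)
        return -1;
    if ((size - eh.e_phoff) / sizeof(Elf32_Phdr) <= index)
        return -1;

    memcpy(out, image + eh.e_phoff + (size_t)index * sizeof(Elf32_Phdr),
           sizeof *out);
    return 0;
}
```

Calling this in a loop for index 0 up to e_phnum - 1 visits every program header in the file.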

The Program Headers

The linker script that we provide for user programs (programs which your kernel loads) generates exactly two program headers. One contains the code for the program (coming from the text section) and the other contains the program's static data. Both program headers will have a segment type of PT_LOAD (the other types describe things such as dynamic linking information, which you do not need in order to load these programs). The p_offset member tells you where the segment's data is located in the file. There are two sizes in each program header. p_filesz specifies how many bytes of data are in the file that need to be copied to memory. p_memsz specifies how much memory you should allocate to the segment. Note that p_memsz is always greater than or equal to p_filesz. For the segment containing the program's code, these values will be equal. Because programs may contain data that is zero-initialized (and hence doesn't need to take up any space in the file), p_memsz may be greater than p_filesz for the segment containing the program data. In that case you should copy the p_filesz bytes to memory, and set the remaining p_memsz - p_filesz bytes to zero.
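The copy-then-zero step can be sketched as follows. The function name elf_load_segment and the dest parameter are assumptions made for this example: dest stands for whatever memory you have set aside for the segment (at least p_memsz bytes), and deciding where that memory lives is deliberately left out here.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies one PT_LOAD segment from the file image
 * into `dest`, which the caller must have sized to at least p_memsz
 * bytes. Returns 0 on success, -1 if the header looks inconsistent. */
int elf_load_segment(const unsigned char *image, size_t size,
                     const Elf32_Phdr *ph, unsigned char *dest)
{
    if (ph->p_type != PT_LOAD)
        return -1;
    if (ph->p_filesz > ph->p_memsz)        /* should never happen */
        return -1;
    /* The p_filesz bytes starting at p_offset must lie inside the file. */
    if (ph->p_offset > size || size - ph->p_offset < ph->p_filesz)
        return -1;

    /* Copy the bytes that are actually stored in the file... */
    memcpy(dest, image + ph->p_offset, ph->p_filesz);
    /* ...and zero the rest (the zero-initialized data). */
    memset(dest + ph->p_filesz, 0, ph->p_memsz - ph->p_filesz);
    return 0;
}
```

For the code segment the memset ends up clearing zero bytes, so the same routine should work for both segments.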

Now you only need to know which program header corresponds to code, and which corresponds to data. To do this, look at the p_flags field, which is a set of bit flags. For the code segment this will be PF_R | PF_X (indicating a read-execute segment). For the data segment this will be PF_R | PF_W, indicating a read-write segment.
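One way to tell the two segments apart is sketched below. The enum seg_kind and the function elf_segment_kind are made-up names for illustration; only p_type, p_flags and the PF_* and PT_LOAD constants come from elf.h. The sketch assumes the flags are exactly the two combinations produced by the course linker script and reports anything else as "other".

```c
#include <elf.h>

/* Hypothetical classification of a program header. */
enum seg_kind { SEG_CODE, SEG_DATA, SEG_OTHER };

enum seg_kind elf_segment_kind(const Elf32_Phdr *ph)
{
    if (ph->p_type != PT_LOAD)
        return SEG_OTHER;
    if (ph->p_flags == (PF_R | PF_X))   /* read-execute: code */
        return SEG_CODE;
    if (ph->p_flags == (PF_R | PF_W))   /* read-write: data */
        return SEG_DATA;
    return SEG_OTHER;                   /* unexpected combination */
}
```

Treating an unexpected combination as an error, rather than guessing, makes it easier to notice if a program was built with a different linker script.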

This should be all you need to get the information you want from the ELF file. I haven't talked about exactly what you do with this information, as this was covered in class.

References

Stefanus Du Toit, September 26 2004. CS452 Homepage.