ELF stands for the Executable and Linkable Format. It was originally developed by Unix System Laboratories as part of the System V ABI, but is now in widespread use in many other operating systems, such as Linux and FreeBSD.
There are two views of an ELF file. The section view sees the file as a bunch of sections, which are to be linked or loaded in some manner. The program view sees the file as a bunch of ELF segments (not to be confused with Intel segments) which are to be loaded into memory in order to execute the program.
This split is designed to allow someone writing a linker to easily get the information they need (using the section view), and someone writing a loader (that's you) to easily get the information they need (using the program view) without worrying about a lot of the complications of linking.
Because you are writing a loader, not a linker, you can completely ignore the section view. You only care about the program view. This throws away around 80% of the ELF spec. Doesn't that make you feel good?
The first thing you need to find is the ELF header. This header is pretty easy to find, since it will be at location zero of the ELF file you're attempting to load.
The ELF header is exposed in elf.h as the Elf32_Ehdr structure. The first 16 bytes of the structure are used to identify the ELF file. You should check that the first four bytes of these 16 bytes correspond to the values ELFMAG0 through ELFMAG3. These bytes are the ELF "magic number". This allows you to make sure that the ELF file is really a genuine ELF file.
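The magic number check can be sketched roughly as follows. This is only an illustration: the function name elf_check_magic and the parameters image and size are made up here, and it assumes an elf.h that defines Elf32_Ehdr, the EI_MAG0 through EI_MAG3 indices and ELFMAG0 through ELFMAG3 the way the standard header does.

```c
#include <elf.h>     /* assumed: Elf32_Ehdr, EI_MAG0..EI_MAG3, ELFMAG0..ELFMAG3 */
#include <stddef.h>

/* Returns 1 if the buffer starts with the ELF magic number, 0 otherwise.
 * `image` and `size` are hypothetical names for the file contents in memory. */
int elf_check_magic(const void *image, size_t size)
{
    /* The identification bytes sit at the very start of the header,
     * which itself sits at offset zero of the file. */
    const unsigned char *ident = (const unsigned char *)image;

    if (image == NULL || size < sizeof(Elf32_Ehdr))
        return 0;   /* too small to hold an ELF header at all */

    return ident[EI_MAG0] == ELFMAG0 &&
           ident[EI_MAG1] == ELFMAG1 &&
           ident[EI_MAG2] == ELFMAG2 &&
           ident[EI_MAG3] == ELFMAG3;
}
```

A loader would call something like this before touching any other field, and refuse to load the file if it returns 0.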
You can more or less ignore all the other entries in the header (if you want, you can include other sanity checks, like making sure the machine type is correct) except for the e_phoff entry. This entry gives the location in the file (in bytes) of the program headers. It's the program headers that we're going to use to load the program into memory. The e_phnum entry is also important, as it tells you how many program headers are present.
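As a sketch of how e_phoff and e_phnum fit together, the helper below fetches one program header from a file image held in memory. The name elf_get_phdr and its parameters are invented for this example, and it assumes each entry in the table is exactly sizeof(Elf32_Phdr) bytes (it checks e_phentsize to confirm). In a kernel without a C library you would substitute your own copy routine for memcpy.

```c
#include <elf.h>     /* assumed: Elf32_Ehdr, Elf32_Phdr */
#include <stddef.h>
#include <string.h>

/* Copies program header number `index` out of the image into *out.
 * Returns 0 on success, -1 if the header table does not fit in the image.
 * All names here are hypothetical. */
int elf_get_phdr(const unsigned char *image, size_t size,
                 unsigned index, Elf32_Phdr *out)
{
    Elf32_Ehdr eh;
    size_t avail;

    if (size < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);      /* ELF header lives at file offset 0 */

    if (eh.e_phentsize != sizeof(Elf32_Phdr) || index >= eh.e_phnum)
        return -1;

    if (eh.e_phoff > size)
        return -1;                      /* table starts past the end of the file */
    avail = size - eh.e_phoff;          /* bytes available for the table */
    if (avail / sizeof(Elf32_Phdr) <= index)
        return -1;                      /* entry `index` would run off the end */

    /* Copy rather than cast, so no alignment of `image` is assumed. */
    memcpy(out, image + eh.e_phoff + (size_t)index * sizeof(Elf32_Phdr),
           sizeof *out);
    return 0;
}
```

Looping index from 0 up to e_phnum - 1 then visits every program header in the file.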
The linker script that we provide for user programs (programs which your kernel loads) generates exactly two program headers. One contains the code for the program (coming from the text section) and the other contains the program's static data. Both program headers will have a segment type of PT_LOAD (the other types are not needed for loading these programs). The p_offset member tells you where the segment's data is located in the file. There are two sizes in each program header. p_filesz specifies how many bytes of data are in the file that need to be copied to memory. p_memsz specifies how much memory you should allocate to the segment. Note that p_memsz is always greater than or equal to p_filesz. For the segment containing the program's code, these values will be equal. Because programs may contain data that is zero-initialized (and hence doesn't need to take up any space in the file), p_memsz may be greater than p_filesz for the segment containing the program data. In that case you should copy p_filesz bytes to memory, and set the rest to zero.
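The copy-then-zero step might look something like the sketch below. The function name elf_load_segment is hypothetical, and dest stands for memory that the caller has already set aside for the segment (at least p_memsz bytes); where that memory comes from is not covered here. memcpy and memset are used for brevity, so a kernel without a C library would need its own equivalents.

```c
#include <elf.h>     /* assumed: Elf32_Phdr, PT_LOAD */
#include <stddef.h>
#include <string.h>

/* Loads one PT_LOAD segment from the file image into `dest`.
 * `dest` is a hypothetical pointer to at least ph->p_memsz bytes.
 * Returns 0 on success, -1 if the header is inconsistent with the image. */
int elf_load_segment(const unsigned char *image, size_t size,
                     const Elf32_Phdr *ph, unsigned char *dest)
{
    if (ph->p_type != PT_LOAD)
        return -1;
    if (ph->p_filesz > ph->p_memsz)
        return -1;                      /* violates p_memsz >= p_filesz */
    if (ph->p_offset > size || size - ph->p_offset < ph->p_filesz)
        return -1;                      /* segment data runs past end of file */

    /* Copy the bytes that are actually stored in the file... */
    memcpy(dest, image + ph->p_offset, ph->p_filesz);
    /* ...and zero the remainder (the zero-initialized data). */
    memset(dest + ph->p_filesz, 0, ph->p_memsz - ph->p_filesz);
    return 0;
}
```

For the code segment the memset covers zero bytes, since p_filesz and p_memsz are equal there, so the same routine can serve both segments.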
Now you only need to know which program header corresponds to code, and which corresponds to data. To do this, look at the p_flags field. For the code segment this will be PF_R+PF_X (indicating a read-execute segment). For the data segment this will be PF_R+PF_W, indicating a read-write segment.
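One way to express the p_flags test in code is sketched here. The enum seg_kind and the function elf_segment_kind are names invented for the example; only Elf32_Phdr and the PF_R, PF_W and PF_X constants are assumed to come from elf.h.

```c
#include <elf.h>     /* assumed: Elf32_Phdr, PF_R, PF_W, PF_X */
#include <stdint.h>

/* Hypothetical labels for the two segments the linker script produces. */
enum seg_kind { SEG_CODE, SEG_DATA, SEG_OTHER };

enum seg_kind elf_segment_kind(const Elf32_Phdr *ph)
{
    /* Look only at the read/write/execute permission bits. */
    uint32_t rwx = ph->p_flags & (PF_R | PF_W | PF_X);

    if (rwx == (PF_R | PF_X))
        return SEG_CODE;    /* read-execute: program code */
    if (rwx == (PF_R | PF_W))
        return SEG_DATA;    /* read-write: program data */
    return SEG_OTHER;       /* not one of the two expected combinations */
}
```

Treating any other combination as SEG_OTHER gives the loader a place to reject files that were not built with the expected linker script.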
This should be all you need to get the information you want from the ELF file. I haven't talked about exactly what you do with this information, as this was covered in class.
/u/cs452/i586-3.3.3/include/cs452/elf.h contains a header file defining all the ELF structures.
/u/cs452/i586-3.3.3/examples/useful/loader.app.x contains the user program linker script.