BSD DevCenter


IRIX Binary Compatibility, Part 1

by Emmanuel Dreyfus

Author's Note: This article details the IRIX binary compatibility implementation for the NetBSD operating system. This includes the creation of a new emulation subsystem inside the NetBSD kernel and a lot of reverse engineering to understand and reproduce how IRIX internals work.

Because this article includes an introduction to all kernel subsystems involved with IRIX binary compatibility, we assume the reader has some experience in user-land Unix programming.

An Introduction to Binary Compatibility


Throughout this article, we reference various NetBSD kernel source files and NetBSD manual pages.

Kernel and User-Mode Overview

Unix systems have two distinct modes of operation, known as user mode and kernel (or system) mode. In user mode, the operating system (OS) executes code provided by users. This could be a Web browser, a computer science student's project, a Web server (in which case the user running the program is usually the system administrator), and so on. This code runs with limited privileges: it has limited access to the computer's memory, and usually no access at all to the hardware.

When running in kernel mode, the OS is only executing trusted code, which was loaded at boot time. This code is known as the OS kernel. The kernel has full access to the memory and hardware. It is here to provide services to user programs:

  • It gives user programs access to the hardware. It provides an abstraction layer, presenting files and terminals to user programs where in fact only zeros and ones exist on hard disk and display I/O controllers.
  • It periodically switches execution between several user programs (which are called processes), maintaining the illusion of multitasking.
  • It ensures that each user accesses only the resources that the user's privileges allow.

User processes call kernel code by issuing a trap. A trap is a hardware or software exception that suspends user process execution and gives control to kernel code. The kernel handles the exception, after which it may return to user mode and resume the execution of the user process, or it may destroy the user process. Examples of traps are division by zero, memory faults (accessing a virtual address where no physical memory is mapped), timer interrupts (which are used to switch between user processes), or requests by the user process to access some resource controlled by the kernel.

These requests can be opening a file, reading from a network connection, or creating a new process. The process does this by issuing a system call, like open(2), read(2), or fork(2). The system call is in fact a CPU instruction that causes a trap.

Here is an example of MIPS assembly to call the fork(2) system call on NetBSD:

li  $v0,2   # 2 is the system call number for fork()
            # v0 is the register holding the system call number
syscall     # syscall is the CPU instruction to do a system call

On the syscall instruction execution, the kernel executes a particular trap handler, which is known as the system call handler. For NetBSD/mips, it can be found in sys/arch/mips/mips/syscall.c:syscall_plain(). The system call handler expects an argument, which is the system call number. The system call handler uses a table, called the system call table, to look up a kernel function that will be called in order to complete the system call. On NetBSD, the system call table for native processes is generated from sys/kern/syscalls.master.

System calls are the way a user process requests action from the kernel, but there is also a mechanism used by the kernel to notify the user process of unusual conditions: signals. Signals are issued by various traps and system calls, to notify the process that it raised an exception: memory fault (the famous segmentation fault, well known to students learning C), division by zero and so on.

For each signal, the user process can choose to take the default action (by default, some signals abort the program and others are simply ignored), to ignore the signal, or to execute a function called a signal handler. This choice is made using the signal(3) library call or the sigaction(2) system call.

Binary Compatibility at a Glance

There is a clean separation between user mode and kernel mode. User processes run on top of the kernel with very little knowledge of what is inside a system call. All they do is issue system calls, expecting the behavior documented by kernel developers in a set of man pages. Most programs do not care about kernel internals and will just work if you change the kernel, as long as the system call behavior is left unchanged.

This is how NetBSD binary compatibility works. When launching a new program, the kernel is able to distinguish between native NetBSD binaries and, for example, Linux or FreeBSD binaries on NetBSD/i386. It then chooses an alternative system call table for this program, which contains the appropriate entries for the emulated OS. For instance, NetBSD/i386 uses sys/compat/linux/arch/i386/syscalls.master to provide the system call table for Linux binaries.

When a Linux binary running on NetBSD does a system call, the NetBSD kernel will run the appropriate function in the Linux system call table. This function emulates the behavior of the Linux system call so that the user program is fooled into thinking that it is running on the Linux kernel whereas it is in fact running on the NetBSD kernel.

Some system calls have the same behavior in NetBSD and in the emulated OS; in this case, the emulation system call table simply uses the corresponding native function. Sometimes the behavior is a bit different: for instance, some flags have different values, or the system call semantics differ. In this case, the system call table references an emulation function, which calls the native function after adapting the arguments and/or behavior. This is done, for instance, in sys/compat/linux/common/linux_misc.c:linux_sys_uname() for Linux uname(2) emulation. Last but not least, the emulated system call may have no native equivalent. The emulation function that implements the system call must then do all the work, or just act as if the work has been done and return, hoping that the user process will not notice the broken behavior (yes, sometimes it works).

The other part of the job is implementing signal emulation. Care should be taken to ensure that the signal handler is called in the same way the emulated OS would have called it. This work involves manipulating machine registers and assembly language, and hence it is quite machine dependent.
