ONLamp.com

Linux Compatibility on BSD for the PPC Platform: Part 5

Peek into the traced program

In this section we will explain the bug that prevented Linux's gdb from getting a backtrace on a traced program. In fact, ptrace() emulation was even more broken than this, because gdb was not even able to tell where the program stopped when it received a signal. The output was:



Program received signal SIGIO, I/O possible.
0x0 in ?? ()
gdb>

Here is a kernel trace of what gdb attempted to do in order to display the address and the name of the function where the traced process was stopped.

161 gdb   RET write 1
161 gdb   CALL  write(0x1,0x50374000,0x2d)
161 gdb   GIO   fd 1 wrote 45 bytes
   "Program received signal SIGIO, I/O possible.
   "
161 gdb   RET   write 45/0x2d
161 gdb   CALL  ptrace(PTRACE_PEEKUSER,0xa2,0x4,0x7fffdc3c)
161 gdb   RET   ptrace 2147477168/0x7fffe6b0
161 gdb   CALL  ptrace(PTRACE_PEEKUSER,0xa2,0x90,0x7fffdc0c)
161 gdb   RET   ptrace 268437452/0x100007cc
161 gdb   CALL  ptrace(PTRACE_PEEKTEXT,0xa2,0xfffffffc,0x7fffdc3c)
161 gdb   RET   ptrace 0
161 gdb   CALL  ptrace(PTRACE_PEEKTEXT,0xa2,0,0x7fffdc3c)
161 gdb   RET   ptrace -1 errno 22 Invalid argument
161 gdb   CALL  ptrace(PTRACE_PEEKTEXT,0xa2,0xfffffffc,0x7fffdc3c)
161 gdb   RET   ptrace 0
161 gdb   CALL  ptrace(PTRACE_PEEKTEXT,0xa2,0,0x7fffdc3c)
161 gdb   RET   ptrace -1 errno 22 Invalid argument
161 gdb   CALL  ptrace(PTRACE_PEEKTEXT,0xa2,0xffffbca0,0x7fffdc34)
161 gdb   RET   ptrace 0

The first PEEKUSER operation reads the register GPR4 (address 0x4 in Linux's U-dot zone). The returned value (0x7fffe6b0) is an address in the user stack, so it seems valid. The second PEEKUSER call reads the Link register (address 0x90 in Linux's U-dot zone). Here we get an address (0x100007cc) which is obviously located in the process text segment, so it also seems valid. Then gdb attempts to read the function names from the program text with PEEKTEXT commands, but there is obviously something wrong, because the requested address (0xfffffffc) is not located in the user address space (user addresses range from 0x00000000 to 0x7fffffff on NetBSD/PowerPC). The next PEEKTEXT attempts are even more malformed, and they fail with an invalid argument error.

The surprising thing was that the first two PEEKUSER calls seemed correct, while the next PEEKTEXT call, which used the PEEKUSER results, was obviously wrong. Using printf() in the kernel to display the correct values confirmed that the PEEKUSER results were right.

It seems that something is wrong in PEEKTEXT or PEEKUSER requests. Here we try the following sample program to check the PEEKTEXT operation.

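#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <signal.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Signal handlers receive the signal number as their argument. */
void handler (int sig) {
    (void)sig;
    printf ("in handler\n");
}

int main (int argc, char** argv) {
    unsigned int spot = 0x88888888;  /* marker word the parent peeks at */
    long err;
    pid_t pid;
    int status;
    long data;

    (void)argc;
    (void)argv;

    pid = fork();
    switch (pid) {
        case -1:
            perror ("fork failed");
            exit (EXIT_FAILURE);
        case 0:
            /* Child: set a recognizable value, then ask to be traced. */
            spot = 0x77777777;
            signal (SIGUSR1, handler);
            errno = 0;
            err = ptrace (PTRACE_TRACEME, 0, 0, 0);
            printf ("ptrace PTRACE_TRACEME returned %ld, errno=%d\n",
                err, errno);
            /* The signal stops the traced child and wakes up the parent. */
            kill (getpid(), SIGUSR1);
            sleep (1);
            printf ("child quitting\n");
            break;
        default:
            /* Parent: wait for the child to stop, then peek at its copy
             * of spot (same address in both processes after fork). */
            spot = 0x99999999;
            wait (&status);
            printf ("parent: PTRACE_PEEK on %d\n", (int)pid);
            /* A peeked word may legitimately be -1, so errno is the
             * only reliable error indicator. */
            errno = 0;
            data = ptrace (PTRACE_PEEKTEXT, pid, &spot, 0);
            if (errno != 0)
                printf ("ptrace returned %ld, errno=%d\n", data, errno);
            printf ("read 0x%lx\n", data);
            printf ("data=0x%lx\n&data=0x%lx\n", data, (unsigned long)&data);
            break;
    }
    return 0;
}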

This sample program was interesting because it exhibited result values for PEEKTEXT operations that were different in the program output and in the kernel trace. On the kernel trace, we had the correct value, and in the program output, the wrong value. The explanation of this kind of phenomenon is that the system-call wrapper in glibc altered the return value.

Looking at the glibc sources, the answer was obvious. The system call wrapper for ptrace() is defined in glibc/sysdeps/unix/sysv/linux/ptrace.c. Here is the relevant part of this wrapper:

if (request > 0 && request < 4)
  data = &ret;

res = INLINE_SYSCALL (ptrace, 4, request, pid, addr, data);
if (res >= 0 && request > 0 && request < 4)
  {
    __set_errno (0);
    return ret;
  }

return res;

The test on the request (strictly between 0 and 4) selects the PEEKUSER, PEEKTEXT, and PEEKDATA operations. For these three operations, glibc replaces the return value with the value stored at the location of the data argument. For other operations, the result is just the return value of the ptrace() system call. It is also interesting to look at the ptrace() implementation in the Linux kernel sources, in linux/arch/ppc/kernel/ptrace.c:sys_ptrace(), where we discover the same trick:

/* when I and D space are separate, these will need to be fixed. */
case PTRACE_PEEKTEXT: /* read word at location addr. */
case PTRACE_PEEKDATA: {
  unsigned long tmp;
  int copied;

  copied = access_process_vm(child, addr, &tmp, sizeof(tmp), 0);
  ret = -EIO;
  if (copied != sizeof(tmp))
    break;
  ret = put_user(tmp,(unsigned long *) data);
  break;
}

Here, for PEEKTEXT and PEEKDATA, the word read from the traced process is copied to the location given by the data argument, and the value handed back by the system call itself only reports whether that copy succeeded (it is the result of put_user()). As we saw, glibc then fetches the stored word and hands it to the calling program as the return value of ptrace().

The reason why Linux does this is probably that on most platforms, the Linux kernel uses negative return values to signal an error. We already had a look at this problem in part three of this series. Hence, on the i386, if ptrace() returned a value such as 0xfffffffe, glibc would see a negative value and would assume it is a negated error code. It would therefore set errno to the negation of 0xfffffffe, which is 2, and we would see an ENOENT error (ENOENT is errno 2). To avoid the problem, Linux must use this kludge with the data argument.

The bug here was that the Linux emulation of the ptrace() operations PEEKTEXT, PEEKDATA, and PEEKUSER did not reproduce this Linux-specific behavior. It just returned the requested value to userland instead of copying it to the location given by the data argument, which is where the glibc wrapper looks for it. This problem needed two fixes: one in machine-independent code, for the PEEKTEXT and PEEKDATA operations, and one in machine-dependent code, for PEEKUSER. Here is the fix for PEEKTEXT/PEEKDATA, in sys/compat/linux/common/ptrace.c:linux_sys_ptrace():

error = sys_ptrace(p, &pta, retval);
if (!error)
  switch (request) {
    case LINUX_PTRACE_PEEKTEXT:
    case LINUX_PTRACE_PEEKDATA:
      error = copyout (retval,
        (caddr_t)SCARG(&pta, data),
        sizeof retval);
      *retval = SCARG(&pta, data);
      break;
    default:
      break;
  }
return error;

The fix for the PEEKUSER operation is in sys/compat/linux/arch/powerpc/linux_ptrace.c, in linux_sys_ptrace_arch(), and it is similar.

With this fix done, Linux's gdb was fully functional. It was able to trace Linux programs and get a backtrace when a signal was caught. This functionality is especially useful because it helps us understand why Opera, or the JVM with native threads, fail with bus errors or segmentation faults.

Conclusion

In this series, we examined the different problems involved in porting Linux compatibility to NetBSD/PowerPC. Most of the problems described here were completely unexpected when I started to work on this project. My understanding of the problem was quite basic: it was just about remapping system calls. The conclusion may be that it is not mandatory to fully understand a kernel subsystem prior to starting work on it; you just need an idea of how it works so that you know where you are heading. There are a lot of things that can be learned along the way. Indeed, there are a number of things about kernels that I learned while working on Linux compatibility.

Acknowledgements

I would like to thank Manuel Bouyer, for giving me the first clue on Linux compatibility ("it works by remapping system calls"); the contributors to the NetBSD tech-kern and port-powerpc mailing lists, for supporting me when I was integrating the Linux compatibility code for NetBSD/PowerPC; Carl Alexander, for providing me with an account on a LinuxPPC machine; Kevin B. Hendricks, for his valuable help in tracking down the bugs that broke the JVM; Hubert Feyrer, Vincent Guillard, and Thomas Klausner, for reviewing this paper; and of course, the NetBSD community, without whom this paper would not even exist.

Emmanuel Dreyfus is a system and network administrator in Paris, France, and is currently a developer for NetBSD.

Previously in this series

Linux Compatibility on BSD for the PPC Platform: Part 4 -- Emmanuel Dreyfus explains difficulties discovered in porting the Linux compatibility layer to run the Java Virtual Machine.

Linux Compatibility on BSD for the PPC Platform: Part 3 -- Signals are the interactions between the kernel and the user program -- a program can't run without them. Emmanuel Dreyfus explains how to make your signals Linux-compatible.

Linux Compatibility on BSD for the PPC Platform: Part 2 -- Emmanuel Dreyfus takes a look at how to prevent dynamic Linux binary compatibility problems on the NetBSD/PowerPC platform.

Linux Compatibility on BSD for the PPC Platform -- The Linux compatibility layer allows BSD to run Linux binary applications. Emmanuel Dreyfus explains how he implemented this on NetBSD for the PowerPC.

