Coder Perfect

Interpreting segfault messages


What is the correct interpretation of the following segfault messages?

segfault at 10 ip 00007f9bebcca90d sp 00007fffb62705f0 error 4 in[7f9beb83a000+f6f000]
segfault at 10 ip 00007fa44d78890d sp 00007fff43f6b720 error 4 in[7fa44d2f8000+f6f000]
segfault at 11 ip 00007f2b0022acee sp 00007fff368ea610 error 4 in[7f2aff9f7000+f6f000]
segfault at 11 ip 00007f24b21adcee sp 00007fff7379ded0 error 4 in[7f24b197a000+f6f000]

Asked by knorv

Solution #1

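This is a segfault caused by following a null pointer. The faulting addresses (0x10 and 0x11) are most likely small offsets from a pointer that was expected to be valid but was actually 0, and error 4 marks the access as a user-mode read rather than an instruction fetch.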

If the crash is in the main program rather than a shared library, run addr2line -e yourSegfaultingProgram 00007f9bebcca90d (and repeat for the other instruction pointer values supplied) to find out where the error occurs. Better still, get a debug-instrumented build and reproduce the problem under a debugger such as gdb.

If the crash is in a shared library, you’re unfortunately screwed as far as the raw ip value goes; on its own it does not tell you where the dynamic linker put the library in memory. Reproduce the issue under gdb.

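The fields are as follows:

at: the memory address the code tried to access (here 10 and 11, in hexadecimal)
ip: the instruction pointer, i.e. the address of the code that made the access
sp: the stack pointer
error: the page fault error code
[base+size]: the start address and size of the mapping that contains ip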
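The fields can also be pulled apart mechanically. The following is a minimal sketch, not an official parser: the regular expression and the name parse_segfault are assumptions made for illustration, and the pattern only targets the x86 message layout quoted in the question. All numeric fields are treated as hexadecimal.

```python
import re

# Hypothetical pattern for the x86 "segfault at ..." kernel log line.
# The object name before "[" is optional, since it is absent in the
# lines quoted in the question.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) "
    r"error (?P<error>[0-9a-f]+)"
    r"(?: in\s*(?P<object>[^\[\s]*)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_segfault(line):
    """Return the message fields as a dict of ints, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    fields = {k: int(m.group(k), 16) for k in ("addr", "ip", "sp", "error")}
    if m.group("base") is not None:
        fields["base"] = int(m.group("base"), 16)
        fields["size"] = int(m.group("size"), 16)
        fields["object"] = m.group("object") or None  # None when no name is present
    return fields


line = ("segfault at 10 ip 00007f9bebcca90d sp 00007fffb62705f0 "
        "error 4 in[7f9beb83a000+f6f000]")
fields = parse_segfault(line)
print({k: hex(v) if isinstance(v, int) else v for k, v in fields.items()})
```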

Answered by Charles Duffy

Solution #2

Error 4 means “the cause was a user-mode read resulting in no page being found.”

The kernel’s definition of the bits is as follows. Note that 4 is 100 in binary, so bit 2 is set and all other bits are clear.

/*
 * Page fault error code bits
 *      bit 0 == 0 means no page found, 1 means protection fault
 *      bit 1 == 0 means read, 1 means write
 *      bit 2 == 0 means kernel, 1 means user-mode
 *      bit 3 == 1 means use of reserved bit detected
 *      bit 4 == 1 means fault was an instruction fetch
 */
#define PF_PROT         (1<<0)
#define PF_WRITE        (1<<1)
#define PF_USER         (1<<2)
#define PF_RSVD         (1<<3)
#define PF_INSTR        (1<<4)
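To make the bit test concrete, here is a small sketch that turns an error code into the phrases used in the kernel comment. The constants mirror the PF_* defines quoted here; the function name decode_pf_error and the wording of the returned strings are illustrative choices, not part of any kernel interface.

```python
# Bit values copied from the PF_* defines in the kernel excerpt.
PF_PROT = 1 << 0
PF_WRITE = 1 << 1
PF_USER = 1 << 2
PF_RSVD = 1 << 3
PF_INSTR = 1 << 4


def decode_pf_error(code):
    """Describe an x86 page fault error code as a list of short phrases."""
    parts = [
        "protection fault" if code & PF_PROT else "no page found",
        "write" if code & PF_WRITE else "read",
        "user-mode" if code & PF_USER else "kernel-mode",
    ]
    if code & PF_RSVD:
        parts.append("reserved bit set")
    if code & PF_INSTR:
        parts.append("instruction fetch")
    return parts


print(decode_pf_error(4))  # ['no page found', 'read', 'user-mode']
```

Since the kernel prints the error code in hexadecimal, a value taken from the log should be converted with int(text, 16) before being passed in.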

“ip 00007f9bebcca90d” means the instruction pointer was at 0x00007f9bebcca90d when the segfault occurred.

“[7f9beb83a000+f6f000]” tells you that the object was mapped at base address 7f9beb83a000 and that the mapping is f6f000 bytes long.

The offset into the object is obtained by subtracting the base address from the ip:

0x00007f9bebcca90d - 0x7f9beb83a000 = 0x49090D
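The same subtraction can be repeated for every reported crash. This is only a sketch of the arithmetic; the helper name object_offset is made up for illustration, and the (ip, base) pairs are copied from the four messages in the question.

```python
def object_offset(ip, base):
    """Offset of the faulting instruction within the mapped object."""
    if ip < base:
        raise ValueError("ip lies below the mapping base")
    return ip - base


# (ip, base) pairs taken from the four messages in the question.
crashes = [
    (0x00007F9BEBCCA90D, 0x7F9BEB83A000),
    (0x00007FA44D78890D, 0x7FA44D2F8000),
    (0x00007F2B0022ACEE, 0x7F2AFF9F7000),
    (0x00007F24B21ADCEE, 0x7F24B197A000),
]

for ip, base in crashes:
    print(hex(object_offset(ip, base)))
# 0x49090d
# 0x49090d
# 0x833cee
# 0x833cee
```

The absolute addresses differ between runs, but the offsets show that the four crashes come from just two code locations inside the object.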

You can then run addr2line on it:

addr2line -e /usr/lib64/qt45/lib/ -fCi 0x49090D

It didn’t work for me: either the copy I installed isn’t identical to yours, or it is stripped.

Answered by Tim

Solution #3

Let’s go to the source — 2.6.32, for example. The message is printed by show_signal_msg() function in arch/x86/mm/fault.c if the show_unhandled_signals sysctl is set.

The “error” value is not an errno or a signal number; it is a “page fault error code” as defined by enum x86_pf_error_code.

“[7fa44d2f8000+f6f000]” gives the starting address and size of the virtual memory area where the offending object was mapped at the time of the crash. The value of “ip” should fall within this range. With this information, finding the faulty code in gdb should be straightforward.
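The range claim is easy to check. The following sketch assumes the region is the half-open interval from base up to, but not including, base + size; the helper name ip_in_mapping is hypothetical, and the numbers are copied from the messages in the question.

```python
def ip_in_mapping(ip, base, size):
    """True if ip falls inside the half-open range [base, base + size)."""
    return base <= ip < base + size


# (ip, base, size) triples taken from the four messages in the question.
messages = [
    (0x00007F9BEBCCA90D, 0x7F9BEB83A000, 0xF6F000),
    (0x00007FA44D78890D, 0x7FA44D2F8000, 0xF6F000),
    (0x00007F2B0022ACEE, 0x7F2AFF9F7000, 0xF6F000),
    (0x00007F24B21ADCEE, 0x7F24B197A000, 0xF6F000),
]

for ip, base, size in messages:
    print(hex(ip), ip_in_mapping(ip, base, size))
```

Each of the four instruction pointers lies within its reported region.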

Answered by sendmoreinfo
