Debugging Cocoa Apps Using GDB, Assert, and More in Mac OS X

Posted by Submission on October 13, 2001

by H. Lally Singh

BSD has a long history of software development, and some powerful debugging tools have grown up along with it. The most powerful debugging tools available are (in order):

  1. Simple safeguards
  2. printf and its derivatives
  3. gdb and friends

 

// Simple Safeguards

We're all experienced programmers here, and I'm not going to patronize anybody by telling them 'program well,' or 'think structured.' That's all been said a hundred times. Instead of theory, here are some useful techniques.

Assert Every Damn Thing You Can Find
Once you've gone down the painful road that only a bad pointer can bring, the best thing you can do is to abort gracefully. What that means is try not to do any more damage and get out ASAP. NSAssert() and its C library equivalent, assert(), are very useful ways to check reality and get out if things don't look right. What NSAssert() does is check a condition and, if the condition is false, raise an exception with a message; left uncaught, that stops the program.

When I say assert every damn thing you can find, I mean assert every condition your code requires to run correctly (within reason). It's a good way to find out two things: first, when your code is being called incorrectly (bad parameters, bad system state, etc.), and second, when your interfaces don't match, that is, when clients of your code assume different things than the implementation allows (e.g., whether NULL is allowed as a parameter).

Example code is always good:

- (void)example:(int)index withPointer:(NSString *)foo
{
    // make sure we don't go beyond our array.. 
    // (yes, a very trivial example)
    // NSAssert's message must be an NSString, hence the @
    NSAssert(0 <= index && 
            index < m_arraylen && 
            foo != 0, 
            @"Bad Parameters in example");
    [foo access: m_array[index]];
}

What the NSAssert above does is make sure that the value index is within the bounds of zero and m_arraylen, and that the NSString pointer foo is nonzero. Note that even if foo were zero, the code wouldn't crash (a message sent to nil is simply ignored), but the method would do nothing, which nonetheless isn't good.
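
For plain C code, the same kind of precondition check can be sketched with assert() from <assert.h>. The function name element_at and its parameters are made up for this illustration. Keep in mind that assert() compiles away entirely when NDEBUG is defined, so the asserted expression should never have side effects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns array[index] after asserting its
   preconditions. The trailing string literal is a common idiom for
   attaching a message to assert(): a literal is always true, so it
   never changes the result of the && chain, but it shows up in the
   failure report. */
int element_at(const int *array, int len, int index)
{
    assert(array != NULL && "element_at: array is NULL");
    assert(0 <= index && index < len && "element_at: index out of range");
    return array[index];
}
```

Calling element_at with an index equal to len, or with a NULL array, stops the program at the assert and reports the file and line, which is usually a much shorter trail to follow than the crash you'd get later.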

Put Tripwires on Suspect Modifications
Sometimes you've got some variables that are getting changed against your will. This can happen in all kinds of odd places: local variables, object member data, structures in arrays, etc. To track down who's messing with your memory, put some additional variables before and after your data, and set them to a constant, and check them at suspect spots. Good suspect spots are at the beginning of a method call and after suspect calls to other methods.

And again, some example code:

struct dontmesswithmymemory {
#   define TRIPVAL 0xefbeadde
    unsigned long m_tripwire_inner;
    char data[128];
    unsigned long m_tripwire_outer;
};

void blah( struct dontmesswithmymemory * p ) {
    assert(p && "Bad p for blah");
    assert( p->m_tripwire_inner == p->m_tripwire_outer 
        && p->m_tripwire_outer == TRIPVAL );
    p->data[0]=0;
}

Note that this won't work for debugging issues where another thread gets into your memory - that's why multithreaded code is challenging :-)
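
To make the tripwire idea concrete end to end, here is a minimal sketch that also shows the arming step and a deliberately simulated overrun. All of the names (struct guarded, guarded_init, guarded_intact, guarded_check, simulate_overrun) are invented for this example and are not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TRIPVAL 0xefbeaddeUL

/* Guard words on both sides of the data we want to protect. */
struct guarded {
    unsigned long tripwire_inner;
    char data[128];
    unsigned long tripwire_outer;
};

/* Arm both tripwires and clear the payload. */
void guarded_init(struct guarded *g)
{
    g->tripwire_inner = TRIPVAL;
    memset(g->data, 0, sizeof g->data);
    g->tripwire_outer = TRIPVAL;
}

/* Returns 1 while both tripwires still hold TRIPVAL, 0 otherwise. */
int guarded_intact(const struct guarded *g)
{
    return g->tripwire_inner == TRIPVAL && g->tripwire_outer == TRIPVAL;
}

/* Drop this at suspect spots: it aborts as soon as a tripwire is hit. */
void guarded_check(const struct guarded *g)
{
    assert(g != NULL && "guarded_check: NULL pointer");
    assert(guarded_intact(g) && "tripwire clobbered");
}

/* Simulates a buggy caller that writes one byte past the end of data.
   The write goes through a pointer to the whole struct, so the example
   itself stays well-defined C. */
void simulate_overrun(struct guarded *g)
{
    unsigned char *bytes = (unsigned char *)g;
    size_t start = offsetof(struct guarded, data);
    size_t guard = offsetof(struct guarded, tripwire_outer);
    memset(bytes + start, 'x', guard - start + 1);
}
```

After guarded_init, guarded_intact returns 1; after simulate_overrun it returns 0, and a guarded_check placed after the suspect call would stop the program right there.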

 

// printf and Its Derivatives (Like NSLog)

There's one thing I can't stress enough as a good debugging aid: a few well-placed debug printouts can save a world of pain. Put them in key routines and add them to places that you suspect are giving you problems. Print out a few variables, method entries and exits, and a few other key pieces of data and you have a good running log of what your program is doing.

They also don't have to cost anything after you're done debugging. A simple level of indirection can be very useful: here's a wrapper function around NSLogv (the same trick works for printf or fprintf):

#ifdef DEBUG
void DPrintf(NSString *fmt,...) {
    va_list ap;
    va_start(ap,fmt);
    NSLogv(fmt,ap);
    va_end(ap);   // every va_start needs a matching va_end
}
#endif

The function would have a prototype defined as follows:

#ifdef DEBUG
void DPrintf(NSString *fmt,...);
#else
// static keeps each file's empty copy private, avoiding linker trouble
static inline void DPrintf(NSString *fmt,...) {}
#endif

So when you're not debugging (and DEBUG isn't defined), the function costs next to nothing: its empty body can be inlined and optimized away.
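
A preprocessor macro is another way to get the same switch, and it guarantees that the calls disappear when DEBUG is off. This is only a sketch, not the wrapper above: DPRINTF and sum_to are invented names, it writes to stderr with fprintf rather than NSLogv, and it assumes a compiler that supports C99 variadic macros.

```c
#include <assert.h>
#include <stdio.h>

/* DPRINTF is a hypothetical name. With DEBUG defined it forwards to
   fprintf(stderr, ...); without it, the call and its arguments are
   removed by the preprocessor. */
#ifdef DEBUG
#define DPRINTF(...) fprintf(stderr, __VA_ARGS__)
#else
#define DPRINTF(...) ((void)0)
#endif

/* Example routine instrumented with debug printouts:
   returns 1 + 2 + ... + n. */
int sum_to(int n)
{
    assert(n >= 0 && "sum_to: n must not be negative");
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
        DPRINTF("sum_to: i=%d total=%d\n", i, total);
    }
    return total;
}
```

Compile with -DDEBUG to see the running log; leave it off and the loop contains no trace of the printout. The one thing to watch with the macro form is that the arguments are not evaluated at all when DEBUG is off, so they must not contain side effects the program depends on.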

 

// Debugging With gdb

I can guarantee you that the tips above (and any others that help identify and trap bugs) will be far more useful than any debugger will ever be - you control the code, and you can make it tell you what's going on. I don't mean to patronize you with all these ways to avoid good old-fashioned debugging, but I find myself and many of my colleagues forgetting these simple truths.

With all the warnings and alternatives mentioned, we can get on to the debugging.

Let's get ourselves an interesting example to work with, something worth our time. I like to play with cryptography, and in cryptography, a dictionary is useful. Some subroutines that let me check if certain strings are English words can be very useful in filtering 2GB of output.

Luckily, Mac OS X ships with a dictionary: /usr/share/dict/words. It's an alphabetized list of 234,937 words with one per line. So, all we have to do is parse it into something quickly accessible. Instead of reading in and allocating 2.4 megabytes of data, there's a much, much easier way: mmap.

mmap lets you have a file's contents inserted directly into your address space - think of it as a really, really efficient way of reading a file into an array that you never allocate and that doesn't actually take up memory (in the normal sense).
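
As a small warm-up, here is a sketch of that idea on its own: it maps a file read-only and counts its newline characters, treating the mapping as an ordinary char array. The helper name count_lines_mmap is made up for this example, and error handling is kept to the minimum needed to be safe.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: counts '\n' bytes in a file by mapping it
   read-only instead of read()ing it into a buffer.
   Returns the count, or -1 on error. */
long count_lines_mmap(const char *path)
{
    assert(path != NULL && "count_lines_mmap: path is NULL");

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror(path);
        return -1;
    }

    struct stat st;
    if (fstat(fd, &st) < 0) {
        perror("fstat");
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap rejects zero-length mappings */
        close(fd);
        return 0;
    }

    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    long lines = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (p[i] == '\n')
            lines++;

    munmap(p, (size_t)st.st_size);
    return lines;
}
```

Run against a word list with one word per line, this should agree with what 'wc -l' reports for the same file.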

We've got slightly more than a trivial amount of code, with a touch of complexity: important system calls that may fail (mmap is touchy about permissions), loops, pointer arithmetic, and even a favorite CS algorithm. We'll only touch on it with some trivial examples here, but it's a decent place to test different debugging techniques. Try some at home: this code works (at least in the good half hour I tested it :-) and can be used as a good playground for learning gdb. Call the routine init_dictionary once at the beginning of your program and then find_word with a few strings to see what's in the dictionary.

Here's some source:

#import <Foundation/Foundation.h>
#import <string.h>
#import <stdlib.h>
#import <stdio.h>
#import <sys/stat.h>
#import <sys/types.h>
#import <sys/mman.h>
#import <ctype.h>
#import <fcntl.h>
#import <errno.h>

// output of 'wc -l /usr/share/dict/words'
#define NWORDS 234937

// the dictionary file (swap in a copy with looser permissions if needed)
#define DICT "/usr/share/dict/words"

// pointers into an mmap'd word file
static char * s_words[NWORDS];
static int nr_words;

// handle of the mmap'd file
static int s_fd;

// opens the dictionary, mmap's it, and then stores
// pointers into it in our list. Note: the dictionary
// is already sorted, and we keep it that way
int init_dictionary() {
    char *p, *e;    // ptrs to begin and end of file region
    int i;          // loop index
    struct stat st; // file status (for getting file length)

    // open the file
    if ((s_fd = open(DICT, O_RDONLY))<0) {
        perror(DICT);
        return errno;
    } 

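    // get its length
    if (fstat(s_fd,&st)<0) {
        perror("fstat");
        return errno;
    }

    // map it into our address space; MAP_PRIVATE keeps our
    // writes (the terminators below) out of the file itself
    p = (char*) mmap(0, st.st_size, 
               PROT_READ | PROT_WRITE, 
               MAP_FILE | MAP_PRIVATE, 
               s_fd, 0);

    // check return of mmap
    if (p==MAP_FAILED) {
        perror("mmap");
        return errno;
    }

    // calculate our end address
    e = p + st.st_size;

    // add words into our dictionary, never reading past e
    for(i=0; i < NWORDS && p < e; i++) {
        while (p < e && isspace((unsigned char)*p))
            p++;
        if (p == e)
            break;
        nr_words++;
        s_words[i]=p;
        while (p < e && !isspace((unsigned char)*p))
            p++;
        if (p < e)
            *p++=0;
    }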
    printf("(%d words loaded)\n", nr_words);
    return 0;
}

// binary search for a word w in the dictionary.
int find_word( NSString * s ) {
    int lo, mid, hi,r;
    char * w;
    w = [s cString];
    lo = 0;
    hi = NWORDS-1;
    while (lo < hi) {
        mid=(hi-lo)/2 + lo;
        r = strcasecmp(w,s_words[mid]);
        if (r==0) {
            return 1;
        } else if (r < 0) {
            hi = mid-1;
        } else {
            lo=mid+1;
        }
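    }
    // the loop stops before testing the last remaining
    // candidate, so check it here
    if (lo == hi && strcasecmp(w,s_words[lo]) == 0)
        return 1;
    return 0;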
}
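
To experiment with the search outside a Cocoa project, here is a plain-C sketch of the same binary search over a small in-memory array. The name find_in_sorted and its parameters are invented for this example, and it uses inclusive bounds with a lo <= hi loop so that the last remaining candidate is always compared.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical stand-alone version of the lookup: returns 1 if word is
   in the sorted array words[0..n-1], 0 otherwise. The comparison is
   case-insensitive, so the array must be sorted the same way. */
int find_in_sorted(const char *const *words, int n, const char *word)
{
    assert(words != NULL && word != NULL);
    int lo = 0;
    int hi = n - 1;              /* lo and hi are inclusive bounds */
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int r = strcasecmp(word, words[mid]);
        if (r == 0)
            return 1;            /* found it */
        else if (r < 0)
            hi = mid - 1;        /* word sorts before words[mid] */
        else
            lo = mid + 1;        /* word sorts after words[mid] */
    }
    return 0;                    /* range is empty: not present */
}
```

A five-word array is enough to step through every branch in the debugger: look up the first word, the last word, and one that isn't there.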

I'll assume that most people want to start off in Project Builder. I'll also assume that most people can figure out how to use the basic features of the (rather sparse) debugging abilities within it. Basically, you can set breakpoints and follow the code's execution. You get a quick view of the stack frame, some local vars and arguments, and also the ability to switch threads.

This is really a disgusting underutilization of gdb's abilities. Click on the "Debug" tab, and then click on the "Console" tab. This is gdb's command line - behold the beauty of the BSD core seeping underneath OS X. It is your new best friend (if you're here, then your old best friends assert() and printf() have failed you, and you should demote them to acquaintances).

Mad gdb Skillz
To convince you that gdb is worthy of your acquaintance, I'll show one of the advanced features of gdb: conditional breakpoints. Other advanced features include remote debugging over a serial line and the ability to debug a running kernel (very platform dependent). Let's get on with the demonstration.

After hitting the 'debug' button, gdb's console has:

GNU gdb 5.0-20001113 (Apple version gdb-186.1) (Sun Feb 18 01:18:32 GMT 2001) (UI_OUT)
run
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "powerpc-apple-macos10".
Reading symbols for shared libraries ... done
[Switching to thread 1 (process 1383 thread 0x1a03)]
Reading symbols for shared libraries .................................... done
(gdb) 

It's ready to do your bidding. Feel free to use the normal code windows to set up breakpoints. Let's start off debugging the binary search routine: find_word. Let's put a breakpoint on the line mid=(hi-lo)/2 + lo;. Just click on the left margin to the side of the line and the breakpoint is set. Then, find out what the breakpoint number is with the 'info breakpoints' command:

(gdb) info breakpoints
Number Type           Disposition Enabled Address    WhatStackFrame Condition IgnoreCount Commands 
3      breakpoint     keep        y   0x00003ac0 in find_word at score.m    :90        
(gdb)

In my case this breakpoint is breakpoint #3. I had two others that I have since removed. Now, here is one of the more interesting gdb commands: 'condition'. What it lets you do is make a breakpoint conditional. For example, I want to see how the loop runs when hi-lo is 1. So, I tell gdb that I only want to break when lo==(hi-1). I do this with:

(gdb) condition 3 lo==(hi-1)

And another info breakpoints reports:

(gdb) info breakpoints
Number Type           Disposition Enabled Address    WhatStackFrame Condition IgnoreCount Commands 
3      breakpoint     keep        y   0x00003ac0 in find_word at score.m    :90        
    stop only if lo == hi - 1 
(gdb) 

So, now I tell it to continue on, either by hitting the continue button in the toolbar, or by typing 'continue' (or just 'c' - you can abbreviate in gdb whenever it's unambiguous).

The breakpoint only gets hit if lo and hi differ by just one, which doesn't always happen (sometimes the middle string is the right one). This really helps when I have programs that have to go through many iterations before the problems appear.

Using gdb
Now that you've seen how powerful gdb can be, let's go over what's necessary to use it in normal, everyday life. gdb is obviously a text based interface, but that's not a bad thing at all. You can abbreviate any command to any prefix of it, as long as it's not ambiguous (e.g. 'run' can be abbreviated as 'r'). You have a full history of your commands, and can scroll through them with the up and down arrow keys. You can use Project Builder to peruse your code and set breakpoints, and then do your real examination with the gdb console.

Let's talk about the stack. As you most likely already know, the stack records each procedure's state and local variables at the point where it calls another procedure. One of the first reflexes I have when a program crashes is to run it in gdb and do a 'backtrace' (abbreviated 'bt'), which prints out the stack from the innermost procedure call to the outermost one. Here's an example...

Breakpoint 1, find_word (s=0xea000) at score.m:87
87              w = [s cString];
(gdb) bt
#0  find_word (s=0xea000) at score.m:87
#1  0x000036a0 in main (argc=1, argv=0xbffffb2c) at main.m:8
#2  0x000035ac in _start ()
#3  0x000033ec in start ()
#4  0x00000001 in ?? ()
(gdb) 

So breakpoint 1 was hit in find_word line 87 of score.m, which was called by main in line 8 of main.m. I can hit the 'list' command (abbreviated with 'l') to see what line I'm stopped on...

(gdb) l
83      // binary search for a word w in the dictionary.
84      int find_word( NSString * s ) {
85              int lo, mid, hi,r;
86              char * w;
87              w = [s cString];
88              lo = 0;
89              hi = NWORDS-1;
90              while (lo<hi) {
91                      mid=(hi-lo)/2 + lo;
(gdb) 

Line 87, 'w = [s cString];' is where my breakpoint went off. If I want to go up the stack, I just type 'up', and I go to find_word's caller: main.

(gdb) up
#1  0x000036a0 in main (argc=1, argv=0xbffffb2c) at main.m:8
8           b = find_word(@"hello");
(gdb) 

Say I'm curious to see what s passes back from its cString method. I just hit 'next' (or 'n') to move past line 87 of score.m, and then I can just check what w's value is:

(gdb) n
88              lo = 0;
(gdb) p w
$1 = 0x3ce0 "hello\000"...
(gdb) 

That looks right: a null-terminated string matching the parameter I passed from main ("hello").

I know you're getting bored at this point, and so am I. Debugging isn't too much fun most of the time, and that's why I recommend the other techniques to avoid going through this drudgery. So, I've given you a little taste of what interacting with gdb is like, and a glimpse of what it can do for you. I'll finish up my discussion of it with a quick reference to the most commonly used features:

  • help - Just say 'help' and it'll tell you what commands are available, and can provide in depth help on specific topics.

  • print - Print out specific variables or the values of various expressions. Also, you can assign expressions to convenience variables, and then just use the name of the convenience variable instead of the expression. You can abbreviate it with 'p'. Here are some examples:
    • print foo - prints out the value of variable foo
    • p *foo - prints out the value pointed to by foo
    • print *(int*) foo - prints out an int that foo points to (even if foo isn't an int*)
    • print $x=*foo - saves the value of *foo in the convenience variable $x, and prints it out.
    • print $x - prints out the value saved in $x (what foo pointed to when you assigned $x=*foo as above)

  • list - Lists some code. You can either list a function or a source file. To list a function, simply supply the function name. To list a file, provide the filename, a colon, and a line number. You can abbreviate it with an 'l'. Here are some examples:
    • list find_word - prints out the first few lines of find_word
    • l main.m:1 - prints out the first few lines of main.m

  • break - Sets up a breakpoint. Give it a line number if the current file is the right one (listing a source file sets the current file), or prefix it with a filename and a colon. Abbreviate it with just a 'b'. And again, some examples:
    • b 98 - puts a breakpoint on line 98 of the current file
    • break foo.m:9 - puts a breakpoint on line 9 of foo.m

  • set args arg1 arg2 .. argN - Set the program arguments for future executions.

  • run - Run the program.

  • continue - Continue after a break.

  • kill - Kill the running program.

  • next - Go on to the next line of code, stepping over any function calls.

  • step - Go on to the next line of code, stepping into any function it calls.

  • condition - Set a breakpoint conditional upon an expression.

  • info breakpoints - Show breakpoints and their status.

  • <enter> - Repeat the last command (useful for next, step, list, and a few others).

 

// Just the Beginning...

To conclude, some common practices and simple tricks can save us a lot of debugging time. However, when those tricks have run out, we have one great debugger on our side. Also, OS X is just beginning to see the development support coming in from its UNIX roots. Soon we'll see debugging memory allocators (to help find heap problems and leaks), specialized debuggers, and profiling tools. I haven't seen much yet past what comes on the Developer's CD for OS X, so we'll have to wait a bit before they come out.

DDD Screenshot
There is one UNIX tool I have to mention specifically: DDD.

DDD is a wonderful tool for debugging programs with large datasets or complex data structures. Because it shows you your data, it's also a great debugging tool for those just getting started with programming. DDD, the Data Display Debugger, lets you graphically display your data structures and arrays. This means that you can see the structure of your binary trees as they exist in memory at any point in your program!

Fink has DDD all packaged up for your downloading pleasure (note: if you haven't installed Fink before, set aside some time, disk space, and bandwidth!).

And finally, Max Horn pointed out a very handy Foundation header: /System/Library/Frameworks/Foundation.framework/Headers/NSDebug.h. It's completely self-documenting & has many tools to help you track allocation, reference counting, and stack information.


Comments

Hi,
I read your article today, and thanks for writing this. You also wrote about ddd as a completed package, but I cannot find it. OK, we are now in 2003, but maybe you know where I can find a binary, without loading Fink and learning Fink before I can start using ddd :-)
hope you can help me out.

thnx

René

Posted by: rene Amerongen on March 9, 2003 04:21 PM