Running Multiple Threads in Cocoa

Posted by Submission on November 04, 2001

by H. Lally Singh


Multithreaded programming is an essential tool in modern programming. Its use permeates all types of software, from embedded systems to the desktop all the way up to supercomputing applications. Unfortunately, it also has a reputation for being hard, which really isn't deserved. All it requires is some careful forethought and a good understanding of what's going on.

A thread is essentially another place where your code is running within the same program. Every program has at least one thread, and some have more. Threads come up all the time and are common practice in most nontrivial programming tasks.

There are many reasons to have multiple threads in your application:

  1. Support multiple processors
  2. Long running background computational tasks
  3. Slow I/O
  4. Many I/O tasks
  5. Separate tasks that need to run simultaneously

All of these reasons boil down to one of three: I want to use more than one CPU, my code makes much more sense as independent tasks, or I could do this better if this operation didn't block. Many of the reasons listed above fall into the last category. Much of a program's time is spent waiting for something to happen. Most user applications sit there and wait for the user to do something, like a mouse movement or a click; until then, they block. A blocked thread sits idle, letting other programs use the CPU, until whatever it's waiting for finally happens.


// Memory Layout

For a single-threaded program, the kind you write when you're not multithreaded (duh), your memory layout looks like this:

Single Threaded Program Memory Layout
Memory layout for a single-threaded process.
An address space and one thread with all its memory mapped to it.

For a multithreaded program, the memory layout looks like this:

Multithreaded Program Memory Layout
Memory layout for a multithreaded process.
An address space and multiple threads sharing it.

The process (the running program that the threads belong to) contains all the memory shared between the threads. The threads share all of the process's memory at the same addresses, except for their stacks. Within this shared memory lies both the power and the difficulty of multithreaded programming. You can easily access data & share it between threads, but you can also completely screw up data structures. So, you have to be very careful. For example, say you had some linked list code:

// Obviously simplified linked list code
void push_back(LIST *l, int v) {
   NODE * n = malloc(sizeof(NODE));   // line 1
   n->value = v;                      // line 2
   n->next = 0;                       // line 3
   l->tail->next = n;                 // line 4
   l->tail = n;                       // line 5
}

Say I had two threads running the same code on the same list at the same time: Thread 1 has just finished line 3, and Thread 2 has just executed line 4. Say Thread 1 continues on to finish push_back, and then Thread 2 runs next. Now we have something like this mess:

One screwed up linked list
What happens when you let two threads mess with the same linked list.

Obviously enough, we're heading towards a program crash, data loss, or both without some kind of protection against this kind of problem. All you really have to do is keep the two threads from accessing the same data at the same time. For that reason, Cocoa provides us with some classes to help keep things straight. They're called the Synchronization Classes: NSLock, NSConditionLock, and NSRecursiveLock. They all conform to the NSLocking formal protocol. This protocol has two methods: lock and unlock. The idea is that each thread 'locks' the data they're about to modify, so that no other thread will try to modify it or read it while it's being modified.

A thread-safe implementation of our obviously simplified linked-list code would be:

// Obviously simplified linked list code
// (now thread-safe)
void push_back(LIST *l, int v) {
   NODE * n = malloc(sizeof(NODE)); // line 1
   n->value = v;                    // line 2
   n->next = 0;                     // line 3
   [l->lock lock];                  // line 4
   l->tail->next = n;               // line 5
   l->tail = n;                     // line 6
   [l->lock unlock];                // line 7
}

Basically, all modifications to the actual linked list are done one at a time, or are serialized if you like big words. So no matter which order the threads run the different parts of push_back, they'll have to access the locked section one at a time, so no corruption is possible. Let's talk more about these Synchronization Classes and what they can do for you.


// Synchronization Classes

Cocoa provides us with several ways to let our threads interact with data and with each other without all of the corruption pain listed above. In reality, they are just simple classes that are completely thread safe, but they can be used as parts of interaction protocols that you design between your threads (not to be confused with informal or formal Objective-C protocols). You have to decide (and document!) the order that the different threads operate on the same code or data, and these Synchronization Classes provide you with the means to implement these protocols.

For example, in the code fragment above, the protocol was for each thread to go ahead & create their own NODEs, but then use the lock variable within the LIST to serialize access to the list.

Cocoa provides classes for more than just locking; you can also signal other threads and even send messages to objects in other threads (more on that later). The basic classes were listed above, but here they are again, with actual descriptive text:

  • NSLock - Basic serialized locking mechanism
  • NSConditionLock - Lets one thread block until another one tells it that some condition has occurred.
  • NSRecursiveLock - Lets a thread lock this object multiple times without blocking on itself

All of these classes let you control the interaction between threads. NSLock can be thought of as a token (even though the name doesn't seem to say it) - one thread has it at a time (by locking it), and every other thread that wants it has to wait for it (they sit blocked in their attempt to lock it). NSLock is a very basic but versatile tool. It can be used for regions of code, variables, library access, or anything else you want serialized.

NSConditionLock is in essence a signalling system that lets one thread know when some condition has been satisfied (hence the name). For example, say I'm generating input in one thread that another thread needs to consume (I apologize for the contrived example; NSConditionLocks are useful in real life, I promise!). NSConditionLock stores a state value, called the 'condition,' which lets the consuming thread block until that state equals the one it's waiting for. In this case, blocking is good: the consuming thread can't do anything useful yet, so it's best that it doesn't take any CPU time. Here's the setup:

// Global data (visible from both generating & consuming threads)
int g_data[128];                // global data
NSConditionLock * g_dataLock;   // lock on g_data
enum { HAS_DATA, NO_DATA };
// Initialization - done before the two threads are set up.
g_dataLock = [[NSConditionLock alloc] initWithCondition: NO_DATA];

Now that we have our data buffer, and an NSConditionLock to lock it, initialized to NO_DATA, we can write our generating thread:

// Generating thread -- creates data that another thread consumes
while (true) {
   int newdata[128];
   generate(newdata);                           // generate new data
   [g_dataLock lockWhenCondition: NO_DATA];     // wait until the current 
                                                //  data has been consumed,
                                                //  then lock.
      memcpy(g_data, newdata, sizeof(g_data));  // copy new data in
   [g_dataLock unlockWithCondition: HAS_DATA];  // unlock the data, with it 
                                                //  in the HAS_DATA state.
}

Here, the generator's role in the protocol is essentially this: generate new data, lock the global data when there isn't any unconsumed data left (state is NO_DATA), copy the new data over the old global data, and then unlock it with its state set to HAS_DATA. So, the consumer's role seems pretty simple: wait until there's some data, consume it, and then set the state to NO_DATA:

// Consuming thread - uses data generated by generating thread
while (true) {
   int data[128];
   [g_dataLock lockWhenCondition: HAS_DATA];   // wait till we have data
      memcpy(data,g_data,sizeof(data));        // get it
   [g_dataLock unlockWithCondition: NO_DATA];  // unlock and set state to 
                                               //  NO_DATA
}

And there we go, we have a perfectly synchronized generator & consumer.

NSRecursiveLock lets you write code that may call other code using the same lock as you, but shouldn't block because the thread already owns the lock. Or in other words, a lock that doesn't block on itself when you lock a variable that you've already locked in that thread. Say we had the following code:

// Recursive deadlock

int state;
NSLock * stateLock;

void increment() {
   [stateLock lock];
   state++;
   [stateLock unlock];
}

void add2() {
   [stateLock lock];
   increment();
   increment();
   [stateLock unlock];
}

I again apologize for the trivial examples. I had a very trivial childhood.

The problem with this code is in add2: it calls increment. The reason it's a problem is that stateLock is already locked when increment tries to lock it, and it will block. Unfortunately, it will never unblock, because it's in the same thread as the one that has the lock. This situation of being blocked with no hope of ever running again is so common that it has its own (ominous) name: deadlock. It's a problem in real code whenever you have routines that may call each other that share the same lock.

The way to fix it is simple: replace NSLock with NSRecursiveLock. NSRecursiveLock doesn't have any new methods, it just has different semantics than NSLock. Here's code that will work correctly (never mind the fact that this code does nothing useful; at least it's thread-safe :-):

// No more recursive deadlock

int state;
NSRecursiveLock * stateLock;

void increment() {
   [stateLock lock];
   state++;
   [stateLock unlock];
}

void add2() {
   [stateLock lock];
   increment();
   increment();
   [stateLock unlock];
}

Notice that only one line changed, and that was the type declaration for stateLock. Simple, easy, elegant; the way all libraries should be.


// Thread Safe Libraries

Knowing which of the libraries you use are thread-safe is critical to programming successfully with multiple threads. Not all libraries and frameworks are thread-safe; in fact, many aren't. Cocoa, for example, is generally not thread-safe. Most of the C library is thread-safe, except where specifically noted in the documentation (strtok comes to mind). The rule is fairly simple: any library code that depends on some form of internal state (like strtok's static pointer) is not thread-safe. Likewise, libraries that keep their own data structures should be considered not thread-safe unless they tell you otherwise. A good place to look is the documentation for the functions you are using.

Just to let you rest at ease, all the standard C library I/O routines are thread-safe, so use printf with joy! Even though they have their own internal state (the I/O buffers), they protect their state through the use of thread-safe locks like the ones mentioned above.

For libraries and frameworks that you have to use that aren't thread-safe (like Cocoa), you have to work out some kind of locking protocol. If, for example, you need to use strtok, put an NSLock around its use everywhere in your code, so that only one thread accesses it at a time (or just use strtok_r, the reentrant version). For complex frameworks like Cocoa, which have their own ever-changing state and deliver notifications about it to only one thread (the main one), you'll have to do something different. Most of the time, it's best to have one thread do all the user-interface work and have other threads do everything else in the background; this way the program always feels responsive to the user.


// Creating A Thread

Now that you know how to keep your code thread safe, and how to make sure you're keeping your accesses to libraries and frameworks safe, you can now finally learn the secret to making threads:

// Creates a new Generator class and runs its generatorThreadFunc: routine in 
// a new thread.
Generator * gen = [[Generator alloc] init];

[NSThread detachNewThreadSelector: @selector(generatorThreadFunc:)
   toTarget: gen withObject: nil];

What, you thought it would be hard? Come on, this is Cocoa! Co-Coa! Who's your daddy?

NSThread's detachNewThreadSelector:toTarget:withObject: method basically creates a new thread that does this (from the parameters given above):

[gen generatorThreadFunc:nil];

The last parameter of detachNewThreadSelector:toTarget:withObject: is data that's passed on untouched to the method whose selector you gave. Use it as you wish, or leave it nil if you don't need it.

When generatorThreadFunc: returns, the thread dies a natural death. The thread can also die at any time if generatorThreadFunc: (or some method or function that it calls) calls [NSThread exit], which exits the current thread. Interestingly enough, you don't get any form of handle to the thread you create; if you need one, set up some protocol to track your threads yourself, such as adding an element to an array of NSThreads (with some form of thread-safe locking, of course).


// Killing a Thread

Now that you've created these threads, how do you destroy them? Well, there are several ways. The easiest is just to have the main thread (the one that runs main()) exit, which automatically kills every other thread in the process. If you don't want to go out kamikaze style, there are two other ways:

  • Have the thread's primary method return - The method whose selector you gave to detachNewThreadSelector:toTarget:withObject: automatically kills its thread when it returns, just like main kills the program when it returns.
  • Call NSThread's exit method - If you're in a thread that you want to die, you can just call NSThread's exit method, which kills the current thread. NSThread doesn't provide a way to kill a thread other than the current one.


// Conclusion

Oh yeah, did I promise to show you how to call methods in objects in other threads? I will, just not in this article :-). Look for the upcoming one on Distributed Objects and I'll show you. Trust me :-)

Other than some shattered dreams of inter-object, inter-thread communication, we went through just about everything you need to know to get started with multithreaded applications. I strongly recommend (really!) that you read the documentation on the classes mentioned above, because it's full of examples and in-depth material that I can't cover here without writing a book on the topic.

Enjoy multithreading & remember to use printf liberally when debugging!
