Monday, January 30, 2012

Multiple Reader Single Writer Locks

A simple bare-bones version of a "multiple-reader-single-writer" program. 

Note that a "single writer" here means that only a single thread can perform a write at a time; there can be multiple writer threads, though. We're not modifying any shared buffers here. This code exists just to understand the nuances of the locking mechanism involved when there are multiple reader / writer threads. It handles both the "writer starvation" and "reader starvation" cases. The output pasted below should be an interesting part for introspection:
#include <stdio.h>
#include <pthread.h>
#include <unistd.h> /* for sleep() */

typedef int bool;

/* Current lock status:
   Valid values are one of: [ -1 0 1 2 3 4 5 .. numReaders ]
   -1 indicates a single writer is currently active.
    0 indicates that no writers and no readers are currently active.
   >0 indicates the number of readers currently active.

   Primary usage:

   This is the indirect single dimension lock, which tells us whether
   we need to wait, or it's ok to proceed.
*/
signed rwIndex = 0;

/* Number of waiting writers:
   Valid values: >= 0

   =0 indicates that no writers are waiting. This means one of the following:
   - no writers and no readers are active.
   - 1 writer is active, and no second writer has arrived yet.
   - there are >0 readers, but no writers currently active.

   >0 indicates that at least 1 writer is waiting. This means one of the following:
   - A few readers are active, forcing the writers to wait till they're done.
   - 1 writer is active, forcing the other writers to wait.

   Primary usage:

   This variable is used to find, whether we need to signal a single
   waiting writer or signal all waiting readers. Waiting writers are
   signalled before any waiting readers.
 */
unsigned numWaitingWriters = 0;

/* These 2 TSDs ( thread-specific data keys ) are used purely for
   convenience.

   * simpleThreadKey - holds simple thread ids starting from 1. This
   is more readable than printing some platform-specific thread ID
   like 1231294959.

   * rwKey - indicates the type of the thread ( reader / writer ).

   Primary Usage:
   
   Both are useful in analyzing the lock flow among the threads. Refer
   program output. */
pthread_key_t rwKey;
pthread_key_t simpleThreadKey;

/* Starting thread id used in simpleThreadKey */
unsigned initialThreadId = 1;

/* We synchronize on a single lock. */
pthread_mutex_t rwLock = PTHREAD_MUTEX_INITIALIZER;

/* We need separate condition variables for readers and writers.  */
pthread_cond_t readerWaitingForSignal = PTHREAD_COND_INITIALIZER;
pthread_cond_t writerWaitingForSignal = PTHREAD_COND_INITIALIZER;




/* Prints the current lock status of the program wrt to the current
   thread */
void printStatus( const char *function, const char *marker ) {
  char type = pthread_getspecific( rwKey ) ? 'W' : 'R';
  unsigned threadId = ( unsigned )( size_t ) pthread_getspecific( simpleThreadKey );

  printf( "%c%-2d %18s::%-40s rwIndex( %2d ) waitingWriters( %2u )\n", 
      type, threadId, function, marker, rwIndex, numWaitingWriters );
}


/* Usually this function is effectively a no-op: when there's no active
   writer and no waiting writers, the reader just bumps the count and
   proceeds. Otherwise, we wait while there is an active writer or any
   waiting writers. */
void getReadLock() {
  pthread_mutex_lock( &rwLock );
  
  /* Wait if:
     1. a writer is currently active ( rwIndex == -1 ), or
     2. writers are waiting for the readers to clear up
        ( numWaitingWriters > 0 ). Checking this second condition gives
        writers priority, which prevents writer starvation. */
  while ( rwIndex < 0 || numWaitingWriters ) {
    printStatus( __FUNCTION__, "Waiting for signal" );
    pthread_cond_wait( &readerWaitingForSignal, &rwLock );
    printStatus( __FUNCTION__, "Got signal" );
  }

  rwIndex++;
  printStatus( __FUNCTION__, "Got read access" );

  pthread_mutex_unlock( &rwLock );
}


/* We have to wait in the case of active readers or a single active
   writer. Free to proceed otherwise. */
void getWriteLock() {
  pthread_mutex_lock( &rwLock );
  
  while ( rwIndex != 0 ) {
    numWaitingWriters++;
    printStatus( __FUNCTION__, "Waiting for signal" );
    pthread_cond_wait( &writerWaitingForSignal, &rwLock );
    printStatus( __FUNCTION__, "Got signal" );
    numWaitingWriters--;
  }

  /* Cool ... Got the write lock. All other readers and writers can
     wait. */
  rwIndex = -1;
  printStatus( __FUNCTION__, "Got write access" );

  pthread_mutex_unlock( &rwLock );
}


/* Nothing big here. If you're the last reader, notify any waiting
   writers ( even if there are waiting readers ). The writers will
   notify the waiting readers, when they're done. */
void releaseReadLock() {
  pthread_mutex_lock( &rwLock );

  rwIndex--;
  printStatus( __FUNCTION__, "I'm done reading" );

  if ( rwIndex == 0 ) {
    if ( numWaitingWriters ) {
      printStatus( __FUNCTION__, "Signal waiting writers first" );
      /* Only 1 writer can be active at a time. So, no point wasting a
         broadcast. Just signal here. */
      pthread_cond_signal( &writerWaitingForSignal );
    } else {
      printStatus( __FUNCTION__, "I'm the last reader. No waiting writers" );
    }
  }

  pthread_mutex_unlock( &rwLock );
}


/* Signal waiting writers first. Signal waiting readers next. */
void releaseWriteLock() {
  pthread_mutex_lock( &rwLock );

  rwIndex = 0;
  printStatus( __FUNCTION__, "I'm done writing" );

  /* Notify writers first */
  if ( numWaitingWriters  ) {
    printStatus( __FUNCTION__, "Wake writers now. Readers later" );
    /* Only 1 writer can be active at a time. So, no point wasting a
       broadcast. Just signal here. */
    pthread_cond_signal( &writerWaitingForSignal );
  } else {
    /* Signal all readers, if there are no waiting writers */
    printStatus( __FUNCTION__, "No writers. Signalling all readers" );
    pthread_cond_broadcast( &readerWaitingForSignal );
  }

  pthread_mutex_unlock( &rwLock );
}


/* Common init code for readers and writers */
void threadCommon( bool isWriter ) {
  pthread_setspecific( rwKey, ( void * )( size_t ) isWriter );
  pthread_setspecific( simpleThreadKey, ( void * )( size_t ) initialThreadId );

  /* Note: this increment is unsynchronized, so two threads could end up
     with the same simple id. It's only used for logging, so we don't
     bother locking it. */
  initialThreadId++;

  /* This sleep is here to make sure that all the 'lock fighting' among
     readers and writers starts at roughly the same time. */
  sleep( 1 );
}


void* writeToBuffer( void *arg ) {
  threadCommon( 1 );

  getWriteLock();

  /* Write something to buffer */

  releaseWriteLock();

  return NULL;
}


void* readFromBuffer( void *arg ) {
  threadCommon( 0 );

  getReadLock();

  /* Read something from buffer */

  releaseReadLock();

  return NULL;
}


int main() {
  const unsigned numReaders = 5;
  const unsigned numWriters = 5;

  pthread_t reader[ numReaders ];
  pthread_t writer[ numWriters ];

  pthread_attr_t attr;
  pthread_attr_init( &attr );
  pthread_key_create( &rwKey, NULL );
  pthread_key_create( &simpleThreadKey, NULL );

  unsigned i = 0;

  for ( i = 0 ; i < numReaders; i++ )
    pthread_create( &reader[ i ], &attr, readFromBuffer, NULL );

  for ( i = 0 ; i < numWriters; i++ )
    pthread_create( &writer[ i ], &attr, writeToBuffer, NULL );


  for ( i = 0 ; i < numReaders; i++ )
    pthread_join( reader[ i ], NULL );

  for ( i = 0 ; i < numWriters; i++ )
    pthread_join( writer[ i ], NULL );

  return 0;
}



Output:

R2         getReadLock::Got read access                          rwIndex(  1 ) waitingWriters(  0 )
R3         getReadLock::Got read access                          rwIndex(  2 ) waitingWriters(  0 )
R4         getReadLock::Got read access                          rwIndex(  3 ) waitingWriters(  0 )
R5         getReadLock::Got read access                          rwIndex(  4 ) waitingWriters(  0 )
W6        getWriteLock::Waiting for signal                       rwIndex(  4 ) waitingWriters(  1 )
W7        getWriteLock::Waiting for signal                       rwIndex(  4 ) waitingWriters(  2 )
W8        getWriteLock::Waiting for signal                       rwIndex(  4 ) waitingWriters(  3 )
R1         getReadLock::Waiting for signal                       rwIndex(  4 ) waitingWriters(  3 )
W9        getWriteLock::Waiting for signal                       rwIndex(  4 ) waitingWriters(  4 )
W10       getWriteLock::Waiting for signal                       rwIndex(  4 ) waitingWriters(  5 )
R2     releaseReadLock::I'm done reading                         rwIndex(  3 ) waitingWriters(  5 )
R3     releaseReadLock::I'm done reading                         rwIndex(  2 ) waitingWriters(  5 )
R4     releaseReadLock::I'm done reading                         rwIndex(  1 ) waitingWriters(  5 )
R5     releaseReadLock::I'm done reading                         rwIndex(  0 ) waitingWriters(  5 )
R5     releaseReadLock::Signal waiting writers first             rwIndex(  0 ) waitingWriters(  5 )
W6        getWriteLock::Got signal                               rwIndex(  0 ) waitingWriters(  5 )
W6        getWriteLock::Got write access                         rwIndex( -1 ) waitingWriters(  4 )
W6    releaseWriteLock::I'm done writing                         rwIndex(  0 ) waitingWriters(  4 )
W6    releaseWriteLock::Wake writers now. Readers later          rwIndex(  0 ) waitingWriters(  4 )
W7        getWriteLock::Got signal                               rwIndex(  0 ) waitingWriters(  4 )
W7        getWriteLock::Got write access                         rwIndex( -1 ) waitingWriters(  3 )
W7    releaseWriteLock::I'm done writing                         rwIndex(  0 ) waitingWriters(  3 )
W7    releaseWriteLock::Wake writers now. Readers later          rwIndex(  0 ) waitingWriters(  3 )
W8        getWriteLock::Got signal                               rwIndex(  0 ) waitingWriters(  3 )
W8        getWriteLock::Got write access                         rwIndex( -1 ) waitingWriters(  2 )
W8    releaseWriteLock::I'm done writing                         rwIndex(  0 ) waitingWriters(  2 )
W8    releaseWriteLock::Wake writers now. Readers later          rwIndex(  0 ) waitingWriters(  2 )
W9        getWriteLock::Got signal                               rwIndex(  0 ) waitingWriters(  2 )
W9        getWriteLock::Got write access                         rwIndex( -1 ) waitingWriters(  1 )
W9    releaseWriteLock::I'm done writing                         rwIndex(  0 ) waitingWriters(  1 )
W9    releaseWriteLock::Wake writers now. Readers later          rwIndex(  0 ) waitingWriters(  1 )
W10       getWriteLock::Got signal                               rwIndex(  0 ) waitingWriters(  1 )
W10       getWriteLock::Got write access                         rwIndex( -1 ) waitingWriters(  0 )
W10   releaseWriteLock::I'm done writing                         rwIndex(  0 ) waitingWriters(  0 )
W10   releaseWriteLock::No writers. Signalling all readers       rwIndex(  0 ) waitingWriters(  0 )
R1         getReadLock::Got signal                               rwIndex(  0 ) waitingWriters(  0 )
R1         getReadLock::Got read access                          rwIndex(  1 ) waitingWriters(  0 )
R1     releaseReadLock::I'm done reading                         rwIndex(  0 ) waitingWriters(  0 )
R1     releaseReadLock::I'm the last reader. No waiting writers  rwIndex(  0 ) waitingWriters(  0 )

Friday, January 27, 2012

C++ exception internals

In this post, we'll try to analyze the internals of how exception handling works in C++. Actually, I thought this should be pretty straightforward, given the simplicity of using exceptions in C++. I was wrong. Quite a lot of interesting stuff happens internally, which wasn't very obvious to me at first. So, let's go find out.

So, what is the big deal with exception handling ?

There is one big deal at least. When an exception is thrown, the exception handling mechanism has to unwind all stack frames until it finds one that handles it. So, let's first focus on this part of exception handling: throwing an exception.

1. Throwing an exception:

Let's start with the simple class below: 

class A {
public:
  A() {}
  ~A() {}
};

class B {
public:
  B() {}
  ~B() {}
};

class C {
public:
  C() {}
  ~C() {}
};
 
void function1() {
  A a;
  B b;
  C c;
}

int main() {
  function1();
  return 0;
}
 

function1() is our main function of interest here. So, let's disassemble it. Below is the assembly code for it.

(gdb) disassemble function1  
Dump of assembler code for function function1():
   0x080484e4 <+00>:    push   %ebp
   0x080484e5 <+01>:    mov    %esp,%ebp
   0x080484e7 <+03>:    sub    $0x28,%esp
   0x080484ea <+06>:    lea    -0xb(%ebp),%eax
   0x080484ed <+09>:    mov    %eax,(%esp)
   0x080484f0 <+12>:    call   0x804859c <A::A()>
   0x080484f5 <+17>:    lea    -0xa(%ebp),%eax
   0x080484f8 <+20>:    mov    %eax,(%esp)
   0x080484fb <+23>:    call   0x80485a8 <B::B()>
   0x08048500 <+28>:    lea    -0x9(%ebp),%eax
   0x08048503 <+31>:    mov    %eax,(%esp)
   0x08048506 <+34>:    call   0x80485b4 <C::C()>
   0x0804850b <+39>:    lea    -0x9(%ebp),%eax
   0x0804850e <+42>:    mov    %eax,(%esp)
   0x08048511 <+45>:    call   0x80485ba <C::~C()>
   0x08048516 <+50>:    lea    -0xa(%ebp),%eax
   0x08048519 <+53>:    mov    %eax,(%esp)
   0x0804851c <+56>:    call   0x80485ae <B::~B()>
   0x08048521 <+61>:    lea    -0xb(%ebp),%eax
   0x08048524 <+64>:    mov    %eax,(%esp)
   0x08048527 <+67>:    call   0x80485a2 <A::~A()>
   0x0804852c <+72>:    leave 
   0x0804852d <+73>:    ret   
End of assembler dump. 


Okay. All is fine here. Things are pretty straightforward. Constructors for classes A, B and C are called. Their respective destructors are called in the reverse order. None of the constructors or destructors throw an exception here. So, no problems here. Note the leave and ret instructions at the end of the function. They indicate a normal function return, meaning they get executed when we encounter a return statement ( return 10; ) in C / C++, or fall off the end of the function.

Now let's modify class "C" as below: 

class C {
public:
  C() {
    throw 1;
  }
  ~C() {}
};


C::C() changes as shown below. We call a couple of special functions here, __cxa_allocate_exception and __cxa_throw. We'll cover what these functions do later. For now, it's enough to know that they get called when we throw an exception.

(gdb) disassemble C::C()
Dump of assembler code for function C::C():
   0x0804874c <+00>:    push   %ebp
   0x0804874d <+01>:    mov    %esp,%ebp
   0x0804874f <+03>:    sub    $0x18,%esp
   0x08048752 <+06>:    movl   $0x4,(%esp)
   0x08048759 <+13>:    call   0x8048560 <__cxa_allocate_exception@plt>
   0x0804875e <+18>:    movl   $0x1,(%eax)
   0x08048764 <+24>:    movl   $0x0,0x8(%esp)
   0x0804876c <+32>:    movl   $0x804a040,0x4(%esp)
   0x08048774 <+40>:    mov    %eax,(%esp)
   0x08048777 <+43>:    call   0x8048570 <__cxa_throw@plt>
End of assembler dump.

Now, something unexpected has happened here. The code in function1() has changed too, even though we haven't touched a bit in it. Below is the updated function1().

(gdb) disassemble function1
Dump of assembler code for function function1():
   0x08048654 <+00>:    push   %ebp
   0x08048655 <+01>:    mov    %esp,%ebp
   0x08048657 <+03>:    push   %ebx
   0x08048658 <+04>:    sub    $0x24,%esp
   0x0804865b <+07>:    lea    -0xb(%ebp),%eax
   0x0804865e <+10>:    mov    %eax,(%esp)
   0x08048661 <+13>:    call   0x8048734 <A::A()>
   0x08048666 <+18>:    lea    -0xa(%ebp),%eax
   0x08048669 <+21>:    mov    %eax,(%esp)
   0x0804866c <+24>:    call   0x8048740 <B::B()>
   0x08048671 <+29>:    lea    -0x9(%ebp),%eax
   0x08048674 <+32>:    mov    %eax,(%esp)
   0x08048677 <+35>:    call   0x804874c <C::C()>
   0x0804867c <+40>:    lea    -0x9(%ebp),%eax
   0x0804867f <+43>:    mov    %eax,(%esp)
   0x08048682 <+46>:    call   0x804877c <C::~C()>
   0x08048687 <+51>:    lea    -0xa(%ebp),%eax
   0x0804868a <+54>:    mov    %eax,(%esp)
   0x0804868d <+57>:    call   0x8048746 <B::~B()>
   0x08048692 <+62>:    lea    -0xb(%ebp),%eax
   0x08048695 <+65>:    mov    %eax,(%esp)
   0x08048698 <+68>:    call   0x804873a <A::~A()>
   0x0804869d <+73>:    add    $0x24,%esp
   0x080486a0 <+76>:    pop    %ebx
   0x080486a1 <+77>:    pop    %ebp
   0x080486a2 <+78>:    ret   
   0x080486a3 <+79>:    mov    %eax,%ebx
   0x080486a5 <+81>:    lea    -0xa(%ebp),%eax
   0x080486a8 <+84>:    mov    %eax,(%esp)
   0x080486ab <+87>:    call   0x8048746 <B::~B()>
   0x080486b0 <+92>:    lea    -0xb(%ebp),%eax
   0x080486b3 <+95>:    mov    %eax,(%esp)
   0x080486b6 <+98>:    call   0x804873a <A::~A()>
   0x080486bb <+103>:    mov    %ebx,%eax
   0x080486bd <+105>:    mov    %eax,(%esp)
   0x080486c0 <+108>:    call   0x8048590 <_Unwind_Resume@plt>

End of assembler dump.
 

Note that there is a new section of instructions, added after the ret instruction ( starting at +79 ). Looks like some calls to destructors have been added here. Let's analyze what this exactly means.

void function1() {
  A a;
  B b;
  C c;
}

Now, in the above function, when C c; is read by the compiler, it sees that C's constructor throws an exception. So, it does the following in this case:


1. If C::C() throws an exception at run-time, it implies that objects a and b have already been constructed successfully. In that case, they SHOULD BE destroyed. So, the compiler inserts calls to B::~B() and A::~A() ( remember the reverse order ).

2. Also, if C::C() throws an exception at run-time, then object c is not considered fully constructed. In that case, it need not be destroyed, meaning C::~C() need not be called. So, the compiler doesn't bother with C::~C(). Hence, no calls to C::~C() in the new code above.

3. Since we are in red alert, we need to convey this to the caller of function1() ( which is main() here ) too. So, the compiler calls the
_Unwind_Resume function to continue the same steps in the parent function. Note that _Unwind_Resume is placed right at the end of the function, so once the unwinder lands us on this cleanup code, plain sequential execution picks it up. In other words, we're not going through the normal leave / ret code path here; we're on a secondary code path.

This is the secret behind how compilers propagate exceptions and unwind stacks. The compiler analyzes the code while compiling, and adds extra code to handle exceptions. This means extra work and extra compile time. This also implies that your library / executable might get a little bigger than normal.

Okay, we covered the simple case of class C throwing an exception. Let's deal with a little more complex case. What happens when constructors A, B and C, all of them could possibly throw an exception ? This means that all 3 of them have the code which could trigger an exception at run-time. We won't know until run-time, who will throw one. The compiler has to generate code in function1() to accommodate all the cases. The expected behavior is listed below:

1. At run-time, if A's constructor throws an exception, then we should just exit the stack frame, without calling any destructor.

2. If B's constructor throws an exception, then we should only call A's destructor.

3. If C's constructor throws an exception, then we should call the destructors of both A and B.

Below is the assembly code generated when all 3 constructors can throw an exception. The initial section of function1() has been trimmed off, since it's the same here too.

(gdb) disassemble function1
Dump of assembler code for function function1():
...

...
   0x080486a0 <+76>:    pop    %ebx
   0x080486a1 <+77>:    pop    %ebp
   0x080486a2 <+78>:    ret   
   0x080486a3 <+79>:    mov    %eax,%ebx
   0x080486a5 <+81>:    lea    -0xa(%ebp),%eax
   0x080486a8 <+84>:    mov    %eax,(%esp)
   0x080486ab <+87>:    call   0x804879e <B::~B()>
   0x080486b0 <+92>:    jmp    0x80486b4 <function1()+96>
   0x080486b2 <+94>:    mov    %eax,%ebx
   0x080486b4 <+96>:    lea    -0xb(%ebp),%eax
   0x080486b7 <+99>:    mov    %eax,(%esp)
   0x080486ba <+102>:    call   0x8048768 <A::~A()>
   0x080486bf <+107>:    mov    %ebx,%eax
   0x080486c1 <+109>:    mov    %eax,(%esp)
   0x080486c4 <+112>:    call   0x8048590 <_Unwind_Resume@plt>
End of assembler dump.


If we look into the code above, we could easily sort out the answers by plain reasoning. E.g. To solve (1), jump directly from A's constructor to address 0x080486bf ( so that we don't fall in the line of the calls to B::~B() and A::~A() ). For (2), jump directly from B's constructor to address 0x080486b0 ( so that we fall in the line of the call to A::~A() and not B::~B() ). For (3), jump directly from C's constructor to address 0x080486a3 ( so that we fall in the line of the calls to A::~A() and B::~B() ). 

One might quickly guess that these addresses could be hard-coded in the constructors to close this whole issue. But that won't work, since these objects ( and their constructors ) might be used in a lot of places, not only in function1(). So, it becomes obvious that this mechanism of finding the address to jump to should be dynamic. And the code for doing that should be in one of __cxa_allocate_exception or __cxa_throw, since they are the ones being called once an exception is thrown. Decent guess. So, let's explore what each of them does a bit.

Till now, we haven't answered one question. When we do a 'throw someObject;', how is someObject accessible in a different stack frame ? The stack frame where the exception originated is already gone .. right ?

Correct. The exception is not allocated on the stack. It is allocated on the free store. __cxa_allocate_exception is the guy responsible for allocating it. How can you verify that ? Simple. Let's modify the exception thrown by A's constructor a bit. Rather than doing a "throw 1", we'll throw an object that has a big size, and see where it gets allocated.

class Memory {
private:
  int a_[ 1024 * 1024 ];
public:
  Memory() {
    a_[ 1024 ] = 0xaaaabbbb;
  }
};

class A {
public:
  A() {
    throw Memory();
  }
  ~A() {
  }
}; 

Here, we've created a class Memory, whose size is 4 Mbytes ( sizeof(int) * 1024 * 1024 ). Now, let's analyze A's constructor to see where it allocates this object.

Dump of assembler code for function A::A():
   0x0804878a <+00>:    push   %ebp
   0x0804878b <+01>:    mov    %esp,%ebp
   0x0804878d <+03>:    push   %ebx
   0x0804878e <+04>:    sub    $0x14,%esp
   0x08048791 <+07>:    movl   $0x400000,(%esp)
   0x08048798 <+14>:    call   0x80485a0 <__cxa_allocate_exception@plt>
   0x0804879d <+19>:    mov    %eax,%ebx
   0x0804879f <+21>:    mov    %ebx,(%esp)
   0x080487a2 <+24>:    call   0x8048778 <Memory::Memory()>
   0x080487a7 <+29>:    movl   $0x0,0x8(%esp)
   0x080487af <+37>:    movl   $0x8048918,0x4(%esp)
   0x080487b7 <+45>:    mov    %ebx,(%esp)
   0x080487ba <+48>:    call   0x80485b0 <__cxa_throw@plt>
End of assembler dump. 


If the exception Memory() were allocated on the stack, the value of %esp would have been decremented by roughly 4 Mbytes. It's not: %esp is decremented by just $0x14 ( 20 bytes ). So, the Memory() object was not allocated on the stack. If we move our focus to the __cxa_allocate_exception call, things should be obvious. A value of 4 Mbytes ( $0x400000 ) is passed as an argument to the __cxa_allocate_exception function. So, it's this function that allocates the exception dynamically, as the name suggests. Still not convinced ? Try a simple trick. Break on __cxa_allocate_exception in gdb. Analyze the memory usage of the program ( pmap -x <pid> ). Run the 'finish' command in gdb to complete the execution of __cxa_allocate_exception. Now analyze the memory again. It should have increased by 4 Mbytes.

So, __cxa_allocate_exception is responsible for the allocation. It returns the allocated address ( in %eax, the return value register ), where the Memory() object is then constructed. After this, we call __cxa_throw. 0x8(%esp) is the 3rd argument, 0x4(%esp) is the 2nd argument, and (%esp) is the 1st argument. In short, we call 

__cxa_throw( pointer to the Memory object, Memory's typeinfo, 0 )

The 3rd argument is a pointer to the exception object's destructor; it's 0 here because Memory's destructor is trivial.

So, by elimination: if __cxa_allocate_exception only takes care of the exception's memory allocation, then __cxa_throw is the guy who knows the jumping logic. Logically, this is the function where all the trickery should happen. It knows where exactly to land in the previous stack call frame ( function1() in our case ), so that the wrong destructors don't get called. At this point, I'm not too clear on how it does this. Will defer it for now.

2. Catching an exception: 

Catching is pretty trivial. Whenever we catch an exception, we either have the choice to end the misery, or to propagate it ( aka re-throwing ) with a smile on our face. If you're propagating, refer to the previous section. If you've handled the exception, go to sleep ;-)

Wednesday, January 25, 2012

C++ reference internals

Hmm ... where to start. 

Ok .. a reference is a C++ enhancement on top of traditional C pointers. So, a reference is NOT a totally new invention of sorts. This enhancement was added to address one main issue that exists with pointers:

Pointers are too powerful, and hence prone to mishandling.

So .. are pointers evil ? Nope ... Address manipulation is basically a low-level operation, and pointers are masters at doing that. Pointers are susceptible to errors, since they give too much freedom to the programmer. Consider the following program:

#include <stdint.h>
#include <unistd.h>

void function() {
  uint32_t fourGig = 0xFFFFFFFF;
  uint32_t *ptr = ( uint32_t * ) 0x1;

  write( 1, ptr, fourGig );
}
 
The above code "attempts" to dump the entire address space of the current running process on a 32-bit host to stdout. On most systems, this would probably fail. But it's shown just to demonstrate the possibilities that a pointer carries. A program normally contains data and code. Code is mostly non-modifiable. And guess what ? Pointers can totally manipulate the data part, making them kind of a "queen" in a game of chess. Pointers are powerful, and with great power comes great responsibility, and so is the programmer expected to be responsible. But everyone is prone to committing mistakes, and trying to reduce the chances of repeating those mistakes is an intelligent decision. References are a choice in that direction.

References are "pointers" in one sense and "objects" in another. I know that most definitions of references are quirky, and I don't want to add to them. By the time we're done with this post, we should have a much clearer definition.

Ok .. let's get started. Consider the following program:

int main() {
  int a = 0x11111111;
  int &b = a;

  return 0;
}

The assembly code of the main function is displayed below.

(gdb) disassemble main
Dump of assembler code for function main:
0x00001f70 <main+00>:    push   %ebp
0x00001f71 <main+01>:    mov    %esp,%ebp
0x00001f73 <main+03>:    sub    $0x10,%esp
0x00001f76 <main+06>:    movl   $0x11111111,-0xc(%ebp)
0x00001f7d <main+13>:    lea    -0xc(%ebp),%eax
0x00001f80 <main+16>:    mov    %eax,-0x10(%ebp)

0x00001f83 <main+19>:    movl   $0x0,-0x8(%ebp)
0x00001f8a <main+26>:    mov    -0x8(%ebp),%eax
0x00001f8d <main+29>:    mov    %eax,-0x4(%ebp)
0x00001f90 <main+32>:    mov    -0x4(%ebp),%eax
0x00001f93 <main+35>:    add    $0x10,%esp
0x00001f96 <main+38>:    pop    %ebp
0x00001f97 <main+39>:    ret   
End of assembler dump.

Consider the three instructions right after the stack setup ( main+06 through main+16 ).

-0xc(%ebp) is the address of the local variable "a",
-0x10(%ebp) is the address of the local reference variable "b"

The first instruction assigns 0x11111111 to variable "a". The second one is significant. The "lea" instruction "loads the effective address" into register eax. In simple terms, "lea" moves the value -0xc(%ebp) ( which is &a ) into the eax register. The next instruction moves that address ( &a ) to -0x10(%ebp) ( the location of variable b ). So, the above code actually means:

int main() {
  int a = 0x11111111;
  int &b = &a;

  return 0;
}

The point here is that a reference actually stores the address of the object / variable assigned to it, which makes it a pointer. Don't believe it ? Let's modify the above program to use pointers, as below: 

int main() {
  int a = 0x11111111;
  int *b = &a;

  return 0;
}

Disassembling the above code should give the "exact" same set of instructions as the code using the reference. Check it for yourself !! If they produce the same instructions, then what's the point in using references ? Well .. that's the interesting part. Let's start with the line below.

int &b = a;

When the above line is encountered by the compiler, it converts the code internally to the below one:

int &b = &a;

How does the compiler get "address of a" from just "a" ? Good point. The compiler knows where "a" is allocated. So, it exploits this knowledge and fetches the address whenever you mention a variable such as "a". The compiler doesn't need "&a" for this. What difference does all this make ? A world of difference. First, you are never handed "&a", an address.

You are only allowed to play around with variable "a", not its address. The compiler handles the address part for you. When you don't play around with addresses, you don't need pointers. Fewer casualties. Simple !

Hmm .. ok ... so that takes care of assigning values to the reference ... What about using them ? The answer is simple. Consider the following program:

 int main() {
  int a = 0x11111111;
  int &b = a;
  int c = b;

  return 0;
}

We already know that references deal with "variable addresses" internally and not values. So, when the compiler encounters 

int c = b;

it just converts it to

int c = *b;

Hmm ... so references are indeed pointers internally !!! This gives rise to an interesting question. Is the below line valid ?

int main() {
  int &b = 0x11111111;

  return 0;
}

NOPE .... The compiler will give a message similar to the one below:

 error: invalid initialization of non-const reference of type 'int&' from a temporary of type 'int'

 The reason why this fails should be obvious by now. We don't have an address to operate on. References need a variable ( from which they can extract an address ) to work. In short, references need an lvalue. Okay, good ... but why does the code below work then ? It should fail too, right ?

 int main() {
  const int &b = 0x11111111;

  return 0;
}

Actually, this is not valid code, for the same reason mentioned above. But there is a catch here. We're operating on a "const variable". And const variables don't change value. The compiler exploits this knowledge. In this specific case, the compiler automatically creates a local variable, copies the rvalue to it, and uses its address, as shown below:

 int main() {
  int temp = 0x11111111;
  const int &b = temp;

  return 0;
}

This local variable "temp" is called a "temporary" in C++ jargon. The temporary has the same scope as the reference; that is, it stays alive as long as the reference does. If you've noticed, there is something else interesting here. This is exactly the same code that we wrote initially ( using the variable name 'temp' rather than 'a' ). There is no variable naming in assembly. So, the assembly code for both the above and below code should match exactly !!!! Check it out for yourself.

So, should we always use references ?


Well .. the point is to use them as much as possible. It's not always possible to use references. References don't have all the power of pointers, the main reason being that they can't be empty or re-seated to point to another object. That was a conscious design decision made by the C++ designers. Implementation-wise, it would not have been a problem to support: when the user asks the reference to point to a different object, the compiler could use the same "address of variable" logic and populate the reference variable with the new variable's address. So, there are no implementation difficulties here. So, why was it not allowed then ?

One main reason I can think of is that pointers already do that job. So, there is no point in replicating the same set of operations in a reference. Given that we're coming up with something new, is there a possibility of providing a new guarantee that a pointer cannot ? That way, we avoid the duplication and gain a new feature. So, the concept of "a reference is always valid" was introduced. Clearly, a pointer doesn't guarantee that, since it allows NULL pointers. What this means is that the lifetime of a reference is a subset of the lifetime of its object. In short, a reference can be termed an "alias" for the object. This concept of an "alias" gives a high-level abstraction over an object. High-level abstractions are the need for solving bigger problems, something which programmers can exploit while designing, and something which C++ normally specializes in !