Riddle me this…
When last we left our heroes, they were about to start the debugger and probe the mysteries of the Objective-C objc_msgSend function.
Pull down and build the hello world project from GitHub. You can run it in the simulator if you don’t have an iDevice or developer license.
Put a breakpoint in the first source line of touchesEnded in helloWorldViewController.m. Make sure the build settings are set for Simulator and Debug. Click “Build and Debug” and wait for the simulator to come up. Click in the window on the simulator or device. The breakpoint should hit, and you’ll see something like this:
Right after the first breakpoint
We’re now going to leave the comfy confines of the source debugger behind, and dive into the world of assembly. If you don’t see the disassembly pane on the bottom right of your debugger window, select Run→Debugger Display→Source and Disassembly, as shown:
We’ll start with the standard C call to NSLog, to introduce you to the Application Binary Interface, or ABI. The ABI defines how compilers for high-level languages must lay out their compiled code, so that any compiled code can be linked together and external libraries will work with your code.
The first thing we need to do is pass the parameters to the function that will be called. The first four 32-bit parameters are placed into processor registers for speed: R0, R1, R2, and R3. Any parameters past the first four are saved to memory in an area called the stack. The stack starts at the highest address and works its way down in descending addresses. There are two parameters to this invocation of NSLog: the format string, @"touchesended: %@", and the first argument, event. The string constant is passed as a pointer to constant memory, and event is loaded relative to the stack pointer, since it’s a parameter of the current function.
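To make the register-versus-stack split concrete, here is a small C model of it. This is only a sketch of the rule described above, not anything the compiler or debugger exposes: the ArgLayout type and place_arguments function are hypothetical names invented for illustration, and the model ignores 64-bit and floating-point arguments.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of where a call's 32-bit arguments end up. */
typedef struct {
    uint32_t r[4];        /* r0..r3: the first four arguments */
    uint32_t stack[8];    /* the fifth argument onward, in call order */
    size_t   stack_count; /* how many arguments spilled to the stack */
} ArgLayout;

/* Place argc 32-bit arguments: four in registers, the rest on the stack.
   The model holds at most 8 spilled arguments and drops any beyond that. */
static ArgLayout place_arguments(const uint32_t *args, size_t argc)
{
    ArgLayout layout = {{0}, {0}, 0};
    for (size_t i = 0; i < argc; i++) {
        if (i < 4) {
            layout.r[i] = args[i];                        /* register */
        } else if (layout.stack_count < 8) {
            layout.stack[layout.stack_count++] = args[i]; /* stack */
        }
    }
    return layout;
}
```

For the NSLog call above, only r[0] (the format string pointer) and r[1] (event) would be filled, and stack_count would stay at zero.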
The instruction that actually calls the function is BLX. It stores the return address in a special register called the Link Register, which avoids a memory access when the return address does not need to be saved. Note that I switched from the simulator in the above two images to actual device debugging.
Let’s delve into how the constants are figured out. They are stored right after the compiled code for each function, in the __TEXT segment, which is the same one the code is in. They are specified as offsets from the current value of the program counter, since the compiler doesn’t know what address the function will reside at after it’s linked. All it can do is insert an offset once the function is fully compiled. Switch to the debug console (Command-Shift-R) and enter the “si” command twice. The cursor on the assembly side should now be on “mov r0,r3”. We need to add the PC to the value of R3 for the same reason as we had to load R3 from an offset of the PC. The linker places a string table in a completely different section from the compiled code, and the offset to the string in that table is what the linker places at the address pc+#648 in the above code.
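The two-step address calculation can be sketched in C as follows. This is a simplified model under stated assumptions: the memory image is a plain byte array that starts at address 0, the function names are hypothetical, and the caller supplies the PC value seen by each instruction (on a real ARM core the PC reads a few bytes ahead of the executing instruction).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit word from a byte image, in the host's byte order. */
static uint32_t read_word(const uint8_t *image, size_t offset)
{
    uint32_t word;
    memcpy(&word, image + offset, sizeof word);
    return word;
}

/* Model of the pair of instructions discussed above:
     step 1: load r3 from [pc + literal_offset]  (the stored offset)
     step 2: add the pc to r3                    (offset becomes an address) */
static uint32_t resolve_pc_relative(const uint8_t *image,
                                    uint32_t pc_at_load,
                                    uint32_t literal_offset,
                                    uint32_t pc_at_add)
{
    uint32_t r3 = read_word(image, (size_t)pc_at_load + literal_offset);
    r3 += pc_at_add;
    return r3;
}
```

Because both steps are relative to the PC, the same bytes produce a valid address wherever the function ends up in memory.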
In the top right pane, where local variables and parameters are displayed, you can scroll to the bottom and see a category labelled “Registers” with a disclosure arrow. Click that arrow and look for $r3. That is pointing to the offset to the string in memory. In this case $r3 had the value 0x4040.
If you select Run→Show→Memory Browsers, you can look at the contents of memory. Enter 0x4040 into the address field. Set the word size to 4, since we are looking at an “indirect” memory offset. Notice that the word stored at address 0x4040 is 0x3e2184e0. Let’s follow that offset, which takes us to the program’s String Table.
That is what an NSString looks like in memory.
Back to function calls and the ABI. After the BLX instruction, we’re in the function prolog. This sets things up so the function itself has storage space to run, and can call other functions and ultimately return without messing up the memory of its calling function. We need to save all the processor registers, including the Link Register, to the stack. If you compile with optimization, only the registers that are affected by the function will be saved to the stack. After that, a register called the Frame Pointer is set to the current value of the stack pointer (the frame pointer is always saved before this). This is so the stack pointer can be restored at the end of the function, in a section called the epilog. Then the number of bytes necessary for the function’s local variables, including those in subordinate scopes, is subtracted from the stack pointer. After that, the function itself runs.
The function epilog then undoes what the prolog does, so when control is transferred back to the calling function, everything is restored to what was there before the function call, except for the registers which are known to contain the return value.
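The prolog and epilog can be modelled with a toy machine in C. This is a rough sketch, not real ARM code: the Machine type and its functions are hypothetical, only the Link Register and Frame Pointer are saved, and the stack is counted in 32-bit words rather than bytes.

```c
#include <stdint.h>

#define STACK_WORDS 64

/* Hypothetical toy machine: a descending stack plus three registers. */
typedef struct {
    uint32_t memory[STACK_WORDS]; /* the stack, one 32-bit word per slot */
    uint32_t sp;                  /* stack pointer, as a word index */
    uint32_t fp;                  /* frame pointer, as a word index */
    uint32_t lr;                  /* link register: the return address */
} Machine;

static void machine_init(Machine *m)
{
    m->sp = STACK_WORDS; /* empty stack: sp starts at the highest address */
    m->fp = 0;
    m->lr = 0;
}

static void push(Machine *m, uint32_t value) { m->memory[--m->sp] = value; }
static uint32_t pop(Machine *m) { return m->memory[m->sp++]; }

/* Prolog: save registers, set the frame pointer, reserve the locals. */
static void prolog(Machine *m, uint32_t local_words)
{
    push(m, m->lr);       /* save the return address */
    push(m, m->fp);       /* save the caller's frame pointer */
    m->fp = m->sp;        /* remember where this frame starts */
    m->sp -= local_words; /* the stack grows down to make room for locals */
}

/* Epilog: undo the prolog in reverse order. */
static void epilog(Machine *m)
{
    m->sp = m->fp;  /* discard the locals */
    m->fp = pop(m); /* restore the caller's frame pointer */
    m->lr = pop(m); /* restore the return address */
}
```

After a matching prolog and epilog, sp, fp and lr hold the values they had before the call, even if a nested call overwrote lr in between.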
So NSLog is called, and since it has no return value, we don’t check anything. Let’s step ahead to an Objective-C method call, so we can see the similarities. Place a breakpoint at the first Objective-C call, which will be [[CATransition alloc] init] (the alloc is sent first), and hit Continue.
Let’s have a look at the assembly language here.
We can see that this is a call to objc_msgSend(self, _cmd). There are two arguments to this function, stored in r0 and r1. R0 will contain the Objective-C class CATransition, and r1 will contain the selector for the alloc method. The selector is what we’re looking for, since that is how we’ll put together our scanner for Forbidden APIs. Let’s step ahead to just before the call itself, and have a look from the debugger console:
So what we get from this is that $r0 is an object, the CATransition class, and $r1, the selector, is simply an old-style C char*! This makes things a lot easier for us – all we have to do is find the selectors and basically grep out the text of the ones we want to avoid!
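Since a selector shows up as a plain C string, the “grep” can be sketched in a few lines of C. This is only an illustration of the idea, assuming the selector names sit in memory as NUL-terminated strings packed one after another; find_selector is a hypothetical helper, not part of any Apple tool or API.

```c
#include <stddef.h>
#include <string.h>

/* Search buf for `selector` as a whole NUL-terminated string.
   A match must start at the beginning of the buffer or right after a NUL,
   and must be followed by a NUL, so "init" does not match inside
   "initWithFrame:". Returns the byte offset of the match, or -1. */
static long find_selector(const unsigned char *buf, size_t len,
                          const char *selector)
{
    size_t n = strlen(selector);
    if (n == 0 || len < n + 1) {
        return -1;
    }
    for (size_t i = 0; i + n < len; i++) {
        int at_start = (i == 0) || (buf[i - 1] == '\0');
        if (at_start && memcmp(buf + i, selector, n) == 0
                     && buf[i + n] == '\0') {
            return (long)i;
        }
    }
    return -1;
}
```

A scanner built this way would call find_selector once for each name on the list of selectors to avoid and report any that return a non-negative offset.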
Whew! That was quite a bit – I hope I didn’t leave too many people behind. In my next post, which is the first of the New Year, I will talk about the organization of a program on disk, and where those selectors are.