Sunday, 6 June 2010

The Quest : Part 2

So the last try at making a small Mach-O binary didn't really work. Now I could start fiddling with the linker to see if I can make things smaller, but I am not particularly up on my Apple linker usage, so instead let's just go straight to the assembler :)

Fortunately Apple chose to install nasm by default (well, when you install the developer tools), so we just need to understand how a Mach-O binary is laid out. There is some official documentation about the file format (and the header files it references are installed with the dev tools). The otool utility also gives a good idea of what is in a real binary (try running it with the -l option on the previous static binary to see what you get).

Anyway, as with last time, I still want this executable to not require any dirty tricks, although looking at the file format there aren't many obvious ones we could employ (at least compared to some of the stunts you can play with ELFs).

So what needs to be in a valid Mach-O? The header, obviously, is required, followed by a number of load commands. It turns out (through a bit of reading the source) that you only actually need an LC_UNIXTHREAD (or LC_THREAD) command for the executable to load. Unlike most other executable formats, it seems the entry point is not a field in the headers but is inferred from the initial thread context.
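
To make the "entry point lives in the thread context" idea a bit more concrete, here is a rough C sketch that walks the load commands of a 32-bit image held in memory and pulls the initial EIP out of an LC_UNIXTHREAD command. This is not how the kernel does it, just an illustration of the layout. The names find_entry_point and read_u32 and the constants are my own, and it assumes the host byte order matches the file (little-endian for i386).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MINI_MH_MAGIC      0xfeedfaceu /* 32-bit Mach-O magic */
#define MINI_LC_UNIXTHREAD 0x5u        /* load command number for LC_UNIXTHREAD */
#define MINI_I386_STATE    1u          /* i386_THREAD_STATE flavor */
#define MINI_EIP_INDEX     10u         /* __eip is the 11th register in the state */

/* Read a 32-bit value in host byte order (assumed to match the file). */
static uint32_t read_u32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: returns 0 and stores the initial EIP on success,
 * or -1 if the image is malformed or has no usable LC_UNIXTHREAD. */
int find_entry_point(const uint8_t *image, size_t len, uint32_t *eip)
{
    const size_t header_size = 7 * 4; /* seven 32-bit mach_header fields */
    if (len < header_size || read_u32(image) != MINI_MH_MAGIC)
        return -1;

    uint32_t ncmds = read_u32(image + 16); /* fifth header field */
    size_t off = header_size;

    for (uint32_t i = 0; i < ncmds; i++) {
        if (len - off < 8)
            return -1; /* no room for cmd + cmdsize */
        uint32_t cmd = read_u32(image + off);
        uint32_t cmdsize = read_u32(image + off + 4);
        if (cmdsize < 8 || cmdsize > len - off)
            return -1;

        if (cmd == MINI_LC_UNIXTHREAD) {
            /* Layout: cmd, cmdsize, flavor, count, then the registers. */
            if (cmdsize < 16 + 4 * (MINI_EIP_INDEX + 1))
                return -1;
            uint32_t flavor = read_u32(image + off + 8);
            uint32_t count = read_u32(image + off + 12);
            if (flavor != MINI_I386_STATE || count <= MINI_EIP_INDEX)
                return -1;
            *eip = read_u32(image + off + 16 + 4 * MINI_EIP_INDEX);
            return 0;
        }
        off += cmdsize; /* skip to the next load command */
    }
    return -1;
}
```

Feed it the bytes of a file and, if all goes well, the value it hands back is the address the first thread starts executing at.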

Of course, without any code in memory this isn't exactly useful (well, not immediately), so we also need to specify an LC_SEGMENT load command. This maps some of our binary into memory, and then we are ready to go. As a short aside, if you look at the output of otool -l, most segments also contain sections. As far as I can tell these are unnecessary for loading and are more meta-data to make linking more consistent.

; A basic Mach-O executable
; (c) Tyranid 2010
BITS 32

ORG 0x1000

_program_start:

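; mach_header
dd 0xfeedface ; magic (MH_MAGIC, 32-bit)
dd 7 ; cputype (CPU_TYPE_X86)
dd 3 ; cpusubtype (CPU_SUBTYPE_I386_ALL)
dd 2 ; filetype (MH_EXECUTE)
dd 2 ; ncmds
dd _cmd_end-_cmd_start ; sizeofcmds
dd 0x2001 ; flags (MH_NOUNDEFS | MH_SUBSECTIONS_VIA_SYMBOLS)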

_cmd_start:

_segment_cmd:
dd 1 ; cmd (LC_SEGMENT)
dd _segment_cmd_end-_segment_cmd ; cmdsize
_segment_name: ; segname, zero-padded to 16 bytes
db "__TEXT"
times 16-$+_segment_name db 0
dd _program_start ; vmaddr
dd ((_program_end-_program_start)+4095)&~4095 ; vmsize, rounded up to a 4096-byte page
dd 0 ; fileoff
dd _program_end-_program_start ; filesize
dd 7 ; maxprot (rwx)
dd 5 ; initprot (r-x)
dd 0 ; nsects
dd 4 ; flags (SG_NORELOC)

_segment_cmd_end:

_thread_cmd_start:
dd 5 ; cmd (LC_UNIXTHREAD)
dd _thread_cmd_end-_thread_cmd_start ; cmdsize
dd 1 ; flavor (i386_THREAD_STATE)
dd (_registers_end-_registers_start)/4 ; count, in 32-bit words

_registers_start:
dd 0 ; unsigned int __eax;
dd 0 ; unsigned int __ebx;
dd 0 ; unsigned int __ecx;
dd 0 ; unsigned int __edx;
dd 0 ; unsigned int __edi;
dd 0 ; unsigned int __esi;
dd 0 ; unsigned int __ebp;
dd 0 ; unsigned int __esp;
dd 0x1F ; unsigned int __ss;
dd 0 ; unsigned int __eflags;
dd _start ; unsigned int __eip;
dd 0x17 ; unsigned int __cs;
dd 0x1F ; unsigned int __ds;
dd 0x1F ; unsigned int __es;
dd 0 ; unsigned int __fs;
dd 0 ; unsigned int __gs;
_registers_end:

_thread_cmd_end:

_cmd_end:

_start:
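; Call exit(42)
push byte 42 ; exit status
push byte 1
pop eax ; eax = 1 (exit syscall number), shorter than mov eax, 1
push eax ; dummy slot where the syscall expects a return address
int 0x80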

_program_end:

Throw it through nasm in flat binary mode (-f bin) and what do we get? 172 bytes, far smaller. There are some further tricks you could play with this, such as embedding the code inside the thread context (as only EIP and probably the segment registers are important) or storing a few of the necessary values in the context to slightly reduce the pushes. Still, 172 is alright for now. Can it go any lower?
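
As a sanity check on that number, here is a small C sketch that adds up the pieces of the listing and redoes the vmsize rounding. The struct names are hypothetical local mirrors of the 32-bit header, segment command and thread command (not the real definitions from the system headers), and it assumes the compiler adds no padding to structs made purely of 32-bit fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the structures emitted by the listing (hypothetical names). */
struct mini_mach_header {
    uint32_t magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags;
};

struct mini_segment_command {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint32_t vmaddr, vmsize, fileoff, filesize;
    uint32_t maxprot, initprot, nsects, flags;
};

struct mini_thread_command {
    uint32_t cmd, cmdsize, flavor, count;
    uint32_t regs[16]; /* eax .. gs, as laid out in the listing */
};

/* push 42 (2) + push 1 (2) + pop eax (1) + push eax (1) + int 0x80 (2) */
enum { MINI_CODE_BYTES = 8 };

/* Total size of the assembled file: header, two load commands, code. */
size_t total_file_size(void)
{
    return sizeof(struct mini_mach_header)
         + sizeof(struct mini_segment_command)
         + sizeof(struct mini_thread_command)
         + MINI_CODE_BYTES;
}

/* Same rounding as the vmsize expression: (n + 4095) & ~4095. */
uint32_t round_up_to_page(uint32_t n)
{
    return (n + 4095u) & ~4095u;
}
```

So that is 28 bytes of header, 56 of segment command, 80 of thread command and 8 of code, and the segment still gets a full 4096-byte page of address space even though the file is tiny.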

Thursday, 3 June 2010

The Quest for a Small Mach-O

For my sins I have recently actually enjoyed using OS X. There is just something about its unix'ness which appeals to me (though I would rather not have to pay for it to begin with). Anyway one of the first things I tend to do on an OS is to try and write as small an executable as possible and this is not the time to change that.

So this is maybe the first post of many on creating something small :) Note: I am working on Snow Leopard and producing 32-bit code, YMMV.

Step 1: What can we do with basic tools?

So let's start with a normal development environment to see what we can get without having to write anything custom. Before we can do anything we need some code, so here is a simple entry point with no reliance on external libraries, just straight into the exit syscall.
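void start(void) {
    // Call exit(0)
    __asm__ volatile (
        "push $0\n"        // exit status
        "push $0\n"        // dummy slot where the syscall expects a return address
        "movl $1, %eax\n"  // exit syscall number
        "int $0x80\n"
    );
}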
It is worth pointing out that without this exit syscall your new application will just SIGBUS, not exactly optimal.

Now we just need to link it. We will choose to link statically (which should get rid of anything to do with the dynamic linker, though that might have to change as we go along).
all: test1

test1: test1.c
	$(CC) -c -o test1.o test1.c
	$(LD) -o test1 -s -static -e _start test1.o

clean:
	rm -f test1 *.o
And our survey says? 4096 bytes, bugger. Well I guess page alignment is a killer. Of course using hexdump shows that over 3/4 of the file is empty. Still there is some hope for the future, running otool -lv over the output application shows that the entire 4k is being loaded into memory, a classic trick in making small binaries. Some nice sounding options in the ld man page (such as -pagezero_size and -seg_page_size) just don't seem to work as expected so no doubt something more custom is required next time.
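
If you would rather not eyeball hexdump output, a few lines of C can count the empty bytes for you. This is just a sketch: the function names count_zero_bytes and count_zero_bytes_in_file are made up for this post, and "empty" here simply means a byte with value zero.

```c
#include <stddef.h>
#include <stdio.h>

/* Count how many bytes in the buffer are zero. */
size_t count_zero_bytes(const unsigned char *buf, size_t len)
{
    size_t zeros = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0)
            zeros++;
    }
    return zeros;
}

/* Hypothetical helper: count the zero bytes in a file.
 * Returns the zero count and stores the file size in *total,
 * or returns -1 if the file cannot be opened or read. */
long count_zero_bytes_in_file(const char *path, long *total)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    unsigned char chunk[4096];
    long zeros = 0;
    long size = 0;
    size_t got;

    while ((got = fread(chunk, 1, sizeof chunk, f)) > 0) {
        zeros += (long)count_zero_bytes(chunk, got);
        size += (long)got;
    }

    int failed = ferror(f);
    fclose(f);
    if (failed)
        return -1;

    *total = size;
    return zeros;
}
```

Point count_zero_bytes_in_file at the linked test1 binary and compare the result with the total size to see how much of the 4096 bytes is padding.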

Onwards and upwards.