Thursday, March 12, 2009

The Intermediate Representation

In this post I will show an example of the Intermediate Language previously described. 

As previously mentioned there are more or less four basic operations.

  1. operation()             <- for math, takes operands and operators
  2. load $register          <- implicitly retrieves data from the memory address resulting from the previous operation and puts it into $register
  3. store $register         <- stores $register to the memory address resulting from the previous operation
  4. branch_true             <- branches if the result of the previous operation is != 0, otherwise falls through
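To make the "implicit previous result" idea concrete, here is a minimal Python sketch of the four operations and a tiny block interpreter. It is not the code from mips_translator.py: the class names, the `evaluate` and `run_block` helpers, and the register and memory values are all made up for illustration, and only the operators that appear in this post are handled.

```python
import operator
from dataclasses import dataclass

# Operators seen in this post's IR dump; '==' yields 1 or 0.
OPS = {"+": operator.add, "|": operator.or_, "<<": operator.lshift,
       "==": lambda a, b: int(a == b)}

@dataclass
class Operation:      # hypothetical name: a math step, e.g. ("$4", "=", "$4", "+", 1588)
    expr: tuple

@dataclass
class Load:           # reads memory at the previous result into a register
    register: str

@dataclass
class Store:          # writes a register to memory at the previous result
    register: str

@dataclass
class BranchTrue:     # taken when the previous result is non-zero
    target: int

def evaluate(expr, regs):
    """Evaluate a flat expression, writing the destination register if one is given."""
    dest = None
    if len(expr) > 2 and expr[1] == "=":
        dest, expr = expr[0], expr[2:]
    def value(x):
        return regs.get(x, 0) if isinstance(x, str) else x
    if len(expr) == 1:
        result = value(expr[0])
    else:
        left, op, right = expr
        result = OPS[op](value(left), value(right))
    if dest is not None:
        regs[dest] = result
    return result

def run_block(block, regs, memory):
    """Run one basic block; return the branch target if taken, else None."""
    last = 0  # the implicit result shared by load, store and branch_true
    for ins in block:
        if isinstance(ins, Operation):
            last = evaluate(ins.expr, regs)
        elif isinstance(ins, Load):
            regs[ins.register] = memory.get(last, 0)
        elif isinstance(ins, Store):
            memory[last] = regs.get(ins.register, 0)
        elif isinstance(ins, BranchTrue) and last != 0:
            return ins.target
    return None

# Made-up machine state; the block mirrors the shape of the IR in this post.
regs = {"$0": 0, "$2": 0, "$29": 0x1000}
memory = {0x1000 + 48: 0xBEEF}
block = [
    Operation(("$29", "+", 48)), Load("$28"),
    Operation(("$6", "=", "$2", "|", "$0")),
    Operation(("$2", "==", "$0")), BranchTrue(0x402428),
]
target = run_block(block, regs, memory)
```

With `$2` equal to zero, the comparison yields 1 and the branch is taken, so `target` ends up as 0x402428. Keeping a single `last` value is what lets load, store and branch_true carry no address or condition operands of their own.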

For now, call and ret operations are also used to make function detection more manageable. In the future these will be removed, since they can be derived from combinations of load and branch.

Here is some sample output from the MIPS translator on the print2lpr binary. In addition, some basic analysis has been performed to resolve GOT entries and strings. This is fully automated, but register propagation is very basic at the moment. Be sure to check out the code at http://rpisec.net/repositories/show/rcosbinstat. The majority of that code is in the libtransform function in mips_translator.py.

0x4023ec ($28, '+', -32460)     
0x4023ec LOAD $25     ### getenv
0x4023f0 ($4, '=', $4, '+', 1588)     %%%% "LPUSER"
0x4023f4 ($0, '=', $0, '<<', $0)     
0x4023f8 ($31, '=', $32, '+', 4)     
0x4023f8 CALL $25     
0x4023fc ($29, '+', 48)     
0x4023fc LOAD $28     
0x402400 ($6, '=', $2, '|', $0)     
0x402404 ($2, '==', $0)     
0x402404 BRANCH loc_0x402428     
---end of block---

--- block 402408 -> 402424:13--
parents:  []
branches:  0x402428 0x0
0x402408 ($28, '+', -32692)     
0x402408 LOAD $5     %% (10000000)
0x40240c ($28, '+', -32556)     
0x40240c LOAD $25     ### sprintf
0x402410 ($4, '=', $16, '|', $0)     
0x402414 ($5, '=', $5, '+', 1596)     %%%% "P%s
0x402418 ($31, '=', $32, '+', 4)     
0x402418 CALL $25     
0x40241c ($29, '+', 48)     
0x40241c LOAD $28     
0x402420 ($16, '=', $16, '+', $2)     
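
The "### getenv" and "### sprintf" annotations in the listing come from matching a $28-relative (global pointer) address against the GOT. A rough Python sketch of that idea follows. It is a simplification and not the actual libtransform code: the `annotate_got_loads` helper, the ("op"/"load", payload) tuple encoding, the gp value, and the slot-to-symbol table are all hypothetical, and a real tool would read them from the ELF dynamic section.

```python
GP = "$28"  # MIPS global pointer register

def annotate_got_loads(instructions, gp_value, got_symbols):
    """Return {instruction index: symbol} for loads through gp-relative GOT slots.

    instructions: list of (kind, payload) pairs, a hypothetical encoding where
    kind is "op" (payload is an expression tuple) or "load" (payload is a register).
    got_symbols: maps a GOT slot address to a symbol name (assumed precomputed).
    """
    annotations = {}
    previous = None
    for index, (kind, payload) in enumerate(instructions):
        if kind == "load" and previous is not None:
            prev_kind, expr = previous
            is_gp_relative = (
                prev_kind == "op" and len(expr) == 3
                and expr[0] == GP and expr[1] == "+"
                and isinstance(expr[2], int)
            )
            if is_gp_relative:
                symbol = got_symbols.get(gp_value + expr[2])
                if symbol is not None:
                    annotations[index] = symbol
        previous = (kind, payload)
    return annotations

# Hypothetical gp value and GOT contents, chosen only so the example resolves.
gp_value = 0x10008000
got_symbols = {gp_value - 32460: "getenv", gp_value - 32556: "sprintf"}
instructions = [
    ("op", ("$28", "+", -32460)), ("load", "$25"),
    ("op", ("$4", "=", "$4", "+", 1588)),
    ("op", ("$28", "+", -32556)), ("load", "$25"),
]
got_annotations = annotate_got_loads(instructions, gp_value, got_symbols)
```

Here `got_annotations` maps index 1 to "getenv" and index 4 to "sprintf". Because each load takes its address from the operation directly before it, the lookup only needs to inspect one preceding step.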

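On MIPS, the first arguments are passed in registers $4 - $7 ($a0 - $a3) under the o32 calling convention (the n32/n64 conventions extend this to $11), and the rest go on the stack. It would be nice to automate that translation but I haven't had a chance to. Anyway, let's look at how the sprintf function is being called.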

Note that register $16 points to a global write buffer in the BSS. sprintf takes two fixed arguments followed by a variable number of arguments:

sprintf(dest, fmt, ...)
$4 is the destination <- $16 = global write buffer
$5 is the format string <- "P%s"
$6 is the first variable argument <- $2, copied at 0x402400

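Return values on MIPS are placed in $v0/$v1, which are registers $2/$3. Looking back at the previous code block, the last call was to getenv("LPUSER"), and its result in $2 was copied into $6. After the sprintf call, its return value in $2 is added to $16.

Our code is then

write_buf += sprintf(write_buf, "P%s", getenv("LPUSER"));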
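
Part of the manual walk above could be automated by scanning backwards from a CALL for the last write to each argument register. The Python sketch below is only an illustration of that idea, not something the tool does today. It assumes the o32 argument registers $4 - $7, and both the `last_definitions` helper and the ("op"/"load"/"call", payload) tuple encoding are hypothetical.

```python
ARG_REGS = ("$4", "$5", "$6", "$7")  # assumed o32 argument registers ($a0 - $a3)

def last_definitions(block, call_index):
    """Map each argument register to the most recent step before the call that writes it.

    block: list of (kind, payload) pairs, a hypothetical encoding where kind is
    "op" (expression tuple), "load" (destination register) or "call" (target register).
    """
    found = {}
    for kind, payload in reversed(block[:call_index]):
        if kind == "op" and len(payload) > 2 and payload[1] == "=":
            dest = payload[0]          # explicit destination of a math step
        elif kind == "load":
            dest = payload             # a load writes its register
        else:
            continue                   # address-only operations define nothing
        if dest in ARG_REGS and dest not in found:
            found[dest] = (kind, payload)
    return found

# The sprintf block from the listing above, re-encoded by hand.
sprintf_block = [
    ("op", ("$28", "+", -32692)), ("load", "$5"),
    ("op", ("$28", "+", -32556)), ("load", "$25"),
    ("op", ("$4", "=", "$16", "|", "$0")),
    ("op", ("$5", "=", "$5", "+", 1596)),
    ("op", ("$31", "=", "$32", "+", 4)),
    ("call", "$25"),
]
args = last_definitions(sprintf_block, call_index=7)
```

For this block the scan finds $4 and $5 but not $6, because $6 was written in the preceding block. That gap is exactly why the previous block had to be consulted by hand, and it is what proper register propagation across blocks would need to close.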

We are still quite a ways from automating that higher level translation, but having automated string and library resolution is certainly nice.

By the way, this particular code segment was not exploitable as there is nothing good after the global BSS buffer. If you're really interested in Irix bugs, get a life, or contact me :-).

Back to the topic at hand. The IR looks painfully simple. This is deliberate: it keeps analysis easy in the long term. Forcing this simplicity also makes translation from other architectures more feasible, so analysis tools written against the IR stay useful as platforms change.