Wednesday, February 18, 2009

Why Pink?

Because there are no other static analysis blogs decorated with pink. Roughly 20 days have passed and this project is finally rolling. Please check out the project in progress and play with the Wiki and forums.

Alright. There are 5 minutes to go, you're me, and you need to show your boss all the work you've done. Close the lolcat tabs. Oh gosh, I really meant to save that one; thank you, oh mighty and omnipotent Firefox recently-closed-tabs history menu.


On your sketch pad you write down:


While surfing the chaos I have built up the following:
  • A notion of a higher level IR that no longer resembles asm
  • An almost complete MIPS translator into this IR
  • Really basic procedure detection, plus inter- and intra-procedural flow graphs, on IR generated from MIPS code

The first item is the easiest but the most important. The Intermediate Representation (IR) is the basis: code from every supported architecture is translated into it. Originally something along the lines of UCode was envisioned. UCode works well for writing a compiler. It has a concept of a stack, a heap, global variables, and so on. It's nifty, but overly complicated for what I think is needed. We need something super simple. Here it is:


There are 4 staple operations in the IR:
  1. operation(...) (... = list of operands and operators)
  2. load [destination]
  3. store [source]
  4. branch_true [destination]

#1 covers all of the bit-twiddling and math instructions. #2, #3, and #4 use an implicit value, which is the result of the previous operation(). For example, load takes its source memory address from that implicit value.
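
To make the implicit value concrete, here is a minimal Python sketch of the four staple operations and a toy interpreter for them. This is not the project's code. The class names, the postfix ordering of operands and operators, the operator table, and the `run` helper are all assumptions made up for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical operator table; the real IR's operator set is not described here.
BINARY_OPS = {"+": operator.add, "-": operator.sub,
              "&": operator.and_, "<": operator.lt}


@dataclass
class Operation:
    # Operands and operators in postfix order (an assumption of this sketch).
    terms: Tuple[Union[int, str], ...]


@dataclass
class Load:
    destination: str   # register that receives memory[implicit value]


@dataclass
class Store:
    source: str        # register written to memory[implicit value]


@dataclass
class BranchTrue:
    destination: int   # instruction index taken when the implicit value is truthy


def evaluate(terms, regs: Dict[str, int]) -> int:
    """Evaluate a postfix list of constants, register names, and operators."""
    stack: List[int] = []
    for t in terms:
        if isinstance(t, str) and t in BINARY_OPS:
            b = stack.pop()
            a = stack.pop()
            stack.append(BINARY_OPS[t](a, b))
        elif isinstance(t, str):
            stack.append(regs.get(t, 0))   # register read
        else:
            stack.append(t)                # integer constant
    return stack.pop()


def run(program, regs: Dict[str, int], memory: Dict[int, int]):
    """Execute the program; load, store, and branch_true consume the implicit value."""
    implicit = 0
    pc = 0
    while pc < len(program):
        ins = program[pc]
        pc += 1
        if isinstance(ins, Operation):
            implicit = evaluate(ins.terms, regs)
        elif isinstance(ins, Load):
            regs[ins.destination] = memory.get(implicit, 0)
        elif isinstance(ins, Store):
            memory[implicit] = regs.get(ins.source, 0)
        elif isinstance(ins, BranchTrue) and implicit:
            pc = ins.destination
    return regs, memory


# Roughly what MIPS `lw $t0, 8($sp)` followed by `sw $t0, 12($sp)` might become.
program = [
    Operation(("sp", 8, "+")),    # implicit value = sp + 8
    Load("t0"),                   # t0 = memory[sp + 8]
    Operation(("sp", 12, "+")),   # implicit value = sp + 12
    Store("t0"),                  # memory[sp + 12] = t0
]
regs, memory = run(program, {"sp": 100}, {108: 42})
```

The point of the sketch is that load, store, and branch_true carry only one explicit operand each; the address or condition always comes from the operation() just before them.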


For analysis purposes, #4 covers too many contexts, so it has actually been re-complicated back into jump, call, and branch_true, and call has a corresponding ret instruction. For what purpose? Procedure detection. What's the difference between a call and a jump? A return address. Between a jump and a branch_true? It's philosophical, but I consider it to be locality.
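
As a rough illustration of why the call/jump split helps, here is a small Python sketch of one possible procedure-detection pass. It is an assumption, not the algorithm this project uses: instructions are hypothetical `(opcode, target)` tuples, every call target is treated as a procedure entry, and each procedure body is whatever is reachable from its entry without crossing a ret or following a call into its callee.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (opcode, target instruction index or None).
Instr = Tuple[str, Optional[int]]


def find_procedures(program: List[Instr], entry: int = 0) -> Dict[int, List[int]]:
    """Map each procedure entry index to the sorted instruction indices it reaches."""
    # A call leaves a return address, so its target is taken to start a procedure.
    entries = {entry} | {t for op, t in program if op == "call" and t is not None}
    procedures: Dict[int, List[int]] = {}
    for start in sorted(entries):
        seen = set()
        work = [start]
        while work:
            pc = work.pop()
            if pc in seen or not (0 <= pc < len(program)):
                continue
            seen.add(pc)
            op, target = program[pc]
            if op == "ret":
                continue                  # the procedure ends here
            if op == "jump":
                if target is not None:
                    work.append(target)   # unconditional: no fall-through
                continue
            if op == "branch_true" and target is not None:
                work.append(target)       # taken edge stays inside the procedure
            # call, branch_true, and plain operations all fall through;
            # a call's target belongs to the callee, not to this procedure.
            work.append(pc + 1)
        procedures[start] = sorted(seen)
    return procedures


example = [
    ("operation", None),   # 0
    ("call", 4),           # 1: callee starts at 4
    ("operation", None),   # 2
    ("ret", None),         # 3
    ("operation", None),   # 4
    ("branch_true", 7),    # 5
    ("operation", None),   # 6
    ("ret", None),         # 7
]
procedures = find_procedures(example)
```

In this toy, a jump whose target lands in another procedure would pull that code into the current one, which is exactly the kind of ambiguity the locality distinction between jump and branch_true is meant to sort out.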


The next blog post will elaborate more on the IR, how the operands work, and other instruction abstractions that are created to ease analysis. If you can't wait, I encourage you to glance at the code.




Upcoming Milestones
  • x86 translator!
  • integer underflow and overflow hot-spot detection
  • unchecked return values, double frees

What's needed after those
  • A GUI to play with these graphs. Processing? Any suggestions? Readers, please help
  • PE file format support
  • IR Transformation gadgets -- an interface to the IR for applying easy transformations for detecting external library calls, function prologues and epilogues, specialized integer operations, and ....

1 comment:

  1. i guess anyone can make GUI using Processing yet, but it would be great.
