Monday, January 26, 2009

Hello Blogosphere

Hi there, 

I'm proud to announce an open-source binary static analysis project. The name will come later, once the frustration factor is better scoped out.

There are quite a few impressive pieces of software out there. First, you have IDA Pro and the Hex-Rays plugin, which produces rather good-looking higher-level output from machine code. There is even an SDK for this decompiler. Pretty good stuff.

Second, you have projects like BitBlaze, which do some great anti-malware work and describe some pretty scary, although entirely plausible, threats. For example, I like "Automated Patch-Based Exploit Generation", a.k.a. Skynet. No worries, computer scientists, do not panic: they also acknowledge that they have not yet solved the halting problem, and the machine uprising still has some homework to do.

And third, you have one of the oldest tools that is free as in beer: REC, the Reverse Engineering Compiler. REC is nice: it works, and the output almost compiles. It was the first such tool I had experience with, and I greatly appreciate it. The newer version looks to be reaching for interactive RE work, which is very good stuff. And it supports x86, m68k, PPC, and MIPS (it's no IDA, but pretty good).

I should also mention that the LLVM project has an x86 translator in progress, but it has quite a bit of work left. There are also a hundred or so papers out there on binary static analysis. I'll be compiling a full listing as I find them and dumping it on the wiki, so the general public can more easily explore their concepts, terminology, and ideas.

So what's the problem with these tools? (1), IDA Pro, is paid software; the SDK is probably nice (I wouldn't know), but it's still not completely open, which limits flexibility. (2), BitBlaze, has yet to release source. And (3), REC, is very nifty and supports a few very different platforms, but it is targeted specifically at C-style RE. And although (3) is free as in beer, the source is closed.

So what's the point of this one? Why an open source alternative? The point is to give the reverse engineering community the freedom to build crazier and cooler projects without as much grunt work. 

The project will build a whole suite of APIs and tools for reverse engineering work. You want graphs for your obscure embedded microcontroller? You want a subgraph of all the code related to network I/O handling? Easy mode. How about bytecode for that whole new interpreted ju-JIT-su language that just came out? No problem, just get a translator in for your architecture and you're all set. You want to find bugs? How about applying some theorem solvers, using tools that have already been written by folks like the BitBlaze team (2)? You want higher-level output? Why not take advantage of already implemented techniques for building syntax trees? Let some code that has already been written optimize those trees for your higher-level language output, so that any programmer out there can recognize it (except die-hard asm masochists).
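To make the "subgraph of network I/O handling code" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the call graph is hand-written rather than recovered from a binary, and the names `CALL_GRAPH`, `NETWORK_SINKS`, and `network_io_subgraph` are hypothetical, not part of any existing API of this project. It simply walks the call graph backwards from the network functions and keeps every function that can reach one.

```python
from collections import deque

# Hypothetical call graph (caller -> set of callees). A real tool would
# recover this from disassembly; here it is hand-written for illustration.
CALL_GRAPH = {
    "main": {"parse_args", "serve"},
    "parse_args": {"strlen"},
    "serve": {"accept_client", "log_msg"},
    "accept_client": {"recv", "handle_request"},
    "handle_request": {"send", "strlen"},
    "log_msg": {"fprintf"},
}

# Assumed set of functions that count as "network I/O".
NETWORK_SINKS = {"recv", "send"}


def network_io_subgraph(call_graph, sinks):
    """Return the subgraph of functions that can reach a network sink."""
    # Build the reversed graph: callee -> set of callers.
    callers = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    # Breadth-first search backwards from the sinks that appear in the graph.
    reached = {s for s in sinks if s in callers or s in call_graph}
    queue = deque(reached)
    while queue:
        func = queue.popleft()
        for caller in callers.get(func, ()):
            if caller not in reached:
                reached.add(caller)
                queue.append(caller)

    # Keep only edges whose both endpoints lie on a path to a sink.
    subgraph = {}
    for caller, callees in call_graph.items():
        kept = callees & reached
        if caller in reached and kept:
            subgraph[caller] = kept
    return subgraph


if __name__ == "__main__":
    for caller, callees in sorted(network_io_subgraph(CALL_GRAPH, NETWORK_SINKS).items()):
        print(caller, "->", sorted(callees))
```

In this toy graph, `parse_args` and `log_msg` drop out because nothing they call ever touches the network, while `main`, `serve`, `accept_client`, and `handle_request` stay in.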

Note that this project is made possible by the Rensselaer Center for Open Source, which was very kindly created with a grant from Sean O'Sullivan, an alumnus from '85. I am very grateful to RCOS for motivating me to make this concept a free software reality.

So let's start to build this thing. It might take quite a few lines of code and more than one language though, so please re /join us after this commercial break.

And thanks,
Alex