Simply put, reverse engineering is the process of taking something apart to find out how it works. With regard to software, there are three general categories of data that are reverse engineered: file formats, communication protocols, and executable programs.
With file format reverse engineering, you typically have one or more files in an unknown structured format. The goal is to determine that format. As a simple example, consider a file containing the four-byte sequence:
0x41 0x42 0x43 0x44
This could be interpreted in at least the following ways:
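For illustration, a few common readings of those four bytes can be sketched in Python with the standard `struct` module. The particular interpretations chosen here, and the dictionary keys that name them, are assumptions made for this example; many other readings are equally valid.

```python
import struct

# The four bytes from the example above.
data = bytes([0x41, 0x42, 0x43, 0x44])

# A handful of plausible readings of the same bytes (illustrative choices only).
interpretations = {
    "ascii_string": data.decode("ascii"),
    "uint32_big_endian": struct.unpack(">I", data)[0],
    "uint32_little_endian": struct.unpack("<I", data)[0],
    "two_uint16_little_endian": struct.unpack("<2H", data),
    "four_uint8": struct.unpack("4B", data),
    "float32_big_endian": struct.unpack(">f", data)[0],
}

for name, value in interpretations.items():
    print(f"{name}: {value!r}")
```

The same input yields a string, several different integers, and a floating point number, depending only on the assumed layout.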
As you can see, the possibilities for interpreting unknown data are limitless. Now imagine that there are many of these four-byte data files. Examining each file, we see that the value of the bytes is always between 0x41 and 0x5a. Interpreting this range as ASCII characters, the range is 'A' .. 'Z'. From this we infer that the likely structure of the file format is an ASCII string of length four.
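The inference above can be sketched as a small check over a set of sample files. The sample contents and the helper name `byte_range` are hypothetical and stand in for real files read from disk.

```python
def byte_range(samples):
    """Return the (lowest, highest) byte value seen across all samples."""
    seen = [b for sample in samples for b in sample]
    return min(seen), max(seen)

# Hypothetical contents of several four-byte files.
samples = [b"ABCD", b"QRST", b"ZZAA", b"MNOP"]

low, high = byte_range(samples)
print(f"observed byte range: {low:#04x} .. {high:#04x}")

# Hypothesis: every file is four uppercase ASCII letters ('A' = 0x41, 'Z' = 0x5a).
fits_hypothesis = all(
    len(sample) == 4 and all(0x41 <= b <= 0x5A for b in sample)
    for sample in samples
)
print("consistent with a 4-character uppercase ASCII string:", fits_hypothesis)
```

A check like this can only support a hypothesis, not prove it; a single sample outside the range is enough to reject it.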
Real-life examples are much more complex. File formats typically contain strings, signed and unsigned integers of varying sizes, floating point numbers, padding bytes and arrays, as well as structures built up from these primitives. Although it may sound complex, file format reverse engineering is the simplest of the three categories.
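To make the mix of primitives concrete, here is a minimal sketch of a record that combines them. The layout, the field names, and the `parse_record` helper are invented for this example and do not describe any real format.

```python
import struct

# Hypothetical record layout (little-endian, no implicit alignment):
#   4s  four-byte ASCII tag
#   H   unsigned 16-bit version
#   h   signed 16-bit offset
#   f   32-bit floating point scale
#   2x  two padding bytes (skipped when unpacking)
#   3B  array of three unsigned bytes
RECORD = struct.Struct("<4sHhf2x3B")

def parse_record(data: bytes) -> dict:
    """Decode one record according to the assumed layout."""
    tag, version, offset, scale, *levels = RECORD.unpack(data)
    return {
        "tag": tag.decode("ascii"),
        "version": version,
        "offset": offset,
        "scale": scale,
        "levels": levels,
    }

# Build a sample record, then decode it again.
sample = RECORD.pack(b"DEMO", 2, -16, 1.5, 10, 20, 30)
record = parse_record(sample)
print(RECORD.size, record)
```

When reverse engineering, the process runs the other way: the bytes are known and the format string is what has to be discovered.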
Reverse engineering communication protocols is a step up from reverse engineering file formats. The added complexity is that protocols are multidirectional, while file formats are unidirectional. A protocol is a conversation that follows strict rules. When reverse engineering a protocol, the data being transmitted must be determined (as with file formats). But perhaps more importantly, the rules governing when that data can be transmitted must be determined as well.
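One way to picture the "rules of the conversation" is as a state machine. The sketch below uses an invented protocol with made-up message names (`HELLO`, `LOGIN`, `DATA`, `BYE`) and state names; it only illustrates the idea that a message is valid or invalid depending on what came before it.

```python
# Hypothetical protocol rules: (current state, message) -> next state.
TRANSITIONS = {
    ("START", "HELLO"): "GREETED",
    ("GREETED", "LOGIN"): "AUTHENTICATED",
    ("AUTHENTICATED", "DATA"): "AUTHENTICATED",
    ("AUTHENTICATED", "BYE"): "CLOSED",
}

def follows_rules(messages, state="START"):
    """Return True if every message is allowed in the state it arrives in."""
    for message in messages:
        state = TRANSITIONS.get((state, message))
        if state is None:
            return False  # message sent at a point where the rules forbid it
    return True

print(follows_rules(["HELLO", "LOGIN", "DATA", "DATA", "BYE"]))  # valid conversation
print(follows_rules(["LOGIN"]))                                  # LOGIN before HELLO
```

Recovering a table like `TRANSITIONS` from observed traffic is the part of protocol reverse engineering that has no counterpart in file format work.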
Reasons for reverse engineering protocols go beyond documenting an existing unknown protocol. The same techniques can often be used to test the robustness of a particular implementation of a known protocol.
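As a rough sketch of robustness testing, the following mutates a valid message one byte at a time and feeds each variant to a parser. The message layout, `parse_message`, `mutate`, and `fuzz` are all hypothetical; a real test would target the actual implementation under study.

```python
import random
import struct

def parse_message(data: bytes) -> bytes:
    """Hypothetical parser: 1-byte type, 2-byte big-endian length, then payload."""
    if len(data) < 3:
        raise ValueError("truncated header")
    _msg_type, length = struct.unpack(">BH", data[:3])
    payload = data[3:]
    if length != len(payload):
        raise ValueError("length field does not match payload")
    return payload

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def fuzz(valid: bytes, rounds: int = 200, seed: int = 0):
    """Feed mutated messages to the parser; count clean rejections and collect crashes."""
    rng = random.Random(seed)
    rejected, crashes = 0, []
    for _ in range(rounds):
        candidate = mutate(valid, rng)
        try:
            parse_message(candidate)
        except ValueError:
            rejected += 1  # malformed input was rejected cleanly
        except Exception as exc:  # anything else suggests a robustness bug
            crashes.append((candidate, exc))
    return rejected, crashes

valid = struct.pack(">BH", 1, 5) + b"hello"
rejected, crashes = fuzz(valid)
print(f"{rejected} mutated messages rejected cleanly, {len(crashes)} unexpected failures")
```

A robust implementation rejects malformed input in a controlled way; anything landing in `crashes` would deserve a closer look.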
Reverse engineering an executable program means analysing a program for which you do not have the source code. There are multiple reasons for doing so. These range from gaining a general overview of the capabilities of a program, through analysing a specific algorithm inside a program, to obtaining a complete decompilation of the program to source code.
Often when performing file format or protocol reverse engineering, you have a particular program that uses that file format or protocol. Reverse engineering the executable may assist in reverse engineering the unknown file format or protocol.
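A common first step towards a general overview of a program is to list the printable text embedded in it. The sketch below is one simple way to do that; the function name `printable_strings`, the minimum length of four, and the sample bytes are assumptions for illustration, and the sample stands in for the contents of a real executable.

```python
import re

def printable_strings(blob: bytes, min_len: int = 4):
    """Return runs of at least min_len printable ASCII characters found in blob."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(blob)]

# Hypothetical fragment of binary data with embedded text.
blob = (
    b"\x00\x01\x02usage: %s <file>\x00\x00"
    b"\x13\x01connect\x00/etc/passwd\x00\x01\x02ab\x00"
)

for text in printable_strings(blob):
    print(text)
```

Strings such as usage messages, function names, or file paths often hint at what a program can do before any code has been disassembled.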
This site mainly deals with reverse engineering executable programs.
"I mean, I heard from my (lecturer/colleague/vendor/political representative) that reverse engineering is evil."
Like all software practices, reverse engineering can be used for many different purposes.
Imagine that you are a software company. Another software company has come up with a revolutionary new product that happens to compete with yours. Your response could be to:
You suspect that a software product you recently purchased and installed is really spyware and is transmitting your private details back to the vendor. You can either:
You are a system administrator / security researcher. The machines you administer are being attacked and penetrated by a worm. You have obtained a copy of the worm program. You can:
This website discusses the tools and processes used in reverse engineering. When compared to technical discussions, morality arguments are boring. The morality of the practice of reverse engineering is left as an exercise for the interested reader.