Subject talk:Software reverse engineering

From Wikibooks, the open-content textbooks collection

Topic of book

The topic of the book is "legal reverse engineering". However, it does not state which jurisdiction the term refers to, and it would be good to state this explicitly. Is the correct topic "Legal reverse engineering in the USA", or is it acceptable if the techniques are legal in, say, Norway?

Filur 13:11, 12 August 2006 (UTC)

That's a good question. I think that it would be more appropriately called "Legitimate reverse engineering", which would only serve to prevent this book from containing hacks, cracks, or other illegal nonsense. Basically, we want to teach people how to disassemble code, examine compiled binaries, and read uncommented assembly language. We don't want to teach anybody how to break the law, steal passwords, write viruses, perform DoS attacks on servers, etc. If you want to write about something that is legal in Norway but illegal in Spain (for instance), you might want to include a note that says "This information might be illegal in some jurisdictions, consult your legal counsel for help", or something like that. --Whiteknight (talk) (projects) 14:28, 12 August 2006 (UTC)

Browsing, as soon as I read that opening note, I turned away. It makes no sense. I can't think of any reverse engineering tech that someone would not claim was legal, in some circumstances -- and I can't think of any that someone would not claim was NOT legal, in some circumstances. So if you were going to consider the matter at all, you'd have to devote a great deal of space to discussing in what circumstances what tech may or may not be legal. 69.87.204.121 15:06, 25 January 2007 (UTC)

I agree with your statement. The text that you are referring to was the result of an argument some time back about this book, when it was first created. Some people feared that this book would be used as a manual for "hacking", "cracking", or "pirating". While all those things can be legal or illegal in different situations, Wikibooks was trying not to be host to potential criminals. We added the warning about legality just as a way to keep people happy, and to prevent the book from being just a directory of how to break which protections, or how to crack/hack, etc. --Whiteknight (talk) (projects) 15:11, 25 January 2007 (UTC)

purposes of RE

The introduction should probably provide more information about possible application areas for reverse engineering efforts (e.g. troubleshooting, debugging, or understanding broken, unmaintained, or malicious software).

page delimiters

I am going to work on converting this wikibook over to the "forward slash" naming convention in the following days --Whiteknight TCE 19:13, 15 November 2005 (UTC)

I am curious: Why do you feel the need to change the page delimiters? If there is some reason one is really much better than the others, please tell the rest of us on page Wikibooks:Naming_policy#Page_delimiters. I don't have a preference one way or the other. (On the other hand, I do strongly feel that a "flat" page structure is better than a "hierarchical" one.) --DavidCary 22:28, 18 November 2005 (UTC)

I was definitely planning to use a "flat" structure for simplicity (although in other books I've started, I've used hierarchical schemes as well). I wanted to change the naming scheme here to use the "forward slash" notation because I think it looks better, and the backlinks are a nice thing. Also, people have mentioned that in future versions of MediaWiki, forward-slash pages will have other cool features available (although right now I can't remember which). Mostly though, I want to "be bold" and just do it, and nobody will complain about it if I do all the work myself. --Whiteknight TCE 12:34, 21 November 2005 (UTC)

The project is now done. I've removed the notice. --Whiteknight TCE 19:20, 28 November 2005 (UTC)

Note

I think the current form of this wikibook is pretty good. The book is divided into sections, and each section can contain discussion and a series of chapters. Chapters aren't numbered, because chapters in a section can often be read in any order, and new chapters can be added at the beginning or end of the list, if needed. The first 7 sections are, I think, fundamental, but we can (and probably should) add more sections at the end to cover more topics. --Whiteknight 17:56, 15 September 2005 (UTC)


Proposed Sections

I've proposed the following sections, to be added after a significant amount of headway has been made in the "Fundamentals" sections of the book:

Security: This section will deal with identifying malware, and with identifying and fixing security flaws such as buffer overruns.

Computer Networks: This section will deal with identifying and reversing network protocols, and other reversing topics dealing with networks.

Proprietary file formats: This section will deal with deciphering proprietary file formats, and writing clients to interact with 3rd party data files.

Anti-Reversing: Shows some techniques to foil fellow reversers, including checksums, symbol stripping, active anti-debugger methods, etc.

Disassembly Theory: This section talks about how disassemblers work, including linear sweep and recursive traversal disassemblers.

Decompilation Theory: Talks about decompilers in greater detail.

Cracking and patching: Perhaps too controversial a subject, this section will cover breaking encryption, extracting passwords, and conversely using better encryption and better password security. We will need to make sure information here is LEGAL, and all solutions are provided. --Whiteknight 18:11, 15 September 2005 (UTC)
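To make the "Disassembly Theory" item a bit more concrete, here is a rough sketch of the difference between a linear sweep and a recursive traversal. The four-opcode instruction set is invented purely for this illustration (it is not a real CPU), and the function names are my own; a real disassembler has to deal with far more cases.

```python
# Toy instruction set, invented for this example (NOT a real CPU):
#   0x01        NOP        1 byte
#   0x02 rel8   JMP rel8   2 bytes, unconditional, target = next + rel8
#   0x03 rel8   JZ  rel8   2 bytes, conditional (falls through or jumps)
#   0x04        RET        1 byte, ends a path
SIZES = {0x01: 1, 0x02: 2, 0x03: 2, 0x04: 1}
NAMES = {0x01: "NOP", 0x02: "JMP", 0x03: "JZ", 0x04: "RET"}


def decode(code, pc):
    """Decode one instruction at pc; return (name, size, jump_target or None)."""
    op = code[pc]
    if op not in SIZES or pc + SIZES[op] > len(code):
        return ("???", 1, None)          # unknown or truncated instruction
    target = None
    if op in (0x02, 0x03):
        rel = code[pc + 1]
        if rel >= 0x80:                  # sign-extend the 8-bit displacement
            rel -= 0x100
        target = pc + 2 + rel
    return (NAMES[op], SIZES[op], target)


def linear_sweep(code):
    """Decode every byte in order, whether or not it is ever executed."""
    out, pc = {}, 0
    while pc < len(code):
        name, size, _ = decode(code, pc)
        out[pc] = name
        pc += size
    return out


def recursive_traversal(code, entry=0):
    """Decode only the offsets reachable by following control flow."""
    out, work = {}, [entry]
    while work:
        pc = work.pop()
        while 0 <= pc < len(code) and pc not in out:
            name, size, target = decode(code, pc)
            out[pc] = name
            if name == "???":
                break
            if target is not None:
                work.append(target)      # follow the branch later
            if name in ("JMP", "RET"):
                break                    # no fall-through after these
            pc += size
    return out


# JMP over two data bytes (0x03 0xFF), then NOP and RET.
PROGRAM = bytes([0x02, 0x02, 0x03, 0xFF, 0x01, 0x04])
print(linear_sweep(PROGRAM))
print(recursive_traversal(PROGRAM))
```

In this sketch the linear sweep decodes the two embedded data bytes as a bogus JZ instruction, while the recursive traversal never visits them because no control-flow path leads there. The trade-off in the other direction is that a recursive traversal can miss code that is only reached through computed jumps.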

While admittedly a controversial topic, it does have relevance in the context of a reverse engineering book. I would simply recommend treating the topic with the same level of professionalism, competence and expertise as the rest of the book, while of course also making sure that no controversial content is added. For example, "cracking" and "patching" are not necessarily illegal in themselves. All the required knowledge could be illustrated using abstract concepts and self-written test cases, which are then patched or modified step by step.

There are also real-life scenarios where it simply makes sense to talk about "patching": lots of software has in the past been "patched" by its manufacturers themselves, just to "repair" it or add new features to it. Thus, patching certainly should be discussed here as well, given that most software companies distribute "patches" to EASILY UPDATE their product, LEGALLY. This could be accompanied by a discussion of the advantages and disadvantages of patches vs. complete software packages.

Likewise, this will also illustrate that there are different forms of "patching". You may patch software by simply replacing individual components (think of libraries, resources (bmp, wav, etc.), or scripts (vb) and other logic). And then there is of course patching itself, which refers to changing or augmenting the original files. But still, this may also refer to patching non-binary or at least non-executable files, e.g. replacing a section of a bitmap image with something new, or shortening an audio file. Likewise, GNU diff/patch is also about patching non-binary files, and can thus easily be used to illustrate the relevant concepts: taking two versions of a file, "diffing" them, and using the resulting diff as a "patch" that may be applied to versions of the original file.

The realms of "cracking" are only really entered when we are talking about modifying binary software that you do not hold the copyright for, in order to modify, circumvent or add certain functionality. Thus, as long as the wikibooks chapter about patching/cracking uses public domain test cases, where users patch or modify their own programs, there should be hardly any problems. After all, patching/cracking in general is really just about "persistent modifications" to files. So, if in doubt, just name it that: "Persistent Modifications". Personally, I feel such a topic could contribute a lot to this book, in particular because you can learn a lot about the underlying concepts: even writing the tools required to do a simple patch of an ASCII (or later on even binary) file could be part of the contents.
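As a rough idea of what such a self-written tool might look like, here is a minimal sketch of byte-level "diff" and "patch" for two same-length versions of your own data. The function names and the patch format (a list of offset, old byte, new byte) are assumptions made up for this example; this is not how GNU diff/patch represents changes.

```python
def make_patch(original: bytes, modified: bytes):
    """Return a list of (offset, old_byte, new_byte) for equal-length inputs."""
    if len(original) != len(modified):
        raise ValueError("this sketch only handles same-length files")
    return [
        (offset, old, new)
        for offset, (old, new) in enumerate(zip(original, modified))
        if old != new
    ]


def apply_patch(data: bytes, patch):
    """Apply a patch, refusing if the target does not hold the expected bytes."""
    out = bytearray(data)
    for offset, old, new in patch:
        if out[offset] != old:
            raise ValueError(f"unexpected byte at offset {offset}")
        out[offset] = new
    return bytes(out)


# Two versions of a self-written test case.
v1 = b"Hello, world!"
v2 = b"Hello, World?"

patch = make_patch(v1, v2)
print(patch)
print(apply_patch(v1, patch))
```

Checking the old byte before overwriting it is a deliberate choice: it makes the patch fail loudly when it is applied to the wrong version of a file, instead of silently corrupting it.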

Also, it's worth noting that reverse engineering, including "cracking" and "patching", is a technique that's also used to fight threats such as computer viruses, trojan horses or worms. In fact, many countries now offer forensics courses that specifically teach people how to do this sort of thing, to make sure that there are people who are able to fight such threats.

Reverse Engineering Web Applications: Because of the way most websites started out, in garages and then rapidly scaled up, they have virtually no documentation available. Secondly, due to continuous fixing, they are full of patches. Most are in need of reengineering, but because of the lack of documents and the unavailability of the original programmers, it is not possible to reengineer a live site directly. First, reverse engineering needs to be undertaken to document what is there, and then reengineering can be undertaken. I propose a chapter on Reverse Engineering Web Applications --RYK 19:49, 22 August 2006 (UTC)

I agree with this, it makes good sense. If you want to try and start a new section for it, please be my guest. --Whiteknight (talk) (projects) 20:17, 22 August 2006 (UTC)

OS distinctions

Often, one wants to reverse engineer something that is not for whatever OS or CPU the user may be running. For example, the CVS camcorder runs ThreadX on a MIPS CPU. This is unsuited to desktop use, so all reverse engineering will almost certainly be done on some other OS and CPU.

Suppose I run Linux on a Mac. What am I looking for? A so-called "Linux" tool is unlikely to handle the ThreadX/MIPS code. A so-called "ThreadX" tool, if it exists at all, is unlikely to run on Linux/PowerPC. I need a cross-disassembler and maybe a full-hardware emulator.

The same goes for "Windows" tools of course. Do the tools run on Windows to reverse engineer MacOS X binaries, or do they run on MacOS X to reverse engineer Windows binaries?

AlbertCahalan 02:02, 22 September 2005 (UTC)

I don't think there is any way that we can account for all possible combinations of hardware and software. What if we are running MacOSX on an XBOX? What if we are running Linux on our toaster? What if my cellphone has Windows XP? To reach a large enough audience, we have to cover only the most popular operating system/hardware configurations. We can safely assume that almost all Linux/Windows systems are on Intel-compatible machines, and we assume that MacOSX is running on Mac hardware. This isn't always the case, but we don't practically have the time, the energy, or even the expertise to cover every single possibility. --Whiteknight

It's not just that, running an OS on odd hardware. It's running one OS while trying to reverse engineer code written for something else. How else am I to reverse engineer 16-bit DSP code? I might (yeah, right) choose Windows for this work... am I then looking for Windows tools? Yes and no. The tools would run on Windows, but would not be intended to examine Windows code. AlbertCahalan 01:32, 23 September 2005 (UTC)
Note that Windows is not the most popular operating system. You have operating systems in many of your electronic devices: cell phone, calculator, DVD player, satellite TV decoder, game console, printer, wireless access point, cable modem, digital camera, TIVO, cable box, car... Look around your house at all the hackable devices, and you'll likely see that Windows is outnumbered. Also, x86 is not the most popular CPU. MIPS, ARM, and PowerPC are all common. Texas Instruments and Analog Devices make plenty of DSP chips. AlbertCahalan 01:32, 23 September 2005 (UTC)
That point notwithstanding, Windows/Intel is the most common PC combination (at least in the desktop market), and in the interests of reaching a large audience, we should focus on the more popular platforms. I am certainly interested in being inclusive and reaching a large audience, but how much space does this book (or any book, for that matter) spend on fringe cases? Certainly, my toaster and my stove might not be running Windows, or have "Intel inside", but I'm not going to hack them anyway. I think it's more important to teach the basics of "how to reverse in general" than it is to teach the specifics of each and every platform a person might come in contact with. That said, nowhere in the book do I state that the book will only have examples of Windows as the OS, x86 at the low level, and C as the high-level language; these are just the subjects that I personally am most comfortable with. If you have examples of other chip architectures, or other code that you would like to discuss specifically, write it up in the form of a broad "case study", and put it in chapter 7. Every section would certainly benefit from the inclusion of discussions on other architectures, although I would advocate marking which chapters are "essential" chapters, versus those that are merely a matter of interest. On a final note, I think it would be important to avoid getting too specific with certain brand names, and we should avoid listing too much proprietary disassembled code on here, for legal reasons. --Whiteknight 00:46, 26 September 2005 (UTC)

I believe the answer to all this is to give reference information at the bottom of each page under the "Further Reading" section. That way, anyone who wants to look into a more specific field within the broad topics this book tackles can find a link at the bottom. --Macpunk 06:25, 9 July 2007 (UTC)

gaining access

Reverse engineering starts with gaining access to the code. This needs a section of the book.

For example, the CVS camcorder hacking got really started when somebody physically moved a flash memory chip from a camcorder to a board that could connect to a computer. Later, people found ways that didn't require expert soldering skills.

JTAG busses are great. With just a few wires, you may be able to get debug access to the device. You might be able to single-step a chip, read the pin states, and read internal chip registers.

Sometimes people connect wires to a bus on the board. They may use an FPGA to follow the clocking, then spit the data out over an interface that is easier to connect to a computer. Behavior can be modified in this way. The original Xbox hack was something like this.

Interpreters that run user-provided bytecode (such as Java on a smartcard) can be broken by causing bit flips. You may do this with heat, electrical noise, or radiation.

AlbertCahalan 01:48, 23 September 2005 (UTC)

All these are certainly interesting subjects, although we need to be careful not to bite off more than we can chew. If we talk about FPGAs, we will need to talk about things like logic analyzers, which are specialized (and rather expensive) pieces of hardware. JTAG is an interesting subject as well, and a lot of commercial integrated circuit boards can be analyzed just by hooking up a JTAG. I am open to discussion on this matter (as if I really have a veto power anyway), but I think everybody would be better served if we kept this book "software-oriented", and started another book that was "hardware-oriented." --Whiteknight 00:51, 26 September 2005 (UTC)

On this note, I am thinking about starting a Wikibook specifically on "embedded systems" that I would like to encompass FPGAs and VLSI design, among other similar topics. We could cover JTAG and methods of reverse engineering existing embedded devices, and point further discussion of the material to this wikibook. I want to do a little more work on Reverse Engineering before I start up a new project, but I think it's a good next project to work on. --Whiteknight 02:26, 28 September 2005 (UTC)

I see you've already started the Embedded Systems Wikibook. Looks good. --DavidCary 22:28, 18 November 2005 (UTC)


Lots of this stuff is very interesting and important; however, wouldn't most of it be more relevant in the context of a wikibook about HARDWARE reverse engineering? That would definitely also be a cool topic, and one could start simple, using PICs for test cases.

Floating point

I realize that the current layout of the book ignores floating point numbers almost completely. I think, therefore, that we should add a chapter on floating point numbers in Section 5, as an advanced topic. We can then tie in exactly how floating point numbers are handled in function calls and whatnot. --Whiteknight 18:28, 29 September 2005 (UTC)
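As a possible starting point for such a chapter, here is a small sketch of how a 32-bit floating point constant found in a binary could be pulled apart into its sign, exponent and mantissa fields. It assumes the common IEEE 754 single-precision layout stored little-endian (as on x86); the function name is made up for this example.

```python
import struct


def describe_float32(raw: bytes):
    """Split 4 little-endian bytes into IEEE 754 single-precision fields."""
    (bits,) = struct.unpack("<I", raw)    # the same bytes as an unsigned int
    (value,) = struct.unpack("<f", raw)   # the same bytes as a float
    return {
        "value": value,
        "sign": bits >> 31,               # 1 bit
        "exponent": (bits >> 23) & 0xFF,  # 8 bits, biased by 127
        "mantissa": bits & 0x7FFFFF,      # 23 bits, implicit leading 1
    }


# The bytes 00 00 80 3F are how 1.0 appears in a little-endian hex dump.
print(describe_float32(bytes.fromhex("0000803f")))
```

Recognizing patterns like 00 00 80 3F in a hex dump is often the first hint that a block of data holds floating point constants rather than integers or code.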

Templates

This book uses the following templates:

And the following categories:

Contributors

I started the book, and have been working on it on and off in my spare time. Contributors welcome! --Whiteknight 02:59, 20 September 2005 (UTC)

I added a small but huge link ;) Neoscandal at gmail dot com --203.90.123.58 07:04, 20 October 2005 (UTC)

Fixed some minor spelling errors (when I was not registered). --D0gg 00:40, 16 January 2006 (UTC)

Added tools to linux disassemblers 0xf001 0xf001 at gmail dot com - Tue Mar 7 20:02:44 CET 2006

Fixed a few things in Windows, and added some stack protection details Hexed321 16:54, 14 July 2006 (UTC)

Do you mind me being an unofficial editor? Keep up the good work. --Dr Dnar 03:43, 25 July 2006 (UTC)

I added the Mac OS X Tools yesterday, and I plan on keeping the "PowerPC Mac" content updated as I learn more. --Macpunk 15:39, 8 January 2007 (UTC)

Title

As Reverse Engineering is far from being unique to software, the current title is misleading. Could it be changed to something like Reverse Engineering of Software? 194.126.226.253 18:51, 5 January 2007 (UTC)

Point taken. Unfortunately, the software people got here first, and it would be a gigantic task to rename this entire book, and to update all the links in this book to point to the new locations. We can include a prominent note on the cover though that this book is only about software, and not other subjects. --Whiteknight (talk) (projects) 22:43, 23 January 2007 (UTC)
Why do you want to change the title?
What is so wrong about a book on every kind of reverse engineering?
Yes, all the chapters written so far have been about software.
Yes, software is not the only area of reverse engineering.
You know about some other area of reverse engineering, right?
What is stopping you from helping this book live up to its title by adding a chapter on some other area of reverse engineering?
If we must limit this book to only software reverse engineering (why?), then that prominent note would be good. Perhaps it could include a link that said something like "There are many other kinds of reverse engineering, such as reverse engineering electronic hardware". --DavidCary 04:51, 3 July 2007 (UTC)

Because all reverse engineering is out of the scope of this book. A title change would be optimal, albeit difficult, but I think a cover note would suffice. Maybe in the future there could be a project that combines the best of both worlds: meatspace reversing, and software reversing. Until then, one can only intertwine them in his own ways. --Macpunk 06:31, 9 July 2007 (UTC)

Page links

I think it'd be very useful to have page links at the bottom of each page that jump you to the next page in the 'logical print' order.

It's very frustrating to have to jump up to the TOC just to navigate...

c1de0x 10:11, 8 January 2007 (UTC)

I agree with you, the page links would be a good idea. I'm going to update the template at the top of the page to include "forward" and "backward" links, eventually. I would also like to create a page-footer template with similar links. --Whiteknight (talk) (projects) 22:41, 23 January 2007 (UTC)

MIPS, SPARC, ARM, etc

I've been doing a lot of work recently with MIPS, SPARC, and ARM assembly. I would like to add information about these assembly languages to this book, so that it doesn't focus only on x86 anymore. A few questions:

  1. Where should we put this material? Should we append it to the x86 sections, or should we add a whole bunch of new pages?
  2. ARM is primarily used in embedded systems, which aren't the same as the desktop machines that we talk about in the rest of the book.
  3. I don't know anything about operating systems or software for systems built on other ISAs, just x86. Does anybody else know about these things?

--Whiteknight (talk) (projects) 22:39, 23 January 2007 (UTC)

Where to put it is gonna be a problem. Ideally, they should all have their own overviews, perhaps in the disassembly section. I figure this book should focus more on the engineering of a program, rather than teaching assembly, so it doesn't need to be too in-depth. In theory the rest of the book should focus on program structure, techniques and so forth, which won't require per-architecture explanations. However, it's probably best to keep example code in x86, seeing how that's the most popular assembly language by far. Hexed321 21:59, 7 February 2007 (UTC)

Possible Errata

In the section on the PE file format, it states that the thunk pointers point at "null-terminated arrays". However, based on an examination of some DLLs, particularly C:\windows\system\shell32.dll on an XP system, it appears that the arrays pointed at by the thunk pointers are not necessarily null-terminated, but are instead terminated by a value equal to or less than zero. This is most likely defined as "terminated by an invalid RVA" in an obscure specification somewhere. Most notably, on my system, the array of hint pointers for the imports from SHLWAPI.dll appeared to contain a valid RVA pointing to the entry for ColorHLSToRGB, and then a negative number (like so: B6AE1F00 CF000080). I am not an expert, but I did go through the hex of shell32.dll (in the process of debugging a PE file parser). Perhaps I am wrong, but doing this fixed my parser...

I can't even remember where I got that information from, and there isn't a reference, so it was likely an error on my part. Perhaps it was a faulty source, or perhaps it was an error while I was writing. Do you want to fix it or should I? --Whiteknight (talk) (projects) 22:50, 23 April 2007 (UTC)
I updated it. Thanks for the article, btw, it's helped a lot.
Are you certain about this? I happened to write that part, with the specification in mind, which says all such tables are to end with null values. I'm gonna take a further look anyway. Hexed321 18:07, 26 April 2007 (UTC)
Oh, if you have a source then that's something else entirely. It is worth researching. Unfortunately, as we all know, sometimes the documentation does differ from reality. I can't imagine why a compiler would use an arbitrary negative integer as a terminator, but my lack of imagination doesn't mean it can't happen. Let's check it out. --Whiteknight (talk) (projects) 18:09, 26 April 2007 (UTC)
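One possible explanation that may be worth checking against the specification: as far as I understand the PE32 format, a thunk value with its high bit set (IMAGE_ORDINAL_FLAG32, 0x80000000) marks an import by ordinal rather than an RVA to a hint/name entry, so it would look like a "negative number" when read as a signed integer without being a terminator. The sketch below walks a 32-bit thunk array under that assumption. The function name is made up, and the trailing zero entry in the sample is my own addition, since the dump quoted above only shows two values.

```python
import struct

# High bit set in a 32-bit thunk: import by ordinal (PE32 only; PE32+ uses bit 63).
IMAGE_ORDINAL_FLAG32 = 0x80000000


def parse_thunks32(raw: bytes):
    """Walk a PE32 thunk array until the all-zero terminating entry."""
    entries = []
    for (value,) in struct.iter_unpack("<I", raw):
        if value == 0:
            break                                       # null terminator
        if value & IMAGE_ORDINAL_FLAG32:
            entries.append(("ordinal", value & 0xFFFF))  # low 16 bits
        else:
            entries.append(("name_rva", value))          # RVA of hint/name entry
    return entries


# The two values quoted above, plus an ASSUMED zero terminator.
sample = bytes.fromhex("B6AE1F00" "CF000080" "00000000")
for kind, value in parse_thunks32(sample):
    print(kind, hex(value))
```

If this reading is right, the second value in the dump would be an import of ordinal 0xCF (207) from SHLWAPI.dll, and the array would still end with a null entry after it.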


Real Life RE cases of useful and good reversing efforts

Splitting This Book

I would like to start splitting this book up into a number of smaller books. This has many advantages, I think. First, by having multiple books, we avoid the ambiguity of saying simply "Reverse Engineering" when we actually intend to say something more specific. For instance, we could say "x86 Disassembly" for the majority of this book, because that's precisely what this book is about. We could also create separate books for some of the topics that are covered here briefly, such as the reverse engineering of computer networks, understanding and disassembling Java and .NET bytecodes, computer security and code vulnerabilities, etc. This book has grown large, and is now a victim of its own ambitions. By creating smaller, more focused books, we will have a higher-quality result. If we separate out the cruft and allow individual books to focus on their core competencies, I have high hopes that a focused book on the x86/C material could become a featured book here on Wikibooks. Other books would initially be in a poor, stub-like state, but they would be able to expand more freely and more naturally without having to fit into the framework that this book has established.

To that effect, I have created an outline for this project here: User:Whiteknight/Reverse Engineering. In this outline, you can see what books I intend to create, where the existing pages from this book are going to end up, and how things will look in the end. I welcome comments, questions and suggestions on this process. If there are no objections, I would like to start this project within a week. During this process, no material is going to be deleted; instead, I expect much of it to rapidly grow and expand beyond what it is now. --Whiteknight (Page) (Talk) 16:35, 2 January 2008 (UTC)

If you think a high-quality book that focuses on one thing can be extracted from this random collection of pages, go for it! --DavidCary (talk) 09:53, 4 January 2008 (UTC)

Alas, I'm not so hot on creating yet more stubby books with the remnants. I'm a big Big Buckets First fan. So while we might *eventually* build those other books you suggested, I would prefer to keep them all in one book for now.

While we're shuffling things around: May I loosen up the first page so it claims to be about all kinds of reverse engineering, not merely software engineering? --DavidCary (talk) 09:53, 4 January 2008 (UTC)

You are right, the stubby books from the proposal are basically designs for the future, and I am going to expand the outlines for them before I create those books. I want to create the "x86 Disassembly" book first, and the rest can wait until the plans are more mature. As to expanding the first page, you are more than welcome. However, once we start shuffling pages around and separating them out into books, all sorts of things are going to change. I'm planning to do this all early next week, unless there are any major complaints posted. --Whiteknight (Page) (Talk) 15:30, 4 January 2008 (UTC)