Reverse Engineering/Introduction

From Wikibooks, the open-content textbooks collection


Reverse Engineering, or "reversing," is a term that carries various connotations, many of which are negative. But Reverse Engineering does play a vital role in the legitimate process of software and hardware development.

All products (hardware and software) ship with two components: the product itself and the documentation. Most documentation, however, is poor and incomplete. Where, then, can an end user turn to get more information? It turns out that all the information you need to use a product exists not in the documentation, but directly in the product itself. All that's needed are the tools and skills to gather that information. Gathering such information from closed-source software is one form of reverse engineering.

A chain is only as strong as its weakest link. Software developers all depend on at least some pieces of externally prepared software: libraries, compilers, and operating systems. Almost all software is dependent on other software and ensuring the security and reliability of your own software often involves assurance as to the security and reliability of all your software's dependencies.

Reverse Engineering is the process of examining how software works, and drawing useful conclusions from that data. The "bad guys" can reverse engineer code to find bugs to exploit, so surely the "good guys" can reverse engineer code to find bugs that need fixing. Every day people are reverse engineering software components to gather information that the documentation leaves out. Every day people are deciphering proprietary file formats to maintain compatibility and lines of communication. Every day developers are examining their own code, or the code of others, to find and plug holes before they are exploited.

Common uses of reverse engineering include:

  • recovery of business data from proprietary file formats
  • creation of hardware documentation from binary drivers, often for producing Linux drivers from Windows or Macintosh drivers
  • enhancing consumer electronics devices
  • malware analysis
  • malware creation, often involving a search for security holes
  • discovery of undocumented APIs that may be useful
  • criminal investigation
  • copyright and patent litigation
  • breaking software copy protection (legally and not), often for games and expensive engineering software

Why have a Wikibook on Reverse Engineering?

The question invariably arises, "why bother?" Why should we write an entire book on this subject when half of it is considered taboo, and the rest can be called archaic? Some people will even claim that all of this book is taboo. Reverse engineering is associated with hackers, crackers, and various computer lawbreakers as a means to subvert security, steal secrets, and destroy data. However, in the context of security, reverse engineering gives the "good guys" an insight into what the "bad guys" are doing and how. In a limited sense, reverse engineering can also be used to debug software. Sometimes it isn't enough to read the source code of a given program; you have to examine the program in action. Reverse engineering gives us the tools to answer the tough questions:

  • What does this particular piece of software do?
  • Where can I find information on undocumented or unsupported software?
  • How do I know precisely how this software interacts with my system?
  • How do I determine if a piece of software is malicious or legitimate?
  • How strong is this piece of hardware?

In essence, reverse engineering provides the tools to not only understand software in lieu of documentation, but in fact understand software better than with the documentation alone.

What is Reverse Engineering?


What exactly is reverse engineering? In a general sense, reverse engineering is simply an effort to recreate the design of a product by examining the product itself. Reverse engineering is the process of asking "how did they do that?" and then trying to do it yourself. In terms of software, however, reverse engineering involves examining what a piece of software does, and how it does it.

This book covers many diverse topics. It starts with a discussion of common "reverse engineering tools" such as disassemblers, decompilers, and debuggers. It moves along to discuss low-level details of common system architectures and file formats. It then proceeds to talk about details of the compilation process, and how high-level code becomes low-level instructions.

Unfortunately, the title "Disassemblers, Debuggers, System Architectures, File Formats, Compilers and Low-level code Generation" doesn't roll off the tongue as well as "Reverse Engineering" does.

Ethics

Many people immediately ask themselves, "Isn't reverse engineering something that crackers and criminals do?" and to some degree the answer is yes. Crackers and virus writers all examine existing code to find weaknesses to exploit. However, this is not the only use of reverse engineering tools. For instance, a VCR may be used to illegally duplicate copyrighted movies, but it may also be used to play back precious home videos. Similarly, reverse engineering tools are a double-edged sword: they may certainly be used to hurt, but they can much more often be used to help.

Law

Certain applications of reverse engineering are illegal in certain countries around the world. This book would like to be as informative as possible without landing in legal hot water. As such, examples that may be illegal to test should be avoided. This book is written only in the interests of free information, and is not intended as a guide for criminal activity.

Computer software is often subject to patent and copyright considerations. However, certain aspects of an application, such as algorithms, are frequently not covered by copyright. Also, many programs provide inadequate documentation to explain certain features, and in that situation reverse engineering to gain more information is often legal.

For specific issues that may or may not be illegal in your area, consult your lawyer before attempting anything.

What Will This Book Cover?

This book is going to cover many topics.

  • tools and techniques for reverse engineering compiled, machine-language code on Intel-compatible machines.
  • topics related to reverse-engineering byte-code programs, including those created with C# (and the .NET platform in general) and Java.

These first chapters form the "Fundamentals" category.

  • "Advanced Topics" chapters discuss topics that are not inter-related, and are not necessarily related (except in principle) to the fundamentals chapters. These advanced topics are intended for readers who want to push the limits of what is possible with reverse engineering techniques. The list of topics to be covered in the advanced section is tentative (pending more contributions in those areas). Topics to be discussed in the advanced section will range from reverse engineering over a network, to security, and cracking and patching.

A good reverse engineer needs substantial skill in reading software (if not writing it), as well as solid problem-solving skills. Reverse engineers also need to be able to research effectively, because reverse engineering can raise many questions. This book will not teach how to program, how to program well, how to engineer, how to study, or how to research. It is expected that readers will have those skills beforehand.

At the end of the table of contents is an "Examples" section that will contain problems and case studies.

Notes on the Text

Most sections (if not every section and chapter) will contain multiple examples on the subject matter. Reverse engineering is best taught by doing, and so many examples of code (especially uncommented code) will be presented and examined. Larger "case studies" will usually be separated out into their own chapters, included in either the respective section, or in the examples section at the end.

This book is separated into a number of section headings, and each section is divided into sub-pages. The first sub-page or two under each section generally will discuss the goals and contents of the section in more detail.

In terms of computer science, this book represents a large foundation of knowledge in computer science practice, as opposed to many other tomes of computer science theory. This book draws on many diverse topics, many of which are very advanced, and it is therefore recommended that readers come in with at least some knowledge of computer science before attempting this wikibook. Successful readers will generally know at least one high-level language (C or C++ recommended) and at least one assembly language (x86 assembly is recommended).

Some materials in this book are mainly based on the x86 machine architecture, but the concepts discussed also apply in some fashion to other machine architectures as well.

Suggested Reading Order

Because of the sheer magnitude of information that may eventually be included in this wikibook, and the varying subjects that may be covered, it is not recommended that the book be read cover to cover. Here, we describe the sections present in the table of contents and what each one covers. With this information in hand, the reader will be able to make an informed decision about which chapters to read, which chapters to skip, and what order to read things in.

Section 1 is important for everybody to read because it covers tools specific to reverse engineering that most people will have no familiarity with whatsoever. Also, this section will provide a brief overview of program construction tools, such as assemblers and compilers. It is important to familiarize yourself with all these tools before proceeding.

Section 2 covers operating system specifics and may be skipped by readers with a solid understanding of their system. Chapters in this section may be selected based on individual systems: a Windows user may ignore the chapters on Linux, and vice versa. Section 2 will also cover some executable file formats, specifically PE and ELF, and other operating system topics such as dynamically linked libraries.
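
As a small taste of the file-format material, the following is a minimal sketch (not taken from the later chapters) of telling the two executable formats apart by their leading "magic" bytes. It relies on two well-known facts: ELF files begin with the bytes 0x7F 'E' 'L' 'F', and PE files begin with a DOS header ("MZ") whose field at offset 0x3C points to a "PE\0\0" signature. The function name identify_executable and the hand-built sample buffers are illustrative assumptions, not part of any real tool.

```python
import struct

def identify_executable(data: bytes) -> str:
    """Guess an executable format from its magic numbers (illustrative helper)."""
    # ELF files begin with the four bytes 0x7F 'E' 'L' 'F'.
    if data[:4] == b"\x7fELF":
        return "ELF"
    # PE files begin with a DOS header ("MZ"); the 4-byte little-endian
    # value at offset 0x3C gives the file offset of the "PE\0\0" signature.
    if data[:2] == b"MZ" and len(data) >= 0x40:
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "PE"
    return "unknown"

# Hand-built sample buffers (not real programs), just enough to exercise the checks.
elf_sample = b"\x7fELF" + bytes(12)

pe_buffer = bytearray(0x48)
pe_buffer[0:2] = b"MZ"
struct.pack_into("<I", pe_buffer, 0x3C, 0x40)  # signature lives at offset 0x40
pe_buffer[0x40:0x44] = b"PE\x00\x00"
pe_sample = bytes(pe_buffer)

print(identify_executable(elf_sample))
print(identify_executable(pe_sample))
print(identify_executable(b"hello"))
```

To try this on a real file, read its first bytes with open(path, "rb") and pass them to the function. A real loader checks far more than the signature, so treat the result as a first guess only.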

Section 3 covers some of the specifics of the compilation process, and discusses how high-level code is converted into machine code. Here, we will discuss how structures in the machine code can be converted back into a high-level representation. Few people who are familiar with compilation and software development will be privy to all this information, and therefore this section should be read by everybody. Section 3 will begin to examine raw assembly code, and show how various data structures and control structures from high-level languages are implemented in assembly.

Section 4 will cover advanced topics in reading and interpreting disassembled code. It is therefore an extension of section 3, and it should not be read before section 3. Section 4 will cover such topics as the use of floating point numbers in program control and data structures, code optimization, interleaving, and other issues raised by modern compilers.

Section 5 is based on bytecode, specifically Java and .NET. People interested only in bytecode can skip sections 3 and 4, and people not interested in bytecode reversing can likewise skip section 5. People new to software concepts may wish to skip section 5 at first. This section may also introduce the .NET intermediate language (CIL), and will show how .NET source code (primarily C# and VB.NET) is converted into it.

Section 6 discusses reverse engineering in the context of computer networks. Here, we will examine computer communication protocols (IP, TCP, etc), and packet sniffers. We will look at the reverse engineering of communications protocols, and programs such as AIM and Windows Networking. Information discussed in this section will not be used throughout the rest of the book. Therefore readers who are not interested in this subject may skip this section. Since section 6 does not depend on source code, reading section 6 does not require reading any previous sections or chapters.

Section 7 begins the discussion of computer security issues. We will discuss some common problems such as stack and heap overflows, how to avoid such overflows, and how to detect malware. Chapters in section 7 depend on the x86 stack architecture, and therefore reading sections 1, 3, and 4 is recommended, if not required. Section 7 can be skipped, but because of the relevance of the topic in contemporary society, readers are encouraged to read it.

Section 8 will discuss the reverse engineering of proprietary file formats. We will examine proprietary files and the software associated with them to determine how information is stored in those files. Once it is discovered how a file type is formatted, we can create programs to interact with proprietary files. Section 8 relies on information discussed in Sections 3 and 4.
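
To illustrate the kind of work involved, here is a minimal sketch of the two usual first steps: viewing the raw bytes of a file, then testing a guess about its layout. The "DEMO" format below is entirely made up for this example (a 4-byte magic string, a 2-byte version, a 4-byte payload length, then the payload, all little-endian), and the helper name hexdump is our own.

```python
import struct

def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset, hex, and printable-ASCII columns."""
    pad = width * 3 - 1  # width of a full hex column, including spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(pad)}  {text_part}")
    return "\n".join(lines)

# A record in the hypothetical "DEMO" format: magic, version, length, payload.
sample = b"DEMO" + struct.pack("<HI", 2, 5) + b"hello"

# Step 1: look at the raw bytes and form a guess about the layout.
print(hexdump(sample))

# Step 2: test the guess by decoding the header fields.
magic, version, length = struct.unpack_from("<4sHI", sample, 0)
header_size = struct.calcsize("<4sHI")
payload = sample[header_size:header_size + length]
print(magic, version, length, payload)
```

With a real proprietary file the layout is unknown, so step 2 is repeated with different guesses (field sizes, byte order, offsets) until the decoded values make sense.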

Section 9 is going to discuss programming techniques that can help prevent reverse engineering. This section will briefly touch on such subjects as code obfuscation, opaque predicates, and code encryption. This section requires knowledge of sections 3 and 4.
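
As a brief illustration of one of these ideas, an opaque predicate is a condition whose outcome is known to the author but is not obvious to someone reading the compiled code. The sketch below uses the fact that the product of two consecutive integers is always even. The function names are invented for this example. Real obfuscators insert such predicates into machine code or bytecode, not Python source.

```python
def opaque_true(x: int) -> bool:
    """Always True: x and x + 1 are consecutive, so their product is even."""
    return (x * (x + 1)) % 2 == 0

def checked_sum(values, x: int) -> int:
    """Sum the values, wrapped in an opaque predicate (illustrative only)."""
    if opaque_true(x):
        # The real logic: this branch is always taken.
        return sum(values)
    else:
        # Decoy branch: never executed, but an analyst reading the
        # compiled code must prove that before ignoring it.
        return -1

print(checked_sum([1, 2, 3], 12345))
print(all(opaque_true(x) for x in range(-1000, 1000)))
```

The decoy branch costs nothing at run time, yet it adds a fake path to the control-flow graph that a reverse engineer has to rule out.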

Section ... will discuss reverse engineering using JTAG.

Further sections are planned for this book, but currently have not been implemented.

Development Stages

Here is a brief explanation of the development stages used in this book:

0% developed: The page either doesn't exist, or does exist but doesn't yet have any information. This page is most likely a stub for a section that should be included in the book, and will be written eventually.
25% developed: The page has a basic framework. Important headings are listed, but they all might be stubs. May have some information, but should be considered to be in a very early stage of development.
50% developed: The page has a solid amount of information under most of its headings. Doesn't yet have all sections filled in. Probably doesn't have many examples, diagrams, or references.
75% developed: The page has nearly all the information it should have, and has several examples, diagrams, and/or references.
100% developed: The page has all the information it should have, in addition to several good examples of the material, diagrams and important figures, and cross-references to other sources of information.

The associated dates should reflect when the status of a page was last updated, or the last time a page received a major edit.

Notes for contributors

This section contains some self-referential information, and information that only pertains to the version of this text found at http://en.wikibooks.org/wiki/Reverse_Engineering.

  1. Reverse engineering is a discipline that requires lots of knowledge and references from many subjects in the field of computer science. Cross-references to other wikibooks, outside sources, and Wikipedia articles are highly encouraged. Background information is important, but it is more important not to lose the focus of the book by swamping the reader with too much divergent information. It is, after all, expected that the reader has a certain foundation of computer knowledge. The scope of this book simply excludes those who are new to computer science.
  2. Every chapter should contain "further reading" sections. Examples with fully-worked answers should be provided whenever possible. Reverse engineering is best learned by doing.
  3. Lists of software tools should be limited to the most popular tools in any given subject area, although less-common tools may be linked to in the "Further Reading" sections of each chapter. Everybody has a personal preference, and if this book took the time to enumerate all of them, many users would lose interest in reading long before any of the actual "material" is discussed.
  4. Sections will frequently contain links to Wikipedia articles, or other Wikimedia resources, when available. Many related articles however are stubs and should probably be expanded to be of more use to this wikibook.
  5. The topic of this wikibook is implicitly "legal reverse engineering," and that should be taken into account when new material or new examples are added.
  6. This book has a few categories and templates associated with it, that should be used correctly. Here is a quick list of the resources available:

Further Reading