Introduction to Software Engineering/Reengineering/Reverse Engineering

Reverse engineering is the process of discovering the technological principles of a human-made device, object or system through analysis of its structure, function and operation. It often involves taking something (e.g., a mechanical device, electronic component, or software program) apart and analyzing its workings in detail, either to support maintenance or to make a new device or program that does the same thing without using or simply duplicating (without understanding) any part of the original.

Reverse engineering has its origins in the analysis of hardware for commercial or military advantage.^[1] The purpose is to deduce design decisions from end products with little or no additional knowledge about the procedures involved in the original production. The same techniques have since been researched for application to legacy software systems, not for industrial or defence ends, but rather to replace incorrect, incomplete, or otherwise unavailable documentation.^[2]

Motivation

Reasons for reverse engineering:

Interoperability.
Lost documentation: reverse engineering is often done because the documentation of a particular device has been lost (or was never written), and the person who built it is no longer available. Integrated circuits often seem to have been designed on obsolete, proprietary systems, which means that the only way to incorporate the functionality into new technology is to reverse-engineer the existing chip and then re-design it.
Product analysis: to examine how a product works and what components it consists of, to estimate costs, and to identify potential patent infringement.
Digital update/correction: to update the digital version (e.g., a CAD model) of an object to match an "as-built" condition.
Security auditing.
Acquiring sensitive data by disassembling and analysing the design of a system component.^[3]
Military or commercial espionage: learning about an enemy's or competitor's latest research by stealing or capturing a prototype and dismantling it.
Removal of copy protection, circumvention of access restrictions.
Creation of unlicensed/unapproved duplicates.
Materials harvesting, sorting, or scrapping.^[4]
Academic/learning purposes.
Curiosity.
Competitive technical intelligence: understanding what a competitor is actually doing versus what they say they are doing.
Learning from others' mistakes: avoiding mistakes that others have already made and subsequently corrected.

Reverse engineering of machines

As computer-aided design (CAD) has become more popular, reverse engineering has become a viable method to create a 3D virtual model of an existing physical part for use in 3D CAD, CAM, CAE or other software.^[5] The reverse-engineering process involves measuring an object and then reconstructing it as a 3D model. The physical object can be measured using 3D scanning technologies like CMMs, laser scanners, structured light digitizers or Industrial CT Scanning (computed tomography). The measured data alone, usually represented as a point cloud, lacks topological information and is therefore often processed and modeled into a more usable format such as a triangular-faced mesh, a set of NURBS surfaces or a CAD model.
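The step from a point cloud to a mesh can be illustrated with a minimal sketch. It assumes, purely for illustration, that the scanned points lie on a regular grid stored in row-major order; real scans are usually unorganized and need dedicated surface-reconstruction algorithms. The function name `grid_to_triangles` and the sample data are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
Triangle = Tuple[int, int, int]  # indices into the point list


def grid_to_triangles(rows: int, cols: int) -> List[Triangle]:
    """Connect a rows x cols grid of points (row-major order) into triangles."""
    triangles: List[Triangle] = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            i = r * cols + c      # top-left corner of the current grid cell
            right = i + 1
            below = i + cols
            diag = below + 1
            # Split each quadrilateral cell into two triangles.
            triangles.append((i, right, below))
            triangles.append((right, diag, below))
    return triangles


# Hypothetical 3 x 3 scan of a gently sloped surface (z rises with x).
points: List[Point] = [(float(x), float(y), 0.1 * x)
                       for y in range(3) for x in range(3)]
mesh = grid_to_triangles(3, 3)
print(len(points), "points ->", len(mesh), "triangles")
```

The point list alone says nothing about which points are neighbours; the triangle index list is the topological information that the mesh adds.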

Reverse engineering is also used by businesses to bring existing physical geometry into digital product development environments, to make a digital 3D record of their own products or to assess competitors' products. It is used, for instance, to analyse how a product works, what it does and what components it consists of, to estimate costs, and to identify potential patent infringement.

Value engineering is a related activity also used by businesses. It involves deconstructing and analysing products, but the objective is to find opportunities for cost cutting.

Reverse engineering of software

The term reverse engineering as applied to software means different things to different people, prompting Chikofsky and Cross to write a paper researching the various uses and defining a taxonomy. In that paper they state, "Reverse engineering is the process of analyzing a subject system to create representations of the system at a higher level of abstraction."^[6] It can also be seen as "going backwards through the development cycle".^[7] In this model, the output of the implementation phase (in source code form) is reverse-engineered back to the analysis phase, in an inversion of the traditional waterfall model. Reverse engineering is a process of examination only: the software system under consideration is not modified (which would make it re-engineering). Software anti-tamper technology is used to deter both reverse engineering and re-engineering of proprietary software and software-powered systems.

In practice, two main types of reverse engineering emerge. In the first case, source code is already available for the software, but higher-level aspects of the program, which may be poorly documented or whose documentation is no longer valid, are discovered. In the second case, there is no source code available for the software, and any efforts towards discovering one possible source code for the software are regarded as reverse engineering. This second usage of the term is the one most people are familiar with. Reverse engineering of software can make use of the clean room design technique to avoid copyright infringement.

On a related note, black box testing in software engineering has a lot in common with reverse engineering. The tester usually has the API, but their goals are to find bugs and undocumented features by bashing the product from outside.

Other purposes of reverse engineering include security auditing, removal of copy protection ("cracking"), circumvention of access restrictions often present in consumer electronics, customization of embedded systems (such as engine management systems), in-house repairs or retrofits, enabling of additional features on low-cost "crippled" hardware (such as some graphics card chip-sets), or even mere satisfaction of curiosity.

The Certified Reverse Engineering Analyst (CREA) is a certification provided by the IACRB that certifies candidates are proficient in reverse engineering software.

Binary software

Reverse engineering of software that is available only in binary form is sometimes termed Reverse Code Engineering, or RCE.^[8] As an example, decompilation of binaries for the Java platform can be accomplished using Jad. One famous case of reverse engineering was the first non-IBM implementation of the PC BIOS, which launched the historic IBM PC compatible industry that has been the overwhelmingly dominant computer hardware platform for many years. An example of a group that reverse-engineers software for enjoyment (and to distribute registration cracks) is CORE, which stands for "Challenge Of Reverse Engineering". In the U.S., reverse engineering of software can be protected by the fair use exception in copyright law.^[9]

The Samba software, which allows systems that are not running Microsoft Windows to share files with systems that are, is a classic example of software reverse engineering,^[10] since the Samba project had to reverse-engineer unpublished information about how Windows file sharing worked so that non-Windows computers could emulate it. The Wine project does the same thing for the Windows API, and OpenOffice.org is one party doing this for the Microsoft Office file formats. The ReactOS project is even more ambitious in its goals: it strives to provide binary (ABI and API) compatibility with the current Windows OSes of the NT branch, allowing software and drivers written for Windows to run on a clean-room reverse-engineered, GPL-licensed open-source counterpart.

Binary software techniques

Reverse engineering of software can be accomplished by various methods. The three main groups of software reverse engineering are:

Analysis through observation of information exchange, most prevalent in protocol reverse engineering. This involves using tools such as bus analyzers and packet sniffers to tap a computer bus or network connection and reveal the traffic on it. Bus or network behavior can then be analyzed to produce a stand-alone implementation that mimics that behavior. This is especially useful for reverse engineering device drivers. Sometimes, reverse engineering on embedded systems is greatly assisted by tools deliberately introduced by the manufacturer, such as JTAG ports or other debugging means. In Microsoft Windows, low-level debuggers such as SoftICE are popular.
Disassembly using a disassembler, meaning the raw machine language of the program is read and understood in its own terms, with only the aid of machine-language mnemonics. This works on any computer program but can take quite some time, especially for someone not used to machine code. The Interactive Disassembler is a particularly popular tool.
Decompilation using a decompiler, a process that tries, with varying results, to recreate the source code in some high-level language for a program only available in machine code or bytecode.
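As a small illustration of disassembly, the sketch below uses Python's standard `dis` module to list the bytecode mnemonics of a function. Python bytecode stands in here for machine code, and the `checksum` routine is a hypothetical stand-in for the program under analysis. Opcode names differ between Python versions, so no expected listing is shown.

```python
import dis


def checksum(data: bytes) -> int:
    """Hypothetical routine standing in for code under analysis."""
    total = 0
    for byte in data:
        total = (total + byte) % 256
    return total


def list_opcodes(func) -> list:
    """Return the mnemonic of every bytecode instruction in func."""
    return [ins.opname for ins in dis.get_instructions(func)]


opnames = list_opcodes(checksum)

# Print one line per instruction: byte offset, mnemonic, readable argument.
for ins in dis.get_instructions(checksum):
    print(ins.offset, ins.opname, ins.argrepr)
```

Reading such a listing is the "understood in its own terms" step: the analyst has to recognise the loop and the modular addition from the mnemonics alone, whereas a decompiler would try to rebuild the `for` statement itself.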

Source code

A number of UML tools refer to the process of importing and analysing source code to generate UML diagrams as "reverse engineering". See List of UML tools.
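The kind of analysis such tools perform can be sketched with Python's standard `ast` module: parse the source, then collect each class with its base classes and methods, which is the raw material of a class diagram. This is a simplified illustration under stated assumptions, not how any particular UML tool works. The sample source and the `extract_classes` helper are hypothetical, and `ast.unparse` requires Python 3.9 or later.

```python
import ast

# Hypothetical source code to be analysed.
SOURCE = '''
class Shape:
    def area(self): ...

class Circle(Shape):
    def __init__(self, r): self.r = r
    def area(self): return 3.14159 * self.r ** 2
'''


def extract_classes(source: str) -> dict:
    """Map each class name to its base classes and method names."""
    model = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            model[node.name] = {
                "bases": [ast.unparse(b) for b in node.bases],
                "methods": [n.name for n in node.body
                            if isinstance(n, ast.FunctionDef)],
            }
    return model


model = extract_classes(SOURCE)
for name, info in model.items():
    print(name, "inherits", info["bases"], "methods", info["methods"])
```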

Reverse engineering of protocols

Protocols are sets of rules that describe message formats and how messages are exchanged (i.e., the protocol state-machine). Accordingly, the problem of protocol reverse-engineering can be partitioned into two subproblems: message format reverse-engineering and state-machine reverse-engineering.

The message formats have traditionally been reverse-engineered through a tedious manual process, which involved analysis of how protocol implementations process messages, but recent research has proposed a number of automatic solutions.^[11]^[12]^[13] Typically, these automatic approaches either group observed messages into clusters using various clustering analyses, or emulate the protocol implementation and trace how it processes messages.
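A very rough sketch of the trace-based idea follows. It is not the method of any of the cited systems; it only shows the flavour of the analysis on hypothetical captured messages. Byte offsets that never change across messages are labelled constant (likely magic numbers or fixed headers), and messages are then clustered by the value at one chosen offset, assumed here to be a message-type field.

```python
from collections import defaultdict
from typing import Dict, List


def classify_offsets(messages: List[bytes]) -> List[str]:
    """Label each byte offset 'const' or 'var' across the messages."""
    length = min(len(m) for m in messages)
    labels = []
    for offset in range(length):
        values = {m[offset] for m in messages}
        labels.append("const" if len(values) == 1 else "var")
    return labels


def cluster_by_offset(messages: List[bytes], offset: int) -> Dict[int, List[bytes]]:
    """Group messages by the byte value found at the given offset."""
    clusters: Dict[int, List[bytes]] = defaultdict(list)
    for m in messages:
        clusters[m[offset]].append(m)
    return dict(clusters)


# Hypothetical capture: 2-byte magic, 1-byte type, 1-byte payload.
captured = [
    bytes.fromhex("cafe0110"),
    bytes.fromhex("cafe0122"),
    bytes.fromhex("cafe0237"),
]
labels = classify_offsets(captured)
clusters = cluster_by_offset(captured, 2)
print(labels)                                      # ['const', 'const', 'var', 'var']
print({k: len(v) for k, v in clusters.items()})    # {1: 2, 2: 1}
```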

There has been less work on reverse-engineering of state-machines of protocols. In general, the protocol state-machines can be learned either through a process of offline learning, which passively observes communication and attempts to build the most general state-machine accepting all observed sequences of messages, or through online learning, which allows interactive generation of probing sequences of messages and listening to responses to those probing sequences. In general, offline learning of small state-machines is known to be NP-complete,^[14] while online learning can be done in polynomial time.^[15] An automatic offline approach has been demonstrated by Comparetti et al.^[13] and an online approach by Cho et al.^[16]
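A common starting point for offline (passive) learning is a prefix-tree acceptor: a state machine that accepts exactly the observed message sequences. The sketch below builds one from hypothetical traces. The state-merging step that generalizes the tree into a smaller machine is deliberately omitted, and this is not the algorithm of the cited papers.

```python
from typing import Dict, Sequence, Set, Tuple

Transitions = Dict[Tuple[int, str], int]


def build_prefix_tree(traces: Sequence[Sequence[str]]) -> Tuple[Transitions, Set[int]]:
    """Build a prefix-tree acceptor; state 0 is the start state."""
    transitions: Transitions = {}
    accepting: Set[int] = set()
    next_state = 1
    for trace in traces:
        state = 0
        for message in trace:
            key = (state, message)
            if key not in transitions:
                # First time this message is seen from this state: new branch.
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)  # the end of an observed session
    return transitions, accepting


def accepts(transitions: Transitions, accepting: Set[int],
            trace: Sequence[str]) -> bool:
    """Check whether the acceptor accepts the given message sequence."""
    state = 0
    for message in trace:
        key = (state, message)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in accepting


# Hypothetical observed sessions of an unknown protocol.
observed = [
    ["HELLO", "AUTH", "DATA", "BYE"],
    ["HELLO", "AUTH", "BYE"],
]
transitions, accepting = build_prefix_tree(observed)
print(accepts(transitions, accepting, ["HELLO", "AUTH", "BYE"]))  # True
print(accepts(transitions, accepting, ["HELLO", "BYE"]))          # False
```

The tree rejects every sequence it has not seen, so it is too specific to be useful on its own; choosing which states to merge is where the hardness of the offline problem lies.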

Other components of typical protocols, like encryption and hash functions, can be reverse-engineered automatically as well. Typically, the automatic approaches trace the execution of protocol implementations and try to detect buffers in memory holding unencrypted packets ^[17].

Reverse engineering of integrated circuits/smart cards

Reverse engineering is an invasive and destructive form of analyzing a smart card. The attacker grinds away the smart card layer by layer and takes pictures with an electron microscope. With this technique, it is possible to reveal the complete hardware and software of the smart card. The major problem for the attacker is to bring everything into the right order to find out how everything works. Engineers try to hide keys and operations by mixing up memory positions, for example through bus scrambling.^[18]^[19] In some cases, it is even possible to attach a probe to measure voltages while the smart card is still operational. Engineers employ sensors to detect and prevent this attack.^[20] This attack is not very common because it requires a large investment in effort and special equipment that is generally only available to large chip manufacturers. Furthermore, the payoff from this attack is low since other security techniques, such as shadow accounts, are often employed.

Reverse engineering for military applications

Reverse engineering is often used by militaries to copy other nations' technologies, devices or information that have been obtained by regular troops in the field or by intelligence operations. It was often used during the Second World War and the Cold War. Well-known examples from WWII and later include:

Jerry can: British and American forces noticed that the Germans had gasoline cans with an excellent design. They reverse-engineered copies of those cans. The cans were popularly known as "Jerry cans".
Tupolev Tu-4: Three American B-29 bombers on missions over Japan were forced to land in the USSR. The Soviets, who did not have a similar strategic bomber, decided to copy the B-29. Within a few years, they had developed the Tu-4, a near-perfect copy.
V2 Rocket: Technical documents for the V2 and related technologies were captured by the Western Allies at the end of the war. Soviet and captured German engineers had to reproduce technical documents and plans, working from captured hardware, in order to make their clone of the rocket, the R-1, which began the postwar Soviet rocket program that led to the R-7 and the beginning of the space race.
Vympel K-13/R-3S missile (NATO reporting name AA-2 Atoll): a Soviet reverse-engineered copy of the AIM-9 Sidewinder, made possible after a Taiwanese AIM-9B hit a Chinese MiG-17 without exploding. The missile became lodged within the airframe, and the pilot returned to base with what Russian scientists would describe as a university course in missile development.
BGM-71 TOW Missile: In May 1975, negotiations between Iran and Hughes Missile Systems on co-production of the TOW and Maverick missiles stalled over disagreements in the pricing structure, and the subsequent 1979 revolution ended all plans for such co-production. Iran was later successful in reverse-engineering the missile and currently produces its own copy, the Toophan.
China has reverse-engineered many examples of Western and Russian hardware, from fighter aircraft to missiles and HMMWV vehicles.

Legality

In the United States, even if an artifact or process is protected by trade secrets, reverse-engineering the artifact or process is often lawful as long as it is obtained legitimately.^[21] Patents, on the other hand, require public disclosure of an invention, and therefore patented items do not necessarily have to be reverse-engineered to be studied. (However, an item produced under one or more patents could also include other technology that is not patented and not disclosed.) One common motivation of reverse engineers is to determine whether a competitor's product contains patent or copyright infringements.

The reverse engineering of software in the US is often prohibited by contract, because most end-user license agreements (EULAs) forbid it and courts have found such contractual prohibitions to override the copyright law; see Bowers v. Baystate Technologies.^[22]^[23] Article 6 of the 1991 EU Computer Programs Directive allows reverse engineering for the purposes of interoperability, but prohibits it for the purposes of creating a competing product, and also prohibits the public release of information obtained through reverse engineering of software.^[24]^[25]^[26]

References

[1] Chikofsky, E. J.; Cross, J. H., II (1990). "Reverse Engineering and Design Recovery: A Taxonomy". IEEE Software. 7 (1): 13–17. doi:10.1109/52.43044.
[2] A Survey of Reverse Engineering and Program Comprehension. Michael L. Nelson, April 19, 1996, ODU CS 551 - Software Engineering Survey. Furthermore, the reverse engineering concept is used to modify or change premade .dll files in an operating system.
[3] Internet Engineering Task Force RFC 2828 Internet Security Glossary
[4] http://scrappingmetal.blogspot.com/2010/10/reverse-engineering.html
[5] T. Varady, R. R. Martin, J. Cox, Reverse Engineering of Geometric Models—An Introduction, Computer Aided Design 29 (4), 255-268, 1997.
[6] Chikofsky, E. J.; Cross, J. H., II (1990). "Reverse Engineering and Design Recovery: A Taxonomy". IEEE Software. 7 (1): 13–17.
[7] Warden, R. (1992). Software Reuse and Reverse Engineering in Practice. London, England: Chapman & Hall. pp. 283–305.
[8] Chuvakin, Anton (2004). Security Warrior (1st ed.). O'Reilly.
[9] See Samuelson, Pamela; Scotchmer, Suzanne (2002). "The Law and Economics of Reverse Engineering". Yale Law Journal. 111 (7): 1575–1663. doi:10.2307/797533. JSTOR 797533.
[10] "Samba: An Introduction". 2001-11-27. Retrieved 2009-05-07.
[11] W. Cui, J. Kannan, and H. J. Wang. Discoverer: Automatic protocol reverse engineering from network traces. In Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium, pages 1-14.
[12] W. Cui, M. Peinado, K. Chen, H. J. Wang, and L. Irún-Briz. Tupni: Automatic reverse engineering of input formats. In Proceedings of the 15th ACM Conference on Computer and Communications Security, pages 391-402. ACM, Oct 2008.
[13] P. M. Comparetti, G. Wondracek, C. Kruegel, and E. Kirda. Prospex: Protocol specification extraction. In Proceedings of the 2009 30th IEEE Symposium on Security and Privacy, pages 110-125, Washington, 2009. IEEE Computer Society.
[14] E. M. Gold. Complexity of automaton identification from given data. Information and Control, 37(3):302-320, 1978.
[15] D. Angluin. Learning regular sets from queries and counterexamples. Information and Computation, 75(2):87-106, 1987.
[16] C.Y. Cho, D. Babic, R. Shin, and D. Song. Inference and Analysis of Formal Models of Botnet Command and Control Protocols, 2010 ACM Conference on Computer and Communications Security.
[17] J. Caballero, H. Yin, Z. Liang, and D. Song. Polyglot: automatic extraction of protocol message format using dynamic binary analysis. Proceedings of the 14th ACM conference on Computer and communications security, p. 317-329.
[18] Wolfgang Rankl, Wolfgang Effing, Smart Card Handbook (2004)
[19] T. Welz: Smart cards as methods for payment (2008), Seminar ITS-Security Ruhr-Universität Bochum, "http://www.crypto.rub.de/its_seminar_ws0708.html"
[20] David C. Musker: Protecting & Exploiting Intellectual Property in Electronics, IBC Conferences, 10 June 1998
[21] http://www.memagazine.org/contents/current/features/trade101/trade101.html
[22] http://www.utsystem.edu/ogc/intellectualproperty/baystatevbowersdiscussion.htm
[23] http://www.infoworld.com/d/developer-world/contract-case-could-hurt-reverse-engineering-337
[24] http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31991L0250:EN:HTML
[25] http://books.google.com/books?id=KJmNGglq0nwC&pg=PA321&lpg=PA321&dq=e+European+Software+Directive+reverse+engineering&source=bl&ots=D-fjaWSI4Y&sig=47VJ-tdmg8abUjEjEtvYueC4WKU&hl=en&ei=SIGITJDxI8GLswa4kpScCg&sa=X&oi=book_result&ct=result&resnum=3&ved=0CBwQ6AEwAg#v=onepage&q=e%20European%20Software%20Directive%20reverse%20engineering&f=false
[26] http://www.jenkins.eu/articles-general/reverse-engineering.asp

External links

What Is Reverse Engineering
Java Call Trace to UML Sequence Diagram: a reverse engineering tool for Java. This tool helps you reverse-engineer a UML sequence diagram for your Java program at runtime. It works well with both complex Java programs (that have multiple threads) and J2EE applications deployed on application servers.
CASE Tools for Reverse Code Engineering
The Reverse Code Engineering Community
