Reverse Engineering/Print version

Reverse Engineering

The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at
http://en.wikibooks.org/wiki/Reverse_Engineering

Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Attribution-ShareAlike 3.0 License.


Basic Security

Single user systems

There are several simple ways to defend a system against malicious attack. The measures needed vary with the use of the system, and a single-user server is the easiest to secure. The most obvious method, which applies to almost all types of systems, is to run only the servers/daemons you need. If you run a thousand and one different daemons just because they came with your distro, you have to keep all of them up to date. If you run only 2 or 3, by contrast, you can consistently check for updates, confident that you didn't forget about that tftp server you never actually use. It should go without saying that you should keep these things up to date: the sooner you patch or upgrade a vulnerable daemon or kernel, the shorter the time you are exposed to attack. A large part of this is "security through obscurity"; people or programs have to find out about a service before they can attack it.

Multiple-user systems

The measures above are all that is needed to reasonably defend a single-user system. However, when you have many user accounts, even if you trust the people given them (you can never really trust a user like this, but let's not be paranoid), extra security measures need to be implemented. These include strict privileges for user accounts and a good password policy. It is one thing for an attacker to find a vulnerability in a program, but if a compromised account is simply allowed to run a dangerous program you are at much greater risk. If you have a complicated system where files need sharing between users, remember that you can create a separate group for these users and use the group permissions to give them access without granting access to the world. Take care with the password policy too: if the period between forced password changes is too short, a user will use very simple passwords or, worse, keep the password on a Post-it on their monitor.
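The group-permission advice can be checked mechanically. The following is a minimal sketch for a POSIX system, using only the Python standard library; the function names are made up for this illustration.

```python
import os
import stat

def world_accessible(path):
    """Return True if users outside the owner and group have any permission bit set."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IRWXO)

def group_shared_only(path):
    """Return True if the group may read the file while the world has no access at all."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IRGRP) and not (mode & stat.S_IRWXO)
```

A file with mode 640 passes group_shared_only, while a file with mode 644 is reported by world_accessible.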

Root-Kits

Attackers often use so-called root-kits, which make it seem as if nothing untoward is running even though the attacker has control of your system. A classic root-kit does this by replacing common system utilities with versions which don't report the whole truth; for example, a version of ps which doesn't display processes with "0wned" in their name. They are called root-kits because you have to be root (or a similarly all-powerful user) to install them on a system. You can defend against a root-kit using an Intrusion Detection System (IDS); if you suspect that something unpleasant is happening, you can also check using a known clean version of the system utilities. These can be found conveniently bundled in a package called busybox, available at http://www.busybox.net/download.html.

It is also possible to create statically linked versions of all of the utilities ahead of time and write them to a CD-ROM that can be used in such situations.

These days, however, most common root-kits are kernel-based: they infect (like a parasite) the core of the operating system, where all basic operations are performed and where hardware is accessed. The operating system kernel manages process scheduling, memory management, device access and device drivers. When the kernel is infected by a root-kit (backdoor), for example one disguised as a device driver and loaded as a loadable kernel module, there is no reliable way to tell from within the running system whether it is compromised, because all input/output in the system can be faked. Once the root-kit is in place, memory usage, CPU usage, running processes, files and directories can all be hidden.

There is no remedy in this case in a running system. The only way to check a system's integrity is to boot from another, read-only medium which is known to be trusted and to verify the installed system from there. It is possible to install integrity checking software such as tripwire or aide (Advanced Intrusion Detection Environment), which builds a database of signatures of installed files. At a later point in time this database can be used to check whether files have been changed, and it is then possible to tell whether a change was authorized or not.
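The idea of a signature database can be illustrated with a short sketch. It is not how tripwire or aide are implemented; it only assumes that SHA-256 digests serve as signatures, and all function names are invented for this example. For the result to mean anything, the script and the stored baseline would have to live on trusted, read-only media, as described above.

```python
import hashlib
import os

def hash_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_baseline(root):
    """Map each file path (relative to root) to its digest."""
    baseline = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            baseline[os.path.relpath(path, root)] = hash_file(path)
    return baseline

def compare(baseline, current):
    """Return (added, removed, changed) path lists between two baselines."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    changed = sorted(p for p in baseline.keys() & current.keys()
                     if baseline[p] != current[p])
    return added, removed, changed
```

A baseline built right after installation can later be compared with a fresh one taken while booted from the trusted medium; every path reported as added or changed then needs an explanation.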

Attackers may leave more than one backdoor as a means to get back into the system, so always make sure you have removed all of them. If you have no database of signatures, you usually cannot be certain that you have found every backdoor, and it is then recommended to reinstall the system to regain a known clean state.



Common Solutions

Protection Mechanisms

Few protective measures available to programmers prevent overflow vulnerabilities outright. However, something can be done.

Bounds Checking

Newer languages such as Java and C# make a big deal of their "automatic bounds checking" and "memory management" features mainly because they help prevent buffer overflows (with a small performance penalty). C programmers, however, are left to their own devices and need to explicitly test the bounds on every array access. It's tedious, but at least crackers won't break your program that way, and then you won't get fired from your job.

Canary / Cookie

Some compilers help out by placing a flag value called a canary or cookie on the stack next to the saved frame pointer and the return address (think of the caged bird used in coal mines to detect the buildup of poisonous gases before the workers were poisoned). In the following prologue the canary sits between the return address and the saved frame pointer:

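push CANARY     ; canary sits just below the return address
push ebp        ; save the caller's frame pointer
mov ebp, esp
sub esp, 100    ; reserve 100 bytes for local variables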

Now, when the function wants to return, we can perform the following operation:

mov esp, ebp    ; discard the local variables
pop ebp         ; restore the caller's frame pointer
pop ecx         ; canary value (ecx is a scratch register)
cmp ecx, CANARY
jne _STACK_ERROR_FOUND
ret

This way we can detect if the stack has been overwritten, because the canary value has changed. A predictable canary value, however, is vulnerable: attackers who place that value at the right position in their overflow data elude detection. For this reason most canary values are randomly generated at run-time. Many canary values also contain null characters at the start or end: string copy functions (like strcpy or wcscpy) stop copying after reaching and writing a null character, so an attacker cannot copy anything past a null that matches the canary, and if the attacker omits the nulls the overflow will be caught.
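The check can be modelled without any assembly. The following toy simulation represents a stack frame as a byte array laid out like the prologue above (buffer, saved frame pointer, canary, return address). It is only an illustration under that assumption: the buffer size, the made-up saved frame pointer and all function names are invented here, and real compilers differ in the details.

```python
import secrets

BUF_SIZE = 16   # size of the toy local buffer (the assembly above reserves 100 bytes)
WORD = 4        # 32-bit words, as on x86

def new_canary():
    """Random canary chosen at run-time, with a leading null byte."""
    return b"\x00" + secrets.token_bytes(WORD - 1)

def make_frame(canary, return_address):
    """Toy frame, low to high address: buffer, saved ebp, canary, return address."""
    saved_ebp = (0xBFFFF000).to_bytes(WORD, "little")  # made-up value
    return (bytearray(BUF_SIZE) + saved_ebp + canary
            + return_address.to_bytes(WORD, "little"))

def unchecked_copy(frame, data):
    """Mimic an unbounded copy into the buffer: runs past BUF_SIZE if data is longer."""
    data = data[:len(frame)]          # stay inside the toy frame itself
    frame[0:len(data)] = data

def canary_intact(frame, canary):
    """The comparison done just before 'ret': is the canary slot unchanged?"""
    start = BUF_SIZE + WORD
    return bytes(frame[start:start + WORD]) == canary
```

Copying 10 bytes leaves the canary intact; copying 28 bytes overwrites the saved frame pointer, the canary and the return address, and the check fails before the modified return address would be used.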

This method of protection can catch basic overflows and prevent a function from returning to a modified address and executing arbitrary code. However, the subroutine still runs to completion, with compromised internal state and variables, since the overflow is detected only when it returns. This can still be exploited by an attacker: for example, a pointer variable can be modified to point to an arbitrary location. If the subroutine then uses this pointer to write to memory, it could overwrite anything in the program's address space.

Pointer Sanity

Many heap overflows become effective by overwriting the housekeeping data at the start of the next heap chunk, which normally contains at least one linked-list pointer. Allocating or freeing an overwritten chunk can cause data to be written to an arbitrary address in memory. Most heap implementations now check the linked-list pointers to ensure that they point at another heap chunk or valid data.

This method of protection is also present in the Microsoft Windows "Structured Exception Handling" routines. Before an exception handler is called (the pointer to which resides on the stack, and can be overwritten), the handler address is first checked to ensure that the routine resides within an executable section of memory. If it does not, the handler is not called.

Safe String Libraries

Because the standard library string functions are a common cause of buffer overflows, a number of libraries with "safe" string functions have appeared to try to address this problem. Most of them require an explicit "string length" parameter in their functions' arguments, and limit the data copied to that amount.

The programmer must obviously still be careful and pass accurate string length values; sloppy programming can still cause trouble.

Exercises

We will leave it as an exercise for the reader to write a set of safe string functions that take a length parameter and perform simple bounds checking to prevent overflow. Another option would be to take as an argument a pointer to a "maximum" stack position, and compare pointers to prevent overflow.



Cracking Windows XP Passwords

This page is about cracking (recovering) passwords on Windows XP machines, which is a computationally difficult process. If you just need to set a new password (without recovering the old one), then this guide is not for you. For that, you can use, for example, the free-software tool Offline NT Password & Registry Editor or other similar programs.

Background

Windows XP passwords are hashed using both LM hash and NTLM hash (passwords of 14 or fewer characters) or NTLM only (passwords of 15 or more characters). The hashes are stored in c:\windows\system32\config\SAM. The SAM file is encrypted using a key held in c:\windows\system32\config\system and is locked while Windows is running. The SAM file is a registry hive which is mounted at HKLM\SAM when Windows is running. The SYSTEM account is the only account which can read this part of the registry. To get the passwords, you need to shut down Windows, decrypt the SAM file, and then crack the hashes. If everything goes well, you'll have the passwords in 15 minutes.

The hashes can also be obtained from a running system using software like pwdump. However, it must be run under an account with administrator privileges.

Three ways to recover Windows Password

Usually, a Windows admin password can be recovered in two traditional ways. The first is to change the password from another admin account; the second is to reset it with a Windows password reset disk that was created before the password was forgotten. Taking Windows XP as an example:

  • At the Windows XP login prompt, when the password is entered incorrectly, click the reset button in the login failed window.
  • Insert the password reset diskette into the computer and click Next.
  • If the diskette is the correct one, Windows XP will open a window prompting for the new password you wish to use.

However, we often ignore the importance of security until we have been locked out of the computer. Fortunately, there is a third way to unlock the computer without reinstalling: erase the Windows password with a Windows password reset CD, which can reset the admin password for Windows 7/XP/Vista/NT/2000/2003. Taking Windows Password Unlocker as an example, these are the steps to create the reset CD:

  • Download Windows Password Unlocker from the Password Unlocker official site.
  • Decompress the download and note that it contains an .ISO image file. Burn the image file onto a blank CD with a burner supported by Password Unlocker.
  • Insert the newly created CD into the locked computer and reboot it from the CD drive.
  • After the CD has launched, a window pops up listing all account names (if there are several accounts). Select the account whose password you have forgotten in order to reset it.

Detailed Instructions for LoginRecovery.com Service

  • Go to http://loginrecovery.com/ and from the home page click the option to download either the floppy disk image or the CD image. If you use the floppy disk image, insert a blank floppy disk into your computer and run the program; a bootable floppy will be created. If you use the CD version, you will need to burn the ISO image to a CD manually, using software which specifically burns ISO images.
  • Insert the floppy disk or CD into the target computer from which you wish to extract the passwords, then boot the computer. You may need to alter the BIOS settings to ensure that the computer boots from the floppy drive or CD.
  • If you used the floppy disk, some messages will briefly appear on the screen and then the computer will shut down. The floppy disk will contain a newly created file called "upload.txt" which holds the encrypted passwords. If you used the CD version, the encrypted passwords will be shown on the screen; write them down into a text file.
  • If you wish to wait up to 48 hours or pay to get your passwords, then you can upload the file to the LoginRecovery site. Otherwise, continue reading.
  • The file will consist of several 2-line entries, one for each account. Copy the 2 lines for the account you want and paste them into this utility to decode them into the "pwdump" format.
  • Use any of the tools in the following section to decode the pwdump hash.

Top-Password.com

How to Recover a Lost Microsoft Windows XP Administrator Password

    • Use another account with administrator rights

If there is another user account which you remember and which has administrative privileges, you can use it for Windows XP password recovery. Restart the system and boot into Safe Mode. Click on the icon for the administrator's account at the account logon screen. Once the system has booted to the desktop, reset the password with the following steps.

  • A. Start -> Control Panel -> Administrative Tools -> Computer Management.
  • B. Double click Local Users and Groups -> folder Users.
  • C. Right click the user name of the account whose password was lost, then click Set Password.
  • D. Reset the password - keep New Password and Confirm Password blank.
  • E. When finished, restart the PC and log in.

    • Burn a CD/DVD to recover a lost Windows XP password

With professional password recovery software, you just need to burn an ISO image file to a CD/DVD on an accessible PC.

Ophcrack demo

The easiest site to use is the online demo for Ophcrack.

  • Use PWDump or another password extraction tool to extract the password hashes from the target computer. (Note: in order to work, it must be run under an Administrator account.)
  • Retain only the part with the two hashes and the colon in between:
CC5E9ACBAD1B25C9AAD3B435B51404EE:996E6760CDDD8815A2C24A110CF040FB
  • Submit the hashes to the demo. For the hashes above, the password reported is:
mullet

If your password is not alphanumeric (indicated by 7 dots in part of the password, or by the result "Not found"), then you will have to use one of the following more powerful sites, which have rainbow tables for symbols as well:

Plain-Text.info

  • Use PWDump or other password extraction tool to extract the passwords from the target computer. (Note: In order to work, it must be run under an Administrator account )
  • Edit the password hash to the pwdump format (add the colon-delimited username and ID number fields in the front, and 3 colons at the end):
Administrator:500:CC5E9ACBAD1B25C9AAD3B435B51404EE:996E6760CDDD8815A2C24A110CF040FB:::
  • Go to http://plain-text.info/, click "Add Hashes", enter the hashes in the box, select "lm" as the algorithm, complete the CAPTCHA, and click submit
  • They only crack 2 hashes every 15 minutes, so you may have to wait
  • After a few minutes/hours, come back, go to "Search", type in your hash (just the LM part), and see if it is cracked
  • Read their FAQ for more info.
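Editing hashes into the pwdump format by hand is error-prone, so the step can be scripted. The following is a minimal sketch that only covers the field layout shown above (username, ID number, LM hash, NTLM hash, three trailing colons); the function names are made up for this illustration.

```python
import re

HASH_RE = re.compile(r"^[0-9A-Fa-f]{32}$")   # 16 bytes written as 32 hex digits

def to_pwdump(username, rid, lm_hash, nt_hash):
    """Build one pwdump-style line from its parts."""
    for h in (lm_hash, nt_hash):
        if not HASH_RE.match(h):
            raise ValueError("expected 32 hex digits, got %r" % h)
    return "%s:%d:%s:%s:::" % (username, rid, lm_hash.upper(), nt_hash.upper())

def from_pwdump(line):
    """Split a pwdump-style line into (username, rid, lm_hash, nt_hash)."""
    fields = line.strip().split(":")
    if len(fields) < 4:
        raise ValueError("not a pwdump line: %r" % line)
    username, rid, lm_hash, nt_hash = fields[:4]
    return username, int(rid), lm_hash, nt_hash
```

Calling to_pwdump("Administrator", 500, ...) with the two example hashes reproduces the Administrator line shown in the list above.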

OnlineHashCrack.com

  • Use PWDump or other password extraction tool to extract the passwords from the target computer.
  • Go to http://www.OnlineHashCrack.com and enter the LM hash (the part before the colon) or the NTLM hash (the part after it) into the query field and click the "Search" button.
  • Check the status page occasionally to see if they have been cracked.
  • If the hash is not in their database, the rainbow tables will be used to find it.

Notes

  • If the first part of the information retrieved from the pwdump is empty, then the LM hash is not stored. This means that the password is blank, in which case it would look like this:
Administrator:500:0:
_31,D6,CF,E0,D1,6A,E9,31,B7,3C,59,D7,E0,C0,89,C0,xxxxx:::

If it says anything different, then better security has been implemented and you are forced to crack the NTLM hash, which is much more difficult and out of the scope of this guide.

  • This only works if the password is 14 characters or shorter.
  • A password of up to 14 characters in Windows 2000/XP/2003 is split into two halves of seven characters each, and each half is hashed separately; this is what makes the LM hash easy to attack.
  • An alternative which uses the same method of comparing known hashes against an unknown one is RainbowCrack, available at http://www.antsight.com/zsl/rainbowcrack/. This program uses rainbow tables that can be in excess of 64 GB; these tables can be obtained at http://rainbowtables.shmoo.com/
  • A comprehensive project for comparing known hashes against an unknown one is at http://www.rainbowcrack.com/; however, it requires that you submit a rainbow table before you can gain access to their server.
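Because the two seven-character halves are hashed independently, the LM hash itself leaks a little about the password length. The sketch below relies on one well-known constant that is not stated in the text above: an empty seven-character half always hashes to AAD3B435B51404EE. The function names are made up for this illustration.

```python
EMPTY_LM_HALF = "AAD3B435B51404EE"   # LM hash of an empty 7-character half

def lm_halves(lm_hash):
    """Split a 32-hex-digit LM hash into the hashes of its two halves."""
    lm_hash = lm_hash.upper()
    if len(lm_hash) != 32:
        raise ValueError("an LM hash has 32 hex digits")
    return lm_hash[:16], lm_hash[16:]

def describe_lm(lm_hash):
    """Say what the LM hash reveals about the password length."""
    first, second = lm_halves(lm_hash)
    if first == EMPTY_LM_HALF and second == EMPTY_LM_HALF:
        return "blank password or LM hash not stored"
    if second == EMPTY_LM_HALF:
        return "password is at most 7 characters"
    return "password is 8 to 14 characters"
```

For the example hash CC5E9ACBAD1B25C9AAD3B435B51404EE the second half is the empty-half constant, which fits the six-character password shown in the Ophcrack example.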

Defense against attack

Mac OS X 10.3

Mac OS X 10.3 (Panther) also stores shadowed LM+NTLM hashes for each user. They can be cracked in the same way as the hashes for Windows above.

  • First find the "generateduid" for the user you want with the command
$ niutil -readprop . /users/<username> generateduid
70902C33-AC79-11DA-AFDF-000A95CD9AF8
  • The hashes are stored in the file /var/db/shadow/hash/<generateduid>. The file is 104 characters long, consisting of the 64-character NTLM+LM hashes and the 40-character SHA1 hash. To retrieve the NTLM+LM hashes, run a command such as the following as an administrator:
$ sudo cut -c1-64 /var/db/shadow/hash/70902C33-AC79-11DA-AFDF-000A95CD9AF8
996E6760CDDD8815A2C24A110CF040FBCC5E9ACBAD1B25C9AAD3B435B51404EE
  • The hashes are stored in the reverse order from the pwdump format (NTLM first instead of LM first), so you need to switch the 32-character halves and insert a colon between them:
CC5E9ACBAD1B25C9AAD3B435B51404EE:996E6760CDDD8815A2C24A110CF040FB
  • Then follow the instructions for Windows passwords
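Swapping the halves is easy to get wrong by hand. A minimal sketch of the step, with a made-up function name:

```python
def shadow_to_lm_nt(shadow_prefix):
    """Turn the 64-character NTLM+LM prefix of a shadow file into 'LM:NTLM'."""
    shadow_prefix = shadow_prefix.strip()
    if len(shadow_prefix) != 64:
        raise ValueError("expected 64 hex characters (NTLM followed by LM)")
    nt_hash, lm_hash = shadow_prefix[:32], shadow_prefix[32:]
    return lm_hash + ":" + nt_hash
```

Applied to the 64-character output of the cut command above, it produces the colon-separated pair shown in the last example.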

Mac OS X 10.4

Mac OS X 10.4 (Tiger) improves the security by only storing LM+NTLM hashes for users who enable Windows Sharing for their account; and when they do enable it, it asks them to enter their password with a warning that their password is stored in a less secure format. However, for those users with Windows Sharing enabled, the above method will still work. The shadow file format is a little different, but the LM+NTLM hashes are still the first 64 characters. If the hashes are not stored, you will get all 0's when you try to retrieve the hashes.

Samba passwords

In older versions of Samba, the password hashes for Samba users were stored in the file /etc/smbpasswd (location may vary, only root has access) and are in similar format to Windows password hashes discussed above. In newer versions of Samba, run the following as root to get the same information:

pdbedit -L -w



File Formats

This section will talk about reverse-engineering proprietary file formats. Many software developers need to reverse engineer a proprietary file format, especially for the purposes of interoperability. For example, every year the OpenOffice project needs to reverse engineer the Microsoft Office file formats. Furthermore, reverse engineering is required for forensics purposes. The chapters in this section will talk about how to understand a proprietary file format.

Typical Features

File Header

Most file formats begin with a "header," a few bytes that describe the file type and version. Because there are several incompatible file formats with the same extension (for example, ".doc" and ".cod"), the header gives a program enough additional information to see if this file is one of the formats that program can handle.

Many programmers package their data in some sort of "container format" before writing it out to disk. If they use the standard gzip format to hold their data in compressed form, the file will begin with the 2 bytes 0x1f 0x8b (in decimal, 31 139). The first of these, 0x1f, is the ASCII code for Unit Separator.
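Checking a header against known signatures is easy to automate. The sketch below knows only three signatures: the gzip bytes just mentioned, the ARJ bytes 60 ea that appear in an example later in this chapter, and the "PK" signature of ZIP archives, which is an addition here. The table and function names are made up, and a real tool such as file(1) knows far more formats.

```python
# (signature bytes at offset 0, human-readable name)
SIGNATURES = [
    (b"\x1f\x8b", "gzip"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x60\xea", "ARJ archive"),
]

def identify(header):
    """Return the name of the first signature that matches, or None."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None

def identify_file(path):
    """Read the first few bytes of a file and try to identify them."""
    with open(path, "rb") as f:
        return identify(f.read(8))
```

A result of None only means that none of the listed signatures matched, not that the file has no header.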

Blank Space

Some files are made up largely of blank space, for example, the .DS_Store files generated by the Mac OS X Finder. Blank space will appear as a series of 0's in a hex editor. The creators of a file format may add blank space for a variety of reasons; for example, the author of a study of the .DS_Store file format speculated that the blank space exists to speed up writing data, as other data would not need to be pushed around to make room. It could also serve to prevent fragmentation.

For most purposes, blank space can be ignored.

Tools

File format reverse engineering is the domain of hex editors. Typically they are used to display file contents rather than to edit them. Some hex editors allow you to superimpose a data structure on top of the data (sometimes called custom views or similar), which is very helpful. Once a particular structure has been discovered in a file, these mechanisms can be used to document the structure, as well as to provide a more meaningful display of the information than just hex code.

Also useful are Unix/Linux tools like strings(1) and file(1).

strings
Finds and prints sequences of printable characters in a file. This can give hints of what data is embedded in the file.
file
Attempts to determine a file type. Sometimes file format designers re-use already well known file formats or file compression algorithms. There is a small but notable chance that file(1) can reveal this.

Windows ports of these tools are also available, e.g. as part of the Cygwin environment (where strings is part of the binutils package).
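Where strings(1) is not at hand, its basic behaviour is simple to approximate. The following sketch finds runs of printable ASCII characters; the minimum length of 4 mirrors the usual default of strings, and the function name is made up.

```python
import re

def printable_strings(data, min_len=4):
    """Return all runs of at least min_len printable ASCII characters in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

For example, printable_strings(open("firmware.bin", "rb").read()) would list the embedded text of a file with that (hypothetical) name.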

Just as important as a hex editor is a brain. File format reverse engineering means reasoning about what the hex editor and other tools display: guessing structures, relations and the meaning of the data, developing theories and then verifying them. Very few tools can help here.

In a few limited cases additional tools are helpful, e.g. for checking by brute force whether a particular part of the file consists of some embedded, compressed data. Typically such tools are written or scripted on the fly as custom tools. Another typical set of custom tools are the ones used to break up a file into separate components, once it has been discovered that a particular file indeed consists of separate parts and how they are separated in the file. C/C++ and Java, but also scripting languages like Perl, are often used here (Perl because it can handle binary data, while classic Unix scripting tools are often limited to text data only).

In some cases a proprietary file format might contain executable code. For example, a firmware update file for some embedded device very likely contains executable code. Typically that code is wrapped into some structure, e.g. a file system, compressed, garnished with boot/flash code etc. In such cases a disassembler/decompiler for the particular executable format might be helpful.

Further, documentation of and familiarity with checksum algorithms, compression algorithms, encoding techniques, and also programming languages is very helpful.

Also very helpful is the availability of the application that produces and reads the proprietary file format. That application can be used to create test files, but also to verify whether a self-generated file is correct.

Strategies

Look for the obvious first, e.g. magic numbers, a block structure, or ASCII text in the file. Anything that can be more or less clearly identified can be the entry ticket to more. Once a particular structure has been identified, look for in-file pointers to that data, e.g. whether the data is referenced from some other part of the file with an absolute or relative address. It is also very important to find out the byte order (little endian or big endian).
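One way to settle the byte order is to decode a suspected length or offset field both ways and see which reading is plausible. The sketch below assumes a 32-bit unsigned field and uses "not larger than the file" as the plausibility test; both are assumptions for illustration, and the function names are made up.

```python
import struct

def read_u32_both(data, offset):
    """Decode 4 bytes at offset as (little-endian, big-endian) unsigned integers."""
    (little,) = struct.unpack_from("<I", data, offset)
    (big,) = struct.unpack_from(">I", data, offset)
    return little, big

def plausible_orders(data, offset, file_size):
    """Return the byte orders under which the field could be an in-file offset."""
    little, big = read_u32_both(data, offset)
    candidates = (("little", little), ("big", big))
    return [name for name, value in candidates if value <= file_size]
```

If both orders (or neither) are plausible, the field is probably not what you think it is, or more samples are needed.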

Choosing the target

If you have access to the software that created the file, you can always create files with the contents of your choosing. This makes reverse engineering substantially easier. In cryptography terms, you are engaging in a chosen-plaintext attack.

Probing

Once you formulate a theory as to what some data in the file might mean, you can verify that theory by creating a manipulated file. Replace the data with some other data using a hex editor or a custom tool. Then load the manipulated file into the original application. If the application loads the file and displays the intended change, the theory is probably correct. Sometimes this is not trivial because of defense mechanisms that may be present. Some applications check a hash or signature before accepting the data or running the code.
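The custom tool for such an experiment can be very small. The following sketch overwrites bytes at a given offset without changing the file length and writes the result to a copy, so the original sample stays untouched; the function names are made up.

```python
def patched(data, offset, new_bytes):
    """Return a copy of data with new_bytes written at offset (same total length)."""
    if offset < 0 or offset + len(new_bytes) > len(data):
        raise ValueError("patch would fall outside the file")
    out = bytearray(data)
    out[offset:offset + len(new_bytes)] = new_bytes
    return bytes(out)

def write_patched_copy(src_path, dst_path, offset, new_bytes):
    """Read src_path, patch it in memory and write the result to dst_path."""
    with open(src_path, "rb") as f:
        data = f.read()
    with open(dst_path, "wb") as f:
        f.write(patched(data, offset, new_bytes))
```

Changing one field per test file keeps the experiment readable: if the application reacts, you know which bytes caused it.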

Compression, Encryption & Scrambling

Introduction

File formats which are either in part or completely compressed, encrypted or scrambled are among the toughest nuts to crack. Of course, compression is different from encryption, and typically done for a different purpose. However, the resulting file formats often look similar: a bunch of gibberish. This is the intended result when file format designers go for encryption, but it is also often a desired side effect when compression is applied.

If checking a file with a hex editor or similar reveals that it just contains gibberish and e.g. not any easy to identify text strings, patterns or similar, it might indicate that the particular file is compressed, encrypted or scrambled. The methods for reverse engineering these files are similar. There might, however, be a big difference from a legal point of view. Many countries have laws against circumventing copy protection, and encryption can be seen as some kind of copy protection. See Reverse Engineering/Legal Aspects for some more hints regarding this, and seek qualified legal advice before attempting to reverse engineer an encrypted or otherwise protected file format. Similar issues might arise when a file format just uses scrambling. The format "owner" might argue that the scrambling is used as some kind of copy protection, encryption or whatever, and circumventing it might break some law. Again, seek qualified legal advice.
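The "gibberish" impression from a hex editor can be roughly quantified. One common measure, which is an addition to the text above, is the Shannon entropy of the byte values: compressed or encrypted data tends to come close to 8 bits per byte, while text and simple structures score much lower. A minimal sketch with a made-up function name:

```python
import math
from collections import Counter

def byte_entropy(data):
    """Shannon entropy of the byte values in data, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(count / total * math.log2(count / total)
                for count in Counter(data).values())
```

A high value cannot distinguish compression from encryption; it only says that the data has little visible redundancy.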

The remainder of this section only deals with reverse-engineering the compression of a file. This is typically just an initial step in the complete reverse-engineering process. Once it has been successfully decompressed, other reverse engineering methods need to be applied to identify the file contents and structure.

Well-Known Compression Algorithms

Often file format designers apply well known compression algorithms, either by using a particular, well known implementation of a certain algorithm (a well-known tool), or by re-implementing a well known algorithm unchanged. In the easiest case this has been documented. For example, it is well documented that the OpenOffice file format uses ZIP archives, and therefore there is no point in reverse engineering that format.

Unfortunately, for many formats we don't have this documentation. If a well-known implementation of a particular algorithm has been used, the format is often relatively easy to reverse engineer. Such compressed file formats tend to start with a format identifier (magic number), clearly identifying the particular compressed format. The compression tool has left its "fingerprint" in place.

Example: The following is a hex dump of the first few bytes of a fictitious firmware update file for a particular SOHO router:

00000000  60 ea 27 00 1e 06 01 00  10 00 02 84 84 86 dc 34  |`.'............4|
00000010  84 86 dc 34 00 00 00 00  00 00 00 00 00 00 00 00  |...4............|
00000020  00 00 44 54 41 2e 41 52  4a 00 00 b1 18 78 a6 00  |..DTA.ARJ....x..|
00000030  00 60 ea 27 00 1e 06 01  00 10 01 00 84 4c 86 dc  |.`.'.........L..|
00000040  34 9b 17 0c 00 e8 a4 25  00 25 10 0d 10 00 00 20  |4......%.%..... |
00000050  00 00 00 44 54 41 2e 4d  45 4d 00 00 50 98 0b 8f  |...DTA.MEM..P...|
00000060  00 00 1f 30 84 dd 7b db  48 da 6f fd ee fd bb da  |...0..{.H.o.....|


The file is compressed with ARJ. Not only does the string DTA.ARJ give it away for the human eye, but also the first two bytes 60 ea, which are known to identify ARJ-compressed files.

The Unix/Linux tool file(1) is quite aware of many standard compressed file formats.

Example: file returns the following for the above mentioned firmware file:

firmware.bin: ARJ archive data, v6, slash-switched, original name: DTA.ARJ, os: MS-DOS


The next step after the compression format has been discovered is obvious: obtain a version of the compression tool that was used and use it to decompress the data. The result, however, often needs more reverse engineering. For example, the above mentioned router firmware might contain separate sections for separate areas of the router's flash memory, each guarded with its own checksum.

A variant of using a well known compression algorithm and tool can also sometimes be found, which is more difficult to reverse engineer. In such a case the file is prefixed with some additional data, and the actual compression format can't be identified by just checking the start of the file. Let's assume, for example, another fictitious SOHO router's firmware update file, which is built as follows:

Example: Fictitious structure of another SOHO router firmware update file:

+--------------------------+
|  Boot loader             |
+--------------------------+
|  Decompression algorithm |
+--------------------------+
|  Compressed data         |
+--------------------------+


Of course, the format can only be known once the file format has been reverse engineered. So how is that done? Well, in the fictitious case we assume that an inspection with the Unix/Linux tool strings(1) reveals the following interesting strings in the file:

Example: Abridged output of strings:

:
:
unknown compression method
invalid window size
incorrect header check
need dictionary
incorrect data check
invalid block type
invalid stored block lengths
too many length or distance symbols
invalid bit length repeat
 inflate 1.1.3 Copyright 1995-1998 Mark Adler 
oversubscribed dynamic bit lengths tree
incomplete dynamic bit lengths tree
oversubscribed literal/length tree
incomplete literal/length tree
oversubscribed distance tree
incomplete distance tree
empty distance tree with lengths
invalid literal/length code
invalid distance code
invalid distance code
invalid literal/length code
incompatible version
buffer error
insufficient memory
data error
stream error
file error
stream end
need dictionary
1.1.3
application.bin
:
:



The strings are very revealing, and those knowledgeable will recognize the name Mark Adler as one of the authors of zlib, which implements the same deflate algorithm that is used by Info-ZIP and GNU's gzip. Those not so knowledgeable might at least have the idea to search for the name and the keyword compression.

It is a good bet to assume that at least parts of the file are compressed with the deflate algorithm. Further probing might reveal that the file does not contain a complete ZIP archive, but just a section which is compressed with deflate and supposed to be decompressed with the matching inflate algorithm (likely from zlib version 1.1.3, as the output of strings revealed). Therefore, the fictitious file might be further separated into its components by using a custom tool which iteratively applies the inflate algorithm to the file, until the generated result makes some sense (e.g. until the result contains some recognizable clear text strings).
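Such a custom tool can be sketched with Python's zlib module. The version below tries to inflate at every offset and keeps the offsets where a complete stream decodes. It assumes that the embedded data carries the usual two-byte zlib header and checksum; for headerless deflate data, zlib.decompressobj(wbits=-15) could be tried instead, at the cost of many more false positives. The function name and the min_output threshold are made up for this illustration.

```python
import zlib

def find_zlib_streams(data, min_output=8):
    """Return (offset, decompressed bytes) for each complete zlib stream found in data."""
    view = memoryview(data)           # slicing a memoryview avoids copying the tail
    results = []
    offset = 0
    while offset < len(data):
        decomp = zlib.decompressobj()
        try:
            out = decomp.decompress(view[offset:])
        except zlib.error:
            offset += 1               # not a valid stream here, try the next byte
            continue
        if decomp.eof and len(out) >= min_output:
            consumed = len(data) - offset - len(decomp.unused_data)
            results.append((offset, out))
            offset += consumed        # continue after the end of this stream
        else:
            offset += 1               # truncated or trivially short stream
    return results
```

Each hit can then be inspected with the other tools from this chapter, for example by searching the decompressed bytes for readable strings.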

Unknown or homemade Compression Algorithms[edit]

If the software that either creates or reads the file is available then it is very possible to reverse the file format. You can use live analysis of the running application when reading/writing the file. Doing this is likely the easiest way to determine the data structure of the file.

If the software is not available, all bets are off if there is an unknown or homemade/ad-hoc compression algorithm, or a non-standard implementation of a known algorithm. One has to be exceptionally lucky to figure out the details of the applied algorithm so that the accompanying decompression algorithm can be constructed. There is some hope, though: just as ad-hoc encryption schemes (which cryptologists strongly discourage) typically do not stand up to serious cryptanalysis, homemade compression schemes are often simple enough to yield to analysis.

Sometimes additional information can be found, e.g. if a vendor has filed a patent application for a particular algorithm, or is known to favor a particular compression technology in other products such as communication protocols. Sometimes it might turn out that the file format actually belongs to some OEM or 3rd party product, and that information about that product is available.

Otherwise, there is a small chance that trial-and-error might reveal something about the file. For example, run-length encodings are popular, simple and easy to implement, so they can sometimes be found in homemade implementations. It might be worth investigating whether a file is compressed that way. An investigation of a few other well known compression techniques might also be worth a try.
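As a concrete picture of what a homemade run-length encoding might look like, here is a minimal sketch in Python. The (count, value) byte-pair layout is purely an assumed example format; real files may use escape bytes, signed counts or other variations, and rle_decode is a name made up for this illustration.

```python
def rle_decode(data):
    """Decode a stream of (count, value) byte pairs into the expanded bytes."""
    if len(data) % 2 != 0:
        raise ValueError("expected an even number of bytes (count, value pairs)")
    out = bytearray()
    for i in range(0, len(data), 2):
        count, value = data[i], data[i + 1]
        out += bytes([value]) * count  # repeat the value 'count' times
    return bytes(out)
```

Running a decoder like this over a suspect region and checking whether the output looks more structured than the input is a quick way to test the run-length hypothesis.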

Last but not least, cryptanalysis techniques might reveal something interesting about the compression. For example, reoccurring blocks of information might point to a particular compression algorithm. However, this requires a lot of effort, time and skill.
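Two cheap measurements can support this kind of analysis: the byte entropy of a region (well-compressed or encrypted data is close to 8 bits per byte, plain text is much lower) and a count of reoccurring fixed-size blocks. The Python sketch below is only illustrative; the function names and the 8-byte block size are arbitrary choices for this example.

```python
import math
from collections import Counter

def byte_entropy(data):
    """Shannon entropy of the byte values, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def repeated_blocks(data, size=8):
    """Return {block: count} for aligned blocks that occur more than once."""
    blocks = (data[i:i + size] for i in range(0, len(data) - size + 1, size))
    counts = Counter(blocks)
    return {block: n for block, n in counts.items() if n > 1}
```

Low entropy or many repeated blocks suggest weak or no compression in that region; entropy close to 8 suggests the data is already compressed or encrypted.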


This page or section of the Reverse Engineering book is a stub. If you have information on this topic, write about it here.



Legal Aspects

It is quite often the case that reverse engineering a software product teeters on the border between legal and illegal. Note that reverse engineering a competing car or a weapon is rarely challenged legally, nor was reverse engineering software a few decades ago. So as a reverse engineer, you should know your rights and the rights of the software owner. This chapter will focus on just that, exploring issues surrounding patents, copyrights, and licensed software. Even if you play by the rules, you are not immune to harassment lawsuits. (NB: The material here reflects the legal position in the USA. Other jurisdictions may have different laws.)

Patented Software[edit]

Explain the rights of the software owner under the patent law

Copyrighted Software[edit]

Anyone who reverse engineers software for use in an open source project must take care not to infringe copyright. The common approach to this problem (often called clean-room design) is to divide the programmers into 2 groups:

  1. The first group disassembles the code of the program/firmware and writes the specifications.
  2. The second group writes a program using only these specifications.

Fair Use[edit]

Under a few circumstances, fair use allows the reproduction of copyrighted material without the owner's permission. The Copyright Act of 1976, 17 U.S.C. § 107 states specifically:

Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include—

  1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
  2. the nature of the copyrighted work;
  3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
  4. the effect of the use upon the potential market for or value of the copyrighted work.

The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.

In terms of reverse engineering and fair use, the law tends to favor the reverser. However, a use that negatively affects the value of the original product will almost never be categorized as "fair use." Also keep in mind that fair use does not permit breaking the user license terms.

It needs to be noted that fair use is not black and white. The line between fair use and copyright infringement is very gray. Unless you are very confident about what you are doing, you shouldn't do it.

Digital Millennium Copyright Act[edit]

The Digital Millennium Copyright Act (DMCA) was put into place in 1998 in order to outlaw any service or device whose purpose is to undermine or remove DRM (Digital Rights Management). The act forbids any service or device from being designed to circumvent DRM, or even from being marketed for that purpose.

There is, however, an exception in the DMCA stating that reverse engineering can be done for the purpose of interoperability between software components.[1] It states the following:

REVERSE ENGINEERING.—

  1. Notwithstanding the provisions of subsection (a)(1)(A), a person who has lawfully obtained the right to use a copy of a computer program may circumvent a technological measure that effectively controls access to a particular portion of that program for the sole purpose of identifying and analyzing those elements of the program that are necessary to achieve interoperability of an independently created computer program with other programs, and that have not previously been readily available to the person engaging in the circumvention, to the extent any such acts of identification and analysis do not constitute infringement under this title.
  2. Notwithstanding the provisions of subsections (a)(2) and (b), a person may develop and employ technological means to circumvent a technological measure, or to circumvent protection afforded by a technological measure, in order to enable the identification and analysis under paragraph (1), or for the purpose of enabling interoperability of an independently created computer program with other programs, if such means are necessary to achieve such interoperability, to the extent that doing so does not constitute infringement under this title.
  3. The information acquired through the acts permitted under paragraph (1), and the means permitted under paragraph (2), may be made available to others if the person referred to in paragraph (1) or (2), as the case may be, provides such information or means solely for the purpose of enabling interoperability of an independently created computer program with other programs, and to the extent that doing so does not constitute infringement under this title or violate applicable law other than this section.
  4. For purposes of this subsection, the term ‘interoperability’ means the ability of computer programs to exchange information, and of such programs mutually to use the information which has been exchanged.

Fair use does still apply. However, it is not fair use to gain unauthorized access to copyrighted work.[2]

End User License Agreement[edit]

An end user license agreement (or EULA) is a legal contract between the software manufacturer and the user. It explains the terms under which the user may use the software, listing what the user may and may not do. This contract can state anything from the number of copies that can be made to the conditions under which the software can be reverse engineered.

EULA and Fair Use[edit]

Fair use seems to be safe ground for reverse engineers, who almost always use it as a defense. However, an EULA is a legally binding contract. If a user agrees to terms which are in conflict with fair use, the user may have effectively waived their rights to fair use.

In the case of Davidson & Associates v. Jung [3], Ross Combs, Rob Crittenden, and Jim Jung reverse engineered Blizzard's protocol language to allow gamers to play pirated video games online. In this case, the reversers had agreed to an EULA and TOU (Terms of Use) prohibiting reverse engineering. The judge found the EULA and TOU to be enforceable and held that a user's right to reverse engineer a product can be contractually waived.

Famous Cases[edit]

Atari Games Corp. v. Nintendo of America Inc. [4][edit]

When Nintendo came out with the Nintendo Entertainment System, they designed a program, the 10NES, to prevent unauthorized video games from working on the NES. In order to make an authorized game, a company had to become licensed with Nintendo. The license agreement basically stated that a company could only make five games per year, and it prevented the company from selling the same games for other home entertainment systems.

Atari attempted to crack the 10NES in order to bypass any need for a restrictive license agreement. In 1986 they purchased some Nintendos and started reverse engineering. By chemically dissolving the top layers of the chip containing the 10NES, they were able to use a microscope to physically look at the bits and recover some of the object code. The object code was then decompiled to source. However, Atari was unable to completely reverse the 10NES using this method.

In 1988, Atari requested a copy of the 10NES source code from the Copyright Office. To get the Copyright Office to comply, they lied and said they were involved in an infringement lawsuit with Nintendo.

Once Atari completely understood the 10NES program, they built a program that defeated it. In 1989, Nintendo filed charges against them for unfair competition, patent infringement, copyright infringement, and trade secret violations.

One of Atari's defenses was that reverse engineering was fair use under the copyright law. In the end, the courts decided the act of chemically peeling back the chip and looking at the bits to get the object code on systems they purchased was fair use. It was expected that the courts would find Atari at fault for copyright infringement for stealing the source from the Copyright Office. However, in 1994 Atari and Nintendo settled out of court.

Sega Enterprises Ltd. v. Accolade Inc. [5][edit]

This case concerned Sega's video game console and cartridges. The cartridges had a 20-25 byte code segment which was interrogated by the console, as a security measure.

Accolade disassembled the code which was common to three different Sega games cartridges, to find the security segment, and included it in competing games cartridges.

The Ninth Circuit held this disassembly to be a permitted "fair use" of the copyright in the game programs: the disassembly of copyrighted object code, as a necessary step in examining the unprotected ideas and functional concepts embodied in the code, is a fair use that is privileged by section 107 of the Copyright Act. The court reached this conclusion because disassembly was the only means of gaining access to those unprotected aspects of the program, and because Accolade had a legitimate interest in gaining such access (in order to determine how to make its cartridges compatible with the Genesis console).

Jon Johansen Case[edit]

Give a description of this case

Further Reading[edit]

References[edit]



Mac OS X

Apple Computer's Mac OS X is the standard Operating System used on Apple Macintosh computers. Other operating systems, primarily Linux, have been ported onto Mac Hardware, and there has been some effort to port OS X onto non-Mac Intel-based hardware, but neither of these efforts has attained the kind of popularity that the "standard bundle" has attained.

Mac OS X has been critically acclaimed by many people in the computer world as being both beautiful and easy to use. OS X is built on a BSD and Mach core but has a certain amount of software that is Mac-specific.

Try hard to keep this on the subject of general reverse engineering for Mac OS X, and not on 'cracking', or reversing only for security purposes. I have created special sections for these subjects, and all material focused on them should be kept there. Thanks! --Macpunk 04:17, 9 July 2007 (UTC)

Hardware Architecture[edit]

Before OS X, Macs ran the classic Mac OS operating system, first on the Motorola 68000 through 68040 processors and later on the PowerPC architecture. Steve Jobs had by then left Apple to create NeXT. After Apple had completed its hardware migration to the PowerPC platform, it looked for a new kernel that could take advantage of the new hardware architecture. Many projects were started and failed, and this and other factors contributed to the decline of Apple. To capitalize on the new architecture, Apple turned to Be Inc. to purchase its new BeOS; the deal fell through because Be Inc. wanted too much money. Apple then turned to NeXT and acquired not only the NeXT OS but also Steve Jobs. Jobs quickly took control of Apple and made the NeXT architecture the replacement for Apple's aging Mac OS. The replacement product was originally known as Rhapsody, which had the feel of the older Mac OS. Jobs felt the interface did not do it justice, so his team of ex-NeXT engineers developed Aqua, and Mac OS X was born.

Mac OS X 10.0 "Cheetah" through 10.4.3 "Tiger" would only run on the PowerPC architecture (the G3, G4 and G5 processor generations). It became clear to Apple that IBM was having trouble with both the development and the manufacturing of the 5th generation of the PowerPC, known as the PowerPC G5. In addition, IBM had yet to release a laptop version of the G5 processor a year after it had promised Apple it would. Apple then decided to migrate away from the PowerPC architecture to an Intel-based one, choosing the Intel 32-bit Core Duo architecture. Apple's second generation of Intel products appeared less than a year later, running the Intel 64-bit Core 2 Duo architecture.

Apple originally included a Trusted Platform Module (TPM) to help curb pirating of Mac OS X. Later, Apple turned to a simple AES encryption system in which the encryption keys are stored in a kernel device driver. This made it possible to decrypt, and even encrypt, Mac OS X executable binary files. The TPM is no longer present in any modern Mac. The newer encryption system is only available on Intel-based Macs and yields all sorts of errors if attempted on the PowerPC platform.

Apple has committed to supporting both PowerPC and Intel platforms for the next few years. Every Mac OS X system today ships with its binary files in a Universal binary format which can be run on both PowerPC and Intel-based Macs. A Universal binary is simply the source files compiled multiple times (once for each architecture) and then glued together afterwards. When the OS reads a Universal binary, it selects the proper version of the compiled code and executes it. Since not all binary files are Universal, Apple released a software component for the Intel platform called Rosetta, which dynamically translates PowerPC code into Intel code, allowing a PowerPC binary to be executed on an Intel Mac.
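To make the "glued together" structure concrete: a Universal binary starts with a small big-endian header (the fat_header and fat_arch structures from <mach-o/fat.h>) listing where each architecture's slice lives in the file. The Python sketch below reads that table. The function name list_fat_archs and the small CPU_NAMES lookup are assumptions for this example, and CPU types outside the four listed are simply reported by number.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # magic number of a 32-bit fat header, stored big-endian
CPU_NAMES = {7: "i386", 0x01000007: "x86_64", 18: "ppc", 0x01000012: "ppc64"}

def list_fat_archs(data):
    """Return [(cpu_name, file_offset, size)] for each slice in a fat binary."""
    magic, count = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal (fat) binary")
    archs = []
    for index in range(count):
        # Each fat_arch entry: cputype, cpusubtype, offset, size, align.
        cputype, _subtype, offset, size, _align = struct.unpack_from(
            ">5I", data, 8 + 20 * index)
        archs.append((CPU_NAMES.get(cputype, str(cputype)), offset, size))
    return archs
```

Given the bytes of a real Universal binary, the slice at each reported offset is an ordinary single-architecture Mach-O file, which is what the loader picks when it selects the proper version.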

Software Architecture[edit]

All builds of Mac OS X (OS X) are built on top of the XNU kernel and use the Mach-O file format. The XNU kernel is a hybrid kernel, divided into 4 sections.

Kernel Sections[edit]

  1. The Hardware Platform Expert
  2. The Mach 3.0 Subsystem (OSFMK)
  3. The BSD 4.4 Subsystem
  4. The IOKit Subsystem and Framework

While the traditional Mach kernel is a Microkernel, Apple has instead implemented its variation of Mach 3.0 with a Monolithic design. The Mach subsystem is only a partial implementation of the Mach 3.0 kernel that was designed by Carnegie Mellon University. This partial implementation consists of the Mach Messaging system, Mach Virtual Memory System and Mach Process Manager.

The BSD 4.4 Subsystem is a micro implementation of the FreeBSD 4.x kernel. Over time Apple has been shrinking and reducing the feature set of this kernel subsystem in the hope of eliminating all but the pieces essential for running BSD source code software on the XNU kernel. Originally the subsystem had support for BSD device drivers, which could communicate directly with hardware. Unfortunately the device driver architecture in the BSD Subsystem is only able to support direct main memory access and to interface with the user-mode part of a running process.

The IOKit Framework is written in a restricted subset of the C++ programming language based on Embedded C++. The IOKit Subsystem drives the components written in the IOKit Framework. IOKit's purpose is to unify and simplify the driver architecture while maintaining some level of compatibility between major and minor OS releases. IOKit has generally been a resounding success, as some other BSD operating systems (such as DragonFly BSD) have ported or implemented an IOKit-like system.

The Hardware Platform Expert deals with the hardware differences of the PowerPC (G3, G4 and G5), Intel 32-bit, Intel 64-bit, Intel Xeon 64-bit (Mac Pro and XServe) and ARM (iPhone) architectures.

Commonly Used Tools[edit]

The common tools used both to compile/create software and to disassemble/debug it have been titled by Apple the "Developer Tools". The developer tools can be found both on the Installation DVD for Mac OS X 10.4 and higher and on the Apple Developer Connection (ADC) site. Joining ADC is free and is highly recommended. The ADC site has up to date documentation, tools and even sample source code, and should be the 1st place to do your research. A summary of the developer tools can be found at Apple's official Xcode and Tools website. Besides the developer documents, the tools commonly used on the Mac OS X platform for reverse engineering are found in the list below.

Developer Tools Used[edit]

  1. gdb (GNU Debugger)
  2. nm (Object File Symbol Table Viewer)
  3. otool (Object File Display Tool)
  4. fs_usage (File System Monitoring Tool)
  5. lsof (File Descriptor Table Viewer)
  6. vmmap (Virtual Memory Regions Viewer)
  7. lipo (Universal Binary Handler)
  8. file (Binary File Format Analyzer)

All of the above tools are installed during the Developer Tools installation. As of the current writing (3 Aug 2008), the current Developer Tools version is 3.1 (Build 2199).

Third party tools:

  1. [1] class-dump is useful for parsing Objective-C runtime information.

Reversing Basics[edit]

Architecture[edit]

Since most target binaries that you wish to reverse engineer on the Mac OS X platform are in the Mach-O Universal Binary format you should decide which target binary platform you wish to reverse engineer. To get a list of what formats a specific binary has you would call the "file" program. Example:

A common example using the file "/bin/ls":

 $ file /bin/ls
 /bin/ls: Mach-O universal binary with 2 architectures
 /bin/ls (for architecture i386):       Mach-O executable i386
 /bin/ls (for architecture ppc7400):    Mach-O executable ppc

Another example, this time more of a rare one, using the file "/System/Library/Frameworks/ApplicationServices.framework/ApplicationServices"

 $ file /System/Library/Frameworks/ApplicationServices.framework/ApplicationServices
 /System/Library/Frameworks/ApplicationServices.framework/ApplicationServices: Mach-O universal binary with 4 architectures
 /System/Library/Frameworks/ApplicationServices.framework/ApplicationServices (for architecture ppc7400):       Mach-O dynamically linked shared library ppc
 /System/Library/Frameworks/ApplicationServices.framework/ApplicationServices (for architecture ppc64):         Mach-O 64-bit dynamically linked shared library ppc64
 /System/Library/Frameworks/ApplicationServices.framework/ApplicationServices (for architecture i386):          Mach-O dynamically linked shared library i386
 /System/Library/Frameworks/ApplicationServices.framework/ApplicationServices (for architecture x86_64):        Mach-O 64-bit dynamically linked shared library x86_64

Symbols[edit]

Once you have identified the architecture you wish to use as your base for reverse engineering you would then dump the symbol table. This can be handy for the future. Example:

Common symbol table dump from the i386 architecture:

 $ nm -arch i386 /bin/echo
          U ___error
 00001000 A __mh_execute_header
          U _exit
          U _malloc
          U _strerror$UNIX2003
          U _strlen
          U _write$UNIX2003
          U _writev$UNIX2003

The above symbols can be broken up into 2 major categories:

Symbol Types[edit]

  1. External
  2. Internal

There is a 3rd category of symbols, called "hidden" or "stripped" symbols. These symbols do not show up in nm output, so it is hard to find out what they do, or whether they exist at all.

Each symbol type has a scope, which can be either private or public. In the past you could set the dynamic linker to use a "flat namespace", which would convert the private symbols to public for your program only; however, it has been reported that this functionality has been disabled for most libraries.

A private symbol is addressable only by the program itself, or by a section of the program, and cannot be addressed by anyone else. A public symbol is what other platforms commonly call an "exported" symbol. Public symbols can be accessed by anything that links to that binary, either at compile time or at runtime.

Internal Symbols[edit]

Internal symbols are symbols that are defined within the program and thus are not imported (dynamically linked) at runtime. An internal symbol can, however, be an external symbol that was linked in at compile time from an object file or a static library. You can identify an internal symbol quickly because its line has a hexadecimal number before the symbol type letter; external symbols have a blank space where the number would be. The number specified in the symbol table denotes the offset within the binary at which that symbol's code or data starts. This value is relative and WILL be different at runtime. One way to locate a symbol in memory at runtime is to take the on-disk positions of 2 symbols, the 1st being a well known symbol and the 2nd an unknown one, and compute the difference. Once you have the difference, you can find the 2nd symbol in memory by simply applying the difference to the 1st symbol's address. Example:

Example[edit]

Find the 1st and 2nd symbols:

 $ nm /System/Library/Frameworks/QTKit.framework/QTKit
   /System/Library/Frameworks/QTKit.framework/QTKit(single module):
   {...}
   0005a638 T _copyBitmapRepToGWorld
   0008b017 t _createDisplayList
   {...}

In the above example our 1st symbol is "_copyBitmapRepToGWorld", which a program refers to as "copyBitmapRepToGWorld". Our 2nd symbol is "_createDisplayList", which a program cannot refer to by name since it's a private symbol (see private symbols). Once the function definition for the symbol "_createDisplayList" has been determined, the next step is to define that symbol for your program's use. To do this, let's assume that the C function prototype of "_createDisplayList" is:

   void * createDisplayList(void);

The above prototype would be defined in the source code for QTKit, which is our target. That unfortunately doesn't help us, since both the function prototype and the symbol name are unknown to our program. To resolve this problem we simply compute the difference between the above symbols (0x8b017 - 0x5a638 = 0x309DF) and declare a function pointer like this:

   void * (* createDisplayList)(void);

Then you would assign that function its address by having another function, (such as main), execute this command before you use that function for the 1st time:

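   /* cast to a byte pointer for the arithmetic, then back to the function pointer type */
   createDisplayList = (void *(*)(void))((char *)copyBitmapRepToGWorld + 0x309DF);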

Some programs can get away with doing the above in 1 statement outside of a function. I would NOT recommend this, as the Mac OS X dynamic linker dyld will sometimes change the value of the symbol address after the variable's initial value has been set but before you enter your main function.
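The offset arithmetic used in this example can be double-checked with a few lines of Python. This is a sketch only: parse_nm and locate are names invented here, and the runtime address passed to locate in the usage note is a made-up value standing in for wherever the known symbol actually ends up in memory.

```python
def parse_nm(text):
    """Map symbol name -> (address, type letter) for defined symbols in nm output."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # undefined symbols and header lines have no address column
        address, kind, name = parts
        try:
            symbols[name] = (int(address, 16), kind)
        except ValueError:
            continue  # first column was not a hexadecimal address
    return symbols

def locate(symbols, known, unknown, known_runtime_address):
    """Apply the on-disk distance between two symbols to a runtime address."""
    delta = symbols[unknown][0] - symbols[known][0]
    return known_runtime_address + delta
```

With the two QTKit lines shown above, the distance comes out as 0x309DF, so if "_copyBitmapRepToGWorld" were found at the hypothetical address 0x90000638, "_createDisplayList" would be expected at 0x90031017.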

External Symbols[edit]

External symbols are symbols that are defined elsewhere, for example in a library (see library below). To read an external symbol you simply strip the leading "_" off. If the symbol has a "$" in its name, then everything past the 1st "$" is a hint to the dynamic linker that this is an explicit external symbol and should be matched with that exact version of the symbol in the external library. An explicit symbol is very helpful for a program's creator, since it makes it difficult to override the symbol and helps avoid a runtime link mismatch error. The letter to the left of the symbol name (in the above example "U") denotes the type of symbol, such as function or data structure.
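The reading rules for nm lines described in this chapter (address present or blank, type letter, leading "_", optional "$" hint) can be summarized in a short Python sketch. The function name describe_nm_line and the dictionary layout are assumptions made for this illustration.

```python
def describe_nm_line(line):
    """Split one nm output line into its parts, following the rules above."""
    parts = line.split()
    if len(parts) == 3:
        address, kind, raw = int(parts[0], 16), parts[1], parts[2]
    elif len(parts) == 2:
        address, (kind, raw) = None, parts  # blank address column: external
    else:
        raise ValueError("not a symbol line: %r" % line)
    name = raw[1:] if raw.startswith("_") else raw  # strip one leading "_"
    base, has_hint, hint = name.partition("$")      # text after the 1st "$"
    return {
        "name": base,
        "hint": hint if has_hint else None,
        "type": kind,
        "address": address,
        "external": address is None,
    }
```

For instance, the line for _write$UNIX2003 from the /bin/echo listing yields the name "write", the hint "UNIX2003" and no address, marking it as external.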

PowerPC[edit]

Basic instructions include li (load immediate) and mr (move register).

The Stack[edit]

The PowerPC stack works like any other stack: it is a LIFO structure, and it grows downwards (towards lower memory addresses). The most important detail to remember when reversing PowerPC binaries is that the PowerPC chip has no built-in implementation of a stack. There is no register dedicated by the hardware to keeping track of where the stack is, and there are no instructions to push and pop data. Everything is done via a general purpose register (r1 by convention) and ordinary load, store and arithmetic instructions.
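The following Python sketch models that idea: the stack is nothing more than a register value plus ordinary stores and loads. The class and method names are invented for illustration, and the mnemonics in the comments only indicate roughly the kind of PowerPC code involved (real compilers usually allocate a whole frame at once with stwu rather than pushing single words).

```python
import struct

class TinyPowerPC:
    """Toy model: memory plus one general purpose register used as stack pointer."""

    def __init__(self, memory_size=64):
        self.memory = bytearray(memory_size)
        self.r1 = memory_size  # stack starts at the top and grows downwards

    def push(self, value):
        self.r1 -= 4                                         # addi r1, r1, -4
        struct.pack_into(">I", self.memory, self.r1, value)  # stw  rX, 0(r1)

    def pop(self):
        (value,) = struct.unpack_from(">I", self.memory, self.r1)  # lwz rX, 0(r1)
        self.r1 += 4                                               # addi r1, r1, 4
        return value
```

When reading PowerPC disassembly, the practical consequence is that stack activity shows up as arithmetic on r1 and as loads and stores relative to r1, not as dedicated push or pop instructions.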


(This section will contain PowerPC specific information like how PowerPC function calls are executed, how arguments are passed to functions, the stack format, et cetera.)--Macpunk 06:19, 8 July 2007 (UTC)

Intel[edit]

(This section will contain Intel specific information like how Intel function calls are executed, how arguments are passed to functions, the stack format, et cetera.)--Macpunk 06:19, 8 July 2007 (UTC)


This page or section of the Reverse Engineering book is a stub. If you have information on this topic, write about it here.


Reversing for security[edit]

This page or section of the Reverse Engineering book is a stub. If you have information on this topic, write about it here.


Reversing for 'cracking'[edit]

This page or section of the Reverse Engineering book is a stub. If you have information on this topic, write about it here.


Further Reading[edit]

  • Wikibooks: PowerPC Assembly
  • A Brief Tutorial on Reverse Engineering OS X [2]
  • Cocoa Reverse Engineering [3]
  • KellogS' Intro to OS X Reversing [4]
  • A Non Practical & Non Real World Intro to Kracking for Mac OS X [5]
  • What is Mac OS X?[6]

Special Notes[edit]

A large section of this document has been prepared and written by JosephC7. While this information has been granted to Wikibooks for free publication, the author asks only that, if you republish this information, you provide the author's user name and a link to his user page on wikibooks.org. No fee is required or requested for this information, and it is expected that if this information is republished, it too will be given away freely without compensation. This document is a work in progress and should be completed by the end of Aug 2008.



Other Compilers

This chapter will contain a listing of compilers for languages that do not warrant their own chapters. Some language implementations are sufficiently distinctive in certain respects that they do deserve their own discussion. In the event that a section on any given language becomes large enough, it should probably be separated out into its own chapter. If no languages get listed here, perhaps we will just delete the entire chapter.

Perl[edit]

Using the "perlcc" utility (supplied with the Perl interpreter), a user can optionally attempt to compile Perl code into a number of other forms. These other forms include C source code (although the C code is very hard to follow), native code, and Perl bytecode.

The perlcc C code output often consists of a series of symbol tables, and calls to internal Perl functions, so the reverser would simply not find these to be of much use. The Perl bytecode however is a little more interesting, but still not easy to read by any stretch of the imagination.

The entire Perl cross-compiler suite is listed as being "very experimental", so only advanced users should put any stock in the process.

To compile a Perl script into bytecode:

perlcc -B <filename>

The bytecode should be self-executing, or it can be run by passing it as an argument to Perl.

To compile a Perl script into C:

perlcc -O <filename>

OR:

perlcc -c -o <output file> <input file>

The -O option uses the "optimized C backend" which is considered to be the most experimental component of the entire suite.

It is unlikely that too much Perl will be considered throughout the rest of this book.

Further Reading[edit]



Stack Overflows

Frequently we hear about malicious code causing a very vague problem called a stack overflow. This page is going to talk about what a stack overflow is, and how to prevent it.

What It Is[edit]

A stack-based overflow attack is the act of putting too much information into a buffer in order to overwrite a return address and hijack the control flow. The overwritten return address will, in most cases, point to some function in the program's address space. This function may already be defined in the application, or it can be supplied by the attacker by injecting code onto the stack.

If we remember the chapter on the stack, we know a few fundamental facts about the stack when we enter into a new function:

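  1. The stack "grows" downward, towards lower memory addresses.
  2. Local data is pushed on top of the stack, at the lowest addresses of the frame.
  3. The old value for bp is stored just past the local data, at the next higher address.
  4. The return address is stored just past the old bp value, at a higher address still.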

Consider the following buggy C code snippet:

void MyFunction(void)
{
   int a[100];
   int i;
   for(i = 0; i <= 100; i++)   /* BUG: the last valid index is 99 */
   {
      a[i] = 0;
   }
   ...

What happens when i reaches 100? As discussed earlier, local arrays are created on the stack. If we write past the upper bound of "a", we overwrite whatever the stack holds next. Assuming the array sits directly next to the saved values, a[100] overwrites bp, and a[101] (which a loop running one iteration further would reach) overwrites the return address.

The program flow will then be redirected to the new address we placed. This is a stack overflow vulnerability, and it stems from bad programming where the programmer doesn't check the array bounds before writing data to the array.
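The effect of the off-by-one loop can be modelled in a few lines of Python. This is a simplified picture, not a faithful emulation: it assumes 4-byte words, assumes the array sits directly next to the saved bp with no padding or other locals in between, and uses made-up values for the saved bp and the return address.

```python
import struct

WORD = 4  # assumed size of an int and of a saved register, in bytes

def run_loop(iterations, array_len=100):
    """Zero 'iterations' array slots in a modelled frame; return (saved_bp, ret)."""
    # Frame layout, low to high addresses: a[0..array_len-1], saved bp, return address.
    frame = bytearray((array_len + 2) * WORD)
    struct.pack_into("<I", frame, array_len * WORD, 0xBFFFF000)        # saved bp (made up)
    struct.pack_into("<I", frame, (array_len + 1) * WORD, 0x08048ABC)  # return address (made up)
    for i in range(iterations):
        struct.pack_into("<I", frame, i * WORD, 0)  # a[i] = 0, with no bounds check
    return struct.unpack_from("<II", frame, array_len * WORD)
```

A correct loop (100 iterations) leaves both saved values intact, the buggy loop (101 iterations) zeroes the saved bp, and one more iteration would zero the return address as well.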

Spotting a Vulnerability[edit]

How do reversers spot a stack overflow vulnerability? Let's take a look at some example ASM code:

push ebp
mov ebp, esp
sub esp, 100

This is a standard entry sequence, and we can see that this function is allocating 100 bytes of data on the stack. Either 25 integers worth of data, or an array of some sort. We examine the rest of the function, and see what kind of data it is:

call _gets
push eax
push esp
call _strcpy
...

Clearly we are accessing the data on the stack as an array, specifically an array of chars. The above assembly code fragment gets a text string from the console, and copies that data into the local variable on the stack.

Unfortunately the standard C library string functions we are using have a well-known vulnerability: they do not check the bounds of the input arguments. In fact, the <string.h> functions rarely even ask the programmer to supply the size of an array, or the maximum available memory size!

Some of the most common stack vulnerabilities stem from this fact. Offenders to look out for are gets, strcpy, strcat and sprintf, functions whose output can be larger than the buffer supplied to hold it.

The local variable is only 100 chars (1 char = 1 byte) wide. What happens if we input a string 100 characters long? Remember, ASCIIZ strings are terminated by a null char (00h), which requires an extra slot in the array. That means that the 101st char will be a null byte, which lands on the first byte of the saved value for ebp and corrupts it. Now imagine what would happen if we input 104 characters, or even 108 (enough to overwrite the return address). An attacker that inputs just the right values can redirect program execution to a malicious function that may help take over the computer.
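The byte counts in this paragraph can be checked against a small Python model of the frame. As before, this is a sketch under assumptions: the layout (100-byte buffer, then saved ebp, then return address) follows the description above, the saved values are made up, and the bounded option merely mimics what a length-checked copy would do.

```python
SAVED_EBP = bytes.fromhex("f8efffbf")    # made-up saved frame pointer
RETURN_ADDR = bytes.fromhex("bc8a0408")  # made-up return address

def copy_into_frame(text, buf_len=100, bounded=False):
    """Copy text plus a terminating null into a modelled frame; return (ebp, ret)."""
    # Layout, low to high: buffer, saved ebp, return address, 4 bytes of caller data.
    frame = bytearray(buf_len) + SAVED_EBP + RETURN_ADDR + bytearray(4)
    if bounded:
        text = text[:buf_len - 1]  # leave room for the terminating null
    data = text + b"\x00"
    if len(data) > len(frame):
        raise ValueError("write runs past the modelled frame")
    frame[:len(data)] = data       # unchecked copy, like strcpy
    return bytes(frame[buf_len:buf_len + 4]), bytes(frame[buf_len + 4:buf_len + 8])
```

With 99 characters nothing outside the buffer changes; with 100 the null terminator clobbers the first byte of the saved ebp; with 108 both the saved ebp and the return address consist entirely of attacker-supplied bytes, while the bounded copy leaves both untouched.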

Further Reading

"Smashing The Stack For Fun And Profit", Aleph One, Phrack, 7(49), November 1996.



Terminology

"Hackers"

Hacking is a term used in popular culture to describe malicious activities of computer users. The movie Hackers was a large influence on bringing the term into common use by romanticising the Hacker as an idealistic youth seeking freedom from tyranny.

There are some fantastic books that help to explain what a real hacker is like:

  • Hackers, by Steven Levy
  • The Devouring Fungus, by Karla Jennings
  • Free as in Freedom, by Sam Williams
  • Just for Fun, by Linus Torvalds
  • The Cathedral and the Bazaar, by Eric Raymond
  • The Code Book, by Simon Singh
  • In The Beginning... Was The Command Line, by Neal Stephenson
  • The Cluetrain Manifesto, by Rick Levine, Christopher Locke, Doc Searls, David Weinberger

This wikibook hopes to shed some light on what hackers really do, and who they actually are.

Hackers are people who enjoy playing around with computers to make things happen. This often involves circumventing some security aspects of operating systems or applications in order to gain privileged access.

The first chapter is one of the most important chapters to read. Here the term Hacking is defined, revealing some insight into what hacking really is.

The second covers the history of computing and hackers. This might help correct the false impressions propagated by news media.

The hacker culture follows next, and finally the actual techniques are assessed. (Note: these methods are illegal if misused; the way to prevent or 'cure' each attack is given as well, to remain as objective as possible.)

Contributions that help the world deal with security issues, and with malicious hackers (mostly crackers and script kiddies), are welcome.

The Jargon File, also published as The New Hacker's Dictionary, defines the term hacker quite nicely. It also does an exceptional job of pointing out that one does not need to be affiliated with computers at all to be considered a hacker.

The excerpt from The Jargon File:

 hacker: n.
 [originally, someone who makes furniture with an axe] 
 1. A person who enjoys exploring the details of programmable systems 
 and how to stretch their capabilities, as opposed to most users, who 
 prefer to learn only the minimum necessary. RFC1392, the Internet 
 Users' Glossary, usefully amplifies this as: A person who delights in 
 having an intimate understanding of the internal workings of a system, 
 computers and computer networks in particular.
 2. One who programs enthusiastically (even obsessively) or who enjoys 
 programming rather than just theorizing about programming. 
 3. A person capable of appreciating hack value. 
 4. A person who is good at programming quickly. 
 5. An expert at a particular program, or one who frequently does work 
 using it or on it; as in "a Unix hacker". (Definitions 1 through 5 are 
 correlated, and people who fit them congregate.) 
 6. An expert or enthusiast of any kind. One might be an astronomy hacker, 
 for example. 
 7. One who enjoys the intellectual challenge of creatively overcoming or 
 circumventing limitations. 
 8. [deprecated] A malicious meddler who tries to discover sensitive 
 information by poking around. Hence password hacker, network hacker. 
 The correct term for this sense is cracker.


It may help to note that the people who program the Linux kernel are called "Linux hackers".

A brief History of Hackers

For as long as there have been computers, there have been people to 'hack' them. But this activity really hit the headlines when the Internet arrived. Yet history teaches us that hacking was not an evil thing at all: hackers actually 'maintain' the Internet as it should be. It is hard to imagine the computer and Internet society growing so large without people working at the cutting edge of technology, hacking the Internet's way up. Just imagine how it would be if there weren't any hackers: most of the technology you use today would not exist. Ask yourself: would your computer be this powerful if it had not been pushed to the edge? Would software be as reliable as it looks? Would there be a spiral of cutting-edge innovations?

The answer to all of these questions is no. It might seem strange to think positively of hackers, but much progress is made at the edges of society rather than in the middle.

The Hacker-Culture

Advanced computer users describe themselves as hackers; those who use their skills for malevolent purposes are termed "crackers". The term cracker implies breaking things, in the sense of cracking the integrity of a computer system. Crackers work through cracks in security, like climbing through a crack in a wall, but their main act is breaking into computers, for example by figuring out the root password, which is like cracking a safe at a bank. This means that crackers break the law, yet that fact alone does not give in-depth insight into the hacker culture.

In this chapter the main hacker personalities are described, in a rather unusual way: media portrayals are used to get to know the real group. This should help you understand that some of the people portrayed are certainly not worthy of the name hacker.

Terminology

Hack is an onomatopoeic verb describing the noise and actions of chopping at something with a blade (e.g. "He hacked away at the underbrush with a machete"), or a particularly nasty cough (e.g. "The chainsmoker hacked up some brown phlegm"), but which also came to describe the act of typing on a typewriter, for the same reason (the annoying, incessant HACK HACK HACK, Ding! CRASH! HACK HACK HACK, &c).

From there it became associated not only with the action itself, but also those doing the typing. For example, a "hack"--a bad writer/journalist--would "hack out" a poorly researched or unoriginal story on his typewriter. While the less noisy tapping of computer keyboards began to replace the harsh noise of the typewriter, the old terminology was carried over to the new technology. Thus, the original "hackers," (long before the PC or word processors) were merely called that because they would spend their days "hacking away" at their console keyboards, writing code.

(Note: when computing was in its infancy and console time at the giant mainframes was scarce, programmers would often hand-write code or type it out on typewriters before they manually entered it into the machine. Also, the earliest consoles were basically automated typewriters.)

Nowadays, within the software development community, a hacker is any skilled programmer, especially among open-source developers. A hack, in turn, is a quick-and-dirty patch, fix, or utility which may not be well documented or necessarily reliable, but which gets the job done, whatever that job may be.

Crackers are skilled programmers who exploit the limitations of computer networks and write cracks (malicious hacks) to automate the dirty work. These cracks may attempt to break into remote machines (e.g. "He hacked into the school's server to increase his phys-ed grade."), crack passwords (the most useful utility for this is simply called Crack), decrypt data, or simply modify proprietary software so the cracker doesn't have to pay for it (e.g. "He downloaded a cracked version of Dreamweaver, because he couldn't afford to buy it.").

Viruses, trojans, and worms are also hacks/cracks of a sort. Crackers are especially fond of worms, which spread without user interaction and can be used to create giant, distributed supercomputers that can then be used to attack other computers (the Code Red worm used the combined power of infected computers to flood the White House web server, making it inaccessible to regular users for a time).

When a hacker finds a security hole in the software they're using, they hack out some code to patch it. When a cracker finds a hole, they hack out code to exploit it, ideally bringing remote computers under their control.

Script-kiddies use hacks and cracks created by real programmers, but they use their software without really understanding the code that's doing the work. They are generally trying to just impress their friends.

As indicated above, not all hacks are "malware" (malicious code). User-created shell scripts and batch files that automate tasks (like workstation startup, permission settings, data backup, &c.) are hacks too. Hacks are tools. They are hacked out to make someone's life easier, but not necessarily yours.

TV news and, of course, the movie "Hackers" brought the label hacker to the attention of the wider public, but failed to acknowledge its broader meaning, instead using it as a buzzword, so as not to confuse the less informed members of their audience ("Computer programs are written by people? Like books? I thought other computers made them!").

Others have latched on to the grit, glamour, and rebellion the buzzword hacker evokes; thus, they think of hacking as something of a religion. But, in short, it's nothing more than playing around with computer code. You don't even need to be connected to the Internet to do some hacking; you just need to learn a programming language.