Visual Basic for Applications/File Hashing in VBA

From Wikibooks, open books for an open world
Jump to navigation Jump to search

Summary[edit]

  • This section contains code for making file hashes, that is, hashes of entire files. Several algorithms are provided, with output options for base64 or hex. The code is made for single files, though the file-listing code elsewhere in this series could be employed in making hash listings recursively for folders. String hash routines are given in another section. The code here does not include any seed or salt values to further modify the results, though the reader could concatenate a private string of his own with the input prior to hashing, to make the output truly unique. These listed algorithms can hash any single file up to about 200MB (Mega Bytes) in length, beyond which an out of memory error will be generated in GetFileBytes(). Specific tests found that hashes work well for a 200MB zip file but fail for a 500MB zip file; the exact break point is unclear. For files larger than 200MB, other facilities exist.
  • Large file hashing, say beyond 200MB is best done with other tools. Two examples are mentioned here:
    • Microsoft's FCIV utility, is free download. It is a command-line application, capable of hashing both single files and whole folder trees. It handles large files with ease, but only for MD5 and SHA1 hashes. It sends both base64 and HEX outputs to the screen but only b64 output format to a file. Prepared files can be verified against any new run, but results only to the screen. It is a bit tricky to use, even with their instructions, so the pages Running the FCIV Utility from VBA and File Checksum Integrity Verifier (FCIV) Examples might be found of use to the novice. So far, Microsoft have not extended the coding to include contemporary algorithms.
    • PowerShell in Windows 8.1 and above, can make large single-file hashes, using all of the MD5. SHA1, SHA256, SHA384, and SHA512 algorithms. It produces output on the screen only, though the output can also be piped to the clipboard for pasting as required. There are no simple options for hashing a folder or for output to an xml file. For completion, an example of its use is given in File Checksum Integrity Verifier (FCIV) Examples.
    • An external application that can handle large files is MD5 and SHA Checksum Utility. It is a stand-alone application, and a basic version is available as a free download. It produces MD5, SHA1, SHA2/256, and SHA2/512 hashes for single files. The outputs are in HEX and are displayed together on a neat user interface. A more complex commercial version is also available.
    • FSUM Fast File Integrity Checker is another free, external application for command line use. It resembles FCIV in many ways but includes up to date algorithms. (MD2, MD4, MD5, SHA-1, SHA-2( 256, 384, 512), RIPEMD-160, PANAMA, TIGER, ADLER32, and CRC32). In addition to large file HEX hashes it can carry out flat or recursive folder hashes. The code to enter is not identical to that of FCIV but a text file is provided with examples in its use. The web page FSUM Fast File Integrity Checker has the download and other details, though it fails to mention that results can be easily piped to the clipboard with |clip. For those who prefer a basic graphical interface for FSUM, download it at FSUM GUI.
  • The permissions for files need to be considered when attempting hashing. Hashing has to access files to obtain the bytes that they contain. Although this does not involve actually running the files, some folder and file types might be found locked at run time. In fact, this type of access is the main difference between string hashing and file hashing. Whenever files are accessed, error handling tends to be needed. It is assumed here that the user will add his own error-handling, or that he will go-around files that are troublesome before the hashing attempt. Users should know that the code cannot handle an empty text file; for example, a Notepad file that has been saved without any text in it. The GetFileBytes routine will error. Although not a problem for most, in bulk automated use, users would be advised to exclude such empty text files by first testing them for zero size; a function has been included for this in case it is of use. The same procedure might also be of use in testing files for excessive size.
  • User files and folders have few restrictions. The empty file problem apart, those who want to access user files in folders that they have made themselves will not usually have any problems, and interested parties should know that there is a recursive file listing module in another section of this series that might be of related interest. To expand on this matter; although Windows folders for Music, Pictures, etc are protected, and will error on an attempted access, the files themselves that a user has placed in them are not. In fact most MS media files are not restricted. To include these files in a listing, first make another folder in the user profile and transfer all of the restricted folder's contents to it; then there will be no problem.
  • Hashing is concerned only with the content of a file, and not its name, or other file details. This means that duplicates of files under any name can be found by comparing their hashes. In secure systems with deliberately confusing file names, this means that a very long file list could be hashed until a searched-for hash value is found, rather than depending on a less secure file name to find it. Alternatively, file names are sometimes just the file's hash value, so that hashing can reveal any error or illegal change. That is to say, a hacker might change the file and the name on it, but he does not know the required hash algorithm or private string to use, so changes will always be detected when the owner runs his own hash verification.

Code Listings[edit]

Using Built-in Windows Functions in VBA[edit]

The code to make hashes of STRINGS and for bulk file hashing is given elsewhere in this set. The panel below bears code that is virtually identical to that for strings, but with only slight modification, is used to make hashes of single whole FILES. The user provides a full path to the file as the starting parameter. A parameter option allows for a choice of hex or base-64 outputs. Functions are included for MD5, SHA1, SHA2-256, SHA2-384, and SHA2-512 hashes.

In each case, coders can find the unmodified hash values in the bytes() array and at that point they are in 8-bit bytes, that is, the numbers that represent the ASCI code as it applies to a full eight-bit, 256 character set. The code that follows the filling of the bytes() array in each case decides which version of the ASCI character set to deliver. For a hex set of characters, 0-9, and A to F, the total bit set is broken into double the number of four-bit bytes, then returned for use. For the base-64 set, lower case letters,upper case letters, and integers mainly, six bit characters are made for output. These two sets are the most useful here, since they consist of commonly used characters. The 128 and 256 ASCI sets are too full of both exotic and non-printing characters to be useful. For each hash version its bit count is a constant, so the length of its output will vary according to the chosen type.

Option Explicit

Private Sub TestFileHashes()
    'run this to test the file hasher. Select or comment lines as necessary
    'enter your own paths for the files to test
    'Set a reference to mscorlib 4.0 64-bit
    
    Dim sFPath As String, b64 As Boolean, sH As String
    
    'set output type and path to target file
    'b64 = False   'output hex
    b64 = True     'output base-64
    sFPath = "C:\Users\Your Folder\Documents\test.txt"
    
    'enable any one line to test hash
    'sh=FileToMD5(sFPath, b64)
    'sh=FileToSHA1(sFPath, b64)
    'sh=FileToSHA256(sFPath, b64)
    'sH = FileToSHA384(sFPath, b64)
    sH = FileToSHA512(sFPath, b64)
    
    Debug.Print sH & vbNewLine & Len(sH) & " characters in length"
    MsgBox sH & vbNewLine & Len(sH) & " characters in length"

End Sub

Public Function FileToMD5(sFullPath As String, Optional bB64 As Boolean = False) As String
    'parameter full path with name of file returned in the function as an MD5 hash
    'Set a reference to mscorlib 4.0 64-bit
    
    Dim enc, bytes, outstr As String, pos As Integer
    
    Set enc = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")
    'Convert the string to a byte array and hash it
    bytes = GetFileBytes(sFullPath)
    bytes = enc.ComputeHash_2((bytes))
    
    If bB64 = True Then
       FileToMD5 = ConvToBase64String(bytes)
    Else
       FileToMD5 = ConvToHexString(bytes)
    End If
        
    Set enc = Nothing

End Function

Public Function FileToSHA1(sFullPath As String, Optional bB64 As Boolean = False) As String
    'parameter full path with name of file returned in the function as an SHA1 hash
    'Set a reference to mscorlib 4.0 64-bit
    
    Dim enc, bytes, outstr As String, pos As Integer
    
    Set enc = CreateObject("System.Security.Cryptography.SHA1CryptoServiceProvider")
    'Convert the string to a byte array and hash it
    bytes = GetFileBytes(sFullPath) 'returned as a byte array
    bytes = enc.ComputeHash_2((bytes))
    
    If bB64 = True Then
       FileToSHA1 = ConvToBase64String(bytes)
    Else
       FileToSHA1 = ConvToHexString(bytes)
    End If
        
    Set enc = Nothing

End Function

Public Function FileToSHA256(sFullPath As String, Optional bB64 As Boolean = False) As String
    'parameter full path with name of file returned in the function as an SHA2-256 hash
    'Set a reference to mscorlib 4.0 64-bit
    
    Dim enc, bytes, outstr As String, pos As Integer
    
    Set enc = CreateObject("System.Security.Cryptography.SHA256Managed")
    'Convert the string to a byte array and hash it
    bytes = GetFileBytes(sFullPath) 'returned as a byte array
    bytes = enc.ComputeHash_2((bytes))
    
    If bB64 = True Then
       FileToSHA256 = ConvToBase64String(bytes)
    Else
       FileToSHA256 = ConvToHexString(bytes)
    End If
        
    Set enc = Nothing

End Function

Public Function FileToSHA384(sFullPath As String, Optional bB64 As Boolean = False) As String
    'parameter full path with name of file returned in the function as an SHA2-384 hash
    'Set a reference to mscorlib 4.0 64-bit
    
    Dim enc, bytes, outstr As String, pos As Integer
    
    Set enc = CreateObject("System.Security.Cryptography.SHA384Managed")
    'Convert the string to a byte array and hash it
    bytes = GetFileBytes(sFullPath) 'returned as a byte array
    bytes = enc.ComputeHash_2((bytes))
    
    If bB64 = True Then
       FileToSHA384 = ConvToBase64String(bytes)
    Else
       FileToSHA384 = ConvToHexString(bytes)
    End If
    
    Set enc = Nothing

End Function

Public Function FileToSHA512(sFullPath As String, Optional bB64 As Boolean = False) As String
    'parameter full path with name of file returned in the function as an SHA2-512 hash
    'Set a reference to mscorlib 4.0 64-bit
    
    Dim enc, bytes, outstr As String, pos As Integer
    
    Set enc = CreateObject("System.Security.Cryptography.SHA512Managed")
    'Convert the string to a byte array and hash it
    bytes = GetFileBytes(sFullPath) 'returned as a byte array
    bytes = enc.ComputeHash_2((bytes))
    
    If bB64 = True Then
       FileToSHA512 = ConvToBase64String(bytes)
    Else
       FileToSHA512 = ConvToHexString(bytes)
    End If
    
    Set enc = Nothing

End Function

Private Function GetFileBytes(ByVal sPath As String) As Byte()
    'makes byte array from file
    'Set a reference to mscorlib 4.0 64-bit
    
    Dim lngFileNum As Long, bytRtnVal() As Byte, bTest
    
    lngFileNum = FreeFile
    
    If LenB(Dir(sPath)) Then ''// Does file exist?
        
        Open sPath For Binary Access Read As lngFileNum
        
        'a zero length file content will give error 9 here
        
        ReDim bytRtnVal(0 To LOF(lngFileNum) - 1&) As Byte
        Get lngFileNum, , bytRtnVal
        Close lngFileNum
    Else
        Err.Raise 53 'File not found
    End If
    
    GetFileBytes = bytRtnVal
    
    Erase bytRtnVal

End Function

Function ConvToBase64String(vIn As Variant) As Variant
    'used to produce a base-64 output
    'Set a reference to mscorlib 4.0 64-bit
    
    Dim oD As Object
      
    Set oD = CreateObject("MSXML2.DOMDocument")
      With oD
        .LoadXML "<root />"
        .DocumentElement.DataType = "bin.base64"
        .DocumentElement.nodeTypedValue = vIn
      End With
    ConvToBase64String = Replace(oD.DocumentElement.Text, vbLf, "")
    
    Set oD = Nothing

End Function

Function ConvToHexString(vIn As Variant) As Variant
    'used to produce a hex output
    'Set a reference to mscorlib 4.0 64-bit
    
    Dim oD As Object
      
    Set oD = CreateObject("MSXML2.DOMDocument")
      
      With oD
        .LoadXML "<root />"
        .DocumentElement.DataType = "bin.Hex"
        .DocumentElement.nodeTypedValue = vIn
      End With
    ConvToHexString = Replace(oD.DocumentElement.Text, vbLf, "")
    
    Set oD = Nothing

End Function

Function GetFileSize(sFilePath As String, nSize As Long) As Boolean
    'use this to test for a zero file size
    'takes full path as string in sFileSize
    'returns file size in bytes in nSize
    
    Dim fs As FileSystemObject, f As File
    
    Set fs = CreateObject("Scripting.FileSystemObject")
    
    If fs.FileExists(sFilePath) Then
        Set f = fs.GetFile(sFilePath)
        nSize = f.Size
        GetFileSize = True
        Exit Function
    End If

End Function

See Also[edit]

External Links[edit]

  • MD5 and SHA Checksum Utility: a external site's free application to simultaneously display MD5,SHA1,SHA256,and SHA512 hashes of single files (Hex only). Includes a browse-for-file function and drag and drop to make life easy. This utility can also be used to hash large files; author-tested for a 500MB file.
  • FSUM Fast File Integrity Checker : The basic command-line version of the FSUM hasher download.
  • FSUM GUI : A graphical interface download site for the FSUM utility. This allows file browsing, drag and drop, and other facilities to simplify the otherwise command-line working.