Reliable file hashing?

littlebigman (Well-known member, joined Jan 5, 2010, 75 messages, programming experience: Beginner)
Hello,

I'm having the following issue while looping through a directory: for each file, I need to hash its content, check whether the file is already in a DB, and add a record if it isn't.
The goal of this application is to check a whole drive for UltraEdit temp files, check for duplicates, and save any unique file into a backup directory.

In the following code, a record is added every time, even though the file is already in the SQLite database (I checked by opening the database with a stand-alone application after running the program once):

pastebin - Anonymous - post number 1823757

The problem occurs around line 63.

I'm using TEXT to hold the hash column: Could it be that, for some reason, this data isn't reliably saved or read, which would explain why a new record is INSERTed every time, even though this item is already in the database?
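The pastebin code is not reproduced here, so what happens around its line 63 is unknown. A TEXT column is generally fine for a hash as long as the hash string is printable and is passed to SQLite unchanged. One plausible cause of this symptom is building the SELECT by string concatenation: a raw MD5 digest decoded with ASCIIEncoding.GetString can contain quote characters, NUL characters, or '?' substitutions, which can break or silently change the query. The sketch below shows a parameterized check-then-insert. It assumes the System.Data.SQLite provider and a hypothetical table `files(hash TEXT PRIMARY KEY, path TEXT)`; the `AddIfMissing` helper is hypothetical, not taken from the original code.

```vbnet
Imports System
Imports System.Data.SQLite

Module HashStore
    ' Hypothetical schema assumed for this sketch:
    '   CREATE TABLE files (hash TEXT PRIMARY KEY, path TEXT)
    ' Returns True if a new row was inserted, False if the hash was already stored.
    Public Function AddIfMissing(conn As SQLiteConnection, hash As String, filePath As String) As Boolean
        ' Parameters send the hash as a value, so quotes or other unusual
        ' characters in the string cannot change the SQL text.
        Using check As New SQLiteCommand("SELECT COUNT(*) FROM files WHERE hash = @hash", conn)
            check.Parameters.AddWithValue("@hash", hash)
            ' COUNT(*) comes back boxed as an Object, so convert before comparing.
            If Convert.ToInt64(check.ExecuteScalar()) > 0 Then
                Return False
            End If
        End Using

        Using insert As New SQLiteCommand("INSERT INTO files (hash, path) VALUES (@hash, @path)", conn)
            insert.Parameters.AddWithValue("@hash", hash)
            insert.Parameters.AddWithValue("@path", filePath)
            insert.ExecuteNonQuery()
        End Using
        Return True
    End Function
End Module
```

Because `hash` is declared PRIMARY KEY in this assumed schema, a single `INSERT OR IGNORE INTO files ...` statement would be an alternative to the separate SELECT; the two-step version above simply mirrors the check-then-add approach described in the question.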

Thank you for any hint.
 
I tried different solutions found through Google, but I still get 8-bit characters, which could be the issue when SELECTing a row:

VB.NET:
Imports System.IO
Imports System.Security.Cryptography

Private Function ReadFile(ByVal Path As String) As String
    Dim ReadFileStream As FileStream
    Dim FileEncoding As New System.Text.ASCIIEncoding()
    Dim FileReader As StreamReader
    Dim HashData As New MD5CryptoServiceProvider()

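    ' Open read-only: FileMode.Open alone requests read/write access, which fails on read-only files
    ReadFileStream = New FileStream(Path, FileMode.Open, FileAccess.Read)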
    FileReader = New StreamReader(ReadFileStream)

    Dim FileBytes = FileEncoding.GetBytes(FileReader.ReadToEnd)

    'How To Get Non-Binary Ascii, Ie. Only Alphanumeric Chars?
    Dim FetchedContent = FileEncoding.GetString(HashData.ComputeHash(FileBytes))
    'BAD Dim FetchedContent = FileEncoding.GetString(HashData.ComputeHash(System.Text.Encoding.ASCII.GetBytes(FileBytes)))
    'BAD Dim FetchedContent = FileEncoding.ASCII.GetString(HashData.ComputeHash(FileBytes))
    'BAD FetchedContent = System.Text.Encoding.ASCII.GetBytes(FetchedContent)
    'BAD FetchedContent = Convert.ToBase64String(FetchedContent)

    FileReader.Close()
    ReadFileStream.Close()

    Return FetchedContent
End Function

If someone knows how to get standard, displayable characters, I'd appreciate the help.
 
Found it: the trick is that HashData.ComputeHash() returns an array of bytes, which must be converted to a printable string with Convert.ToBase64String() rather than decoded as ASCII text:

VB.NET:
Dim FetchedContent = System.Convert.ToBase64String(HashData.ComputeHash(FileBytes))
Return FetchedContent
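A further point, offered as a suggestion rather than as part of the original fix: ReadFile first decodes the file as text with a StreamReader and then re-encodes it with ASCIIEncoding, which replaces every non-ASCII character with '?'. Two files that differ only in accented or binary characters can therefore get the same hash. A minimal sketch that hashes the file's raw bytes straight from the stream and returns Base64 (the `HashFile` name is hypothetical):

```vbnet
Imports System
Imports System.IO
Imports System.Security.Cryptography

Module FileHashing
    ' Hashes the file's raw bytes (no text decoding or re-encoding) and
    ' returns the MD5 digest as a printable Base64 string for a TEXT column.
    Public Function HashFile(filePath As String) As String
        ' File.OpenRead opens the file with read-only access.
        Using stream As FileStream = File.OpenRead(filePath)
            Using hasher As MD5 = MD5.Create()
                ' ComputeHash(Stream) reads the whole stream, so no ReadToEnd is needed.
                Dim digest As Byte() = hasher.ComputeHash(stream)
                Return Convert.ToBase64String(digest)
            End Using
        End Using
    End Function
End Module
```

If a hexadecimal string is preferred over Base64, `BitConverter.ToString(digest).Replace("-", "")` produces the usual 32-character form instead. Either representation works for lookups, provided the same one is used for every INSERT and SELECT.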

HTH,
 