Problem
I’m reading text from a PDF file with iTextSharp. However, there are situations when I am unable to extract text from a PDF file since it just contains graphics. Every day, I download the same PDF files, and I’d like to know if they’ve been altered. Is an MD5 checksum the most reliable approach to verify if the file has changed if the content and modification date are unavailable?
If it is, some code samples would be appreciated, because I don’t have much experience with cryptography.
Asked by broke
Solution #1
It’s really easy to use System.Security.Cryptography.MD5:
using (var md5 = MD5.Create())
{
    using (var stream = File.OpenRead(filename))
    {
        return md5.ComputeHash(stream);
    }
}
(I believe the MD5 implementation used doesn’t actually need to be disposed, but I’d probably still do so anyway.)
It’s up to you how you compare the results after that; you can convert the byte array to base64, for example, or compare the bytes directly. (Be aware, however, that arrays do not override Equals. If you’re solely interested in comparing hashes, base64 is easier to get right, but it’s less efficient.)
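Because arrays do not override Equals, two hash arrays with identical contents still compare as unequal with Equals or ==. Here is a minimal sketch of both comparison styles mentioned above; the HashComparison class and its method names are hypothetical, chosen only for illustration:

```csharp
using System;
using System.Linq;

public static class HashComparison
{
    // Compares two hashes element by element.
    // SequenceEqual returns false if the lengths or any bytes differ.
    public static bool SameBytes(byte[] first, byte[] second)
    {
        return first.SequenceEqual(second);
    }

    // Compares two hashes through their base64 text form.
    // Simpler to reason about, at the cost of allocating two strings.
    public static bool SameBase64(byte[] first, byte[] second)
    {
        return Convert.ToBase64String(first) == Convert.ToBase64String(second);
    }
}
```

Either method can be called with the arrays returned by md5.ComputeHash, for example HashComparison.SameBytes(oldHash, newHash).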
If you need to represent the hash as a string, you can use BitConverter to convert it to hex:
static string CalculateMD5(string filename)
{
    using (var md5 = MD5.Create())
    {
        using (var stream = File.OpenRead(filename))
        {
            var hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
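To tie this back to the question about daily downloads, the hex string can be stored next to the file and compared on the next run. This is only a sketch under assumptions: the ChangeDetector class, the HasChanged method, and the idea of keeping the previous hash in a separate text file (hashPath) are illustrative and not part of the original answer.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ChangeDetector
{
    // Same approach as CalculateMD5 above: hash the file, return lowercase hex.
    public static string CalculateMD5(string filename)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(filename))
        {
            var hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Returns true when the file's current hash differs from the one stored
    // in hashPath (hypothetical location), or when no hash was stored yet.
    // The stored hash is then overwritten with the current one.
    public static bool HasChanged(string filename, string hashPath)
    {
        string current = CalculateMD5(filename);
        string previous = File.Exists(hashPath)
            ? File.ReadAllText(hashPath).Trim()
            : string.Empty;
        File.WriteAllText(hashPath, current);
        return !string.Equals(previous, current, StringComparison.OrdinalIgnoreCase);
    }
}
```

The first call for a given file reports a change because nothing has been stored yet; later calls report a change only when the file's bytes differ.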
Answered by Jon Skeet
Solution #2
Here’s how I go about it:
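using System;
using System.IO;
using System.Security.Cryptography;
public string checkMD5(string filename)
{
    using (var md5 = MD5.Create())
    {
        using (var stream = File.OpenRead(filename))
        {
            // Hash bytes are not valid text, so decoding them with
            // Encoding.Default can lose data; encode them as base64 instead.
            return Convert.ToBase64String(md5.ComputeHash(stream));
        }
    }
}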
Answered by BoliBerrys
Solution #3
I’m aware that this question has already been addressed, but here’s what I use:
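using (FileStream fStream = File.OpenRead(filename)) {
    return GetHash<MD5>(fStream);
}
where GetHash is defined as:
// Requires: using System.IO; using System.Reflection;
//           using System.Security.Cryptography; using System.Text;
public static String GetHash<T>(Stream stream) where T : HashAlgorithm {
    StringBuilder sb = new StringBuilder();
    // Look up the static parameterless Create() method on the algorithm type.
    MethodInfo create = typeof(T).GetMethod("Create", new Type[] {});
    using (T crypt = (T) create.Invoke(null, null)) {
        byte[] hashBytes = crypt.ComputeHash(stream);
        // Format each byte as two lowercase hex digits.
        foreach (byte bt in hashBytes) {
            sb.Append(bt.ToString("x2"));
        }
    }
    return sb.ToString();
}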
It’s probably not the ideal method, but it can be useful.
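If the reflection call is a concern, one alternative is to pass in a factory delegate instead of a type parameter. This is a different technique from the one above, sketched under the assumption that callers can supply something like () => MD5.Create(); the HashHelper class name is hypothetical.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class HashHelper
{
    // Creates the algorithm through the supplied factory, hashes the stream,
    // and returns the result as lowercase hex.
    public static string GetHash(Func<HashAlgorithm> factory, Stream stream)
    {
        var sb = new StringBuilder();
        using (HashAlgorithm crypt = factory())
        {
            foreach (byte bt in crypt.ComputeHash(stream))
            {
                sb.Append(bt.ToString("x2"));
            }
        }
        return sb.ToString();
    }
}
```

Usage would then look like HashHelper.GetHash(() => MD5.Create(), fStream), and the same helper works for other algorithms such as SHA256.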
Answered by Badaro Jr.
Solution #4
I came across a significantly simpler version. It reads the entire file into memory at once and needs only one using statement.
byte[] ComputeHash(string filePath)
{
    using (var md5 = MD5.Create())
    {
        return md5.ComputeHash(File.ReadAllBytes(filePath));
    }
}
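On newer runtimes the same one-shot approach can be written without creating an MD5 instance at all. A minimal sketch, assuming .NET 5 or later, where the static MD5.HashData method is available; the OneShotHash class name is hypothetical:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class OneShotHash
{
    // Assumes .NET 5 or later: MD5.HashData hashes a byte array directly,
    // so there is no algorithm instance to create or dispose.
    public static byte[] ComputeHash(string filePath)
    {
        return MD5.HashData(File.ReadAllBytes(filePath));
    }
}
```

Like the version above, this loads the whole file into memory, so the stream-based variants are likely a better fit for very large files.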
Answered by Ashley Davis
Solution #5
I know I’m late to the party, but I ran some tests before implementing the solution.
I compared the built-in MD5 class with md5sum.exe. In my case, the built-in class took 13 seconds to complete, while md5sum.exe took 16-18 seconds on each run.
DateTime current = DateTime.Now;
string file = @"C:\text.iso"; // It's a 2.5 GB file
string output;
using (var md5 = MD5.Create())
{
    using (var stream = File.OpenRead(file))
    {
        byte[] checksum = md5.ComputeHash(stream);
        output = BitConverter.ToString(checksum).Replace("-", String.Empty).ToLower();
        Console.WriteLine("Total seconds : " + (DateTime.Now - current).TotalSeconds.ToString() + " " + output);
    }
}
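DateTime.Now is fine for a rough figure like this, but System.Diagnostics.Stopwatch is the usual tool for timing code. Below is a sketch of the same measurement using it; HashTimer and TimeMD5 are hypothetical names, and the timings quoted above were not produced with this code.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Security.Cryptography;

public static class HashTimer
{
    // Hashes the file, returns the lowercase hex digest, and reports
    // how long the read-and-hash step took.
    public static string TimeMD5(string file, out TimeSpan elapsed)
    {
        var stopwatch = Stopwatch.StartNew();
        string output;
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(file))
        {
            byte[] checksum = md5.ComputeHash(stream);
            output = BitConverter.ToString(checksum).Replace("-", String.Empty).ToLowerInvariant();
        }
        stopwatch.Stop();
        elapsed = stopwatch.Elapsed;
        return output;
    }
}
```

A caller could then print elapsed.TotalSeconds alongside the returned digest, as in the snippet above.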
Answered by Romil Kumar Jain
Post is based on https://stackoverflow.com/questions/10520048/calculate-md5-checksum-for-a-file