Coder Perfect

In.NET, comparing two byte arrays

Problem

How am I going to get this done so quickly?

Sure, I’m up for it:

static bool ByteArrayCompare(byte[] a1, byte[] a2)
{
    if (a1.Length != a2.Length)
        return false;

    for (int i=0; i<a1.Length; i++)
        if (a1[i]!=a2[i])
            return false;

    return true;
}

However, I’m searching for a BCL function or a highly optimized, proven method to accomplish this.

java.util.Arrays.equals((sbyte[])(Array)a1, (sbyte[])(Array)a2);

It appears to work beautifully, however it does not appear to work for x64.

Note my super-fast answer here.

Asked by Hafthor

Solution #1

Enumerable is an option. The technique SequenceEqual is used to compare two sequences.

using System;
using System.Linq;
...
var a1 = new int[] { 1, 2, 3};
var a2 = new int[] { 1, 2, 3};
var a3 = new int[] { 1, 2, 4};
var x = a1.SequenceEqual(a2); // true
var y = a1.SequenceEqual(a3); // false

Your technique is fine if you can’t use.NET 3.5 for some reason. Your loop will be optimized by the compiler run-time environment, so you won’t have to worry about performance.

Answered by aku

Solution #2

P/Invoke powers activate!

[DllImport("msvcrt.dll", CallingConvention=CallingConvention.Cdecl)]
static extern int memcmp(byte[] b1, byte[] b2, long count);

static bool ByteArrayCompare(byte[] b1, byte[] b2)
{
    // Validate buffers are the same length.
    // This also ensures that the count does not exceed the length of either buffer.  
    return b1.Length == b2.Length && memcmp(b1, b2, b1.Length) == 0;
}

Answered by plinth

Solution #3

IStructuralEqualtable is a new built-in solution for this in.NET 4.

static bool ByteArrayCompare(byte[] a1, byte[] a2) 
{
    return StructuralComparisons.StructuralEqualityComparer.Equals(a1, a2);
}

Answered by Ohad Schneider

Solution #4

Without having to add complicated and/or non-portable fluff to your own application’s code base, SpanT> provides an exceptionally competitive alternative:

// byte[] is implicitly convertible to ReadOnlySpan<byte>
static bool ByteArrayCompare(ReadOnlySpan<byte> a1, ReadOnlySpan<byte> a2)
{
    return a1.SequenceEqual(a2);
}

As of.NET 5.0.0, the (guts of the) implementation may be found here.

I’ve updated @EliArbel’s gist to add this method as SpansEqual, remove most of the less interesting performances from others’ benchmarks, run it with different array sizes, generate graphs, and set SpansEqual as the baseline so that the different techniques are compared to SpansEqual.

The numbers below are from the results, which have been minimally altered to remove the “Error” column.

|        Method |  ByteCount |               Mean |            StdDev | Ratio | RatioSD |
|-------------- |----------- |-------------------:|------------------:|------:|--------:|
|    SpansEqual |         15 |           4.629 ns |         0.0289 ns |  1.00 |    0.00 |
|  LongPointers |         15 |           4.598 ns |         0.0416 ns |  0.99 |    0.01 |
|      Unrolled |         15 |          18.199 ns |         0.0291 ns |  3.93 |    0.02 |
| PInvokeMemcmp |         15 |           9.872 ns |         0.0441 ns |  2.13 |    0.02 |
|               |            |                    |                   |       |         |
|    SpansEqual |       1026 |          19.965 ns |         0.0880 ns |  1.00 |    0.00 |
|  LongPointers |       1026 |          63.005 ns |         0.5217 ns |  3.16 |    0.04 |
|      Unrolled |       1026 |          38.731 ns |         0.0166 ns |  1.94 |    0.01 |
| PInvokeMemcmp |       1026 |          40.355 ns |         0.0202 ns |  2.02 |    0.01 |
|               |            |                    |                   |       |         |
|    SpansEqual |    1048585 |      43,761.339 ns |        30.8744 ns |  1.00 |    0.00 |
|  LongPointers |    1048585 |      59,585.479 ns |        17.3907 ns |  1.36 |    0.00 |
|      Unrolled |    1048585 |      54,646.243 ns |        35.7638 ns |  1.25 |    0.00 |
| PInvokeMemcmp |    1048585 |      55,198.289 ns |        23.9732 ns |  1.26 |    0.00 |
|               |            |                    |                   |       |         |
|    SpansEqual | 2147483591 | 240,607,692.857 ns | 2,733,489.4894 ns |  1.00 |    0.00 |
|  LongPointers | 2147483591 | 238,223,478.571 ns | 2,033,769.5979 ns |  0.99 |    0.02 |
|      Unrolled | 2147483591 | 236,227,340.000 ns | 2,189,627.0164 ns |  0.98 |    0.00 |
| PInvokeMemcmp | 2147483591 | 238,724,660.000 ns | 3,726,140.4720 ns |  0.99 |    0.02 |

I was surprised that SpansEqual didn’t win in the max-array-size techniques, but the difference is so little that I don’t think it matters.

My system info:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-6850K CPU 3.60GHz (Skylake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.51904, CoreFX 5.0.20.51904), X64 RyuJIT
  DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.51904, CoreFX 5.0.20.51904), X64 RyuJIT

Answered by Joe Amenta

Solution #5

User gil proposed dangerous code, which resulted in the following solution:

// Copyright (c) 2008-2013 Hafthor Stefansson
// Distributed under the MIT/X11 software license
// Ref: http://www.opensource.org/licenses/mit-license.php.
static unsafe bool UnsafeCompare(byte[] a1, byte[] a2) {
  if(a1==a2) return true;
  if(a1==null || a2==null || a1.Length!=a2.Length)
    return false;
  fixed (byte* p1=a1, p2=a2) {
    byte* x1=p1, x2=p2;
    int l = a1.Length;
    for (int i=0; i < l/8; i++, x1+=8, x2+=8)
      if (*((long*)x1) != *((long*)x2)) return false;
    if ((l & 4)!=0) { if (*((int*)x1)!=*((int*)x2)) return false; x1+=4; x2+=4; }
    if ((l & 2)!=0) { if (*((short*)x1)!=*((short*)x2)) return false; x1+=2; x2+=2; }
    if ((l & 1)!=0) if (*((byte*)x1) != *((byte*)x2)) return false;
    return true;
  }
}

which compares as much of the array as feasible using 64-bit based comparison. This is based on the fact that the arrays are qword aligned to begin with. It’ll work if it’s not qword aligned, but it won’t be as quick.

It runs around seven timers faster than a standard for loop. The J# library worked in the same way as the original for loop. Making use of SequenceEqual is seven times slower, which I believe is due to the fact that it uses IEnumerator. MoveNext. I expect LINQ-based solutions to be at least as slow, if not slower.

Answered by Hafthor

Post is based on https://stackoverflow.com/questions/43289/comparing-two-byte-arrays-in-net