Coder Perfect

The quickest way to spot discrepancies between two generic lists is to compare them.

Problem

What’s the quickest (and least resource-intensive) way to compare two big (>50.000 items) lists and get two lists like these:

I’m currently using the List or IReadOnlyCollection to handle this problem in a Linq query:

var list1 = list.Where(i => !list2.Contains(i)).ToList();
var list2 = list2.Where(i => !list.Contains(i)).ToList();

However, it does not perform as well as I would like. I need to process a lot of lists, so any suggestions on how to make it faster and less resource intensive?

Asked by Frank

Solution #1

Use Except:

var firstNotSecond = list1.Except(list2).ToList();
var secondNotFirst = list2.Except(list1).ToList();

I suspect there are approaches which would actually be marginally faster than this, but even this will be vastly faster than your O(N * M) approach.

You might build a method containing the above and then a return statement if you want to combine these:

return !firstNotSecond.Any() && !secondNotFirst.Any();

One thing to note is that the original code in the question and the solution here have different results: any duplicate elements that are only in one list will only be reported once with my code, whereas they would be reported as many times as they occur in the original code.

[2, 2, 2, 3] is a possibility. It would be [2, 3] with my code. Although this won’t be an issue in most circumstances, it’s something to be aware of.

Answered by Jon Skeet

Solution #2

Enumerable.SequenceEqual(list1, list2);

This is applicable to all primitive data types. You must implement IEqualityComparer if you want to use it on custom objects.

Defines methods for comparing things to see if they are equal.

Answered by miguelmpn

Solution #3

Enumerable would be more efficient. Except:

var inListButNotInList2 = list.Except(list2);
var inList2ButNotInList = list2.Except(list);

Deferred execution is used to implement this approach. That implies, for example, you could write:

var first10 = inListButNotInList2.Take(10);

It is also efficient because it compares the items using a SetT> internally. It works by collecting all distinct values from the second sequence first, then streaming the first’s results and ensuring they haven’t been seen previously.

Answered by Tim Schmelter

Solution #4

The following will work if you want the results to be case insensitive:

List<string> list1 = new List<string> { "a.dll", "b1.dll" };
List<string> list2 = new List<string> { "A.dll", "b2.dll" };

var firstNotSecond = list1.Except(list2, StringComparer.OrdinalIgnoreCase).ToList();
var secondNotFirst = list2.Except(list1, StringComparer.OrdinalIgnoreCase).ToList();

firstNotSecond would contain b1.dll

secondNotFirst would contain b2.dll

Answered by e.gad

Solution #5

using System.Collections.Generic;
using System.Linq;

namespace YourProject.Extensions
{
    public static class ListExtensions
    {
        public static bool SetwiseEquivalentTo<T>(this List<T> list, List<T> other)
            where T: IEquatable<T>
        {
            if (list.Except(other).Any())
                return false;
            if (other.Except(list).Any())
                return false;
            return true;
        }
    }
}

Sometimes all you need to know is whether two lists differ, not what those differences are. If that’s the case, you should think about including this extension method in your project. Keep in mind that all of your specified objects must implement IEquatable!

Usage:

public sealed class Car : IEquatable<Car>
{
    public Price Price { get; }
    public List<Component> Components { get; }

    ...
    public override bool Equals(object obj)
        => obj is Car other && Equals(other);

    public bool Equals(Car other)
        => Price == other.Price
            && Components.SetwiseEquivalentTo(other.Components);

    public override int GetHashCode()
        => Components.Aggregate(
            Price.GetHashCode(),
            (code, next) => code ^ next.GetHashCode()); // Bitwise XOR
}

The methods provided here for Car should be implemented roughly identically regardless of the Component type.

It’s critical to note how GetHashCode was written. Equals and GetHashCode must operate on the instance’s properties in a logically consistent manner in order to correctly implement IEquatable.

Two lists with the same contents are nonetheless distinct objects, and their hash codes will differ. We must let GetHashCode produce the same value for each of these two lists if we want them to be treated equally. We can do this by assigning the hashcode to each element in the list and combining them with the conventional bitwise XOR. Because XOR is order-agnostic, it doesn’t matter how the lists are arranged. It’s just important that they all have the same members.

Note that the unusual name alludes to the fact that the method ignores the order of the entries in the list. This solution is not for you if you care about the order of the components in the list!

Answered by Devon Parsons

Post is based on https://stackoverflow.com/questions/12795882/quickest-way-to-compare-two-generic-lists-for-differences