Coder Perfect

Performance surprise with “as” and nullable types

Problem

I’m just finishing up Chapter 4 of C# in Depth, which deals with nullable types, and I’m adding a section about using the “as” operator, which allows you to write:

object o = ...;
int? x = o as int?;
if (x.HasValue)
{
    ... // Use x.Value in here
}

I thought this was really neat, and that it might be faster than the C# 1 equivalent, which uses “is” followed by a cast. After all, this way we only need to ask for dynamic type checking once, followed by a simple value check.

However, this does not appear to be the case. I’ve included a simple test program below that sums all the integers in an object array, where the array contains many null references and string references as well as boxed integers. The benchmark measures the code you would have to write in C# 1, the code using the “as” operator, and, just for fun, a LINQ solution. To my surprise, the C# 1 code is 20 times faster in this case, and even the LINQ code (which I expected to be slower, given the iterators involved) beats the “as” code.

Is the .NET implementation of isinst for nullable types just really slow? Is the additional unbox.any causing the problem? Is there another explanation? At the moment, it looks like I’ll have to include a warning against using this in performance-sensitive situations…

Results:

Code:

using System;
using System.Diagnostics;
using System.Linq;

class Test
{
    const int Size = 30000000;

    static void Main()
    {
        object[] values = new object[Size];
        for (int i = 0; i < Size - 2; i += 3)
        {
            values[i] = null;
            values[i+1] = "";
            values[i+2] = 1;
        }

        FindSumWithCast(values);
        FindSumWithAs(values);
        FindSumWithLinq(values);
    }

    static void FindSumWithCast(object[] values)
    {
        Stopwatch sw = Stopwatch.StartNew();
        int sum = 0;
        foreach (object o in values)
        {
            if (o is int)
            {
                int x = (int) o;
                sum += x;
            }
        }
        sw.Stop();
        Console.WriteLine("Cast: {0} : {1}", sum, 
                          (long) sw.ElapsedMilliseconds);
    }

    static void FindSumWithAs(object[] values)
    {
        Stopwatch sw = Stopwatch.StartNew();
        int sum = 0;
        foreach (object o in values)
        {
            int? x = o as int?;
            if (x.HasValue)
            {
                sum += x.Value;
            }
        }
        sw.Stop();
        Console.WriteLine("As: {0} : {1}", sum, 
                          (long) sw.ElapsedMilliseconds);
    }

    static void FindSumWithLinq(object[] values)
    {
        Stopwatch sw = Stopwatch.StartNew();
        int sum = values.OfType<int>().Sum();
        sw.Stop();
        Console.WriteLine("LINQ: {0} : {1}", sum, 
                          (long) sw.ElapsedMilliseconds);
    }
}

Asked by Jon Skeet

Solution #1

Clearly, the JIT compiler can generate much more efficient machine code for the first case. One rule that really helps there is that an object can only be unboxed to a variable of the same type as the boxed value. That lets the JIT compiler generate very efficient code, because no value conversions have to be considered.
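The unboxing rule described above can be observed directly from C#. The following is a minimal sketch, not part of the original answer; the class and method names (UnboxRuleDemo, CanUnboxToLong) are made up for illustration.

```csharp
using System;

public static class UnboxRuleDemo
{
    // Returns true if the boxed value can be unboxed directly to long.
    // Hypothetical helper, used only to illustrate the exact-type rule.
    public static bool CanUnboxToLong(object boxed)
    {
        try
        {
            long value = (long)boxed;   // unboxing requires a boxed long
            return true;
        }
        catch (InvalidCastException)
        {
            return false;               // e.g. a boxed int is rejected
        }
    }

    public static void Main()
    {
        object boxedInt = 42;
        Console.WriteLine(CanUnboxToLong(boxedInt));   // False
        Console.WriteLine(CanUnboxToLong(42L));        // True

        // Unbox to the exact type first, then let the compiler widen.
        long widened = (int)boxedInt;
        Console.WriteLine(widened);                    // 42
    }
}
```

Because no conversion is ever allowed during the unbox itself, the runtime only has to compare the type and read the value.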

The is operator test is easy: just check that the object isn’t null and is of the expected type, which takes only a few machine code instructions. The cast is also easy: the JIT compiler knows the location of the value bits in the object and uses them directly. No copying or conversion occurs; all the machine code is inline and takes only a few dozen instructions. This needed to be really efficient back in .NET 1.0, when boxing was common.

Casting to int? takes a lot more work. The value representation of the boxed integer is not compatible with the memory layout of Nullable<int>. A conversion is required, and the code is tricky because of possible boxed enum types. To get the job done, the JIT compiler generates a call to a CLR helper function named JIT_Unbox_Nullable. This is a general-purpose function for any value type, with a lot of code in it to check types. The value is also copied. Because this code is locked up inside mscorwks.dll, the cost is hard to estimate, but hundreds of machine code instructions are likely.

The LINQ OfType() extension method also uses the is operator and the cast. This is, however, a cast to a generic type. The JIT compiler generates a call to a helper function, JIT_Unbox(), that can perform a cast to an arbitrary value type. I don’t have a good explanation for why it is about as slow as the cast to Nullable<int>, given that less work should be needed. I suspect that ngen.exe might cause trouble here.

Answered by Hans Passant

Solution #2

The isinst appears to be extremely slow on nullable types. In the FindSumWithCast method, I changed

if (o is int)

to

if (o is int?)

As a result, execution is severely slowed. I can only see one difference in IL:

isinst     [mscorlib]System.Int32

gets changed to

isinst     valuetype [mscorlib]System.Nullable`1<int32>
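As a side note, the two tests give the same answers for every input, so the change above affects only speed, not results. The sketch below is an illustration with made-up names (IsNullableDemo, IsInt, IsNullableInt); it measures nothing and only shows the semantics.

```csharp
using System;

public static class IsNullableDemo
{
    // Hypothetical helpers wrapping the two forms of the type test.
    public static bool IsInt(object o) { return (o is int); }
    public static bool IsNullableInt(object o) { return (o is int?); }

    public static void Main()
    {
        object[] samples = { 1, null, "", 2L };
        foreach (object o in samples)
        {
            // Both columns agree for each sample.
            Console.WriteLine("{0} {1}", IsInt(o), IsNullableInt(o));
        }
        // Prints:
        // True True
        // False False
        // False False
        // False False
    }
}
```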

Answered by Dirk Vollmar

Solution #3

This began as a comment to Hans Passant’s wonderful response, but it became too long, so I’ll add a few things here:

The as operator in C# will first emit an isinst IL instruction (so does the is operator). (Another intriguing instruction is castclass, which is generated when you execute a direct cast and the compiler determines that runtime checking is required.)

Isinst performs the following tasks (ECMA 335 Partition III, 4.6):

Most importantly:

In this situation, the performance killer isn’t isinst, but the additional unbox.any. This wasn’t clear from Hans’ answer, as he only looked at the JITed code. In general, the C# compiler will emit an unbox.any after an isinst T? (but will omit it if you do isinst T and T is a reference type).

Why does it behave this way? isinst T? never has the effect you might expect, i.e. you do not get back a T?. Instead, all these instructions guarantee is that you have a “boxed T” that can be unboxed to T?. To get an actual T?, we still need to unbox our “boxed T” to T?, which is why the compiler emits an unbox.any after isinst. This makes sense if you consider that the “box format” for T? is simply a “boxed T”, and having castclass and isinst perform the unbox would be inconsistent.
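The point that the box format for T? is simply a boxed T can be checked from C#. This is a small sketch with hypothetical names (NullableBoxingDemo, DescribeBoxed), not code from the original answer.

```csharp
using System;

public static class NullableBoxingDemo
{
    // Describes what a Nullable<int> looks like once it has been boxed.
    public static string DescribeBoxed(int? value)
    {
        object boxed = value;   // boxing a Nullable<int>
        return boxed == null ? "null" : boxed.GetType().Name;
    }

    public static void Main()
    {
        Console.WriteLine(DescribeBoxed(5));      // Int32
        Console.WriteLine(DescribeBoxed(null));   // null

        // A plain boxed int can be unboxed back to int?.
        object boxedInt = 5;
        int? roundTrip = (int?)boxedInt;
        Console.WriteLine(roundTrip.HasValue);    // True
    }
}
```

There is never a boxed Nullable<int> object on the heap, which is why a separate unbox step is needed to build the T? value.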

Here’s some information from the standard to back up Hans’ findings:

Unbox.any (ECMA 335 Partition III, 4.33):

Unbox (ECMA 335 Partition III, 4.32)

Answered by Johannes Rudolph

Solution #4

Interestingly, I passed on feedback about operator support via dynamic being an order of magnitude slower for Nullable<T> (similar to this early test), and I suspect it is for very similar reasons.

You’ve got to love Nullable<T>. Another fun one is that, while the JIT detects (and removes) null checks for non-nullable structs, it fails to do so for Nullable<T>:

using System;
using System.Diagnostics;
static class Program {
    static void Main() { 
        // JIT
        TestUnrestricted<int>(1,5);
        TestUnrestricted<string>("abc",5);
        TestUnrestricted<int?>(1,5);
        TestNullable<int>(1, 5);

        const int LOOP = 100000000;
        Console.WriteLine(TestUnrestricted<int>(1, LOOP));
        Console.WriteLine(TestUnrestricted<string>("abc", LOOP));
        Console.WriteLine(TestUnrestricted<int?>(1, LOOP));
        Console.WriteLine(TestNullable<int>(1, LOOP));

    }
    static long TestUnrestricted<T>(T x, int loop) {
        Stopwatch watch = Stopwatch.StartNew();
        int count = 0;
        for (int i = 0; i < loop; i++) {
            if (x != null) count++;
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
    static long TestNullable<T>(T? x, int loop) where T : struct {
        Stopwatch watch = Stopwatch.StartNew();
        int count = 0;
        for (int i = 0; i < loop; i++) {
            if (x != null) count++;
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}

Answered by Marc Gravell

Solution #5

To keep this answer up to date, it’s worth noting that most of the discussion on this page is now moot with C# 7.1 and .NET 4.7, which support a slim syntax that also produces the best IL code.

The initial example given by the OP…

object o = ...;
int? x = o as int?;
if (x.HasValue)
{
    // ...use x.Value in here
}

becomes simply…

if (o is int x)
{
    // ...use x in here
}
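Applied to the summing benchmark from the question, the pattern-matching form might look like the sketch below. The method name SumWithPattern and the sample data are assumptions for illustration, and no timing is measured here.

```csharp
using System;

public static class PatternSumDemo
{
    // Sums the boxed ints in the array, skipping nulls and other types.
    public static int SumWithPattern(object[] values)
    {
        int sum = 0;
        foreach (object o in values)
        {
            if (o is int x)   // type test and unbox in one step
            {
                sum += x;
            }
        }
        return sum;
    }

    public static void Main()
    {
        object[] values = { null, "", 1, 2L, 3 };
        Console.WriteLine(SumWithPattern(values));   // 4 (the boxed long is skipped)
    }
}
```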

I’ve found that the new syntax comes in handy when writing a .NET value type (i.e. a struct in C#) that implements IEquatable<MyStruct> (as most should). After implementing the strongly-typed Equals(MyStruct other) method, you can now gracefully redirect the untyped Equals(Object obj) override (inherited from Object) to it:

public override bool Equals(Object obj) => obj is MyStruct o && Equals(o);
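For context, a complete struct using this redirect could look like the following sketch. The Id field and the EqualsDemo class are hypothetical and only there to make the example compile and run.

```csharp
using System;

public struct MyStruct : IEquatable<MyStruct>
{
    public readonly int Id;   // hypothetical field, for illustration only

    public MyStruct(int id) { Id = id; }

    // Strongly-typed comparison: no boxing involved.
    public bool Equals(MyStruct other) => Id == other.Id;

    // Untyped override redirects to the typed method via pattern matching.
    public override bool Equals(object obj) => obj is MyStruct o && Equals(o);

    public override int GetHashCode() => Id.GetHashCode();
}

public static class EqualsDemo
{
    public static void Main()
    {
        object boxed = new MyStruct(7);
        Console.WriteLine(new MyStruct(7).Equals(boxed));   // True
        Console.WriteLine(new MyStruct(7).Equals("7"));     // False
        Console.WriteLine(new MyStruct(7).Equals(null));    // False
    }
}
```

A null or wrongly-typed argument fails the pattern test, so the typed Equals is never reached in those cases.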

Appendix: Here is the Release build IL code for the first two example functions in this answer (respectively). The IL code for the new syntax is indeed one byte smaller, but it mostly wins by making zero calls (rather than two) and by avoiding the unbox operation altogether when possible.

// static void test1(Object o, ref int y)
// {
//     int? x = o as int?;
//     if (x.HasValue)
//         y = x.Value;
// }

[0] valuetype [mscorlib]Nullable`1<int32> x
        ldarg.0
        isinst [mscorlib]Nullable`1<int32>
        unbox.any [mscorlib]Nullable`1<int32>
        stloc.0
        ldloca.s x
        call instance bool [mscorlib]Nullable`1<int32>::get_HasValue()
        brfalse.s L_001e
        ldarg.1
        ldloca.s x
        call instance !0 [mscorlib]Nullable`1<int32>::get_Value()
        stind.i4
L_001e: ret
// static void test2(Object o, ref int y)
// {
//     if (o is int x)
//         y = x;
// }

[0] int32 x,
[1] object obj2
        ldarg.0
        stloc.1
        ldloc.1
        isinst int32
        ldnull
        cgt.un
        dup
        brtrue.s L_0011
        ldc.i4.0
        br.s L_0017
L_0011: ldloc.1
        unbox.any int32
L_0017: stloc.0
        brfalse.s L_001d
        ldarg.1
        ldloc.0
        stind.i4
L_001d: ret

See here for more testing that backs up my claim that the new C# 7 syntax outperforms the previously available alternatives (in particular, example ‘D’).

Answered by Glenn Slayden

Post is based on https://stackoverflow.com/questions/1583050/performance-surprise-with-as-and-nullable-types