Coder Perfect

Why is it that 2 + 40 equals 42?

Problem

I was perplexed when a colleague showed me a line of JavaScript, alert(2+ 40), that alerts 42.

What appears to be a minus sign is swiftly shown to be an esoteric Unicode character with distinctly different semantics.

This made me wonder why that character doesn’t cause a syntax error when the expression is parsed. I’d also like to know whether any other characters behave in a similar way.

Asked by GOTO 0

Solution #1

The character in question is U+1680 OGHAM SPACE MARK, which is a space character. As a result, the code is equivalent to alert(2+ 40).

JavaScript treats any Unicode character in the Zs class (Separator, Space) as a white space character, although there don’t appear to be many of them.
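As a quick check of this claim, here is a minimal sketch. The variable names are illustrative, and it assumes an engine that supports Unicode property escapes (ES2018 or later, for example a recent Node.js). It tests U+1680 against the Zs class and then evaluates the expression from the question.

```javascript
// U+1680 OGHAM SPACE MARK, written as an escape so it stays visible.
const oghamSpace = '\u1680';

// \p{Zs} matches the Unicode "Separator, Space" category (needs the u flag).
const isZs = /^\p{Zs}$/u.test(oghamSpace);

// \s matches everything JavaScript treats as white space or a line terminator.
const isJsWhitespace = /^\s$/.test(oghamSpace);

// Build the source text "return 2+<U+1680>40" and evaluate it.
const result = new Function('return 2+' + oghamSpace + '40')();

console.log(isZs, isJsWhitespace, result);
```

Under those assumptions this should log true true 42.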

JavaScript also allows Unicode characters in identifiers, so variable names can contain non-ASCII letters.
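A small illustration of Unicode identifiers follows. The names used here are hypothetical examples chosen for this sketch, not the ones from the original answer.

```javascript
// Identifiers may contain Unicode letters, not only ASCII ones.
const π = Math.PI;   // Greek small letter pi
const größe = 42;    // contains o-umlaut and sharp s

console.log(π > 3, größe);
```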

Answered by Felix Kling

Solution #2

After reading the other responses, I built a simple script to locate all Unicode characters that behave like white spaces in the range U+0000–U+FFFF. There appear to be 26 or 27 of these, depending on the browser, with some confusion around U+0085 and U+FFFE.

Note that most of these characters just look like a regular white space.
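The script itself is not reproduced here. A minimal sketch of how such a search could work is shown below. The helper name behavesLikeSpace and the probing expression are assumptions made for this example, not necessarily what the answerer used.

```javascript
// Returns true if the single UTF-16 code unit `ch` can stand in for a space
// between the operator and the operand of "2 + 2".
function behavesLikeSpace(ch) {
  try {
    // Only a white-space-like character leaves the sum intact; other
    // characters either change the result or fail to parse.
    return new Function('return 2 +' + ch + ' 2')() === 4;
  } catch (e) {
    return false; // the source did not parse, so `ch` is not white space
  }
}

const spaceLike = [];
for (let code = 0; code <= 0xffff; code++) {
  if (behavesLikeSpace(String.fromCharCode(code))) {
    spaceLike.push(code);
  }
}

// Print the matches as U+XXXX code points.
console.log(
  spaceLike
    .map((c) => 'U+' + c.toString(16).toUpperCase().padStart(4, '0'))
    .join(' ')
);
```

The list should include U+0020 and U+1680. The exact set can differ between engines, which matches the browser-dependent count mentioned above.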

Answered by GOTO 0

Solution #3

The character you’re using is actually longer than the real minus sign (hyphen).

 
-

The top one is the character you are using, and the bottom one is what the minus sign should be. You appear to already be aware of this, so let’s look at why JavaScript does this.

The character you’re using is the ogham space mark, which is a whitespace character, so JavaScript treats it as a space. That makes your statement equivalent to alert(2+ 40).

JavaScript has more characters like this. A complete list can be found on Wikipedia.

Something interesting I noticed about this character is how Google Chrome (and possibly other browsers) renders it in the top bar of the page.

It is shown as a block with 1680 inside it, which is the hexadecimal Unicode code point of the ogham space mark. This may happen only on my machine, but it is a strange thing.

I decided to check what would happen if I tried it in different languages, and here are the results.

Python 2 and Python 3

>> 2+ 40
  File "<stdin>", line 1
    2+ 40
        ^
SyntaxError: invalid character in identifier

Ruby

>> 2+ 40
NameError: undefined local variable or method ` 40' for main:Object
    from (irb):1
    from /home/michaelpri/.rbenv/versions/2.2.2/bin/irb:11:in `<main>'

Java (inside the main method)

>> System.out.println(2+ 40);
Main.java:3: error: illegal character: \5760
            System.out.println(2+?40);
                                 ^
Main.java:3: error: ';' expected
            System.out.println(2+?40);
                                  ^
Main.java:3: error: illegal start of expression
            System.out.println(2+?40);
                                    ^
3 errors

PHP

>> 2+ 40;
Use of undefined constant  40 - assumed ' 40' :1

C

>> 2+ 40
main.c:1:1: error: expected identifier or '(' before numeric constant
 2+ 40
 ^
main.c:1:1: error: stray '\341' in program
main.c:1:1: error: stray '\232' in program
main.c:1:1: error: stray '\200' in program

exit status 1

Go

>> 2+ 40
can't load package: package .: 
main.go:1:1: expected 'package', found 'INT' 2
main.go:1:3: illegal character U+1680

exit status 1

Perl 5

>> perl -e'2+ 40'
Unrecognized character \xE1; marked by <-- HERE after 2+<-- HERE near column 3 at -e line 1.

Scheme

>> (+ 2  40)
=> 42

C# (inside the Main() method)

Console.WriteLine(2+ 40);

Output: 42

Perl 6

>> ./perl6 -e'say 2+ 40' 
42

Answered by michaelpri

Solution #4

I’m guessing that has something to do with the fact that it’s classified as whitespace for some inexplicable reason:

$ unicode  
U+1680 OGHAM SPACE MARK
UTF-8: e1 9a 80  UTF-16BE: 1680  Decimal: &#5760;
  ( )
Uppercase: U+1680
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
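The same properties can be cross-checked from JavaScript. This is a minimal sketch with illustrative variable names; it assumes a runtime that provides TextEncoder and Unicode property escapes, such as a recent Node.js or browser.

```javascript
const mark = '\u1680'; // OGHAM SPACE MARK

// Code point in decimal and hexadecimal.
const decimal = mark.codePointAt(0);
const hex = decimal.toString(16);

// UTF-8 bytes as a space-separated hex string.
const utf8 = Array.from(new TextEncoder().encode(mark), (b) => b.toString(16)).join(' ');

// General category check via a Unicode property escape.
const isSpaceSeparator = /\p{Zs}/u.test(mark);

console.log(decimal, hex, utf8, isSpaceSeparator);
```

This should log 5760 1680 e1 9a 80 true, in line with the output of the unicode tool above.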

Answered by PSkocik

Solution #5

I seem to recall reading something about replacing semi-colons (U+003B) in someone’s code with U+037E, the Greek question mark, a while back.

They both appear to be the same (to the point where I believe the Greeks themselves use U+003B), but according to this article, one of them will not work.

More information on this can be found on Wikipedia: https://en.wikipedia.org/wiki/Question_mark#Greek_question_mark

There is also a (closed) question on SO about using this as a prank. AFAIR, it isn’t where I first read about it: Prank / Joke in JavaScript
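To see the difference between the two characters in JavaScript specifically, here is a minimal sketch. The helper name parses is an assumption made for this example. It checks whether a snippet is accepted as a function body when U+037E is used in place of the semicolon.

```javascript
const greekQuestionMark = '\u037E'; // looks like ';' in most fonts

// Returns true if `source` parses as a function body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    return false;
  }
}

const withSemicolon = parses('var a = 1; var b = 2;');
const withGreekMark = parses(
  'var a = 1' + greekQuestionMark + ' var b = 2' + greekQuestionMark
);

console.log(greekQuestionMark === ';', withSemicolon, withGreekMark);
```

This should log false true false: the two characters are distinct code points, and only the real semicolon is accepted by the parser.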

Answered by noonand

Post is based on https://stackoverflow.com/questions/31507143/why-does-2-40-equal-42