What looks like a minus sign turns out to be an obscure Unicode character with quite different semantics.
This made me wonder why that character doesn't cause a syntax error when the expression is parsed. I'd also like to know whether there are other characters that behave in a similar way.
Asked by GOTO 0
The character in question is U+1680 “OGHAM SPACE MARK,” which is a space character. As a result, the code is equivalent to alert(2+ 40).
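To illustrate why a space character leaves the expression intact, here is a toy lexer in Python. It is a minimal sketch, not a real JavaScript engine. It assumes the ECMAScript WhiteSpace rule (TAB, VT, FF, SPACE, NO-BREAK SPACE, U+FEFF, plus any character in Unicode category Zs), and the helpers `is_js_whitespace`, `tokenize_sum` and `evaluate_sum` are hypothetical names that only understand ASCII integers and `+`.

```python
import unicodedata

DIGITS = "0123456789"
# Assumption: approximates ECMAScript's WhiteSpace production, i.e. TAB, VT,
# FF, SPACE, NO-BREAK SPACE, ZWNBSP (U+FEFF) plus any Unicode "Zs" character.
EXPLICIT_WHITESPACE = {"\t", "\v", "\f", " ", "\u00a0", "\ufeff"}


def is_js_whitespace(ch: str) -> bool:
    """Return True if a JS-style lexer would skip `ch` as whitespace."""
    return ch in EXPLICIT_WHITESPACE or unicodedata.category(ch) == "Zs"


def tokenize_sum(source: str) -> list[str]:
    """Hypothetical toy lexer that only knows ASCII integers and '+'."""
    tokens: list[str] = []
    i = 0
    while i < len(source):
        ch = source[i]
        if is_js_whitespace(ch):
            i += 1  # whitespace separates tokens but produces none
        elif ch == "+":
            tokens.append(ch)
            i += 1
        elif ch in DIGITS:
            start = i
            while i < len(source) and source[i] in DIGITS:
                i += 1
            tokens.append(source[start:i])
        else:
            raise SyntaxError(f"unexpected character U+{ord(ch):04X}")
    return tokens


def evaluate_sum(source: str) -> int:
    """Evaluate 'number (+ number)*' using the toy lexer above."""
    tokens = tokenize_sum(source)
    numbers, operators = tokens[0::2], tokens[1::2]
    if (len(numbers) != len(operators) + 1
            or "+" in numbers
            or any(op != "+" for op in operators)):
        raise SyntaxError("expected: number (+ number)*")
    return sum(int(n) for n in numbers)


print(tokenize_sum("2+\u168040"))  # ['2', '+', '40']
print(evaluate_sum("2+\u168040"))  # 42
```

Because U+1680 is in category Zs, the lexer simply drops it, leaving the tokens `2`, `+`, `40`. A character outside that set, such as U+200B ZERO WIDTH SPACE (category Cf), would instead raise a `SyntaxError` here.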
Answered by Felix Kling
After reading the other answers, I wrote a simple script to find all Unicode characters in the range U+0000–U+FFFF that behave like whitespace. There appear to be 26 or 27 of them depending on the browser, with disagreement over U+0085 and U+FFFE.
Note that most of these characters look just like a regular space.
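The script itself isn't shown. As a hedged stand-in, the Python sketch below applies the ECMAScript rules directly instead of probing a browser: it collects every code point in U+0000–U+FFFF that is either an explicit WhiteSpace character, a Unicode Zs character, or a LineTerminator (LF, CR, U+2028, U+2029), since a lexer skips all of these between tokens. The function name `js_whitespace_like` is hypothetical, and the result depends on the Unicode version bundled with your Python.

```python
import unicodedata

# Assumption: mirrors the ECMAScript WhiteSpace and LineTerminator productions;
# real browsers may ship other Unicode versions or deviate from the spec.
EXPLICIT_WHITESPACE = {0x0009, 0x000B, 0x000C, 0x0020, 0x00A0, 0xFEFF}
LINE_TERMINATORS = {0x000A, 0x000D, 0x2028, 0x2029}


def js_whitespace_like(start: int = 0x0000, end: int = 0xFFFF) -> list[int]:
    """Return code points in [start, end] that a JS lexer would skip."""
    found = []
    for cp in range(start, end + 1):
        if (cp in EXPLICIT_WHITESPACE
                or cp in LINE_TERMINATORS
                or unicodedata.category(chr(cp)) == "Zs"):
            found.append(cp)
    return found


if __name__ == "__main__":
    points = js_whitespace_like()
    for cp in points:
        # Control characters such as TAB have no Unicode name.
        print(f"U+{cp:04X} {unicodedata.name(chr(cp), '<control>')}")
    print(len(points), "characters")
```

With the Unicode data in recent Python releases this should report 25 code points, including U+1680 but neither U+0085 nor U+180E. Browser counts of 26 or 27 come from real engines, which is exactly where implementations can differ.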
Answered by GOTO 0
The character you're using appears to be longer than an actual minus sign (hyphen).
Something interesting I noticed about this character is how Google Chrome (and possibly other browsers) displays it in the top bar of the page.
It appears as a box with 1680 inside it, which is the Unicode code point of the Ogham space mark (U+1680). This may be specific to my machine, but it is strange.
I decided to check what would happen if I tried it in different languages, and here are the results.
Python 2 and Python 3
>> 2+ 40
File "<stdin>", line 1
2+ 40 ^
SyntaxError: invalid character in identifier
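The Python error above comes from the parser. A small sketch, assuming CPython 3, shows the same thing programmatically and also shows that other parts of Python do treat U+1680 as whitespace. `python_accepts` is a hypothetical helper written for this illustration.

```python
def python_accepts(source: str) -> bool:
    """Hypothetical helper: can CPython compile `source` as an expression?"""
    try:
        compile(source, "<stdin>", "eval")
    except SyntaxError:
        return False
    return True


OGHAM = "\u1680"  # OGHAM SPACE MARK

print(python_accepts("2 + 40"))              # True: ordinary ASCII spaces
print(python_accepts("2+" + OGHAM + "40"))   # False: the parser rejects U+1680
print(OGHAM.isspace())                       # True: str methods call it whitespace
print(int(OGHAM + "40"))                     # 40: int() strips Unicode whitespace
```

So Python's tokenizer only accepts ASCII whitespace between tokens, even though its string functions classify U+1680 as a space.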
Ruby
>> 2+ 40
NameError: undefined local variable or method ` 40' for main:Object
from (irb):1
from /home/michaelpri/.rbenv/versions/2.2.2/bin/irb:11:in `<main>'
Java (inside the main method)
>> System.out.println(2+ 40);
Main.java:3: error: illegal character: \5760
System.out.println(2+?40); ^
Main.java:3: error: ';' expected
System.out.println(2+?40); ^
Main.java:3: error: illegal start of expression
System.out.println(2+?40); ^
3 errors
PHP
>> 2+ 40;
Use of undefined constant 40 - assumed ' 40' :1
C
>> 2+ 40
main.c:1:1: error: expected identifier or '(' before numeric constant
2+ 40 ^
main.c:1:1: error: stray '\341' in program
main.c:1:1: error: stray '\232' in program
main.c:1:1: error: stray '\200' in program
exit status 1
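The odd numbers in these compiler messages are all U+1680 in different notations. The short Python sketch below decodes them. It assumes the source files were saved as UTF-8, which is consistent with the three bytes gcc reports.

```python
# The single character the thread is about: U+1680 OGHAM SPACE MARK.
CH = "\u1680"

code_point = ord(CH)
utf8 = CH.encode("utf-8")

print(code_point)                             # 5760: javac's "\5760" is decimal
print(f"U+{code_point:04X}")                  # U+1680: the form Go reports
print(utf8.hex(" "))                          # e1 9a 80: the UTF-8 bytes
print(" ".join(f"\\{b:03o}" for b in utf8))   # \341 \232 \200: gcc's octal "stray" bytes
print(f"\\x{utf8[0]:02X}")                    # \xE1: the first byte, as Perl reports it
```

gcc and Perl work on raw bytes here, so they complain about the individual UTF-8 bytes, whereas javac and Go decode the character first and report its code point.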
Go
>> 2+ 40
can't load package: package .: main.go:1:1: expected 'package', found 'INT' 2
main.go:1:3: illegal character U+1680
exit status 1
Perl
>> perl -e'2+ 40'
Unrecognized character \xE1; marked by <-- HERE after 2+<-- HERE near column 3 at -e line 1.
>> (+ 2 40) => 42
C# (inside the Main() method)
Console.WriteLine(2+ 40);
Output: 42
Perl 6
>> ./perl6 -e'say 2+ 40'
42
Answered by michaelpri
I'd guess this has something to do with the fact that, for some strange reason, it is classified as whitespace:
$ unicode U+1680
U+1680 OGHAM SPACE MARK
UTF-8: e1 9a 80 UTF-16BE: 1680 Decimal:   ( )
Uppercase: U+1680
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
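The same properties can be read from Python's standard `unicodedata` module. This is a small sketch, and the exact values come from whichever Unicode version your Python ships with. The Zs category is what JavaScript's lexer keys on, while HYPHEN-MINUS, which the character resembles, is dash punctuation.

```python
import unicodedata

ch = "\u1680"  # OGHAM SPACE MARK

print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")  # U+1680 OGHAM SPACE MARK
print(ch.encode("utf-16-be").hex())               # 1680
print(unicodedata.category(ch))                   # Zs, i.e. Separator, Space
print(unicodedata.bidirectional(ch))              # WS, i.e. Whitespace
print(unicodedata.category("-"))                  # Pd: HYPHEN-MINUS is dash punctuation
```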
Answered by PSkocik
I seem to recall reading, a while back, about someone's semicolons (U+003B) being replaced in their code with U+037E, the Greek question mark.
They look identical (so much so that I believe Greek text itself uses U+003B), but according to this article, one of them will not work as a statement terminator in code.
More information on this can be found on Wikipedia: https://en.wikipedia.org/wiki/Question_mark#Greek_question_mark
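A short Python sketch shows both halves of this. U+037E has a canonical (singleton) decomposition to U+003B, so Unicode normalization turns it into an ordinary semicolon, which is why normalized Greek text ends up using U+003B. A parser that doesn't normalize punctuation still rejects it. Python's parser stands in for "code" here as an assumption, and `compiles` is a hypothetical helper.

```python
import unicodedata

GREEK_QM = "\u037e"  # GREEK QUESTION MARK
SEMICOLON = ";"      # U+003B

print(GREEK_QM == SEMICOLON)                                 # False: different code points
print(unicodedata.decomposition(GREEK_QM))                   # 003B: canonical mapping
print(unicodedata.normalize("NFC", GREEK_QM) == SEMICOLON)   # True after normalization


def compiles(source: str) -> bool:
    """Hypothetical helper: does CPython accept `source` as a module?"""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(compiles("x = 1; y = 2"))        # True
print(compiles("x = 1\u037e y = 2"))   # False: U+037E is not a statement separator
```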
Answered by noonand
Post is based on https://stackoverflow.com/questions/31507143/why-does-2-40-equal-42