The page is declared as ISO 8859-1, but the actual bytes of the text appear to be UTF-8. In UTF-8, characters from U+0080 to U+00BF are encoded as C2 followed by the code point value. For example, U+0092 is encoded as C2 92.
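A quick way to check the two-byte pattern is to encode every code point in the range with Python's built-in UTF-8 codec. This is just a sanity check; `utf8_bytes` is a throwaway helper name, not anything from the original page.

```python
def utf8_bytes(cp: int) -> bytes:
    """Return the UTF-8 encoding of a single code point."""
    return chr(cp).encode("utf-8")


# For U+0080..U+00BF, UTF-8 is the lead byte 0xC2 followed by the code point value.
for cp in range(0x80, 0xC0):
    assert utf8_bytes(cp) == bytes([0xC2, cp])

# From U+00C0 onward the lead byte becomes 0xC3, so the pattern stops there.
print(utf8_bytes(0x92).hex(" "))  # c2 92
print(utf8_bytes(0xC0).hex(" "))  # c3 80
```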

C2 in ISO 8859-1 is “Â”. U+0092 is the control code Private Use 2 in Unicode, and 92 is the same in ISO 8859-1. However, the standard Western Windows code page 1252 extends ISO 8859-1 by replacing most of the C1 control codes at 80 to 9F with printable characters, assigning “’” (right single quotation mark) to 92.
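To see the difference between the two code pages on the bytes in question, one can decode C2 92 with Python's `latin-1` (ISO 8859-1) and `cp1252` (Windows-1252) codecs and list the resulting code points. `code_points` is a hypothetical helper for illustration.

```python
def code_points(raw: bytes, codec: str) -> list:
    """Decode raw bytes and list the resulting code points as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in raw.decode(codec)]


# Same two bytes, two single-byte code pages: only the 0x92 byte differs.
for codec in ("latin-1", "cp1252"):
    print(codec, code_points(b"\xc2\x92", codec))
# latin-1 ['U+00C2', 'U+0092']
# cp1252 ['U+00C2', 'U+2019']
```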

HTML5/WHATWG requires an ISO 8859-1 charset declaration to be interpreted as Windows-1252 (https://blog.whatwg.org/the-road-to-html-5-character-encodin...), hence the displayed result is “Â’”.
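The label aliasing can be sketched as a small lookup before decoding. This is a hypothetical helper, not a real library API, and the label set is only a subset of the labels the WHATWG Encoding Standard maps to windows-1252. It also assumes WHATWG's handling of the five bytes Windows-1252 leaves undefined, which Python's `cp1252` codec rejects.

```python
# Subset of labels that WHATWG resolves to windows-1252 (not the full table).
WINDOWS_1252_LABELS = {
    "ascii", "us-ascii", "iso-8859-1", "iso8859-1", "iso_8859-1",
    "latin1", "l1", "cp1252", "windows-1252",
}


def decode_windows_1252(raw: bytes) -> str:
    """Decode as WHATWG windows-1252: the five bytes Python's cp1252 codec
    leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) fall back to the
    same-numbered C1 control code point."""
    chars = []
    for byte in raw:
        try:
            chars.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(byte))
    return "".join(chars)


def decode_like_browser(raw: bytes, declared_label: str) -> str:
    """Decode page bytes the way an HTML5 browser treats the declared charset."""
    label = declared_label.strip().lower()
    if label in WINDOWS_1252_LABELS:
        return decode_windows_1252(raw)
    return raw.decode(label)


page = b"\xc2\x92"
print(decode_like_browser(page, "ISO-8859-1"))   # Â’
print(repr(decode_like_browser(page, "UTF-8")))  # '\x92'
```

Had the page been labelled UTF-8, the same bytes would decode to the invisible U+0092 control rather than “Â’”.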

The original Windows-1252 content must have previously been converted to UTF-8 under the assumption that the source was ISO 8859-1, i.e. mapping 92 to U+0092 (Private Use 2) instead of to U+2019 (Right Single Quotation Mark). The resulting UTF-8 bytes were placed in the web page, which is nevertheless declared as ISO 8859-1.
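The whole chain can be reproduced, and reversed, with Python's standard codecs. The sample string is made up for illustration; the steps follow the explanation above: Windows-1252 bytes mis-read as ISO 8859-1, saved as UTF-8, then displayed as Windows-1252.

```python
original = "It\u2019s"  # hypothetical text containing a right single quotation mark

cp1252_bytes = original.encode("cp1252")       # b"It\x92s"
# Faulty conversion: treat the Windows-1252 bytes as ISO 8859-1 (0x92 -> U+0092).
misconverted = cp1252_bytes.decode("latin-1")
page_bytes = misconverted.encode("utf-8")      # b"It\xc2\x92s" ends up in the page
# The page says ISO 8859-1, which the browser treats as Windows-1252.
displayed = page_bytes.decode("cp1252")        # "ItÂ’s"

# Repair: undo each step in reverse order.
repaired = page_bytes.decode("utf-8").encode("latin-1").decode("cp1252")
print(displayed, "->", repaired)
```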

Delicious, thank you!

I edited my post after verifying the actual bytes, it turned out to be slightly more complicated.

The double-encoding path gets you there too: the original UTF-8 \xE2 \x80 \x99 mis-decoded as ISO 8859-1 (Windows-1252 would turn \x80 and \x99 into € and ™ instead) and saved back as UTF-8 gives \xC3 \xA2 \xC2 \x80 \xC2 \x99, which in Windows-1252 renders as Ã¢Â€Â™. A WYSIWYG cleanup replacing that mojibake with the Windows-1252 ’ (byte 0x92) and saving back as UTF-8, again treating 0x92 as ISO 8859-1 U+0092, gets you to \xC2 \x92 on disk.

Edit: Although maybe that's not the most parsimonious explanation.
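The double-encoding path can be checked the same way. The first part uses only standard codecs; the "cleanup" step at the end is a hypothetical model of what a WYSIWYG editor might do, not something confirmed by the page.

```python
quote = "\u2019"                                   # right single quotation mark
utf8 = quote.encode("utf-8")                       # b"\xe2\x80\x99"

# Mis-decode as ISO 8859-1, then save back as UTF-8.
double = utf8.decode("latin-1").encode("utf-8")    # b"\xc3\xa2\xc2\x80\xc2\x99"
print(double.decode("cp1252"))                     # Ã¢Â€Â™

# Contrast: mis-decoding as Windows-1252 gives "â€™" and different bytes.
via_cp1252 = utf8.decode("cp1252").encode("utf-8")  # b"\xc3\xa2\xe2\x82\xac\xe2\x84\xa2"

# Hypothetical cleanup: the mojibake is replaced by the single byte 0x92,
# which is then read as ISO 8859-1 (U+0092) and saved as UTF-8.
cleaned = b"\x92".decode("latin-1").encode("utf-8")  # b"\xc2\x92"
print(cleaned.hex(" "))
```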

this one does g11n....

They're probably Microsoft's "Smart Quotes", which are Unicode. They were presumably stored in UTF-8 but retrieved as ASCII (or ISO-8859-1).
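Assuming "retrieved as" means the stored UTF-8 bytes were decoded with an ISO 8859-1 or ASCII codec, a minimal sketch of what each reader would see (the sample text is made up):

```python
stored = "\u201cSmart quotes\u201d".encode("utf-8")  # b"\xe2\x80\x9cSmart quotes\xe2\x80\x9d"

# ISO 8859-1 never fails: each byte becomes one code point in U+0000..U+00FF.
as_latin1 = stored.decode("latin-1")
# ASCII with replacement: every byte >= 0x80 becomes U+FFFD.
as_ascii = stored.decode("ascii", errors="replace")

try:
    stored.decode("ascii")  # strict ASCII rejects the non-ASCII bytes outright
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

print(repr(as_latin1))
print(as_ascii)
print("strict ASCII decode succeeded:", strict_ok)
```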