[rescue] T5220 update

Jonathan Patschke jp at celestrion.net
Tue Oct 31 12:09:56 CDT 2017


On Tue, 31 Oct 2017, Dave McGuire wrote:

>  BTW, all of your messages are littered with random 'b' characters.
>
>  Windows?

UTF-8 does that when you strip the high bit.

    b^X = 0x62, 0x18

Add back the high bits and it's 0xE2, 0x98, which is an invalid UTF-8
sequence (the lead byte announces three bytes, but only two are present).
Add back the byte containing just the high bit (0x80, which strips down to
NUL), as UTF-8 demands, and it's:

    0xE2, 0x80, 0x98

which is Unicode code point U+2018, or "left single quotation mark."
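
For anyone who wants to poke at it, here is a minimal Python sketch of that
round trip.  It assumes the NUL left over from the 0x80 byte gets dropped
somewhere along the way (I haven't verified which hop does that), and the
variable names are purely illustrative:

```python
# U+2018 LEFT SINGLE QUOTATION MARK as UTF-8: three bytes.
original = "\u2018".encode("utf-8")            # 0xE2, 0x80, 0x98

# Strip the high bit of every byte, as a 7-bit-only hop would.
stripped = bytes(b & 0x7F for b in original)   # 0x62, 0x00, 0x18

# Assumption: the resulting NUL is dropped, leaving 'b' and ^X.
visible = stripped.replace(b"\x00", b"")       # 0x62, 0x18

# Setting the high bits again is not enough: the lead byte 0xE2
# promises three bytes, so a two-byte sequence fails to decode.
two_bytes = bytes(b | 0x80 for b in visible)   # 0xE2, 0x98
try:
    two_bytes.decode("utf-8")
    two_byte_decode_ok = True
except UnicodeDecodeError:
    two_byte_decode_ok = False

# Put the missing 0x80 continuation byte back and it decodes again.
restored = bytes([two_bytes[0], 0x80, two_bytes[1]])
recovered = restored.decode("utf-8")           # U+2018
```

The reverse direction only works here because we already know which byte
went missing; in general the stripped stream doesn't carry enough
information to rebuild the original.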

This is Unix/Plan9 all the way down.  UTF-8 degrades poorly when you throw
away 12.5% of the data.

-- 
Jonathan Patschke
Austin, TX
USA

