[rescue] UTF-8 [was T5220 update]

Jonathan Patschke jp at celestrion.net
Tue Oct 31 13:21:20 CDT 2017


On Tue, 31 Oct 2017, Mouse wrote:

> Most character encodings degrade poorly when you throw away any
> significant fraction of the data.

Indeed, most of them fare far worse.

> I'm perpetually depressed by the number of people who seem to think it's
> reasonable for them to generate UTF-8 all the time and that it's
> everyone else's duty to handle it the way they intend,

I can't speak for Lionel's MUA, but I'd lay good money on it being
up-front about the message encoding in the MIME headers.  I'd lay blame on
the software that sanitized the content without running it through iconv.
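
As a rough sketch of what "running it through iconv" amounts to, here it
is in Python rather than with iconv itself.  The function name to_ascii is
made up for illustration, and this assumes a single-part text message
whose declared charset is one Python knows about.  It honours the charset
from the MIME headers before flattening anything to ASCII, instead of
blindly stripping bytes:

```python
from email import message_from_bytes


def to_ascii(raw_message: bytes) -> str:
    """Decode a single-part message using its declared charset, then
    degrade to ASCII, replacing anything unrepresentable with '?'.

    Hypothetical helper: not what any particular list software does.
    """
    msg = message_from_bytes(raw_message)
    # Fall back to US-ASCII when no charset is declared in Content-Type.
    charset = msg.get_content_charset() or "us-ascii"
    # decode=True undoes the Content-Transfer-Encoding (base64, QP, ...).
    payload = msg.get_payload(decode=True)
    text = payload.decode(charset, errors="replace")
    return text.encode("ascii", errors="replace").decode("ascii")


if __name__ == "__main__":
    raw = (
        b"Content-Type: text/plain; charset=utf-8\r\n"
        b"Content-Transfer-Encoding: quoted-printable\r\n"
        b"\r\n"
        b"caf=C3=A9\r\n"
    )
    print(to_ascii(raw).strip())
```

The point is only the order of operations: decode by the declared
encoding first, then decide how to degrade.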

> as if Unicode were some kind of God-given One True Character Set and
> UTF-8 its One True Encoding.

UTF-8 is a reasonable compromise in a world of mutually-incompatible human
scripts.  It'd be Really Nice if the characters were the same width, but
that means weighing lots of 0-bytes in text versus freezing out anyone
whose languages aren't expressible in the 8-bit Latin encodings.  The old
school of bickering code-pages can remind us how that goes.
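
To put rough numbers on that trade-off, here is a small Python sketch.
The helper name widths is made up for illustration; it compares UTF-8
against a fixed-width encoding (UTF-32, big-endian, no BOM) and counts
the 0-bytes the fixed-width form carries:

```python
def widths(s: str) -> tuple[int, int, int]:
    """Return (UTF-8 length, UTF-32 length, zero bytes in UTF-32)."""
    utf8 = s.encode("utf-8")
    utf32 = s.encode("utf-32-be")  # fixed 4 bytes per code point
    return len(utf8), len(utf32), utf32.count(0)


if __name__ == "__main__":
    for sample in ["rescue", "na\u00efve", "\u65e5\u672c\u8a9e"]:
        print(sample, widths(sample))

    # An 8-bit Latin encoding simply cannot represent the last sample.
    try:
        "\u65e5\u672c\u8a9e".encode("latin-1")
    except UnicodeEncodeError as exc:
        print("latin-1 cannot encode it:", exc.reason)
```

For plain ASCII text, three of every four bytes in the fixed-width form
are zero, while UTF-8 stays at one byte per character.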

For all their faults, UTF-8 and Unicode are _FAR_ better than their
predecessors.  There are plenty of email threads at my day job that
would be inexpressible in the older encodings because of the ways that
Big5, Shift-JIS, and CP-1252 collide.
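
A quick illustration of the collision, sketched in Python with a made-up
helper name (readings).  The same two bytes mean three different things
depending on which legacy encoding the reader assumes, and nothing in the
bytes themselves says which one was meant:

```python
def readings(raw: bytes) -> dict[str, str]:
    """Decode the same bytes under three legacy encodings.

    Undecodable sequences become U+FFFD rather than raising.
    """
    codecs = ("big5", "shift_jis", "cp1252")
    return {name: raw.decode(name, errors="replace") for name in codecs}


if __name__ == "__main__":
    # One CJK character, encoded as Big5.
    raw = "\u4e2d".encode("big5")
    print(raw.hex())
    for codec, text in readings(raw).items():
        print(f"{codec:10} {text!r}")
```

A thread mixing all three scripts has no single code page that fits,
which is exactly the case Unicode handles.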

Now, if the Unicode corporate folks could keep their politics and "Emoji"
out of it, that'd sure be nice.

> This is bad enough anywhere, but especially surprising on lists which
> are, like this one, populated with people who routinely use hardware and
> software older than a year or two.

Thompson and Pike were presenting talks on UTF-8 in the early-to-mid
1990s.  Even my crufty HP-UX 11.0 boxes have UTF-8 support (although not
my 10.20 daily driver, unfortunately).  Basic support should be a solved
problem.

-- 
Jonathan Patschke
Austin, TX
USA

