[rescue] OT: broken de-MIME-ers should be shot! ;-)

Greg A. Woods woods at weird.com
Thu Apr 18 14:20:04 CDT 2002


[ On Thursday, April 18, 2002 at 10:19:37 (-0400), Kurt Mosiejczuk wrote: ]
> Subject: Re: [rescue] OT: broken de-MIME-ers should be shot!  ;-)
>
> On Wed, 17 Apr 2002, Greg A. Woods wrote:
> > 
> > Who was the idiot who wrote that code anyway?  He or she had better have
> > a really good excuse, such as being completely blind or something....
> 
> I wonder if the author just looked quick at a table and saw a funny looking
> 'o', and thought it was a zero.  Might have just made a simple mistake
> trying to translate.  If he was being really stupid his software would
> just drop alternate character sets completely.  At least he was putting
> some effort into this.  I suspect a bug report would probably get it fixed.
> Be nice though, Greg =)

Actually I must retract my complaint entirely now that we've seen
what's really been happening.  If you strip the eighth (high) bit off a
degree symbol encoded in ISO-8859-1 (0xB0) you do end up with an ASCII
zero (0x30); and we can see that's what happened now that Bill has
turned off the bit-stripping "feature".

No doubt the author of demime either knew what to expect when the bits
were stripped, or never even contemplated what would happen.  It seems
we can "thank" the creators of ISO-8859 for choosing to encode a degree
symbol as a zero with the high-bit set.  :-)
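
For the curious, here's a tiny C sketch of the arithmetic (not demime's
actual code, just an illustration of what stripping the high bit does
to that one byte):

	#include <stdio.h>

	int
	main(void)
	{
		unsigned char degree = 0xB0;		/* ISO-8859-1 DEGREE SIGN */
		unsigned char stripped = degree & 0x7F;	/* clear the high (8th) bit */

		/* 0xB0 & 0x7F == 0x30, which is ASCII '0' */
		printf("0x%02X -> 0x%02X ('%c')\n", degree, stripped, stripped);
		return (0);
	}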

> Is UTF-8 useless for Asian countries?  And is UTF-8 something that is mainly
> pushed by Win2k?  If so, I take back my comment about better handling it =)

It's not as widely supported by existing systems in Asian countries, and
there are lots of debates about various encoding problems between some
of those countries that use the same character glyphs but attach
different meanings to them (and thus require different collating
orders, etc.).
Most existing systems use regionally standardised encodings, and those
that exchange files between countries learn to convert them....  At
least that's how I understand things, and what I've observed after
visiting a couple of those countries and working with IT people in those
locations.
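
(For what it's worth, on most modern Unixish systems that conversion
step is usually just iconv(3).  Here's a minimal sketch that recodes
stdin to stdout; the encoding names are examples only and vary by
platform, and it only handles a single bufferful of input:)

	#include <iconv.h>
	#include <stdio.h>

	int
	main(void)
	{
		static char inbuf[16384], outbuf[65536];
		char *inp = inbuf, *outp = outbuf;
		size_t inleft, outleft = sizeof(outbuf);
		iconv_t cd;

		inleft = fread(inbuf, 1, sizeof(inbuf), stdin);
		cd = iconv_open("UTF-8", "EUC-JP");	/* to-code, from-code */
		if (cd == (iconv_t) -1) {
			perror("iconv_open");
			return (1);
		}
		if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t) -1) {
			perror("iconv");	/* e.g. invalid or truncated input */
			return (1);
		}
		fwrite(outbuf, 1, sizeof(outbuf) - outleft, stdout);
		iconv_close(cd);
		return (0);
	}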

If some consensus were possible to make Unicode truly amenable to those
competing needs it would be an ideal solution, as it provides a way to
avoid having to deal with conversions -- everything's in one commonly
used encoding at all times.  Unfortunately, accommodating the
conflicting encodings in the obvious way (giving each regional variant
its own code point) would require at least one more bit than Unicode's
original 16 (i.e. 17 bits or more).

My first exposure to Unicode/ISO-10646 (with UTF-8 encoding) was Plan-9.
Win2K (and NT) supposedly supports it, and of course modern GUI web
browsers can (if they can find appropriately encoded fonts on the local
system).  Emacs supports Unicode/UTF-8 too, but I haven't really
explored how well.  I can see most of the glyphs in the "hello" document
though.... :-)
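
The UTF-8 encoding itself is simple enough to do by hand for the
Latin-1 range: the degree sign, U+00B0, comes out as the two bytes
0xC2 0xB0.  A quick sketch of the two-byte case (illustration only,
not a full encoder):

	#include <stdio.h>

	int
	main(void)
	{
		unsigned int cp = 0x00B0;	/* U+00B0 DEGREE SIGN */
		unsigned char b1, b2;

		/* code points U+0080..U+07FF use two bytes in UTF-8 */
		b1 = 0xC0 | (cp >> 6);		/* 110xxxxx */
		b2 = 0x80 | (cp & 0x3F);	/* 10xxxxxx */
		printf("U+%04X -> 0x%02X 0x%02X\n", cp, b1, b2);
		return (0);
	}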

There's a list of Unicode-compatible stuff here:

	http://www.unicode.org/unicode/onlinedat/products.html

-- 
								Greg A. Woods

+1 416 218-0098;  <gwoods at acm.org>;  <g.a.woods at ieee.org>;  <woods at robohack.ca>
Planix, Inc. <woods at planix.com>; VE3TCP; Secrets of the Weird <woods at weird.com>


