Java Forum / General / February 2006
Email charset
Roedy Green - 02 Feb 2006 05:18 GMT When sending an email with JavaMail, is it safe to use UTF-8 encoding? Is there anyone who can't read it today?
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
Thomas Weidenfeller - 02 Feb 2006 08:32 GMT > When sending an email with Javamail, it is safe to use UTF-8 encoding. > Is there anyone who can't read it today? Just half a year ago I had to use a mail reader which could only deal with ASCII. It was on a remote site running some embedded OS. I was happy that I even had a command line mail tool.
My suggestion would be to use the least common denominator:
- If your mail's texts only contain characters which can be encoded in ASCII, use ASCII for the mail.
- If your mail's texts only contain characters which can be encoded within the range of Latin 1, use ISO-8859-1.
- Only if you have other characters, too, and if you don't want to go through the trouble of figuring out for each destination the most common encoding which still includes your characters, then use UTF-8.
/Thomas
 Signature The comp.lang.java.gui FAQ: ftp://ftp.cs.uu.nl/pub/NEWS.ANSWERS/computer-lang/java/gui/faq http://www.uni-giessen.de/faq/archiv/computer-lang.java.gui.faq/
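The least-common-denominator rule suggested above (ASCII if possible, then ISO-8859-1, then UTF-8) can be sketched in plain Java. This is only an illustration, not code from the thread: the class name `CharsetPicker` and the method `pick` are hypothetical, and the sketch assumes Java 7 or later for `StandardCharsets`. The chosen charset's name could then be handed to JavaMail, for example through `MimeMessage.setText(String text, String charset)`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Picks the narrowest of three candidate charsets that can encode a text. */
class CharsetPicker {

    // Candidates in order of preference: narrowest first, UTF-8 as the catch-all.
    private static final Charset[] CANDIDATES = {
        StandardCharsets.US_ASCII,
        StandardCharsets.ISO_8859_1,
        StandardCharsets.UTF_8
    };

    /** Returns the first candidate whose encoder accepts every character of text. */
    static Charset pick(String text) {
        for (Charset candidate : CANDIDATES) {
            // A fresh encoder per call, because CharsetEncoder is not thread-safe.
            if (candidate.newEncoder().canEncode(text)) {
                return candidate;
            }
        }
        // Only malformed input (such as an unpaired surrogate) reaches this point.
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(pick("Work is ready for you"));
        System.out.println(pick("Caf\u00e9"));
        System.out.println(pick("Darbs ir gatavs: \u0101 \u010d \u0113"));
    }
}
```

Checking with an encoder rather than testing for characters below 128 means the same loop covers both the ASCII and the Latin 1 case.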
Roedy Green - 02 Feb 2006 09:16 GMT On Thu, 02 Feb 2006 09:32:05 +0100, Thomas Weidenfeller <nobody@ericsson.invalid> wrote, quoted or indirectly quoted someone who said :
>- If your mail's texts only contain characters which can be encoded in >ASCII, use ASCII for the mail. [quoted text clipped - 5 lines] >through the trouble of figuring out for each destination the most common >encoding which still includes your characters, then use UTF-8. My generated email alerts will be in dozens of languages. The project I am working on is the Internationaliser, which among other things will generate email alert messages to the various administrators, translators, proofreaders and programmers, all speaking a hodgepodge of languages. The Internationaliser itself will of course be internationalised up the yin yang.
See http://mindprod.com/projects/internationaliser.html
I guess I should play it safe and allow the Charset to be configurable, setting everyone's to UTF-8 default.
Thomas Weidenfeller - 02 Feb 2006 09:33 GMT > I guess I should play it safe and allow the Charset to be > configurable, setting everyone's to UTF-8 default. I would set the default to ASCII. And when reading in a mail's text for distribution probably run a check if it is all 7-bit. If not, give a warning, error, or silently switch to UTF-8.
Or work with some kind of templates, instead of raw text input. Each text intended for distribution can contain not only the text, but e-mail style headers which your tool then uses to prepare the actual messages. E.g. instead of just having
bla bla, buy our void bla, great stuff now, cheap, bla
as input, you have stuff like
Content-Type: text/plain; charset=us-ascii
Organization: Get poor quick!
From: big-email@example.com
bla bla, buy our void bla, great stuff now, cheap, bla
in your input, and you copy that stuff to the outgoing mails, overriding any defaults. That way the people who contribute the language specific texts can set the "best" encoding for the text while providing it.
/Thomas
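The template idea above could look roughly like the following in Java. This is a hedged sketch, not a definitive parser: the class `MailTemplate` and its methods are hypothetical names, and it assumes that the header block ends at the first blank line and that headers are not folded across lines. An input without a recognisable header block is treated as body text only, so the caller's default charset applies.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Splits a contributed text into e-mail style headers and the message body. */
class MailTemplate {

    // Matches the charset parameter of a Content-Type value, quoted or not.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\";\\s]+)", Pattern.CASE_INSENSITIVE);

    // Header names are case-insensitive, so the map compares keys that way.
    final Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    final String body;

    private MailTemplate(Map<String, String> headers, String body) {
        this.headers.putAll(headers);
        this.body = body;
    }

    /** Assumption: the header block ends at the first blank line. */
    static MailTemplate parse(String raw) {
        Map<String, String> parsed = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String[] parts = raw.split("\\r?\\n\\r?\\n", 2);
        if (parts.length < 2) {
            return new MailTemplate(parsed, raw); // no blank line: body only
        }
        for (String line : parts[0].split("\\r?\\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                // First block is not a header block: keep the whole input as body.
                parsed.clear();
                return new MailTemplate(parsed, raw);
            }
            parsed.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return new MailTemplate(parsed, parts[1]);
    }

    /** Returns the charset named in Content-Type, or fallback if there is none. */
    String charsetOr(String fallback) {
        String contentType = headers.get("Content-Type");
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : fallback;
    }
}
```

A caller would parse each contributed text once, copy the parsed headers onto the outgoing message, and use `charsetOr("UTF-8")` (or whatever default is configured) when encoding the body.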
Roedy Green - 02 Feb 2006 11:12 GMT On Thu, 02 Feb 2006 10:33:39 +0100, Thomas Weidenfeller <nobody@ericsson.invalid> wrote, quoted or indirectly quoted someone who said :
>I would set the default to ASCII. And when reading in a mail's text for >distribution probably run a check if it is all 7-bit. If not, give a >warning, error, or silently switch to UTF-8. In my case a large proportion of the messages will not be in English. To start most will be in Latvian or Serbian.
Roedy Green - 02 Feb 2006 11:20 GMT On Thu, 02 Feb 2006 10:33:39 +0100, Thomas Weidenfeller <nobody@ericsson.invalid> wrote, quoted or indirectly quoted someone who said :
>in your input, and you copy that stuff to the outgoing mails, overriding >any defaults. That way the people who contribute the language specific >texts can set the "best" encoding for the text while providing it. My program is a tool for internationalising Java apps, and logically it should use itself to internationalise itself, including all the email alerts it generates. These are things like warnings to translators that work is available.
Right now I have a record for each person with the email address, preferred locale (language/country/variant), preferred L&F, and preferred email encoding.
If you are curious what I am up to, see http://mindprod.com/projects/internationaliser.html
I am implementing one of my own student projects.
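The per-person record described above (email address, preferred locale, preferred L&F, preferred email encoding) might be modelled as below, with UTF-8 as the default encoding as proposed earlier in the thread. This is a sketch under assumptions: the class and field names are hypothetical, the L&F is stored as a plain string, and an unknown or missing charset name silently falls back to the default instead of being reported.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** One person who receives generated email alerts (hypothetical model). */
class Recipient {

    // Default used when a person has no usable encoding preference.
    static final Charset DEFAULT_EMAIL_CHARSET = StandardCharsets.UTF_8;

    final String emailAddress;
    final Locale locale;        // language/country/variant
    final String lookAndFeel;   // assumption: stored as a name or class name
    final Charset emailCharset;

    Recipient(String emailAddress, Locale locale, String lookAndFeel, String charsetName) {
        this.emailAddress = emailAddress;
        this.locale = locale;
        this.lookAndFeel = lookAndFeel;
        this.emailCharset = resolve(charsetName);
    }

    /** Maps a configured charset name to a Charset, falling back to the default. */
    static Charset resolve(String charsetName) {
        if (charsetName == null || charsetName.trim().isEmpty()) {
            return DEFAULT_EMAIL_CHARSET;
        }
        try {
            return Charset.forName(charsetName.trim());
        } catch (IllegalArgumentException e) {
            // Covers both illegal names and charsets this JVM does not support.
            return DEFAULT_EMAIL_CHARSET;
        }
    }
}
```

For example, `new Recipient("translator@example.com", Locale.forLanguageTag("lv-LV"), "Metal", null)` would describe a Latvian translator whose alerts go out in the default encoding; the address and L&F name here are made up.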
Oliver Wong - 03 Feb 2006 17:29 GMT > When sending an email with Javamail, it is safe to use UTF-8 encoding. > Is there anyone who can't read it today? There's always *someone*, *somewhere* who can't read *something*. If your program is multilingual (as you've mentioned later on in this thread), then your best bet is probably Unicode (and thus UTF-8). ASCII certainly won't cut it.
If I understand correctly, the users of your program are going to be application translators (or more generally, computer-savvy linguists), so they should have software that can handle UTF-8. If not, you could accompany your software with tutorials on setting up a Unicode-enabled system for the various popular OSes.
- Oliver
Roedy Green - 03 Feb 2006 23:18 GMT > If I understand correctly, the users of your program are going to be >application translators (or more generally, computer-savvy linguists), so >they should have software that can handle UTF-8. If not, you could accompany >your software with tutorials on setting up a Unicode-enabled system for the >various popular OSes. What makes this program more complicated than you might expect is that it deals with the interactions between people and simultaneous updates of everything. It is not just a resource bundle editor.
There are four classes of people my program deals with.
1. programmers: they write Java code that they (or their bosses) want internationalised/localised.
2. translators: people who can translate the programmer's language (usually English) into the target languages with national and other variants. Serbians have several dialects for example.
3. proofreaders: people who check the translators' work.
4. administrators: people who manage the projects, assigning work, checking up on progress, configuring.
A person could wear all four hats.
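Since one person can hold any combination of the four roles listed above, a set of enum constants is one natural way to represent them in Java. The following is a minimal sketch with hypothetical names (`Person`, `Role`, `has`), not the design of the actual program.

```java
import java.util.EnumSet;
import java.util.Set;

/** A user of the tool together with the hats that user wears. */
class Person {

    // The four classes of people described in the post.
    enum Role { PROGRAMMER, TRANSLATOR, PROOFREADER, ADMINISTRATOR }

    final String name;
    final Set<Role> roles;

    Person(String name, Set<Role> roles) {
        this.name = name;
        // EnumSet.copyOf rejects an empty non-EnumSet collection, so handle that case.
        this.roles = roles.isEmpty() ? EnumSet.noneOf(Role.class) : EnumSet.copyOf(roles);
    }

    /** True if this person should receive alerts meant for the given role. */
    boolean has(Role role) {
        return roles.contains(role);
    }
}
```

Someone wearing all four hats is then simply `new Person("Example", EnumSet.allOf(Person.Role.class))`, and an alert for translators goes to every person for whom `has(Person.Role.TRANSLATOR)` is true.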
At this stage, I am primarily concerned with computer-generated emails to alert people something is ready for them, e.g. a work assignment is ready for a translator who may work at home part time.
See http://mindprod.com/projects/internationaliser.html
It may eventually be extended to create a simple person-to-person email that would deal with the problem of selecting a suitable encoding for the recipient's email program. It would be a can of worms if I tried to handle getting it translated as well.
At some point in my life I want to write something to replace regular email that deals in a serious way with spam, spoofing and unwanted enclosures.
See http://mindprod.com/projects/mailreadernewsreader.html