Java Forum / General / November 2007

Convert string

yaaros@gmail.com - 28 Nov 2007 15:24 GMT
Hi!!

I'd like to write a method that converts a given string to a string that
contains only letters of the English alphabet. So when I give the method a
string with some special characters, such as ą, ę, etc. from the Polish
alphabet, I'd like to get a string where ą is replaced by a, etc. Is it
possible to write such a universal method? One that will work for all
special characters like ä, ą, ś, ü, etc.

Thanks in advance
Yaaros
Joshua Cranmer - 28 Nov 2007 22:28 GMT
> Hi!!
>
[quoted text clipped - 7 lines]
> Thanks in advance
> Yaaros

What would be the correct output for the following characters?
Greek lowercase alpha
`fi' ligature
Japanese hiragana ka
Unified CJK ideograph for mountain

Technically speaking, any decidable problem can be solved in Java
(modulo certain native OS interactions), but are you looking for a
library that does this instead? Or how to write one yourself?

The Unicode normalization processes would probably be of great help as a
basis. <http://www.unicode.org/reports/tr15/>

Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
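
A minimal sketch of the normalization idea mentioned above, assuming Java 6 or later (which ships java.text.Normalizer). The class and method names are made up for illustration. It decomposes each character into a base letter plus combining marks (form NFD) and then deletes the marks:

```java
import java.text.Normalizer;

class AccentStripper {
    // Decompose to NFD, then remove every combining mark (Unicode category M)
    // that the decomposition exposed. Base letters are left untouched.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // \u0105 = a-ogonek, \u0119 = e-ogonek, \u015B = s-acute,
        // \u00E4 = a-diaeresis, \u00FC = u-diaeresis
        System.out.println(stripAccents("\u0105\u0119\u015B\u00E4\u00FC")); // prints "aesau"
    }
}
```

This only handles characters that have a canonical decomposition. The Polish ł (U+0142), for example, has none and passes through unchanged, so it still needs an explicit table entry, and the Greek, kana, and CJK cases raised above are not addressed at all.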

Roedy Green - 29 Nov 2007 03:56 GMT
>I'd like to write a method that converts a given string to a string that
>contains only letters of the English alphabet. So when I give the method a
>string with some special characters, such as ą, ę, etc. from the Polish
>alphabet, I'd like to get a string where ą is replaced by a, etc. Is it
>possible to write such a universal method? One that will work for all
>special characters like ä, ą, ś, ü, etc.

boolean english = 'a' <= c && c <= 'z' ||  'A' <= c && c <= 'Z';

Just do a loop going through your string composing a new one with a
StringBuilder consisting only of the chars you like.
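
The loop described above might look roughly like this. It is only a sketch; the class and method names are invented for illustration:

```java
class AsciiLetterFilter {
    // Same test as the one-liner above: true only for a-z and A-Z.
    static boolean isEnglishLetter(char c) {
        return ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z');
    }

    // Copy only the accepted chars into a new string.
    static String keepEnglishLetters(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isEnglishLetter(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Note that this drops accented letters instead of converting them: keepEnglishLetters("J\u0105va 6") yields "Jva".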

You could also create an array, indexed by char number, that holds what you
want to convert each character to, e.g. cvt[ 'à' ] -> 'a'. Then you
loop, looking up each char. A conversion to 0 means leave the char out.

The Quoter Amanuensis contains many such tables.  See
http://mindprod.com/products1.html#QUOTER

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Patricia Shanahan - 29 Nov 2007 04:00 GMT
...
> You could also create an array, indexed by char number, that holds what you
> want to convert each character to, e.g. cvt[ 'à' ] -> 'a'. Then you
> loop, looking up each char. A conversion to 0 means leave the char out.
...

I suggest mapping to a String rather than a char, to allow for e.g. two
letter expansions.

Patricia
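
A small sketch of the char-to-String mapping suggested above. The class name and the three sample entries are assumptions chosen for illustration, not a complete table:

```java
import java.util.HashMap;
import java.util.Map;

class StringMapper {
    // Hypothetical sample entries; a real table would be much larger.
    static final Map<Character, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put('\u00DF', "ss"); // sharp s: two-letter expansion
        REPLACEMENTS.put('\u00E6', "ae"); // ae ligature: two-letter expansion
        REPLACEMENTS.put('\u0142', "l");  // l with stroke: one-to-one
    }

    // Replace mapped chars with their String; copy everything else as-is.
    static String transliterate(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = REPLACEMENTS.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With these entries, transliterate("Stra\u00DFe") gives "Strasse", which a char-to-char table cannot express.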
Roedy Green - 30 Nov 2007 11:54 GMT
>I suggest mapping to a String rather than a char, to allow for e.g. two
>letter expansions.

String also allows for 0-length transforms, to ignore a letter.
However, if you have no multi-char transforms the code will be  faster
and considerably more compact using chars.

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Daniel Pitts - 30 Nov 2007 19:59 GMT
>> I suggest mapping to a String rather than a char, to allow for e.g. two
>> letter expansions.
>
> String also allows for 0-length transforms, to ignore a letter.
> However, if you have no multi-char transforms the code will be  faster
> and considerably more compact using chars.
On the other hand, using String allows for code points that don't fit in a
single char (those outside the Basic Multilingual Plane).


Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
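
To illustrate the code point remark above: a sketch that walks the input by code point instead of by char, so that a character stored as a surrogate pair is looked up as one unit. The class name and the two sample entries are assumptions for illustration:

```java
import java.util.HashMap;
import java.util.Map;

class CodePointMapper {
    // Keys are full code points (int), so values above U+FFFF fit.
    static final Map<Integer, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put(0x1D400, "A"); // MATHEMATICAL BOLD CAPITAL A (two chars in UTF-16)
        REPLACEMENTS.put(0x0105, "a");  // a with ogonek
    }

    static String transliterate(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            String replacement = REPLACEMENTS.get(cp);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.appendCodePoint(cp); // keep unmapped code points intact
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return out.toString();
    }
}
```

A loop over charAt would see the two halves of U+1D400 as separate chars and could never match it against a single table entry.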

Eric Sosman - 30 Nov 2007 23:25 GMT
>> I suggest mapping to a String rather than a char, to allow for e.g. two
>> letter expansions.
>
> String also allows for 0-length transforms, to ignore a letter.
> However, if you have no multi-char transforms the code will be  faster
> and considerably more compact using chars.

    ... which suggests a hybrid approach: A char[] array for the
one-to-one mappings, with a special value like '\u0000' meaning
"I don't know; check for exceptional cases."

    /* Untested, uncompiled, unscrutinized, un to the Nth: */

    static char[] translateTable = new char[65536];
    static { /* initialize it */ }

    static Map<Character,String> weirdCasesMap = ...;
    static { /* initialize it */ }

    String translate(String old) {
       StringBuilder buff = new StringBuilder();
       for (int n = old.length(), i = 0;  i < n;  ++i) {
           char oldc = old.charAt(i);
           char newc = translateTable[oldc];
           if (newc != 0) {
               buff.append(newc);
           }
           else {
               String news = weirdCasesMap.get(
                   Character.valueOf(oldc));
               if (news != null)  // allows for deletions
                   buff.append(news);
           }
       }
       return buff.toString();
    }

    If deletions (incoming characters that map to nothing in the
output) are common, consider using two special codes in translateTable:
one meaning "Check the map" and the other meaning "Ignore this."


Eric.Sosman@sun.com
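
The two-sentinel variant described above could be sketched as follows. The sentinel values, the class name, and the sample table entries are all assumptions for illustration. As in the code above, a char found in neither the table nor the map is dropped:

```java
import java.util.HashMap;
import java.util.Map;

class SentinelTranslator {
    static final char CHECK_MAP = '\u0000'; // default array value: consult the map
    static final char IGNORE = '\uFFFF';    // assumed sentinel: delete without a map lookup

    static final char[] TABLE = new char[65536];
    static final Map<Character, String> SPECIAL = new HashMap<>();
    static {
        // English letters map to themselves.
        for (char c = 'a'; c <= 'z'; c++) TABLE[c] = c;
        for (char c = 'A'; c <= 'Z'; c++) TABLE[c] = c;
        TABLE['\u0105'] = 'a';            // a with ogonek, one-to-one
        TABLE['\u00AD'] = IGNORE;         // soft hyphen, a common deletion
        SPECIAL.put('\u00DF', "ss");      // sharp s, multi-char expansion
    }

    static String translate(String old) {
        StringBuilder buff = new StringBuilder(old.length());
        for (int i = 0; i < old.length(); i++) {
            char oldc = old.charAt(i);
            char newc = TABLE[oldc];
            if (newc == IGNORE) {
                continue;                 // fast path for deletions
            }
            if (newc != CHECK_MAP) {
                buff.append(newc);        // fast path for one-to-one mappings
            } else {
                String news = SPECIAL.get(oldc);
                if (news != null) buff.append(news);
            }
        }
        return buff.toString();
    }
}
```

The point of the second sentinel is that frequent deletions skip the boxing and hashing of a map lookup entirely.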


